From WikiChip
Revision as of 12:23, 17 April 2016
Broadwell (BDW) is Intel's 14 nm microarchitecture for mobile, desktop, and server processors. Introduced in early 2015, Broadwell is a process shrink of Haswell that also brings several enhancements. It is named after Broadwell, Illinois.
Codenames
Core | Abbrev | Target |
---|---|---|
Broadwell Y | BDW-Y | Core M family, SoC for smartphones, 2-in-1s, tablets, and notebooks |
Broadwell U | BDW-U | Core ultrabooks |
Broadwell H | BDW-H | IoT (QM87, HM86/HM87 Chipsets), All-in-ones |
Broadwell DT | BDW-DT | Unlocked desktop MPUs |
Broadwell EP | BDW-EP | Xeon E5, Dual-Processor platform |
Broadwell EX | BDW-EX | Xeon E7, Multi-Processor platform, QPI |
Broadwell E | BDW-E | High-End Desktops (HEDT) |
Architecture
Broadwell is for the most part identical to Haswell with several enhancements, including new instruction set extensions.
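The text does not list the new extensions. As a minimal sketch, assuming they include ADX and RDSEED (both first shipped with Broadwell) and that the host is Linux, where the kernel reports them as the `adx` and `rdseed` flags in `/proc/cpuinfo`, the check below parses a cpuinfo dump and reports which of them are present. The helper names and the `BROADWELL_FLAGS` tuple are hypothetical, chosen for illustration.

```python
# Assumed set of Broadwell-era extensions; not taken from this article.
# The strings are the flag names the Linux kernel prints in /proc/cpuinfo.
BROADWELL_FLAGS = ("adx", "rdseed")


def parse_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def broadwell_extensions(cpuinfo_text):
    """Map each assumed Broadwell-era flag to whether it is present."""
    flags = parse_flags(cpuinfo_text)
    return {name: name in flags for name in BROADWELL_FLAGS}


# Illustrative input; on a real Linux system, pass the contents of
# /proc/cpuinfo instead of this made-up sample.
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx2 rdseed adx smap\n"
print(broadwell_extensions(sample))
```

A Haswell part would be expected to report both flags as absent, although the absence of a flag can also mean the feature was disabled by firmware or a hypervisor.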
Key changes from Haswell
- ~5% IPC improvement
- FP multiplication instructions have reduced latency (3 cycles, down from 5)
  - Affects AVX, SSE, and FP instructions
- CLMUL instructions are now a single μop, improving latency and throughput
- The second-level TLB (STLB)
  - Table was enlarged (1,536 entries, up from 1,024)
  - 1 GB page mode (16 entries, 4-way set associative)
- Larger out-of-order scheduler
- Faster store-to-load forwarding
- Address prediction for branches and returns was improved
- Improved cryptography acceleration instructions
Core features were held to a 2:1 ratio of performance gained to power spent.
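To put the STLB figures in the list above into perspective, the sketch below computes TLB reach, meaning the amount of address space covered when every entry maps a distinct page. The entry counts come from the list; the 4 KiB page size used for the main table is an assumption, and `tlb_reach` is a hypothetical helper name.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries, page_size):
    """Bytes of address space covered if every entry maps a distinct page."""
    return entries * page_size


# Entry counts are from the list above; 4 KiB pages are an assumption.
haswell_stlb = tlb_reach(1024, 4 * KIB)
broadwell_stlb = tlb_reach(1536, 4 * KIB)
broadwell_1g = tlb_reach(16, GIB)

print(haswell_stlb // MIB, "MiB")    # 4 MiB
print(broadwell_stlb // MIB, "MiB")  # 6 MiB
print(broadwell_1g // GIB, "GiB")    # 16 GiB
```

Under these assumptions the larger table raises 4 KiB reach by 50%, while the 16 entries for 1 GB pages cover far more address space than the main table does with small pages.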
Graphics
- 50% higher sampler throughput
- Improved geometry, Z, and pixel fill throughput
- DirectX 11.2, OpenGL 4.3
- OpenCL 1.2 and 2.0 (with Shared Virtual Memory)
- Up to 24 EUs (a 20% increase over Haswell's 20); 48 EUs on Iris Pro Graphics
Block Diagram
Memory Hierarchy
- Cache
  - Hardware prefetchers
  - L1 Cache:
    - 32 KB 8-way set associative instruction, 64 B line size
    - 32 KB 8-way set associative data, 64 B line size
    - Write-back policy
    - Per core
  - L2 Cache:
    - 256 KB 8-way set associative, 64 B line size
    - Write-back policy
    - Per core
  - L3 Cache:
    - 1.5 MB
  - L4 Cache:
    - 128 MB
    - eDRAM
    - Shared with GPU (Crystal Well)
    - Iris Pro models only
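The L1 and L2 parameters listed above fix the number of sets in each cache: capacity divided by (ways × line size). The sketch below works this out and splits an address into tag, set index, and line offset. It assumes plain modulo indexing on the address bits, which is a textbook simplification rather than a documented description of Intel's hardware; the function and constant names are hypothetical.

```python
def num_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("size must be a multiple of ways * line size")
    return sets


def split_address(addr, size_bytes, ways, line_bytes):
    """Return (tag, set_index, line_offset), assuming simple modulo indexing."""
    sets = num_sets(size_bytes, ways, line_bytes)
    offset = addr % line_bytes
    index = (addr // line_bytes) % sets
    tag = addr // (line_bytes * sets)
    return tag, index, offset


# Parameters taken from the list above (KB read as 1024 bytes).
L1D = dict(size_bytes=32 * 1024, ways=8, line_bytes=64)
L2 = dict(size_bytes=256 * 1024, ways=8, line_bytes=64)

print(num_sets(**L1D))                # 64 sets
print(num_sets(**L2))                 # 512 sets
print(split_address(0x12345, **L1D))  # (18, 13, 5)
```

With 64 sets of 64 B lines, each L1 way spans 4 KiB, so under this model the L1 set index falls entirely within the offset bits of a 4 KiB page.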