From WikiChip
Broadwell µarch: General Info
Property | Value |
---|---|
Name | Broadwell |
Manufacturer | Intel |
Introduction | October 2014 |
Process | 14 nm |
Cores | 2, 4, 6, 8, 16, 32 |
Pipeline | Superscalar, out-of-order, speculative, with register renaming |
Pipeline stages | 14-19 |
Issue width | 4 |
ISA | IA-32, x86-64 |
Extensions | MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW |
L1I cache | 32 KB per core, 8-way set associative |
L1D cache | 32 KB per core, 8-way set associative |
L2 cache | 256 KB per core, 8-way set associative |
L3 cache | 1.5 MB per core |
L4 cache | 128 MB per package (Iris Pro GPUs only) |
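To check whether a Linux machine reports every extension in the infobox, one can read the `flags` line of `/proc/cpuinfo`. The sketch below is a minimal, Linux-only illustration. The mapping from infobox names to kernel flag names (for example SSE3 appearing as `pni` and PREFETCHW as `3dnowprefetch`) is our own assumption and should be checked against the kernel's feature list. A kernel option or hypervisor can also hide a flag the silicon supports. The helper names `parse_flags` and `missing_extensions` are hypothetical.

```python
from pathlib import Path

# Infobox extension name -> flag name the Linux kernel prints in /proc/cpuinfo.
# Assumption: Linux only; other operating systems expose CPU features differently.
BROADWELL_EXTENSIONS = {
    "MOVBE": "movbe",
    "MMX": "mmx",
    "SSE": "sse",
    "SSE2": "sse2",
    "SSE3": "pni",              # the kernel reports SSE3 as "pni"
    "SSSE3": "ssse3",
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "POPCNT": "popcnt",
    "AVX": "avx",
    "AVX2": "avx2",
    "AES": "aes",
    "PCLMUL": "pclmulqdq",
    "FSGSBASE": "fsgsbase",
    "RDRND": "rdrand",
    "FMA": "fma",
    "BMI": "bmi1",
    "BMI2": "bmi2",
    "F16C": "f16c",
    "RDSEED": "rdseed",
    "ADCX": "adx",              # ADCX/ADOX are reported under the "adx" flag
    "PREFETCHW": "3dnowprefetch",
}


def parse_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_extensions(flags: set[str]) -> list[str]:
    """List infobox extensions whose kernel flag is absent, in infobox order."""
    return [ext for ext, flag in BROADWELL_EXTENSIONS.items() if flag not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        missing = missing_extensions(parse_flags(cpuinfo.read_text()))
        print("all listed extensions reported" if not missing else f"missing: {missing}")
```

An empty result only means the kernel reports every flag; it does not identify the CPU as Broadwell, since later microarchitectures report the same flags.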
Revision as of 17:48, 13 April 2016
Broadwell (BDW) is Intel's 14 nm microarchitecture for mobile, desktop, and server processors. A process shrink of Haswell with several enhancements, it was introduced in late 2014 with the first Core M (Broadwell-Y) parts, followed by wider availability in early 2015.
Codenames
Core | Target |
---|---|
Broadwell Y (BDW-Y) | Core M family; SoC for 2-in-1s, tablets, and notebooks |
Broadwell U (BDW-U) | Core ultrabooks |
Broadwell H (BDW-H) | IoT (QM87, HM86/HM87 Chipsets), All-in-ones |
Broadwell DT (BDW-DT) | Unlocked desktop MPUs |
Broadwell EP (BDW-EP) | Xeon E5, Dual-Processor platform |
Broadwell EX (BDW-EX) | Xeon E7, Multi-Processor platform, QPI |
Broadwell E (BDW-E) | High-End Desktops (HEDT) |
Architecture
Broadwell is, for the most part, identical to Haswell, with a number of incremental enhancements, including some aimed at servers.
Key changes from Haswell
- ~5% IPC improvement
- Floating-point multiplication instructions have reduced latency (3 cycles, down from 5)
  - Applies to AVX, SSE, and FP instructions
- CLMUL instructions are now a single μop, improving latency and throughput
- Second-level TLB (STLB):
  - Enlarged to 1,536 entries (up from 1,024)
  - New 1 GB page mode (16 entries, 4-way set associative)
- Larger out-of-order scheduler
- Faster store-to-load forwarding
- Improved address prediction for branches and returns
- Improved cryptography acceleration instructions
New core features were held to a 2:1 performance-to-power ratio: each feature had to deliver roughly 2% more performance for every 1% of additional power.
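Two of the changes above lend themselves to quick arithmetic: how much memory the STLB can map, and how the shorter multiply latency bounds a chain of dependent multiplies. The sketch below assumes every STLB entry maps exactly one page (4 KB pages for the main table, 1 GB pages for the new mode) and ignores the first-level DTLBs, execution throughput, and front-end effects, so the figures are back-of-the-envelope bounds rather than measurements. The function names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of address space covered when every entry maps one page."""
    return entries * page_bytes


def dependent_chain_cycles(n_ops: int, latency_cycles: int) -> int:
    """Minimum cycles for n multiplies where each one needs the previous result."""
    return n_ops * latency_cycles


if __name__ == "__main__":
    print("Haswell STLB reach, 4 KB pages:  ", tlb_reach(1024, 4 * KIB) // MIB, "MiB")
    print("Broadwell STLB reach, 4 KB pages:", tlb_reach(1536, 4 * KIB) // MIB, "MiB")
    print("Broadwell 1 GB-page STLB reach:  ", tlb_reach(16, GIB) // GIB, "GiB")
    print("100 dependent FP multiplies: Haswell >=", dependent_chain_cycles(100, 5),
          "cycles, Broadwell >=", dependent_chain_cycles(100, 3), "cycles")
```

Under these assumptions the STLB reach with 4 KB pages grows from 4 MiB to 6 MiB, the 1 GB mode covers 16 GiB, and a chain of 100 dependent multiplies needs at least 300 cycles instead of 500.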
Graphics
- 50% higher sampler throughput
- Increased geometry, Z, and pixel-fill throughput
- DirectX 11.2, OpenGL 4.3
- OpenCL 1.2 and 2.0 (with Shared Virtual Memory)
- 24 EUs on mainstream GT2 graphics (a 20% increase over Haswell's 20); 48 EUs on Iris Pro Graphics
Memory Hierarchy
- Cache
  - Hardware prefetchers
  - L1 Cache:
    - 32 KB 8-way set associative instruction cache, 64 B line size
    - 32 KB 8-way set associative data cache, 64 B line size
    - Write-back policy
    - Per core
  - L2 Cache:
    - 256 KB 8-way set associative, 64 B line size
    - Write-back policy
    - Per core
  - L3 Cache:
    - 1.5 MB per core
  - L4 Cache:
    - 128 MB
    - eDRAM
    - Shared with the GPU (Crystal Well)
    - Iris Pro models only
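The L1 and L2 parameters above fully determine how an address maps onto each cache. The following sketch (helper names hypothetical) derives the number of sets and the offset/index bit widths from size, associativity, and line size, then splits an address into tag, set index, and line offset. It assumes a conventional power-of-two, non-hashed set index; the L3 and L4 are left out because no associativity is given for them here.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    sets: int
    offset_bits: int
    index_bits: int


def _log2_exact(n: int) -> int:
    """log2 of a power of two; raises ValueError otherwise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> Geometry:
    """Derive set count and address-bit split for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    return Geometry(sets, _log2_exact(line_bytes), _log2_exact(sets))


def split_address(addr: int, g: Geometry) -> tuple[int, int, int]:
    """Split an address into (tag, set index, line offset)."""
    offset = addr & ((1 << g.offset_bits) - 1)
    index = (addr >> g.offset_bits) & ((1 << g.index_bits) - 1)
    tag = addr >> (g.offset_bits + g.index_bits)
    return tag, index, offset


KB = 1024
L1D = cache_geometry(32 * KB, ways=8, line_bytes=64)
L2 = cache_geometry(256 * KB, ways=8, line_bytes=64)

if __name__ == "__main__":
    print("L1D:", L1D)
    print("L2: ", L2)
    print("0x12345 in L1D ->", split_address(0x12345, L1D))
```

For the 32 KB, 8-way, 64 B L1 this gives 64 sets, 6 offset bits, and 6 index bits; the L2 has 512 sets and 9 index bits. The 6 + 6 = 12 low bits used by the L1 lie entirely within a 4 KB page offset.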