From WikiChip
Difference between revisions of "intel/microarchitectures/broadwell (client)"

Revision as of 01:16, 13 April 2016

Broadwell µarch

Broadwell (BDW) is Intel's microarchitecture based on the 14 nm process for mobile, desktop, and server platforms. Introduced in early 2015, Broadwell is a process shrink of Haswell that also brings several enhancements.

Codenames

Core                    Target
Broadwell Y (BDW-Y)     Core M family; SoC for smartphones, 2-in-1s, tablets, and notebooks
Broadwell U (BDW-U)     Core ultrabooks
Broadwell H (BDW-H)     IoT (QM87, HM86/HM87 chipsets), all-in-ones
Broadwell DT (BDW-DT)   Unlocked desktop MPUs
Broadwell EP (BDW-EP)   Xeon E5, dual-processor platform
Broadwell EX (BDW-EX)   Xeon E7, multi-processor platform, QPI
Broadwell E (BDW-E)     High-end desktops (HEDT)

Architecture

Broadwell is largely identical to Haswell, with additional enhancements aimed at servers.

Key changes from Haswell

  • ~5% IPC improvement
  • FP multiplication instructions have reduced latency (3 cycles, down from 5)
    • Affects AVX, SSE, and FP instructions
  • CLMUL instructions are now a single μop, improving latency and throughput
  • Second-level TLB (STLB):
    • Table was enlarged (1,536 entries, up from 1,024)
    • 1 GB page mode (16 entries, 4-way set associative)
  • Larger out-of-order scheduler
  • Faster store-to-load forwarding
  • Address prediction for branches and returns was improved
  • Improved cryptography acceleration instructions
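
To put the STLB and FP-multiply figures above in perspective, the following minimal sketch works through the arithmetic. It assumes that the 1,536 and 1,024 entry counts each map a 4 KB page (the page size is an assumption, not stated in the list), and that a chain of multiplies is fully dependent, so the latencies simply add. The function names are illustrative only.

```python
# Illustrative arithmetic only. Entry counts and latencies come from the
# list above; the 4 KB page size is an assumption made for this sketch.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address space covered when every entry maps a distinct page."""
    return entries * page_size


def dependent_chain_cycles(n_ops: int, latency: int) -> int:
    """Cycles for n_ops multiplies where each one consumes the previous result."""
    return n_ops * latency


haswell_reach = tlb_reach(1024, 4 * KIB)      # 4 MiB with 4 KB pages
broadwell_reach = tlb_reach(1536, 4 * KIB)    # 6 MiB with 4 KB pages
broadwell_1g_reach = tlb_reach(16, 1 * GIB)   # 16 GiB with 1 GB pages

# A chain of 100 dependent FP multiplies at the old and new latency.
haswell_chain = dependent_chain_cycles(100, 5)
broadwell_chain = dependent_chain_cycles(100, 3)

if __name__ == "__main__":
    print(haswell_reach // MIB, broadwell_reach // MIB, broadwell_1g_reach // GIB)
    print(haswell_chain, broadwell_chain)
```

Under these assumptions, the larger table raises 4 KB-page reach by half, and latency-bound multiply chains finish in 60% of the cycles. Throughput-bound code would not see the same gain, since only the latency figure changed.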

Features added to the core were held to a 2:1 ratio of performance gain to power cost.

Graphics

  • 50% higher sampler throughput
  • Improved geometry, Z, and pixel fill throughput
  • DirectX 11.2, OpenGL 4.3
  • OpenCL 1.2 and 2.0 (with Shared Virtual Memory)
  • Up to 24 EUs (a 20% increase, up from 20 in Haswell), 48 EUs on Iris Pro Graphics
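
The EU figures in the list above can be checked with a short calculation. This is only a sketch of the arithmetic; the helper name is hypothetical.

```python
def percent_increase(old: int, new: int) -> int:
    """Whole-number percentage by which new exceeds old (floored)."""
    return (new - old) * 100 // old


haswell_eus = 20      # from the list above
broadwell_eus = 24    # from the list above
iris_pro_eus = 48     # from the list above

eu_gain = percent_increase(haswell_eus, broadwell_eus)
iris_pro_ratio = iris_pro_eus // broadwell_eus

if __name__ == "__main__":
    print(eu_gain, iris_pro_ratio)
```

The result matches the stated 20% increase, and the Iris Pro configuration has exactly twice the EU count of the 24-EU part.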

Block Diagram

haswell block diagram.svg

Memory Hierarchy

  • Cache
    • Hardware prefetchers
    • L1 Cache:
      • 32 KB 8-way set associative instruction, 64 B line size
      • 32 KB 8-way set associative data, 64 B line size
      • Write-back policy
      • Per core
    • L2 Cache:
      • 256 KB 8-way set associative, 64 B line size
      • Write-back policy
      • Per core
    • L3 Cache:
      • 1.5 MB
    • L4 Cache: