From WikiChip
Difference between revisions of "intel/microarchitectures/broadwell (client)"
Revision as of 12:23, 17 April 2016

Broadwell µarch

Broadwell (BDW) is Intel's 14 nm microarchitecture for mobile devices, desktops, and servers. Introduced in early 2015, Broadwell is a process shrink of Haswell that also brings several enhancements. It is named after Broadwell, Illinois.

Codenames

Core           Abbrev   Target
Broadwell Y    BDW-Y    Core M family; SoC for smartphones, 2-in-1s, tablets, and notebooks
Broadwell U    BDW-U    Core ultrabooks
Broadwell H    BDW-H    IoT (QM87, HM86/HM87 chipsets), all-in-ones
Broadwell DT   BDW-DT   Unlocked desktop MPUs
Broadwell EP   BDW-EP   Xeon E5, dual-processor platform
Broadwell EX   BDW-EX   Xeon E7, multi-processor platform, QPI
Broadwell E    BDW-E    High-end desktops (HEDT)

Architecture

Broadwell is largely identical to Haswell but adds several enhancements, including new instruction set extensions.

Key changes from Haswell

  • ~5% IPC improvement
  • FP multiplication instructions have reduced latency (3 cycles, down from 5)
    • Affects AVX, SSE, and FP instructions
  • CLMUL instructions are now a single μop, improving latency and throughput
  • Second-level TLB (STLB) changes
    • Table was enlarged (1,536 entries, up from 1,024)
    • Added a 1 GB page mode (16 entries, 4-way set associative)
  • Larger out-of-order scheduler
  • Faster store-to-load forwarding
  • Address prediction for branches and returns was improved
  • Improved cryptography acceleration instructions

New core features were held to a 2:1 performance-to-power ratio.
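
As a rough illustration of what two of the figures above mean in practice, the sketch below computes the address range the STLB can cover and the cost of a chain of dependent FP multiplies. It assumes every regular STLB entry maps one 4 KiB page and that the chain is fully serialized; the helper names `tlb_reach` and `chain_cycles` and the chain length of 100 are hypothetical and chosen only for this example.

```python
# Back-of-the-envelope figures for the STLB and FP multiply changes.
# Assumptions (not stated on this page): each regular STLB entry maps one
# 4 KiB page, and the multiply chain is fully dependent (no overlap).

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address space covered when every entry maps one page."""
    return entries * page_size


def chain_cycles(n_ops: int, latency: int) -> int:
    """Cycles for n_ops operations that each wait on the previous result."""
    return n_ops * latency


haswell_reach = tlb_reach(1024, 4 * KIB)     # 1,024 entries
broadwell_reach = tlb_reach(1536, 4 * KIB)   # 1,536 entries
one_gb_page_reach = tlb_reach(16, GIB)       # 16 entries in 1 GB page mode

haswell_chain = chain_cycles(100, 5)         # 5-cycle multiply
broadwell_chain = chain_cycles(100, 3)       # 3-cycle multiply

if __name__ == "__main__":
    print(f"STLB reach (4 KiB pages): {haswell_reach // MIB} MiB -> "
          f"{broadwell_reach // MIB} MiB")
    print(f"1 GB page mode reach: {one_gb_page_reach // GIB} GiB")
    print(f"100 dependent multiplies: {haswell_chain} -> "
          f"{broadwell_chain} cycles")
```

Independent multiplies can overlap in the pipeline, so the latency reduction matters most for code with long dependency chains.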

Graphics

  • 50% higher sampler throughput
  • Improved geometry, Z, and pixel fill throughput
  • DirectX 11.2, OpenGL 4.3
  • OpenCL 1.2 and 2.0 (with Shared Virtual Memory)
  • Up to 24 EUs (a 20% increase over Haswell's 20), or 48 EUs with Iris Pro Graphics
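
The EU counts above can be turned into a quick scaling estimate. The sketch below is illustrative only: the per-EU throughput of 16 single-precision FLOPs per clock and the 1.0 GHz clock are assumptions supplied for the example rather than values taken from this page, and the function names are hypothetical.

```python
# Rough scaling estimate from the execution unit (EU) counts.
# Assumptions (not from this page): 16 FP32 FLOPs per EU per clock and a
# placeholder 1.0 GHz graphics clock. Substitute real values as needed.

ASSUMED_FLOPS_PER_EU_PER_CLOCK = 16
ASSUMED_CLOCK_GHZ = 1.0


def eu_increase_percent(old_eus: int, new_eus: int) -> float:
    """Relative growth in EU count, in percent."""
    return 100 * (new_eus - old_eus) / old_eus


def peak_gflops(eus: int,
                clock_ghz: float = ASSUMED_CLOCK_GHZ,
                flops_per_eu_per_clock: int = ASSUMED_FLOPS_PER_EU_PER_CLOCK) -> float:
    """Theoretical peak FP32 throughput under the stated assumptions."""
    return eus * flops_per_eu_per_clock * clock_ghz


if __name__ == "__main__":
    print(f"EU increase, 20 -> 24: {eu_increase_percent(20, 24):.0f}%")
    for eus in (20, 24, 48):
        print(f"{eus} EUs: {peak_gflops(eus):.0f} GFLOPS peak (assumed clock)")
```

Peak throughput scales linearly with EU count at a fixed clock, so the 48 EU configuration doubles the theoretical figure of the 24 EU one; delivered performance also depends on memory bandwidth and clock speed.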

Block Diagram

haswell block diagram.svg

Memory Hierarchy

  • Cache
    • Hardware prefetchers
    • L1 Cache:
      • 32 KB 8-way set associative instruction, 64 B line size
      • 32 KB 8-way set associative data, 64 B line size
      • Write-back policy
      • Per core
    • L2 Cache:
      • 256 KB 8-way set associative, 64 B line size
      • Write-back policy
      • Per core
    • L3 Cache:
      • 1.5 MB
    • L4 Cache: