From WikiChip
Difference between revisions of "intel/microarchitectures/broadwell (client)"
{{intel title|Broadwell|arch}}
 
{{microarchitecture
 
|atype=CPU
 
|name=Broadwell
 
|designer=Intel
 
|manufacturer=Intel
 
|introduction=October, 2014
 
|process=14 nm
 
|cores=2
 
|cores 2=4
 
|cores 3=6
 
|cores 4=8
 
|cores 5=10
 
|cores 6=12
 
|cores 7=14
 
|cores 8=16
 
|cores 9=18
 
|cores 10=20
 
|cores 11=22
 
|type=Superscalar
 
|speculative=Yes
 
|renaming=Yes
 
|stages min=14
 
|stages max=19
 
|isa=IA-32
 
|isa 2=x86-64
 
|extension=MOVBE
 
|extension 2=MMX
 
|extension 3=SSE
 
|extension 4=SSE2
 
|extension 5=SSE3
 
|extension 6=SSSE3
 
|extension 7=SSE4.1
 
|extension 8=SSE4.2
 
|extension 9=POPCNT
 
|extension 10=AVX
 
|extension 11=AVX2
 
|extension 12=AES
 
|extension 13=PCLMUL
 
|extension 14=FSGSBASE
 
|extension 15=RDRND
 
|extension 16=FMA3
 
|extension 17=F16C
 
|extension 18=BMI
 
|extension 19=BMI2
 
|extension 20=VT-x
 
|extension 21=VT-d
 
|extension 22=TXT
 
|extension 23=TSX
 
|extension 24=RDSEED
 
|extension 25=ADCX
 
|extension 26=PREFETCHW
 
|l1i=32 KiB
 
|l1i per=core
 
|l1i desc=8-way set associative
 
|l1d=32 KiB
 
|l1d per=core
 
|l1d desc=8-way set associative
 
|l2=256 KiB
 
|l2 per=core
 
|l2 desc=8-way set associative
 
|l3=1.5 MiB
 
|l3 per=core
 
|l4=128 MiB
 
|l4 per=package
 
|l4 desc=on Iris Pro GPUs only
 
|core name=Broadwell Y
 
|core name 2=Broadwell U
 
|core name 3=Broadwell H
 
|core name 4=Broadwell DT
 
|core name 5=Broadwell EP
 
|core name 6=Broadwell EX
 
|core name 7=Broadwell E
 
|predecessor=Haswell
 
|predecessor link=intel/microarchitectures/haswell
 
|successor=Skylake (client)
 
|successor link=intel/microarchitectures/skylake (client)
 
|successor 2=Skylake (server)
 
|successor 2 link=intel/microarchitectures/skylake (server)
 
|pipeline=Yes
 
|OoOE=Yes
 
|issues=4
 
|inst=Yes
 
|cache=Yes
 
|core names=Yes
 
|succession=Yes
 
}}
 
'''Broadwell''' ('''BDW''') is [[Intel]]'s [[microarchitecture]] based on the [[14 nm process]] for mobile, desktops, and servers. Introduced in late 2014, Broadwell is a [[process shrink]] of {{\\|Haswell}} that adds several enhancements. Broadwell is named after [[wikipedia:Broadwell, Illinois|Broadwell, Illinois]].
 
  
For desktop and mobile, Broadwell is branded as 5th Generation Intel {{intel|Core}} processors. For server-class processors, Intel brands it as {{intel|Xeon E3|Xeon E3 v4}}, {{intel|Xeon E5|Xeon E5 v4}}, and {{intel|Xeon E7|Xeon E7 v4}}.
 
== Codenames ==
 
{| class="wikitable"
 
|-
 
! Core !! Abbrev !! Target
 
|-
 
| Broadwell Y || BDW-Y || {{intel|Core M|Core M family}}, SoC for smartphones, 2-in-1s, tablets, and notebooks
 
|-
 
| Broadwell U || BDW-U || {{intel|Core}} ultrabooks
 
|-
 
| Broadwell H || BDW-H || IoT (QM87, HM86/HM87 Chipsets), All-in-ones
 
|-
 
| Broadwell DT || BDW-DT || Unlocked desktop MPUs
 
|-
 
| Broadwell EP || BDW-EP || {{intel|Xeon E5}}, Dual-Processor platform
 
|-
 
| Broadwell EX || BDW-EX || {{intel|Xeon E7}}, Multi-Processor platform, QPI
 
|-
 
| Broadwell E || BDW-E || High-End Desktops (HEDT)
 
|}
 
 
== Process Technology ==
 
{| class="wikitable" style="float: right;"
 
! colspan="2" | 14 nm Manufacturing Fabs
 
|-
 
! Fab !! Location
 
|-
 
| D1X || Hillsboro, Oregon
 
|-
 
| D1D || Hillsboro, Oregon
 
|-
 
| D1C || Hillsboro, Oregon
 
|-
 
| Fab 32 || Chandler, Arizona
 
|-
 
| Fab 24 || Leixlip, Ireland
 
|}
 
Broadwell is manufactured on Intel's [[14 nm]] process using Tri-gate [[FinFET]] transistors, with an 8 nm fin width and a 42 nm fin pitch (shown below). The SRAM cell area is 0.0706 µm² for the high-performance cell and 0.0499 µm² for the high-density cell.
 
 
 
[[Scaling]]:
 
 
{| class="wikitable"
 
|-
 
! !! Haswell !! Broadwell !! Δ !! rowspan="8" | [[File:intel 14nm fin.png|250px]]
 
|-
 
| || [[22 nm]] || [[14 nm]] ||
 
|-
 
| Fin Pitch || 60 nm || 42 nm || 0.70x
 
|-
 
| Fin Width || 8 nm || 8 nm || 1x
 
|-
 
| Fin Height || 34 nm || 42 nm || 1.24x
 
|-
 
| Gate Pitch || 90 nm || 70 nm || 0.78x
 
|-
 
| Interconnect Pitch || 80 nm || 52 nm || 0.65x
 
|-
 
| Cell Height || 840 nm || 399 nm || 0.48x
 
|}
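
The Δ column can be reproduced from the two dimension columns. The following is a minimal sketch in Python, assuming Δ is simply the Broadwell value divided by the Haswell value, rounded half-up to two decimals; the dictionary and function names are illustrative and do not come from any Intel material.

```python
from decimal import Decimal, ROUND_HALF_UP

# Feature dimensions in nanometres, copied from the scaling table.
HASWELL_22NM = {
    "Fin Pitch": 60,
    "Fin Width": 8,
    "Fin Height": 34,
    "Gate Pitch": 90,
    "Interconnect Pitch": 80,
    "Cell Height": 840,
}
BROADWELL_14NM = {
    "Fin Pitch": 42,
    "Fin Width": 8,
    "Fin Height": 42,
    "Gate Pitch": 70,
    "Interconnect Pitch": 52,
    "Cell Height": 399,
}


def scaling_factors(old, new):
    """Return new/old per feature, rounded half-up to two decimal places."""
    hundredth = Decimal("0.01")
    return {
        name: (Decimal(new[name]) / Decimal(old[name])).quantize(
            hundredth, rounding=ROUND_HALF_UP
        )
        for name in old
        if name in new
    }


factors = scaling_factors(HASWELL_22NM, BROADWELL_14NM)
for name, factor in factors.items():
    print(f"{name}: {factor}x")
```

Decimal arithmetic is used instead of binary floating point so that a ratio falling exactly on a rounding boundary, such as the cell height, rounds the same way as in the table.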
 
 
== Architecture ==
 
Broadwell is for the most part identical to {{\\|Haswell}} with several enhancements, including new instruction set extensions.
 
 
=== Key changes from {{\\|Haswell}} ===
 
[[File:broadwell buffer window.png|right|350px]]
 
* ~5% IPC improvement
 
* FP multiplication instructions have reduced latency (3 cycles, down from 5)
 
** Affects AVX, SSE, and FP instructions
 
* {{x86|CLMUL}} instructions are now a single [[μop]], improving latency and throughput
 
* The second-level TLB (STLB)
 
** Table was enlarged (1,536 entries, up from 1,024)
 
** 1 GiB page mode (16 entries, 4-way set associative)
 
* Execution Engine
 
** Larger scheduler (64 entries, up from 60)
 
** Larger instruction queue (25 entries/thread, up from 20)
 
** Faster store-to-load forwarding
 
** Address prediction for branches and returns was improved
 
** Improved cryptography acceleration instructions
 
 
New core features had to maintain a 2:1 ratio of performance gain to power cost.
 
 
==== Graphics ====
 
* 50% higher sampler throughput
 
* Improvements for increased geometry, Z, Pixel Fill
 
* DirectX 11.2, OpenGL 4.3
 
* OpenCL 1.2 and 2.0 (with Shared Virtual Memory)
 
* Up to 24 EUs (20% more, up from 20 in {{\\|Haswell}}), 48 EUs on {{intel|Iris Pro Graphics}}
 
 
==== New instructions ====
 
{{main|#Added instructions|l1=See #Added_instructions for the complete list}}
 
Broadwell introduced a number of new instructions:
 
* {{x86|RDSEED|<code>RDSEED</code>}} - Generates 16-, 32-, or 64-bit random number seeds ([[NIST SP 800-90B]] & [[NIST SP 800-90C]])
 
* {{x86|ADCX|<code>ADCX</code>}} - Arbitrary precision integer operations
 
* {{x86|PREFETCHW|<code>PREFETCHW</code>}} - Prefetch data into caches, hinting a write is expected in the future
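
To illustrate what "arbitrary precision integer operations" means for ADCX and its companion ADOX, the sketch below models limb-wise addition with a carry chain in Python. It is only a software model of the arithmetic, not a way to invoke the instructions; the helper names <code>adc</code>, <code>add_limbs</code>, <code>to_limbs</code>, and <code>from_limbs</code> are hypothetical. In hardware, ADCX carries through the carry flag and ADOX through the overflow flag, which lets two such chains run interleaved.

```python
MASK64 = (1 << 64) - 1  # a limb is one 64-bit machine word


def adc(a, b, carry_in):
    """Model one 64-bit add-with-carry step: (sum mod 2**64, carry_out)."""
    total = a + b + carry_in
    return total & MASK64, total >> 64


def add_limbs(x, y):
    """Add two equal-length little-endian limb lists; return (limbs, carry)."""
    if len(x) != len(y):
        raise ValueError("operands must have the same number of limbs")
    result, carry = [], 0
    for a, b in zip(x, y):
        limb, carry = adc(a, b, carry)  # the carry ripples to the next limb
        result.append(limb)
    return result, carry


def to_limbs(value, count):
    """Split a non-negative integer into `count` little-endian 64-bit limbs."""
    return [(value >> (64 * i)) & MASK64 for i in range(count)]


def from_limbs(limbs):
    """Recombine little-endian 64-bit limbs into a Python integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


a = (1 << 128) - 1  # largest 128-bit value: every limb is all ones
b = 1
limbs, carry_out = add_limbs(to_limbs(a, 2), to_limbs(b, 2))
total = from_limbs(limbs) + (carry_out << 128)
```

Here the carry ripples through both limbs, so <code>limbs</code> ends up all zero with a final carry of 1, and <code>total</code> equals <code>a + b</code>.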
 
 
=== Block Diagram ===
 
[[File:broadwell block diagram.svg]]
 
 
=== Memory Hierarchy ===
 
[[File:Intel-Xeon-processor-D-1500-wafer.jpg|right|thumb|350px|Broadwell {{intel|Xeon D}} wafer]]
 
* Cache
 
** L1 Cache:
 
*** 32 KiB 8-way [[set associative]] instruction, 64 B line size
 
*** 32 KiB 8-way set associative data, 64 B line size
 
*** Write-back policy
 
*** Per core
 
** L2 Cache:
 
*** 256 KiB 8-way set associative, 64 B line size
 
*** Write-back policy
 
*** Per core
 
** L3 Cache:
 
*** 1.5 - 3 MiB per core, 64 B line size
 
*** 16- to 20-way set associative
 
*** Write-back policy
 
** L4 Cache:
 
*** 128 MiB
 
*** [[eDRAM]]
 
*** shared with GPU ({{intel|Crystal Well}})
 
*** {{intel|Iris Pro}} models only
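
The L1 and L2 figures above determine the number of sets in each cache and how a physical address splits into line-offset and set-index bits. The following is a minimal sketch, assuming the usual relation sets = size / (ways × line size) and power-of-two geometry; <code>cache_geometry</code> is an illustrative helper name.

```python
KIB = 1024


def cache_geometry(size_bytes, ways, line_bytes=64):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes the line size and the resulting set count are powers of two.
    """
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("size must be a multiple of ways * line size")
    offset_bits = line_bytes.bit_length() - 1  # log2 of the line size
    index_bits = sets.bit_length() - 1         # log2 of the set count
    return sets, offset_bits, index_bits


# L1 instruction and data caches: 32 KiB, 8-way, 64 B lines.
l1 = cache_geometry(32 * KIB, ways=8)
# L2 cache: 256 KiB, 8-way, 64 B lines.
l2 = cache_geometry(256 * KIB, ways=8)

print("L1 (sets, offset bits, index bits):", l1)
print("L2 (sets, offset bits, index bits):", l2)
```

With these parameters the L1 offset and index bits together cover 12 address bits, the same width as a 4 KiB page offset.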
 
 
Broadwell has a dedicated first-level TLB for instructions (ITLB) and another for data (DTLB). Additionally, there is a unified second-level TLB (STLB).
 
* TLBs:
 
** ITLB
 
*** 4 KiB page translations:
 
**** 128 entries; 4-way set associative
 
**** dynamic partition; divided between the two threads
 
*** 2 MiB / 4 MiB page translations:
 
**** 8 entries; fully associative
 
**** Duplicated for each thread
 
** DTLB
 
*** 4 KiB page translations:
 
**** 64 entries; 4-way set associative
 
**** fixed partition; divided between the two threads
 
*** 2 MiB / 4 MiB page translations:
 
**** 32 entries; 4-way set associative
 
*** 1 GiB page translations:
 
**** 4 entries; 4-way set associative
 
** STLB
 
*** 4 KiB + 2 MiB page translations:
 
**** 1536 entries; 6-way set associative
 
**** shared
 
*** 1 GiB page translations:
 
**** 16 entries; 4-way set associative
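
A useful way to read the entry counts above is as "TLB reach": the amount of memory that can be mapped without a page walk, which is the entry count multiplied by the page size. The sketch below works this out for a few of the listed structures; the helper names are illustrative, and the figures assume every entry holds a translation of the stated page size.

```python
KIB, MIB, GIB = 1024, 1024**2, 1024**3


def tlb_reach(entries, page_bytes):
    """Bytes of address space covered when every entry maps one page."""
    return entries * page_bytes


def tlb_sets(entries, ways):
    """Number of sets in a set-associative TLB."""
    sets, remainder = divmod(entries, ways)
    if remainder:
        raise ValueError("entries must be a multiple of ways")
    return sets


itlb_reach_4k = tlb_reach(128, 4 * KIB)    # first-level instruction TLB
dtlb_reach_4k = tlb_reach(64, 4 * KIB)     # first-level data TLB
stlb_reach_4k = tlb_reach(1536, 4 * KIB)   # STLB filled with 4 KiB pages
stlb_reach_2m = tlb_reach(1536, 2 * MIB)   # STLB filled with 2 MiB pages
stlb_reach_1g = tlb_reach(16, 1 * GIB)     # dedicated 1 GiB entries
stlb_sets = tlb_sets(1536, 6)              # 6-way set associative

print(f"STLB reach with 4 KiB pages: {stlb_reach_4k // MIB} MiB")
print(f"STLB reach with 2 MiB pages: {stlb_reach_2m // GIB} GiB")
print(f"STLB sets: {stlb_sets}")
```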
 
 
=== Pipeline ===
 
{{main|intel/microarchitectures/haswell#Pipeline|l1=Haswell's Pipeline}}
 
Broadwell's pipeline is identical to Haswell's.
 
 
== High Core count (EP) ==
 
* Key Changes from {{\\|Haswell}}:
 
** Up to 22 cores (up from 18)
 
** Up to 44 threads (up from 36)
 
** Up to 55 MiB [[last level cache|LLC]] (up from 45 MiB)
 
** Up to DDR4-2400 memory (up from DDR4-2133)
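
The figures in this list are related by simple arithmetic, sketched below. Two assumptions are made that the list does not state: the 2400 and 2133 memory speeds are read as transfer rates in MT/s, and each memory channel is taken to be 64 bits (8 bytes) wide. The dictionary and function names are illustrative.

```python
# Top-end configurations taken from the list above.
BROADWELL_EP = {"cores": 22, "threads": 44, "llc_mib": 55, "mem_mt_s": 2400}
HASWELL_EP = {"cores": 18, "threads": 36, "llc_mib": 45, "mem_mt_s": 2133}


def threads_per_core(part):
    """Hardware threads per core."""
    return part["threads"] // part["cores"]


def llc_per_core_mib(part):
    """Last-level cache capacity per core, in MiB."""
    return part["llc_mib"] / part["cores"]


def peak_channel_bandwidth_gbs(part, bus_bytes=8):
    """Theoretical peak per-channel bandwidth in GB/s.

    Assumes a 64-bit (8-byte) channel, which the list does not state.
    """
    return part["mem_mt_s"] * bus_bytes / 1000


for name, part in (("Broadwell EP", BROADWELL_EP), ("Haswell EP", HASWELL_EP)):
    print(
        f"{name}: {threads_per_core(part)} threads/core, "
        f"{llc_per_core_mib(part)} MiB LLC/core, "
        f"{peak_channel_bandwidth_gbs(part):.1f} GB/s per channel"
    )
```

Under these assumptions the LLC slice per core is unchanged between the two generations; the larger total comes from the additional cores.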
 
 
{{expand section}}
 
 
=== Snoop Modes ===
 
Broadwell EP has four snoop modes: Home Snoop (HS), Early Snoop (ES), Cluster-on-Die (COD) and Home Snoop with Directory and Opportunistic Snoop Broadcast (HS with DIR + OSB).
 
 
{| class="wikitable tc2 tc3 tc4 tc5"
 
|-
 
! Performance Metric !! HS w/ DIR+OSB !! COD !! Home Snoop !! Early Snoop
 
|-
 
! colspan="5" | System configured as [[NUMA]]
 
|-
 
| LLC Hit Latency || Low || Lowest || Low || Low
 
|-
 
| Local Memory Latency || Low || Lowest || High<sup>1</sup> || Medium<sup>1</sup>
 
|-
 
| Remote Memory Latency || Low || Low-High<sup>1</sup> || Low || Lowest
 
|-
 
| Local Memory Bandwidth || High || High || High || Low
 
|-
 
| Remote Memory Bandwidth || High || Medium || High || Medium
 
|-
 
! colspan="5" | System configured as [[UMA]]
 
|-
 
| Memory Latency || Low || rowspan="2" | Not an advised configuration || Low || Lowest
 
|-
 
| Memory Bandwidth || High || High || Medium
 
|}
 
 
<sup>1</sup> - Performance depends on the directory state. Expect low latency with a clean directory and high latency with a dirty directory.
 
 
=== Die Stats ===
 
{| class="wikitable" style="text-align: center;"
 
|-
 
! colspan="3" style="background:#D6D6FF;" | Layout
 
|-
 
! Low Core Count (LCC) !! Medium Core Count (MCC) !! High Core Count (HCC)
 
|-
 
| Up to 10 Cores || 12-14 Cores || 16+ Cores
 
|-
 
| 246.24 mm² || 306.18 mm² || 456.12 mm²
 
|-
 
| ~3,200,000,000 Transistors || ~4,700,000,000 Transistors || ~7,200,000,000 Transistors
 
|-
 
|[[File:E5 v4 LCC.png|300px]] || [[File:E5 v4 MCC.png|300px]] || [[File:E5 v4 HCC.png|300px]]
 
|}
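
Dividing the approximate transistor counts by the die areas in the table gives a rough transistor density for each layout. The sketch below does that arithmetic; because the transistor counts are approximate, the results should be read as estimates, and the names used are illustrative.

```python
# (approximate transistor count, die area in mm^2) from the layout table.
EP_DIES = {
    "LCC": (3_200_000_000, 246.24),
    "MCC": (4_700_000_000, 306.18),
    "HCC": (7_200_000_000, 456.12),
}


def density_mtr_per_mm2(transistors, area_mm2):
    """Transistor density in millions of transistors per square millimetre."""
    return transistors / area_mm2 / 1e6


densities = {
    name: density_mtr_per_mm2(count, area)
    for name, (count, area) in EP_DIES.items()
}

for name, value in densities.items():
    print(f"{name}: ~{value:.1f} MTr/mm²")
```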
 
 
== Die ==
 
===Dual-core Broadwell die===
 
 
* [[14 nm process]]
 
* 13 metal layers
 
* 1,300,000,000 transistors
 
* 82 mm<sup>2</sup> die size
 
* [[2 cores]]
 
 
: [[File:broadwell die (dual-core).jpg|850px]]
 
 
 
===Dual-core Broadwell with {{intel|Iris Pro}} die===
 
 
* [[14 nm process]]
 
* 13 metal layers
 
* 1,900,000,000 transistors
 
* 133 mm<sup>2</sup> die size
 
* [[2 cores]]
 
 
: [[File:broadwell with iris pro die (dual-core).png|850px]]
 
 
 
===Quad-core Broadwell with {{intel|Iris Pro}} die===
 
 
Die shot of the {{intel|Core i7-5775C}} microprocessor.
 
 
* [[14 nm process]]
 
* 13 metal layers
 
* ? transistors
 
* ? mm<sup>2</sup> die size
 
* [[4 cores]]
 
 
: [[File:broadwell core i7-5775C die.jpg|650px]]
 
 
 
===Deca-core Broadwell ===
 
 
Die shot of the {{intel|Core i7-6950X}} microprocessor.
 
 
* [[14 nm process]]
 
* ? metal layers
 
* 3,400,000,000 transistors
 
* 246 mm<sup>2</sup> die size
 
* [[10 cores]]
 
 
:[[File:broadwell (deca-core) die shot.png|650px]]
 
 
:[[File:broadwell (deca-core) die shot (annotated).png|650px]]
 
 
== Added instructions ==
 
'''{{x86|RDSEED}}''' - Generates 16-, 32-, or 64-bit random number seeds (compliant with both [[NIST SP 800-90B]] and [[NIST SP 800-90C]])
 
 
{{collist
 
| count = 1
 
| width = 150px
 
|
 
* {{x86|RDSEED}}
 
}}
 
 
'''{{x86|ADCX}}''' - Arbitrary precision integer operations
 
 
{{collist
 
| count = 1
 
| width = 150px
 
|
 
* {{x86|ADCX}}
 
* {{x86|ADOX}}
 
}}
 
 
'''{{x86|PREFETCHW}}''' - Prefetch data into caches, hinting a write is expected in the future.
 
 
{{collist
 
| count = 1
 
| width = 150px
 
|
 
* {{x86|PREFETCHW}}
 
}}
 
 
== Cores ==
 
{{empty section}}
 
 
== All Broadwell Chips ==
 
<!-- NOTE:
 
          This table is generated automatically from the data in the actual articles.
 
          If a microprocessor is missing from the list, an appropriate article for it needs to be
 
          created and tagged accordingly.
 
 
          Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips
 
-->
 
{{comp table start}}
 
<table class="comptable sortable tc6 tc7 tc20 tc21 tc22 tc23 tc24 tc25">
 
<tr class="comptable-header"><th>&nbsp;</th><th colspan="19">List of Broadwell Processors</th></tr>
 
<tr class="comptable-header"><th>&nbsp;</th><th colspan="9">Main processor</th><th colspan="5">{{intel|Turbo Boost}}</th><th>Mem</th><th colspan="3">IGP</th></tr>
 
{{comp table header 1|cols=Launched, Price, Family, Core Name, Cores, Threads, %L2$, %L3$, TDP, %Frequency, 1 Core, 2 Cores, 3 Cores, 4 Cores, Max Mem, GPU, %Frequency, Turbo}}
 
<tr class="comptable-header comptable-header-sep"><th>&nbsp;</th><th colspan="20">[[Uniprocessors]]</th></tr>
 
{{#ask: [[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Broadwell]] [[max cpu count::1]]
 
|?full page name
 
|?model number
 
|?first launched
 
|?release price
 
|?microprocessor family
 
|?core name
 
|?core count
 
|?thread count
 
|?l2$ size
 
|?l3$ size
 
|?tdp
 
|?base frequency#GHz
 
|?turbo frequency (1 core)#GHz
 
|?turbo frequency (2 cores)#GHz
 
|?turbo frequency (3 cores)#GHz
 
|?turbo frequency (4 cores)#GHz
 
|?max memory#GiB
 
|?integrated gpu
 
|?integrated gpu base frequency
 
|?integrated gpu max frequency
 
|format=template
 
|template=proc table 3
 
|searchlabel=
 
|sort=microprocessor family, model number
 
|order=asc,asc
 
|userparam=20
 
|mainlabel=-
 
|limit=200
 
}}
 
<tr class="comptable-header comptable-header-sep"><th>&nbsp;</th><th colspan="20">[[Multiprocessors]] (2-way)</th></tr>
 
{{#ask: [[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Broadwell]] [[max cpu count::2]]
 
|?full page name
 
|?model number
 
|?first launched
 
|?release price
 
|?microprocessor family
 
|?core name
 
|?core count
 
|?thread count
 
|?l2$ size
 
|?l3$ size
 
|?tdp
 
|?base frequency#GHz
 
|?turbo frequency (1 core)#GHz
 
|?turbo frequency (2 cores)#GHz
 
|?turbo frequency (3 cores)#GHz
 
|?turbo frequency (4 cores)#GHz
 
|?max memory#GiB
 
|?integrated gpu
 
|?integrated gpu base frequency
 
|?integrated gpu max frequency
 
|format=template
 
|template=proc table 3
 
|searchlabel=
 
|sort=microprocessor family, model number
 
|order=asc,asc
 
|userparam=20
 
|mainlabel=-
 
|limit=200
 
}}
 
<tr class="comptable-header comptable-header-sep"><th>&nbsp;</th><th colspan="20">[[Multiprocessors]] (4-way)</th></tr>
 
{{#ask: [[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Broadwell]] [[max cpu count::4]]
 
|?full page name
 
|?model number
 
|?first launched
 
|?release price
 
|?microprocessor family
 
|?core name
 
|?core count
 
|?thread count
 
|?l2$ size
 
|?l3$ size
 
|?tdp
 
|?base frequency#GHz
 
|?turbo frequency (1 core)#GHz
 
|?turbo frequency (2 cores)#GHz
 
|?turbo frequency (3 cores)#GHz
 
|?turbo frequency (4 cores)#GHz
 
|?max memory#GiB
 
|?integrated gpu
 
|?integrated gpu base frequency
 
|?integrated gpu max frequency
 
|format=template
 
|template=proc table 3
 
|searchlabel=
 
|sort=microprocessor family, model number
 
|order=asc,asc
 
|userparam=20
 
|mainlabel=-
 
|limit=200
 
}}
 
{{comp table count|ask=[[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Broadwell]]}}
 
</table>
 
{{comp table end}}
 

Revision as of 02:05, 26 October 2017
