− | {{intel title|Broadwell|arch}}
| |
− | {{microarchitecture
| |
− | |atype=CPU
| |
− | |name=Broadwell
| |
− | |designer=Intel
| |
− | |manufacturer=Intel
| |
− | |introduction=October, 2014
| |
− | |process=14 nm
| |
− | |cores=2
| |
− | |cores 2=4
| |
− | |cores 3=6
| |
− | |cores 4=8
| |
− | |cores 5=10
| |
− | |cores 6=12
| |
− | |cores 7=14
| |
− | |cores 8=16
| |
− | |cores 9=18
| |
− | |cores 10=20
| |
− | |cores 11=22
| |
− | |type=Superscalar
| |
− | |speculative=Yes
| |
− | |renaming=Yes
| |
− | |stages min=14
| |
− | |stages max=19
| |
− | |isa=IA-32
| |
− | |isa 2=x86-64
| |
− | |extension=MOVBE
| |
− | |extension 2=MMX
| |
− | |extension 3=SSE
| |
− | |extension 4=SSE2
| |
− | |extension 5=SSE3
| |
− | |extension 6=SSSE3
| |
− | |extension 7=SSE4.1
| |
− | |extension 8=SSE4.2
| |
− | |extension 9=POPCNT
| |
− | |extension 10=AVX
| |
− | |extension 11=AVX2
| |
− | |extension 12=AES
| |
− | |extension 13=PCLMUL
| |
− | |extension 14=FSGSBASE
| |
− | |extension 15=RDRND
| |
− | |extension 16=FMA3
| |
− | |extension 17=F16C
| |
− | |extension 18=BMI
| |
− | |extension 19=BMI2
| |
− | |extension 20=VT-x
| |
− | |extension 21=VT-d
| |
− | |extension 22=TXT
| |
− | |extension 23=TSX
| |
− | |extension 24=RDSEED
| |
− | |extension 25=ADCX
| |
− | |extension 26=PREFETCHW
| |
− | |l1i=32 KiB
| |
− | |l1i per=core
| |
− | |l1i desc=8-way set associative
| |
− | |l1d=32 KiB
| |
− | |l1d per=core
| |
− | |l1d desc=8-way set associative
| |
− | |l2=256 KiB
| |
− | |l2 per=core
| |
− | |l2 desc=8-way set associative
| |
− | |l3=1.5 MiB
| |
− | |l3 per=core
| |
− | |l4=128 MiB
| |
− | |l4 per=package
| |
− | |l4 desc=on Iris Pro GPUs only
| |
− | |core name=Broadwell Y
| |
− | |core name 2=Broadwell U
| |
− | |core name 3=Broadwell H
| |
− | |core name 4=Broadwell DT
| |
− | |core name 5=Broadwell EP
| |
− | |core name 6=Broadwell EX
| |
− | |core name 7=Broadwell E
| |
− | |predecessor=Haswell
| |
− | |predecessor link=intel/microarchitectures/haswell
| |
− | |successor=Skylake (client)
| |
− | |successor link=intel/microarchitectures/skylake (client)
| |
− | |successor 2=Skylake (server)
| |
− | |successor 2 link=intel/microarchitectures/skylake (server)
| |
− | |pipeline=Yes
| |
− | |OoOE=Yes
| |
− | |issues=4
| |
− | |inst=Yes
| |
− | |cache=Yes
| |
− | |core names=Yes
| |
− | |succession=Yes
| |
− | }}
| |
− | '''Broadwell''' ('''BDW''') is [[Intel]]'s [[microarchitecture]] based on the [[14 nm process]] for mobile, desktop, and server processors. Introduced in early 2015, Broadwell is a [[process shrink]] of {{\\|Haswell}} that also introduces several enhancements. Broadwell is named after [[wikipedia:Broadwell, Illinois|Broadwell, Illinois]].
| |
| | | |
− | For desktop and mobile, Broadwell is branded as 5th Generation Intel {{intel|Core}} processors. For server-class processors, it is branded as {{intel|Xeon E3|Xeon E3 v4}}, {{intel|Xeon E5|Xeon E5 v4}}, and {{intel|Xeon E7|Xeon E7 v4}}.
| |
− | == Codenames ==
| |
− | {| class="wikitable"
| |
− | |-
| |
− | ! Core !! Abbrev !! Target
| |
− | |-
| |
− | | Broadwell Y || BDW-Y || {{intel|Core M|Core M family}}, SoC for smartphones, 2-in-1s, tablets, and notebooks
| |
− | |-
| |
− | | Broadwell U || BDW-U || {{intel|Core}} ultrabooks
| |
− | |-
| |
− | | Broadwell H || BDW-H || IoT (QM87, HM86/HM87 Chipsets), All-in-ones
| |
− | |-
| |
− | | Broadwell DT || BDW-DT || Unlocked desktop MPUs
| |
− | |-
| |
− | | Broadwell EP || BDW-EP || {{intel|Xeon E5}}, Dual-Processor platform
| |
− | |-
| |
− | | Broadwell EX || BDW-EX || {{intel|Xeon E7}}, Multi-Processor platform, QPI
| |
− | |-
| |
− | | Broadwell E || BDW-E || High-End Desktops (HEDT)
| |
− | |}
| |
− |
| |
− | == Process Technology ==
| |
− | {| class="wikitable" style="float: right;"
| |
− | ! colspan="2" | 14 nm Manufacturing Fabs
| |
− | |-
| |
− | ! Fab !! Location
| |
− | |-
| |
− | | D1X || Hillsboro, Oregon
| |
− | |-
| |
− | | D1D || Hillsboro, Oregon
| |
− | |-
| |
− | | D1C || Hillsboro, Oregon
| |
− | |-
| |
− | | Fab 32 || Chandler, Arizona
| |
− | |-
| |
− | | Fab 24 || Leixlip, Ireland
| |
− | |}
| |
− | Broadwell is designed to be manufactured using [[14 nm]] Tri-gate [[FinFET]] transistors. This corresponds to an 8 nm fin width and a 42 nm fin pitch (shown below). The SRAM cell measures 0.0706 µm² in the high-performance variant and 0.0499 µm² in the high-density variant.
| |
− |
| |
− |
| |
− | [[Scaling]]:
| |
− |
| |
− | {| class="wikitable"
| |
− | |-
| |
− | ! !! Haswell !! Broadwell !! Δ !! rowspan="8" | [[File:intel 14nm fin.png|250px]]
| |
− | |-
| |
− | | || [[22 nm]] || [[14 nm]] ||
| |
− | |-
| |
− | | Fin Pitch || 60 nm || 42 nm || 0.70x
| |
− | |-
| |
− | | Fin Width || 8 nm || 8 nm || 1x
| |
− | |-
| |
− | | Fin Height || 34 nm || 42 nm || 1.24x
| |
− | |-
| |
− | | Gate Pitch || 90 nm || 70 nm || 0.78x
| |
− | |-
| |
− | | Interconnect Pitch || 80 nm || 52 nm || 0.65x
| |
− | |-
| |
− | | Cell Height || 840 nm || 399 nm || 0.48x
| |
− | |}
| |
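The Δ column can be reproduced by dividing each Broadwell dimension by its Haswell counterpart. The following is a minimal sketch using only the values from the table above; the dictionary and function names are illustrative, and the gate pitch × cell height product is used here only as a rough proxy for logic area scaling, which is an assumption and not a figure stated in this article.

```python
# Feature sizes in nanometres, copied from the scaling table above.
HASWELL_22NM = {
    "Fin Pitch": 60, "Fin Width": 8, "Fin Height": 34,
    "Gate Pitch": 90, "Interconnect Pitch": 80, "Cell Height": 840,
}
BROADWELL_14NM = {
    "Fin Pitch": 42, "Fin Width": 8, "Fin Height": 42,
    "Gate Pitch": 70, "Interconnect Pitch": 52, "Cell Height": 399,
}


def scaling_factors(old, new):
    """Return new/old for every feature present in both tables."""
    return {name: new[name] / old[name] for name in old if name in new}


factors = scaling_factors(HASWELL_22NM, BROADWELL_14NM)
for name, ratio in factors.items():
    print(f"{name:<20}{ratio:.3f}x")

# Rough proxy (assumption): logic area scales with gate pitch x cell height.
logic_area_scale = factors["Gate Pitch"] * factors["Cell Height"]
print(f"{'Gate x Cell Height':<20}{logic_area_scale:.3f}x")
```

The cell height ratio comes out to 0.475, which the table rounds to 0.48x.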
− |
| |
− | == Architecture ==
| |
− | Broadwell is for the most part identical to {{\\|Haswell}} with several enhancements, including new instruction set extensions.
| |
− |
| |
− | === Key changes from {{\\|Haswell}} ===
| |
− | [[File:broadwell buffer window.png|right|350px]]
| |
− | * ~5% IPC improvement
| |
− | * FP multiplication instructions have reduced latency (3 cycles, down from 5)
| |
− | ** Affects AVX, SSE, and FP instructions
| |
− | * {{x86|CLMUL}} instructions are now a single [[μop]], improving latency and throughput
| |
− | * The second-level TLB (STLB)
| |
− | ** Table was enlarged (1,536 entries, up from 1,024)
| |
− | ** 1 GiB page mode (16 entries, 4-way set associative)
| |
− | * Execution Engine
| |
− | ** Larger scheduler (64 entries, up from 60)
| |
− | ** Larger instruction queue (25 entries/thread, up from 20)
| |
− | ** Faster store-to-load forwarding
| |
− | ** Address prediction for branches and returns was improved
| |
− | ** Improved cryptography acceleration instructions
| |
− |
| |
− | Features added to the core maintained a 2:1 ratio of performance gain to power cost.
| |
− |
| |
− | ==== Graphics ====
| |
− | * 50% higher sampler throughput
| |
− | * Improved geometry, Z, and pixel fill rates
| |
− | * DirectX 11.2, OpenGL 4.3
| |
− | * OpenCL 1.2 and 2.0 (with Shared Virtual Memory)
| |
− | * Up to 24 EUs (20% more, up from 20 in {{\\|Haswell}}), 48 EUs on {{intel|Iris Pro Graphics}}
| |
− |
| |
− | ==== New instructions ====
| |
− | {{main|#Added instructions|l1=See #Added_instructions for the complete list}}
| |
− | Broadwell introduced a number of new instructions:
| |
− | * {{x86|RDSEED|<code>RDSEED</code>}} - Generates 16-, 32-, or 64-bit random number seeds ([[NIST SP 800-90B]] & [[NIST SP 800-90C]])
| |
− | * {{x86|ADCX|<code>ADCX</code>}} - Arbitrary precision integer operations
| |
− | * {{x86|PREFETCHW|<code>PREFETCHW</code>}} - Prefetch data into caches, hinting a write is expected in the future
| |
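To illustrate what "arbitrary precision integer operations" means in practice, the sketch below models the limb-by-limb add-with-carry chain that such instructions are meant to accelerate. It is a plain Python model of the arithmetic only: it does not execute <code>ADCX</code>, does not model processor flags, and every name in it (<code>add_with_carry</code>, <code>add_limbs</code>, <code>to_limbs</code>, <code>from_limbs</code>) is hypothetical.

```python
MASK64 = (1 << 64) - 1  # one 64-bit limb


def add_with_carry(a, b, carry_in):
    """Model a single 64-bit add-with-carry step: returns (sum, carry_out)."""
    total = a + b + carry_in
    return total & MASK64, total >> 64


def add_limbs(x, y):
    """Add two equal-length little-endian lists of 64-bit limbs."""
    if len(x) != len(y):
        raise ValueError("operands must have the same number of limbs")
    out, carry = [], 0
    for a, b in zip(x, y):
        s, carry = add_with_carry(a, b, carry)  # carry ripples to the next limb
        out.append(s)
    return out, carry


def to_limbs(value, count):
    """Split a non-negative integer into `count` little-endian 64-bit limbs."""
    return [(value >> (64 * i)) & MASK64 for i in range(count)]


def from_limbs(limbs):
    """Reassemble little-endian 64-bit limbs into one integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


a = (1 << 128) - 1          # two limbs, all bits set
b = 1
limbs, carry_out = add_limbs(to_limbs(a, 2), to_limbs(b, 2))
print(limbs, carry_out)
```

Each loop iteration depends on the carry from the previous one, which is the dependency chain that carry-propagating add instructions implement in hardware.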
− |
| |
− | === Block Diagram ===
| |
− | [[File:broadwell block diagram.svg]]
| |
− |
| |
− | === Memory Hierarchy ===
| |
− | [[File:Intel-Xeon-processor-D-1500-wafer.jpg|right|thumb|350px|Broadwell {{intel|Xeon D}} wafer]]
| |
− | * Cache
| |
− | ** L1 Cache:
| |
− | *** 32 KiB 8-way [[set associative]] instruction, 64 B line size
| |
− | *** 32 KiB 8-way set associative data, 64 B line size
| |
− | *** Write-back policy
| |
− | *** Per core
| |
− | ** L2 Cache:
| |
− | *** 256 KiB 8-way set associative, 64 B line size
| |
− | *** Write-back policy
| |
− | *** Per core
| |
− | ** L3 Cache:
| |
− | *** 1.5 - 3 MiB per core, 64 B line size
| |
− | *** 16-20 -way set associative
| |
− | *** Write-back policy
| |
− | ** L4 Cache:
| |
− | *** 128 MiB
| |
− | *** [[eDRAM]]
| |
− | *** shared with GPU ({{intel|Crystal Well}})
| |
− | *** {{intel|Iris Pro}} models only
| |
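The number of sets in each cache follows from its size, associativity, and line size (sets = size / (ways × line size)). The following is a minimal sketch using the L1 and L2 figures listed above; the helper name <code>cache_sets</code> is illustrative.

```python
KIB = 1024


def cache_sets(size_bytes, ways, line_bytes=64):
    """Number of sets in a set-associative cache: size / (ways * line size)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("size is not a whole number of sets")
    return sets


# Figures from the list above: 8-way, 64 B lines.
l1_sets = cache_sets(32 * KIB, ways=8)
l2_sets = cache_sets(256 * KIB, ways=8)
print(f"L1 (I or D): {l1_sets} sets")
print(f"L2:          {l2_sets} sets")
```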
− |
| |
− | Broadwell's TLB consists of a dedicated first-level TLB for the instruction cache and another for the data cache. Additionally, there is a unified second-level TLB.
| |
− | * TLBs:
| |
− | ** ITLB
| |
− | *** 4 KiB page translations:
| |
− | **** 128 entries; 4-way set associative
| |
− | **** dynamic partition; divided between the two threads
| |
− | *** 2 MiB / 4 MiB page translations:
| |
− | **** 8 entries; fully associative
| |
− | **** Duplicated for each thread
| |
− | ** DTLB
| |
− | *** 4 KiB page translations:
| |
− | **** 64 entries; 4-way set associative
| |
− | **** fixed partition; divided between the two threads
| |
− | *** 2 MiB / 4 MiB page translations:
| |
− | **** 32 entries; 4-way set associative
| |
− | *** 1 GiB page translations:
| |
− | **** 4 entries; 4-way set associative
| |
− | ** STLB
| |
− | *** 4 KiB + 2 MiB page translations:
| |
− | **** 1536 entries; 6-way set associative
| |
− | **** shared
| |
− | *** 1 GiB page translations:
| |
− | **** 16 entries; 4-way set associative
| |
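A useful way to read the TLB figures is as "reach": the amount of memory a structure can map at once, which is the entry count multiplied by the page size. The sketch below computes reach and set count from the entries listed above. The names are illustrative. Because the STLB shares its 1,536 entries between 4 KiB and 2 MiB pages, the two STLB rows are bounds for an all-4-KiB and an all-2-MiB mix, not values that hold simultaneously.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# (structure, page label, page size in bytes, entries, ways), from the list above.
TLBS = [
    ("ITLB", "4 KiB", 4 * KIB, 128, 4),
    ("DTLB", "4 KiB", 4 * KIB, 64, 4),
    ("DTLB", "2 MiB", 2 * MIB, 32, 4),
    ("DTLB", "1 GiB", 1 * GIB, 4, 4),
    ("STLB", "4 KiB", 4 * KIB, 1536, 6),
    ("STLB", "2 MiB", 2 * MIB, 1536, 6),
    ("STLB", "1 GiB", 1 * GIB, 16, 4),
]


def tlb_reach(entries, page_bytes):
    """Bytes of address space covered when every entry is in use."""
    return entries * page_bytes


def tlb_sets(entries, ways):
    """Number of sets in a set-associative TLB."""
    sets, remainder = divmod(entries, ways)
    if remainder:
        raise ValueError("entries must be a multiple of ways")
    return sets


for name, label, page_bytes, entries, ways in TLBS:
    reach_mib = tlb_reach(entries, page_bytes) / MIB
    print(f"{name} {label:>5}: {tlb_sets(entries, ways):>3} sets, reach {reach_mib:g} MiB")
```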
− |
| |
− | === Pipeline ===
| |
− | {{main|intel/microarchitectures/haswell#Pipeline|l1=Haswell's Pipeline}}
| |
− | Broadwell's pipeline is identical to Haswell's.
| |
− |
| |
− | == High Core count (EP) ==
| |
− | * Key Changes from {{\\|Haswell}}:
| |
− | ** Up to 22 cores (up from 18)
| |
− | ** Up to 44 threads (up from 36)
| |
− | ** Up to 55 MiB [[last level cache|LLC]] (up from 45 MiB)
| |
− | ** Up to DDR4-2400 memory (up from DDR4-2133)
| |
− |
| |
− | {{expand section}}
| |
− |
| |
− | === Snoop Modes ===
| |
− | Broadwell EP has four snoop modes: Home Snoop (HS), Early Snoop (ES), Cluster-on-Die (COD) and Home Snoop with Directory and Opportunistic Snoop Broadcast (HS with DIR + OSB).
| |
− |
| |
− | {| class="wikitable tc2 tc3 tc4 tc5"
| |
− | |-
| |
− | ! Performance Metric !! HS w/ DIR+OSB !! COD !! Home Snoop !! Early Snoop
| |
− | |-
| |
− | ! colspan="5" | System configured as [[NUMA]]
| |
− | |-
| |
− | | LLC Hit Latency || Low || Lowest || Low || Low
| |
− | |-
| |
− | | Local Memory Latency || Low || Lowest || High<sup>1</sup> || Medium<sup>1</sup>
| |
− | |-
| |
− | | Remote Memory Latency || Low || Low-High<sup>1</sup> || Low || Lowest
| |
− | |-
| |
− | | Local Memory Bandwidth || High || High || High || Low
| |
− | |-
| |
− | | Remote Memory Bandwidth || High || Medium || High || Medium
| |
− | |-
| |
− | ! colspan="5" | System configured as [[UMA]]
| |
− | |-
| |
− | | Memory Latency || Low || rowspan="2" | Not an advised configuration || Low || Lowest
| |
− | |-
| |
− | | Memory Bandwidth || High || High || Medium
| |
− | |}
| |
− |
| |
− | <sup>1</sup> - Performance depends on the directory state. Expect low latency with a clean directory and high latency with a dirty directory.
| |
− |
| |
− | === Die Stats ===
| |
− | {| class="wikitable" style="text-align: center;"
| |
− | |-
| |
− | ! colspan="3" style="background:#D6D6FF;" | Layout
| |
− | |-
| |
− | ! Low Core Count (LCC) !! Medium Core Count (MCC) !! High Core Count (HCC)
| |
− | |-
| |
− | | Up to 10 Cores || 12-14 Cores || 16+ Cores
| |
− | |-
| |
− | | 246.24 mm² || 306.18 mm² || 456.12 mm²
| |
− | |-
| |
− | | ~3,200,000,000 Transistors || ~4,700,000,000 Transistors || ~7,200,000,000 Transistors
| |
− | |-
| |
− | |[[File:E5 v4 LCC.png|300px]] || [[File:E5 v4 MCC.png|300px]] || [[File:E5 v4 HCC.png|300px]]
| |
− | |}
| |
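Dividing the approximate transistor counts by the die areas gives a rough transistor density for each layout. The following is a minimal sketch using only the figures from the table above; since the transistor counts are approximate, the results are estimates, and the names are illustrative.

```python
# (approximate transistor count, die area in mm^2), from the table above.
EP_DIES = {
    "LCC": (3_200_000_000, 246.24),
    "MCC": (4_700_000_000, 306.18),
    "HCC": (7_200_000_000, 456.12),
}


def density_mtr_per_mm2(transistors, area_mm2):
    """Transistor density in millions of transistors per square millimetre."""
    return transistors / area_mm2 / 1e6


densities = {
    name: density_mtr_per_mm2(count, area)
    for name, (count, area) in EP_DIES.items()
}
for name, value in densities.items():
    print(f"{name}: ~{value:.1f} MTr/mm^2")
```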
− |
| |
− | == Die ==
| |
− | ===Dual-core Broadwell die===
| |
− |
| |
− | * [[14 nm process]]
| |
− | * 13 metal layers
| |
− | * 1,300,000,000 transistors
| |
− | * 82 mm<sup>2</sup> die size
| |
− | * [[2 cores]]
| |
− |
| |
− | : [[File:broadwell die (dual-core).jpg|850px]]
| |
− |
| |
− |
| |
− | ===Dual-core Broadwell with {{intel|Iris Pro}} die===
| |
− |
| |
− | * [[14 nm process]]
| |
− | * 13 metal layers
| |
− | * 1,900,000,000 transistors
| |
− | * 133 mm<sup>2</sup> die size
| |
− | * [[2 cores]]
| |
− |
| |
− | : [[File:broadwell with iris pro die (dual-core).png|850px]]
| |
− |
| |
− |
| |
− | ===Quad-core Broadwell with {{intel|Iris Pro}} die===
| |
− |
| |
− | Die shot of the {{intel|Core i7-5775C}} microprocessor.
| |
− |
| |
− | * [[14 nm process]]
| |
− | * 13 metal layers
| |
− | * ? transistors
| |
− | * ? mm<sup>2</sup> die size
| |
− | * [[4 cores]]
| |
− |
| |
− | : [[File:broadwell core i7-5775C die.jpg|650px]]
| |
− |
| |
− |
| |
− | ===Deca-core Broadwell ===
| |
− |
| |
− | Die shot of the {{intel|Core i7-6950X}} microprocessor.
| |
− |
| |
− | * [[14 nm process]]
| |
− | * ? metal layers
| |
− | * 3,400,000,000 transistors
| |
− | * 246 mm<sup>2</sup> die size
| |
− | * [[10 cores]]
| |
− |
| |
− | :[[File:broadwell (deca-core) die shot.png|650px]]
| |
− |
| |
− | :[[File:broadwell (deca-core) die shot (annotated).png|650px]]
| |
− |
| |
− | == Added instructions ==
| |
− | '''{{x86|RDSEED}}''' - Generates 16-, 32-, or 64-bit random number seeds (compliant with both [[NIST SP 800-90B]] and [[NIST SP 800-90C]])
| |
− |
| |
− | {{collist
| |
− | | count = 1
| |
− | | width = 150px
| |
− | |
| |
− | * {{x86|RDSEED}}
| |
− | }}
| |
− |
| |
− | '''{{x86|ADCX}}''' - Arbitrary precision integer operations
| |
− |
| |
− | {{collist
| |
− | | count = 1
| |
− | | width = 150px
| |
− | |
| |
− | * {{x86|ADCX}}
| |
− | * {{x86|ADOX}}
| |
− | }}
| |
− |
| |
− | '''{{x86|PREFETCHW}}''' - Prefetch data into caches, hinting a write is expected in the future.
| |
− |
| |
− | {{collist
| |
− | | count = 1
| |
− | | width = 150px
| |
− | |
| |
− | * {{x86|PREFETCHW}}
| |
− | }}
| |
− |
| |
− | == Cores ==
| |
− | {{empty section}}
| |
− |
| |
− | == All Broadwell Chips ==
| |
− | <!-- NOTE:
| |
− | This table is generated automatically from the data in the actual articles.
| |
− | If a microprocessor is missing from the list, an appropriate article for it needs to be
| |
− | created and tagged accordingly.
| |
− |
| |
− | Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips
| |
− | -->
| |
− | {{comp table start}}
| |
− | <table class="comptable sortable tc6 tc7 tc20 tc21 tc22 tc23 tc24 tc25">
| |
− | <tr class="comptable-header"><th> </th><th colspan="19">List of Broadwell Processors</th></tr>
| |
− | <tr class="comptable-header"><th> </th><th colspan="9">Main processor</th><th colspan="5">{{intel|Turbo Boost}}</th><th>Mem</th><th colspan="3">IGP</th></tr>
| |
− | {{comp table header 1|cols=Launched, Price, Family, Core Name, Cores, Threads, %L2$, %L3$, TDP, %Frequency, 1 Core, 2 Cores, 3 Cores, 4 Cores, Max Mem, GPU, %Frequency, Turbo}}
| |
− | <tr class="comptable-header comptable-header-sep"><th> </th><th colspan="20">[[Uniprocessors]]</th></tr>
| |
− | {{#ask: [[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Broadwell]] [[max cpu count::1]]
| |
− | |?full page name
| |
− | |?model number
| |
− | |?first launched
| |
− | |?release price
| |
− | |?microprocessor family
| |
− | |?core name
| |
− | |?core count
| |
− | |?thread count
| |
− | |?l2$ size
| |
− | |?l3$ size
| |
− | |?tdp
| |
− | |?base frequency#GHz
| |
− | |?turbo frequency (1 core)#GHz
| |
− | |?turbo frequency (2 cores)#GHz
| |
− | |?turbo frequency (3 cores)#GHz
| |
− | |?turbo frequency (4 cores)#GHz
| |
− | |?max memory#GiB
| |
− | |?integrated gpu
| |
− | |?integrated gpu base frequency
| |
− | |?integrated gpu max frequency
| |
− | |format=template
| |
− | |template=proc table 3
| |
− | |searchlabel=
| |
− | |sort=microprocessor family, model number
| |
− | |order=asc,asc
| |
− | |userparam=20
| |
− | |mainlabel=-
| |
− | |limit=200
| |
− | }}
| |
− | <tr class="comptable-header comptable-header-sep"><th> </th><th colspan="20">[[Multiprocessors]] (2-way)</th></tr>
| |
− | {{#ask: [[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Broadwell]] [[max cpu count::2]]
| |
− | |?full page name
| |
− | |?model number
| |
− | |?first launched
| |
− | |?release price
| |
− | |?microprocessor family
| |
− | |?core name
| |
− | |?core count
| |
− | |?thread count
| |
− | |?l2$ size
| |
− | |?l3$ size
| |
− | |?tdp
| |
− | |?base frequency#GHz
| |
− | |?turbo frequency (1 core)#GHz
| |
− | |?turbo frequency (2 cores)#GHz
| |
− | |?turbo frequency (3 cores)#GHz
| |
− | |?turbo frequency (4 cores)#GHz
| |
− | |?max memory#GiB
| |
− | |?integrated gpu
| |
− | |?integrated gpu base frequency
| |
− | |?integrated gpu max frequency
| |
− | |format=template
| |
− | |template=proc table 3
| |
− | |searchlabel=
| |
− | |sort=microprocessor family, model number
| |
− | |order=asc,asc
| |
− | |userparam=20
| |
− | |mainlabel=-
| |
− | |limit=200
| |
− | }}
| |
− | <tr class="comptable-header comptable-header-sep"><th> </th><th colspan="20">[[Multiprocessors]] (4-way)</th></tr>
| |
− | {{#ask: [[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Broadwell]] [[max cpu count::4]]
| |
− | |?full page name
| |
− | |?model number
| |
− | |?first launched
| |
− | |?release price
| |
− | |?microprocessor family
| |
− | |?core name
| |
− | |?core count
| |
− | |?thread count
| |
− | |?l2$ size
| |
− | |?l3$ size
| |
− | |?tdp
| |
− | |?base frequency#GHz
| |
− | |?turbo frequency (1 core)#GHz
| |
− | |?turbo frequency (2 cores)#GHz
| |
− | |?turbo frequency (3 cores)#GHz
| |
− | |?turbo frequency (4 cores)#GHz
| |
− | |?max memory#GiB
| |
− | |?integrated gpu
| |
− | |?integrated gpu base frequency
| |
− | |?integrated gpu max frequency
| |
− | |format=template
| |
− | |template=proc table 3
| |
− | |searchlabel=
| |
− | |sort=microprocessor family, model number
| |
− | |order=asc,asc
| |
− | |userparam=20
| |
− | |mainlabel=-
| |
− | |limit=200
| |
− | }}
| |
− | {{comp table count|ask=[[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Broadwell]]}}
| |
− | </table>
| |
− | {{comp table end}}
| |