From WikiChip
Difference between revisions of "amd/microarchitectures/bulldozer"

(add tables)
 
(2 intermediate revisions by 2 users not shown)
Line 6: Line 6:
 
| manufacturer    = AMD
 
| manufacturer    = AMD
 
| introduction    = October 12, 2011
 
| introduction    = October 12, 2011
| phase-out        =  
+
| phase-out        = 2012
| process          = 32 nm
+
| process          = 32 nm <!--
|isa=x86-64
+
| process 2        = 28 nm -->
 +
| cores            = 4
 +
| cores 2          = 6
 +
| cores 3          = 8
 +
| cores 4          = 12
 +
| cores 5          = 16
 +
| isa             = x86-64
 
| succession      = Yes
 
| succession      = Yes
 
| predecessor      = K10
 
| predecessor      = K10
 
| predecessor link = amd/microarchitectures/k10
 
| predecessor link = amd/microarchitectures/k10
 +
| predecessor 2    = K8
 +
| predecessor 2 link = amd/microarchitectures/k8
 +
| predecessor 3    = K6
 +
| predecessor 3 link = amd/microarchitectures/k6
 
| successor        = Piledriver
 
| successor        = Piledriver
 
| successor link  = amd/microarchitectures/piledriver
 
| successor link  = amd/microarchitectures/piledriver
 +
| successor 2      = Steamroller
 +
| successor 2 link = amd/microarchitectures/steamroller
 +
| successor 3      = Excavator
 +
| successor 3 link = amd/microarchitectures/excavator
 
}}
 
}}
'''Bulldozer''' was the [[microarchitecture]] developed by [[AMD]] as a successor to {{\\|K10}}. Bulldozer was superseded by {{\\|Piledriver}} in 2012.
+
 
 +
'''Bulldozer''' was the [[microarchitecture]] developed by [[AMD]] as a successor to {{\\|K10}}.  
 +
:Bulldozer was superseded by {{\\|Piledriver}} in [[2012]].
  
 
== Architecture ==
 
== Architecture ==
'''Bulldozer''' introduced the concept of Clustered MultiThreading (CMT) to the AMD64 architecture.  In CMT, major functional units are shared between groups of cores to reduce the average die-area cost of implementing them.  Other CMT implementations include the UltraSPARC-T1, codenamed '''Niagara''', which shared a single FPU between many small cores, each of which also implemented four-way SMT.  Overall, Bulldozer marked a major and ambitious departure from AMD's previous designs - and not a very successful one.
+
{{empty section}}
 
 
Bulldozer shared one 4-pipe FPU, one L1 I-cache, one set of branch-prediction hardware, one 4-way instruction decoder, and the L2 unified cache between a pair of cores, referred to by AMD as a module.  In the FX-8100 series CPUs, four modules were bundled together with a shared L3 cache to make an 8-core CPU for the AM3+ socket.  Six-core and four-core versions were also sold, made by disabling one or two faulty modules; an individual core within a module could not be salvaged through die-harvesting.  The Bulldozer core was not used in an APU design, but its successors were.
 
 
 
Pre-release marketing by AMD indicated that the hardware dedicated to each core would include four "integer pipelines", which were each initially assumed by many to be functionally equivalent to the three integer pipelines in K7, K8 and K10.  The L1 D-cache would also be dedicated per core.  It was later revealed, however, that only two of the pipelines per core were ALUs, with the other two being AGUs associated with memory loads and stores.  Worse, Bulldozer's AGUs could not be used to execute the more complex forms of the LEA (Load Effective Address) instruction efficiently because they had no connection to the result bus, so these instructions had to be cracked for multiple passes through an ALU instead.  Effectively, Bulldozer was better described as having two "integer pipelines" and two load-store units per core.
 
 
 
Each of the three "integer pipelines" of K7, K8 and K10 included both an ALU and an AGU, for a total of three of each.  Bulldozer, with only two of each, therefore had a 33% reduction in peak integer throughput per core per clock relative to its immediate predecessor, instead of the 33% increase (i.e. four of each) that AMD marketing had implied.  This had an all too predictable effect on integer performance.
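
The percentages above follow from simple unit counts. The short Python sketch below reproduces the arithmetic; the constant and function names are made up for this illustration, and it compares only the number of ALUs per core (the AGU counts are identical), not measured performance.

```python
# Integer execution-unit counts per core, as stated in the text above.
K10_ALUS = 3        # K10: three pipelines, each with an ALU and an AGU
BULLDOZER_ALUS = 2  # Bulldozer: two ALUs (plus two AGUs) per core
IMPLIED_ALUS = 4    # what "four integer pipelines" was assumed to mean


def relative_change(new: int, old: int) -> float:
    """Fractional change going from old to new (negative means a reduction)."""
    return (new - old) / old


actual = relative_change(BULLDOZER_ALUS, K10_ALUS)
implied = relative_change(IMPLIED_ALUS, K10_ALUS)

print(f"actual change vs K10:  {actual:+.0%}")   # -33%
print(f"implied change vs K10: {implied:+.0%}")  # +33%
```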
 
 
 
The shared FPU was considerably beefed up from K10's, with two FMAC pipelines - capable of executing adds, multiplies, and the new fused-multiply-add (FMA) instructions - and two additional pipelines for other FPU-related operations.  In principle, even with both cores heavily using the shared FPU, this increased capability should have retained rough parity in throughput per core per clock with K10.  It could fairly be claimed that this FPU was Bulldozer's best feature.
 
 
 
The four-way instruction decoder, along with the branch predictor, instruction fetcher, and L1 I-cache, could be dedicated to one core if the other core was in sleep mode.  Otherwise, they would each dedicate themselves to alternate cores on successive cycles, effectively halving the fetch and decode bandwidth observed by each core.  The decoder was capable of handling four single-op instructions, one double-op and two single-op, or up to four ops from a microcoded instruction, per cycle.  Two consecutive double-op instructions had to be decoded in separate cycles.  These limitations proved to have a significant effect on performance.
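
As a rough illustration of the decode rules described above, the following Python sketch counts decode cycles for an in-order instruction stream. It is a simplified model and not AMD's documented behaviour: the names `decode_cycles` and `observed_cycles`, the greedy in-order grouping, the treatment of any instruction with more than two ops as microcoded, and the idea that a microcoded instruction decodes on its own are all assumptions made for this example.

```python
from math import ceil


def decode_cycles(ops_per_insn):
    """Count decode cycles for a list of per-instruction op counts.

    1 = single-op, 2 = double-op, more than 2 = microcoded (assumption).
    Per cycle the model accepts four singles, or one double plus two
    singles, or up to four ops of one microcoded instruction.
    """
    cycles = 0
    i = 0
    n = len(ops_per_insn)
    while i < n:
        if ops_per_insn[i] > 2:
            # Microcoded: assumed to decode alone, four ops per cycle.
            cycles += ceil(ops_per_insn[i] / 4)
            i += 1
            continue
        singles = doubles = 0
        while i < n and ops_per_insn[i] <= 2:
            if ops_per_insn[i] == 2:
                # Only one double per cycle, and only next to <= 2 singles.
                if doubles == 1 or singles > 2:
                    break
                doubles += 1
            else:
                limit = 2 if doubles else 4
                if singles == limit:
                    break
                singles += 1
            i += 1
        cycles += 1
    return cycles


def observed_cycles(cycles, sibling_active):
    """Decoder alternates between cores when both are awake (per the text)."""
    return cycles * 2 if sibling_active else cycles


print(decode_cycles([1, 1, 1, 1]))  # four singles fit in one cycle
print(decode_cycles([2, 2]))        # two doubles need separate cycles
print(observed_cycles(decode_cycles([2, 1, 1]), sibling_active=True))
```

In this model a stream of double-op instructions decodes at one instruction per cycle, and at half that rate again when the sibling core is active, which is the kind of limitation the paragraph above describes.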
 
 
 
Other differences from K10 lay in the cache hierarchy.  The L1 I-cache, which as previously mentioned was now shared between two cores, was still 64 KB and 2-way set-associative, which quickly proved to be a bottleneck when running heterogeneous workloads, because code running on one core would repeatedly evict code required by the other.  The L1 D-cache was sharply reduced in size to 16 KB and became write-through instead of write-back.  Cache and memory latencies were found to be much higher than in K10, which was particularly disappointing since K8 and K10 had made a point of having low latencies with their on-die memory controllers.
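
The eviction problem with a shared 2-way I-cache can be sketched with a toy simulation. The Python below models a 64 KB, 2-way set-associative cache and compares one core running alone against two cores whose hot code maps to the same set. The 64-byte line size, LRU replacement, the strict one-access-per-core interleaving, and the addresses are assumptions chosen for illustration; `SetAssocCache` and `miss_rate` are hypothetical names, not part of any real tool.

```python
from collections import OrderedDict

LINE = 64                      # bytes per cache line (assumption)
WAYS = 2                       # 2-way set-associative, per the text
SIZE = 64 * 1024               # 64 KB, per the text
SETS = SIZE // (LINE * WAYS)   # number of sets under these assumptions


class SetAssocCache:
    """Toy set-associative cache with LRU replacement (assumption)."""

    def __init__(self):
        self.sets = [OrderedDict() for _ in range(SETS)]

    def access(self, addr):
        """Return True on a hit, False on a miss (filling the line)."""
        line = addr // LINE
        ways = self.sets[line % SETS]
        if line in ways:
            ways.move_to_end(line)      # mark as most recently used
            return True
        if len(ways) == WAYS:
            ways.popitem(last=False)    # evict least recently used line
        ways[line] = None
        return False


def miss_rate(traces, rounds=100):
    """Interleave one access per core in turn and return the miss rate."""
    cache = SetAssocCache()
    misses = total = 0
    for _ in range(rounds):
        for accesses in zip(*traces):
            for addr in accesses:
                total += 1
                misses += not cache.access(addr)
    return misses / total


stride = SETS * LINE                   # addresses this far apart share a set
core_a = [0, stride]                   # two hot lines in set 0
core_b = [0x100000, 0x100000 + stride] # two more hot lines, also in set 0

print(miss_rate([core_a]))             # one core alone: only warm-up misses
print(miss_rate([core_a, core_b]))     # both cores: four lines fight for two ways
```

Under these assumptions the single-core run misses only while warming up, whereas the two-core run misses on every access, because four hot lines compete for the two ways of one set.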
 
 
 
Benchmarks quickly demonstrated that, despite Bulldozer's high core count, high clock speeds (an overclocking marketing stunt reached 8 GHz on LN2) and correspondingly high power consumption, its overall performance was generally no better and often worse than K10.  Some aspects of this improved slightly with better OS support, but there were fundamental problems that no mere software tweaks could overcome.  Compared to Bulldozer's immediate competitor, Sandy Bridge, the advantage was clearly with the latter, especially for games which, at the time, rarely used more than two or three threads effectively and thrived on low memory latency.
 
 
 
== Die Shot ==
 
 
 
[[File:Bulldozer Die Shot.jpg|frame|center]]
 
  
 
== All Bulldozer Chips ==
 
== All Bulldozer Chips ==
Line 66: Line 64:
 
{{table count|col=7|ask=[[Category:microprocessor models by amd]] [[instance of::microprocessor]] [[microarchitecture::Bulldozer]]}}
 
{{table count|col=7|ask=[[Category:microprocessor models by amd]] [[instance of::microprocessor]] [[microarchitecture::Bulldozer]]}}
 
</table>
 
</table>
 +
 +
== Bulldozer Family ==
 +
{|class="wikitable sortable" style="text-align: center; width: 100%; font-size: 100%;"
 +
|-
 +
! [[Microarchitecture|Architecture]]
 +
! [[technology node|Process]] <br>(nm)
 +
! Family
 +
! Release <br>date
 +
! Code name
 +
! Model group
 +
! Cores
 +
! Clock rate <br>(MHz)
 +
|-
 +
! rowspan=17 | {{amd|Bulldozer|l=arch}} <br>(x86-64)
 +
| rowspan=14 | [[32 nm]]
 +
| rowspan=6 | {{amd|Bulldozer|l=arch}}
 +
| rowspan=3 | Oct 2011
 +
| rowspan=3 | Zambezi
 +
| FX-4100 series <br>(4100, 4120, 4130, 4150, 4170)
 +
| 4
 +
| 3600–4200 <br>(3700–4300 boost)
 +
|-
 +
| FX-6100 series <br>(6100, 6120, 6130, 6200)
 +
| 6
 +
| 3300–3800 <br>(3600–4200 boost)
 +
|-
 +
| FX-8100 series <br>(8100, 8120, 8140, 8150, 8170)
 +
| 8
 +
| 2800–3900 <br>(3100–4500 boost)
 +
|-
 +
| Mar 2012
 +
| Zurich
 +
| {{amd|Opteron}} 3200 series <br>(3250HE, 3260HE, 3280)
 +
| 4/8
 +
| 2400–2700 <br>(2700–3700 boost)
 +
|-
 +
| rowspan=2 | Nov 2011
 +
| Valencia
 +
| {{amd|Opteron}} 4200 series <br>(42DX, 42MX, 4226, 4228HE, <br>4230HE, 4234, 4238, 4240, <br>4256EE, 4274HE, 4276HE, <br>4280, 4284)
 +
| 4/6/8
 +
| 1600–3400 <br>(1900–3800 boost)
 +
|-
 +
| Interlagos
 +
| {{amd|Opteron}} 6200 series <br>(6204, 6212, 6220, 6230HE, <br>6234, 6238, 6262HE, 6272, <br>6274, 6276, 6278, <br>6282SE, 6284SE)
 +
| 4/8/12/16
 +
| 1600–3300 <br>(2100–3600 boost)
 +
|-
 +
| rowspan=8 | {{amd|Piledriver|l=arch}}
 +
|
 +
| Trinity
 +
| {{amd|Sempron}} X2 240, {{amd|Athlon X2}} 340, <br>{{amd|Athlon X4}} 740, {{amd|Athlon X4}} 750K, <br>FirePro A300, FirePro A320 <!-- <br>A4-4300M, A4-4355M, A4-5300, <br>A4-5300B, A6-4400M, A6-4455M, <br>A6-5400K, A6-5400B, A8-5500, <br>A8-4500M, A8-4555M, A8-5500B, <br>A8-5600K, A10-4600M, A10-4655M, <br>A10-5700, A10-5800K, A10-5800B -->
 +
| 2/4
 +
| 1600–3800 <br>(2400–4200 boost)
 +
|-
 +
|
 +
| Richland
 +
| {{amd|Sempron}} X2 250, {{amd|Athlon X2}} 350, <br>Athlon X2 370K, Athlon X2 750, <br>Athlon X2 760K <!-- FX-670K, A4-4000, A4-4320, A4-5145M, A4-5150M, A4-6300, A4-6300B, A4-6320, A4-6320B, A4-7300, A4PRO-7300B, A6-5345M, A6-5350M, A6-5357M, A6-6400B , A6-6400K, A6-6420B, A6-6420K, A8-5345M, A8-5350M, A8-5357M, A8-6500, A8-6500T, A8-6500B, A8-6600, A10-5745M, A10-5750M, A10-5757M, A10-6700, A10-6700T, A10-6790B, A10-6790K, A10-6800B, A10-6800K -->
 +
| 2/4
 +
| 1700–4100 <br>(2600–4400 boost)
 +
|-
 +
| rowspan=3 | Oct 2012
 +
| rowspan=3 | Vishera
 +
| FX-4300 series <br>(4300, 4320, 4350)
 +
| 4
 +
| 3800–4200 <br>(4000–4300 boost)
 +
|-
 +
| FX-6300 series <br>(6300, 6350)
 +
| 6
 +
| 3500–3900 <br>(4100–4200 boost)
 +
|-
 +
| FX-8300 series <br>(8300, 8310, 8320, 8320E, 8350, <br>8370, 8370E, 9370, 9590)
 +
| 8
 +
| 3300–4700 <br>(4000–5000 boost)
 +
|-
 +
| rowspan=2 | Dec 2012
 +
| Delhi
 +
| {{amd|Opteron}} 3300 series <br>(3320EE, 3350HE, 3365, 3380)
 +
| 4/8
 +
| 1900–2800 <br>(2100–3800 boost)
 +
|-
 +
| Seoul
 +
| {{amd|Opteron}} 4300 series <br>(43CXEE, 43GKHE, 4310EE, <br>4332HE, 4334, 4340, <br>4365EE, 4376HE, 4386)
 +
| 4/6/8
 +
| 2000–3500 <br>(2300–3800 boost)
 +
|-
 +
| Nov 2012
 +
| Abu Dhabi
 +
| {{amd|Opteron}} 6300 series <br>(6308, 6320, 6328, 6338P, <br>6344, 6348, 6366HE, 6370P, <br>6376, 6378, 6380, 6386SE)
 +
| 4/8/12/16
 +
| 1800–3500 <br>(2300–3800 boost)
 +
|-
 +
| rowspan=3 | [[28 nm]]
 +
| {{amd|Steamroller|l=arch}}
 +
|
 +
| Kaveri
 +
| {{amd|Athlon X2}} 450, {{amd|Athlon X4}} 840, <br>Athlon X4 860K, Athlon X4 870K, <br>Athlon X4 880K <!-- FX-770K, FX-7500, FX-7600P, A4Pro-7350B, A6-7000, A6Pro-7050B, A6-7400K, A6Pro-7400B, A6-7470K, A8-7100, A8Pro-7150B, A8-7200P, A8-7600, A8Pro-7600B, A8-7650K, A8-7670K, A10-7300, A10Pro-7350B, A10-7400P, A10-7700K, A10-7800, A10Pro-7800B, A10-7850K, A10-7850B, A10-7860K, A10-7870K, A10-7890K -->
 +
| 2/4
 +
| 1800–4100 <br>(3000–4300 boost)
 +
|-
 +
| rowspan=2 | {{amd|Excavator|l=arch}}
 +
|
 +
| Carrizo
 +
| {{amd|Athlon X4}} 835, {{amd|Athlon X4}} 845, <br>FX-8800P, A6-8500P, A6Pro-8500B <!-- A8-8600P, A8Pro-8600B, A10-8700P, A10Pro-8700B, A10-8780P, A12Pro-8800B -->
 +
| 2/4
 +
| 1600–3500 <br>(3000–3800 boost)
 +
|-
 +
|
 +
| Bristol Ridge
 +
| {{amd|Athlon X4}} 940, {{amd|Athlon X4}} 950, <br>{{amd|Athlon X4}} 970 <!-- FX-9800P, FX-9830P, A6-9500, A6PRO-9500, A6Pro-9500B, A6-9500E, A6Pro-9500E, A6-9550, A8-9600, A8PRO-9600, A8Pro-9600B, A8Pro-9630B, A10-9600P, A10-9620P, A10-9630P, A10-9700, A10Pro-9700, A10Pro-9700B, A10-9700E, A10Pro-9700E, A10Pro-9730B, A12-9700P, A12-9720P, A12-9730P, A12-9800, A12Pro-9800, A12Pro-9800B, A12-9800E, A12Pro-9800E, A12Pro-9830B -->
 +
| 2/4
 +
| 2300–3800 <br>(3200–4200 boost)
 +
|-
 +
|}
 +
 +
== Die Shot ==
 +
 +
[[File:Bulldozer Die Shot.jpg|frame|center]]
  
 
== See also ==
 
== See also ==
* {{intel|Ivy Bridge}}
+
* [[AMD]] • {{amd|Microarchitectures}}
 +
* [[Intel]] • {{intel|Ivy Bridge}}

Latest revision as of 08:51, 28 April 2025

Bulldozer µarch
General Info
Arch Type: CPU
Designer: AMD
Manufacturer: AMD
Introduction: October 12, 2011
Phase-out: 2012
Process: 32 nm
Core Configs: 4, 6, 8, 12, 16
Instructions
ISA: x86-64
Succession

Bulldozer was the microarchitecture developed by AMD as a successor to K10.

Bulldozer was superseded by Piledriver in 2012.

Architecture

This section is empty; you can help add the missing info by editing this page.

All Bulldozer Chips

Bulldozer Chips
Model | Family | Core | Launched | Power Dissipation | Freq | Max Mem
Count: 0

Bulldozer Family

Architecture | Process (nm) | Family | Release date | Code name | Model group | Cores | Clock rate (MHz)
Bulldozer (x86-64) | 32 nm | Bulldozer | Oct 2011 | Zambezi | FX-4100 series (4100, 4120, 4130, 4150, 4170) | 4 | 3600–4200 (3700–4300 boost)
Bulldozer (x86-64) | 32 nm | Bulldozer | Oct 2011 | Zambezi | FX-6100 series (6100, 6120, 6130, 6200) | 6 | 3300–3800 (3600–4200 boost)
Bulldozer (x86-64) | 32 nm | Bulldozer | Oct 2011 | Zambezi | FX-8100 series (8100, 8120, 8140, 8150, 8170) | 8 | 2800–3900 (3100–4500 boost)
Bulldozer (x86-64) | 32 nm | Bulldozer | Mar 2012 | Zurich | Opteron 3200 series (3250HE, 3260HE, 3280) | 4/8 | 2400–2700 (2700–3700 boost)
Bulldozer (x86-64) | 32 nm | Bulldozer | Nov 2011 | Valencia | Opteron 4200 series (42DX, 42MX, 4226, 4228HE, 4230HE, 4234, 4238, 4240, 4256EE, 4274HE, 4276HE, 4280, 4284) | 4/6/8 | 1600–3400 (1900–3800 boost)
Bulldozer (x86-64) | 32 nm | Bulldozer | Nov 2011 | Interlagos | Opteron 6200 series (6204, 6212, 6220, 6230HE, 6234, 6238, 6262HE, 6272, 6274, 6276, 6278, 6282SE, 6284SE) | 4/8/12/16 | 1600–3300 (2100–3600 boost)
Bulldozer (x86-64) | 32 nm | Piledriver | | Trinity | Sempron X2 240, Athlon X2 340, Athlon X4 740, Athlon X4 750K, FirePro A300, FirePro A320 | 2/4 | 1600–3800 (2400–4200 boost)
Bulldozer (x86-64) | 32 nm | Piledriver | | Richland | Sempron X2 250, Athlon X2 350, Athlon X2 370K, Athlon X2 750, Athlon X2 760K | 2/4 | 1700–4100 (2600–4400 boost)
Bulldozer (x86-64) | 32 nm | Piledriver | Oct 2012 | Vishera | FX-4300 series (4300, 4320, 4350) | 4 | 3800–4200 (4000–4300 boost)
Bulldozer (x86-64) | 32 nm | Piledriver | Oct 2012 | Vishera | FX-6300 series (6300, 6350) | 6 | 3500–3900 (4100–4200 boost)
Bulldozer (x86-64) | 32 nm | Piledriver | Oct 2012 | Vishera | FX-8300 series (8300, 8310, 8320, 8320E, 8350, 8370, 8370E, 9370, 9590) | 8 | 3300–4700 (4000–5000 boost)
Bulldozer (x86-64) | 32 nm | Piledriver | Dec 2012 | Delhi | Opteron 3300 series (3320EE, 3350HE, 3365, 3380) | 4/8 | 1900–2800 (2100–3800 boost)
Bulldozer (x86-64) | 32 nm | Piledriver | Dec 2012 | Seoul | Opteron 4300 series (43CXEE, 43GKHE, 4310EE, 4332HE, 4334, 4340, 4365EE, 4376HE, 4386) | 4/6/8 | 2000–3500 (2300–3800 boost)
Bulldozer (x86-64) | 32 nm | Piledriver | Nov 2012 | Abu Dhabi | Opteron 6300 series (6308, 6320, 6328, 6338P, 6344, 6348, 6366HE, 6370P, 6376, 6378, 6380, 6386SE) | 4/8/12/16 | 1800–3500 (2300–3800 boost)
Bulldozer (x86-64) | 28 nm | Steamroller | | Kaveri | Athlon X2 450, Athlon X4 840, Athlon X4 860K, Athlon X4 870K, Athlon X4 880K | 2/4 | 1800–4100 (3000–4300 boost)
Bulldozer (x86-64) | 28 nm | Excavator | | Carrizo | Athlon X4 835, Athlon X4 845, FX-8800P, A6-8500P, A6Pro-8500B | 2/4 | 1600–3500 (3000–3800 boost)
Bulldozer (x86-64) | 28 nm | Excavator | | Bristol Ridge | Athlon X4 940, Athlon X4 950, Athlon X4 970 | 2/4 | 2300–3800 (3200–4200 boost)

Die Shot

Bulldozer Die Shot.jpg

See also

codename: Bulldozer
designer: AMD
first launched: October 12, 2011
full page name: amd/microarchitectures/bulldozer
instance of: microarchitecture
instruction set architecture: x86-64
manufacturer: AMD
microarchitecture type: CPU
name: Bulldozer
process: 32 nm (0.032 μm, 3.2e-5 mm)