From WikiChip
Editing intel/microarchitectures/bonnell


Latest revision Your text
Line 1: Line 1:
 
{{intel title|Bonnell|arch}}
 
{{intel title|Bonnell|arch}}
 
{{microarchitecture
 
{{microarchitecture
|atype=CPU
+
| atype           = CPU
|name=Bonnell
+
| name         = Bonnell
|designer=Intel
+
| designer     = Intel
|manufacturer=Intel
+
| manufacturer = Intel
|introduction=March 2, 2008
+
| introduction = 2008
|phase-out=2011
+
| phase-out     = 2011
|process=45 nm
+
| process       = 45 nm
|cores=1
+
| cores         = 1
|cores 2=2
+
| cores 2       = 2
|type=Superscalar
+
 
|oooe=No
+
| pipeline      = Yes
|speculative=Yes
+
| type         = Superscalar
|renaming=No
+
| OoOE          = No
|stages min=16
+
| speculative   = No
|stages max=19
+
| renaming     = No
|isa=x86-64
+
| isa          = IA-32
|extension=MOVBE
+
| isa 2        = x86-64
|extension 2=MMX
+
| stages min   = 16
|extension 3=SSE
+
| stages max   = 19
|extension 4=SSE2
+
| issues        = 2
|extension 5=SSE3
+
 
|extension 6=SSSE3
+
| inst          = Yes
|l1i=32 KiB
+
| feature      =  
|l1i per=Core
+
| extension     = MOVBE
|l1i desc=8-way set associative
+
| extension 2   = MMX
|l1d=24 KiB
+
| extension 3   = SSE
|l1d per=Core
+
| extension 4   = SSE2
|l1d desc=6-way set associative
+
| extension 5   = SSE3
|l2=512 KiB
+
| extension 6   = SSSE3
|l2 per=Core
+
 
|l2 desc=8-way set associative
+
| cache        = Yes
|core name=Silverthorne
+
| l1i           = 32 KiB
|core name 2=Diamondville
+
| l1i per       = Core
|core name 3=Lincroft
+
| l1i desc     = 8-way set associative
|core name 4=Pineview
+
| l1d           = 24 KiB
|core name 5=Tunnel Creek
+
| l1d per       = Core
|core name 6=Stellarton
+
| l1d desc     = 6-way set associative
|core name 7=Sodaville
+
| l2           = 512 KiB
|core name 8=Groveland
+
| l2 per       = Core
|successor=Saltwell
+
| l2 desc       = 8-way set associative
|successor link=intel/microarchitectures/saltwell
+
 
|pipeline=Yes
+
| core names      = Yes
|OoOE=No
+
| core name       = Silverthorne
|issues=2
+
| core name 2     = Diamondville
|inst=Yes
+
| core name 3     = Lincroft
|cache=Yes
+
| core name 4     = Pineview
|core names=Yes
+
| core name 5     = Tunnel Creek
|succession=Yes
+
| core name 6     = Stellarton
 +
| core name 7     = Sodaville
 +
| core name 8     = Groveland
 +
 
 +
| succession      = Yes
 +
| predecessor      =
 +
| successor       = Saltwell
 +
| successor link   = intel/microarchitectures/saltwell
 
}}
 
}}
'''Bonnell''' was a [[microarchitecture]] for [[Intel]]'s [[45 nm]] ultra-low voltage [[microprocessor]]s first introduced in 2008 for their then-new {{intel|Atom}} family. Bonnell, which was named after the highest point in [[wikipedia:Austin, Texas|Austin]] - [[wikipedia:Mount Bonnell|Mount Bonnell]], was Intel's first x86-compatible [[microarchitecture]] designed to target the ultra-low power market.
+
'''Bonnell''' was a [[microarchitecture]] for [[Intel]]'s [[45 nm]] ultra-low power [[microprocessor]]s first introduced in 2008 for their then-new {{intel|Atom}} family. Bonnell, which was named after the highest point in [[wikipedia:Austin, Texas|Austin]] - [[wikipedia:Mount Bonnell|Mount Bonnell]], was Intel's first x86-compatible [[microarchitecture]] designed to target the ultra-low power market.
  
Bonnell (then known as project Silverthorne) was designed by a then-new low-power design team that Intel created at its Texas Development Center in Austin in 2004, along with a new chipset ({{intel|Poulsbo|l=chipset}}) design team. The design team was led by Elinora Yoeli, who had previously led the Israeli team in the development of {{\\|Pentium M}}. Although Yoeli had previously worked in her native country, Bonnell was a US design and was unconnected to any of the projects at Intel's Israel Design Center in Haifa.
+
Bonnell (then known as project Silverthorne) was designed by a then-new low-power design team that Intel created at its Texas Development Center in Austin in 2004, along with a new chipset (Poulsbo) design team. The design team was led by Elinora Yoeli, who had previously led the Israeli team in the development of {{\\|Pentium M}}. Although Yoeli had previously worked in her native country, Bonnell was a US design and was unconnected to any of the projects at Intel's Israel Design Center in Haifa.
 
== Codenames ==
 
== Codenames ==
[[File:intel low-power roadmap (45-32-22).png|500px|right|thumb|[[45 nm]] - [[32 nm]] roadmap.]]
 
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Platform !! Chipset !! Core !! Target
+
! Chipset !! Platform !! PHC !! Core !! Target
 
|-
 
|-
| {{intel|Menlow|l=platform}} || {{intel|Poulsbo|l=chipset}} || {{intel|Silverthorne|l=core}} || MIDs
+
| {{intel|Poulsbo}} || {{intel|Menlow}} || || {{intel|Silverthorne}} || MIDs
 
|-
 
|-
| {{intel|Menlow|l=platform}} || {{intel|Poulsbo|l=chipset}} || {{intel|Diamondville|l=core}} || Nettops
+
| {{intel|Poulsbo}} || {{intel|Menlow}} || || {{intel|Diamondville}} || Nettops
 
|-
 
|-
| {{intel|Moorestown|l=platform}} || {{intel|Langwell}} || {{intel|Lincroft|l=core}} || MIDs
+
|  || {{intel|Moorestown}} || {{intel|Langwell}} || {{intel|Lincroft}} || MIDs
 
|-
 
|-
| {{intel|Pine Trail}} || {{intel|Tiger Point}} || {{intel|Pineview|l=core}} || Nettops
+
|  || {{intel|Pine Trail}} || {{intel|Tiger Point}} || {{intel|Pineview}} || Nettops
 
|-
 
|-
| {{intel|Queens Bay}} || {{intel|Topcliff}} || {{intel|Tunnel Creek|l=core}} || Embedded
+
| || {{intel|Queens Bay}} || {{intel|Topcliff}} || {{intel|Tunnel Creek}} || Embedded
 
|-
 
|-
| {{intel|Queens Bay}} || {{intel|Topcliff}} || {{intel|Stellarton|l=core}} || Embedded + [[Altera]] FPGA
+
| || {{intel|Queens Bay}} || {{intel|Topcliff}} || {{intel|Stellarton}} || Embedded + [[Altera]] FPGA
 
|-
 
|-
| || || {{intel|Sodaville|l=core}} || CE
+
|  || || || {{intel|Sodaville}} || CE
 
|-
 
|-
| || || {{intel|Groveland|l=core}} || CE
+
| || || || {{intel|Groveland}} || CE
|- style="text-decoration: line-through;"
 
| || || {{intel|Elk Rock|l=core}} || CE
 
 
|}
 
|}
 
<!--
 
should probably added at one point:
 
 
Moorestown - Langwell (Core), Briertown (power mng), and Evans Peak (wireless)
 
Tolapai SoC
 
-->
 
  
 
=== Generation successor ===
 
=== Generation successor ===
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! First Generation !! !! Second Generation !! !! Third Generation
+
! First Generation !! !! Second Generation
 
|-
 
|-
| {{intel|Silverthorne|l=core}} || → || {{intel|Lincroft|l=core}}
+
| {{intel|Silverthorne}} || → || {{intel|Lincroft}}
 
|-
 
|-
| {{intel|Diamondville|l=core}} || → || {{intel|Pineview|l=core}}
+
| {{intel|Diamondville}} || → || {{intel|Pineview}}
 
|-
 
|-
| || || {{intel|Tunnel Creek|l=core}}
+
| || || {{intel|Tunnel Creek}}
 
|-
 
|-
| || || {{intel|Stellarton|l=core}}
+
| || || {{intel|Stellarton}}
|-
 
| || || {{intel|Sodaville|l=core}} || → || {{intel|Groveland|l=core}}
 
|}
 
 
 
== Brands ==
 
Intel sold Bonnell-based processors under the '''{{intel|Atom}}''' brand. Additionally, manufacturers were allowed to use the '''Centrino Atom''' brand if the system consisted of a Bonnell-based processor, the chipset, and wireless capabilities ([[WiFi]], [[3G]], [[WiMAX]]), was battery powered, and had a screen size of up to 6".
 
 
 
{| class="wikitable"
 
 
|-
 
|-
| [[File:intel centrino atom logo.png|100px]] ||
+
| || || {{intel|Sodaville}}
* {{intel|Silverthorne|l=core}}
 
* {{intel|Poulsbo|l=chipset}}  
 
* [[WiFi]]/[[3G]]/[[WiMAX]]
 
* 6" or less Display
 
 
|-
 
|-
| [[File:intel atom logo (2008-2009).png|100px]] ||
+
| || || {{intel|Groveland}}
* {{intel|Silverthorne|l=core}}
 
 
|}
 
|}
 
 
== Release Dates ==
 
== Release Dates ==
The {{intel|Atom}} family was officially announced on March 2, 2008 under the '''Intel Atom''' and '''Intel Centrino Atom''' brands. Bonnell was first introduced on April 2nd [[2008]] during the Intel Developer Forum in Shanghai.
+
Bonnell was first announced on April 2nd [[2008]] during the Intel Developer Forum in Shanghai.
  
 
== Process Technology ==
 
== Process Technology ==
Line 149: Line 132:
 
|}
 
|}
 
{{clear}}
 
{{clear}}
 
== Compatibility ==
 
 
{| class="wikitable"
 
! Vendor !! OS  !! Version !! Notes
 
|-
 
| rowspan="4" | [[Microsoft]] || rowspan="4" | Windows || style="background-color: #d6ffd8;" | Windows XP Embedded SP2 || Support
 
|-
 
| style="background-color: #d6ffd8;" | Windows Embedded CE 6.0 || Support
 
|-
 
| style="background-color: #d6ffd8;" | Windows 7 || Support
 
|-
 
| style="background-color: #d6ffd8;" | Windows Embedded Standard 7 || Support
 
|-
 
| rowspan="2" | Linux || rowspan="2" | Linux || style="background-color: #d6ffd8;" | Kernel 2.4/2.6? || Initial Support
 
|-
 
| style="background-color: #d6ffd8;" | MeeGo 1 || Support
 
|}
 
  
 
== Compiler support ==
 
== Compiler support ==
Line 181: Line 146:
  
 
== Architecture ==
 
== Architecture ==
[[File:Menlo with Penny Comparison (cropped).jpg|right|450px|thumb|{{intel|Silverthorne|l=core}} processor next to a penny.]]
 
 
Bonnell features a brand new architecture not based on any previous Intel design. The architecture was specifically designed for ultra-mobile PCs (UMPCs), mobile internet devices (MID), and other embedded devices. Bonnell's primary goals were:
 
Bonnell features a brand new architecture not based on any previous Intel design. The architecture was specifically designed for ultra-mobile PCs (UMPCs), mobile internet devices (MID), and other embedded devices. Bonnell's primary goals were:
  
Line 189: Line 153:
  
 
Performance/Power new rule: +1% performance for at most +1% power consumption.
 
Performance/Power new rule: +1% performance for at most +1% power consumption.
 
In addition to full [[x86]] compatibility and the power requirements, Bonnell was also required to maintain 100% compatibility with Intel's {{intel|Core|l=arch}} architecture (specifically the then-new {{intel|Core 2 Duo}} processors).
 
  
 
=== Architecture ===
 
=== Architecture ===
Line 214: Line 176:
 
* 2 FP ALUs (1 adder, 1 for others)
 
* 2 FP ALUs (1 adder, 1 for others)
 
* No Integer multiplier & divider (shared with FP ALU instead)
 
* No Integer multiplier & divider (shared with FP ALU instead)
 
=== Block Diagram ===
 
[[File:bonnell block diagram.svg]]
 
  
 
=== Memory Hierarchy ===
 
=== Memory Hierarchy ===
Line 271: Line 230:
  
 
=== Overview ===
 
=== Overview ===
Bonnell's architecture shares very little in common with other Intel designs. To achieve the strict ultra-low power objectives, Bonnell features a very slimmed down design discarding many high-performance techniques used by Intel's high-performance architectures such as aggressive [[speculative execution]], [[out-of-order]] execution, and µop transformation.
+
Bonnell's architecture shares very little in common with other Intel designs. To achieve the strict ultra-low power objectives, Bonnell features a very slimmed own design discarding many high-performance techniques used by Intel's high-performance architectures such as aggressive [[speculative execution]], [[out-of-order]] execution, and µop transformation.
 
 
Part of the design requirement was that Bonnell retain full [[x86]] compatibility, up to the latest extension - at one tenth of the power consumption of the {{\\|Pentium M}}. This meant any software is now 100% compatible but it forced engineers to deal with all the baggage the architecture brought along. The decision to offer full compatibility brought its own set of benefits such as access to the largest software code base in the world, including the ability to run any other [[x86]] operating system unmodified. At the same time it forced the design team to resort to other means of reducing power.  
 
  
Up to Bonnell, all of Intel's existing architectures put very low priority on power efficiency (note that this has significantly changed since the introduction of {{\\|Sandy Bridge}}). High-performance, high-throughput, complex designs are simply inadequate for the kind of power goals required out of Bonnell, even if they were trimmed down. It was decided that Bonnell would be designed from the scratch with power goals in mind. For those reasons Bonnell resembles the {{\\|P5}} microarchitecture.
+
Part of the design requirement was that Bonnell retain full [[x86]] compatibility, up to the latest extension - at the 10th of the power consumption of the {{\\|Pentium M}}. This meant any software is now 100% compatible but it forced engineers to deal with all the baggage the architecture brought along. The decision to offer full compatibility brought its own set of benefits such as access to the largest software code base in the world, including the ability to run any other [[x86]] operating system unmodified. At the same time it forced the design team to resort to other means of reducing power.  
  
 +
Up to Bonnell, all of Intel's existing architectures put very low priority on power efficiency (note that this has significantly changed since the introduction of {{\\|Sandy Bridge}}). High-performance, high-throughput, complex designs are simply inadequate for the kind of power goals required out of Bonnell, even if they were trimmed down. It was decided that Bonnel would be designed from the scratch with power goals in mind. For those reasons Bonnell resembles the {{\\|P5}} microarchitecture.
 
=== Pipeline ===
 
=== Pipeline ===
 
Much like the original {{\\|P5}} microarchitecture, Bonnell consists of an [[in-order]] [[dual-issue]] pipeline. The pipeline is shown below. Note the pipeline is duplicated for dual-issue execution.  
 
Much like the original {{\\|P5}} microarchitecture, Bonnell consists of an [[in-order]] [[dual-issue]] pipeline. The pipeline is shown below. Note the pipeline is duplicated for dual-issue execution.  
Line 284: Line 242:
  
  
Unlike {{\\|P5}}, which only had 5 stages, Bonnell has 16 to 19 pipeline stages. The longer pipeline allows a more even spreading of heat across the chip with more units. This also allows a higher clock rate.  
+
Unlike {{\\|P5}}, which only had 5 stages, Bonnell has 16 to 19 stages pipeline. The longer pipeline allows a more evenly spreading of heat across the chip with more units. This also allows a higher clock rate.  
  
 
==== Front End ====
 
==== Front End ====
Bonnell's front end is very simple when compared to Intel's high-performance architectures. [[Out-of-order execution]] (OoOE) that is found ubiquitously in all HPC architectures was rejected. Bonnell's power and area constraints simply couldn't allow for the complex logic needed to support that capability. The [[Instruction Fetch]] consists of 3 stages, capable of going through up to 16 bytes per cycle. Like fetch, the [[Instruction Decode]] is also 3 stages, capable of decoding instructions with up to 3 prefixes each cycle (considerably longer for more complex instructions).
+
Bonnell's front end is very simple when compared to Intel's high-performance architectures. [[Out-of-order execution]] (OoOE) that is found ubiquitously in all HPC architectures was rejected. Bonnell's power and area constraints simply couldn't allow for the complex logic needed to support that capability. The [[Instruction Fetch]] consists of 3 stages capable going through up to 8 bytes per cycle (with a lower amount if SMT is enabled). Like fetch, the [[Instruction Decode]] is also 3 stages capable of decoding instructions with up to 3 prefixes each cycle (considerably longer for more complex instructions).
  
 
Bonnell is a departure from all modern x86 architectures with respect to decoding (including those developed by [[AMD]] and [[VIA]] and every Intel architecture since {{\\|P6}}). Whereas modern architectures transform complex [[x86]] instructions into a more easily digestible µop form, Bonnell does almost no such transformations. The pipeline is tailored to execute regular x86 instructions as single atomic operations consisting of a single destination register and up to three source-registers (typical load-operate-store format). Most instructions actually correspond very closely to the original x86 instructions. This design choice results in lower complexity at the cost of reduced performance. Bonnell has two identical decoders capable of decoding complex x86 instructions. The variable-length nature of x86 instructions introduces an additional layer of complexity. To assist the decoders, Bonnell implements predecoders that determine instruction boundaries and mark them using a single-bit marker. Two cycles are allocated for predecoding as well as L1 storage. Boundary marks are also stored in the L1, eliminating the need to perform redundant predecoding. Repeated operations are retrieved pre-marked, eliminating two cycles. Bonnell has a 36 KiB L1 instruction cache consisting of a 32 KiB instruction cache and a 4 KiB instruction boundary mark cache. All instructions (whether coming from the cache or from predecode) must undergo full decode. It's worthwhile noting that Intel states Bonnell is a 16-stage pipeline because for the most part, after a cache hit you'll have 16 stages. This is also true in some cases where the processor can simultaneously decode the next instruction. However, in the cases where you get a miss, it will cost 3 additional stages to catch up and locate the boundary for that instruction for a total of 19 stages.
 
Bonnell is a departure from all modern x86 architectures with respect to decoding (including those developed by [[AMD]] and [[VIA]] and every Intel architecture since {{\\|P6}}). Whereas modern architectures transform complex [[x86]] instructions into a more easily digestible µop form, Bonnell does almost no such transformations. The pipeline is tailored to execute regular x86 instructions as single atomic operations consisting of a single destination register and up to three source-registers (typical load-operate-store format). Most instructions actually correspond very closely to the original x86 instructions. This design choice results in lower complexity at the cost of reduced performance. Bonnell has two identical decoders capable of decoding complex x86 instructions. The variable-length nature of x86 instructions introduces an additional layer of complexity. To assist the decoders, Bonnell implements predecoders that determine instruction boundaries and mark them using a single-bit marker. Two cycles are allocated for predecoding as well as L1 storage. Boundary marks are also stored in the L1, eliminating the need to perform redundant predecoding. Repeated operations are retrieved pre-marked, eliminating two cycles. Bonnell has a 36 KiB L1 instruction cache consisting of a 32 KiB instruction cache and a 4 KiB instruction boundary mark cache. All instructions (whether coming from the cache or from predecode) must undergo full decode. It's worthwhile noting that Intel states Bonnell is a 16-stage pipeline because for the most part, after a cache hit you'll have 16 stages. This is also true in some cases where the processor can simultaneously decode the next instruction. However, in the cases where you get a miss, it will cost 3 additional stages to catch up and locate the boundary for that instruction for a total of 19 stages.
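
The cache-size and stage-count figures in the paragraph above can be checked with a little arithmetic. The following is only a sketch: it assumes one boundary-mark bit per instruction byte, which is consistent with the single-bit marker and the 32 KiB + 4 KiB split but is not stated explicitly, and every function and parameter name is made up for illustration.

```python
KIB = 1024

def boundary_mark_bytes(icache_bytes, mark_bits_per_byte=1):
    """Storage needed for boundary marks.

    Assumption (not stated in the text): one mark bit is kept for
    every byte of instruction cache.
    """
    return icache_bytes * mark_bits_per_byte // 8

def pipeline_stages(boundary_marks_cached, base_stages=16, predecode_penalty=3):
    """Effective pipeline depth as described in the text: 16 stages when the
    boundary marks are already cached, 3 more when they must be located."""
    if boundary_marks_cached:
        return base_stages
    return base_stages + predecode_penalty

icache = 32 * KIB
marks = boundary_mark_bytes(icache)
total_l1i_kib = (icache + marks) // KIB

print(marks // KIB, total_l1i_kib)                   # mark storage and total L1I in KiB
print(pipeline_stages(True), pipeline_stages(False)) # hit depth and miss depth
```

Under that one-bit-per-byte assumption the mark storage comes out to exactly one eighth of the instruction cache, which matches the 4 KiB quoted for the boundary mark cache.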
  
Some x86 instructions are simply too complex to handle directly. Those select few get diverted into the '''micro-code sequencer ROM''' ('''MSROM''') for decoding, producing much saner RISC-like instructions at the cost of 2 additional cycles. Intel estimates that only 5% of common software requires instructions to be split up. Only decoder0 can request transfer to use the MSROM. All instructions longer than 8 bytes or instructions having more than three prefixes will result in a MSROM transfer unconditionally. Those instructions will experience two cycles of delay. The inability to execute things [[out-of-order]] eliminates lots of optimization opportunities at this stage. One thing Bonnell can do is lockstep instructions that can be executed simultaneously, such as in the case of instructions that perform a memory access along with an arithmetic operation. In those instances Bonnell will issue the instruction as if it were two separate instructions executing simultaneously. In addition, only one [[x87]] instruction can be decoded per cycle.
+
Some x86 instructions are simply too complex to handle directly. Those select few get diverted into the microcode sequencer for decoding, producing much saner RISC-like instructions at the cost of 2 additional cycles. Intel estimates that only 5% of common software requires instructions to be split up. The inability to execute things [[out-of-order]] eliminates lots of optimization opportunities at this stage. One thing Bonnell can do is lockstep instructions that can be executed simultaneously, such as in the case of instructions that perform a memory access along with an arithmetic operation. In those instances Bonnell will issue the instruction as if it were two separate instructions executing simultaneously.
  
 
Because Bonnell has support for {{intel|Hyper-Threading}}, Intel's brand name for their own [[simultaneous multithreading]] technology, a number of modifications had to be done. The [[prefetch buffer]] and the [[instruction queue]] have been duplicated for each thread.
 
Because Bonnell has support for {{intel|Hyper-Threading}}, Intel's brand name for their own [[simultaneous multithreading]] technology, a number of modifications had to be done. The [[prefetch buffer]] and the [[instruction queue]] have been duplicated for each thread.
Line 303: Line 261:
  
 
===== FP/SIMD execution Cluster =====
 
===== FP/SIMD execution Cluster =====
{| class="wikitable" style="text-align: center; float: right;"
 
! colspan="2" | SIMD/FP Execution Cluster Ports
 
|-
 
! Port 0 !! Port 1
 
|-
 
| SIMD ALU<br>(128-bit / 64-bit int) || SIMD ALU<br>(128-bit)
 
|-
 
| Shuffle unit<br>(128-bit / 64-bit int) || FP Adder
 
|-
 
| SIMD/FP multiply unit<br>(128-bit / 64-bit int)
 
|-
 
| Divide unit (support IMUL, IDIV)
 
|}
 
 
In the further pursuit of power saving, specialized execution units were minimized as much as possible. Bonnell's [[floating point]] & SIMD execution cluster does most of the heavy lifting. It features a 128 bit [[SIMD]] [[integer]] path containing 2 SIMD [[ALU]]s and 1 [[shuffle unit]]. Bonnell's [[SIMD]] [[integer]] [[multiplier]] and [[floating point]] [[divider]] are also responsible for the scalar integer multiply and integer divide operations. Additionally the cluster includes a 64 bit FP & SIMD integer multipliers and a 128 bit FP adder.
 
In the further pursuit of power saving, specialized execution units were minimized as much as possible. Bonnell's [[floating point]] & SIMD execution cluster does most of the heavy lifting. It features a 128 bit [[SIMD]] [[integer]] path containing 2 SIMD [[ALU]]s and 1 [[shuffle unit]]. Bonnell's [[SIMD]] [[integer]] [[multiplier]] and [[floating point]] [[divider]] are also responsible for the scalar integer multiply and integer divide operations. Additionally the cluster includes a 64 bit FP & SIMD integer multipliers and a 128 bit FP adder.
  
Additionally, this cluster contains a '''Safe Instruction Recognition''' ('''SIR''') unit responsible for supporting out-of-order commits. The idea behind the SIR unit is fairly simple: when conditions are met (i.e., when there is no inter-dependency between instructions of varying latency), the two instructions execute simultaneously, allowing the shorter-latency instruction to execute and finish before a possibly longer-latency floating point operation ends. This reduces the needless stalls that plague traditional in-order pipelines.
+
Additionally, this cluster contains a '''Safe Instruction Recognition''' ('''SIR''') unit responsible for supporting out-of-order commits.
  
{{clear}}
 
 
===== Integer Execution Cluster =====
 
===== Integer Execution Cluster =====
{| class="wikitable" style="text-align: center; float: left;"
+
The integer execution cluster contains two [[ALU]]s, a [[shifter]], and a [[jump]] execution unit capable of performing single-cycle 64 bit [[integer]] operations.
! colspan="2" | Integer Execution Cluster Ports
 
|-
 
! Port 0 !! Port 1
 
|-
 
| Load/Store || Jump unit and LEA
 
|-
 
| ALU0 || ALU1
 
|-
 
| Shift/Rotate unit || Bit processing unit
 
|}
 
The integer execution cluster contains two [[ALU]]s, a [[shifter]], and a [[jump]] execution unit capable of performing single-cycle 64 bit [[integer]] operations. The Integer cluster has store-forwarding support allowing for a 0-cycle latency effective load-to-use.
 
  
{{clear}}
 
 
===== Memory Subsystem =====
 
===== Memory Subsystem =====
Bonnell has two [[address generation units]] (AGUs). For data, there is a 24 [[KiB]] [[write-back]] L1 cache with a 2-level DTLB hierarchy, a hardware page walker, and integer store-to-load forwarding support. Additionally, there is a rather large 512 KiB [[L2 cache]] with inline ECC and [[hardware pre-fetchers]]. The tag, [[Least Recently Used|LRU]], and state bits are all stored in a single array to minimize area. The tag and data consist of eight 4.5 KiB tag sub-arrays and thirty-two 17.5 KiB data sub-arrays made of 256 cells on the bit line and 136 cells on the write line.
+
Bonnell has two [[address generation units]] (AGUs). For data, there is a 24 [[KiB]] [[write-back]] L1 cache with a 2-level DTLB hierarchy, a hardware page walker, and integer store-to-load forwarding support. Additionally, there is a rather large 512 KiB [[L2 cache]] with inline ECC and [[hardware pre-fetchers]]. The tag, [[Least Recently Used|LRU]], and state bits are all stored in a single array to minimize area. The tag and data sub-arrays are 4.5 KiB and 17.5 KiB respectively, with 256 cells per column.
  
As a power-saving feature, the L2 cache can be configured down to 2-way dynamically (i.e. programmatically) for applications that do not require the full performance. Doing so reduces the power and downsizes the cache to 128 KiB. Additionally, for less demanding tasks, Bonnell power-gates unused ways. It's interesting to note that the design team placed the TAG blocks at the bottom of the DATA arrays, which in theory would allow the L2 to be expanded to 1 MiB should they want to.
+
As a power-saving feature, the L2 cache can be configured down to 2-way dynamically. Additionally, for less demanding tasks, Bonnell power-gates unused ways.
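
The 128 KiB figure quoted for the 2-way configuration follows directly from the 8-way, 512 KiB organization if every way holds an equal share of the cache. The sketch below only illustrates that arithmetic; the function name is invented here and it models nothing about how the hardware actually gates the ways.

```python
KIB = 1024

def l2_capacity_bytes(total_bytes, total_ways, active_ways):
    """Usable capacity when only `active_ways` of a set-associative cache
    remain powered, assuming all ways are the same size."""
    if not 1 <= active_ways <= total_ways:
        raise ValueError("active_ways must be between 1 and total_ways")
    return total_bytes // total_ways * active_ways

full = l2_capacity_bytes(512 * KIB, total_ways=8, active_ways=8)
reduced = l2_capacity_bytes(512 * KIB, total_ways=8, active_ways=2)
print(full // KIB, reduced // KIB)  # capacity in KiB with 8 ways and with 2 ways
```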
  
 
=== I/O Bus ===
 
=== I/O Bus ===
Line 346: Line 278:
 
Traditionally, Intel has been using A[[GTL]]+ transceivers (Advanced [[Gunning Transceiver Logic]]) for their [[front-side bus]] communication. With Bonnell (and the [[chipset]]) Intel also introduced a [[CMOS]] signaling logic mode. CMOS has the advantage of only drawing power during transitions. The switch to CMOS saves 200-500 mW at the cost of worse latency and a slower bus, which ranges from 400 to 533 MHz. Because Bonnell's intended applications are not heavy processing machines, the lower bus speed was likely a worthy compromise.
 
Traditionally, Intel has been using A[[GTL]]+ transceivers (Advanced [[Gunning Transceiver Logic]]) for their [[front-side bus]] communication. With Bonnell (and the [[chipset]]) Intel also introduced a [[CMOS]] signaling logic mode. CMOS has the advantage of only drawing power during transitions. The switch to CMOS saves 200-500 mW at the cost of worse latency and a slower bus, which ranges from 400 to 533 MHz. Because Bonnell's intended applications are not heavy processing machines, the lower bus speed was likely a worthy compromise.
  
Bonnell implements both modes, so designers who prefer the faster [[front-side bus|bus]] can opt for the traditional AGTL+ transceivers while those who seek low power can opt for the CMOS implementation. Intel offers both types by simply fusing the appropriate circuitry. This is done by reprogramming the NFET control pull-down and the PFET control accordingly, activating or deactivating the resistor and switching the voltage.
+
Bonnell implements both modes, so designers who prefer the faster [[front-side bus|bus]] can opt for the traditional AGTL+ transceivers while those who seek low power can opt for the CMOS implementation. Intel offers both types by simply fusing the appropriate circuitry. This is done by reprogramming the NFET control pull-down and the PFET control accordingly.
 
[[File:bonnell split power planes.png|left|400px]]
 
[[File:bonnell split power planes.png|left|400px]]
 
Note that the design team designed the power rails using two power planes. To further save power during deep sleep, only 21 pins are kept alive; the other 182 I/O pins, which are not necessary in that state, are powered off, reducing the average power by another 10%.
 
Note that the design team designed the power rails using two power planes. To further save power during deep sleep, only 21 pins are kept alive; the other 182 I/O pins, which are not necessary in that state, are powered off, reducing the average power by another 10%.
Line 370: Line 302:
  
 
Intel estimates C-6 residency to be between 80% and 90% resulting in an average power in the order of 220 mW. Likewise Idle power, which is dominated by leakage power of the functional units, is below 80 mW.
 
Intel estimates C-6 residency to be between 80% and 90% resulting in an average power in the order of 220 mW. Likewise Idle power, which is dominated by leakage power of the functional units, is below 80 mW.
 
{{clear}}
 
 
=== Modularity ===
 
Bonnell is a highly modular architecture with almost all features capable of being disabled via built-in fuses, allowing for many [[binning]] variations. Both virtualization support (VT-x/d) and {{intel|Hyper-Threading}} may be disabled to cut power. Bonnell implements both AGTL+ and CMOS transceiver logic for the [[front-side bus]] signaling, with either one capable of being fused off. CMOS signaling allows for lower power but cannot reach the high bus speeds that AGTL+ can. This may or may not be a restriction that system designers face.
 
 
== Second Generation Enhancements ==
 
[[File:lincroft goals.png|left|200px]]
 
[[File:bonnell system board size goals.png|right|300px]]
 
With the introduction of {{intel|Lincroft|l=core}}, Intel made substantial improvements to the overall platform. The {{intel|Silverthorne|l=core}}-based systems had a great core in terms of power and performance, but they were dragged down when combined with a far less efficient chipset and system design. These deficiencies were addressed in the second generation of Bonnell-based models.
 
 
The first variant was {{intel|Lincroft|l=core}}, which set out to reduce the original system standby power of 1.6 W down to 32 mW (a 50x reduction) while reducing the overall board size by 2x. To achieve those goals Intel turned to higher integration, moving [[integrated graphics|Graphics]], CPU core, Video Acceleration, [[Display Controller]], and [[Memory Controller]] all into a single [[system on a chip]]. Those components were previously incorporated in the [[130 nm process]] chipset. This leaves the {{intel|Langwell|l=chipset}} chipset with just the low-power [[southbridge]] functionalities. The new chipset is also manufactured on a considerably better [[65 nm process]].
 
 
=== Performance Features ===
 
To address the higher performance goals, Intel introduced a number of new features into Lincroft including '''Bus Turbo Mode''' and '''Burst Mode'''.
 
 
==== Clock Domains ====
 
Each of {{intel|Lincroft|l=core}}'s multimedia engines is assigned a specific clock ratio. Using a farm of clock dividers and clock selectors, the appropriate clocks are generated for the individual multimedia engines. The complex clocking architecture implemented in Lincroft was designed to allow greater flexibility and a wider range of devices. This is done by simply tweaking the appropriate ratio for each engine based on the desired performance and power goals.
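
The ratio-based scheme described above can be illustrated with a minimal sketch. The source frequency, the engine names, and the ratios below are hypothetical values chosen for illustration only; they are not Lincroft's actual clock settings.

```python
from fractions import Fraction

def derive_clocks(source_mhz, ratios):
    """Return each engine's clock (MHz) as source clock * (numerator / denominator)."""
    return {
        engine: float(source_mhz * Fraction(num, den))
        for engine, (num, den) in ratios.items()
    }

# Hypothetical source clock and per-engine ratios (assumptions, not from Intel).
SOURCE_MHZ = 1600
RATIOS = {
    "graphics": (1, 4),
    "video_decode": (1, 8),
    "display": (1, 8),
}

clocks = derive_clocks(SOURCE_MHZ, RATIOS)
for engine, mhz in clocks.items():
    print(f"{engine}: {mhz} MHz")
```

Retargeting the same design to a different device class would then amount to changing the entries of the ratio table rather than the clock source itself.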
 
 
[[File:lincroft clock domains.png|600px]]
 
 
==== Bus Turbo Mode & Burst Mode ====
 
Intel also introduced '''Bus Turbo Mode''', a feature designed to reduce memory latency by dynamically increasing bus frequencies in sync with CPU bursts. At pre-defined CPU frequencies, the bus is dynamically overclocked to reduce the bottlenecking that might otherwise occur. This is implemented directly in hardware using the [[clock dividers]] (see [[#Clock Domains|§ Clock Domains]]) without the need to re-clock the PLLs.
 
 
Another feature that was introduced was '''Burst Mode''', the ability for the CPU to opportunistically take advantage of the thermal headroom on the T<sub>junction</sub> and T<sub>skin</sub> by temporarily increasing the CPU frequency. Upon violation of T<sub>junction</sub>/T<sub>skin</sub>, the system throttles back down to recovery points (LFM [[c-state]]).
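
The Burst Mode behavior described above can be sketched as a simple frequency-selection policy. This is only an illustration under stated assumptions: the class name, the frequency points, and the temperature limits are hypothetical, and the real mechanism is implemented by the platform rather than by code like this.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BurstPolicy:
    # All values are hypothetical placeholders, not Intel specifications.
    lfm_mhz: int          # low frequency mode, used as the recovery point
    hfm_mhz: int          # regular high frequency mode
    burst_mhz: int        # opportunistic burst frequency
    tj_limit_c: float     # junction temperature limit
    tskin_limit_c: float  # device skin temperature limit

    def select(self, busy, tj_c, tskin_c):
        """Pick a CPU frequency from demand and the two temperature readings."""
        # Any thermal violation forces a fall back to the recovery point.
        if tj_c >= self.tj_limit_c or tskin_c >= self.tskin_limit_c:
            return self.lfm_mhz
        # With thermal headroom available, burst only when there is demand.
        return self.burst_mhz if busy else self.hfm_mhz

policy = BurstPolicy(lfm_mhz=600, hfm_mhz=1200, burst_mhz=1500,
                     tj_limit_c=90.0, tskin_limit_c=45.0)
print(policy.select(busy=True, tj_c=70.0, tskin_c=40.0))
print(policy.select(busy=True, tj_c=95.0, tskin_c=40.0))
```

Both limits are checked independently because a device can exceed its skin temperature budget while the junction is still within limits, and vice versa.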
 
 
=== Low-power features ===
 
In order to further reduce power Intel introduced a number of new features:
 
 
* Low power architecture features
 
** [[MIPI-DSI]]
 
** [[LP-DDR1]]
 
** Integrated Hardware accelerators for Video Encode/Decode
 
* Enhanced Geyserville for ULFM
 
* Extended CPU Power [[C-States]]
 
* Distributed [[Power Gating]]
 
 
==== Enhanced Geyserville (eGVL) ====
 
[[File:lincroft new egvl mode.png|right|200px]]
 
'''Enhanced Geyserville''' is a new mode that allows the CPU to run below LFM at V<sub>min</sub>. This enables a linear saving of average power during instances where the CPU is idle while in the C0 [[C-State]]: dynamic power scales as CV²F, and because V = V<sub>min</sub> the entire time, only F changes while leakage stays mostly constant. Likewise, the bus frequency is also down-clocked at predefined frequencies (see [[#Bus Turbo Mode & Burst Mode|§ Bus Turbo Mode]]). The additional ultra-low-power mode is exposed as a [[P-State]] to the [[operating system]].
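
The linear saving follows directly from the CV²F relation when the voltage is pinned at V<sub>min</sub>. The sketch below works through it with hypothetical numbers; the capacitance, voltage, and both frequencies are assumptions picked for illustration, not measured Lincroft values.

```python
def dynamic_power_w(c_eff_farads, volts, freq_hz):
    """Classic switching-power estimate: P = C * V^2 * F."""
    return c_eff_farads * volts ** 2 * freq_hz

# Hypothetical operating points (assumptions, not Intel data).
C_EFF = 1e-9      # effective switched capacitance in farads
V_MIN = 0.75      # minimum operating voltage in volts
LFM_HZ = 600e6    # low frequency mode
ULFM_HZ = 200e6   # ultra-low frequency mode, below LFM at the same voltage

p_lfm = dynamic_power_w(C_EFF, V_MIN, LFM_HZ)
p_ulfm = dynamic_power_w(C_EFF, V_MIN, ULFM_HZ)

# With C and V held constant, the power ratio equals the frequency ratio.
ratio = p_ulfm / p_lfm
print(f"dynamic power at ULFM is {ratio:.3f} of LFM")
```

Because the voltage cannot drop any further, the quadratic voltage term gives no additional benefit below LFM, which is why the saving is only linear in frequency.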
 
 
Below is the C-State chart with the additional Ultra-low LFM state added, enabling further decrease in average power consumption.
 
 
[[File:lincroft extended c-states.png|400px]]
 
 
==== Extensive power-gating ====
 
Lincroft introduces an extensive system of power-gating. The entire SoC is divided up into multiple physical power islands. Each island can be individually controlled through a distributed power-gating system. Lincroft allows fine-grained power management through both hardware and software, so that areas of the chip that are not being actively utilized can be disabled.
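
As a rough illustration of per-island control, the sketch below models the powered islands as a bit set and computes which ones to gate off and which to wake for a given workload. The island names and the function are hypothetical; the actual island partitioning and control interface are not described here.

```python
from enum import IntFlag

class Island(IntFlag):
    # Hypothetical island names, chosen for illustration only.
    CPU = 1
    GRAPHICS = 2
    VIDEO_DECODE = 4
    VIDEO_ENCODE = 8
    DISPLAY = 16

def plan_power(powered, in_use):
    """Return (to_gate, to_wake, new_powered) for the requested set of islands."""
    to_gate = powered & ~in_use   # powered but idle: candidates for gating
    to_wake = in_use & ~powered   # needed but currently gated
    new_powered = (powered & ~to_gate) | to_wake
    return to_gate, to_wake, new_powered

powered = Island.CPU | Island.GRAPHICS | Island.DISPLAY
in_use = Island.CPU | Island.VIDEO_DECODE

to_gate, to_wake, new_powered = plan_power(powered, in_use)
print("gate:", to_gate)
print("wake:", to_wake)
```

Tracking each island as an independent bit mirrors the idea that any island can be switched without affecting the others.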
 
 
 
<div style="display: inline-block;">
 
<div style="float: left; margin: 10px;">[[File:lincroft all off.png|300px]]</div>
 
<div style="float: left; margin: 10px;">[[File:lincroft all on.png|300px]]</div>
 
</div>
 
  
 
== Die ==
 
== Die ==
 
=== {{intel|Silverthorne|l=core}} ===
 
 
* [[45 nm process]]
 
* [[45 nm process]]
 
* 9 metal layers
 
* 9 metal layers
 
* 47,212,207 transistors
 
* 47,212,207 transistors
 
* 3.1 mm x 7.8 mm
 
* 3.1 mm x 7.8 mm
* 24.18 mm² die size
+
* 24.2 mm² die size
* packaged in a Halide-Free 441 ball, 14 mm x 13 mm µFCBGA
+
* packaged in a Halide-Free 441 ball, 14 mm x 13 mm² µFCBGA
  
 
[[File:Silverthorne die shot.jpg|1100px]]
 
[[File:Silverthorne die shot.jpg|1100px]]
  
[[File:Silverthorne die shot 2.jpg|1100px]]
 
  
 
[[File:Silverthorne die shot (marked).png|1100px]]
 
[[File:Silverthorne die shot (marked).png|1100px]]
Line 449: Line 325:
 
* '''FSB''' - Front Side Bus
 
* '''FSB''' - Front Side Bus
  
==== Physical layout ====
+
=== Physical layout ===
 
[[File:bonnell die size areas.svg|right|500px]][[File:bonnell die size areas 2.svg|right|500px]]
 
[[File:bonnell die size areas.svg|right|500px]][[File:bonnell die size areas 2.svg|right|500px]]
The Atom design team was considerably smaller than Intel's typical design teams, which forced them to work in a slightly different way. The design team used a methodology they described as a "sea of Functional Unit Blocks" (FUBs) whereby all cluster hierarchies (including unit-level hierarchies) are flattened at the chip level. This development methodology allowed for faster iteration. The various FUB designs were divided among the team members, allowing them to handle the design in a more manageable way. All in all, Bonnell's physical database consisted of 205 unique FUBs interlinked via 41,000 FUB-to-FUB interconnects. Bonnell is manufactured on [[Intel]]'s [[45 nm process]]. 91% of the FUBs use pre-characterized [[standard cells]] (45% structured data-path and 46% fully synthesized random logic blocks) with only the remaining 9% being [[full-custom]] blocks. The unusually high utilization of standard cells (at least for Intel) is likely due to the limited resources given to the Bonnell design team.
+
The Atom design team was considerably smaller than Intel's typical design teams, which forced them to work in a slightly different way. The design team used a methodology they described as a "sea of Functional Unit Blocks" (FUBs) whereby all cluster hierarchies (including unit-level hierarchies) are flattened at the chip level. This development methodology allowed for faster iteration. The various FUB designs were divided among the team members, allowing them to handle the design in a more manageable way. All in all, Bonnell's physical database consisted of 205 unique FUBs interlinked via 41,000 FUB-to-FUB interconnects. Bonnell is manufactured on [[Intel]]'s [[45 nm process]]. 91% of the FUBs use pre-characterized [[standard cells]] (45% structured data-path and 46% fully synthesized random logic blocks) with only the remaining 9% being [[full-custom]] blocks.
  
 
{| class="wikitable sortable"
 
{| class="wikitable sortable"
Line 484: Line 360:
  
 
{{clear}}
 
{{clear}}
 
=== {{intel|Lincroft|l=core}} ===
 
* [[45 nm process]]
 
 
==== Moorestown Platform ====
 
* [[45 nm process]]
 
* 140,000,000 transistors
 
* Die size 7.34 mm × 8.89 mm
 
* Die area 65.2526 mm²
 
[[File:lincroft die shot.png|750px]]
 
 
[[File:lincroft die shot (annotated).png|750px]]
 
 
 
[[File:lincroft die shot 2.png|750px]]
 
 
[[File:lincroft die shot 2 (annotated).png|750px]]
 
 
==== Oak Trail Platform ====
 
 
[[File:lincroft oak trail die shot.png|750px]]
 
  
 
== Cores ==
 
== Cores ==
Bonnell has lived through a number of iterations unlike the mainstream variants which followed a {{intel|tick-tock|far more ambitious development cycle}}. Products based on Bonnell can more or less be split into two generations:
+
=== First Generation===
 
+
First generation of Bonnell-based microprocessors introduced 2 cores: '''{{intel|Silverthorne}}''' for ultra-mobile PCs and mobile Internet devices (MIDs) and '''{{intel|Diamondville}}''' for ultra cheap notebooks and desktops.
* '''First Generation''' - initial Bonnell processor models. Those relied on a number of external chipset chips for the I/O, graphics, and various other system features.
+
==== Silverthorne ====
* '''Second Generation''' - considerably higher integration was introduced. The original CPU was now incorporated along with many of its peripherals on a single chip to create a [[System on a Chip]].
+
{{main|intel/silverthorne|l1=Silverthorne}}
 
+
'''Silverthorne''' was the codename for a series of Mobile Internet Devices (MIDs) introduced in 2008. These processors had 1 core and 2 threads with a FSB operating at 400 MHz-533 MHz.
=== First generation ===
+
==== Diamondville ====
First generation of Bonnell-based microprocessors introduced 2 cores: '''{{intel|Silverthorne|l=core}}''' for ultra-mobile PCs and mobile Internet devices (MIDs) and '''{{intel|Diamondville}}''' for ultra cheap notebooks and desktops.  
+
{{main|intel/diamondville|l1=Diamondville}}
 
+
'''Diamondville''' was the codename for the series of ultra cheap notebooks and desktops introduced in 2008. Diamondville is very much a soldered-on-motherboard derivative of {{intel|Silverthorne}} with faster FSB (operating at 533 MHz - 667 MHz). The dual-core version is an MCM (Multi Chip Module) Silverthorne variant.
* '''{{intel|Silverthorne|l=core}}''' was the codename for a series of Mobile Internet Devices (MIDs) introduced in 2008. These processors had 1 core and 2 threads with a FSB operating at 400 MHz-533 MHz. Those models were branded as {{intel|Atom}} MIDs and went along with the {{intel|poulsbo|l=chipset}} chipset.
 
* '''{{intel|Diamondville|l=core}}''' was the codename for the series of ultra cheap notebooks and desktops introduced in 2008. Diamondville is very much a soldered-on-motherboard derivative of {{intel|Silverthorne|l=core}} with faster FSB (operating at 533 MHz - 667 MHz). The dual-core version is a [[Multi Chip Module]] (MCM) Silverthorne variant operating on the same [[FSB]].
 
 
 
 
=== Second Generation ===
 
=== Second Generation ===
 
First generation of Bonnell-based microprocessors while being low power had to work with the older [[90 nm process]] {{intel|945GSE}} chipset and {{intel|82801GBM}} I/O controller with a TDP of almost 9.5 watts - almost 4 times that of the processor itself. Second generation Bonnell-based microprocessors aimed to address this issue by integrating a memory controller and GPU on-chip. This drastically reduced power consumption and cost.
 
First generation of Bonnell-based microprocessors while being low power had to work with the older [[90 nm process]] {{intel|945GSE}} chipset and {{intel|82801GBM}} I/O controller with a TDP of almost 9.5 watts - almost 4 times that of the processor itself. Second generation Bonnell-based microprocessors aimed to address this issue by integrating a memory controller and GPU on-chip. This drastically reduced power consumption and cost.
 
+
==== Lincroft ====
* '''{{intel|Lincroft|l=core}}''' is the codename for Bonnell-based Silverthorne's successor. Lincroft integrates on-die the graphics and memory controller. Lincroft effectively replaces the original Silverthorne offering 2x reduction in average circuit board size and up to 50x standby power reduction vs Menlow equivalent. Lincroft also introduces a 2x reduction in the overall active power consumption of the system.
+
{{main|intel/lincroft|l1=Lincroft}}
 
+
'''Lincroft''' is the codename for Bonnell-based Silverthorne's successor. Lincroft integrates on-die the graphics and memory controller.
 
==== Pineview ====
 
==== Pineview ====
 
{{main|intel/pineview|l1=Pineview}}
 
{{main|intel/pineview|l1=Pineview}}
Line 548: Line 400:
 
           Missing a chip? please dump its name here: http://en.wikichip.org/wiki/WikiChip:wanted_chips
 
           Missing a chip? please dump its name here: http://en.wikichip.org/wiki/WikiChip:wanted_chips
 
-->
 
-->
{{comp table start}}
+
<table class="wikitable sortable">
<table class="comptable sortable tc13 tc14 tc15 tc16 tc17 tc18 tc19 tc20 tc21 tc22">
+
<tr><th colspan="11" style="background:#D6D6FF;">Bonnell Chips</th></tr>
<tr class="comptable-header"><th>&nbsp;</th><th colspan="20">List of Bonnell-based Processors</th></tr>
+
<tr><th colspan="8">CPU</th><th colspan="3">IGP</th></tr>
<tr class="comptable-header"><th>&nbsp;</th><th colspan="9">Main processor</th><th colspan="2">Bus</th><th colspan="2">[[IGP]]</th><th colspan="4">Features</th></tr>
+
<tr><th>Model</th><th>µarch</th><th>Platform</th><th>Core</th><th>Launched</th><th>SDP</th><th>Freq</th><th>Max Mem</th><th>Name</th><th>Freq</th><th>Max Freq</th></tr>
{{comp table header 1|cols=Price, Core, Launched, C, T, Freq, Burst, TDP, SDP, Speed, Rate, Name, Frequency, Package, {{intel|Hyper-Threading|HT}}, VT-x, {{intel|EIST}}}}
+
{{#ask: [[Category:microprocessor models by intel]] [[microarchitecture::Bonnell]]
{{#ask: [[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Bonnell]]
 
 
  |?full page name
 
  |?full page name
 
  |?model number
 
  |?model number
  |?release price
+
  |?microarchitecture
 +
|?platform
 
  |?core name
 
  |?core name
 
  |?first launched
 
  |?first launched
  |?core count
+
  |?sdp
|?thread count
+
  |?base frequency
  |?base frequency#GHz
+
  |?max memory
|?turbo frequency (1 core)#GHz
 
|?tdp#mW
 
|?sdp#mW
 
|?bus speed
 
  |?bus rate
 
 
  |?integrated gpu
 
  |?integrated gpu
 
  |?integrated gpu base frequency
 
  |?integrated gpu base frequency
  |?package
+
  |?integrated gpu max frequency
|?has simultaneous multithreading
 
|?has intel vt-x technology
 
|?has intel enhanced speedstep technology
 
 
  |format=template
 
  |format=template
  |template=proc table 3
+
  |template=proc table 2
  |userparam=19:17
+
  |userparam=12
 
  |mainlabel=-
 
  |mainlabel=-
 
}}
 
}}
{{comp table count|ask=[[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Bonnell]]}}
 
 
</table>
 
</table>
{{comp table end}}
 
 
== Documents ==
 
* [[:File:Menlow Platform.pdf|Menlow Platform]] presentation
 
* [[:File:nettops 2008 platform.pdf|Nettops 2008 platform]]
 
 
== Artwork ==
 
<gallery>
 
File:Menlo with Penny Comparison.jpg|Menlow Platform with penny comparison
 
File:Silverthorne package.jpg
 
File:Silverthorne die on Tukwila wafer.jpg|Silverthorne die on a {{intel|Tukwila}} (Itanium) wafer comparison
 
File:Silverthorne glamour shot1.jpg|Silverthorne dice on a whole wafer
 
File:Silverthorne glamour shot2.jpg|Silverthorne
 
File:Silverthorne glamour shot3.jpg|Silverthorne
 
File:Silverthorne wafer shot.jpg|Silverthorne
 
File:Silverthorne wafer shot1.jpg|Silverthorne
 
File:Silverthorne wafer shot2.jpg|Silverthorne
 
File:Silverthorne wafer shot3.jpg|Silverthorne
 
File:Silverthorne wafer shot with needle.jpg|Silverthorne
 
</gallery>
 
  
 
== References ==
 
== References ==
Line 607: Line 430:
 
* Taufique, Mohammed H., et al. "A 512-KB level-2 cache design in 45-nm for low power IA processor silverthorne." Custom Integrated Circuits Conference, 2008. CICC 2008. IEEE. IEEE, 2008.
 
* Taufique, Mohammed H., et al. "A 512-KB level-2 cache design in 45-nm for low power IA processor silverthorne." Custom Integrated Circuits Conference, 2008. CICC 2008. IEEE. IEEE, 2008.
 
* Wang, Perry H., et al. "Intel® atom™ processor core made FPGA-synthesizable." Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays. ACM, 2009.
 
* Wang, Perry H., et al. "Intel® atom™ processor core made FPGA-synthesizable." Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays. ACM, 2009.
* Corporation, Intel. "Intel 64 and IA-32 architectures optimization reference manual." (2009).
 
* Beavers, Brad. "The story behind the Intel Atom processor success." IEEE Design & Test of Computers 26.2 (2009).
 
 
== See also ==
 
* Marvell's {{marvell|Sheeva PJ1|l=arch}}
 
* ARM's {{arm|ARM11|l=arch}}
 
