|stages max=19
|decode=4-way
|isa=x86-16
|isa 2=x86-32
|isa 3=x86-64
|predecessor=Westmere
|predecessor link=intel/microarchitectures/westmere
|successor=Ivy Bridge
|successor link=intel/microarchitectures/ivy bridge
|contemporary link=intel/microarchitectures/sandy bridge (server)
}}
'''Sandy Bridge''' ('''SNB''') '''Client Configuration''', formerly '''Gesher''', is [[Intel]]'s successor to {{\\|Nehalem}}, a [[32 nm process]] [[microarchitecture]] for mainstream workstations, desktops, and mobile devices. Sandy Bridge is the "Tock" phase of Intel's {{intel|Tick-Tock}} model and added a significant number of enhancements and features. The microarchitecture was developed by Intel's R&D center in [[wikipedia:Haifa, Israel|Haifa, Israel]].
For desktop and mobile, Sandy Bridge is branded as 2nd Generation Intel {{intel|Core i3}}, {{intel|Core i5}}, and {{intel|Core i7}} processors. For workstations it's branded as first generation {{intel|Xeon E3}}.
== Codenames ==
{{empty section}}
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
== Process Technology ==
[[File:sandy bridge wafer.jpg|right|thumb|300px|Sandy Bridge Wafer]]
{{main|intel/microarchitectures/westmere#Process_Technology|l1=Westmere § Process Technology}}
Sandy Bridge uses the same [[32 nm process]] as the Westmere microarchitecture for all mainstream consumer parts.
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
== Architecture ==
Sandy Bridge features an entirely new architecture with a brand new core design which is both more performant and more power efficient. The front-end has been entirely rearchitected to incorporate a new decoded pipeline using a new [[µOP cache]]. The back-end is an entirely new PRF-based renaming architecture with a considerably larger parallelism window. Sandy Bridge also provides considerably higher integration versus its {{intel|microarchitectures|predecessors}}, resulting in a full [[system on a chip]] design.
=== Key changes from {{\\|Westmere}} ===
[[File:sandy bridge buffer window.png|right|350px]]
* New last level cache architecture
** Multi-bank LLC/Agent architecture
* New {{intel|System Agent}} architecture
* Chipset
** {{intel|Ibex Peak|l=chipset}} → {{intel|Cougar Point}}
**** New Zeroing Idioms optimizations
**** New Ones Idioms optimizations
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
** Memory Subsystem
*** L1I$ changed to 8-way (from 4-way)
− | |||
− | |||
− | |||
− | |||
− | |||
* Integrated Graphics
** Integrated graphics now resides on the same die (previously on a second die)
** Dropped {{intel|QPI}} controller which linked the two dies
* Integrated Memory Controller (IMC)
** The memory controller is now integrated on the same die (previously on a second die)
** Dropped {{intel|QPI}} controller which linked the two dies
{{expand section}}
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
=== Block Diagram ===
==== Individual Core ====
[[File:sandy bridge block diagram.svg|900px]]
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
== Overview ==
=== Cache Architecture ===
As part of the entire system overhaul, the cache architecture has been streamlined and made more scalable. Sandy Bridge features a high-bandwidth [[last level cache]] which is shared by all the [[physical core|cores]] as well as the [[integrated graphics]] and the [[system agent]]. The LLC is an inclusive multi-bank cache architecture that is tightly associated with the individual cores. Each core is paired with a "slice" of LLC which is 2 [[MiB]] in size (a lower amount for lower-end models). This pairing of cores and cache slices scales with the number of cores, which provides a significant performance boost while saving power and bandwidth. Partitioning the data also helps simplify coherency as well as reduce localized contention and hot spots.
− | |||
− | |||
The last level cache is an inclusive cache with a 64 byte cache line organized as 16-way set associative. Each LLC slice is accessible to all cores. With up to 2 MiB per slice per core, a four-core model will sport a total of 8 MiB. Lower-end/budget models feature a smaller cache slice. This is done by disabling ways of cache in 4-way increments (for a granularity of 512 KiB). The LLC load-to-use latency in Sandy Bridge has been greatly improved, from 35-40+ cycles in {{\\|Nehalem}} to 26-31 cycles (depending on ring hops).
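
The way-disabling granularity quoted above follows directly from the slice parameters. The standalone sketch below (not Intel code, just the arithmetic from this section) derives the 512 KiB step size and the aggregate capacities.

<syntaxhighlight lang="c">
#include <stdio.h>

int main(void) {
    const int line_size   = 64;                 /* bytes per cache line   */
    const int ways        = 16;                 /* LLC associativity      */
    const int slice_bytes = 2 * 1024 * 1024;    /* 2 MiB per slice        */

    int sets      = slice_bytes / (ways * line_size);   /* 2048 sets      */
    int way_bytes = slice_bytes / ways;                  /* 128 KiB/way    */
    int step      = 4 * way_bytes;                       /* ways disabled
                                                            four at a time */

    printf("sets per slice      : %d\n", sets);
    printf("one way             : %d KiB\n", way_bytes / 1024);
    printf("4-way disable step  : %d KiB\n", step / 1024);
    printf("full LLC on 4 cores : %d MiB\n", 4 * (slice_bytes / (1024 * 1024)));
    return 0;
}
</syntaxhighlight>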
Within each cache slice is a cache box. The cache box is the controller and agent serving as the interface between the LLC and the rest of the system (i.e., cores, graphics, and system agent). The cache box implements the ring logic and the address hashing and arbitration. It is also tasked with communicating with the {{intel|System Agent}} on cache misses, non-cacheable accesses (such as in the case of I/O pulls), as well as external I/O [[snoop]] requests. The cache box is fully pipelined and is capable of working on multiple requests at the same time. Agent requests (e.g., ones from the GPU) are handled by the individual cache boxes via the ring. This is much different from how it was previously done in {{\\|Westmere}} where a single unified cache handled everything. The distribution of slices allows Sandy Bridge to have higher associativity and bandwidth while reducing traffic.
The entire physical address space is mapped distributively across all the slices using a [[hash function]]. On a [[cache miss|miss]], the core decodes the address to figure out which slice ID to request the data from. Physical addresses are hashed at the source in order to prevent hot spots. The cache box is responsible for maintaining coherency and ordering between requests. Because the LLC slices are fully inclusive, they can make efficient use of an on-die snoop filter. Each slice makes use of {{intel|Core Valid Bits}} (CVB) which are used to eliminate unnecessary snoops to the cores. A single bit per cache line is needed to indicate if the line may be in the core. Snoops are therefore only needed if the line is in the LLC and a CVB is asserted on that line. This mechanism helps limit external snoops to the cache box most of the time without resorting to going to the cores.
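
Intel has not published the actual slice-selection hash. Purely as an illustration of hashing the physical address at the source, the sketch below folds the cache-line address into a slice ID; the XOR-fold itself is an assumption for demonstration, not the real Sandy Bridge function.

<syntaxhighlight lang="c">
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: fold upper physical-address bits together so that
 * consecutive cache lines spread across slices without hot spots.
 * The real Sandy Bridge hash is undocumented. */
static unsigned slice_of(uint64_t phys_addr, unsigned num_slices) {
    uint64_t line = phys_addr >> 6;          /* 64 B cache-line number */
    uint64_t h = line;
    h ^= h >> 12;
    h ^= h >> 24;
    return (unsigned)(h % num_slices);
}

int main(void) {
    /* Eight consecutive cache lines mapped onto a four-slice LLC. */
    for (uint64_t a = 0; a < 8 * 64; a += 64)
        printf("line at 0x%04llx -> slice %u\n",
               (unsigned long long)a, slice_of(a, 4));
    return 0;
}
</syntaxhighlight>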
Note that both the cache slices and the cache boxes reside within the same [[clock domain]] as the cores themselves - sharing the same voltage and frequency and scaling along with the cores when needed.
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
=== Ring Interconnect ===
In the pursuit of modularity, Sandy Bridge incorporates a new and robust high-bandwidth coherent interconnect that links all the separate components together. The ring is a system of interconnects between the [[physical core|cores]], the [[integrated graphics|graphics]], the [[last level cache]], and the {{intel|System Agent}}. The ring allows Intel to scale up and down efficiently depending on the market segment, which allows for a finer balance of performance, power, and cost. The choice to use a ring makes design and validation easier compared to some of the more complex topologies such as [[packet routing]]. It's also easier configurability-wise.
Internally, the ring is composed of four physically independent rings which handle the communication and enforce coherency.
[[File:sandy bridge ring flow.svg|right|275px]]
The four rings consist of a considerable amount of wiring and routing. Because the routing runs in the upper metal layers over the LLC, the rings have no real impact on die area. As with the LLC slices, the ring is fully pipelined and operates within the core's [[clock domain]], scaling with the frequency of the cores. The bandwidth of the ring also scales with each additional core/$slice pair that is added onto the ring; however, with more cores the ring becomes more congested, adding latency as the average hop count increases. Intel expected the ring to support a fairly large number of cores before facing real performance issues.
It's important to note that the term ring refers to its structure and not necessarily how the data flows. The ring is not a round-robin and requests may travel up or down as needed. The use of address hashing allows the source agent to know exactly where the destination is. In order to reduce latency, the ring is designed such that all accesses on the ring always pick the shortest path. Because of this aspect of the ring and the fact that some requests can take longer than others to complete, the ring might have requests being handled out of order. It is the responsibility of the source agents to handle the ordering requirements. The ring [[cache coherency]] protocol is largely an enhancement of Intel's {{intel|QuickPath Interconnect|QPI}} protocols with a [[MESI]]-based source snooping protocol. On each cycle, the agents receive an indication whether there is an available slot on the ring for communication in the next cycle. When asserted, the agent can send any type of communication (e.g. data or snoop) on the ring the following cycle.
The data ring is 32 bytes wide, meaning each slice can pass half a cache line to the ring each cycle. This means that a [[dual-core]] operating at 4 GHz on both cores will have a ring data bandwidth of 256 GB/s.
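
The 256 GB/s figure above is just the ring parameters multiplied out. The sketch below redoes that arithmetic for a few core counts, assuming one 32-byte data-ring stop per core/$slice pair.

<syntaxhighlight lang="c">
#include <stdio.h>

int main(void) {
    const double bytes_per_stop_per_cycle = 32.0;  /* half a cache line */
    const double freq_ghz = 4.0;                   /* core clock        */

    for (int cores = 2; cores <= 4; cores++) {
        /* GB/s = stops * bytes/cycle * cycles per nanosecond */
        double gbps = cores * bytes_per_stop_per_cycle * freq_ghz;
        printf("%d cores @ %.1f GHz -> %.0f GB/s ring data bandwidth\n",
               cores, freq_ghz, gbps);
    }
    return 0;
}
</syntaxhighlight>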
=== System Agent ===
The System Agent (SA) is a centralized peripheral device integration unit. It contains what was previously the traditional [[Memory Controller Hub]] (MCH) which includes all the I/O such as the [[PCIe]], {{intel|DMI}}, and others. Additionally the SA incorporates the [[memory controller]] and the display engine which works in tandem with the integrated graphics. The major enabler for the new System Agent is in fact the [[32 nm process]] which allowed for considerably higher integration, over a dozen [[clock domain]]s and [[PHY]]s.
The system agent interfaces with the rest of the system via the ring in a similar manner to the cache boxes in the LLC slices. It is also in charge of handling I/O-to-cache coherency. The SA enables [[direct memory access]] (DMA), allowing devices to snoop the cache hierarchy. Address conflicts resulting from multiple concurrent requests associated with the same cache line are also handled by the SA.
With Sandy Bridge, Intel introduced a large number of power features to save power depending on the workload, temperature, and what I/O is being utilized. The various power features are handled at the system agent as well.
− | |||
− | |||
− | |||
− | With Sandy Bridge, Intel introduced a large number of power features to save powers depending on the workload, temperature, and what I/O is being utilized. The | ||
− | |||
− | |||
== Core ==
=== Pipeline ===
The Sandy Bridge core focuses on extracting performance and reducing power through a great number of ways. Intel placed heavy emphasis in the cores on performance-enhancing features that can provide a more-than-linear performance-to-power ratio as well as features that provide more performance while reducing power. The various enhancements can be found in both the front-end and the back-end of the core.
==== Broad Overview ====
==== Front-end ====
The front-end is tasked with the challenge of fetching the complex [[x86]] instructions from memory, decoding them, and delivering them to the execution units. In other words, the front-end needs to be able to consistently deliver enough [[µOPs]] from the [[instruction code stream]] to keep the back-end busy. When the back-end is not being fully utilized, the core is not reaching its full performance. A weak or under-performing front-end will directly affect the back-end, resulting in a poorly performing core. In the case of Sandy Bridge, this challenge is further complicated by various redirections such as branches and the complex nature of the [[x86]] instructions themselves.
The entire front-end was redesigned from the ground up in Sandy Bridge. The four major changes in the front-end of Sandy Bridge are the entirely new [[µOP cache]], the overhauled branch predictor, the further decoupling of the front-end, and the improved macro-op fusion capabilities. All those features not only improve performance but also reduce power at the same time.
===== Fetch & pre-decoding =====
Blocks of memory arrive at the core from either the cache slice or further down [[#Ring_Interconnect|the ring]] from one of the other cache slices, or, on occasion and far less desirably, from main memory. On their first pass, instructions should have already been prefetched from the [[L2 cache]] into the [[L1 cache]]. The L1 is a 32 [[KiB]], 64 B line, 8-way set associative cache. The [[instruction cache]] is identical in size to {{\\|Nehalem}}'s but its associativity was increased to 8-way. Sandy Bridge fetching is done on a 16-byte fetch window, a window size that has not changed in a number of generations. Up to 16 bytes of code can be fetched each cycle. Note that the fetcher is shared evenly between the two threads, so that each thread gets every other cycle. At this point the instructions are still [[macro-ops]] (i.e. variable-length [[x86]] architectural instructions). Instructions are brought into the pre-decode buffer for initial preparation.
[[File:sandy bridge fetch.svg|left|300px]]
===== Decoding =====
[[File:sandy bridge decode.svg|right|350px]]
Up to four pre-decoded instructions (or five in cases where one of the instructions was macro-fused) are sent to the decoders each cycle. Like the fetchers, the decoders alternate between the two threads each cycle. Decoders read in [[macro-operations]] and emit regular, fixed-length [[µOPs]]. The decoder organization in Sandy Bridge has been kept more or less the same as {{\\|Nehalem}}. As with its predecessor, Sandy Bridge features four decoders. The decoders are asymmetric; the first one, Decoder 0, is a [[complex decoder]] while the other three are [[simple decoders]]. A simple decoder is capable of translating instructions that emit a single fused-[[µOP]]. By contrast, a [[complex decoder]] can decode anywhere from one to four fused-µOPs. Overall up to four simple instructions can be decoded each cycle, with lesser amounts if the complex decoder needs to emit additional µOPs; i.e., for each additional µOP the complex decoder emits, one less simple decoder can operate.
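
The interplay between the complex decoder and the three simple decoders can be pictured with a small behavioural model. The sketch below is a deliberate simplification (no macro-fusion, no MSROM detours, a made-up instruction stream), not a cycle-accurate description of the hardware.

<syntaxhighlight lang="c">
#include <stdio.h>

/* µOPs emitted by each macro-op in a toy instruction stream. */
static const int uops[] = {1, 1, 3, 1, 1, 1, 2, 1, 1, 1};
static const int n = sizeof uops / sizeof uops[0];

int main(void) {
    int i = 0, cycle = 0;
    while (i < n) {
        cycle++;
        int emitted = uops[i++];        /* Decoder 0 (complex) takes the
                                           first instruction             */
        int simple_left = 4 - emitted;  /* each extra µOP from Decoder 0
                                           steals a simple-decoder slot  */
        if (simple_left < 0) simple_left = 0;

        /* Simple decoders each take one single-µOP instruction; a
         * multi-µOP instruction must wait for Decoder 0 next cycle. */
        while (simple_left > 0 && i < n && uops[i] == 1) {
            emitted += uops[i++];
            simple_left--;
        }
        printf("cycle %d: %d uops decoded\n", cycle, emitted);
    }
    return 0;
}
</syntaxhighlight>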
Sandy Bridge brought about the first 256-bit [[SIMD]] set of instructions called {{x86|AVX}}. This extension expanded the sixteen pre-existing 128-bit {{x86|XMM}} registers to 256-bit {{x86|YMM}} registers for floating point vector operations (note that {{\\|Haswell}} expanded this further to [[Integer]] operations as well). Most of the new AVX instructions have been designed as simple instructions that can be decoded by the simple decoders.
====== MSROM & Stack Engine ======
There are more complex instructions that are not trivial to decode even by the complex decoder. For instructions that transform into more than four µOPs, the instruction detours through the [[microcode sequencer]] (MS) ROM. When that happens, up to 4 µOPs/cycle are emitted until the microcode sequencer is done. During that time, the decoders are disabled.
[[x86]] has dedicated [[stack machine]] operations. Instructions such as <code>{{x86|PUSH}}</code>, <code>{{x86|POP}}</code>, as well as <code>{{x86|CALL}}</code>, and <code>{{x86|RET}}</code> all operate on the [[stack pointer]] (<code>{{x86|ESP}}</code>). Without any specialized hardware, such operations would need to be sent to the back-end for execution using the general purpose ALUs, using up some of the bandwidth and utilizing scheduler and execution unit resources. Since {{\\|Pentium M}}, Intel has been making use of a [[Stack Engine]]. The Stack Engine has a set of three dedicated adders it uses to perform and eliminate the stack-updating µOPs (i.e. capable of handling three additions per cycle). An instruction such as <code>{{x86|PUSH}}</code> is translated into a store and a subtraction of 4 from <code>{{x86|ESP}}</code>. The subtraction in this case will be done by the Stack Engine. The Stack Engine sits after the [[instruction decode|decoders]] and monitors the µOP stream as it passes by. Incoming stack-modifying operations are caught by the Stack Engine. This alleviates the burden on the pipeline from stack pointer-modifying µOPs. In other words, it's cheaper and faster to calculate stack pointer targets at the Stack Engine than it is to send those operations down the pipeline to be done by the execution units (i.e., general purpose ALUs).
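
One way to think of the Stack Engine is as a small accumulator that folds the constant ±4 adjustments of <code>PUSH</code>/<code>POP</code> into a pending delta applied at decode time, only inserting a synchronizing µOP when the real stack pointer is explicitly read. The sketch below is a hypothetical, heavily simplified model (32-bit <code>PUSH</code>/<code>POP</code> only), not Intel's implementation.

<syntaxhighlight lang="c">
#include <stdio.h>

/* Simplified model: the Stack Engine keeps a running delta for ESP so
 * that PUSH/POP do not need an ALU µOP for the pointer update. */
enum op { PUSH, POP, READ_ESP };

int main(void) {
    unsigned esp = 0x1000;   /* architectural ESP                     */
    int delta = 0;           /* pending adjustment held by the engine */
    enum op prog[] = {PUSH, PUSH, POP, PUSH, READ_ESP, POP};

    for (unsigned i = 0; i < sizeof prog / sizeof prog[0]; i++) {
        switch (prog[i]) {
        case PUSH: delta -= 4; break;                  /* no ALU µOP   */
        case POP:  delta += 4; break;                  /* no ALU µOP   */
        case READ_ESP:
            /* An explicit ESP read forces a sync that folds the
             * accumulated delta back into the register. */
            esp += delta; delta = 0;
            printf("sync: esp = 0x%x\n", esp);
            break;
        }
    }
    printf("final: esp = 0x%x, pending delta = %d\n", esp, delta);
    return 0;
}
</syntaxhighlight>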
===== New µOP cache & x86 tax =====
[[File:sandy bridge ucache.svg|right|400px]]
Decoding the variable-length, inconsistent, and complex [[x86]] instructions is a nontrivial task. It's also expensive in terms of performance and power. Therefore, the best way for the pipeline to avoid those costs is to simply not decode the instructions. This is exactly what Intel has done with Sandy Bridge and it's perhaps the single biggest feature that has been added to the core. With Sandy Bridge Intel introduced a new [[µOP cache]] unit, perhaps more appropriately called the Decoded Stream Buffer (DSB). The micro-op cache is unique in that it not only substantially improves performance but does so while significantly reducing power.
On the surface, the µOP cache can be conceptualized as a second instruction cache unit that is a subset of the [[level one instruction cache]]. What's unique about it is that it stores actual decoded instructions (i.e., µOPs). While it shares many of the goals of {{\\|NetBurst}}'s [[trace cache]], the two implementations are entirely different. The idea behind both mechanisms is to increase the front-end bandwidth by reducing reliance on the decoders.
The micro-op cache is organized into 32 sets of 8 cache lines with each line holding up to 6 µOPs, for a total of 1,536 µOPs. The cache is competitively shared between the two threads and can also hold pointers to the microcode sequencer ROM. It's also virtually addressed and is a strict subset of the L1 instruction cache (that is, the L1 is inclusive of the µOP cache). Each line includes additional metadata for the number of contained µOPs and their length.
At any given time, the core operates on contiguous chunks of 32 bytes of the [[instruction stream]]. Likewise, the µOP cache operates on full 32 B windows as well. This is by design so that the µOP cache can store and evict entire windows based on an [[LRU]] policy. Intel refers to the traditional pipeline path as the "legacy decode pipeline". On the initial iteration, all instructions go through the legacy decode pipeline. Once the entire stream window is decoded and makes it to the allocation queue, a copy of the window is inserted into the µOP cache. This occurs simultaneously with all other operations; i.e., no additional cycles or stages are added for this functionality. On all subsequent iterations, the cached pre-decoded stream is sent directly to the allocation queue - bypassing fetching, predecoding, and decoding, saving power and increasing throughput.
Note that a single stream window of 32 bytes can only span 3 ways with 6 µOPs per line; this means a maximum of 18 µOPs per window can be cached by the µOP cache. Consequently, a 32 byte window that generates more than 18 µOPs will not be allocated in the µOP cache and will have to go through the legacy decode pipeline instead.
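
The capacity and per-window limits quoted above combine as follows; this is plain arithmetic over the figures given in this section.

<syntaxhighlight lang="c">
#include <stdio.h>

int main(void) {
    const int sets = 32, ways = 8, uops_per_line = 6;
    const int lines_per_window = 3;   /* a 32 B window may span 3 ways */

    printf("total capacity = %d uops\n", sets * ways * uops_per_line);
    printf("max per window = %d uops\n", lines_per_window * uops_per_line);

    /* A window that decodes into more than 18 µOPs cannot be cached and
     * must always use the legacy decode pipeline. */
    int window_uops = 21;
    printf("window of %d uops cacheable? %s\n", window_uops,
           window_uops <= lines_per_window * uops_per_line ? "yes" : "no");
    return 0;
}
</syntaxhighlight>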
The µOP cache has an average hit rate of around 80%. During the instruction fetch, the branch predictor will probe the µOP cache tags. A hit in the µOP cache allows for up to 4 µOPs (possibly fused macro-ops) per cycle to be sent directly to the Instruction Decode Queue (IDQ), bypassing all the pre-decoding and decoding that would otherwise have to be done. During those cycles, the rest of the front-end is entirely clock-gated, which is how the substantial power saving is gained. Whereas the legacy decode path works in 16-byte instruction fetch windows, the µOP cache has no such restriction and can deliver 4 µOPs/cycle corresponding to the much bigger 32-byte window. Since a single window can be made of up to 18 µOPs, up to 5 whole cycles may be required to read out the entire decoded stream. Nonetheless, the µOP cache can deliver consistently higher bandwidth than the legacy pipeline, which is limited to the 16-byte fetch window and can be a serious [[bottleneck]] if the average instruction length is more than four bytes per window.
It's interesting to note that the µOP cache only operates on full windows, that is, a full 32 B window that has all of its µOPs cached. Any partial hits are required to go through the legacy decode pipeline as if nothing was cached. The choice to not handle partial hits is rooted in the µOP cache's efficiency. On partial window hits you'd end up emitting some µOPs from the micro-op cache as well as having the legacy decode pipeline decode the remaining missed µOPs; effectively multiple components would end up emitting µOPs. Not only would such a mechanism increase complexity, but it's also unclear how much, if any, benefit would be gained by it.
As noted earlier, the µOP cache has its roots in the original {{\\|NetBurst}} trace cache, particularly as far as goals are concerned. But that's where the similarities end. The micro-op cache in Sandy Bridge can be seen as a light-weight, efficient µOP delivery mechanism that can surpass the legacy pipeline whenever possible. This is different from the trace cache, which attempted to effectively replace the entire front-end and used a vastly inferior and slow fetch/decode path for workloads the trace cache could not handle. It's worth noting that the trace cache was costly and complicated (having dedicated components such as a trace BTB) and had various side-effects such as needing to flush on context switches - deficiencies the µOP cache doesn't have. Trace caches resulted in a lot of duplication as well; for example {{\\|NetBurst}}'s 12k µOP trace cache had a hit rate comparable to an 8 KiB-16 KiB L1I$ whereas this 1.5K µOP cache is comparable to a 6 KiB instruction cache. This implies a significant storage efficiency of four-fold or greater.
===== Allocation Queue =====
The back-end or execution engine of Sandy Bridge deals with the execution of [[out-of-order]] operations. Sandy Bridge's back-end is clearly a happy merger of both {{\\|NetBurst}} and {{\\|P6}}. As with {{\\|P6}} (through {{\\|Westmere}}), Sandy Bridge treats the three classes of µOPs ([[floating point]], [[integer]], and [[vector]]) separately. The implementation itself, however, is quite different. Sandy Bridge borrows the tracking and renaming architecture of {{\\|NetBurst}} which is far more efficient.
Sandy Bridge uses the tracking technique found in {{\\|NetBurst}}, which uses renaming based on a [[physical register file]] (PRF). All earlier predecessors, {{\\|P6}} through {{\\|Westmere}}, utilized a [[Retirement Register File]] (RRF) along with a [[Re-Order Buffer]] (ROB) which was used to track the micro-ops and data that are in flight. ROB results are then written into the RRF on retirement. Sandy Bridge returned to a PRF, meaning all of the data is now stored in the PRF with a separate component dedicated to the various metadata such as status information. Unlike with an RRF, retirement is considerably simpler, requiring a simple mapping change between the architectural registers and the PRF, eliminating any actual data transfers. An additional component, the Register Alias Table (RAT), is used to maintain the mapping of logical registers to physical registers. This includes both the architectural state and the most recent speculated state. In Sandy Bridge, the ReOrder Buffer (ROB) still exists but is a much simpler component that tracks the in-flight µOPs and their status.
In addition to the improved back-end, Intel has also increased most of the buffers significantly, allowing for far more µOPs in-flight than before.
===== Renaming & Allocation =====
On each cycle, up to 4 µOPs can be delivered here from the front-end from one of the two threads. As stated earlier, the Re-Order Buffer is now a light-weight component that tracks the in-flight µOPs. The ROB in Sandy Bridge is 168 entries, allowing for up to 40 additional µOPs in-flight over {{\\|Nehalem}}. At this point of the pipeline the µOPs are still handled sequentially (i.e., in order) with each µOP occupying the next entry in the ROB. This entry is used to track the correct execution order and status. In order for the ROB to rename an integer µOP, there needs to be an available Integer PRF entry. Likewise, for FP and SIMD µOPs there needs to be an available FP PRF entry. Following renaming, all bets are off and the µOPs are free to execute as soon as their dependencies are resolved.
It is at this stage that [[architectural registers]] are mapped onto the underlying [[physical registers]]. Other additional bookkeeping tasks are also done at this point such as allocating resources for stores, loads, and determining all possible scheduler ports. Register renaming is also controlled by the [[Register Alias Table]] (RAT) which is used to mark where the data we depend on is coming from (after that value, too, came from an instruction that has previously been renamed). Sandy Bridge's move to PRF-based renaming has a fairly substantial impact on power too. With {{x86|AVX|the new}} {{x86|extensions|instruction set extension}} which allows for 256-bit operations, a retirement would have meant large amounts of 256-bit values would have to be needlessly moved to the Retirement Register File each time. This is entirely eliminated in Sandy Bridge. The decoupling of the PRFs from the RAT/ROB likely means some added latency is at play here, but the overall benefits are more than worth it.
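
A minimal sketch of PRF-style renaming as described above: a RAT maps each architectural register to a physical-register-file entry, a free list supplies fresh destinations, and retirement becomes a mapping update rather than a data copy. Register counts and helper names here are illustrative, not Sandy Bridge's actual sizes or structures.

<syntaxhighlight lang="c">
#include <stdio.h>

#define ARCH_REGS 4     /* toy architectural register count   */
#define PHYS_REGS 8     /* toy PRF, far smaller than the real one */

static int rat[ARCH_REGS];              /* arch reg -> PRF entry */
static int free_list[PHYS_REGS], free_top;

static int alloc_preg(void) { return free_list[--free_top]; }

/* Rename "dst = f(src)": read the current mapping for src, allocate a
 * fresh physical register for dst, and update the RAT.  No value is
 * copied at retirement; only the mapping changes. */
static void rename_op(int dst, int src) {
    int src_preg = rat[src];
    int dst_preg = alloc_preg();
    rat[dst] = dst_preg;
    printf("arch r%d <- f(arch r%d): reads p%d, writes p%d\n",
           dst, src, src_preg, dst_preg);
}

int main(void) {
    for (int i = 0; i < ARCH_REGS; i++) rat[i] = i;        /* initial map */
    for (int p = ARCH_REGS; p < PHYS_REGS; p++) free_list[free_top++] = p;

    rename_op(0, 1);   /* r0 = f(r1) */
    rename_op(0, 0);   /* r0 = f(r0): a fresh destination removes WAW/WAR hazards */
    rename_op(2, 0);
    return 0;
}
</syntaxhighlight>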
There are no special costs involved in splitting up fused µOPs before execution or [[retirement]] and the two fused µOPs only occupy a single entry in the ROB; however, the rename registers have more than doubled over {{\\|Nehalem}} in order to accommodate those fused operations. The RAT is capable of handling 4 µOPs each cycle. Note that the ROB still operates on fused µOPs, therefore 4 µOPs can effectively be as high as 8 µOPs.
| Not only does this instruction get eliminated at the ROB, but it's actually encoded as just 2 bytes <code>31 C0</code> vs the 5 bytes for <code>{{x86|mov}} {{x86|eax}}, 0x0</code> which is encoded as <code>b8 00 00 00 00</code>.
|}
Sandy Bridge introduced a number of new optimizations it performs prior to entering the out-of-order and renaming part. Two of those optimizations are [[Zeroing Idioms]] and [[Ones Idioms]]. The first common optimization performed in Sandy Bridge is [[Zeroing Idioms]] elimination. A number of common zeroing idioms are recognized and consequently eliminated. This is done prior to bookkeeping at the ROB, saving resources by eliminating those µOPs entirely. Eliminated zeroing idioms are zero latency and are entirely removed from the pipeline (i.e., retired).
Sandy Bridge recognizes instructions such as <code>{{x86|XOR}}</code>, <code>{{x86|PXOR}}</code>, and <code>{{x86|XORPS}}</code> as zeroing idioms when the [[source operand|source]] and [[destination operand|destination]] operands are the same. Those optimizations are performed during renaming at the full rename rate (4 µOPs per cycle) and the architectural register is simply set to zero (no actual physical register is used).
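
Conceptually, the zeroing-idiom check happens at rename time and never consumes a physical register or an execution slot. The sketch below shows the kind of test involved; the opcode list and the function itself are an illustrative simplification, not the actual rename logic.

<syntaxhighlight lang="c">
#include <stdio.h>
#include <string.h>

/* Returns 1 if the instruction is a recognized zeroing idiom, i.e. one
 * of XOR/PXOR/XORPS with identical source and destination registers. */
static int is_zeroing_idiom(const char *opcode, int dst, int src) {
    const char *idioms[] = {"xor", "pxor", "xorps"};
    for (unsigned i = 0; i < sizeof idioms / sizeof idioms[0]; i++)
        if (strcmp(opcode, idioms[i]) == 0 && dst == src)
            return 1;
    return 0;
}

int main(void) {
    /* xor eax, eax -> eliminated at rename, no µOP executed */
    printf("xor eax, eax : %s\n",
           is_zeroing_idiom("xor", 0, 0) ? "eliminated" : "executed");
    /* xor eax, ebx -> a real dependency, must execute */
    printf("xor eax, ebx : %s\n",
           is_zeroing_idiom("xor", 0, 1) ? "eliminated" : "executed");
    return 0;
}
</syntaxhighlight>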
The [[ones idioms]] is another dependency-breaking idiom that can be optimized. All the various {{x86|PCMPEQ|PCMPEQx}} instructions that perform a packed comparison of the same register with itself always set all bits to one. In those cases, while the µOP still has to be executed, the instruction may be scheduled as soon as possible because all its dependencies are resolved.
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
== Die ==
: [[File:sandy bridge system agent (annotated).png|650px]]
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
=== Quad-Core ===
Quad-core Sandy Bridge die:
* 995,000,000 transistors
* [[32 nm process]]
* 216 mm² die size
<gallery mode=slideshow>
File:sandy bridge die on a wafer.png|A partial wafer shot showing a complete quad-core die along with seven other dies around it.
File:sandy bridge whole wafer.png|An entire Sandy Bridge Wafer.
File:sandy bridge wafer angled 1.png|Partial wafer shot, angled. shot 1.
File:sandy bridge wafer angled 3.png|Partial wafer shot, angled. shot 3.
</gallery>
+ | |||
+ | == Cores == | ||
+ | {{empty section}} | ||
== All Sandy Bridge Chips ==
Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips
-->
<table class="wikitable sortable">
<tr><th colspan="12" style="background:#D6D6FF;">Sandy Bridge Chips</th></tr>
<tr><th colspan="9">Main processor</th><th colspan="3">IGP</th></tr>
<tr><th>Model</th><th>µarch</th><th>Platform</th><th>Core</th><th>Launched</th><th>SDP</th><th>TDP</th><th>Freq</th><th>Max Mem</th><th>Name</th><th>Freq</th><th>Max Freq</th></tr>
{{#ask: [[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Sandy Bridge]]
|?full page name
|?model number
|?microarchitecture
|?platform
|?core name
|?first launched
|?sdp
|?tdp
|?base frequency
|?max memory
|?integrated gpu
|?integrated gpu base frequency
|?integrated gpu max frequency
|format=template
|template=proc table 2
|userparam=13
|mainlabel=-
}}
<tr><th colspan="12">Count: {{#ask:[[Category:microprocessor models by intel]][[instance of::microprocessor]][[microarchitecture::Sandy Bridge]]|format=count}}</th></tr>
</table>
− | |||
== References ==
* [https://newsroom.intel.com/editorials/sandy-bridge-breaks-the-mold-for-chip-codenames/ Sandy Bridge Breaks the Mold for Chip Codenames], December 29, 2010
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− |