From WikiChip
Editing intel/microarchitectures/sandy bridge (client)

Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.

This page supports semantic in-text annotations (e.g. "[[Is specified as::World Heritage Site]]") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help pages.

Latest revision Your text
Line 456: Line 456:
 
Sandy Bridge has two distinct [[physical register files]] (PRF). 64-bit data values are stored in the 160-entry Integer PRF, while [[Floating Point]] and vector data values are stored in the Vector PRF which has been extended to 256 bits in order to accommodate the new {{x86|AVX}} {{x86|YMM}} registers. The Vector PRF is 144-entry deep which is slightly smaller than the Integer one. It's worth pointing out that prior to Sandy Bridge, code that relied on constant register reading was bottlenecked by a limitation in the register file which was limited to three reads. This restriction has been eliminated in Sandy Bridge.
 
Sandy Bridge has two distinct [[physical register files]] (PRF). 64-bit data values are stored in the 160-entry Integer PRF, while [[Floating Point]] and vector data values are stored in the Vector PRF which has been extended to 256 bits in order to accommodate the new {{x86|AVX}} {{x86|YMM}} registers. The Vector PRF is 144-entry deep which is slightly smaller than the Integer one. It's worth pointing out that prior to Sandy Bridge, code that relied on constant register reading was bottlenecked by a limitation in the register file which was limited to three reads. This restriction has been eliminated in Sandy Bridge.
  
At this point, µOPs are not longer fused and will be dispatched to the execution units independently. The scheduler holds the µOPs while they wait to be executed. A µOP could be waiting on an operand that has not arrived (e.g., fetched from memory or currently being calculated from another µOPs) or because the execution unit it needs is busy. Once the µOP is ready, they are dispatched through their designated port.
+
At this point, µOPs are not longer fused and will be dispatched to the execution units independently. The scheduler holds the µOPs while they wait to be executed. A µOP could be waiting on an operand that has not arrived (e.g., fetched from memory or currently being calculated from another µOPs) or because the execution unit it needs is busy. Once the µOP is ready, they are dispatched through their designated port. The scheduler will send the oldest ready µOP to be executed on each of the six ports each cycle.
 
 
The scheduler will send the oldest ready µOP to be executed on each of the six ports each cycle. Sandy Bridge, like {{\\|Nehalem}} has six ports. Port 0, 1, and 5 are used for executing computational operations. Each port has 3 stacks of execution units corresponding to the three classes of data types: Integer, SIMD Integer, and Floating Point. The Integer stack handles 64-bit general-purpose integer operations. The SIMD Integer and the Floating Point stacks are both 128-bit wide.
 
 
 
Each cluster operates within its own domain. Domains help refuse power for less frequently used domains and to simplify the routing networks. Data flowing within its domain (e.g., integer->integer or FP->FP) are effectively free and can be done each cycle. Data flowing between two separate domains (e.g., integer->FP or FP->integer) will incur an additional one or two cycles for the data to be forwarded to the appropriate domain.
 
 
 
Sandy Bridge ports 2, 3, and 4, are used for executing memory related operations such as loads and stores. Those ports all operate within the Integer stack due to integer latency sensitivity.
 
 
 
====== New 256-bit extension ======
 
Sandy Bridge introduced the {{x86|AVX}} extension, a new 256-bit [[x86]] floating point [[SIMD]] {{x86|extension}}. The new instructions are implemented using an improved {{x86|VEX|Vector EXtension}} (VEX) prefix byte sequence. Those instructions are in turn decoded into single µOPs most of the time.
 
 
 
In order to have a noticeable effect on performance, Intel opted to not to repeat their initial SSE implementation choices. Instead, Intel doubled the width of all the associated executed units. This includes a full hardware 256-bit floating point multiply, add, and shuffle - all having a single-cycle latency.
 
 
 
As mentioned earlier, both the Integer SIMD and the FP stacks are 128-bit wide. Widening the entire pipeline is a fairly complex undertaking. The challenge with introducing a new 256-bit extension is being able to double the output of one of the stacks while remaining invisible to the others. Intel solved this problem by cleverly dual-purposing the two existing 128-bit stacks during AVX operations to move full 256-bit values. For example a 256-bit floating point add operation would use the Integer SIMD domain for the lower 128-bit half and the FP domain for the upper 128-bit half to form the entire 256-bit value. The re-using of existing datapaths results in a fairly substantial die area and power saving.
 
 
 
Overall, Sandy Bridge achieves double the [[FLOP]] of {{\\|Nehalem}}, capable of performing 256-bit multiply + 256-bit add each cycle. That is, Sandy Bridge can sustain 16 single-precision FLOP/cycle or 8 double-precision FLOP/cycle.
 
 
 
====== Scheduler Ports & Execution Units ======
 
Overall, Sandy Bridge is further enhanced with rebalanced ports. For example, the various string operations have been moved to port 0. A second has been augmented with LEA execution capabilities. Ports 0 and 1 gained additional capabilities such as integer blends. The various security enhancements that were added in {{\\|Westmere}} have also been improved.
 
  
 
===== Retirement =====
 
===== Retirement =====

Please note that all contributions to WikiChip may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see WikiChip:Copyrights for details). Do not submit copyrighted work without permission!

Cancel | Editing help (opens in new window)

This page is a member of 1 hidden category:

codenameSandy Bridge (client) +
core count2 + and 4 +
designerIntel +
first launchedSeptember 13, 2010 +
full page nameintel/microarchitectures/sandy bridge (client) +
instance ofmicroarchitecture +
instruction set architecturex86-64 +
manufacturerIntel +
microarchitecture typeCPU +
nameSandy Bridge (client) +
phase-outNovember 2012 +
pipeline stages (max)19 +
pipeline stages (min)14 +
process32 nm (0.032 μm, 3.2e-5 mm) +