From WikiChip
Editing centaur/microarchitectures/cha

Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.

This page supports semantic in-text annotations (e.g. "[[Is specified as::World Heritage Site]]") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help pages.

Latest revision Your text
Line 165: Line 165:
 
CNS has a 32 KiB [[L1 instruction cache]] organized as 8 ways of 64 sets. Running ahead of the instruction stream, the CNS prefetchers will attempt to ensure the instruction stream is resident in that cache. Centaur has stated CNS featured improved prefetchers although no specifics were disclosed.
 
CNS has a 32 KiB [[L1 instruction cache]] organized as 8 ways of 64 sets. Running ahead of the instruction stream, the CNS prefetchers will attempt to ensure the instruction stream is resident in that cache. Centaur has stated CNS featured improved prefetchers although no specifics were disclosed.
  
Each cycle, up to 32 bytes (half a line) of the instruction stream are fetched from the [[instruction cache]] into the instruction pre-decode queue. Since [[x86]] instructions may range from a single byte to 15 bytes, this buffer receives an unstructured byte stream which is then marked at the instruction boundary. In addition to marking instruction boundaries, the pre-decode also does various prefix processing. From the pre-decode queue, up to five individual instructions are fed into the formatted instruction queue (FIQ).
+
Each cycle, up to 32 bytes (half a line) of the instruction stream are fetched from the [[instruction cache]] into the instruction pre-decode queue. Since [[x86]] instructions may range from a single byte to 15 bytes, this buffer receives an unstructured byte stream which is then marked at the instruction boundary. In addition to marking instruction boundaries, the pre-decode also does various prefix processing. From the pre-decode queue, up to four individual instructions are fed into the formatted instruction queue (FIQ).
  
 
Prior to getting sent to decode, the FIQ has the ability to do [[macro-op fusion]]. CNS can detect certain pairs of adjacent instructions such as a simple arithmetic operation followed by a conditional jump and couple them together such that they get decoded at the same time into a fused operation. This was improved further with the new CNS core.
 
Prior to getting sent to decode, the FIQ has the ability to do [[macro-op fusion]]. CNS can detect certain pairs of adjacent instructions such as a simple arithmetic operation followed by a conditional jump and couple them together such that they get decoded at the same time into a fused operation. This was improved further with the new CNS core.
  
 
[[File:cns decode.svg|right|500px]]
 
[[File:cns decode.svg|right|500px]]
From the FIQ, up to four instructions (or five if fused) are sent to the decode. CNS features four homogenous decoders - each capable of decoding all types of instructions. At the decode, x86 instructions are transformed into micro-operations. Most instructions will translate into a single micro-operation while more complex instructions (e.g., register-memory) will emit two µOPs. For fused instructions, those instructions are decoded into a fused micro-operations which remain fused for its remaining lifetime. In other words, those fused instructions will also retire as a fused instruction. This enables CNS to decode up to five instructions per cycle, reducing the effective bandwidth across the entire pipeline from the fused operations.
+
From the FIQ, up to four instructions (or five if fused) are sent to the decode. CNS features four homogenous decoders - each capable of decoding all types of instructions. For fused instructions, those instructions are decoded into a fused micro-operations which remain fuse for its remaining lifetime. In other words, those fused instructions will retire as a fused instruction. This enables CNS to decode up to five instructions per cycle, reducing the effective bandwidth across the entire pipeline from the fused operations.
  
 
Following decode, instructions are queued up in the micro-operation queue which serves to decouple the front-end from the back-end.
 
Following decode, instructions are queued up in the micro-operation queue which serves to decouple the front-end from the back-end.

Please note that all contributions to WikiChip may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see WikiChip:Copyrights for details). Do not submit copyrighted work without permission!

Cancel | Editing help (opens in new window)

This page is a member of 1 hidden category: