From WikiChip
Editing centaur/microarchitectures/cha
Latest revision | Your text | ||
Line 175: | Line 175: | ||
=== Back-end === | === Back-end === | ||
− | The back-end deals with the [[out-of-order]] execution of instructions. CNS | + | The back-end deals with the [[out-of-order]] execution of instructions. CNS makes major improvements to the back-end over prior generations. Micro-operations are fetched from the micro-operation queue, which decouples the front-end from the back-end. Each cycle, up to four instructions can be renamed (and later retired). This is an increase from the previous 3-wide [[instruction rename|rename]]. The widened rename and retire stages match the decode rate of the front-end. Once renamed, micro-operations are sent to the scheduler. |
− | Prior Centaur chips were manufactured on relatively older [[process nodes]] such as [[65 nm]] and later [[45 nm]]. The move to a more leading-edge node ([[TSMC]] [[16-nanometer]] [[FinFET]], in this case) provided them with a significantly higher transistor budget. Centaur takes advantage of that to build a wider out-of-order core. To that end, | + | Prior Centaur chips were manufactured on relatively older [[process nodes]] such as [[65 nm]] and later [[45 nm]]. The move to a more leading-edge node ([[TSMC]] [[16-nanometer]] [[FinFET]], in this case) provided them with a significantly higher transistor budget. Centaur takes advantage of that to build a wider out-of-order core. To that end, the CNS core supports 192 out-of-order instructions in flight, the same as both {{intel|Haswell|Intel Haswell|l=arch}} and {{amd|Zen|AMD Zen|l=arch}}. |
==== Execution ports ==== | ==== Execution ports ==== | ||
Line 188: | Line 188: | ||
CNS incorporates three dedicated ports for [[floating-point]] and vector operations. Two of the ports support [[fused-multiply-add|FMA operations]] while the third has the divide and crypto units. All three pipes are 256-bit wide. In terms of raw compute power, the total [[FLOPS]] per core is 16 double-precision FLOPs/cycle – reaching parity with AMD {{amd|Zen 2|l=arch}} as well as Intel's {{intel|Haswell|l=arch}}, {{intel|Broadwell|l=arch}}, and {{intel|Skylake (Client)|l=arch}}. | CNS incorporates three dedicated ports for [[floating-point]] and vector operations. Two of the ports support [[fused-multiply-add|FMA operations]] while the third has the divide and crypto units. All three pipes are 256-bit wide. In terms of raw compute power, the total [[FLOPS]] per core is 16 double-precision FLOPs/cycle – reaching parity with AMD {{amd|Zen 2|l=arch}} as well as Intel's {{intel|Haswell|l=arch}}, {{intel|Broadwell|l=arch}}, and {{intel|Skylake (Client)|l=arch}}. | ||
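The 16 FLOPs/cycle figure follows from the pipe width and the number of FMA ports quoted above. A minimal sketch of that arithmetic is shown here; the function and constant names are illustrative only, and it assumes the usual convention that one fused multiply-add counts as two floating-point operations.

```python
# Peak double-precision FLOPs per cycle per core, using the figures in the text.
# All names here are hypothetical and chosen for illustration.

DOUBLE_BITS = 64     # width of an IEEE 754 double-precision value
FLOPS_PER_FMA = 2    # assumption: an FMA counts as one multiply plus one add


def peak_dp_flops_per_cycle(pipe_width_bits=256, fma_ports=2):
    """Return peak DP FLOPs/cycle for a core with the given FMA resources."""
    lanes_per_pipe = pipe_width_bits // DOUBLE_BITS  # 4 doubles in a 256-bit pipe
    return lanes_per_pipe * FLOPS_PER_FMA * fma_ports


if __name__ == "__main__":
    print(peak_dp_flops_per_cycle())  # 16
```

The third floating-point port is left out of the calculation because, per the text, it hosts the divide and crypto units rather than an FMA unit.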
− | CNS added extensive [[x86]] ISA support, including new support for {{x86|AVX-512}}. CNS supports all the AVX-512 extensions supported by Intel's {{intel|Skylake (Server)|l=arch}} as well as those found in {{intel|Palm Cove|l=arch}}. From an implementation point of view, | + | CNS added extensive [[x86]] ISA support, including new support for {{x86|AVX-512}}. CNS supports all the AVX-512 extensions supported by Intel's {{intel|Skylake (Server)|l=arch}} as well as those found in {{intel|Palm Cove|l=arch}}. From an implementation point of view, the vector lanes of the CNS core are 256 bits wide, so AVX-512 operations are cracked into two 256-bit operations which are then scheduled independently. In other words, the wider instructions bring no throughput advantage. The design is similar to how AMD handled 256-bit AVX in their {{amd|Zen core|l=arch}}, where operations had to be executed as two 128-bit operations. Note that AVX-512 on CNS usually exhibits no downclocking: the core was designed to run at the full frequency of the core and the rest of the SoC. Centaur does implement a power management engine that is capable of downclocking certain power-sensitive SKUs if necessary. |
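The "no throughput advantage" point can be illustrated with a small idealized model. The sketch below is only a back-of-the-envelope calculation, not a description of Centaur's scheduler: it assumes two 256-bit FMA ports (as stated in the execution ports section), ignores every other bottleneck, and uses hypothetical helper names.

```python
# Idealized model of cracking wide vector operations into native-width micro-ops.
# Helper names are hypothetical; real scheduling behaviour is not modelled.


def crack(op_width_bits, native_width_bits=256):
    """Number of native-width micro-ops needed for one vector operation."""
    return max(1, -(-op_width_bits // native_width_bits))  # ceiling division


def fma_elements_per_cycle(op_width_bits, element_bits=64,
                           fma_ports=2, native_width_bits=256):
    """Elements processed per cycle when every FMA port is kept busy."""
    uops = crack(op_width_bits, native_width_bits)
    ops_per_cycle = fma_ports / uops          # whole vector ops finished per cycle
    return ops_per_cycle * (op_width_bits // element_bits)


if __name__ == "__main__":
    # A 512-bit op needs two micro-ops, so only one completes per cycle,
    # which yields the same element rate as two 256-bit ops per cycle.
    print(crack(512), fma_elements_per_cycle(256), fma_elements_per_cycle(512))
```

Under these assumptions the 256-bit and 512-bit cases both come out at eight doubles per cycle, which is the sense in which the wider encoding adds no throughput.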
=== Memory subsystem === | === Memory subsystem === | ||
[[File:cns mem subsys.svg|right|450px]] | [[File:cns mem subsys.svg|right|450px]] | ||
− | The memory subsystem on CNS features three ports - two generic AGUs and one store AGU port | + | The memory subsystem on CNS features three ports - two generic AGUs and one store AGU port. CNS supports 116 memory operations in-flight. The MOB consists of a 72-entry load-buffer and a 44-entry store buffer. |
− | CNS features a [[level 1 data cache]] with a capacity of 32 KiB. The cache is organized as 8 ways of 64 sets. It is fully multi-ported, | + | CNS features a [[level 1 data cache]] with a capacity of 32 KiB. The cache is organized as 8 ways of 64 sets. It is fully multi-ported, supporting 2 reads and 1 write every cycle. Each port is 32 B wide, so 512-bit memory operations, like their arithmetic counterparts, have to be cracked into two 256-bit operations. Using both read ports, CNS can complete a single 512-bit load each cycle. |
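The memory subsystem figures above can be cross-checked with a few lines of arithmetic. This is a sketch under stated assumptions: the 64-byte line size is not given in the text and is derived here from capacity, ways and sets, and all variable names are illustrative.

```python
# Consistency check of the memory subsystem numbers quoted in the text.
# Variable names are hypothetical; the line size is derived, not quoted.

L1D_CAPACITY_BYTES = 32 * 1024   # 32 KiB
L1D_WAYS = 8
L1D_SETS = 64
READ_PORTS = 2                   # 2 reads per cycle
PORT_WIDTH_BYTES = 32            # each port is 32 B wide
LOAD_BUFFER_ENTRIES = 72
STORE_BUFFER_ENTRIES = 44

# capacity = ways * sets * line size, so the implied line size is:
line_bytes = L1D_CAPACITY_BYTES // (L1D_WAYS * L1D_SETS)

# Two 32 B read ports together deliver 512 bits per cycle.
load_bits_per_cycle = READ_PORTS * PORT_WIDTH_BYTES * 8
loads_512_per_cycle = load_bits_per_cycle // 512

# Load and store buffers together give the in-flight memory operation count.
mob_entries = LOAD_BUFFER_ENTRIES + STORE_BUFFER_ENTRIES

if __name__ == "__main__":
    print(line_bytes, load_bits_per_cycle, loads_512_per_cycle, mob_entries)
```

The derived values (a 64-byte line, 512 bits of load bandwidth per cycle, one 512-bit load per cycle, and 116 in-flight memory operations) agree with the figures stated in this section.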
− | |||
− | |||
== NCORE NPU == | == NCORE NPU == |
Facts about "CHA - Microarchitectures - Centaur Technology"
codename | CHA + |
core count | 8 + |
designer | Centaur Technology + |
full page name | centaur/microarchitectures/cha + |
instance of | microarchitecture + |
instruction set architecture | x86-64 + |
manufacturer | TSMC + |
microarchitecture type | CPU + |
name | CHA + |
pipeline stages (max) | 22 + |
pipeline stages (min) | 20 + |
process | 16 nm (0.016 μm, 1.6e-5 mm) + |