AMD organized Zen cores in groups called a '''CPU Complex''' ('''CCX'''). Each CCX consists of four cores connected to an L3 cache. The L3 cache is an 8 MiB 16-way set associative [[victim cache]] and is mostly [[exclusive cache|exclusive]] of the L2. The L3 cache is made of four slices (2 MiB of L3 slice per core), interleaved on the low-order address bits. Every core can access every L3 cache slice with the same average latency. When a core starts working on a chunk of memory it will fill up its L2, and as it continues to execute and fetch new data any spillover will find its way into the L3.
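As an illustration of low-order interleaving, consecutive cache lines land on successive slices so that streaming accesses spread evenly across all four. AMD has not published Zen's actual slice-selection function, so the bit positions in the sketch below are an assumption chosen purely for clarity.

<syntaxhighlight lang="c">
/* Illustration only: one simple way four L3 slices could be interleaved
 * on low-order address bits. AMD has not disclosed Zen's real slice
 * hash, so the bit positions below are an assumption for clarity. */
#include <stdint.h>
#include <stdio.h>

static unsigned slice_of(uint64_t paddr)
{
    /* 64 B cache lines -> drop bits [5:0], then use the two low-order
     * bits of the line address to pick one of the four slices. */
    return (unsigned)((paddr >> 6) & 0x3);
}

int main(void)
{
    /* four consecutive cache lines map to slices 0, 1, 2, 3 */
    for (uint64_t addr = 0; addr < 4 * 64; addr += 64)
        printf("line 0x%03llx -> slice %u\n",
               (unsigned long long)addr, slice_of(addr));
    return 0;
}
</syntaxhighlight>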
  
Depending on the exact processor model, there may be one or more CCXs joined together. For example, all mainstream {{amd|Ryzen 3}}/{{amd|Ryzen 5|5}}/{{amd|Ryzen 7|7}} models have two CCXs with up to 8 cores; on the 4- and 6-core parts an equal number of cores is disabled on each CCX. It's important to note that the L3 in Zen is not a true last level cache (LLC): the 16 MiB of L3$ consists of two separate 8 MiB caches rather than one unified L3. The separate CPU Complexes communicate with each other via the {{amd|Infinity Fabric}}, which connects the CCXs along with the memory controller and I/O. While the CCXs operate at core frequency ([[#Clock domains|CClk]]), the fabric itself operates at [[#Clock domains|MemClk]] (see [[#Clock domains|§ Clock domains]]). This design choice allows scaling up to large, high-performance multi-core systems (i.e., high scalability, particularly in the server segment, through high core counts and large bandwidth), but it does mean that systems using Zen processors have to treat every CPU Complex as a processor of its own - i.e., schedule tasks with [[cache-coherent non-uniform memory access]] (ccNUMA)-aware scheduling. This is important to ensure that threads are not moved from one CCX to the other, as doing so will likely incur unnecessary performance penalties (cached data would have to be transferred over the fabric from one CCX to the other, which adds latency and offers lower bandwidth).
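Where the operating system's scheduler cannot be relied on, an application can enforce CCX locality itself by restricting its threads' CPU affinity. The sketch below is a minimal Linux example; it assumes logical CPUs 0-3 belong to the first CCX, which is not guaranteed (SMT and BIOS settings affect the numbering) and should really be derived from the cache topology.

<syntaxhighlight lang="c">
/* Minimal sketch: pin the calling thread to logical CPUs 0-3, assuming
 * they map to the four cores of a single CCX. The real mapping should
 * be read from the cache topology (e.g. via hwloc or sysfs). */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int pin_to_ccx0(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = 0; cpu < 4; cpu++)   /* hypothetical CCX 0 = CPUs 0-3 */
        CPU_SET(cpu, &set);

    /* pid 0 = calling thread; keeps its working set within one L3 */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}
</syntaxhighlight>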
 
 
While specific worst-case performance tests have shown that rapid inter-CCX data movement incurs a substantial penalty, real-world tests have shown the penalty is rather small in practice because the operating system scheduler (e.g., Windows) generally avoids migrating threads between CCXs. Additionally, performance can be improved with faster memory, which in turn also raises the fabric frequency (see [[#Clock domains|§ Clock domains]]).
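On Linux, which logical CPUs share an L3 (and therefore belong to the same CCX) can be read from the sysfs cache topology. The minimal sketch below assumes <code>cache/index3</code> describes the L3, which is the usual case.

<syntaxhighlight lang="c">
/* Minimal sketch: for each logical CPU, print which CPUs share its L3,
 * i.e. which CPUs belong to the same CCX, using the Linux sysfs cache
 * topology. Assumes cache/index3 is the L3 on this system. */
#include <stdio.h>

int main(void)
{
    char path[128], line[256];

    for (int cpu = 0; ; cpu++) {
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cache/index3/shared_cpu_list",
                 cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                      /* no more CPUs (or no L3 info) */
        if (fgets(line, sizeof(line), f))
            printf("cpu%-3d shares its L3 with CPUs %s", cpu, line);
        fclose(f);
    }
    return 0;
}
</syntaxhighlight>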

* codename: Zen
* core count: 4, 6, 8, 12, 16, 24, 32
* designer: AMD
* first launched: March 2, 2017
* full page name: amd/microarchitectures/zen
* instance of: microarchitecture
* instruction set architecture: x86-64
* manufacturer: GlobalFoundries
* microarchitecture type: CPU
* name: Zen
* pipeline stages: 19
* process: 14 nm