{| class="wikitable"
! Processor Series !! Cores/Threads !! Market
|-
| EPYC 9004 "{{amd|Bergamo|l=core}}" || Up to 128/256 || Cloud [[multiprocessors]]; uses the smaller, almost half-size Zen 4c core (referred to as "Zen 4D" in leaks), which gives up half of the L3 cache
|-
| EPYC 8004 "{{amd|Siena|l=core}}" || Up to 64/128 || Edge-optimized server chips
|}

* {{x86|AVX-512}} instruction support, executed over a 256-bit data path<ref name="ryzen-7000-preview"/>
* L1 DTLB size increased from 64 to 72 entries and L2 DTLB from 2,048 to 3,072 entries
* Op cache size increased from 4,096 to 6,912 Ops per core
* L2 cache doubled from 512&nbsp;KiB to 1&nbsp;MiB per core (not all processor models), minimum latency increased from 12 to 14 cycles
* L3 cache average load-to-use latency increased from 46 to 50 cycles
* Capable of higher all-core clock speeds (shown by AMD to reach 5+ GHz on all cores)
* Larger integer register file (from 192 to 224 entries), floating-point register file (from 160 to 192 entries), and reorder buffer (from 256 to 320 entries)
* REPE CMPSB (sometimes used to implement string comparison) is significantly sped up and processes more than 32 bytes/cycle when operating on data in L1 (see the first sketch after this list)
* BSF, BSR, and the BMI1 instructions BLSI, BLSMSK, BLSR, and TZCNT now have 1-cycle latency and doubled throughput (4 instructions/cycle); see the second sketch after this list
* Improved latency and/or throughput for the VPERMx, V[P]BROADCASTx, and VPMOV{S,Z}Xx instructions
* Some ALU operations on vector registers increased throughput from 2 to 3 ops/cycle
* Some ALU operations on vector registers (VPABSx, VPHADDx, VPHSUBx, VPSLLx, VPSRLx, VPSRAx, VPACKx, VPSIGNx, VMAXx, VMINx) increased latency by 1 cycle
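
The <code>REPE CMPSB</code> speedup applies to the legacy x86 string-compare idiom. Below is a minimal sketch of that idiom, assuming GCC/Clang extended inline assembly on x86-64 (the <code>=@ccz</code> flag-output constraint requires GCC 6+ or a recent Clang); it illustrates how the instruction is used and is not AMD reference code.

<syntaxhighlight lang="c">
#include <stddef.h>

/* Return nonzero if the first n bytes of a and b are equal.
 * REPE CMPSB compares the bytes at [RSI] and [RDI], advancing both pointers
 * and decrementing RCX, and stops early on the first mismatch. ZF after the
 * instruction reports whether the last compared pair matched. */
int buffers_equal_repe_cmpsb(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    size_t count = n;
    int equal;

    if (n == 0)
        return 1;  /* nothing to compare */

    __asm__ volatile("repe cmpsb"
                     : "+S"(pa), "+D"(pb), "+c"(count), "=@ccz"(equal)
                     :
                     : "memory");
    return equal;
}
</syntaxhighlight>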
 
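The BSF/TZCNT/BLSR improvements matter most in bit-scanning loops. A minimal sketch using the BMI1 intrinsics from <code>immintrin.h</code> (compile with <code>-mbmi</code> or <code>-march=znver4</code>); the function and callback names are illustrative only.

<syntaxhighlight lang="c">
#include <stdint.h>
#include <immintrin.h>

/* Visit every set bit of a 64-bit mask, lowest index first.
 * _tzcnt_u64 compiles to TZCNT (index of the lowest set bit) and
 * _blsr_u64 to BLSR (clear the lowest set bit); per the list above,
 * Zen 4 runs these with 1-cycle latency at up to 4 per cycle. */
void visit_set_bits(uint64_t mask, void (*visit)(unsigned index))
{
    while (mask != 0) {
        visit((unsigned)_tzcnt_u64(mask));
        mask = _blsr_u64(mask);
    }
}
</syntaxhighlight>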
 
Package level changes:
 
** {{x86|AVX512_VNNI}} - Vector Neural Network Instructions (Ice Lake)
** {{x86|AVX512_BF16}} - [[bfloat16|BFloat16]] Instructions ({{intel|Cooper Lake|l=arch}})
** ''Not supported'': AVX512ER, AVX512PF ({{intel|Knights Landing|l=arch}}); AVX512 4VNNIW, 4FMAPS ({{intel|Knights Mill|l=arch}}); VP2INTERSECT ({{intel|Tiger Lake|l=arch}}); FP16 ({{intel|Sapphire Rapids|l=arch}})
 
* {{x86|GFNI}} - Galois Field New Instructions (first introduced with [[Intel]] {{intel|ice lake (server)|Ice Lake|l=arch}})
** <code>VGF2P8AFFINEQB</code> - Galois field affine transformation
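
To illustrate two of the extensions listed above, the sketch below uses the AVX512_VNNI dot-product instruction (<code>VPDPBUSD</code>, via <code>_mm512_dpbusd_epi32</code>) and a GFNI affine transform (<code>VGF2P8AFFINEQB</code>, via <code>_mm512_gf2p8affine_epi64_epi8</code>) that reverses the bit order within each byte. The bit-reversal matrix constant is a common idiom rather than something specified here; compile with <code>-march=znver4</code> or <code>-mavx512vnni -mgfni -mavx512f</code>.

<syntaxhighlight lang="c">
#include <immintrin.h>

/* AVX512_VNNI: multiply unsigned 8-bit values in a with signed 8-bit values
 * in b, sum each group of four adjacent products, and accumulate the result
 * into the corresponding 32-bit lane of acc (VPDPBUSD). */
__m512i dot_accumulate_u8s8(__m512i acc, __m512i a, __m512i b)
{
    return _mm512_dpbusd_epi32(acc, a, b);
}

/* GFNI: apply the affine transform y = A*x + b over GF(2) to every byte,
 * with b = 0. The 8x8 bit matrix 0x8040201008040201 reverses the bit order
 * within each byte. */
__m512i reverse_bits_in_bytes(__m512i x)
{
    const __m512i A = _mm512_set1_epi64((long long)0x8040201008040201ULL);
    return _mm512_gf2p8affine_epi64_epi8(x, A, 0);
}
</syntaxhighlight>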
 
==== Data and Instruction Caches ====
* L0 Op Cache:
** Up to 6,912 Ops per core, 12-way set associative
** 9 Op line size (restrictions apply depending on instruction type)
** Parity protected
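** 64 sets implied by the figures above (6,912 Ops ÷ 12 ways ÷ 9 Ops per line = 64); the set count is inferred here rather than quoted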