From WikiChip
Difference between revisions of "amd/microarchitectures/zen 4"

(Key changes from {{\\|Zen 3}}: improved grammar)
(6.75k is not 6750, increase from 4096=8*8*2^6 to 6912=9*12*2^6 (=4096*9/8 +50%))
 
(40 intermediate revisions by 15 users not shown)
Line 6: Line 6:
 
|manufacturer=TSMC
 
|manufacturer=TSMC
 
|process=5 nm
 
|process=5 nm
 +
|process 2=6 nm
 
|predecessor=Zen 3
 
|predecessor=Zen 3
 
|predecessor link=amd/microarchitectures/zen 3
 
|predecessor link=amd/microarchitectures/zen 3
Line 12: Line 13:
 
|succession=Yes
 
|succession=Yes
 
}}
 
}}
'''Zen 4''' is a planned [[microarchitecture]] being developed by [[AMD]] as a successor to {{\\|Zen 3}}.
+
'''Zen 4''' is a [[microarchitecture]] developed by [[AMD]] as a successor to {{\\|Zen 3}}. See press release for details: [https://www.amd.com/en/press-releases/2022-08-29-amd-launches-ryzen-7000-series-desktop-processors-zen-4-architecture-the AMD Launches Ryzen 7000 Series Desktop Processors]
  
 
== History ==
 
== History ==
Line 18: Line 19:
 
Zen 4 was first mentioned by Forrest Norrod during AMD's EPYC One Year Anniversary webinar. During the next horizon event which was held on November 6, 2018, AMD stated that Zen 4 was at the design completion phase.
 
Zen 4 was first mentioned by Forrest Norrod during AMD's EPYC One Year Anniversary webinar. During the next horizon event which was held on November 6, 2018, AMD stated that Zen 4 was at the design completion phase.
  
== Process Technology ==
+
== Products ==
AMD claims that Zen4 is going to be produced on a [[5nm]] node by [[TSMC]].
 
 
 
== Codenames ==
 
 
{{future information}}
 
{{future information}}
  
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Core !! C/T !! Target
+
! Processor Series !! Cores/Threads !! Market
 +
|-
 +
| EPYC 9004 "{{amd|Genoa|l=core}}" || Up to 96/192 || High-end server [[multiprocessors]]
 +
|-
 +
| Ryzen Threadripper 7000 "{{amd|Storm Peak|l=core}}" || Up to 96/192 || Workstation & enthusiasts
 
|-
 
|-
| {{amd|Genoa|l=core}} || Up to 96/192 || High-end server [[multiprocessors]]
+
| Ryzen 7000 "{{amd|Raphael|l=core}}" || Up to 16/32 || Mainstream to high-end desktops & enthusiasts
 
|-
 
|-
| {{amd|Warhol|l=core}} || Up to 20/40 || Mainstream to high-end desktops & enthusiasts market processors
+
| Ryzen 7000 APU "{{amd|Dragon Range|l=core}}" || Up to 16/32 || High-end mobile processors with GPU
 
|-
 
|-
| {{amd|Rembrandt|l=core}} || Up to 8/16 || Mainstream desktop & mobile processors with GPU  
+
| Ryzen 7000 APU "{{amd|Phoenix Point|l=core}}" || Up to 8/16 || Mainstream desktop & mobile processors with GPU  
 
|}
 
|}
  
Line 39: Line 41:
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Core !! C/T !! Target
+
! Processor Series !! Cores/Threads !! Market
 +
|-
 +
| EPYC 9004 "{{amd|Bergamo|l=core}}" || Up to 128/256  || Cloud [[multiprocessors]] (smaller, almost half-size Zen 4c [referred to as “Zen 4D” in leaks] core sacrificing half of the L3 cache.)
 +
|-
 +
| EPYC 8004 "{{amd|Siena|l=core}}" || Up to 64/128 || Edge-optimized server chips
 +
|}
 +
 
 +
'''Architectural Codenames:'''
 +
{| class="wikitable"
 +
|-
 +
! Arch !! Codename
 +
|-
 +
| Core || Persephone
 
|-
 
|-
| {{amd|Bergamo|l=core}} || Up to 128/128?  || Cloud multiprocessing (smaller, almost half-size Zen 4c [referred to as “Zen 4D” in leaks] core likely sacrificing AVX-512, L3 and possibly SMT)
+
| {{abbr|CCD}} || Durango
 
|}
 
|}
 +
 +
== Process Technology ==
 +
Processors implementing Zen 4 are {{abbr|SoC}}s configured as a Multi-Chip Module or monolithic chip. MCMs consist of a single I/O die and up to 12 Core Complex Dies attached with full-duplex serial point-to-point links. The IOD contains memory controllers, I/O controllers, microcontrollers for security purposes and power management, and other peripherals. The CCDs communicate with peripherals and each other through the Data and Control Fabrics on the I/O die, and each contain a single Core Complex (CCX). The monolithic chips integrate a subset of the IOD facilities and additional peripherals tailored for their target market, a CCX, and a GPU. A CCX contains 8 CPU cores (fewer may be usable on some models) communicating through a shared L3 cache.
 +
 +
("Bergamo" processors configuration TBD.)
 +
 +
The chips are fabricated by [[TSMC]], CCDs and monolithic chips on a [[5&nbsp;nm]] node, IODs on a [[6&nbsp;nm]] node.
 +
 
== Architecture ==
 
== Architecture ==
Little is currently known about the architectural improvements that are being done to Zen 4.
+
Zen 4 is a 64-bit superscalar, out-of-order, 2-way [[SMT]] microarchitecture with advanced dynamic branch prediction, 4-way decoding of [[x86]] instructions with a stack optimizer, multiple caches including an Op cache for decoded instructions and prefetchers for code and data, four integer/address and two floating point instruction schedulers, 3-way address generation, 5-way integer execution. 4-way 256-bit wide floating point execution, a speculative, out-of-order load/store unit capable of up to three loads or two stores per cycle with a 48/88-entry load and 64-entry store queue, write-combining, and 5-level paging with four {{abbr|TLB}}s and six hardware page table walkers.
  
 
=== Key changes from {{\\|Zen 3}} ===
 
=== Key changes from {{\\|Zen 3}} ===
{{empty section}}
+
* {{x86|AVX-512}} instructions support, 256-bit data path<ref name="ryzen-7000-preview"/>
*Raised maximum core/thread count from 64/128 to at least 96/192
+
* L1 and L2 DTLB size increased from 64 to 72 and 2,048 to 3,072 entries
(Bergamo supports 128 cores but preliminary data shows a slightly altered architecture featuring cores that take up less space)
+
* Op cache size increased from 4,096 to 6,912 Ops per core
*Improved cache load, write and prefetch from/to register (less latency).
+
* L2 cache doubled from 512&nbsp;KiB to 1&nbsp;MiB per core (not all processor models), latency increased from 12 to 14 cycles minimum
*Utilizes new AM5 socket and is confirmed to support DDR5 and PCI-E 5.
+
* L3 cache average load-to-use latency increased from 46 to 50 cycles
*Higher Transistor Density, due to 5nm process
+
* Five-level paging; Max. physical and linear address size raised from 48 to 52 and 57 bits respectively
*Double the L2 cache when compared to Zen 3
+
* Improved cache load, write and prefetch from/to register (less latency)
*Capable of higher all-core clockspeeds (shown by AMD to reach 5GHz+ on all cores)
+
* Higher Transistor Density, due to 5nm process
 +
* Capable of higher all-core clockspeeds (shown by AMD to reach 5GHz+ on all cores)
 +
* Larger integer register file (from 192 to 224), floating-point register file (from 160 to 192) and reorder buffer (from 256 to 320 entries)
 +
* REPE CMPSB (sometimes used to implement string comparison) is significantly sped up, processes more than 32 bytes/cycle when operating on L1 data.
 +
* BSF, BSR, and BMI1 instructions BLSI, BLSMSK, BLSR, TZCNT have smaller latency of 1 and x2 throughput (4 insn/cycle).
 +
* Latency and/or throughput of VPERMx, V[P]BROADCASTx, VPMOV{S,Z}Xx instructions improved.
 +
* Some ALU operations on vector registers increased throughput from 2 to 3 ops/cycle.
 +
* Some ALU operations on vector registers (VPABSx,VPHADDx,VPHSUBx,VPSLLx,VPSRLx,VPSRAx,VPACKx,VPSIGNx,VMAXx,VMINx) increased latency by 1 cycle.
 +
 
 +
 
 +
Package level changes:
 +
* EPYC 9004 "{{amd|Genoa|l=core}}": Max. core/thread count 96/192, up from 64/128 on EPYC 7003 "{{amd|Milan|l=core}}"
 +
* EPYC "{{amd|Bergamo|l=core}}": Max. 128 cores but preliminary data shows a slightly altered architecture featuring cores that take up less space
 +
* Support for DDR5 memory and PCIe Gen 5
 +
* New sockets {{amd|AM5|l=pack}} (client), {{amd|SP5|l=pack}} and {{amd|SP6|l=pack}} (server), {{amd|FP7|FP7/FP7r2|l=pack}} (mobile)
 +
* {{abbr|APU}}s: RDNA2-based iGPU with 2 compute units (128 stream processors)
 +
 
 +
=== New Instructions ===
 +
Zen 4 introduced the following ISA enhancements:
 +
 
 +
* {{x86|AVX-512}} - 512-bit Vector Instructions
 +
** {{x86|AVX512F}} - Foundation (first introduced with [[Intel]] {{intel|skylake (server)|Skylake|l=arch}})
 +
** {{x86|AVX512CD}} - Conflict Detection Instructions ({{intel|Skylake X|l=core}})
 +
** {{x86|AVX512VL}} - Vector Length Extensions (Skylake X)
 +
** {{x86|AVX512DQ}} - Doubleword and Quadword Instructions (Skylake X)
 +
** {{x86|AVX512BW}} - Byte and Word Instructions (Skylake X)
 +
** {{x86|AVX512_IFMA}} - Integer Fused Multiply-Add ({{intel|Cannon Lake|l=arch}})
 +
** {{x86|AVX512_VBMI}} - Vector Bit Manipulation Instructions (Cannon Lake)
 +
** {{x86|AVX512_VPOPCNTDQ}} - Vector Population Count Instructions ({{intel|ice lake (server)|Ice Lake|l=arch}})
 +
** {{x86|AVX512_BITALG}} - Bit Algorithms (Ice Lake)
 +
** {{x86|AVX512_VBMI2}} - Vector Bit Manipulation Instructions 2 (Ice Lake)
 +
** {{x86|AVX512_VNNI}} - Vector Neural Network Instructions (Ice Lake)
 +
** {{x86|AVX512_BF16}} - [[bfloat16|BFloat16]] Instructions ({{intel|Cooper Lake|l=arch}})
 +
** ''Not supported'': AVX512ER, AVX512PF ({{intel|Knights Landing|l=arch}}); AVX512 4VNNIW, 4FMAPS ({{intel|Knights Mill|l=arch}}); VP2INTERSECT ({{intel|Tiger Lake|l=arch}}); FP16 ({{intel|Sapphire Rapids|l=arch}})
 +
* {{x86|GFNI}} - Galois Field New Instructions (first introduced with [[Intel]] {{intel|ice lake (server)|Ice Lake|l=arch}})
 +
** <code>VGF2P8AFFINEQB</code> - Galois field affine transformation
 +
** <code>VGF2P8AFFINEINVQB</code> - Galois field affine transformation inverse
 +
** <code>VGF2P8MULB</code> - Galois field multiply bytes
 +
 
 +
=== Memory Hierarchy ===
 +
==== Data and Instruction Caches ====
 +
* L0 Op Cache:
 +
** Up to 6,912 Ops per core, 12-way set associative
 +
** 9 Op line size (restrictions apply depending on instruction type)
 +
** Parity protected
 +
* L1I Cache:
 +
** 32&nbsp;KiB per core, 8-way set associative
 +
** 64&nbsp;B line size
 +
** Parity protected
 +
* L1D Cache:
 +
** 32&nbsp;KiB per core, 8-way set associative
 +
** 64&nbsp;B line size
 +
** Write-back policy
 +
** 4-5 cycles latency for Int
 +
** 7-8 cycles latency for FP
 +
** ECC
 +
* L2 Cache:
 +
** 512&nbsp;KiB or 1&nbsp;MiB per core (varies by processor model), 8-way set associative
 +
** 64&nbsp;B line size
 +
** Write-back policy
 +
** Inclusive of L1
 +
** ≥ 14 cycles latency
 +
** {{abbr|DEC-TED}} ECC, tag & state arrays {{abbr|SEC-DED}}<!--7 check bits for 42 tag bits; AMD-55901-0.97 Sec 3.5-->
 +
* L3 Cache:
 +
** "{{amd|Genoa|l=core}}": up to 32&nbsp;MiB/{{abbr|CCX}} (8 cores), up to 384&nbsp;MiB total
 +
** Shared by all cores in the CCX, configurable
 +
** 16-way set associative
 +
** 64&nbsp;B line size
 +
** L2 [[victim cache]]
 +
** Write-back policy
 +
** 50 cycles average load-to-use latency
 +
** DEC-TED ECC, tag array & shadow tags SEC-DED<!--AMD-55901-0.97 Sec 3.5-->
 +
** QoS Monitoring and Enforcement with {{abbr|BMEC|Bandwidth Monitoring Event Configuration}}, {{abbr|L3RR|L3 Range Reservation}}, {{abbr|L3SBE|L3 External Slow Memory Bandwidth Enforcement}}
 +
 
 +
==== Translation Lookaside Buffers ====
 +
* ITLB
 +
** 64 entry L1 TLB, fully associative
 +
*** 4-Kbyte, 2-Mbyte, 1-Gbyte page sizes
 +
** 512 entry L2 TLB, 8-way set associative
 +
*** 4-Kbyte, 2-Mbyte, and 4-Mbyte pages
 +
** Parity protected
 +
* DTLB
 +
** 72 entry L1 TLB, fully associative
 +
*** 4-Kbyte, 16-Kbyte, 2-Mbyte, 1-Gbyte page sizes
 +
** 3,072 entry L2 TLB, 24-way set associative
 +
*** 4-Kbyte, 16-Kbyte, 2-Mbyte, and 4-Mbyte pages, {{abbr|PDE|Page Directory Entry}}s to speed up table walks
 +
** Parity protected
 +
 
 +
4-Mbyte pages require two 2-Mbyte entries in all TLBs. 16-Kbyte page size refers to {{abbr|PTE|Page Table Entry}} coalescing of four physically consecutive and 16-Kbyte aligned 4-Kbyte pages. All caches and TLBs are competitively shared in multi-threaded mode.
 +
 
 +
==== System DRAM ====
 +
* Ryzen 7000 "{{amd|Raphael|l=core}}":
 +
** Up to PC5-41600 (DDR5-5200) without overclocking
 +
 
 +
* EPYC 9004 "{{amd|Genoa|l=core}}":
 +
** 12 channels per socket, two 40-bit (32 data, 8 ECC) DDR5 subchannels per channel
 +
** Up to 24 DIMMs, max. 6&nbsp;TiB
 +
** Up to PC5-38400 (DDR5-4800)
 +
** {{abbr|SR}}/{{abbr|DR}} {{abbr|RDIMM}}, {{abbr|4R}}/{{abbr|8R}} {{abbr|LRDIMM}}, {{abbr|3DS DIMM}}
 +
** ECC supported (x4, x8, x16, chipkill)<!--AMD-55901-0.97 Sec 3.7-->
 +
** DRAM bus parity and write data CRC options<!--ibid-->
 +
 
 +
Sources: <ref name="amd-55901-ppr-1911"/><ref name="amd-57647-zen4-optim"/><ref name="amd-58015-9004-overv"/>
  
== Bibliography ==
+
== All Zen 4 Processors ==
{{reflist}}
+
<!-- NOTE:
 +
This table is generated automatically from the data in the actual articles.
 +
If a microprocessor is missing from the list, an appropriate article for it needs to be
 +
created and tagged accordingly.
 +
Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips
 +
-->
 +
{| class="comptable3"
 +
! List of all Zen 4-based Processors
 +
|}
 +
<div class="comptable-scroller sticky">
 +
{| class="comptable3 stickycol1 sortable"
 +
|- class="header continued"
 +
! Model
 +
! Codename
 +
! {{abbr|C|Cores}}
 +
! {{abbr|T|Threads}}
 +
! data-sort-type=number | L2$
 +
! data-sort-type=number | L3$
 +
! data-sort-type=number | Frequ.
 +
! data-sort-type=number | Turbo
 +
! data-sort-type=number | Turbo 1C
 +
! Memory
 +
! data-sort-type=number | {{abbr|TDP}}
 +
! data-sort-type=date | Launched
 +
! Release<br />Price
 +
! {{abbr|OPN}}
 +
|- class="separator sortbottom"
 +
| colspan=4 | [[Uniprocessors]]
 +
| colspan=10 | &nbsp;
 +
{{#invoke:comptable|askt
 +
|condition=[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] [[max cpu count::1]]
 +
|sort=name |valuesep=,<br /> |template=<nowiki>
 +
|-
 +
| data-sort-value="{{{name#-}}}" | {{amd|{{{microprocessor family#-}}}}} [[{{{page#-}}}|{{{model number#-}}}]]
 +
| {{amd|{{{core name#-}}}|l=core}}
 +
| {{{core count}}}
 +
| {{{thread count}}}
 +
| {{{l2$ size}}}
 +
| {{{l3$ size}}}
 +
| {{{base frequency#GHz}}}
 +
| {{{turbo frequency#GHz}}}
 +
| {{{turbo frequency (1 core)#GHz}}}
 +
| {{{supported memory type}}}
 +
| {{{tdp}}}
 +
| {{{first launched}}}
 +
| {{#if:{{{release price}}}|{{{release price}}}{{#ifeq:{{{release price}}}|{{{release price (tray)}}}|&#32;(1k)}} }}
 +
| {{{part number}}}</nowiki>|outrotemplate=<nowiki>
 +
|- class="separator sortbottom"
 +
| colspan=4 | [[Multiprocessors]] (dual-socket)
 +
| colspan=10 | &nbsp;
 +
{{#invoke:comptable|askt
 +
|condition=[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] [[max cpu count::>>1]]
 +
|sort=name |valuesep=,<br /> |template=&lt;nowiki>{{{#template}}}&lt;/nowiki>}}</nowiki>}}
 +
|-
 +
! Count: {{#ask:[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] |format=count}}
 +
|}
 +
</div>
  
 
== Designers ==
 
== Designers ==
Line 63: Line 243:
  
 
== Bibliography ==
 
== Bibliography ==
{{reflist}}
+
 
 +
== References ==
 +
<references>
 +
<ref name="ryzen-7000-preview">{{cite techdoc|title=Ryzen 7000 Desktop Preview|url=https://www.angstronomics.com/p/ryzen-7000-desktop-preview|publ=Angstronomics|date=2022-08-29}}</ref>
 +
<ref name="amd-55901-ppr-1911">{{cite techdoc|title=Processor Programming Reference (PPR) for AMD Family 19h Models 11h, Revision B1 Processors|url=https://www.amd.com/system/files/TechDocs/55901_0.25.zip|publ=AMD|pid=55901|rev=0.25|date=2022-11-10}}</ref>
 +
<ref name="amd-57647-zen4-optim">{{cite techdoc|title=Software Optimization Guide for the AMD Zen4 Microarchitecture|url=https://www.amd.com/system/files/TechDocs/57647.zip|publ=AMD|pid=57647|rev=1.00|date=2023-01-06}}</ref>
 +
<ref name="amd-58015-9004-overv">{{cite techdoc|title=AMD EPYC™ 9004 Series Architecture Overview|url=https://www.amd.com/system/files/documents/58015-epyc-9004-tg-architecture-overview.pdf|publ=AMD|pid=58015|rev=1.1|date=2022-12}}</ref>
 +
</references>
  
 
== See Also ==
 
== See Also ==
* AMD {{\\|Zen}}
+
* AMD {{\\|Zen}}, {{\\|Zen 2}}, {{\\|Zen 3}}
 
* Intel {{intel|Meteor Lake|l=arch}}
 
* Intel {{intel|Meteor Lake|l=arch}}

Latest revision as of 18:28, 13 November 2023

Zen 4 µarch
General Info
Arch Type: CPU
Designer: AMD
Manufacturer: TSMC
Process: 5 nm, 6 nm
Predecessor: Zen 3

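Zen 4 is a microarchitecture developed by AMD as a successor to Zen 3. See the press release for details: AMD Launches Ryzen 7000 Series Desktop Processors (https://www.amd.com/en/press-releases/2022-08-29-amd-launches-ryzen-7000-series-desktop-processors-zen-4-architecture-the).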

History

Figure: Zen 4 on the roadmap.

Zen 4 was first mentioned by Forrest Norrod during AMD's EPYC One Year Anniversary webinar. At the Next Horizon event held on November 6, 2018, AMD stated that Zen 4 was in the design completion phase.

Products

Preliminary Data! Information presented in this article deals with future products, data, features, and specifications that have yet to be finalized, announced, or released. Information may be incomplete and can change by final release.
Processor Series Cores/Threads Market
EPYC 9004 "Genoa" Up to 96/192 High-end server multiprocessors
Ryzen Threadripper 7000 "Storm Peak" Up to 96/192 Workstation & enthusiasts
Ryzen 7000 "Raphael" Up to 16/32 Mainstream to high-end desktops & enthusiasts
Ryzen 7000 APU "Dragon Range" Up to 16/32 High-end mobile processors with GPU
Ryzen 7000 APU "Phoenix Point" Up to 8/16 Mainstream desktop & mobile processors with GPU

Cores using variant Zen 4 uarch:

Processor Series Cores/Threads Market
EPYC 9004 "Bergamo" Up to 128/256 Cloud multiprocessors (smaller, almost half-size Zen 4c [referred to as “Zen 4D” in leaks] core sacrificing half of the L3 cache.)
EPYC 8004 "Siena" Up to 64/128 Edge-optimized server chips

Architectural Codenames:

Arch Codename
Core Persephone
CCD Durango

Process Technology

Processors implementing Zen 4 are SoCs configured as either a multi-chip module (MCM) or a monolithic chip. MCMs consist of a single I/O die (IOD) and up to 12 Core Complex Dies (CCDs) attached with full-duplex serial point-to-point links. The IOD contains memory controllers, I/O controllers, microcontrollers for security and power management, and other peripherals. Each CCD contains a single Core Complex (CCX), and the CCDs communicate with peripherals and with each other through the Data and Control Fabrics on the I/O die. The monolithic chips integrate a subset of the IOD facilities, additional peripherals tailored to their target market, a CCX, and a GPU. A CCX contains 8 CPU cores (fewer may be usable on some models) that communicate through a shared L3 cache.

("Bergamo" processors configuration TBD.)

The chips are fabricated by TSMC: CCDs and monolithic chips on a 5 nm node, IODs on a 6 nm node.
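As a rough check of the figures above, the following sketch derives socket-level totals from the MCM layout. The 8 cores per CCX, one CCX per CCD, and the limit of 12 CCDs come from the text; the 32 MiB of L3 per CCX is the "Genoa" figure from the Memory Hierarchy section. The class and field names are illustrative only and do not correspond to any AMD tool or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McmConfig:
    """Hypothetical description of a Zen 4 multi-chip module."""
    ccds: int                    # Core Complex Dies on the package (up to 12)
    cores_per_ccx: int = 8       # one CCX per CCD, 8 cores per CCX
    threads_per_core: int = 2    # 2-way SMT
    l3_mib_per_ccx: int = 32     # "Genoa" maximum per CCX

    @property
    def cores(self) -> int:
        return self.ccds * self.cores_per_ccx

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def l3_mib(self) -> int:
        return self.ccds * self.l3_mib_per_ccx


genoa_max = McmConfig(ccds=12)
print(genoa_max.cores, genoa_max.threads, genoa_max.l3_mib)  # 96 192 384
```

With all 12 CCDs fully enabled this reproduces the 96/192 core/thread count and the 384 MiB L3 total quoted for "Genoa". Models with fewer usable cores per CCX would need a different `cores_per_ccx`.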

Architecture

Zen 4 is a 64-bit superscalar, out-of-order, 2-way SMT microarchitecture with advanced dynamic branch prediction, 4-way decoding of x86 instructions with a stack optimizer, multiple caches including an Op cache for decoded instructions and prefetchers for code and data, four integer/address and two floating point instruction schedulers, 3-way address generation, 5-way integer execution, 4-way 256-bit wide floating point execution, a speculative, out-of-order load/store unit capable of up to three loads or two stores per cycle with a 48/88-entry load and 64-entry store queue, write-combining, and 5-level paging with four TLBs and six hardware page table walkers.

Key changes from Zen 3

  • AVX-512 instruction support with a 256-bit data path[1]
  • L1 and L2 DTLB sizes increased from 64 to 72 and from 2,048 to 3,072 entries respectively
  • Op cache size increased from 4,096 to 6,912 Ops per core
  • L2 cache doubled from 512 KiB to 1 MiB per core (not all processor models); minimum latency increased from 12 to 14 cycles
  • L3 cache average load-to-use latency increased from 46 to 50 cycles
  • Five-level paging; maximum physical and linear address sizes raised from 48 bits to 52 and 57 bits respectively
  • Lower-latency cache loads, writes, and prefetches from/to registers
  • Higher transistor density due to the 5 nm process
  • Capable of higher all-core clock speeds (shown by AMD to reach 5 GHz+ on all cores)
  • Larger integer register file (from 192 to 224 entries), floating-point register file (from 160 to 192 entries) and reorder buffer (from 256 to 320 entries)
  • REPE CMPSB (sometimes used to implement string comparison) is significantly faster, processing more than 32 bytes/cycle when operating on L1 data
  • BSF, BSR, and the BMI1 instructions BLSI, BLSMSK, BLSR, and TZCNT have a lower latency of 1 cycle and twice the throughput (4 instructions/cycle)
  • Improved latency and/or throughput of the VPERMx, V[P]BROADCASTx, and VPMOV{S,Z}Xx instructions
  • Throughput of some ALU operations on vector registers increased from 2 to 3 ops/cycle
  • Latency of some ALU operations on vector registers (VPABSx, VPHADDx, VPHSUBx, VPSLLx, VPSRLx, VPSRAx, VPACKx, VPSIGNx, VMAXx, VMINx) increased by 1 cycle
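To put the address-size and Op cache figures from the list above into perspective, here is a small arithmetic sketch. The bit widths (48, 52, 57) and Op cache capacities (4,096 and 6,912) are taken from the list; the helper name `address_space` is made up for illustration.

```python
def address_space(bits: int) -> str:
    """Return the size of a 2**bits-byte address space in binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    index, remainder = divmod(bits, 10)   # every 10 bits is one unit step
    return f"{2 ** remainder} {units[index]}"


# Zen 3 limit, Zen 4 physical limit, Zen 4 linear (virtual) limit
print(address_space(48), address_space(52), address_space(57))
# 256 TiB 4 PiB 128 PiB

# Relative growth of the Op cache from 4,096 to 6,912 Ops
print(f"{6912 / 4096 - 1:.2%}")  # 68.75%
```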


Package level changes:

  • EPYC 9004 "Genoa": Max. core/thread count 96/192, up from 64/128 on EPYC 7003 "Milan"
  • EPYC "Bergamo": Max. 128 cores but preliminary data shows a slightly altered architecture featuring cores that take up less space
  • Support for DDR5 memory and PCIe Gen 5
  • New sockets AM5 (client), SP5 and SP6 (server), FP7/FP7r2 (mobile)
  • APUs: RDNA2-based iGPU with 2 compute units (128 stream processors)

New Instructions

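Zen 4 introduced the following ISA enhancements:

  • AVX-512 - 512-bit Vector Instructions
    • AVX512F - Foundation (first introduced with Intel Skylake)
    • AVX512CD - Conflict Detection Instructions (Skylake X)
    • AVX512VL - Vector Length Extensions (Skylake X)
    • AVX512DQ - Doubleword and Quadword Instructions (Skylake X)
    • AVX512BW - Byte and Word Instructions (Skylake X)
    • AVX512_IFMA - Integer Fused Multiply-Add (Cannon Lake)
    • AVX512_VBMI - Vector Bit Manipulation Instructions (Cannon Lake)
    • AVX512_VPOPCNTDQ - Vector Population Count Instructions (Ice Lake)
    • AVX512_BITALG - Bit Algorithms (Ice Lake)
    • AVX512_VBMI2 - Vector Bit Manipulation Instructions 2 (Ice Lake)
    • AVX512_VNNI - Vector Neural Network Instructions (Ice Lake)
    • AVX512_BF16 - BFloat16 Instructions (Cooper Lake)
    • Not supported: AVX512ER, AVX512PF (Knights Landing); AVX512 4VNNIW, 4FMAPS (Knights Mill); VP2INTERSECT (Tiger Lake); FP16 (Sapphire Rapids)
  • GFNI - Galois Field New Instructions (first introduced with Intel Ice Lake)
    • VGF2P8AFFINEQB - Galois field affine transformation
    • VGF2P8AFFINEINVQB - Galois field affine transformation inverse
    • VGF2P8MULB - Galois field multiply bytes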
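Software usually checks for these extensions at run time before using them. The sketch below assumes a Linux host, where the kernel exposes CPUID feature bits as flag names in /proc/cpuinfo. It reports which of the extensions Zen 4 adds (AVX-512 F, CD, VL, DQ, BW, IFMA, VBMI, VPOPCNTDQ, BITALG, VBMI2, VNNI, BF16, and GFNI) are missing. The function and constant names are illustrative; other operating systems need a different detection method.

```python
from __future__ import annotations

# Linux /proc/cpuinfo flag names for the extensions Zen 4 adds.
# Assumption: a Linux host; this is not a portable CPUID query.
ZEN4_NEW_FLAGS = {
    "avx512f", "avx512cd", "avx512vl", "avx512dq", "avx512bw",
    "avx512ifma", "avx512vbmi", "avx512_vpopcntdq", "avx512_bitalg",
    "avx512_vbmi2", "avx512_vnni", "avx512_bf16", "gfni",
}


def missing_zen4_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags in ZEN4_NEW_FLAGS absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return ZEN4_NEW_FLAGS - set(value.split())
    # No 'flags' line at all: treat every extension as missing.
    return set(ZEN4_NEW_FLAGS)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(sorted(missing_zen4_flags(f.read())))
    except FileNotFoundError:
        print("no /proc/cpuinfo on this system")
```

An empty list means the kernel reports every listed extension as present. This alone does not identify the CPU as Zen 4, since the source notes that Intel processors introduced these extensions first.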

Memory Hierarchy

Data and Instruction Caches

  • L0 Op Cache:
    • Up to 6,912 Ops per core, 12-way set associative
    • 9 Op line size (restrictions apply depending on instruction type)
    • Parity protected
  • L1I Cache:
    • 32 KiB per core, 8-way set associative
    • 64 B line size
    • Parity protected
  • L1D Cache:
    • 32 KiB per core, 8-way set associative
    • 64 B line size
    • Write-back policy
    • 4-5 cycles latency for Int
    • 7-8 cycles latency for FP
    • ECC
  • L2 Cache:
    • 512 KiB or 1 MiB per core (varies by processor model), 8-way set associative
    • 64 B line size
    • Write-back policy
    • Inclusive of L1
    • ≥ 14 cycles latency
    • DEC-TED ECC, tag & state arrays SEC-DED
  • L3 Cache:
    • "Genoa": up to 32 MiB/CCX (8 cores), up to 384 MiB total
    • Shared by all cores in the CCX, configurable
    • 16-way set associative
    • 64 B line size
    • L2 victim cache
    • Write-back policy
    • 50 cycles average load-to-use latency
    • DEC-TED ECC, tag array & shadow tags SEC-DED
    • QoS Monitoring and Enforcement with BMEC, L3RR, L3SBE
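The capacities, associativities, and line sizes listed above determine how many sets each cache has: sets = capacity / (ways × line size). The sketch below applies that standard relation to the listed figures. Treating the Op cache the same way, with Ops as the unit, is an assumption made for illustration; the helper name is not from any AMD document.

```python
def num_sets(capacity: int, ways: int, line_size: int) -> int:
    """Sets in a set-associative cache; capacity and line_size share a unit."""
    sets, remainder = divmod(capacity, ways * line_size)
    if remainder:
        raise ValueError("capacity is not a multiple of ways * line_size")
    return sets


KIB, MIB = 1024, 1024 * 1024

print(num_sets(32 * KIB, 8, 64))    # L1I / L1D: 64
print(num_sets(1 * MIB, 8, 64))     # L2 (1 MiB variant): 2048
print(num_sets(32 * MIB, 16, 64))   # L3 per CCX: 32768
print(num_sets(6912, 12, 9))        # Op cache, counted in Ops: 64
```

The Op cache result matches the factorization 6912 = 9 × 12 × 2^6 given in the revision summary at the top of this page.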

Translation Lookaside Buffers

  • ITLB
    • 64 entry L1 TLB, fully associative
      • 4-Kbyte, 2-Mbyte, 1-Gbyte page sizes
    • 512 entry L2 TLB, 8-way set associative
      • 4-Kbyte, 2-Mbyte, and 4-Mbyte pages
    • Parity protected
  • DTLB
    • 72 entry L1 TLB, fully associative
      • 4-Kbyte, 16-Kbyte, 2-Mbyte, 1-Gbyte page sizes
    • 3,072 entry L2 TLB, 24-way set associative
      • 4-Kbyte, 16-Kbyte, 2-Mbyte, and 4-Mbyte pages, PDEs to speed up table walks
    • Parity protected

4-Mbyte pages require two 2-Mbyte entries in all TLBs. 16-Kbyte page size refers to PTE coalescing of four physically consecutive and 16-Kbyte aligned 4-Kbyte pages. All caches and TLBs are competitively shared in multi-threaded mode.
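A common way to read these numbers is TLB reach: the amount of memory a TLB can map at once. The sketch below is a simplified model that assumes every entry holds a page of the same size, which real workloads rarely achieve. The entry counts and the two-entries-per-4-Mbyte-page rule come from the text; the function name is illustrative.

```python
KIB, MIB = 1 << 10, 1 << 20


def tlb_reach(entries: int, page_size: int, entries_per_page: int = 1) -> int:
    """Bytes of address space covered when every entry maps the same page size."""
    return (entries // entries_per_page) * page_size


print(tlb_reach(72, 4 * KIB) // KIB)        # L1 DTLB, 4-Kbyte pages: 288 (KiB)
print(tlb_reach(72, 16 * KIB) // KIB)       # L1 DTLB, coalesced 16-Kbyte: 1152 (KiB)
print(tlb_reach(3072, 2 * MIB) // MIB)      # L2 DTLB, 2-Mbyte pages: 6144 (MiB)
print(tlb_reach(3072, 4 * MIB, 2) // MIB)   # L2 DTLB, 4-Mbyte pages: 6144 (MiB)
```

Because a 4-Mbyte page occupies two 2-Mbyte entries, it gives no extra reach over 2-Mbyte pages. PTE coalescing, by contrast, quadruples the reach of an entry that would otherwise map a single 4-Kbyte page.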

System DRAM

  • Ryzen 7000 "Raphael":
    • Up to PC5-41600 (DDR5-5200) without overclocking
  • EPYC 9004 "Genoa":
    • 12 channels per socket, two 40-bit (32 data, 8 ECC) DDR5 subchannels per channel
    • Up to 24 DIMMs, max. 6 TiB
    • Up to PC5-38400 (DDR5-4800)
    • SR/DR RDIMM, 4R/8R LRDIMM, 3DS DIMM
    • ECC supported (x4, x8, x16, chipkill)
    • DRAM bus parity and write data CRC options
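The PC5 module names above encode peak per-channel bandwidth in MB/s: the transfer rate in MT/s times 8 bytes per transfer, since the two 32-bit data subchannels together carry 64 data bits. The sketch below reproduces that arithmetic and scales it to the 12-channel "Genoa" socket. These are theoretical peaks and the function name is illustrative.

```python
def channel_bandwidth_mb_s(mega_transfers: int, data_bits: int = 64) -> int:
    """Peak MB/s of one channel: transfers per second times bytes per transfer."""
    return mega_transfers * data_bits // 8


print(channel_bandwidth_mb_s(5200))        # DDR5-5200 -> 41600 (PC5-41600)
print(channel_bandwidth_mb_s(4800))        # DDR5-4800 -> 38400 (PC5-38400)
print(12 * channel_bandwidth_mb_s(4800))   # 12 channels -> 460800 MB/s per socket

# 6 TiB spread over 24 DIMMs implies 256 GiB per DIMM
print(6 * 1024 // 24)                      # 256
```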

Sources: [2][3][4]

All Zen 4 Processors

List of all Zen 4-based Processors
Model Codename C T L2$ L3$ Frequ. Turbo Turbo 1C Memory TDP Launched Release Price OPN
Uniprocessors
EPYC 9354P Genoa 32 64 32 MiB 256 MiB 3.25 GHz 3.75 GHz 3.8 GHz DDR5-4800 280 W 10 November 2022 $ 2,730.00 (1k) 100-100000805, 100-100000805WOF
EPYC 9454P Genoa 48 96 48 MiB 256 MiB 2.75 GHz 3.65 GHz 3.8 GHz DDR5-4800 290 W 10 November 2022 $ 4,598.00 (1k) 100-100000873, 100-100000873WOF
EPYC 9554P Genoa 64 128 64 MiB 256 MiB 3.1 GHz 3.75 GHz 3.75 GHz DDR5-4800 360 W 10 November 2022 $ 7,104.00 (1k) 100-100000804, 100-100000804WOF
EPYC 9654P Genoa 96 192 96 MiB 384 MiB 2.4 GHz 3.55 GHz 3.7 GHz DDR5-4800 360 W 10 November 2022 $ 10,625.00 (1k) 100-100000803, 100-100000803WOF
Ryzen 5 7600X Raphael 6 12 6 MiB 32 MiB 4.7 GHz 5.3 GHz 105 W
Ryzen 7 7700 Raphael 8 16 8 MiB 32 MiB 3.8 GHz 5.3 GHz 65 W 10 January 2023 $ 339.00 100-000000592, 100-100000592BOX
Ryzen 7 7700X Raphael 8 16 8 MiB 32 MiB 4.5 GHz 5.4 GHz 105 W 27 September 2022 $ 399.00 100-000000591, 100-100000591WOF
Ryzen 7 7800X3D Raphael 8 16 8 MiB 96 MiB 4.2 GHz 5 GHz 120 W
Ryzen 9 7900X3D Raphael 12 24 12 MiB 128 MiB 4.4 GHz 5.6 GHz 120 W
Ryzen 9 7950X3D Raphael 16 32 16 MiB 128 MiB 4.2 GHz 5.7 GHz 120 W 28 February 2023 $ 699.00 100-000000908, 100-000000908WOF
Multiprocessors (dual-socket)
EPYC 9124 Genoa 16 32 16 MiB 64 MiB 3 GHz 3.6 GHz 3.7 GHz DDR5-4800 200 W 10 November 2022 $ 1,083.00 (1k) 100-100000802, 100-100000802WOF
EPYC 9174F Genoa 16 32 16 MiB 256 MiB 4.1 GHz 4.15 GHz 4.4 GHz DDR5-4800 320 W 10 November 2022 $ 3,850.00 (1k) 100-100000796, 100-100000796WOF
EPYC 9224 Genoa 24 48 24 MiB 64 MiB 2.5 GHz 3.65 GHz 3.7 GHz DDR5-4800 200 W 10 November 2022 $ 1,825.00 (1k) 100-100000939, 100-100000939WOF
EPYC 9254 Genoa 24 48 24 MiB 128 MiB 2.9 GHz 3.9 GHz 4.15 GHz DDR5-4800 200 W 10 November 2022 $ 2,299.00 (1k) 100-100000480, 100-100000480WOF
EPYC 9274F Genoa 24 48 24 MiB 256 MiB 4.05 GHz 4.1 GHz 4.3 GHz DDR5-4800 320 W 10 November 2022 $ 3,060.00 (1k) 100-100000794, 100-100000794WOF
EPYC 9334 Genoa 32 64 32 MiB 128 MiB 2.7 GHz 3.85 GHz 3.9 GHz DDR5-4800 210 W 10 November 2022 $ 2,990.00 (1k) 100-100000800, 100-100000800WOF
EPYC 9354 Genoa 32 64 32 MiB 256 MiB 3.25 GHz 3.75 GHz 3.8 GHz DDR5-4800 280 W 10 November 2022 $ 3,420.00 (1k) 100-100000798, 100-100000798WOF
EPYC 9374F Genoa 32 64 32 MiB 256 MiB 3.85 GHz 4.1 GHz 4.3 GHz DDR5-4800 320 W 10 November 2022 $ 4,850.00 (1k) 100-100000792, 100-100000792WOF
EPYC 9454 Genoa 48 96 48 MiB 256 MiB 2.75 GHz 3.65 GHz 3.8 GHz DDR5-4800 290 W 10 November 2022 $ 5,225.00 (1k) 100-100000478, 100-100000478WOF
EPYC 9474F Genoa 48 96 48 MiB 256 MiB 3.6 GHz 3.95 GHz 4.1 GHz DDR5-4800 360 W 10 November 2022 $ 6,780.00 (1k) 100-100000788, 100-100000788WOF
EPYC 9534 Genoa 64 128 64 MiB 256 MiB 2.45 GHz 3.55 GHz 3.7 GHz DDR5-4800 280 W 10 November 2022 $ 8,803.00 (1k) 100-100000799, 100-100000799WOF
EPYC 9554 Genoa 64 128 64 MiB 256 MiB 3.1 GHz 3.75 GHz 3.75 GHz DDR5-4800 360 W 10 November 2022 $ 9,087.00 (1k) 100-100000790, 100-100000790WOF
EPYC 9634 Genoa 84 168 84 MiB 384 MiB 2.25 GHz 3.1 GHz 3.7 GHz DDR5-4800 290 W 10 November 2022 $ 10,304.00 (1k) 100-100000797, 100-100000797WOF
EPYC 9654 Genoa 96 192 96 MiB 384 MiB 2.4 GHz 3.55 GHz 3.7 GHz DDR5-4800 360 W 10 November 2022 $ 11,805.00 (1k) 100-100000789, 100-100000789WOF
Count: 24

Designers

  • Mike Clark(?), chief architect

Bibliography

References

  1. "Ryzen 7000 Desktop Preview", Angstronomics, August 29, 2022
  2. "Processor Programming Reference (PPR) for AMD Family 19h Models 11h, Revision B1 Processors", AMD Publ. #55901, Rev. 0.25, November 10, 2022
  3. "Software Optimization Guide for the AMD Zen4 Microarchitecture", AMD Publ. #57647, Rev. 1.00, January 6, 2023
  4. "AMD EPYC™ 9004 Series Architecture Overview", AMD Publ. #58015, Rev. 1.1, December 2022

See Also

  • AMD Zen, Zen 2, Zen 3
  • Intel Meteor Lake