From WikiChip
Difference between revisions of "amd/microarchitectures/zen 4"

(Corrected Warhol to Raphael)
(6.75k is not 6750, increase from 4096=8*8*2^6 to 6912=9*12*2^6 (=4096*9/8 +50%))
 
(38 intermediate revisions by 15 users not shown)
Line 6: Line 6:
 
|manufacturer=TSMC
 
|manufacturer=TSMC
 
|process=5 nm
 
|process=5 nm
 +
|process 2=6 nm
 
|predecessor=Zen 3
 
|predecessor=Zen 3
 
|predecessor link=amd/microarchitectures/zen 3
 
|predecessor link=amd/microarchitectures/zen 3
Line 12: Line 13:
 
|succession=Yes
 
|succession=Yes
 
}}
 
}}
'''Zen 4''' is a planned [[microarchitecture]] being developed by [[AMD]] as a successor to {{\\|Zen 3}}.
+
'''Zen 4''' is a [[microarchitecture]] developed by [[AMD]] as a successor to {{\\|Zen 3}}. See press release for details: [https://www.amd.com/en/press-releases/2022-08-29-amd-launches-ryzen-7000-series-desktop-processors-zen-4-architecture-the AMD Launches Ryzen 7000 Series Desktop Processors]
  
 
== History ==
 
== History ==
Line 25: Line 26:
 
! Processor Series !! Cores/Threads !! Market
 
! Processor Series !! Cores/Threads !! Market
 
|-
 
|-
| EPYC 7004 "{{amd|Genoa|l=core}}" || Up to 96/192 || High-end server [[multiprocessors]]
+
| EPYC 9004 "{{amd|Genoa|l=core}}" || Up to 96/192 || High-end server [[multiprocessors]]
 
|-
 
|-
| {{amd|Raphael|l=core}} || Up to 16/32 || Mainstream to high-end desktops & enthusiasts market processors
+
| Ryzen Threadripper 7000 "{{amd|Storm Peak|l=core}}" || Up to 96/192 || Workstation & enthusiasts
 
|-
 
|-
| {{amd|Rembrandt|l=core}} || Up to 8/16 || Mainstream desktop & mobile processors with GPU  
+
| Ryzen 7000 "{{amd|Raphael|l=core}}" || Up to 16/32 || Mainstream to high-end desktops & enthusiasts
 +
|-
 +
| Ryzen 7000 APU "{{amd|Dragon Range|l=core}}" || Up to 16/32 || High-end mobile processors with GPU
 +
|-
 +
| Ryzen 7000 APU "{{amd|Phoenix Point|l=core}}" || Up to 8/16 || Mainstream desktop & mobile processors with GPU  
 
|}
 
|}
  
Line 38: Line 43:
 
! Processor Series !! Cores/Threads !! Market
 
! Processor Series !! Cores/Threads !! Market
 
|-
 
|-
| {{amd|Bergamo|l=core}} || Up to 128/128? || Cloud multiprocessing (smaller, almost half-size Zen 4c [referred to as “Zen 4D” in leaks] core likely sacrificing AVX-512, L3 and possibly SMT)
+
| EPYC 9004 "{{amd|Bergamo|l=core}}" || Up to 128/256 || Cloud [[multiprocessors]] (smaller, almost half-size Zen 4c [referred to as “Zen 4D” in leaks] core sacrificing half of the L3 cache.)
 +
|-
 +
| EPYC 8004 "{{amd|Siena|l=core}}" || Up to 64/128 || Edge-optimized server chips
 +
|}
 +
 
 +
'''Architectural Codenames:'''
 +
{| class="wikitable"
 +
|-
 +
! Arch !! Codename
 +
|-
 +
| Core || Persephone
 +
|-
 +
| {{abbr|CCD}} || Durango
 
|}
 
|}
  
 
== Process Technology ==
 
== Process Technology ==
AMD claims that Zen4 is going to be produced on a [[5nm]] node by [[TSMC]].
+
Processors implementing Zen 4 are {{abbr|SoC}}s configured as a Multi-Chip Module (MCM) or a monolithic chip. MCMs consist of a single I/O die (IOD) and up to 12 Core Complex Dies (CCDs) attached with full-duplex serial point-to-point links. The IOD contains memory controllers, I/O controllers, microcontrollers for security purposes and power management, and other peripherals. The CCDs communicate with peripherals and each other through the Data and Control Fabrics on the I/O die, and each contains a single Core Complex (CCX). The monolithic chips integrate a subset of the IOD facilities and additional peripherals tailored for their target market, a CCX, and a GPU. A CCX contains 8 CPU cores (fewer may be usable on some models) communicating through a shared L3 cache.
 +
 
 +
("Bergamo" processors configuration TBD.)
 +
 
 +
The chips are fabricated by [[TSMC]], CCDs and monolithic chips on a [[5&nbsp;nm]] node, IODs on a [[6&nbsp;nm]] node.
  
 
== Architecture ==
 
== Architecture ==
{{future information}}
+
Zen 4 is a 64-bit superscalar, out-of-order, 2-way [[SMT]] microarchitecture with advanced dynamic branch prediction, 4-way decoding of [[x86]] instructions with a stack optimizer, multiple caches including an Op cache for decoded instructions and prefetchers for code and data, four integer/address and two floating point instruction schedulers, 3-way address generation, 5-way integer execution, 4-way 256-bit wide floating point execution, a speculative, out-of-order load/store unit capable of up to three loads or two stores per cycle with a 48/88-entry load and 64-entry store queue, write-combining, and 5-level paging with four {{abbr|TLB}}s and six hardware page table walkers.
  
 
=== Key changes from {{\\|Zen 3}} ===
 
=== Key changes from {{\\|Zen 3}} ===
* Core
+
* {{x86|AVX-512}} instructions support, 256-bit data path<ref name="ryzen-7000-preview"/>
** AVX-512 instructions support
+
* L1 and L2 DTLB size increased from 64 to 72 and 2,048 to 3,072 entries
** L1 and L2 DTLB size increased from 64 to 72 and 2,048 to 3,072 entries
+
* Op cache size increased from 4,096 to 6,912 Ops per core
** L2 cache doubled from 512 KiB to 1 MiB per core
+
* L2 cache doubled from 512&nbsp;KiB to 1&nbsp;MiB per core (not all processor models), latency increased from 12 to 14 cycles minimum
** Max. physical and linear address size raised from 48 to 52 and 57 bits respectively
+
* L3 cache average load-to-use latency increased from 46 to 50 cycles
** Improved cache load, write and prefetch from/to register (less latency).
+
* Five-level paging; Max. physical and linear address size raised from 48 to 52 and 57 bits respectively
** Higher Transistor Density, due to 5nm process
+
* Improved cache load, write and prefetch from/to register (less latency)
** Capable of higher all-core clockspeeds (shown by AMD to reach 5GHz+ on all cores)
+
* Higher Transistor Density, due to 5nm process
* Package
+
* Capable of higher all-core clockspeeds (shown by AMD to reach 5GHz+ on all cores)
** Raised maximum core/thread count from 64/128 to at least 96/192 (EPYC 7004) (Bergamo supports 128 cores but preliminary data shows a slightly altered architecture featuring cores that take up less space)
+
* Larger integer register file (from 192 to 224), floating-point register file (from 160 to 192) and reorder buffer (from 256 to 320 entries)
** Support for DDR5 memory and PCIe Gen 5
+
* REPE CMPSB (sometimes used to implement string comparison) is significantly sped up, processes more than 32 bytes/cycle when operating on L1 data.
** New sockets {{amd|AM5|l=pack}} (client), {{amd|SP5|l=pack}} (server), {{amd|FP7|FP7/FP7r2|l=pack}} (mobile)
+
* BSF, BSR, and BMI1 instructions BLSI, BLSMSK, BLSR, TZCNT have smaller latency of 1 and x2 throughput (4 insn/cycle).
 +
* Latency and/or throughput of VPERMx, V[P]BROADCASTx, VPMOV{S,Z}Xx instructions improved.
 +
* Some ALU operations on vector registers increased throughput from 2 to 3 ops/cycle.
 +
* Some ALU operations on vector registers (VPABSx,VPHADDx,VPHSUBx,VPSLLx,VPSRLx,VPSRAx,VPACKx,VPSIGNx,VMAXx,VMINx) increased latency by 1 cycle.
 +
 
 +
 
 +
Package level changes:
 +
* EPYC 9004 "{{amd|Genoa|l=core}}": Max. core/thread count 96/192, up from 64/128 on EPYC 7003 "{{amd|Milan|l=core}}"
 +
* EPYC "{{amd|Bergamo|l=core}}": Max. 128 cores but preliminary data shows a slightly altered architecture featuring cores that take up less space
 +
* Support for DDR5 memory and PCIe Gen 5
 +
* New sockets {{amd|AM5|l=pack}} (client), {{amd|SP5|l=pack}} and {{amd|SP6|l=pack}} (server), {{amd|FP7|FP7/FP7r2|l=pack}} (mobile)
 +
* {{abbr|APU}}s: RDNA2-based iGPU with 2 compute units (128 stream processors)
  
 
=== New Instructions ===
 
=== New Instructions ===
 
Zen 4 introduced the following ISA enhancements:
 
Zen 4 introduced the following ISA enhancements:
  
<!--Update AVX-512 article when confirmed.-->
 
 
* {{x86|AVX-512}} - 512-bit Vector Instructions
 
* {{x86|AVX-512}} - 512-bit Vector Instructions
 
** {{x86|AVX512F}} - Foundation (first introduced with [[Intel]] {{intel|skylake (server)|Skylake|l=arch}})
 
** {{x86|AVX512F}} - Foundation (first introduced with [[Intel]] {{intel|skylake (server)|Skylake|l=arch}})
Line 71: Line 102:
 
** {{x86|AVX512DQ}} - Doubleword and Quadword Instructions (Skylake X)
 
** {{x86|AVX512DQ}} - Doubleword and Quadword Instructions (Skylake X)
 
** {{x86|AVX512BW}} - Byte and Word Instructions (Skylake X)
 
** {{x86|AVX512BW}} - Byte and Word Instructions (Skylake X)
** {{x86|AVX512 IFMA}} - Integer Fused Multiply-Add ({{intel|Cannon Lake|l=arch}})
+
** {{x86|AVX512_IFMA}} - Integer Fused Multiply-Add ({{intel|Cannon Lake|l=arch}})
** {{x86|AVX512 VBMI}} - Vector Bit Manipulation Instructions (Cannon Lake)
+
** {{x86|AVX512_VBMI}} - Vector Bit Manipulation Instructions (Cannon Lake)
** {{x86|AVX512 VPOPCNTDQ}} - Vector Population Count Instruction ({{intel|ice lake (server)|Ice Lake|l=arch}})
+
** {{x86|AVX512_VPOPCNTDQ}} - Vector Population Count Instructions ({{intel|ice lake (server)|Ice Lake|l=arch}})
** {{x86|AVX512 BITALG}} - Bit Algorithms (Ice Lake)
+
** {{x86|AVX512_BITALG}} - Bit Algorithms (Ice Lake)
** {{x86|AVX512 VBMI2}} - Vector Bit Manipulation Instructions 2 (Ice Lake)
+
** {{x86|AVX512_VBMI2}} - Vector Bit Manipulation Instructions 2 (Ice Lake)  
** {{x86|AVX512 VNNI}} - Vector Neural Network Instructions (Ice Lake)
+
** {{x86|AVX512_VNNI}} - Vector Neural Network Instructions (Ice Lake)
** {{x86|AVX512 BF16}} - [[bfloat16|BFloat16]] Instructions ({{intel|Cooper Lake|l=arch}})
+
** {{x86|AVX512_BF16}} - [[bfloat16|BFloat16]] Instructions ({{intel|Cooper Lake|l=arch}})
** ''Not supported'': AVX512ER, AVX512PF ({{intel|Knights Landing|l=arch}}); AVX512 4VNNIW, 4FMAPS ({{intel|Knights Mill|l=arch}}); VP2INTERSECT ({{intel|Tiger Lake|l=arch}})
+
** ''Not supported'': AVX512ER, AVX512PF ({{intel|Knights Landing|l=arch}}); AVX512 4VNNIW, 4FMAPS ({{intel|Knights Mill|l=arch}}); VP2INTERSECT ({{intel|Tiger Lake|l=arch}}); FP16 ({{intel|Sapphire Rapids|l=arch}})
* GFNI - Galois Field New Instructions (first introduced with [[Intel]] {{intel|ice lake (server)|Ice Lake|l=arch}})
+
* {{x86|GFNI}} - Galois Field New Instructions (first introduced with [[Intel]] {{intel|ice lake (server)|Ice Lake|l=arch}})
 
** <code>VGF2P8AFFINEQB</code> - Galois field affine transformation
 
** <code>VGF2P8AFFINEQB</code> - Galois field affine transformation
 
** <code>VGF2P8AFFINEINVQB</code> - Galois field affine transformation inverse
 
** <code>VGF2P8AFFINEINVQB</code> - Galois field affine transformation inverse
Line 87: Line 118:
 
==== Data and Instruction Caches ====
 
==== Data and Instruction Caches ====
 
* L0 Op Cache:
 
* L0 Op Cache:
** 4,096(?) Ops per core, 8-way(?) set associative
+
** Up to 6,912 Ops per core, 12-way set associative
** 8 Op line size(?)
+
** 9 Op line size (restrictions apply depending on instruction type)
 
** Parity protected
 
** Parity protected
 
* L1I Cache:
 
* L1I Cache:
** 32 KiB per core, 8-way set associative
+
** 32&nbsp;KiB per core, 8-way set associative
** 64 B line size
+
** 64&nbsp;B line size
 
** Parity protected
 
** Parity protected
 
* L1D Cache:
 
* L1D Cache:
** 32 KiB per core, 8-way set associative
+
** 32&nbsp;KiB per core, 8-way set associative
** 64 B line size
+
** 64&nbsp;B line size
 
** Write-back policy
 
** Write-back policy
** ? cycles latency for Int
+
** 4-5 cycles latency for Int
** ? cycles latency for FP
+
** 7-8 cycles latency for FP
 
** ECC
 
** ECC
 
* L2 Cache:
 
* L2 Cache:
** 1 MiB per core, 8-way set associative
+
** 512&nbsp;KiB or 1&nbsp;MiB per core (varies by processor model), 8-way set associative
** 64 B line size
+
** 64&nbsp;B line size
 
** Write-back policy
 
** Write-back policy
** Inclusive of L1(?)
+
** Inclusive of L1
** ? cycles latency
+
** ≥ 14 cycles latency
 
** {{abbr|DEC-TED}} ECC, tag & state arrays {{abbr|SEC-DED}}<!--7 check bits for 42 tag bits; AMD-55901-0.97 Sec 3.5-->
 
** {{abbr|DEC-TED}} ECC, tag & state arrays {{abbr|SEC-DED}}<!--7 check bits for 42 tag bits; AMD-55901-0.97 Sec 3.5-->
 
* L3 Cache:
 
* L3 Cache:
<!--** "{{amd|Genoa|l=core}}": ? MiB/CCX, up to ? MiB total-->
+
** "{{amd|Genoa|l=core}}": up to 32&nbsp;MiB/{{abbr|CCX}} (8 cores), up to 384&nbsp;MiB total
 
** Shared by all cores in the CCX, configurable
 
** Shared by all cores in the CCX, configurable
 
** 16-way set associative
 
** 16-way set associative
** 64 B line size
+
** 64&nbsp;B line size
** L2 [[victim cache]](?)
+
** L2 [[victim cache]]
 
** Write-back policy
 
** Write-back policy
** ? cycles average load-to-use latency
+
** 50 cycles average load-to-use latency
 
** DEC-TED ECC, tag array & shadow tags SEC-DED<!--AMD-55901-0.97 Sec 3.5-->
 
** DEC-TED ECC, tag array & shadow tags SEC-DED<!--AMD-55901-0.97 Sec 3.5-->
** QoS Monitoring and Enforcement
+
** QoS Monitoring and Enforcement with {{abbr|BMEC|Bandwidth Monitoring Event Configuration}}, {{abbr|L3RR|L3 Range Reservation}}, {{abbr|L3SBE|L3 External Slow Memory Bandwidth Enforcement}}
  
 
==== Translation Lookaside Buffers ====
 
==== Translation Lookaside Buffers ====
 
* ITLB
 
* ITLB
** 64 entry L1 TLB, fully associative, all page sizes
+
** 64 entry L1 TLB, fully associative
** 512 entry L2 TLB, ?-way set associative
+
*** 4-Kbyte, 2-Mbyte, 1-Gbyte page sizes
 +
** 512 entry L2 TLB, 8-way set associative  
 
*** 4-Kbyte, 2-Mbyte, and 4-Mbyte pages
 
*** 4-Kbyte, 2-Mbyte, and 4-Mbyte pages
 
** Parity protected
 
** Parity protected
 
* DTLB
 
* DTLB
** 72 entry L1 TLB, fully associative, all page sizes
+
** 72 entry L1 TLB, fully associative
** 3,072 entry L2 TLB, 12-way set associative
+
*** 4-Kbyte, 16-Kbyte, 2-Mbyte, 1-Gbyte page sizes
*** 4-Kbyte, 2-Mbyte, and 4-Mbyte pages, PDEs to speed up table walks(?)
+
** 3,072 entry L2 TLB, 24-way set associative
 +
*** 4-Kbyte, 16-Kbyte, 2-Mbyte, and 4-Mbyte pages, {{abbr|PDE|Page Directory Entry}}s to speed up table walks
 
** Parity protected
 
** Parity protected
  
4-Mbyte pages require two 2-Mbyte entries in all TLBs. <!--TBD: All caches and TLBs are competitively shared in multi-threaded mode.-->
+
4-Mbyte pages require two 2-Mbyte entries in all TLBs. 16-Kbyte page size refers to {{abbr|PTE|Page Table Entry}} coalescing of four physically consecutive and 16-Kbyte aligned 4-Kbyte pages. All caches and TLBs are competitively shared in multi-threaded mode.
  
 
==== System DRAM ====
 
==== System DRAM ====
* EPYC 7004 "{{amd|Genoa|l=core}}":
+
* Ryzen 7000 "{{amd|Raphael|l=core}}":
** 12 channels per socket, two 40-bit DDR5 subchannels per channel
+
** Up to PC5-41600 (DDR5-5200) without overclocking
** Up to 24 DIMMs, max. ?&nbsp;TiB
+
 
** Up to PC5-41600 (DDR5-5200)
+
* EPYC 9004 "{{amd|Genoa|l=core}}":
 +
** 12 channels per socket, two 40-bit (32 data, 8 ECC) DDR5 subchannels per channel
 +
** Up to 24 DIMMs, max. 6&nbsp;TiB
 +
** Up to PC5-38400 (DDR5-4800)
 
** {{abbr|SR}}/{{abbr|DR}} {{abbr|RDIMM}}, {{abbr|4R}}/{{abbr|8R}} {{abbr|LRDIMM}}, {{abbr|3DS DIMM}}
 
** {{abbr|SR}}/{{abbr|DR}} {{abbr|RDIMM}}, {{abbr|4R}}/{{abbr|8R}} {{abbr|LRDIMM}}, {{abbr|3DS DIMM}}
 
** ECC supported (x4, x8, x16, chipkill)<!--AMD-55901-0.97 Sec 3.7-->
 
** ECC supported (x4, x8, x16, chipkill)<!--AMD-55901-0.97 Sec 3.7-->
 
** DRAM bus parity and write data CRC options<!--ibid-->
 
** DRAM bus parity and write data CRC options<!--ibid-->
  
Sources: <ref name="amd-55901-ppr-1910"/>
+
Sources: <ref name="amd-55901-ppr-1911"/><ref name="amd-57647-zen4-optim"/><ref name="amd-58015-9004-overv"/>
  
 
== All Zen 4 Processors ==
 
== All Zen 4 Processors ==
Line 151: Line 187:
 
Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips
 
Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips
 
-->
 
-->
{{comp table start}}
+
{| class="comptable3"
<table class="comptable sortable">
+
! List of all Zen 4-based Processors
{{comp table header|main|14:List of all Zen 4-based Processors}}
+
|}
{{comp table header|cols|Family|Codename|{{abbr|C|Cores}}|{{abbr|T|Threads}}|L2|L3|Base|Turbo|Memory|{{abbr|TDP}}|Launched|Price|{{abbr|OPN}}}}
+
<div class="comptable-scroller sticky">
{{comp table header|lsep|14:[[Uniprocessors]]}}
+
{| class="comptable3 stickycol1 sortable"
{{#ask: [[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] [[max cpu count::1]]
+
|- class="header continued"
|?full page name
+
! Model
|?model number
+
! Codename
|?microprocessor family
+
! {{abbr|C|Cores}}
|?core name
+
! {{abbr|T|Threads}}
|?core count
+
! data-sort-type=number | L2$
|?thread count
+
! data-sort-type=number | L3$
|?l2$ size
+
! data-sort-type=number | Frequ.
|?l3$ size
+
! data-sort-type=number | Turbo
|?base frequency#GHz
+
! data-sort-type=number | Turbo 1C
|?turbo frequency#GHz
+
! Memory
|?supported memory type
+
! data-sort-type=number | {{abbr|TDP}}
|?tdp
+
! data-sort-type=date | Launched
|?first launched
+
! Release<br />Price
|?release price
+
! {{abbr|OPN}}
|?part number
+
|- class="separator sortbottom"
|sort=model number
+
| colspan=4 | [[Uniprocessors]]
|format=template
+
| colspan=10 | &nbsp;
|template=proc table 3
+
{{#invoke:comptable|askt
|userparam=15
+
|condition=[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] [[max cpu count::1]]
|mainlabel=-
+
|sort=name |valuesep=,<br /> |template=<nowiki>
|valuesep=,<br/>
+
|-
}}
+
| data-sort-value="{{{name#-}}}" | {{amd|{{{microprocessor family#-}}}}} [[{{{page#-}}}|{{{model number#-}}}]]
{{comp table header|lsep|14:[[Multiprocessors]] (dual-socket)}}
+
| {{amd|{{{core name#-}}}|l=core}}
{{#ask: [[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] [[max cpu count::>>1]]
+
| {{{core count}}}
|?full page name
+
| {{{thread count}}}
|?model number
+
| {{{l2$ size}}}
|?microprocessor family
+
| {{{l3$ size}}}
|?core name
+
| {{{base frequency#GHz}}}
|?core count
+
| {{{turbo frequency#GHz}}}
|?thread count
+
| {{{turbo frequency (1 core)#GHz}}}
|?l2$ size
+
| {{{supported memory type}}}
|?l3$ size
+
| {{{tdp}}}
|?base frequency#GHz
+
| {{{first launched}}}
|?turbo frequency#GHz
+
| {{#if:{{{release price}}}|{{{release price}}}{{#ifeq:{{{release price}}}|{{{release price (tray)}}}|&#32;(1k)}} }}
|?supported memory type
+
| {{{part number}}}</nowiki>|outrotemplate=<nowiki>
|?tdp
+
|- class="separator sortbottom"
|?first launched
+
| colspan=4 | [[Multiprocessors]] (dual-socket)
|?release price
+
| colspan=10 | &nbsp;
|?part number
+
{{#invoke:comptable|askt
|sort=model number
+
|condition=[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] [[max cpu count::>>1]]
|format=template
+
|sort=name |valuesep=,<br /> |template=&lt;nowiki>{{{#template}}}&lt;/nowiki>}}</nowiki>}}
|template=proc table 3
+
|-
|userparam=15
+
! Count: {{#ask:[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] |format=count}}
|mainlabel=-
+
|}
|valuesep=,<br/>
+
</div>
}}
 
{{comp table count|ask=[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]]}}
 
</table>
 
{{comp table end}}
 
  
 
== Designers ==
 
== Designers ==
Line 214: Line 246:
 
== References ==
 
== References ==
 
<references>
 
<references>
<ref name="amd-55901-ppr-1910">{{cite techdoc|title=Processor Programming Reference (PPR) for AMD Family 19h Models 10h, Revision A0 Processors|publ=AMD|pid=55901|rev=0.97|date=2021-05-30}}</ref>
+
<ref name="ryzen-7000-preview">{{cite techdoc|title=Ryzen 7000 Desktop Preview|url=https://www.angstronomics.com/p/ryzen-7000-desktop-preview|publ=Angstronomics|date=2022-08-29}}</ref>
 +
<ref name="amd-55901-ppr-1911">{{cite techdoc|title=Processor Programming Reference (PPR) for AMD Family 19h Models 11h, Revision B1 Processors|url=https://www.amd.com/system/files/TechDocs/55901_0.25.zip|publ=AMD|pid=55901|rev=0.25|date=2022-11-10}}</ref>
 +
<ref name="amd-57647-zen4-optim">{{cite techdoc|title=Software Optimization Guide for the AMD Zen4 Microarchitecture|url=https://www.amd.com/system/files/TechDocs/57647.zip|publ=AMD|pid=57647|rev=1.00|date=2023-01-06}}</ref>
 +
<ref name="amd-58015-9004-overv">{{cite techdoc|title=AMD EPYC™ 9004 Series Architecture Overview|url=https://www.amd.com/system/files/documents/58015-epyc-9004-tg-architecture-overview.pdf|publ=AMD|pid=58015|rev=1.1|date=2022-12}}</ref>
 
</references>
 
</references>
  

Latest revision as of 18:28, 13 November 2023

Zen 4 µarch
General Info
Arch Type: CPU
Designer: AMD
Manufacturer: TSMC
Process: 5 nm, 6 nm
Succession

Zen 4 is a microarchitecture developed by AMD as a successor to Zen 3. See press release for details: AMD Launches Ryzen 7000 Series Desktop Processors

History

Zen 4 on the roadmap.

Zen 4 was first mentioned by Forrest Norrod during AMD's EPYC One Year Anniversary webinar. At the Next Horizon event, held on November 6, 2018, AMD stated that Zen 4 was in the design completion phase.

Products

Preliminary Data! Information presented in this article deals with future products, data, features, and specifications that have yet to be finalized, announced, or released. Information may be incomplete and can change by final release.
{| class="wikitable"
|-
! Processor Series !! Cores/Threads !! Market
|-
| EPYC 9004 "Genoa" || Up to 96/192 || High-end server multiprocessors
|-
| Ryzen Threadripper 7000 "Storm Peak" || Up to 96/192 || Workstation & enthusiasts
|-
| Ryzen 7000 "Raphael" || Up to 16/32 || Mainstream to high-end desktops & enthusiasts
|-
| Ryzen 7000 APU "Dragon Range" || Up to 16/32 || High-end mobile processors with GPU
|-
| Ryzen 7000 APU "Phoenix Point" || Up to 8/16 || Mainstream desktop & mobile processors with GPU
|}

Cores using variant Zen 4 uarch:

{| class="wikitable"
|-
! Processor Series !! Cores/Threads !! Market
|-
| EPYC 9004 "Bergamo" || Up to 128/256 || Cloud multiprocessors (smaller, almost half-size Zen 4c [referred to as “Zen 4D” in leaks] core sacrificing half of the L3 cache.)
|-
| EPYC 8004 "Siena" || Up to 64/128 || Edge-optimized server chips
|}

Architectural Codenames:

{| class="wikitable"
|-
! Arch !! Codename
|-
| Core || Persephone
|-
| CCD || Durango
|}

Process Technology

Processors implementing Zen 4 are SoCs configured as a Multi-Chip Module (MCM) or a monolithic chip. MCMs consist of a single I/O die (IOD) and up to 12 Core Complex Dies (CCDs) attached with full-duplex serial point-to-point links. The IOD contains memory controllers, I/O controllers, microcontrollers for security purposes and power management, and other peripherals. The CCDs communicate with peripherals and each other through the Data and Control Fabrics on the I/O die, and each contains a single Core Complex (CCX). The monolithic chips integrate a subset of the IOD facilities and additional peripherals tailored for their target market, a CCX, and a GPU. A CCX contains 8 CPU cores (fewer may be usable on some models) communicating through a shared L3 cache.
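
The package-level totals follow directly from this topology. The sketch below is only illustrative arithmetic: the `Package` class and its field names are hypothetical, and the defaults (8 cores per CCX, 2-way SMT, up to 32 MiB L3 per CCX) are the figures quoted elsewhere in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical model of an MCM package: one CCX per CCD."""
    ccds: int
    cores_per_ccx: int = 8       # a CCX contains 8 CPU cores
    threads_per_core: int = 2    # 2-way SMT
    l3_mib_per_ccx: int = 32     # "up to 32 MiB/CCX"

    @property
    def cores(self) -> int:
        return self.ccds * self.cores_per_ccx

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def l3_mib(self) -> int:
        return self.ccds * self.l3_mib_per_ccx


# Largest configuration described in the text: 12 CCDs.
genoa_max = Package(ccds=12)
print(genoa_max.cores, genoa_max.threads, genoa_max.l3_mib)
```

For 12 CCDs this gives 96 cores, 192 threads and 384 MiB of L3, matching the "Genoa" maximums listed in this article.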

("Bergamo" processors configuration TBD.)

The chips are fabricated by TSMC: CCDs and monolithic chips on a 5 nm node, IODs on a 6 nm node.

Architecture

Zen 4 is a 64-bit superscalar, out-of-order, 2-way SMT microarchitecture with advanced dynamic branch prediction, 4-way decoding of x86 instructions with a stack optimizer, multiple caches including an Op cache for decoded instructions and prefetchers for code and data, four integer/address and two floating point instruction schedulers, 3-way address generation, 5-way integer execution, 4-way 256-bit wide floating point execution, a speculative, out-of-order load/store unit capable of up to three loads or two stores per cycle with a 48/88-entry load and 64-entry store queue, write-combining, and 5-level paging with four TLBs and six hardware page table walkers.
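
To illustrate what 5-level paging means for a 57-bit linear address, the sketch below splits an address into five table indices and a page offset. It assumes the usual x86-64 layout for 4-KiB pages (five 9-bit indices plus a 12-bit offset, 5 × 9 + 12 = 57); the level names and the function name are illustrative and are not taken from this article.

```python
# Assumed x86-64 layout for 4 KiB pages: five 9-bit indices + 12-bit offset.
LEVELS = ("PML5", "PML4", "PDPT", "PD", "PT")  # illustrative level names
INDEX_BITS = 9
OFFSET_BITS = 12
LINEAR_BITS = OFFSET_BITS + INDEX_BITS * len(LEVELS)  # 57


def split_linear_address(addr: int):
    """Return ({level: index}, page_offset) for a 57-bit linear address."""
    if not 0 <= addr < (1 << LINEAR_BITS):
        raise ValueError("address does not fit in 57 bits")
    offset = addr & ((1 << OFFSET_BITS) - 1)
    indices = {}
    # The top-level index sits in bits 56..48, the next in 47..39, and so on.
    shift = LINEAR_BITS - INDEX_BITS
    for name in LEVELS:
        indices[name] = (addr >> shift) & ((1 << INDEX_BITS) - 1)
        shift -= INDEX_BITS
    return indices, offset


example = (1 << 48) | (2 << 39) | (3 << 30) | (4 << 21) | (5 << 12) | 0x678
print(split_linear_address(example))
```

Real linear addresses are sign-extended to 64 bits; the sketch works on the low 57 bits only.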

Key changes from Zen 3

  • AVX-512 instructions support, 256-bit data path[1]
  • L1 and L2 DTLB size increased from 64 to 72 and 2,048 to 3,072 entries
  • Op cache size increased from 4,096 to 6,912 Ops per core
  • L2 cache doubled from 512 KiB to 1 MiB per core (not all processor models), latency increased from 12 to 14 cycles minimum
  • L3 cache average load-to-use latency increased from 46 to 50 cycles
  • Five-level paging; Max. physical and linear address size raised from 48 to 52 and 57 bits respectively
  • Lower latency for cache loads, writes and prefetches from/to registers
  • Higher transistor density due to the 5 nm process
  • Capable of higher all-core clock speeds (shown by AMD to reach 5 GHz+ on all cores)
  • Larger integer register file (from 192 to 224 entries), floating-point register file (from 160 to 192 entries) and reorder buffer (from 256 to 320 entries)
  • REPE CMPSB (sometimes used to implement string comparison) is significantly faster, processing more than 32 bytes/cycle when operating on L1 data.
  • BSF, BSR, and the BMI1 instructions BLSI, BLSMSK, BLSR and TZCNT now have a latency of 1 cycle and twice the throughput (4 instructions/cycle).
  • Latency and/or throughput of VPERMx, V[P]BROADCASTx, VPMOV{S,Z}Xx instructions improved.
  • Throughput of some ALU operations on vector registers increased from 2 to 3 ops/cycle.
  • Latency of some ALU operations on vector registers (VPABSx, VPHADDx, VPHSUBx, VPSLLx, VPSRLx, VPSRAx, VPACKx, VPSIGNx, VMAXx, VMINx) increased by 1 cycle.
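
As a quick way to compare the structure sizes in the list above, the following sketch computes the relative growth of each resource from Zen 3 to Zen 4. The numbers are the ones quoted in the list; the dictionary and function names are only illustrative.

```python
# (Zen 3 value, Zen 4 value) as quoted in the list of key changes.
CHANGES = {
    "L1 DTLB entries": (64, 72),
    "L2 DTLB entries": (2048, 3072),
    "Op cache (Ops)": (4096, 6912),
    "L2 cache (KiB)": (512, 1024),
    "Integer register file": (192, 224),
    "FP register file": (160, 192),
    "Reorder buffer": (256, 320),
}


def growth_percent(old: int, new: int) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) * 100 / old


for name, (old, new) in CHANGES.items():
    print(f"{name}: {old} -> {new} (+{growth_percent(old, new):.2f}%)")
```

The Op cache, for example, grows by 68.75%, which is the 9/8 line-size increase combined with the 50% increase in ways noted in the edit summary.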


Package level changes:

  • EPYC 9004 "Genoa": Max. core/thread count 96/192, up from 64/128 on EPYC 7003 "Milan"
  • EPYC "Bergamo": Max. 128 cores but preliminary data shows a slightly altered architecture featuring cores that take up less space
  • Support for DDR5 memory and PCIe Gen 5
  • New sockets AM5 (client), SP5 and SP6 (server), FP7/FP7r2 (mobile)
  • APUs: RDNA2-based iGPU with 2 compute units (128 stream processors)

New Instructions

Zen 4 introduced the following ISA enhancements:
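
The extensions named in the revision text for this section are the AVX-512 subsets AVX512F, AVX512DQ, AVX512BW, AVX512_IFMA, AVX512_VBMI, AVX512_VPOPCNTDQ, AVX512_BITALG, AVX512_VBMI2, AVX512_VNNI and AVX512_BF16, plus GFNI. A minimal sketch of checking for them on Linux is shown below. It assumes the lowercase flag spellings that the Linux kernel reports in `/proc/cpuinfo`; the function names and the sample text are hypothetical.

```python
# Assumed Linux /proc/cpuinfo flag spellings for the extensions named above.
ZEN4_FLAGS = {
    "avx512f": "AVX512F",
    "avx512dq": "AVX512DQ",
    "avx512bw": "AVX512BW",
    "avx512ifma": "AVX512_IFMA",
    "avx512vbmi": "AVX512_VBMI",
    "avx512_vpopcntdq": "AVX512_VPOPCNTDQ",
    "avx512_bitalg": "AVX512_BITALG",
    "avx512_vbmi2": "AVX512_VBMI2",
    "avx512_vnni": "AVX512_VNNI",
    "avx512_bf16": "AVX512_BF16",
    "gfni": "GFNI",
}


def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the flags of the first 'flags' line in cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_extensions(cpuinfo_text: str) -> list:
    """Names of the listed extensions that the flags line does not report."""
    flags = parse_cpu_flags(cpuinfo_text)
    return sorted(name for flag, name in ZEN4_FLAGS.items() if flag not in flags)


# Made-up sample; on a real system pass open("/proc/cpuinfo").read() instead.
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx2 avx512f gfni\n"
print(missing_extensions(sample))
```

An empty result means every extension in the table is reported; it says nothing about extensions that are not in the table.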

Memory Hierarchy

Data and Instruction Caches

  • L0 Op Cache:
    • Up to 6,912 Ops per core, 12-way set associative
    • 9 Op line size (restrictions apply depending on instruction type)
    • Parity protected
  • L1I Cache:
    • 32 KiB per core, 8-way set associative
    • 64 B line size
    • Parity protected
  • L1D Cache:
    • 32 KiB per core, 8-way set associative
    • 64 B line size
    • Write-back policy
    • 4-5 cycles latency for Int
    • 7-8 cycles latency for FP
    • ECC
  • L2 Cache:
    • 512 KiB or 1 MiB per core (varies by processor model), 8-way set associative
    • 64 B line size
    • Write-back policy
    • Inclusive of L1
    • ≥ 14 cycles latency
    • DEC-TED ECC, tag & state arrays SEC-DED
  • L3 Cache:
    • "Genoa": up to 32 MiB/CCX (8 cores), up to 384 MiB total
    • Shared by all cores in the CCX, configurable
    • 16-way set associative
    • 64 B line size
    • L2 victim cache
    • Write-back policy
    • 50 cycles average load-to-use latency
    • DEC-TED ECC, tag array & shadow tags SEC-DED
    • QoS Monitoring and Enforcement with BMEC, L3RR, L3SBE
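
The cache parameters above determine the number of sets in each cache, since capacity = sets × ways × line size. The sketch below works this out for the listed sizes; the helper name is illustrative, and the Op cache line is treated as holding 9 Ops as stated above.

```python
KIB = 1024
MIB = 1024 * KIB


def num_sets(capacity: int, ways: int, line_size: int) -> int:
    """Sets in a set-associative cache: capacity / (ways * line_size)."""
    sets, remainder = divmod(capacity, ways * line_size)
    if remainder:
        raise ValueError("capacity is not a multiple of ways * line_size")
    return sets


# Byte-addressed caches, 64 B lines (figures from the list above).
l1d_sets = num_sets(32 * KIB, ways=8, line_size=64)
l2_sets = num_sets(1 * MIB, ways=8, line_size=64)
l3_sets = num_sets(32 * MIB, ways=16, line_size=64)

# Op cache: capacity counted in Ops, 12 ways, 9 Ops per line.
op_cache_sets = num_sets(6912, ways=12, line_size=9)

print(l1d_sets, l2_sets, l3_sets, op_cache_sets)
```

The Op cache result of 64 sets agrees with the 6912 = 9 × 12 × 2^6 decomposition given in the edit summary of this revision.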

Translation Lookaside Buffers

  • ITLB
    • 64 entry L1 TLB, fully associative
      • 4-Kbyte, 2-Mbyte, 1-Gbyte page sizes
    • 512 entry L2 TLB, 8-way set associative
      • 4-Kbyte, 2-Mbyte, and 4-Mbyte pages
    • Parity protected
  • DTLB
    • 72 entry L1 TLB, fully associative
      • 4-Kbyte, 16-Kbyte, 2-Mbyte, 1-Gbyte page sizes
    • 3,072 entry L2 TLB, 24-way set associative
      • 4-Kbyte, 16-Kbyte, 2-Mbyte, and 4-Mbyte pages, PDEs to speed up table walks
    • Parity protected

4-Mbyte pages require two 2-Mbyte entries in all TLBs. 16-Kbyte page size refers to PTE coalescing of four physically consecutive and 16-Kbyte aligned 4-Kbyte pages. All caches and TLBs are competitively shared in multi-threaded mode.
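
The coalescing condition can be sketched as a simple check over four page mappings. This is an illustration, not AMD's implementation: it assumes the group must be 16-KiB aligned in both the virtual and the physical address space and ignores page attributes, which real hardware would also have to compare. The function names are hypothetical.

```python
PAGE = 4 * 1024      # 4-KiB base page
GROUP = 16 * 1024    # four coalesced pages


def is_coalescable(mappings) -> bool:
    """mappings: four (virtual, physical) addresses of 4-KiB pages, in order."""
    if len(mappings) != 4:
        return False
    v0, p0 = mappings[0]
    # Assumption: the group is 16-KiB aligned virtually and physically.
    if v0 % GROUP or p0 % GROUP:
        return False
    # Each page must follow the previous one in both address spaces.
    return all(
        v == v0 + i * PAGE and p == p0 + i * PAGE
        for i, (v, p) in enumerate(mappings)
    )


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address space covered if every entry maps `page_size` bytes."""
    return entries * page_size


good = [(0x4000 + i * PAGE, 0x10000 + i * PAGE) for i in range(4)]
bad = [(0x4000 + i * PAGE, 0x11000 + i * PAGE) for i in range(4)]
print(is_coalescable(good), is_coalescable(bad))
print(tlb_reach(72, PAGE) // 1024, "KiB")
print(tlb_reach(3072, GROUP) // (1024 * 1024), "MiB")
```

With the entry counts listed above, the L1 DTLB covers 288 KiB with plain 4-KiB pages, and the L2 DTLB would cover 48 MiB in the best case where every entry holds a coalesced 16-KiB mapping.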

System DRAM

  • Ryzen 7000 "Raphael":
    • Up to PC5-41600 (DDR5-5200) without overclocking
  • EPYC 9004 "Genoa":
    • 12 channels per socket, two 40-bit (32 data, 8 ECC) DDR5 subchannels per channel
    • Up to 24 DIMMs, max. 6 TiB
    • Up to PC5-38400 (DDR5-4800)
    • SR/DR RDIMM, 4R/8R LRDIMM, 3DS DIMM
    • ECC supported (x4, x8, x16, chipkill)
    • DRAM bus parity and write data CRC options

Sources: [2][3][4]
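
The memory figures above can be cross-checked with simple arithmetic. The sketch below assumes that each channel moves 64 data bits per transfer (two subchannels of 32 data bits each, ECC bits excluded) and reports theoretical peak rates only; the function name is illustrative.

```python
def channel_bandwidth_mb_s(transfers_mt_s: int, data_bits: int = 64) -> int:
    """Peak bandwidth of one DDR5 channel in MB/s (data bits only)."""
    return transfers_mt_s * data_bits // 8


# DDR5-4800 and DDR5-5200 per channel: these match the PC5 module names.
ddr5_4800 = channel_bandwidth_mb_s(4800)
ddr5_5200 = channel_bandwidth_mb_s(5200)

# "Genoa": 12 channels per socket at DDR5-4800.
genoa_socket_gb_s = 12 * ddr5_4800 / 1000

# Maximum capacity of 6 TiB spread over 24 DIMMs.
gib_per_dimm = 6 * 1024 // 24

print(ddr5_4800, ddr5_5200, genoa_socket_gb_s, gib_per_dimm)
```

This yields 38,400 MB/s and 41,600 MB/s per channel (PC5-38400 and PC5-41600), a theoretical 460.8 GB/s per "Genoa" socket, and 256 GiB per DIMM at the maximum capacity.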

All Zen 4 Processors

List of all Zen 4-based Processors
{| class="wikitable"
|-
! Model !! Codename !! C !! T !! L2$ !! L3$ !! Frequ. !! Turbo !! Turbo 1C !! Memory !! TDP !! Launched !! Release Price !! OPN
|-
! colspan=14 | Uniprocessors
|-
| EPYC 9354P || Genoa || 32 || 64 || 32 MiB || 256 MiB || 3.25 GHz || 3.75 GHz || 3.8 GHz || DDR5-4800 || 280 W || 10 November 2022 || $ 2,730.00 (1k) || 100-100000805, 100-100000805WOF
|-
| EPYC 9454P || Genoa || 48 || 96 || 48 MiB || 256 MiB || 2.75 GHz || 3.65 GHz || 3.8 GHz || DDR5-4800 || 290 W || 10 November 2022 || $ 4,598.00 (1k) || 100-100000873, 100-100000873WOF
|-
| EPYC 9554P || Genoa || 64 || 128 || 64 MiB || 256 MiB || 3.1 GHz || 3.75 GHz || 3.75 GHz || DDR5-4800 || 360 W || 10 November 2022 || $ 7,104.00 (1k) || 100-100000804, 100-100000804WOF
|-
| EPYC 9654P || Genoa || 96 || 192 || 96 MiB || 384 MiB || 2.4 GHz || 3.55 GHz || 3.7 GHz || DDR5-4800 || 360 W || 10 November 2022 || $ 10,625.00 (1k) || 100-100000803, 100-100000803WOF
|-
| Ryzen 5 7600X || Raphael || 6 || 12 || 6 MiB || 32 MiB || 4.7 GHz || 5.3 GHz || || || 105 W || || ||
|-
| Ryzen 7 7700 || Raphael || 8 || 16 || 8 MiB || 32 MiB || 3.8 GHz || 5.3 GHz || || || 65 W || 10 January 2023 || $ 339.00 || 100-000000592, 100-100000592BOX
|-
| Ryzen 7 7700X || Raphael || 8 || 16 || 8 MiB || 32 MiB || 4.5 GHz || 5.4 GHz || || || 105 W || 27 September 2022 || $ 399.00 || 100-000000591, 100-100000591WOF
|-
| Ryzen 7 7800X3D || Raphael || 8 || 16 || 8 MiB || 96 MiB || 4.2 GHz || 5 GHz || || || 120 W || || ||
|-
| Ryzen 9 7900X3D || Raphael || 12 || 24 || 12 MiB || 128 MiB || 4.4 GHz || 5.6 GHz || || || 120 W || || ||
|-
| Ryzen 9 7950X3D || Raphael || 16 || 32 || 16 MiB || 128 MiB || 4.2 GHz || 5.7 GHz || || || 120 W || 28 February 2023 || $ 699.00 || 100-000000908, 100-000000908WOF
|-
! colspan=14 | Multiprocessors (dual-socket)
|-
| EPYC 9124 || Genoa || 16 || 32 || 16 MiB || 64 MiB || 3 GHz || 3.6 GHz || 3.7 GHz || DDR5-4800 || 200 W || 10 November 2022 || $ 1,083.00 (1k) || 100-100000802, 100-100000802WOF
|-
| EPYC 9174F || Genoa || 16 || 32 || 16 MiB || 256 MiB || 4.1 GHz || 4.15 GHz || 4.4 GHz || DDR5-4800 || 320 W || 10 November 2022 || $ 3,850.00 (1k) || 100-100000796, 100-100000796WOF
|-
| EPYC 9224 || Genoa || 24 || 48 || 24 MiB || 64 MiB || 2.5 GHz || 3.65 GHz || 3.7 GHz || DDR5-4800 || 200 W || 10 November 2022 || $ 1,825.00 (1k) || 100-100000939, 100-100000939WOF
|-
| EPYC 9254 || Genoa || 24 || 48 || 24 MiB || 128 MiB || 2.9 GHz || 3.9 GHz || 4.15 GHz || DDR5-4800 || 200 W || 10 November 2022 || $ 2,299.00 (1k) || 100-100000480, 100-100000480WOF
|-
| EPYC 9274F || Genoa || 24 || 48 || 24 MiB || 256 MiB || 4.05 GHz || 4.1 GHz || 4.3 GHz || DDR5-4800 || 320 W || 10 November 2022 || $ 3,060.00 (1k) || 100-100000794, 100-100000794WOF
|-
| EPYC 9334 || Genoa || 32 || 64 || 32 MiB || 128 MiB || 2.7 GHz || 3.85 GHz || 3.9 GHz || DDR5-4800 || 210 W || 10 November 2022 || $ 2,990.00 (1k) || 100-100000800, 100-100000800WOF
|-
| EPYC 9354 || Genoa || 32 || 64 || 32 MiB || 256 MiB || 3.25 GHz || 3.75 GHz || 3.8 GHz || DDR5-4800 || 280 W || 10 November 2022 || $ 3,420.00 (1k) || 100-100000798, 100-100000798WOF
|-
| EPYC 9374F || Genoa || 32 || 64 || 32 MiB || 256 MiB || 3.85 GHz || 4.1 GHz || 4.3 GHz || DDR5-4800 || 320 W || 10 November 2022 || $ 4,850.00 (1k) || 100-100000792, 100-100000792WOF
|}
EPYC 9454 Genoa 48 96 48 MiB
49,152 KiB
50,331,648 B
0.0469 GiB
256 MiB
262,144 KiB
268,435,456 B
0.25 GiB
2.75 GHz
2,750 MHz
2,750,000 kHz
3.65 GHz
3,650 MHz
3,650,000 kHz
3.8 GHz
3,800 MHz
3,800,000 kHz
DDR5-4800 290 W
290,000 mW
0.389 hp
0.29 kW
10 November 2022 $ 5,225.00
€ 4,702.50
£ 4,232.25
¥ 539,899.25
(1k)
100-100000478,
100-100000478WOF
EPYC 9474F Genoa 48 96 48 MiB
49,152 KiB
50,331,648 B
0.0469 GiB
256 MiB
262,144 KiB
268,435,456 B
0.25 GiB
3.6 GHz
3,600 MHz
3,600,000 kHz
3.95 GHz
3,950 MHz
3,950,000 kHz
4.1 GHz
4,100 MHz
4,100,000 kHz
DDR5-4800 360 W
360,000 mW
0.483 hp
0.36 kW
10 November 2022 $ 6,780.00
€ 6,102.00
£ 5,491.80
¥ 700,577.40
(1k)
100-100000788,
100-100000788WOF
EPYC 9534 Genoa 64 128 64 MiB
65,536 KiB
67,108,864 B
0.0625 GiB
256 MiB
262,144 KiB
268,435,456 B
0.25 GiB
2.45 GHz
2,450 MHz
2,450,000 kHz
3.55 GHz
3,550 MHz
3,550,000 kHz
3.7 GHz
3,700 MHz
3,700,000 kHz
DDR5-4800 280 W
280,000 mW
0.375 hp
0.28 kW
10 November 2022 $ 8,803.00
€ 7,922.70
£ 7,130.43
¥ 909,613.99
(1k)
100-100000799,
100-100000799WOF
EPYC 9554 Genoa 64 128 64 MiB
65,536 KiB
67,108,864 B
0.0625 GiB
256 MiB
262,144 KiB
268,435,456 B
0.25 GiB
3.1 GHz
3,100 MHz
3,100,000 kHz
3.75 GHz
3,750 MHz
3,750,000 kHz
3.75 GHz
3,750 MHz
3,750,000 kHz
DDR5-4800 360 W
360,000 mW
0.483 hp
0.36 kW
10 November 2022 $ 9,087.00
€ 8,178.30
£ 7,360.47
¥ 938,959.71
(1k)
100-100000790,
100-100000790WOF
EPYC 9634 Genoa 84 168 84 MiB
86,016 KiB
88,080,384 B
0.082 GiB
384 MiB
393,216 KiB
402,653,184 B
0.375 GiB
2.25 GHz
2,250 MHz
2,250,000 kHz
3.1 GHz
3,100 MHz
3,100,000 kHz
3.7 GHz
3,700 MHz
3,700,000 kHz
DDR5-4800 290 W
290,000 mW
0.389 hp
0.29 kW
10 November 2022 $ 10,304.00
€ 9,273.60
£ 8,346.24
¥ 1,064,712.32
(1k)
100-100000797,
100-100000797WOF
EPYC 9654 Genoa 96 192 96 MiB
98,304 KiB
100,663,296 B
0.0938 GiB
384 MiB
393,216 KiB
402,653,184 B
0.375 GiB
2.4 GHz
2,400 MHz
2,400,000 kHz
3.55 GHz
3,550 MHz
3,550,000 kHz
3.7 GHz
3,700 MHz
3,700,000 kHz
DDR5-4800 360 W
360,000 mW
0.483 hp
0.36 kW
10 November 2022 $ 11,805.00
€ 10,624.50
£ 9,562.05
¥ 1,219,810.65
(1k)
100-100000789,
100-100000789WOF
Count: 24

Designers

  • Mike Clark (?), chief architect

Bibliography

References

  1. "Ryzen 7000 Desktop Preview", Angstronomics, August 29, 2022
  2. "Processor Programming Reference (PPR) for AMD Family 19h Models 11h, Revision B1 Processors", AMD Publ. #55901, Rev. 0.25, November 10, 2022
  3. "Software Optimization Guide for the AMD Zen4 Microarchitecture", AMD Publ. #57647, Rev. 1.00, January 6, 2023
  4. "AMD EPYC™ 9004 Series Architecture Overview", AMD Publ. #58015, Rev. 1.1, December 2022

See Also
