Editing amd/microarchitectures/zen 4

{{amd title|Zen 4|arch}}
{{microarchitecture
|atype=CPU
|name=Zen 4
|designer=AMD
|manufacturer=TSMC
|process=5 nm
|process 2=6 nm
|predecessor=Zen 3
|predecessor link=amd/microarchitectures/zen 3
|successor=Zen 5
|successor link=amd/microarchitectures/zen 5
|succession=Yes
}}
'''Zen 4''' is a [[microarchitecture]] developed by [[AMD]] as a successor to {{\\|Zen 3}}. See press release for details: [https://www.amd.com/en/press-releases/2022-08-29-amd-launches-ryzen-7000-series-desktop-processors-zen-4-architecture-the AMD Launches Ryzen 7000 Series Desktop Processors]

== History ==
[[File:next-horizon-zen3-4-roadmap.png|right|thumb|400px|Zen 4 on the roadmap.]]
Zen 4 was first mentioned by Forrest Norrod during AMD's EPYC One Year Anniversary webinar. During the next horizon event which was held on November 6, 2018, AMD stated that Zen 4 was at the design completion phase.

== Products ==
{{future information}}

{| class="wikitable"
|-
! Processor Series !! Cores/Threads !! Market
|-
| EPYC 9004 "{{amd|Genoa|l=core}}" || Up to 96/192 || High-end server [[multiprocessors]]
|-
| Ryzen Threadripper 7000 "{{amd|Storm Peak|l=core}}" || Up to 96/192 || Workstation & enthusiasts
|-
| Ryzen 7000 "{{amd|Raphael|l=core}}" || Up to 16/32 || Mainstream to high-end desktops & enthusiasts
|-
| Ryzen 7000 APU "{{amd|Dragon Range|l=core}}" || Up to 16/32 || High-end mobile processors with GPU 
|-
| Ryzen 7000 APU "{{amd|Phoenix Point|l=core}}" || Up to 8/16 || Mainstream desktop & mobile processors with GPU 
|}

Cores using variant Zen 4 uarch:

{| class="wikitable"
|-
! Processor Series !! Cores/Threads !! Market
|-
| EPYC 9004 {{amd|Bergamo|l=core}} || Up to 128/256  || Cloud [[multiprocessors]] (smaller, almost half-size Zen 4c [referred to as “Zen 4D” in leaks] core sacrificing half of the L3 cache.)
|}

'''Architectural Codenames:'''
{| class="wikitable"
|-
! Arch !! Codename
|-
| Core || Persephone
|-
| {{abbr|CCD}} || Durango
|}

== Process Technology ==
Processors implementing Zen 4 are {{abbr|SoC}}s configured as a Multi-Chip Module or monolithic chip. MCMs consist of a single I/O die and up to 12 Core Complex Dies attached with full-duplex serial point-to-point links. The IOD contains memory controllers, I/O controllers, microcontrollers for security purposes and power management, and other peripherals. The CCDs communicate with peripherals and each other through the Data and Control Fabrics on the I/O die, and each contain a single Core Complex (CCX). The monolithic chips integrate a subset of the IOD facilities and additional peripherals tailored for their target market, a CCX, and a GPU. A CCX contains 8 CPU cores (fewer may be usable on some models) communicating through a shared L3 cache.

("Bergamo" processors configuration TBD.)

The chips are fabricated by [[TSMC]], CCDs and monolithic chips on a [[5&nbsp;nm]] node, IODs on a [[6&nbsp;nm]] node.

== Architecture ==
Zen 4 is a 64-bit superscalar, out-of-order, 2-way [[SMT]] microarchitecture with advanced dynamic branch prediction, 4-way decoding of [[x86]] instructions with a stack optimizer, multiple caches including an Op cache for decoded instructions and prefetchers for code and data, four integer/address and two floating point instruction schedulers, 3-way address generation, 5-way integer execution. 4-way 256-bit wide floating point execution, a speculative, out-of-order load/store unit capable of up to three loads or two stores per cycle with a 48/88-entry load and 64-entry store queue, write-combining, and 5-level paging with four {{abbr|TLB}}s and six hardware page table walkers.

=== Key changes from {{\\|Zen 3}} ===
* {{x86|AVX-512}} instructions support, 256-bit data path<ref name="ryzen-7000-preview"/>
* L1 and L2 DTLB size increased from 64 to 72 and 2,048 to 3,072 entries
* Op cache size increased from 4,096 to 6,750 Ops per core
* L2 cache doubled from 512&nbsp;KiB to 1&nbsp;MiB per core (not all processor models), latency increased from 12 to 14 cycles minimum
* L3 cache average load-to-use latency increased from 46 to 50 cycles
* Five-level paging; Max. physical and linear address size raised from 48 to 52 and 57 bits respectively
* Improved cache load, write and prefetch from/to register (less latency)
* Higher Transistor Density, due to 5nm process
* Capable of higher all-core clockspeeds (shown by AMD to reach 5GHz+ on all cores)
* Larger integer register file (from 192 to 224), floating-point register file (from 160 to 192) and reorder buffer (from 256 to 320 entries)

Package level changes:
* EPYC 9004 "{{amd|Genoa|l=core}}": Max. core/thread count 96/192, up from 64/128 on EPYC 7003 "{{amd|Milan|l=core}}"
* EPYC "{{amd|Bergamo|l=core}}": Max. 128 cores but preliminary data shows a slightly altered architecture featuring cores that take up less space
* Support for DDR5 memory and PCIe Gen 5
* New sockets {{amd|AM5|l=pack}} (client), {{amd|SP5|l=pack}} and {{amd|SP6|l=pack}} (server), {{amd|FP7|FP7/FP7r2|l=pack}} (mobile)
* {{abbr|APU}}s: RDNA2-based iGPU with 2 compute units (128 stream processors)

=== New Instructions ===
Zen 4 introduced the following ISA enhancements:

* {{x86|AVX-512}} - 512-bit Vector Instructions
** {{x86|AVX512F}} - Foundation (first introduced with [[Intel]] {{intel|skylake (server)|Skylake|l=arch}})
** {{x86|AVX512CD}} - Conflict Detection Instructions ({{intel|Skylake X|l=core}})
** {{x86|AVX512VL}} - Vector Length Extensions (Skylake X)
** {{x86|AVX512DQ}} - Doubleword and Quadword Instructions (Skylake X)
** {{x86|AVX512BW}} - Byte and Word Instructions (Skylake X)
** {{x86|AVX512_IFMA}} - Integer Fused Multiply-Add ({{intel|Cannon Lake|l=arch}})
** {{x86|AVX512_VBMI}} - Vector Bit Manipulation Instructions (Cannon Lake)
** {{x86|AVX512_VPOPCNTDQ}} - Vector Population Count Instructions ({{intel|ice lake (server)|Ice Lake|l=arch}})
** {{x86|AVX512_BITALG}} - Bit Algorithms (Ice Lake)
** {{x86|AVX512_VBMI2}} - Vector Bit Manipulation Instructions 2 (Ice Lake) 
** {{x86|AVX512_VNNI}} - Vector Neural Network Instructions (Ice Lake)
** {{x86|AVX512_BF16}} - [[bfloat16|BFloat16]] Instructions ({{intel|Cooper Lake|l=arch}})
** ''Not supported'': AVX512ER, AVX512PF ({{intel|Knights Landing|l=arch}}); AVX512 4VNNIW, 4FMAPS ({{intel|Knights Mill|l=arch}}); VP2INTERSECT ({{intel|Tiger Lake|l=arch}}); FP16 ({{intel|Cooper Lake|l=arch}})
* {{x86|GFNI}} - Galois Field New Instructions (first introduced with [[Intel]] {{intel|ice lake (server)|Ice Lake|l=arch}})
** <code>VGF2P8AFFINEQB</code> - Galois field affine transformation
** <code>VGF2P8AFFINEINVQB</code> - Galois field affine transformation inverse
** <code>VGF2P8MULB</code> - Galois field multiply bytes

=== Memory Hierarchy ===
==== Data and Instruction Caches ====
* L0 Op Cache:
** Up to 6,750 Ops per core, 8-way set associative
** 9 Op line size (restrictions apply depending on instruction type)
** Parity protected
* L1I Cache:
** 32&nbsp;KiB per core, 8-way set associative
** 64&nbsp;B line size
** Parity protected
* L1D Cache:
** 32&nbsp;KiB per core, 8-way set associative
** 64&nbsp;B line size
** Write-back policy
** 4-5 cycles latency for Int
** 7-8 cycles latency for FP
** ECC
* L2 Cache:
** 512&nbsp;KiB or 1&nbsp;MiB per core (varies by processor model), 8-way set associative
** 64&nbsp;B line size
** Write-back policy
** Inclusive of L1
** ≥ 14 cycles latency
** {{abbr|DEC-TED}} ECC, tag & state arrays {{abbr|SEC-DED}}<!--7 check bits for 42 tag bits; AMD-55901-0.97 Sec 3.5-->
* L3 Cache:
** "{{amd|Genoa|l=core}}": up to 32&nbsp;MiB/{{abbr|CCX}} (8 cores), up to 384&nbsp;MiB total
** Shared by all cores in the CCX, configurable
** 16-way set associative
** 64&nbsp;B line size
** L2 [[victim cache]]
** Write-back policy
** 50 cycles average load-to-use latency
** DEC-TED ECC, tag array & shadow tags SEC-DED<!--AMD-55901-0.97 Sec 3.5-->
** QoS Monitoring and Enforcement with {{abbr|BMEC|Bandwidth Monitoring Event Configuration}}, {{abbr|L3RR|L3 Range Reservation}}, {{abbr|L3SBE|L3 External Slow Memory Bandwidth Enforcement}}

==== Translation Lookaside Buffers ====
* ITLB
** 64 entry L1 TLB, fully associative
*** 4-Kbyte, 2-Mbyte, 1-Gbyte page sizes
** 512 entry L2 TLB, 8-way set associative 
*** 4-Kbyte, 2-Mbyte, and 4-Mbyte pages
** Parity protected
* DTLB
** 72 entry L1 TLB, fully associative
*** 4-Kbyte, 16-Kbyte, 2-Mbyte, 1-Gbyte page sizes
** 3,072 entry L2 TLB, 24-way set associative
*** 4-Kbyte, 16-Kbyte, 2-Mbyte, and 4-Mbyte pages, {{abbr|PDE|Page Directory Entry}}s to speed up table walks
** Parity protected

4-Mbyte pages require two 2-Mbyte entries in all TLBs. 16-Kbyte page size refers to {{abbr|PTE|Page Table Entry}} coalescing of four physically consecutive and 16-Kbyte aligned 4-Kbyte pages. All caches and TLBs are competitively shared in multi-threaded mode.

==== System DRAM ====
* Ryzen 7000 "{{amd|Raphael|l=core}}":
** Up to PC5-41600 (DDR5-5200) without overclocking

* EPYC 9004 "{{amd|Genoa|l=core}}":
** 12 channels per socket, two 40-bit (32 data, 8 ECC) DDR5 subchannels per channel
** Up to 24 DIMMs, max. 6&nbsp;TiB
** Up to PC5-38400 (DDR5-4800)
** {{abbr|SR}}/{{abbr|DR}} {{abbr|RDIMM}}, {{abbr|4R}}/{{abbr|8R}} {{abbr|LRDIMM}}, {{abbr|3DS DIMM}}
** ECC supported (x4, x8, x16, chipkill)<!--AMD-55901-0.97 Sec 3.7-->
** DRAM bus parity and write data CRC options<!--ibid-->

Sources: <ref name="amd-55901-ppr-1911"/><ref name="amd-57647-zen4-optim"/><ref name="amd-58015-9004-overv"/>

== All Zen 4 Processors ==
<!-- NOTE:
This table is generated automatically from the data in the actual articles.
If a microprocessor is missing from the list, an appropriate article for it needs to be
created and tagged accordingly.
Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips
-->
{| class="comptable3"
! List of all Zen 4-based Processors
|}
<div class="comptable-scroller sticky">
{| class="comptable3 stickycol1 sortable"
|- class="header continued"
! Model
! Codename
! {{abbr|C|Cores}}
! {{abbr|T|Threads}}
! data-sort-type=number | L2$
! data-sort-type=number | L3$
! data-sort-type=number | Frequ.
! data-sort-type=number | Turbo
! data-sort-type=number | Turbo 1C
! Memory
! data-sort-type=number | {{abbr|TDP}}
! data-sort-type=date | Launched
! Release<br />Price
! {{abbr|OPN}}
|- class="separator sortbottom"
| colspan=4 | [[Uniprocessors]]
| colspan=10 | &nbsp;
{{#invoke:comptable|askt
|condition=[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] [[max cpu count::1]]
|sort=name |valuesep=,<br /> |template=<nowiki>
|-
| data-sort-value="{{{name#-}}}" | {{amd|{{{microprocessor family#-}}}}} [[{{{page#-}}}|{{{model number#-}}}]]
| {{amd|{{{core name#-}}}|l=core}}
| {{{core count}}}
| {{{thread count}}}
| {{{l2$ size}}}
| {{{l3$ size}}}
| {{{base frequency#GHz}}}
| {{{turbo frequency#GHz}}}
| {{{turbo frequency (1 core)#GHz}}}
| {{{supported memory type}}}
| {{{tdp}}}
| {{{first launched}}}
| {{#if:{{{release price}}}|{{{release price}}}{{#ifeq:{{{release price}}}|{{{release price (tray)}}}|&#32;(1k)}} }}
| {{{part number}}}</nowiki>|outrotemplate=<nowiki>
|- class="separator sortbottom"
| colspan=4 | [[Multiprocessors]] (dual-socket)
| colspan=10 | &nbsp;
{{#invoke:comptable|askt
|condition=[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] [[max cpu count::>>1]]
|sort=name |valuesep=,<br /> |template=&lt;nowiki>{{{#template}}}&lt;/nowiki>}}</nowiki>}}
|-
! Count: {{#ask:[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] |format=count}}
|}
</div>

== Designers ==
* Mike Clark(?), chief architect

== Bibliography ==

== References ==
<references>
<ref name="ryzen-7000-preview">{{cite techdoc|title=Ryzen 7000 Desktop Preview|url=https://www.angstronomics.com/p/ryzen-7000-desktop-preview|publ=Angstronomics|date=2022-08-29}}</ref>
<ref name="amd-55901-ppr-1911">{{cite techdoc|title=Processor Programming Reference (PPR) for AMD Family 19h Models 11h, Revision B1 Processors|url=https://www.amd.com/system/files/TechDocs/55901_0.25.zip|publ=AMD|pid=55901|rev=0.25|date=2022-11-10}}</ref>
<ref name="amd-57647-zen4-optim">{{cite techdoc|title=Software Optimization Guide for the AMD Zen4 Microarchitecture|url=https://www.amd.com/system/files/TechDocs/57647.zip|publ=AMD|pid=57647|rev=1.00|date=2023-01-06}}</ref>
<ref name="amd-58015-9004-overv">{{cite techdoc|title=AMD EPYC™ 9004 Series Architecture Overview|url=https://www.amd.com/system/files/documents/58015-epyc-9004-tg-architecture-overview.pdf|publ=AMD|pid=58015|rev=1.1|date=2022-12}}</ref>
</references>

== See Also ==
* AMD {{\\|Zen}}, {{\\|Zen 2}}, {{\\|Zen 3}}
* Intel {{intel|Meteor Lake|l=arch}}
codename	Zen 4 +
designer	AMD +
full page name	amd/microarchitectures/zen 4 +
instance of	microarchitecture +
manufacturer	TSMC +
microarchitecture type	CPU +
name	Zen 4 +
process	5 nm (0.005 μm, 5.0e-6 mm) + and 6 nm (0.006 μm, 6.0e-6 mm) +