Atomsymbol (talk | contribs) (→Key changes from {{\\|Zen 3}}: Add a line about iGPU) |
(6.75k is not 6750, increase from 4096=8*8*2^6 to 6912=9*12*2^6 (=4096*9/8 +50%)) |
||
(11 intermediate revisions by 7 users not shown) | |||
Line 5: | Line 5: | ||
|designer=AMD | |designer=AMD | ||
|manufacturer=TSMC | |manufacturer=TSMC | ||
− | |process= | + | |process=5 nm |
+ | |process 2=6 nm | ||
|predecessor=Zen 3 | |predecessor=Zen 3 | ||
|predecessor link=amd/microarchitectures/zen 3 | |predecessor link=amd/microarchitectures/zen 3 | ||
Line 12: | Line 13: | ||
|succession=Yes | |succession=Yes | ||
}} | }} | ||
− | '''Zen 4''' is a | + | '''Zen 4''' is a [[microarchitecture]] developed by [[AMD]] as a successor to {{\\|Zen 3}}. See press release for details: [https://www.amd.com/en/press-releases/2022-08-29-amd-launches-ryzen-7000-series-desktop-processors-zen-4-architecture-the AMD Launches Ryzen 7000 Series Desktop Processors] |
== History == | == History == | ||
Line 27: | Line 28: | ||
| EPYC 9004 "{{amd|Genoa|l=core}}" || Up to 96/192 || High-end server [[multiprocessors]] | | EPYC 9004 "{{amd|Genoa|l=core}}" || Up to 96/192 || High-end server [[multiprocessors]] | ||
|- | |- | ||
− | | Ryzen Threadripper 7000 "{{amd|Storm Peak|l=core}}" || Up to 96/192 || Workstation & enthusiasts | + | | Ryzen Threadripper 7000 "{{amd|Storm Peak|l=core}}" || Up to 96/192 || Workstation & enthusiasts |
|- | |- | ||
− | | Ryzen 7000 "{{amd|Raphael|l=core}}" || Up to 16/32 || Mainstream to high-end desktops & enthusiasts | + | | Ryzen 7000 "{{amd|Raphael|l=core}}" || Up to 16/32 || Mainstream to high-end desktops & enthusiasts |
|- | |- | ||
| Ryzen 7000 APU "{{amd|Dragon Range|l=core}}" || Up to 16/32 || High-end mobile processors with GPU | | Ryzen 7000 APU "{{amd|Dragon Range|l=core}}" || Up to 16/32 || High-end mobile processors with GPU | ||
Line 42: | Line 43: | ||
! Processor Series !! Cores/Threads !! Market | ! Processor Series !! Cores/Threads !! Market | ||
|- | |- | ||
− | | EPYC 9004 {{amd|Bergamo|l=core}} || Up to 128/256 || Cloud [[multiprocessors]] (smaller, almost half-size Zen 4c [referred to as “Zen 4D” in leaks] core sacrificing half of the L3 cache.) | + | | EPYC 9004 "{{amd|Bergamo|l=core}}" || Up to 128/256 || Cloud [[multiprocessors]] (smaller, almost half-size Zen 4c [referred to as “Zen 4D” in leaks] core sacrificing half of the L3 cache.) |
+ | |- | ||
+ | | EPYC 8004 "{{amd|Siena|l=core}}" || Up to 64/128 || Edge-optimized server chips | ||
+ | |} | ||
+ | |||
+ | '''Architectural Codenames:''' | ||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! Arch !! Codename | ||
+ | |- | ||
+ | | Core || Persephone | ||
+ | |- | ||
+ | | {{abbr|CCD}} || Durango | ||
|} | |} | ||
== Process Technology == | == Process Technology == | ||
− | Zen 4 | + | Processors implementing Zen 4 are {{abbr|SoC}}s configured as a Multi-Chip Module or monolithic chip. MCMs consist of a single I/O die and up to 12 Core Complex Dies attached with full-duplex serial point-to-point links. The IOD contains memory controllers, I/O controllers, microcontrollers for security purposes and power management, and other peripherals. The CCDs communicate with peripherals and each other through the Data and Control Fabrics on the I/O die, and each contain a single Core Complex (CCX). The monolithic chips integrate a subset of the IOD facilities and additional peripherals tailored for their target market, a CCX, and a GPU. A CCX contains 8 CPU cores (fewer may be usable on some models) communicating through a shared L3 cache. |
+ | |||
+ | ("Bergamo" processors configuration TBD.) | ||
+ | |||
+ | The chips are fabricated by [[TSMC]], CCDs and monolithic chips on a [[5 nm]] node, IODs on a [[6 nm]] node. | ||
== Architecture == | == Architecture == | ||
− | {{ | + | Zen 4 is a 64-bit superscalar, out-of-order, 2-way [[SMT]] microarchitecture with advanced dynamic branch prediction, 4-way decoding of [[x86]] instructions with a stack optimizer, multiple caches including an Op cache for decoded instructions and prefetchers for code and data, four integer/address and two floating point instruction schedulers, 3-way address generation, 5-way integer execution. 4-way 256-bit wide floating point execution, a speculative, out-of-order load/store unit capable of up to three loads or two stores per cycle with a 48/88-entry load and 64-entry store queue, write-combining, and 5-level paging with four {{abbr|TLB}}s and six hardware page table walkers. |
=== Key changes from {{\\|Zen 3}} === | === Key changes from {{\\|Zen 3}} === | ||
− | * | + | * {{x86|AVX-512}} instructions support, 256-bit data path<ref name="ryzen-7000-preview"/> |
− | + | * L1 and L2 DTLB size increased from 64 to 72 and 2,048 to 3,072 entries | |
− | + | * Op cache size increased from 4,096 to 6,912 Ops per core | |
− | ** L2 cache doubled from 512 KiB to 1 MiB per core | + | * L2 cache doubled from 512 KiB to 1 MiB per core (not all processor models), latency increased from 12 to 14 cycles minimum |
− | ** Max. physical and linear address size raised from 48 to 52 and 57 bits respectively | + | * L3 cache average load-to-use latency increased from 46 to 50 cycles |
− | + | * Five-level paging; Max. physical and linear address size raised from 48 to 52 and 57 bits respectively | |
− | + | * Improved cache load, write and prefetch from/to register (less latency) | |
− | + | * Higher Transistor Density, due to 5nm process | |
− | + | * Capable of higher all-core clockspeeds (shown by AMD to reach 5GHz+ on all cores) | |
− | * Package | + | * Larger integer register file (from 192 to 224), floating-point register file (from 160 to 192) and reorder buffer (from 256 to 320 entries) |
− | * | + | * REPE CMPSB (sometimes used to implement string comparison) is significantly sped up, processes more than 32 bytes/cycle when operating on L1 data. |
− | + | * BSF, BSR, and BMI1 instructions BLSI, BLSMSK, BLSR, TZCNT have smaller latency of 1 and x2 throughput (4 insn/cycle). | |
− | + | * Latency and/or throughput of VPERMx, V[P]BROADCASTx, VPMOV{S,Z}Xx instructions improved. | |
− | * | + | * Some ALU operations on vector registers increased throughput from 2 to 3 ops/cycle. |
+ | * Some ALU operations on vector registers (VPABSx,VPHADDx,VPHSUBx,VPSLLx,VPSRLx,VPSRAx,VPACKx,VPSIGNx,VMAXx,VMINx) increased latency by 1 cycle. | ||
+ | |||
+ | |||
+ | Package level changes: | ||
+ | * EPYC 9004 "{{amd|Genoa|l=core}}": Max. core/thread count 96/192, up from 64/128 on EPYC 7003 "{{amd|Milan|l=core}}" | ||
+ | * EPYC "{{amd|Bergamo|l=core}}": Max. 128 cores but preliminary data shows a slightly altered architecture featuring cores that take up less space | ||
+ | * Support for DDR5 memory and PCIe Gen 5 | ||
+ | * New sockets {{amd|AM5|l=pack}} (client), {{amd|SP5|l=pack}} and {{amd|SP6|l=pack}} (server), {{amd|FP7|FP7/FP7r2|l=pack}} (mobile) | ||
+ | * {{abbr|APU}}s: RDNA2-based iGPU with 2 compute units (128 stream processors) | ||
=== New Instructions === | === New Instructions === | ||
Zen 4 introduced the following ISA enhancements: | Zen 4 introduced the following ISA enhancements: | ||
− | |||
* {{x86|AVX-512}} - 512-bit Vector Instructions | * {{x86|AVX-512}} - 512-bit Vector Instructions | ||
** {{x86|AVX512F}} - Foundation (first introduced with [[Intel]] {{intel|skylake (server)|Skylake|l=arch}}) | ** {{x86|AVX512F}} - Foundation (first introduced with [[Intel]] {{intel|skylake (server)|Skylake|l=arch}}) | ||
Line 77: | Line 102: | ||
** {{x86|AVX512DQ}} - Doubleword and Quadword Instructions (Skylake X) | ** {{x86|AVX512DQ}} - Doubleword and Quadword Instructions (Skylake X) | ||
** {{x86|AVX512BW}} - Byte and Word Instructions (Skylake X) | ** {{x86|AVX512BW}} - Byte and Word Instructions (Skylake X) | ||
− | ** {{x86| | + | ** {{x86|AVX512_IFMA}} - Integer Fused Multiply-Add ({{intel|Cannon Lake|l=arch}}) |
− | ** {{x86| | + | ** {{x86|AVX512_VBMI}} - Vector Bit Manipulation Instructions (Cannon Lake) |
− | ** {{x86| | + | ** {{x86|AVX512_VPOPCNTDQ}} - Vector Population Count Instructions ({{intel|ice lake (server)|Ice Lake|l=arch}}) |
− | ** {{x86| | + | ** {{x86|AVX512_BITALG}} - Bit Algorithms (Ice Lake) |
− | ** {{x86| | + | ** {{x86|AVX512_VBMI2}} - Vector Bit Manipulation Instructions 2 (Ice Lake) |
− | ** {{x86| | + | ** {{x86|AVX512_VNNI}} - Vector Neural Network Instructions (Ice Lake) |
− | ** {{x86| | + | ** {{x86|AVX512_BF16}} - [[bfloat16|BFloat16]] Instructions ({{intel|Cooper Lake|l=arch}}) |
− | ** ''Not supported'': AVX512ER, AVX512PF ({{intel|Knights Landing|l=arch}}); AVX512 4VNNIW, 4FMAPS ({{intel|Knights Mill|l=arch}}); VP2INTERSECT ({{intel|Tiger Lake|l=arch}}) | + | ** ''Not supported'': AVX512ER, AVX512PF ({{intel|Knights Landing|l=arch}}); AVX512 4VNNIW, 4FMAPS ({{intel|Knights Mill|l=arch}}); VP2INTERSECT ({{intel|Tiger Lake|l=arch}}); FP16 ({{intel|Sapphire Rapids|l=arch}}) |
− | * GFNI - Galois Field New Instructions (first introduced with [[Intel]] {{intel|ice lake (server)|Ice Lake|l=arch}}) | + | * {{x86|GFNI}} - Galois Field New Instructions (first introduced with [[Intel]] {{intel|ice lake (server)|Ice Lake|l=arch}}) |
** <code>VGF2P8AFFINEQB</code> - Galois field affine transformation | ** <code>VGF2P8AFFINEQB</code> - Galois field affine transformation | ||
** <code>VGF2P8AFFINEINVQB</code> - Galois field affine transformation inverse | ** <code>VGF2P8AFFINEINVQB</code> - Galois field affine transformation inverse | ||
Line 93: | Line 118: | ||
==== Data and Instruction Caches ==== | ==== Data and Instruction Caches ==== | ||
* L0 Op Cache: | * L0 Op Cache: | ||
− | ** 6 | + | ** Up to 6,912 Ops per core, 12-way set associative |
− | ** 9 Op line size( | + | ** 9 Op line size (restrictions apply depending on instruction type) |
** Parity protected | ** Parity protected | ||
* L1I Cache: | * L1I Cache: | ||
− | ** 32 KiB per core, 8-way set associative | + | ** 32 KiB per core, 8-way set associative |
− | ** 64 B line size | + | ** 64 B line size |
** Parity protected | ** Parity protected | ||
* L1D Cache: | * L1D Cache: | ||
− | ** 32 KiB per core, 8-way set associative | + | ** 32 KiB per core, 8-way set associative |
− | ** 64 B line size | + | ** 64 B line size |
** Write-back policy | ** Write-back policy | ||
− | ** | + | ** 4-5 cycles latency for Int |
− | ** | + | ** 7-8 cycles latency for FP |
** ECC | ** ECC | ||
* L2 Cache: | * L2 Cache: | ||
− | ** 1 MiB per core, 8-way set associative | + | ** 512 KiB or 1 MiB per core (varies by processor model), 8-way set associative |
− | ** 64 B line size | + | ** 64 B line size |
** Write-back policy | ** Write-back policy | ||
− | ** Inclusive of L1 | + | ** Inclusive of L1 |
− | ** 14 cycles latency | + | ** ≥ 14 cycles latency |
** {{abbr|DEC-TED}} ECC, tag & state arrays {{abbr|SEC-DED}}<!--7 check bits for 42 tag bits; AMD-55901-0.97 Sec 3.5--> | ** {{abbr|DEC-TED}} ECC, tag & state arrays {{abbr|SEC-DED}}<!--7 check bits for 42 tag bits; AMD-55901-0.97 Sec 3.5--> | ||
* L3 Cache: | * L3 Cache: | ||
− | + | ** "{{amd|Genoa|l=core}}": up to 32 MiB/{{abbr|CCX}} (8 cores), up to 384 MiB total | |
** Shared by all cores in the CCX, configurable | ** Shared by all cores in the CCX, configurable | ||
** 16-way set associative | ** 16-way set associative | ||
− | ** 64 B line size | + | ** 64 B line size |
− | ** L2 [[victim cache]] | + | ** L2 [[victim cache]] |
** Write-back policy | ** Write-back policy | ||
** 50 cycles average load-to-use latency | ** 50 cycles average load-to-use latency | ||
** DEC-TED ECC, tag array & shadow tags SEC-DED<!--AMD-55901-0.97 Sec 3.5--> | ** DEC-TED ECC, tag array & shadow tags SEC-DED<!--AMD-55901-0.97 Sec 3.5--> | ||
− | ** QoS Monitoring and Enforcement | + | ** QoS Monitoring and Enforcement with {{abbr|BMEC|Bandwidth Monitoring Event Configuration}}, {{abbr|L3RR|L3 Range Reservation}}, {{abbr|L3SBE|L3 External Slow Memory Bandwidth Enforcement}} |
==== Translation Lookaside Buffers ==== | ==== Translation Lookaside Buffers ==== | ||
* ITLB | * ITLB | ||
− | ** 64 entry L1 TLB, fully associative, | + | ** 64 entry L1 TLB, fully associative |
− | ** 512 entry L2 TLB, | + | *** 4-Kbyte, 2-Mbyte, 1-Gbyte page sizes |
+ | ** 512 entry L2 TLB, 8-way set associative | ||
*** 4-Kbyte, 2-Mbyte, and 4-Mbyte pages | *** 4-Kbyte, 2-Mbyte, and 4-Mbyte pages | ||
** Parity protected | ** Parity protected | ||
* DTLB | * DTLB | ||
− | ** 72 entry L1 TLB, fully associative, | + | ** 72 entry L1 TLB, fully associative |
− | ** 3,072 entry L2 TLB, | + | *** 4-Kbyte, 16-Kbyte, 2-Mbyte, 1-Gbyte page sizes |
− | *** 4-Kbyte, 2-Mbyte, and 4-Mbyte pages, | + | ** 3,072 entry L2 TLB, 24-way set associative |
+ | *** 4-Kbyte, 16-Kbyte, 2-Mbyte, and 4-Mbyte pages, {{abbr|PDE|Page Directory Entry}}s to speed up table walks | ||
** Parity protected | ** Parity protected | ||
− | 4-Mbyte pages require two 2-Mbyte entries in all TLBs. | + | 4-Mbyte pages require two 2-Mbyte entries in all TLBs. 16-Kbyte page size refers to {{abbr|PTE|Page Table Entry}} coalescing of four physically consecutive and 16-Kbyte aligned 4-Kbyte pages. All caches and TLBs are competitively shared in multi-threaded mode. |
==== System DRAM ==== | ==== System DRAM ==== | ||
* Ryzen 7000 "{{amd|Raphael|l=core}}": | * Ryzen 7000 "{{amd|Raphael|l=core}}": | ||
** Up to PC5-41600 (DDR5-5200) without overclocking | ** Up to PC5-41600 (DDR5-5200) without overclocking | ||
+ | |||
* EPYC 9004 "{{amd|Genoa|l=core}}": | * EPYC 9004 "{{amd|Genoa|l=core}}": | ||
− | ** 12 channels per socket, two 40-bit DDR5 subchannels per channel | + | ** 12 channels per socket, two 40-bit (32 data, 8 ECC) DDR5 subchannels per channel |
− | ** Up to 24 DIMMs, max. | + | ** Up to 24 DIMMs, max. 6 TiB |
− | ** Up to PC5- | + | ** Up to PC5-38400 (DDR5-4800) |
** {{abbr|SR}}/{{abbr|DR}} {{abbr|RDIMM}}, {{abbr|4R}}/{{abbr|8R}} {{abbr|LRDIMM}}, {{abbr|3DS DIMM}} | ** {{abbr|SR}}/{{abbr|DR}} {{abbr|RDIMM}}, {{abbr|4R}}/{{abbr|8R}} {{abbr|LRDIMM}}, {{abbr|3DS DIMM}} | ||
** ECC supported (x4, x8, x16, chipkill)<!--AMD-55901-0.97 Sec 3.7--> | ** ECC supported (x4, x8, x16, chipkill)<!--AMD-55901-0.97 Sec 3.7--> | ||
** DRAM bus parity and write data CRC options<!--ibid--> | ** DRAM bus parity and write data CRC options<!--ibid--> | ||
− | Sources: <ref name="amd-55901-ppr- | + | Sources: <ref name="amd-55901-ppr-1911"/><ref name="amd-57647-zen4-optim"/><ref name="amd-58015-9004-overv"/> |
== All Zen 4 Processors == | == All Zen 4 Processors == | ||
Line 159: | Line 187: | ||
Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips | Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips | ||
--> | --> | ||
− | { | + | {| class="comptable3" |
− | + | ! List of all Zen 4-based Processors | |
− | + | |} | |
− | { | + | <div class="comptable-scroller sticky"> |
− | + | {| class="comptable3 stickycol1 sortable" | |
− | {{# | + | |- class="header continued" |
− | | | + | ! Model |
− | | | + | ! Codename |
− | | | + | ! {{abbr|C|Cores}} |
− | | | + | ! {{abbr|T|Threads}} |
− | | | + | ! data-sort-type=number | L2$ |
− | | | + | ! data-sort-type=number | L3$ |
− | | | + | ! data-sort-type=number | Frequ. |
− | | | + | ! data-sort-type=number | Turbo |
− | | | + | ! data-sort-type=number | Turbo 1C |
− | | | + | ! Memory |
− | | | + | ! data-sort-type=number | {{abbr|TDP}} |
− | | | + | ! data-sort-type=date | Launched |
− | | | + | ! Release<br />Price |
− | | | + | ! {{abbr|OPN}} |
− | | | + | |- class="separator sortbottom" |
− | | | + | | colspan=4 | [[Uniprocessors]] |
− | | | + | | colspan=10 | |
− | | | + | {{#invoke:comptable|askt |
− | | | + | |condition=[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] [[max cpu count::1]] |
− | | | + | |sort=name |valuesep=,<br /> |template=<nowiki> |
− | | | + | |- |
− | + | | data-sort-value="{{{name#-}}}" | {{amd|{{{microprocessor family#-}}}}} [[{{{page#-}}}|{{{model number#-}}}]] | |
− | + | | {{amd|{{{core name#-}}}|l=core}} | |
− | {{# | + | | {{{core count}}} |
− | + | | {{{thread count}}} | |
− | + | | {{{l2$ size}}} | |
− | + | | {{{l3$ size}}} | |
− | + | | {{{base frequency#GHz}}} | |
− | + | | {{{turbo frequency#GHz}}} | |
− | + | | {{{turbo frequency (1 core)#GHz}}} | |
− | + | | {{{supported memory type}}} | |
− | + | | {{{tdp}}} | |
− | + | | {{{first launched}}} | |
− | + | | {{#if:{{{release price}}}|{{{release price}}}{{#ifeq:{{{release price}}}|{{{release price (tray)}}}| (1k)}} }} | |
− | + | | {{{part number}}}</nowiki>|outrotemplate=<nowiki> | |
− | + | |- class="separator sortbottom" | |
− | + | | colspan=4 | [[Multiprocessors]] (dual-socket) | |
− | + | | colspan=10 | | |
− | + | {{#invoke:comptable|askt | |
− | |sort= | + | |condition=[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] [[max cpu count::>>1]] |
− | | | + | |sort=name |valuesep=,<br /> |template=<nowiki>{{{#template}}}</nowiki>}}</nowiki>}} |
− | |template= | + | |- |
− | + | ! Count: {{#ask:[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] |format=count}} | |
− | + | |} | |
− | + | </div> | |
− | }} | ||
− | {{ | ||
− | </ | ||
− | |||
== Designers == | == Designers == | ||
Line 222: | Line 246: | ||
== References == | == References == | ||
<references> | <references> | ||
− | |||
<ref name="ryzen-7000-preview">{{cite techdoc|title=Ryzen 7000 Desktop Preview|url=https://www.angstronomics.com/p/ryzen-7000-desktop-preview|publ=Angstronomics|date=2022-08-29}}</ref> | <ref name="ryzen-7000-preview">{{cite techdoc|title=Ryzen 7000 Desktop Preview|url=https://www.angstronomics.com/p/ryzen-7000-desktop-preview|publ=Angstronomics|date=2022-08-29}}</ref> | ||
+ | <ref name="amd-55901-ppr-1911">{{cite techdoc|title=Processor Programming Reference (PPR) for AMD Family 19h Models 11h, Revision B1 Processors|url=https://www.amd.com/system/files/TechDocs/55901_0.25.zip|publ=AMD|pid=55901|rev=0.25|date=2022-11-10}}</ref> | ||
+ | <ref name="amd-57647-zen4-optim">{{cite techdoc|title=Software Optimization Guide for the AMD Zen4 Microarchitecture|url=https://www.amd.com/system/files/TechDocs/57647.zip|publ=AMD|pid=57647|rev=1.00|date=2023-01-06}}</ref> | ||
+ | <ref name="amd-58015-9004-overv">{{cite techdoc|title=AMD EPYC™ 9004 Series Architecture Overview|url=https://www.amd.com/system/files/documents/58015-epyc-9004-tg-architecture-overview.pdf|publ=AMD|pid=58015|rev=1.1|date=2022-12}}</ref> | ||
</references> | </references> | ||
Latest revision as of 19:28, 13 November 2023
Zen 4 µarch | |
General Info | |
Arch Type | CPU |
Designer | AMD |
Manufacturer | TSMC |
Process | 5 nm, 6 nm |
Succession | |
Zen 4 is a microarchitecture developed by AMD as a successor to Zen 3. See press release for details: AMD Launches Ryzen 7000 Series Desktop Processors
History
Zen 4 was first mentioned by Forrest Norrod during AMD's EPYC One Year Anniversary webinar. At the Next Horizon event held on November 6, 2018, AMD stated that Zen 4 was at the design completion phase.
Products
Processor Series | Cores/Threads | Market |
---|---|---|
EPYC 9004 "Genoa" | Up to 96/192 | High-end server multiprocessors |
Ryzen Threadripper 7000 "Storm Peak" | Up to 96/192 | Workstation & enthusiasts |
Ryzen 7000 "Raphael" | Up to 16/32 | Mainstream to high-end desktops & enthusiasts |
Ryzen 7000 APU "Dragon Range" | Up to 16/32 | High-end mobile processors with GPU |
Ryzen 7000 APU "Phoenix Point" | Up to 8/16 | Mainstream desktop & mobile processors with GPU |
Cores using variant Zen 4 uarch:
Processor Series | Cores/Threads | Market |
---|---|---|
EPYC 9004 "Bergamo" | Up to 128/256 | Cloud multiprocessors (smaller, almost half-size Zen 4c [referred to as “Zen 4D” in leaks] core sacrificing half of the L3 cache.) |
EPYC 8004 "Siena" | Up to 64/128 | Edge-optimized server chips |
Architectural Codenames:
Arch | Codename |
---|---|
Core | Persephone |
CCD | Durango |
Process Technology
Processors implementing Zen 4 are SoCs configured as either a Multi-Chip Module (MCM) or a monolithic chip. MCMs consist of a single I/O die (IOD) and up to 12 Core Complex Dies (CCDs) attached with full-duplex serial point-to-point links. The IOD contains memory controllers, I/O controllers, microcontrollers for security purposes and power management, and other peripherals. The CCDs communicate with peripherals and with each other through the Data and Control Fabrics on the I/O die, and each CCD contains a single Core Complex (CCX). The monolithic chips integrate a subset of the IOD facilities and additional peripherals tailored for their target market, a CCX, and a GPU. A CCX contains 8 CPU cores (fewer may be usable on some models) communicating through a shared L3 cache.
("Bergamo" processors configuration TBD.)
The chips are fabricated by TSMC: CCDs and monolithic chips on a 5 nm node, IODs on a 6 nm node.
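As a rough sanity check of the MCM figures above, the following Python sketch derives the maximum core and thread counts of a package from its CCD count. The function name and its parameters are illustrative assumptions, not AMD terminology; it assumes one CCX per CCD, 8 cores per CCX and 2-way SMT, as described in this article.

```python
def package_totals(ccds: int, cores_per_ccx: int = 8, threads_per_core: int = 2) -> tuple[int, int]:
    """Return (cores, threads) for an MCM with one CCX per CCD."""
    cores = ccds * cores_per_ccx
    return cores, cores * threads_per_core


# A fully populated 12-CCD package matches the "Up to 96/192" Genoa entry.
print(package_totals(12))  # (96, 192)
```

Models with fewer usable cores per CCX can be approximated by lowering `cores_per_ccx`.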
Architecture
Zen 4 is a 64-bit superscalar, out-of-order, 2-way SMT microarchitecture with advanced dynamic branch prediction, 4-way decoding of x86 instructions with a stack optimizer, multiple caches including an Op cache for decoded instructions and prefetchers for code and data, four integer/address and two floating point instruction schedulers, 3-way address generation, 5-way integer execution, 4-way 256-bit wide floating point execution, a speculative, out-of-order load/store unit capable of up to three loads or two stores per cycle with a 48/88-entry load and 64-entry store queue, write-combining, and 5-level paging with four TLBs and six hardware page table walkers.
Key changes from Zen 3
- AVX-512 instructions support, 256-bit data path[1]
- L1 and L2 DTLB sizes increased from 64 to 72 and from 2,048 to 3,072 entries, respectively
- Op cache size increased from 4,096 to 6,912 Ops per core
- L2 cache doubled from 512 KiB to 1 MiB per core (not all processor models), latency increased from 12 to 14 cycles minimum
- L3 cache average load-to-use latency increased from 46 to 50 cycles
- Five-level paging; Max. physical and linear address size raised from 48 to 52 and 57 bits respectively
- Lower latency for cache loads, writes and prefetches from/to registers
- Higher transistor density due to the 5 nm process
- Capable of higher all-core clock speeds (shown by AMD to reach 5 GHz and above on all cores)
- Larger integer register file (from 192 to 224), floating-point register file (from 160 to 192) and reorder buffer (from 256 to 320 entries)
- REPE CMPSB (sometimes used to implement string comparison) is significantly sped up, processes more than 32 bytes/cycle when operating on L1 data.
- BSF, BSR, and the BMI1 instructions BLSI, BLSMSK, BLSR and TZCNT have a lower latency of 1 cycle and twice the throughput (4 instructions/cycle).
- Latency and/or throughput of the VPERMx, V[P]BROADCASTx and VPMOV{S,Z}Xx instructions improved.
- Throughput of some ALU operations on vector registers increased from 2 to 3 ops/cycle.
- Latency of some ALU operations on vector registers (VPABSx, VPHADDx, VPHSUBx, VPSLLx, VPSRLx, VPSRAx, VPACKx, VPSIGNx, VMAXx, VMINx) increased by 1 cycle.
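The address-size change listed above can be related to the paging depth with a little arithmetic. The sketch below assumes the standard x86-64 long-mode layout (4-Kbyte base pages and 512-entry tables, i.e. 9 index bits per level); the helper names are made up for this illustration.

```python
PAGE_OFFSET_BITS = 12      # offset within a 4-Kbyte page
INDEX_BITS_PER_LEVEL = 9   # each table holds 512 eight-byte entries


def linear_address_bits(levels: int) -> int:
    """Linear address width covered by a page-table tree of the given depth."""
    return PAGE_OFFSET_BITS + levels * INDEX_BITS_PER_LEVEL


def size_in_tib(bits: int) -> int:
    """Bytes addressable with `bits` address bits, expressed in TiB (bits >= 40)."""
    return 2 ** (bits - 40)


print(linear_address_bits(4), linear_address_bits(5))      # 48 57
print(size_in_tib(48), size_in_tib(52), size_in_tib(57))   # 256 4096 131072
```

In other words, four-level paging covers 256 TiB of linear address space, five-level paging covers 128 PiB, and a 52-bit physical address reaches 4 PiB.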
Package level changes:
- EPYC 9004 "Genoa": Max. core/thread count 96/192, up from 64/128 on EPYC 7003 "Milan"
- EPYC "Bergamo": Max. 128 cores but preliminary data shows a slightly altered architecture featuring cores that take up less space
- Support for DDR5 memory and PCIe Gen 5
- New sockets AM5 (client), SP5 and SP6 (server), FP7/FP7r2 (mobile)
- APUs: RDNA2-based iGPU with 2 compute units (128 stream processors)
New Instructions
Zen 4 introduced the following ISA enhancements:
- AVX-512 - 512-bit Vector Instructions
  - AVX512F - Foundation (first introduced with Intel Skylake)
  - AVX512CD - Conflict Detection Instructions (Skylake X)
  - AVX512VL - Vector Length Extensions (Skylake X)
  - AVX512DQ - Doubleword and Quadword Instructions (Skylake X)
  - AVX512BW - Byte and Word Instructions (Skylake X)
  - AVX512_IFMA - Integer Fused Multiply-Add (Cannon Lake)
  - AVX512_VBMI - Vector Bit Manipulation Instructions (Cannon Lake)
  - AVX512_VPOPCNTDQ - Vector Population Count Instructions (Ice Lake)
  - AVX512_BITALG - Bit Algorithms (Ice Lake)
  - AVX512_VBMI2 - Vector Bit Manipulation Instructions 2 (Ice Lake)
  - AVX512_VNNI - Vector Neural Network Instructions (Ice Lake)
  - AVX512_BF16 - BFloat16 Instructions (Cooper Lake)
  - Not supported: AVX512ER, AVX512PF (Knights Landing); AVX512 4VNNIW, 4FMAPS (Knights Mill); VP2INTERSECT (Tiger Lake); FP16 (Sapphire Rapids)
- GFNI - Galois Field New Instructions (first introduced with Intel Ice Lake)
  - VGF2P8AFFINEQB - Galois field affine transformation
  - VGF2P8AFFINEINVQB - Galois field affine transformation inverse
  - VGF2P8MULB - Galois field multiply bytes
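Software usually detects these extensions at run time rather than assuming them. As a minimal sketch, the Python function below checks a `flags` line in the format of Linux's `/proc/cpuinfo` for the extensions listed above. The flag spellings are the Linux kernel's names as best understood here and should be treated as an assumption; the function name and the sample text are hypothetical.

```python
# Linux /proc/cpuinfo spellings (assumed) for the extensions listed above.
ZEN4_FLAGS = (
    "avx512f", "avx512cd", "avx512vl", "avx512dq", "avx512bw",
    "avx512ifma", "avx512vbmi", "avx512_vpopcntdq", "avx512_bitalg",
    "avx512_vbmi2", "avx512_vnni", "avx512_bf16", "gfni",
)


def missing_flags(cpuinfo_text: str, wanted=ZEN4_FLAGS) -> list[str]:
    """Return the wanted flags that the first 'flags' line does not list."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            present = set(line.split(":", 1)[1].split())
            return [flag for flag in wanted if flag not in present]
    return list(wanted)  # no 'flags' line found: report everything as missing


# Hypothetical excerpt; on Linux, pass the contents of /proc/cpuinfo instead.
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx2 avx512f avx512cd gfni\n"
print(missing_flags(sample))
```

Taking the text as a parameter keeps the check testable on machines without `/proc/cpuinfo`.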
Memory Hierarchy
Data and Instruction Caches
- L0 Op Cache:
  - Up to 6,912 Ops per core, 12-way set associative
  - 9 Op line size (restrictions apply depending on instruction type)
  - Parity protected
- L1I Cache:
  - 32 KiB per core, 8-way set associative
  - 64 B line size
  - Parity protected
- L1D Cache:
  - 32 KiB per core, 8-way set associative
  - 64 B line size
  - Write-back policy
  - 4-5 cycles latency for Int
  - 7-8 cycles latency for FP
  - ECC
- L2 Cache:
  - 512 KiB or 1 MiB per core (varies by processor model), 8-way set associative
  - 64 B line size
  - Write-back policy
  - Inclusive of L1
  - ≥ 14 cycles latency
  - DEC-TED ECC, tag & state arrays SEC-DED
- L3 Cache:
  - "Genoa": up to 32 MiB/CCX (8 cores), up to 384 MiB total
  - Shared by all cores in the CCX, configurable
  - 16-way set associative
  - 64 B line size
  - L2 victim cache
  - Write-back policy
  - 50 cycles average load-to-use latency
  - DEC-TED ECC, tag array & shadow tags SEC-DED
  - QoS Monitoring and Enforcement with BMEC, L3RR, L3SBE
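The capacity, associativity and line size listed above determine the number of sets in each cache. The Python sketch below works that out; it treats the Op cache figures as nominal (capacity counted in Ops, 9 Ops per line) and is an illustration, not a description of AMD's actual indexing scheme.

```python
KIB = 1024
MIB = 1024 * KIB


def num_sets(capacity: int, ways: int, line_size: int) -> int:
    """Number of sets = capacity / (ways * line size); all in the same unit."""
    sets, remainder = divmod(capacity, ways * line_size)
    if remainder:
        raise ValueError("capacity is not a whole number of sets")
    return sets


print(num_sets(6912, 12, 9))        # Op cache, counted in Ops: 64
print(num_sets(32 * KIB, 8, 64))    # L1I and L1D: 64
print(num_sets(1 * MIB, 8, 64))     # L2 (1 MiB variant): 2048
print(num_sets(32 * MIB, 16, 64))   # L3 per CCX: 32768
```

The Op cache result agrees with the factorization 6,912 = 9 × 12 × 2^6 behind the increase from 4,096 Ops.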
Translation Lookaside Buffers
- ITLB
  - 64 entry L1 TLB, fully associative
    - 4-Kbyte, 2-Mbyte, 1-Gbyte page sizes
  - 512 entry L2 TLB, 8-way set associative
    - 4-Kbyte, 2-Mbyte, and 4-Mbyte pages
  - Parity protected
- DTLB
  - 72 entry L1 TLB, fully associative
    - 4-Kbyte, 16-Kbyte, 2-Mbyte, 1-Gbyte page sizes
  - 3,072 entry L2 TLB, 24-way set associative
    - 4-Kbyte, 16-Kbyte, 2-Mbyte, and 4-Mbyte pages, PDEs to speed up table walks
  - Parity protected
4-Mbyte pages require two 2-Mbyte entries in all TLBs. 16-Kbyte page size refers to PTE coalescing of four physically consecutive and 16-Kbyte aligned 4-Kbyte pages. All caches and TLBs are competitively shared in multi-threaded mode.
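The PTE coalescing condition described above can be expressed as a small predicate. This is a simplified model under stated assumptions: it checks only that four 4-Kbyte pages are physically consecutive and that the group starts on a 16-Kbyte boundary (the virtual-alignment check is an assumption of this sketch), and it ignores any page-attribute checks the hardware may perform. All names are hypothetical.

```python
PAGE = 4 * 1024    # 4-Kbyte base page
GROUP = 4          # pages per coalesced 16-Kbyte entry


def can_coalesce(virt_base: int, phys_addrs: list[int]) -> bool:
    """phys_addrs[i] backs the virtual page at virt_base + i * PAGE."""
    if len(phys_addrs) != GROUP:
        return False
    # Both the virtual and the physical start must be 16-Kbyte aligned.
    if virt_base % (GROUP * PAGE) or phys_addrs[0] % (GROUP * PAGE):
        return False
    # The four frames must be physically consecutive.
    return all(phys_addrs[i] == phys_addrs[0] + i * PAGE for i in range(GROUP))


def reach_mib(entries: int, page_bytes: int) -> int:
    """Address range covered if every TLB entry maps a page of this size."""
    return entries * page_bytes // 2**20


print(can_coalesce(0x4000, [0x100000, 0x101000, 0x102000, 0x103000]))  # True
print(can_coalesce(0x4000, [0x100000, 0x101000, 0x103000, 0x104000]))  # False
print(reach_mib(3072, PAGE), reach_mib(3072, GROUP * PAGE))            # 12 48
```

The last line shows why coalescing matters: the 3,072-entry L2 DTLB covers 12 MiB with plain 4-Kbyte entries, and up to 48 MiB in the best case where every entry holds a coalesced 16-Kbyte group.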
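System DRAM
- Ryzen 7000 "Raphael":
  - Up to PC5-41600 (DDR5-5200) without overclocking
- EPYC 9004 "Genoa":
  - 12 channels per socket, two 40-bit (32 data, 8 ECC) DDR5 subchannels per channel
  - Up to 24 DIMMs, max. 6 TiB
  - Up to PC5-38400 (DDR5-4800)
  - SR/DR RDIMM, 4R/8R LRDIMM, 3DS DIMM
  - ECC supported (x4, x8, x16, chipkill)
  - DRAM bus parity and write data CRC options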
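The PC5 module ratings quoted in this article follow from the DDR5 data rate: a channel carries 64 data bits (two 32-bit subchannels), so the peak rate in MB/s is the transfer rate in MT/s times 8 bytes. The sketch below shows this arithmetic and, as an illustration, the theoretical peak of the 12-channel DDR5-4800 "Genoa" configuration. Function names are hypothetical, and real sustained bandwidth is lower than these peak figures.

```python
def pc5_rating(data_rate_mt_s: int, data_bytes: int = 8) -> int:
    """Peak MB/s of one DDR5 channel with a 64-bit (8-byte) data path."""
    return data_rate_mt_s * data_bytes


def peak_gb_per_s(data_rate_mt_s: int, channels: int) -> float:
    """Theoretical peak bandwidth of a socket in GB/s (decimal units)."""
    return pc5_rating(data_rate_mt_s) * channels / 1000


print(pc5_rating(5200))          # 41600, i.e. PC5-41600
print(pc5_rating(4800))          # 38400, i.e. PC5-38400
print(peak_gb_per_s(4800, 12))   # 460.8
```

The ECC bits of each 40-bit subchannel are excluded because they do not carry payload data.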
All Zen 4 Processors
List of all Zen 4-based Processors |
---|
Model | Codename | C | T | L2$ | L3$ | Frequ. | Turbo | Turbo 1C | Memory | TDP | Launched | Release Price | OPN |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Uniprocessors | |||||||||||||
EPYC 9354P | Genoa | 32 | 64 | 32 MiB | 256 MiB | 3.25 GHz | 3.75 GHz | 3.8 GHz | DDR5-4800 | 280 W | 10 November 2022 | $ 2,730.00 (1k) | 100-100000805, 100-100000805WOF |
EPYC 9454P | Genoa | 48 | 96 | 48 MiB | 256 MiB | 2.75 GHz | 3.65 GHz | 3.8 GHz | DDR5-4800 | 290 W | 10 November 2022 | $ 4,598.00 (1k) | 100-100000873, 100-100000873WOF |
EPYC 9554P | Genoa | 64 | 128 | 64 MiB | 256 MiB | 3.1 GHz | 3.75 GHz | 3.75 GHz | DDR5-4800 | 360 W | 10 November 2022 | $ 7,104.00 (1k) | 100-100000804, 100-100000804WOF |
EPYC 9654P | Genoa | 96 | 192 | 96 MiB | 384 MiB | 2.4 GHz | 3.55 GHz | 3.7 GHz | DDR5-4800 | 360 W | 10 November 2022 | $ 10,625.00 (1k) | 100-100000803, 100-100000803WOF |
Ryzen 5 7600X | Raphael | 6 | 12 | 6 MiB | 32 MiB | 4.7 GHz | 5.3 GHz | 105 W | |||||
Ryzen 7 7700 | Raphael | 8 | 16 | 8 MiB | 32 MiB | 3.8 GHz | 5.3 GHz | 65 W | 10 January 2023 | $ 339.00 | 100-000000592, 100-100000592BOX | ||
Ryzen 7 7700X | Raphael | 8 | 16 | 8 MiB | 32 MiB | 4.5 GHz | 5.4 GHz | 105 W | 27 September 2022 | $ 399.00 | 100-000000591, 100-100000591WOF | ||
Ryzen 7 7800X3D | Raphael | 8 | 16 | 8 MiB | 96 MiB | 4.2 GHz | 5 GHz | 120 W | |||||
Ryzen 9 7900X3D | Raphael | 12 | 24 | 12 MiB | 128 MiB | 4.4 GHz | 5.6 GHz | 120 W | |||||
Ryzen 9 7950X3D | Raphael | 16 | 32 | 16 MiB | 128 MiB | 4.2 GHz | 5.7 GHz | 120 W | 28 February 2023 | $ 699.00 | 100-000000908, 100-000000908WOF | ||
Multiprocessors (dual-socket) | |||||||||||||
EPYC 9124 | Genoa | 16 | 32 | 16 MiB | 64 MiB | 3 GHz | 3.6 GHz | 3.7 GHz | DDR5-4800 | 200 W | 10 November 2022 | $ 1,083.00 (1k) | 100-100000802, 100-100000802WOF |
EPYC 9174F | Genoa | 16 | 32 | 16 MiB | 256 MiB | 4.1 GHz | 4.15 GHz | 4.4 GHz | DDR5-4800 | 320 W | 10 November 2022 | $ 3,850.00 (1k) | 100-100000796, 100-100000796WOF |
EPYC 9224 | Genoa | 24 | 48 | 24 MiB | 64 MiB | 2.5 GHz | 3.65 GHz | 3.7 GHz | DDR5-4800 | 200 W | 10 November 2022 | $ 1,825.00 (1k) | 100-100000939, 100-100000939WOF |
EPYC 9254 | Genoa | 24 | 48 | 24 MiB | 128 MiB | 2.9 GHz | 3.9 GHz | 4.15 GHz | DDR5-4800 | 200 W | 10 November 2022 | $ 2,299.00 (1k) | 100-100000480, 100-100000480WOF |
EPYC 9274F | Genoa | 24 | 48 | 24 MiB | 256 MiB | 4.05 GHz | 4.1 GHz | 4.3 GHz | DDR5-4800 | 320 W | 10 November 2022 | $ 3,060.00 (1k) | 100-100000794, 100-100000794WOF |
EPYC 9334 | Genoa | 32 | 64 | 32 MiB | 128 MiB | 2.7 GHz | 3.85 GHz | 3.9 GHz | DDR5-4800 | 210 W | 10 November 2022 | $ 2,990.00 (1k) | 100-100000800, 100-100000800WOF |
EPYC 9354 | Genoa | 32 | 64 | 32 MiB | 256 MiB | 3.25 GHz | 3.75 GHz | 3.8 GHz | DDR5-4800 | 280 W | 10 November 2022 | $ 3,420.00 (1k) | 100-100000798, 100-100000798WOF |
EPYC 9374F | Genoa | 32 | 64 | 32 MiB | 256 MiB | 3.85 GHz | 4.1 GHz | 4.3 GHz | DDR5-4800 | 320 W | 10 November 2022 | $ 4,850.00 (1k) | 100-100000792, 100-100000792WOF |
EPYC 9454 | Genoa | 48 | 96 | 48 MiB | 256 MiB | 2.75 GHz | 3.65 GHz | 3.8 GHz | DDR5-4800 | 290 W | 10 November 2022 | $ 5,225.00 (1k) | 100-100000478, 100-100000478WOF |
EPYC 9474F | Genoa | 48 | 96 | 48 MiB | 256 MiB | 3.6 GHz | 3.95 GHz | 4.1 GHz | DDR5-4800 | 360 W | 10 November 2022 | $ 6,780.00 (1k) | 100-100000788, 100-100000788WOF |
EPYC 9534 | Genoa | 64 | 128 | 64 MiB | 256 MiB | 2.45 GHz | 3.55 GHz | 3.7 GHz |
DDR5-4800 | 280 W 280,000 mW
0.375 hp 0.28 kW |
10 November 2022 | $ 8,803.00 € 7,922.70 (1k)
£ 7,130.43 ¥ 909,613.99 |
100-100000799, 100-100000799WOF |
EPYC 9554 | Genoa | 64 | 128 | 64 MiB 65,536 KiB
67,108,864 B 0.0625 GiB |
256 MiB 262,144 KiB
268,435,456 B 0.25 GiB |
3.1 GHz 3,100 MHz
3,100,000 kHz |
3.75 GHz 3,750 MHz
3,750,000 kHz |
3.75 GHz 3,750 MHz
3,750,000 kHz |
DDR5-4800 | 360 W 360,000 mW
0.483 hp 0.36 kW |
10 November 2022 | $ 9,087.00 € 8,178.30 (1k)
£ 7,360.47 ¥ 938,959.71 |
100-100000790, 100-100000790WOF |
EPYC 9634 | Genoa | 84 | 168 | 84 MiB 86,016 KiB
88,080,384 B 0.082 GiB |
384 MiB 393,216 KiB
402,653,184 B 0.375 GiB |
2.25 GHz 2,250 MHz
2,250,000 kHz |
3.1 GHz 3,100 MHz
3,100,000 kHz |
3.7 GHz 3,700 MHz
3,700,000 kHz |
DDR5-4800 | 290 W 290,000 mW
0.389 hp 0.29 kW |
10 November 2022 | $ 10,304.00 € 9,273.60 (1k)
£ 8,346.24 ¥ 1,064,712.32 |
100-100000797, 100-100000797WOF |
EPYC 9654 | Genoa | 96 | 192 | 96 MiB 98,304 KiB
100,663,296 B 0.0938 GiB |
384 MiB 393,216 KiB
402,653,184 B 0.375 GiB |
2.4 GHz 2,400 MHz
2,400,000 kHz |
3.55 GHz 3,550 MHz
3,550,000 kHz |
3.7 GHz 3,700 MHz
3,700,000 kHz |
DDR5-4800 | 360 W 360,000 mW
0.483 hp 0.36 kW |
10 November 2022 | $ 11,805.00 € 10,624.50 (1k)
£ 9,562.05 ¥ 1,219,810.65 |
100-100000789, 100-100000789WOF |
Count: 24 |
Designers
- Mike Clark(?), chief architect
Bibliography
References
- "Ryzen 7000 Desktop Preview", Angstronomics, August 29, 2022
- "Processor Programming Reference (PPR) for AMD Family 19h Models 11h, Revision B1 Processors", AMD Publ. #55901, Rev. 0.25, November 10, 2022
- "Software Optimization Guide for the AMD Zen4 Microarchitecture", AMD Publ. #57647, Rev. 1.00, January 6, 2023
- "AMD EPYC™ 9004 Series Architecture Overview", AMD Publ. #58015, Rev. 1.1, December 2022
See Also
- AMD Zen, Zen 2, Zen 3
- Intel Meteor Lake
codename | Zen 4
designer | AMD
full page name | amd/microarchitectures/zen 4
instance of | microarchitecture
manufacturer | TSMC
microarchitecture type | CPU
name | Zen 4
process | 5 nm (0.005 μm, 5.0e-6 mm) and 6 nm (0.006 μm, 6.0e-6 mm)