From WikiChip
Editing amd/microarchitectures/zen 4


Latest revision Your text
Line 6: Line 6:
 
|manufacturer=TSMC
 
|manufacturer=TSMC
 
|process=5 nm
 
|process=5 nm
|process 2=6 nm
 
 
|predecessor=Zen 3
 
|predecessor=Zen 3
 
|predecessor link=amd/microarchitectures/zen 3
 
|predecessor link=amd/microarchitectures/zen 3
Line 13: Line 12:
 
|succession=Yes
 
|succession=Yes
 
}}
 
}}
'''Zen 4''' is a [[microarchitecture]] developed by [[AMD]] as a successor to {{\\|Zen 3}}. See press release for details: [https://www.amd.com/en/press-releases/2022-08-29-amd-launches-ryzen-7000-series-desktop-processors-zen-4-architecture-the AMD Launches Ryzen 7000 Series Desktop Processors]
+
'''Zen 4''' is a planned [[microarchitecture]] being developed by [[AMD]] as a successor to {{\\|Zen 3}}.
  
 
== History ==
 
== History ==
Line 19: Line 18:
 
Zen 4 was first mentioned by Forrest Norrod during AMD's EPYC One Year Anniversary webinar. During the Next Horizon event, held on November 6, 2018, AMD stated that Zen 4 was at the design completion phase.
 
Zen 4 was first mentioned by Forrest Norrod during AMD's EPYC One Year Anniversary webinar. During the Next Horizon event, held on November 6, 2018, AMD stated that Zen 4 was at the design completion phase.
  
== Products ==
+
== Process Technology ==
 +
AMD states that Zen 4 will be produced on a [[5nm]] node by [[TSMC]].
 +
 
 +
== Codenames ==
 
{{future information}}
 
{{future information}}
  
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Processor Series !! Cores/Threads !! Market
+
! Core !! C/T !! Target
|-
 
| EPYC 9004 "{{amd|Genoa|l=core}}" || Up to 96/192 || High-end server [[multiprocessors]]
 
|-
 
| Ryzen Threadripper 7000 "{{amd|Storm Peak|l=core}}" || Up to 96/192 || Workstation & enthusiasts
 
 
|-
 
|-
| Ryzen 7000 "{{amd|Raphael|l=core}}" || Up to 16/32 || Mainstream to high-end desktops & enthusiasts
+
| {{amd|Genoa|l=core}} || Up to 96/192 || High-end server [[multiprocessors]]
 
|-
 
|-
| Ryzen 7000 APU "{{amd|Dragon Range|l=core}}" || Up to 16/32 || High-end mobile processors with GPU
+
| {{amd|Warhol|l=core}} || Up to 20/40 || Mainstream to high-end desktops & enthusiasts market processors
 
|-
 
|-
| Ryzen 7000 APU "{{amd|Phoenix Point|l=core}}" || Up to 8/16 || Mainstream desktop & mobile processors with GPU  
+
| {{amd|Rembrandt|l=core}} || Up to 8/16 || Mainstream desktop & mobile processors with GPU  
 
|}
 
|}
  
Line 41: Line 39:
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Processor Series !! Cores/Threads !! Market
+
! Core !! C/T !! Target
|-
 
| EPYC 9004 "{{amd|Bergamo|l=core}}" || Up to 128/256  || Cloud [[multiprocessors]] (smaller, almost half-size Zen 4c [referred to as “Zen 4D” in leaks] core sacrificing half of the L3 cache.)
 
|-
 
| EPYC 8004 "{{amd|Siena|l=core}}" || Up to 64/128 || Edge-optimized server chips
 
|}
 
 
 
'''Architectural Codenames:'''
 
{| class="wikitable"
 
|-
 
! Arch !! Codename
 
|-
 
| Core || Persephone
 
 
|-
 
|-
| {{abbr|CCD}} || Durango
+
| {{amd|Bergamo|l=core}} || Up to 128/256 || Cloud multiprocessing (smaller Zen 4c [referred to as “Zen 4D” in leaks] core likely sacrificing AVX-512)
 
|}
 
|}
 
== Process Technology ==
 
Processors implementing Zen 4 are {{abbr|SoC}}s configured as a Multi-Chip Module or monolithic chip. MCMs consist of a single I/O die and up to 12 Core Complex Dies attached with full-duplex serial point-to-point links. The IOD contains memory controllers, I/O controllers, microcontrollers for security purposes and power management, and other peripherals. The CCDs communicate with peripherals and each other through the Data and Control Fabrics on the I/O die, and each contain a single Core Complex (CCX). The monolithic chips integrate a subset of the IOD facilities and additional peripherals tailored for their target market, a CCX, and a GPU. A CCX contains 8 CPU cores (fewer may be usable on some models) communicating through a shared L3 cache.
 
 
("Bergamo" processors configuration TBD.)
 
 
The chips are fabricated by [[TSMC]], CCDs and monolithic chips on a [[5 nm]] node, IODs on a [[6 nm]] node.
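
The package arithmetic described above can be checked with a short sketch. The class and field names below are hypothetical and chosen for illustration only; the figures (up to 12 CCDs, one CCX of 8 cores per CCD, 2-way SMT) are taken from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McmConfig:
    """Hypothetical model of a Zen 4 multi-chip module (illustration only)."""
    ccds: int                  # Core Complex Dies on the package
    cores_per_ccx: int = 8     # one CCX per CCD, 8 cores per CCX (per this article)
    threads_per_core: int = 2  # 2-way SMT

    @property
    def cores(self) -> int:
        return self.ccds * self.cores_per_ccx

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core


# Largest configuration described in the text: 12 CCDs.
genoa_max = McmConfig(ccds=12)
print(genoa_max.cores, genoa_max.threads)
```

With 12 CCDs this reproduces the 96/192 core/thread maximum listed for "Genoa".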
 
 
 
== Architecture ==
 
== Architecture ==
Zen 4 is a 64-bit superscalar, out-of-order, 2-way [[SMT]] microarchitecture with advanced dynamic branch prediction, 4-way decoding of [[x86]] instructions with a stack optimizer, multiple caches including an Op cache for decoded instructions and prefetchers for code and data, four integer/address and two floating point instruction schedulers, 3-way address generation, 5-way integer execution, 4-way 256-bit wide floating point execution, a speculative, out-of-order load/store unit capable of up to three loads or two stores per cycle with a 48/88-entry load and 64-entry store queue, write-combining, and 5-level paging with four {{abbr|TLB}}s and six hardware page table walkers.
+
Little is currently known about the architectural improvements that are being done to Zen 4.
  
 
=== Key changes from {{\\|Zen 3}} ===
 
=== Key changes from {{\\|Zen 3}} ===
* {{x86|AVX-512}} instructions support, 256-bit data path<ref name="ryzen-7000-preview"/>
+
{{empty section}}
* L1 and L2 DTLB size increased from 64 to 72 and 2,048 to 3,072 entries
+
* raised core/thread count from 64/128 to at least 96/192 (largely due to the 5 nm process allowing more space and therefore more cores).
* Op cache size increased from 4,096 to 6,912 Ops per core
+
* improved cache load, write and prefetch from/to register (less latency).
* L2 cache doubled from 512&nbsp;KiB to 1&nbsp;MiB per core (not all processor models), latency increased from 12 to 14 cycles minimum
+
* improved iGPUs for APU variants; Navi integrated GPU with up to 3.4 TFLOPS FP32 (clock frequency unknown, at least 2 GHz).
* L3 cache average load-to-use latency increased from 46 to 50 cycles
+
* utilizes new AM5 socket and is expected to support DDR5 and possibly PCIe 5.
* Five-level paging; Max. physical and linear address size raised from 48 to 52 and 57 bits respectively
+
* more transistors (depending on the AM5 socket as well, not just the CPU itself).
* Improved cache load, write and prefetch from/to register (less latency)
 
* Higher transistor density due to the 5 nm process
 
* Capable of higher all-core clock speeds (shown by AMD to reach 5 GHz+ on all cores)
 
* Larger integer register file (from 192 to 224), floating-point register file (from 160 to 192) and reorder buffer (from 256 to 320 entries)
 
* REPE CMPSB (sometimes used to implement string comparison) is significantly sped up, processes more than 32 bytes/cycle when operating on L1 data.
 
* BSF, BSR, and the BMI1 instructions BLSI, BLSMSK, BLSR, TZCNT have a lower latency of 1 cycle and double the throughput (4 instructions/cycle).
 
* Latency and/or throughput of VPERMx, V[P]BROADCASTx, VPMOV{S,Z}Xx instructions improved.
 
* Throughput of some ALU operations on vector registers increased from 2 to 3 ops/cycle.
 
* Latency of some ALU operations on vector registers (VPABSx,VPHADDx,VPHSUBx,VPSLLx,VPSRLx,VPSRAx,VPACKx,VPSIGNx,VMAXx,VMINx) increased by 1 cycle.
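
To put the address-size change listed above (48 to 52 physical and 57 linear bits) into perspective, the following sketch converts an address width into the size of the addressable space. The helper name is hypothetical; the only inputs are the bit widths quoted in the list.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def address_space(bits: int) -> str:
    """Return the size of a byte-addressed space of the given width."""
    size = 1 << bits  # 2**bits bytes
    idx = 0
    # Scale down by 1024 until the value fits the next binary unit.
    while size >= 1024 and idx < len(UNITS) - 1:
        size //= 1024
        idx += 1
    return f"{size} {UNITS[idx]}"


for label, bits in [("previous limit", 48),
                    ("Zen 4 physical", 52),
                    ("Zen 4 linear", 57)]:
    print(f"{label}: {bits} bits -> {address_space(bits)}")
```

A 48-bit space covers 256 TiB, a 52-bit space 4 PiB, and a 57-bit space 128 PiB.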
 
  
 
+
== Bibliography ==
Package level changes:
+
{{reflist}}
* EPYC 9004 "{{amd|Genoa|l=core}}": Max. core/thread count 96/192, up from 64/128 on EPYC 7003 "{{amd|Milan|l=core}}"
 
* EPYC "{{amd|Bergamo|l=core}}": Max. 128 cores but preliminary data shows a slightly altered architecture featuring cores that take up less space
 
* Support for DDR5 memory and PCIe Gen 5
 
* New sockets {{amd|AM5|l=pack}} (client), {{amd|SP5|l=pack}} and {{amd|SP6|l=pack}} (server), {{amd|FP7|FP7/FP7r2|l=pack}} (mobile)
 
* {{abbr|APU}}s: RDNA2-based iGPU with 2 compute units (128 stream processors)
 
 
 
=== New Instructions ===
 
Zen 4 introduced the following ISA enhancements:
 
 
 
* {{x86|AVX-512}} - 512-bit Vector Instructions
 
** {{x86|AVX512F}} - Foundation (first introduced with [[Intel]] {{intel|skylake (server)|Skylake|l=arch}})
 
** {{x86|AVX512CD}} - Conflict Detection Instructions ({{intel|Skylake X|l=core}})
 
** {{x86|AVX512VL}} - Vector Length Extensions (Skylake X)
 
** {{x86|AVX512DQ}} - Doubleword and Quadword Instructions (Skylake X)
 
** {{x86|AVX512BW}} - Byte and Word Instructions (Skylake X)
 
** {{x86|AVX512_IFMA}} - Integer Fused Multiply-Add ({{intel|Cannon Lake|l=arch}})
 
** {{x86|AVX512_VBMI}} - Vector Bit Manipulation Instructions (Cannon Lake)
 
** {{x86|AVX512_VPOPCNTDQ}} - Vector Population Count Instructions ({{intel|ice lake (server)|Ice Lake|l=arch}})
 
** {{x86|AVX512_BITALG}} - Bit Algorithms (Ice Lake)
 
** {{x86|AVX512_VBMI2}} - Vector Bit Manipulation Instructions 2 (Ice Lake)
 
** {{x86|AVX512_VNNI}} - Vector Neural Network Instructions (Ice Lake)
 
** {{x86|AVX512_BF16}} - [[bfloat16|BFloat16]] Instructions ({{intel|Cooper Lake|l=arch}})
 
** ''Not supported'': AVX512ER, AVX512PF ({{intel|Knights Landing|l=arch}}); AVX512 4VNNIW, 4FMAPS ({{intel|Knights Mill|l=arch}}); VP2INTERSECT ({{intel|Tiger Lake|l=arch}}); FP16 ({{intel|Sapphire Rapids|l=arch}})
 
* {{x86|GFNI}} - Galois Field New Instructions (first introduced with [[Intel]] {{intel|ice lake (server)|Ice Lake|l=arch}})
 
** <code>VGF2P8AFFINEQB</code> - Galois field affine transformation
 
** <code>VGF2P8AFFINEINVQB</code> - Galois field affine transformation inverse
 
** <code>VGF2P8MULB</code> - Galois field multiply bytes
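
As a rough way to check a system for the extensions listed above, the sketch below compares a set of CPU feature flags against the list. It assumes the flag spellings used by the Linux kernel in <code>/proc/cpuinfo</code>; the mapping, the function names, and the sample flag set are illustrative assumptions, not part of any AMD documentation.

```python
# Extension name as used in this article -> flag name assumed to be the
# spelling reported by Linux in /proc/cpuinfo.
ZEN4_FLAGS = {
    "AVX512F": "avx512f",
    "AVX512CD": "avx512cd",
    "AVX512VL": "avx512vl",
    "AVX512DQ": "avx512dq",
    "AVX512BW": "avx512bw",
    "AVX512_IFMA": "avx512ifma",
    "AVX512_VBMI": "avx512vbmi",
    "AVX512_VPOPCNTDQ": "avx512_vpopcntdq",
    "AVX512_BITALG": "avx512_bitalg",
    "AVX512_VBMI2": "avx512_vbmi2",
    "AVX512_VNNI": "avx512_vnni",
    "AVX512_BF16": "avx512_bf16",
    "GFNI": "gfni",
}


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the flag set of the first CPU listed (Linux only)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


def missing_extensions(cpu_flags):
    """List the extensions from the table that are absent from cpu_flags."""
    return sorted(name for name, flag in ZEN4_FLAGS.items()
                  if flag not in cpu_flags)


# Made-up flag set for demonstration; on Linux use read_cpu_flags() instead.
sample = set("fpu sse2 avx2 avx512f avx512cd gfni".split())
print(missing_extensions(sample))
```

On a Zen 4 system running Linux, <code>missing_extensions(read_cpu_flags())</code> would be expected to return an empty list, provided the kernel is recent enough to report all of these flags.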
 
 
 
=== Memory Hierarchy ===
 
==== Data and Instruction Caches ====
 
* L0 Op Cache:
 
** Up to 6,912 Ops per core, 12-way set associative
 
** 9 Op line size (restrictions apply depending on instruction type)
 
** Parity protected
 
* L1I Cache:
 
** 32&nbsp;KiB per core, 8-way set associative
 
** 64&nbsp;B line size
 
** Parity protected
 
* L1D Cache:
 
** 32&nbsp;KiB per core, 8-way set associative
 
** 64&nbsp;B line size
 
** Write-back policy
 
** 4-5 cycles latency for Int
 
** 7-8 cycles latency for FP
 
** ECC
 
* L2 Cache:
 
** 512&nbsp;KiB or 1&nbsp;MiB per core (varies by processor model), 8-way set associative
 
** 64&nbsp;B line size
 
** Write-back policy
 
** Inclusive of L1
 
** ≥ 14 cycles latency
 
** {{abbr|DEC-TED}} ECC, tag & state arrays {{abbr|SEC-DED}}<!--7 check bits for 42 tag bits; AMD-55901-0.97 Sec 3.5-->
 
* L3 Cache:
 
** "{{amd|Genoa|l=core}}": up to 32&nbsp;MiB/{{abbr|CCX}} (8 cores), up to 384&nbsp;MiB total
 
** Shared by all cores in the CCX, configurable
 
** 16-way set associative
 
** 64&nbsp;B line size
 
** L2 [[victim cache]]
 
** Write-back policy
 
** 50 cycles average load-to-use latency
 
** DEC-TED ECC, tag array & shadow tags SEC-DED<!--AMD-55901-0.97 Sec 3.5-->
 
** QoS Monitoring and Enforcement with {{abbr|BMEC|Bandwidth Monitoring Event Configuration}}, {{abbr|L3RR|L3 Range Reservation}}, {{abbr|L3SBE|L3 External Slow Memory Bandwidth Enforcement}}
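
The cache geometries listed above can be cross-checked with the usual relation sets = capacity / (ways × line size). This is a minimal sketch with a hypothetical helper; it treats the Op cache in units of Ops rather than bytes, and it treats the L3 as one unsliced array, which is a simplification that may not match the physical organization.

```python
KIB = 1024
MIB = 1024 * KIB


def num_sets(capacity: int, ways: int, line_size: int) -> int:
    """Number of sets in a set-associative cache; all units must match."""
    sets, remainder = divmod(capacity, ways * line_size)
    if remainder:
        raise ValueError("capacity is not a multiple of ways * line_size")
    return sets


print("Op cache:", num_sets(6912, ways=12, line_size=9))       # units: Ops
print("L1I/L1D :", num_sets(32 * KIB, ways=8, line_size=64))
print("L2 512K :", num_sets(512 * KIB, ways=8, line_size=64))
print("L2 1M   :", num_sets(1 * MIB, ways=8, line_size=64))
print("L3 32M  :", num_sets(32 * MIB, ways=16, line_size=64))

# Total L3 for 12 CCDs with one 32 MiB CCX each, in MiB.
print("L3 total:", 12 * 32, "MiB")
```

The Op cache figures divide evenly into 64 sets, and 12 CCXs of 32 MiB give the 384 MiB total quoted for "Genoa".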
 
 
 
==== Translation Lookaside Buffers ====
 
* ITLB
 
** 64 entry L1 TLB, fully associative
 
*** 4-Kbyte, 2-Mbyte, 1-Gbyte page sizes
 
** 512 entry L2 TLB, 8-way set associative
 
*** 4-Kbyte, 2-Mbyte, and 4-Mbyte pages
 
** Parity protected
 
* DTLB
 
** 72 entry L1 TLB, fully associative
 
*** 4-Kbyte, 16-Kbyte, 2-Mbyte, 1-Gbyte page sizes
 
** 3,072 entry L2 TLB, 24-way set associative
 
*** 4-Kbyte, 16-Kbyte, 2-Mbyte, and 4-Mbyte pages, {{abbr|PDE|Page Directory Entry}}s to speed up table walks
 
** Parity protected
 
 
 
4-Mbyte pages require two 2-Mbyte entries in all TLBs. 16-Kbyte page size refers to {{abbr|PTE|Page Table Entry}} coalescing of four physically consecutive and 16-Kbyte aligned 4-Kbyte pages. All caches and TLBs are competitively shared in multi-threaded mode.
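
The effect of page size on the DTLBs can be illustrated by computing TLB reach, that is, entries × page size. The sketch below uses the entry counts from the list above and a hypothetical helper; it assumes every entry holds a page of the same size, which is a best case rather than typical behavior.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries: int, page_size: int, entries_per_page: int = 1) -> int:
    """Bytes of memory covered when every entry maps a page of page_size."""
    return (entries // entries_per_page) * page_size


l1_dtlb, l2_dtlb = 72, 3072

print(tlb_reach(l1_dtlb, 4 * KIB) // KIB, "KiB")    # L1 DTLB, 4 KiB pages
print(tlb_reach(l1_dtlb, 16 * KIB) // KIB, "KiB")   # L1 DTLB, coalesced 16 KiB
print(tlb_reach(l2_dtlb, 4 * KIB) // MIB, "MiB")    # L2 DTLB, 4 KiB pages
print(tlb_reach(l2_dtlb, 2 * MIB) // GIB, "GiB")    # L2 DTLB, 2 MiB pages
# A 4 MiB page occupies two 2 MiB entries, so the reach is unchanged.
print(tlb_reach(l2_dtlb, 4 * MIB, entries_per_page=2) // GIB, "GiB")
```

Because a 4-Mbyte page consumes two entries, it covers the same amount of memory as two 2-Mbyte pages, while PTE coalescing quadruples the reach of entries that would otherwise map 4-Kbyte pages.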
 
 
 
==== System DRAM ====
 
* Ryzen 7000 "{{amd|Raphael|l=core}}":
 
** Up to PC5-41600 (DDR5-5200) without overclocking
 
 
 
* EPYC 9004 "{{amd|Genoa|l=core}}":
 
** 12 channels per socket, two 40-bit (32 data, 8 ECC) DDR5 subchannels per channel
 
** Up to 24 DIMMs, max. 6&nbsp;TiB
 
** Up to PC5-38400 (DDR5-4800)
 
** {{abbr|SR}}/{{abbr|DR}} {{abbr|RDIMM}}, {{abbr|4R}}/{{abbr|8R}} {{abbr|LRDIMM}}, {{abbr|3DS DIMM}}
 
** ECC supported (x4, x8, x16, chipkill)<!--AMD-55901-0.97 Sec 3.7-->
 
** DRAM bus parity and write data CRC options<!--ibid-->
 
 
 
Sources: <ref name="amd-55901-ppr-1911"/><ref name="amd-57647-zen4-optim"/><ref name="amd-58015-9004-overv"/>
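
The module designations above follow from the transfer rate: a channel with 64 data bits moves 8 bytes per transfer, so DDR5-4800 corresponds to PC5-38400. The sketch below reproduces this and derives a theoretical peak for a 12-channel socket; the function name is hypothetical, and sustained bandwidth on real systems will be lower.

```python
def peak_mb_per_s(mt_per_s: int, data_bits: int = 64) -> int:
    """Theoretical peak of one channel in MB/s (ECC bits excluded)."""
    return mt_per_s * data_bits // 8


# One DDR5 channel = two subchannels with 32 data bits each = 64 data bits.
raphael_channel = peak_mb_per_s(5200)   # DDR5-5200
genoa_channel = peak_mb_per_s(4800)     # DDR5-4800
genoa_socket = 12 * genoa_channel       # 12 channels per socket

print(raphael_channel, "MB/s per channel")
print(genoa_channel, "MB/s per channel")
print(genoa_socket / 1000, "GB/s per socket")
```

The per-channel results match the PC5-41600 and PC5-38400 designations, and 12 channels of DDR5-4800 give a theoretical 460.8 GB/s per socket.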
 
 
 
== All Zen 4 Processors ==
 
<!-- NOTE:
 
This table is generated automatically from the data in the actual articles.
 
If a microprocessor is missing from the list, an appropriate article for it needs to be
 
created and tagged accordingly.
 
Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips
 
-->
 
{| class="comptable3"
 
! List of all Zen 4-based Processors
 
|}
 
<div class="comptable-scroller sticky">
 
{| class="comptable3 stickycol1 sortable"
 
|- class="header continued"
 
! Model
 
! Codename
 
! {{abbr|C|Cores}}
 
! {{abbr|T|Threads}}
 
! data-sort-type=number | L2$
 
! data-sort-type=number | L3$
 
! data-sort-type=number | Frequ.
 
! data-sort-type=number | Turbo
 
! data-sort-type=number | Turbo 1C
 
! Memory
 
! data-sort-type=number | {{abbr|TDP}}
 
! data-sort-type=date | Launched
 
! Release<br />Price
 
! {{abbr|OPN}}
 
|- class="separator sortbottom"
 
| colspan=4 | [[Uniprocessors]]
 
| colspan=10 | &nbsp;
 
{{#invoke:comptable|askt
 
|condition=[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] [[max cpu count::1]]
 
|sort=name |valuesep=,<br /> |template=<nowiki>
 
|-
 
| data-sort-value="{{{name#-}}}" | {{amd|{{{microprocessor family#-}}}}} [[{{{page#-}}}|{{{model number#-}}}]]
 
| {{amd|{{{core name#-}}}|l=core}}
 
| {{{core count}}}
 
| {{{thread count}}}
 
| {{{l2$ size}}}
 
| {{{l3$ size}}}
 
| {{{base frequency#GHz}}}
 
| {{{turbo frequency#GHz}}}
 
| {{{turbo frequency (1 core)#GHz}}}
 
| {{{supported memory type}}}
 
| {{{tdp}}}
 
| {{{first launched}}}
 
| {{#if:{{{release price}}}|{{{release price}}}{{#ifeq:{{{release price}}}|{{{release price (tray)}}}|&#32;(1k)}} }}
 
| {{{part number}}}</nowiki>|outrotemplate=<nowiki>
 
|- class="separator sortbottom"
 
| colspan=4 | [[Multiprocessors]] (dual-socket)
 
| colspan=10 | &nbsp;
 
{{#invoke:comptable|askt
 
|condition=[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] [[max cpu count::>>1]]
 
|sort=name |valuesep=,<br /> |template=&lt;nowiki>{{{#template}}}&lt;/nowiki>}}</nowiki>}}
 
|-
 
! Count: {{#ask:[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] |format=count}}
 
|}
 
</div>
 
  
 
== Designers ==
 
== Designers ==
Line 243: Line 61:
  
 
== Bibliography ==
 
== Bibliography ==
 
+
{{reflist}}
== References ==
 
<references>
 
<ref name="ryzen-7000-preview">{{cite techdoc|title=Ryzen 7000 Desktop Preview|url=https://www.angstronomics.com/p/ryzen-7000-desktop-preview|publ=Angstronomics|date=2022-08-29}}</ref>
 
<ref name="amd-55901-ppr-1911">{{cite techdoc|title=Processor Programming Reference (PPR) for AMD Family 19h Models 11h, Revision B1 Processors|url=https://www.amd.com/system/files/TechDocs/55901_0.25.zip|publ=AMD|pid=55901|rev=0.25|date=2022-11-10}}</ref>
 
<ref name="amd-57647-zen4-optim">{{cite techdoc|title=Software Optimization Guide for the AMD Zen4 Microarchitecture|url=https://www.amd.com/system/files/TechDocs/57647.zip|publ=AMD|pid=57647|rev=1.00|date=2023-01-06}}</ref>
 
<ref name="amd-58015-9004-overv">{{cite techdoc|title=AMD EPYC™ 9004 Series Architecture Overview|url=https://www.amd.com/system/files/documents/58015-epyc-9004-tg-architecture-overview.pdf|publ=AMD|pid=58015|rev=1.1|date=2022-12}}</ref>
 
</references>
 
  
 
== See Also ==
 
== See Also ==
* AMD {{\\|Zen}}, {{\\|Zen 2}}, {{\\|Zen 3}}
+
* AMD {{\\|Zen}}
 
* Intel {{intel|Meteor Lake|l=arch}}
 
* Intel {{intel|Meteor Lake|l=arch}}
