Latest revision |
Your text |
Line 4: |
Line 4: |
| |name=Zen 4 | | |name=Zen 4 |
| |designer=AMD | | |designer=AMD |
− | |manufacturer=TSMC | + | |manufacturer=GlobalFoundries |
− | |process=5 nm | + | |process=7 nm+ |
− | |process 2=6 nm
| |
| |predecessor=Zen 3 | | |predecessor=Zen 3 |
| |predecessor link=amd/microarchitectures/zen 3 | | |predecessor link=amd/microarchitectures/zen 3 |
Line 13: |
Line 12: |
| |succession=Yes | | |succession=Yes |
| }} | | }} |
− | '''Zen 4''' is a [[microarchitecture]] developed by [[AMD]] as a successor to {{\\|Zen 3}}. See press release for details: [https://www.amd.com/en/press-releases/2022-08-29-amd-launches-ryzen-7000-series-desktop-processors-zen-4-architecture-the AMD Launches Ryzen 7000 Series Desktop Processors] | + | '''Zen 4''' is a planned [[microarchitecture]] being developed by [[AMD]] as a successor to {{\\|Zen 3}}. |
| | | |
| == History == | | == History == |
− | [[File:next-horizon-zen3-4-roadmap.png|right|thumb|400px|Zen 4 on the roadmap.]]
| + | Zen 4 was first mentioned by Forrest Norrod during AMD's EPYC One Year Anniversary webinar. |
− | Zen 4 was first mentioned by Forrest Norrod during AMD's EPYC One Year Anniversary webinar. During the Next Horizon event, held on November 6, 2018, AMD stated that Zen 4 was at the design completion phase.
| |
− | | |
− | == Products ==
| |
− | {{future information}}
| |
− | | |
− | {| class="wikitable"
| |
− | |-
| |
− | ! Processor Series !! Cores/Threads !! Market
| |
− | |-
| |
− | | EPYC 9004 "{{amd|Genoa|l=core}}" || Up to 96/192 || High-end server [[multiprocessors]]
| |
− | |-
| |
− | | Ryzen Threadripper 7000 "{{amd|Storm Peak|l=core}}" || Up to 96/192 || Workstation & enthusiasts
| |
− | |-
| |
− | | Ryzen 7000 "{{amd|Raphael|l=core}}" || Up to 16/32 || Mainstream to high-end desktops & enthusiasts
| |
− | |-
| |
− | | Ryzen 7000 APU "{{amd|Dragon Range|l=core}}" || Up to 16/32 || High-end mobile processors with GPU
| |
− | |-
| |
− | | Ryzen 7000 APU "{{amd|Phoenix Point|l=core}}" || Up to 8/16 || Mainstream desktop & mobile processors with GPU
| |
− | |}
| |
− | | |
− | Cores using variant Zen 4 uarch:
| |
− | | |
− | {| class="wikitable"
| |
− | |-
| |
− | ! Processor Series !! Cores/Threads !! Market
| |
− | |-
| |
− | | EPYC 9004 "{{amd|Bergamo|l=core}}" || Up to 128/256 || Cloud [[multiprocessors]] (uses the smaller, almost half-size Zen 4c core, referred to as “Zen 4D” in leaks, which sacrifices half of the L3 cache)
| |
− | |-
| |
− | | EPYC 8004 "{{amd|Siena|l=core}}" || Up to 64/128 || Edge-optimized server chips
| |
− | |}
| |
− | | |
− | '''Architectural Codenames:'''
| |
− | {| class="wikitable"
| |
− | |-
| |
− | ! Arch !! Codename
| |
− | |-
| |
− | | Core || Persephone
| |
− | |-
| |
− | | {{abbr|CCD}} || Durango
| |
− | |}
| |
| | | |
| == Process Technology == | | == Process Technology == |
− | Processors implementing Zen 4 are {{abbr|SoC}}s configured as a Multi-Chip Module or monolithic chip. MCMs consist of a single I/O die and up to 12 Core Complex Dies attached with full-duplex serial point-to-point links. The IOD contains memory controllers, I/O controllers, microcontrollers for security purposes and power management, and other peripherals. The CCDs communicate with peripherals and each other through the Data and Control Fabrics on the I/O die, and each contain a single Core Complex (CCX). The monolithic chips integrate a subset of the IOD facilities and additional peripherals tailored for their target market, a CCX, and a GPU. A CCX contains 8 CPU cores (fewer may be usable on some models) communicating through a shared L3 cache.
| + | Zen 4 is speculated to be produced on an enhanced [[7nm process|7nm+ process]]. |
− | | |
− | ("Bergamo" processors configuration TBD.)
| |
| | | |
− | The chips are fabricated by [[TSMC]], CCDs and monolithic chips on a [[5 nm]] node, IODs on a [[6 nm]] node.
| + | == Codenames == |
| + | {{empty section}} |
| | | |
| == Architecture == | | == Architecture == |
− | Zen 4 is a 64-bit superscalar, out-of-order, 2-way [[SMT]] microarchitecture with advanced dynamic branch prediction, 4-way decoding of [[x86]] instructions with a stack optimizer, multiple caches including an Op cache for decoded instructions and prefetchers for code and data, four integer/address and two floating point instruction schedulers, 3-way address generation, 5-way integer execution, 4-way 256-bit wide floating point execution, a speculative, out-of-order load/store unit capable of up to three loads or two stores per cycle with a 48/88-entry load and 64-entry store queue, write-combining, and 5-level paging with four {{abbr|TLB}}s and six hardware page table walkers.
| + | Nothing is currently known about the architectural improvements being made to Zen 4. |
− | | |
− | === Key changes from {{\\|Zen 3}} ===
| |
− | * {{x86|AVX-512}} instructions support, 256-bit data path<ref name="ryzen-7000-preview"/>
| |
− | * L1 and L2 DTLB size increased from 64 to 72 and 2,048 to 3,072 entries
| |
− | * Op cache size increased from 4,096 to 6,912 Ops per core
| |
− | * L2 cache doubled from 512 KiB to 1 MiB per core (not all processor models), latency increased from 12 to 14 cycles minimum
| |
− | * L3 cache average load-to-use latency increased from 46 to 50 cycles
| |
− | * Five-level paging; Max. physical and linear address size raised from 48 to 52 and 57 bits respectively
| |
− | * Improved cache load, write and prefetch from/to register (less latency)
| |
− | * Higher transistor density due to the 5 nm process
| |
− | * Capable of higher all-core clock speeds (shown by AMD to reach 5 GHz+ on all cores)
| |
− | * Larger integer register file (from 192 to 224), floating-point register file (from 160 to 192) and reorder buffer (from 256 to 320 entries)
| |
− | * REPE CMPSB (sometimes used to implement string comparison) is significantly sped up, processes more than 32 bytes/cycle when operating on L1 data.
| |
− | * BSF, BSR, and the BMI1 instructions BLSI, BLSMSK, BLSR, TZCNT have a lower latency of 1 cycle and twice the throughput (4 instructions/cycle).
| |
− | * Latency and/or throughput of VPERMx, V[P]BROADCASTx, VPMOV{S,Z}Xx instructions improved.
| |
− | * Some ALU operations on vector registers increased throughput from 2 to 3 ops/cycle.
| |
− | * Some ALU operations on vector registers (VPABSx,VPHADDx,VPHSUBx,VPSLLx,VPSRLx,VPSRAx,VPACKx,VPSIGNx,VMAXx,VMINx) increased latency by 1 cycle.
| |
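The address sizes in the five-level paging entry above are consistent with the usual x86-64 page table layout, where a linear address is a 12-bit page offset plus 9 index bits per table level. A minimal Python sketch of that arithmetic follows; the helper names are hypothetical and only illustrate the calculation.

```python
PAGE_OFFSET_BITS = 12      # offset within a 4-KiB base page
INDEX_BITS_PER_LEVEL = 9   # 512 entries per page table


def linear_address_bits(levels):
    """Linear address width for a given number of page table levels."""
    return PAGE_OFFSET_BITS + INDEX_BITS_PER_LEVEL * levels


def addressable_pib(bits):
    """Size of a 2**bits byte address space, in PiB (2**50 bytes)."""
    return 2 ** bits // 2 ** 50


# Four-level vs. five-level paging: 48 and 57 bits of linear address.
print(linear_address_bits(4), linear_address_bits(5))
# Linear (57-bit) and physical (52-bit) address space sizes in PiB.
print(addressable_pib(57), addressable_pib(52))
```

Under these assumptions, five-level paging gives a 128 PiB linear address space, and a 52-bit physical address covers 4 PiB.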
− | | |
− | | |
− | Package level changes:
| |
− | * EPYC 9004 "{{amd|Genoa|l=core}}": Max. core/thread count 96/192, up from 64/128 on EPYC 7003 "{{amd|Milan|l=core}}"
| |
− | * EPYC "{{amd|Bergamo|l=core}}": Max. 128 cores but preliminary data shows a slightly altered architecture featuring cores that take up less space
| |
− | * Support for DDR5 memory and PCIe Gen 5
| |
− | * New sockets {{amd|AM5|l=pack}} (client), {{amd|SP5|l=pack}} and {{amd|SP6|l=pack}} (server), {{amd|FP7|FP7/FP7r2|l=pack}} (mobile)
| |
− | * {{abbr|APU}}s: RDNA2-based iGPU with 2 compute units (128 stream processors)
| |
− | | |
− | === New Instructions ===
| |
− | Zen 4 introduced the following ISA enhancements:
| |
− | | |
− | * {{x86|AVX-512}} - 512-bit Vector Instructions
| |
− | ** {{x86|AVX512F}} - Foundation (first introduced with [[Intel]] {{intel|skylake (server)|Skylake|l=arch}})
| |
− | ** {{x86|AVX512CD}} - Conflict Detection Instructions ({{intel|Skylake X|l=core}})
| |
− | ** {{x86|AVX512VL}} - Vector Length Extensions (Skylake X)
| |
− | ** {{x86|AVX512DQ}} - Doubleword and Quadword Instructions (Skylake X)
| |
− | ** {{x86|AVX512BW}} - Byte and Word Instructions (Skylake X)
| |
− | ** {{x86|AVX512_IFMA}} - Integer Fused Multiply-Add ({{intel|Cannon Lake|l=arch}})
| |
− | ** {{x86|AVX512_VBMI}} - Vector Bit Manipulation Instructions (Cannon Lake)
| |
− | ** {{x86|AVX512_VPOPCNTDQ}} - Vector Population Count Instructions ({{intel|ice lake (server)|Ice Lake|l=arch}})
| |
− | ** {{x86|AVX512_BITALG}} - Bit Algorithms (Ice Lake)
| |
− | ** {{x86|AVX512_VBMI2}} - Vector Bit Manipulation Instructions 2 (Ice Lake)
| |
− | ** {{x86|AVX512_VNNI}} - Vector Neural Network Instructions (Ice Lake)
| |
− | ** {{x86|AVX512_BF16}} - [[bfloat16|BFloat16]] Instructions ({{intel|Cooper Lake|l=arch}})
| |
− | ** ''Not supported'': AVX512ER, AVX512PF ({{intel|Knights Landing|l=arch}}); AVX512 4VNNIW, 4FMAPS ({{intel|Knights Mill|l=arch}}); VP2INTERSECT ({{intel|Tiger Lake|l=arch}}); FP16 ({{intel|Sapphire Rapids|l=arch}})
| |
− | * {{x86|GFNI}} - Galois Field New Instructions (first introduced with [[Intel]] {{intel|ice lake (server)|Ice Lake|l=arch}})
| |
− | ** <code>VGF2P8AFFINEQB</code> - Galois field affine transformation
| |
− | ** <code>VGF2P8AFFINEINVQB</code> - Galois field affine transformation inverse
| |
− | ** <code>VGF2P8MULB</code> - Galois field multiply bytes
| |
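On Linux, the extensions listed above can be checked against the <code>flags</code> line of <code>/proc/cpuinfo</code>. The following Python sketch assumes the flag spellings used by the Linux kernel (for example <code>avx512ifma</code> and <code>avx512_vnni</code>); the function names are hypothetical, and the flag set should be verified against the running kernel before relying on it.

```python
# Assumed Linux /proc/cpuinfo flag names for the extensions listed above.
ZEN4_FLAGS = {
    "avx512f", "avx512cd", "avx512vl", "avx512dq", "avx512bw",
    "avx512ifma", "avx512vbmi", "avx512_vpopcntdq", "avx512_bitalg",
    "avx512_vbmi2", "avx512_vnni", "avx512_bf16", "gfni",
}


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_zen4_flags(cpuinfo_text):
    """List the flags from ZEN4_FLAGS that the given cpuinfo text lacks."""
    return sorted(ZEN4_FLAGS - cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or the file is unreadable
    print("missing:", missing_zen4_flags(text))
```

An empty list suggests the processor and kernel expose every extension in the list; it does not by itself identify the microarchitecture.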
− | | |
− | === Memory Hierarchy ===
| |
− | ==== Data and Instruction Caches ====
| |
− | * L0 Op Cache:
| |
− | ** Up to 6,912 Ops per core, 12-way set associative
| |
− | ** 9 Op line size (restrictions apply depending on instruction type)
| |
− | ** Parity protected
| |
− | * L1I Cache:
| |
− | ** 32 KiB per core, 8-way set associative
| |
− | ** 64 B line size
| |
− | ** Parity protected
| |
− | * L1D Cache:
| |
− | ** 32 KiB per core, 8-way set associative
| |
− | ** 64 B line size
| |
− | ** Write-back policy
| |
− | ** 4-5 cycles latency for Int
| |
− | ** 7-8 cycles latency for FP
| |
− | ** ECC
| |
− | * L2 Cache:
| |
− | ** 512 KiB or 1 MiB per core (varies by processor model), 8-way set associative
| |
− | ** 64 B line size
| |
− | ** Write-back policy
| |
− | ** Inclusive of L1
| |
− | ** ≥ 14 cycles latency
| |
− | ** {{abbr|DEC-TED}} ECC, tag & state arrays {{abbr|SEC-DED}}<!--7 check bits for 42 tag bits; AMD-55901-0.97 Sec 3.5-->
| |
− | * L3 Cache:
| |
− | ** "{{amd|Genoa|l=core}}": up to 32 MiB/{{abbr|CCX}} (8 cores), up to 384 MiB total
| |
− | ** Shared by all cores in the CCX, configurable
| |
− | ** 16-way set associative
| |
− | ** 64 B line size
| |
− | ** L2 [[victim cache]]
| |
− | ** Write-back policy
| |
− | ** 50 cycles average load-to-use latency
| |
− | ** DEC-TED ECC, tag array & shadow tags SEC-DED<!--AMD-55901-0.97 Sec 3.5-->
| |
− | ** QoS Monitoring and Enforcement with {{abbr|BMEC|Bandwidth Monitoring Event Configuration}}, {{abbr|L3RR|L3 Range Reservation}}, {{abbr|L3SBE|L3 External Slow Memory Bandwidth Enforcement}}
| |
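The cache parameters above determine the number of sets in each cache: capacity divided by the product of associativity and line size. The Python sketch below works through that arithmetic under the assumption of a simple, unsliced set-associative organization; real implementations may bank or slice the arrays, and the helper names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def cache_sets(size_bytes, ways, line_bytes=64):
    """Number of sets in a set-associative cache of the given geometry."""
    lines, rem = divmod(size_bytes, line_bytes)
    if rem or lines % ways:
        raise ValueError("size is not a whole number of sets")
    return lines // ways


# (capacity, associativity) taken from the list above; 64 B lines throughout.
caches = {
    "L1I": (32 * KIB, 8),
    "L1D": (32 * KIB, 8),
    "L2 (512 KiB)": (512 * KIB, 8),
    "L2 (1 MiB)": (1 * MIB, 8),
    "L3 per CCX": (32 * MIB, 16),
}

for name, (size, ways) in caches.items():
    print(name, cache_sets(size, ways), "sets")

# Genoa: up to 12 CCDs, one CCX each, 32 MiB of L3 per CCX.
print("Total L3:", 12 * 32, "MiB")
```

The last line reproduces the 384 MiB total quoted for "Genoa" from 12 CCDs with 32 MiB of L3 each.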
− | | |
− | ==== Translation Lookaside Buffers ====
| |
− | * ITLB
| |
− | ** 64 entry L1 TLB, fully associative
| |
− | *** 4-Kbyte, 2-Mbyte, 1-Gbyte page sizes
| |
− | ** 512 entry L2 TLB, 8-way set associative
| |
− | *** 4-Kbyte, 2-Mbyte, and 4-Mbyte pages
| |
− | ** Parity protected
| |
− | * DTLB
| |
− | ** 72 entry L1 TLB, fully associative
| |
− | *** 4-Kbyte, 16-Kbyte, 2-Mbyte, 1-Gbyte page sizes
| |
− | ** 3,072 entry L2 TLB, 24-way set associative
| |
− | *** 4-Kbyte, 16-Kbyte, 2-Mbyte, and 4-Mbyte pages, {{abbr|PDE|Page Directory Entry}}s to speed up table walks
| |
− | ** Parity protected
| |
− | | |
− | 4-Mbyte pages require two 2-Mbyte entries in all TLBs. 16-Kbyte page size refers to {{abbr|PTE|Page Table Entry}} coalescing of four physically consecutive and 16-Kbyte aligned 4-Kbyte pages. All caches and TLBs are competitively shared in multi-threaded mode.
| |
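A useful figure derived from these entry counts is the TLB reach, the amount of memory that can be mapped without a page table walk. The Python sketch below computes an upper bound, assuming every entry holds a translation of the same page size; the function name and the <code>entries_per_page</code> parameter are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries, page_bytes, entries_per_page=1):
    """Upper bound on memory covered when all entries map one page size."""
    return (entries // entries_per_page) * page_bytes


# L1 DTLB: 72 entries, fully associative.
print(tlb_reach(72, 4 * KIB) // KIB, "KiB with 4-Kbyte pages")
print(tlb_reach(72, 2 * MIB) // MIB, "MiB with 2-Mbyte pages")

# L2 DTLB: 3,072 entries.
print(tlb_reach(3072, 4 * KIB) // MIB, "MiB with 4-Kbyte pages")
print(tlb_reach(3072, 16 * KIB) // MIB, "MiB with coalesced 16-Kbyte pages")
print(tlb_reach(3072, 2 * MIB) // GIB, "GiB with 2-Mbyte pages")

# A 4-Mbyte page occupies two 2-Mbyte entries, so the reach is unchanged.
print(tlb_reach(3072, 4 * MIB, entries_per_page=2) // GIB, "GiB with 4-Mbyte pages")
```

Because caches and TLBs are competitively shared in multi-threaded mode, the reach available to a single thread can be lower than these bounds.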
− | | |
− | ==== System DRAM ====
| |
− | * Ryzen 7000 "{{amd|Raphael|l=core}}":
| |
− | ** Up to PC5-41600 (DDR5-5200) without overclocking
| |
− | | |
− | * EPYC 9004 "{{amd|Genoa|l=core}}":
| |
− | ** 12 channels per socket, two 40-bit (32 data, 8 ECC) DDR5 subchannels per channel
| |
− | ** Up to 24 DIMMs, max. 6 TiB
| |
− | ** Up to PC5-38400 (DDR5-4800)
| |
− | ** {{abbr|SR}}/{{abbr|DR}} {{abbr|RDIMM}}, {{abbr|4R}}/{{abbr|8R}} {{abbr|LRDIMM}}, {{abbr|3DS DIMM}}
| |
− | ** ECC supported (x4, x8, x16, chipkill)<!--AMD-55901-0.97 Sec 3.7-->
| |
− | ** DRAM bus parity and write data CRC options<!--ibid-->
| |
− | | |
− | Sources: <ref name="amd-55901-ppr-1911"/><ref name="amd-57647-zen4-optim"/><ref name="amd-58015-9004-overv"/>
| |
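The module names in the System DRAM list encode the theoretical peak bandwidth of one channel: the transfer rate in MT/s multiplied by the 8 bytes (64 data bits, ECC bits excluded) moved per transfer. The Python sketch below reproduces that arithmetic; the function name is hypothetical, and the results are peak figures, not sustained throughput.

```python
def channel_peak_mb_s(mt_per_s, data_bits=64):
    """Theoretical peak bandwidth of one DDR5 channel in MB/s."""
    return mt_per_s * data_bits // 8


# DDR5-4800 corresponds to PC5-38400, DDR5-5200 to PC5-41600.
print(channel_peak_mb_s(4800), channel_peak_mb_s(5200))

# "Genoa": 12 channels per socket at DDR5-4800.
genoa_socket_mb_s = 12 * channel_peak_mb_s(4800)
print(genoa_socket_mb_s / 1000, "GB/s per socket")

# 6 TiB spread across 24 DIMMs implies 256 GiB per DIMM.
print(6 * 1024 // 24, "GiB per DIMM")
```

Each "Genoa" channel is split into two subchannels with 32 data bits each, which is where the 64 data bits per channel come from.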
− | | |
− | == All Zen 4 Processors ==
| |
− | <!-- NOTE:
| |
− | This table is generated automatically from the data in the actual articles.
| |
− | If a microprocessor is missing from the list, an appropriate article for it needs to be
| |
− | created and tagged accordingly.
| |
− | Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips
| |
− | -->
| |
− | {| class="comptable3"
| |
− | ! List of all Zen 4-based Processors
| |
− | |}
| |
− | <div class="comptable-scroller sticky">
| |
− | {| class="comptable3 stickycol1 sortable"
| |
− | |- class="header continued"
| |
− | ! Model
| |
− | ! Codename
| |
− | ! {{abbr|C|Cores}}
| |
− | ! {{abbr|T|Threads}}
| |
− | ! data-sort-type=number | L2$
| |
− | ! data-sort-type=number | L3$
| |
− | ! data-sort-type=number | Frequ.
| |
− | ! data-sort-type=number | Turbo
| |
− | ! data-sort-type=number | Turbo 1C
| |
− | ! Memory
| |
− | ! data-sort-type=number | {{abbr|TDP}}
| |
− | ! data-sort-type=date | Launched
| |
− | ! Release<br />Price
| |
− | ! {{abbr|OPN}}
| |
− | |- class="separator sortbottom"
| |
− | | colspan=4 | [[Uniprocessors]]
| |
− | | colspan=10 |
| |
− | {{#invoke:comptable|askt
| |
− | |condition=[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] [[max cpu count::1]]
| |
− | |sort=name |valuesep=,<br /> |template=<nowiki>
| |
− | |-
| |
− | | data-sort-value="{{{name#-}}}" | {{amd|{{{microprocessor family#-}}}}} [[{{{page#-}}}|{{{model number#-}}}]]
| |
− | | {{amd|{{{core name#-}}}|l=core}}
| |
− | | {{{core count}}}
| |
− | | {{{thread count}}}
| |
− | | {{{l2$ size}}}
| |
− | | {{{l3$ size}}}
| |
− | | {{{base frequency#GHz}}}
| |
− | | {{{turbo frequency#GHz}}}
| |
− | | {{{turbo frequency (1 core)#GHz}}}
| |
− | | {{{supported memory type}}}
| |
− | | {{{tdp}}}
| |
− | | {{{first launched}}}
| |
− | | {{#if:{{{release price}}}|{{{release price}}}{{#ifeq:{{{release price}}}|{{{release price (tray)}}}| (1k)}} }}
| |
− | | {{{part number}}}</nowiki>|outrotemplate=<nowiki>
| |
− | |- class="separator sortbottom"
| |
− | | colspan=4 | [[Multiprocessors]] (dual-socket)
| |
− | | colspan=10 |
| |
− | {{#invoke:comptable|askt
| |
− | |condition=[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] [[max cpu count::>>1]]
| |
− | |sort=name |valuesep=,<br /> |template=<nowiki>{{{#template}}}</nowiki>}}</nowiki>}}
| |
− | |-
| |
− | ! Count: {{#ask:[[Category:microprocessor models by amd]] [[microarchitecture::Zen 4]] |format=count}}
| |
− | |}
| |
− | </div>
| |
− | | |
− | == Designers ==
| |
− | * Mike Clark(?), chief architect
| |
| | | |
− | == Bibliography == | + | === Key changes from {{\\|Zen 3}} === |
| + | {{empty section}} |
| | | |
| == References == | | == References == |
− | <references>
| + | {{reflist}} |
− | <ref name="ryzen-7000-preview">{{cite techdoc|title=Ryzen 7000 Desktop Preview|url=https://www.angstronomics.com/p/ryzen-7000-desktop-preview|publ=Angstronomics|date=2022-08-29}}</ref>
| |
− | <ref name="amd-55901-ppr-1911">{{cite techdoc|title=Processor Programming Reference (PPR) for AMD Family 19h Models 11h, Revision B1 Processors|url=https://www.amd.com/system/files/TechDocs/55901_0.25.zip|publ=AMD|pid=55901|rev=0.25|date=2022-11-10}}</ref>
| |
− | <ref name="amd-57647-zen4-optim">{{cite techdoc|title=Software Optimization Guide for the AMD Zen4 Microarchitecture|url=https://www.amd.com/system/files/TechDocs/57647.zip|publ=AMD|pid=57647|rev=1.00|date=2023-01-06}}</ref>
| |
− | <ref name="amd-58015-9004-overv">{{cite techdoc|title=AMD EPYC™ 9004 Series Architecture Overview|url=https://www.amd.com/system/files/documents/58015-epyc-9004-tg-architecture-overview.pdf|publ=AMD|pid=58015|rev=1.1|date=2022-12}}</ref>
| |
− | </references>
| |
| | | |
| == See Also == | | == See Also == |
− | * AMD {{\\|Zen}}, {{\\|Zen 2}}, {{\\|Zen 3}} | + | * AMD {{\\|Zen}} |
− | * Intel {{intel|Meteor Lake|l=arch}} | + | * Intel {{intel|Alder lake|l=arch}} |