From WikiChip
Latest revision as of 19:37, 13 March 2025
Zen 5 µarch | |
---|---|
General Info | |
Arch Type | CPU |
Designer | AMD |
Manufacturer | TSMC |
Introduction | 2024 |
Process | 4 nm (N4P), 3 nm (N3E, Turin Dense Zen 5c) |
Core Configs | 256, 224, 192, 144, 128, 96, 72, 64, 56, 48, 36, 32, 28, 24, 18, 16, 8, 6 |
PE Configs | 512, 448, 384, 288, 256, 192, 144, 128, 112, 96, 64, 60, 56, 40, 30, 20 |
Pipeline | |
Type | Superscalar |
OoOE | Yes |
Speculative | Yes |
Reg Renaming | Yes |
Instructions | |
ISA | AMD64, x86-64 |
Extensions | AVX, AVX2, AVX-512 |
Cores | |
Core Names | Turin, Da Vinci, Granite Ridge, Strix Point |
Succession | |
Zen 5 is a microarchitecture designed by AMD as the successor to Zen 4. It has already been released and is being sold.
History
Zen 5 was first mentioned by lead architect Michael Clark during a discussion on April 9, 2018.[1]
Codenames
Product Codenames:
Core | Model | C/T | Target |
---|---|---|---|
Turin | EPYC 9005 | Up to 128/256 | High-end EPYC 5th Gen series server multiprocessors |
Turin Dense | EPYC 9005 | Up to 192/384 | |
Shimada Peak | Ryzen Threadripper 9000 | Up to ?/? | Threadripper workstation & enthusiast market processors |
Granite Ridge | Ryzen 9000 | Up to ?/? | Mainstream to high-end desktop & enthusiast market processors (gaming desktop CPUs) |
Fire Range | Ryzen 9000 | Up to ?/? | |
Strix Point | Ryzen AI 300 | Up to ?/? | Mainstream desktop & mobile processors with integrated RDNA 3.5 graphics |
Strix Halo | Ryzen AI Max 300 | Up to ?/? | |
Krackan Point | Ryzen AI 300 | Up to ?/? | |
Sonoma Valley | Ryzen APU Family | Up to ?/? | AMD Low-end Ryzen APU Family, Samsung 4 nm (TSMC) (Zen 5c Quad-core CPU, RDNA3 2CU GPU, TDP 35W) |
The Zen 5 microarchitecture powers Ryzen 9000 series desktop processors (codenamed "Granite Ridge"), EPYC 9005 server processors (codenamed "Turin"), and Ryzen AI 300 thin and light mobile processors (codenamed "Strix Point").
- Turin • AMD EPYC 9005 Series
- Shimada Peak • AMD Ryzen Threadripper 9000 Series
- Granite Ridge • AMD Ryzen 9000 Series
- Fire Range • AMD Ryzen 9000 Series
- Strix Halo • AMD Ryzen AI Max 300 Series
- Strix Point • AMD Ryzen AI 300 Series
- Krackan Point • AMD Ryzen AI 300 Series
- Sonoma Valley • AMD Low-end Ryzen APU Family
Architectural Codenames:
Arch | Codename |
---|---|
Core | Nirvana |
CCD | Eldora |
- Comparison
Core | Zen | Zen+ | Zen 2 | Zen 3 | Zen 3+ | Zen 4 | Zen 4c | Zen 5 | Zen 5c | Zen 6 | Zen 6c | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Codename | Core | Valhalla | Cerberus | Persephone | Dionysus | Nirvana | Prometheus | Morpheus | Monarch | |||
CCD | Aspen Highlands | Breckenridge | Durango | Vindhya | Eldora | |||||||
Cores (threads) | CCD | 8 (16) | 8 (16) | 16 (32) | ||||||||
CCX | 8 (16) | 8 (16) | 8 (16) | |||||||||
L3 cache | CCD | 32 MB | 32 MB | 32 MB | 32 MB | |||||||
CCX | 32 MB | 32 MB | 16 MB | 32 MB | ||||||||
Die size | CCD area | 44 mm² | 66.3 mm² | 72.7 mm² | 70.6 mm² | |||||||
Core area | 7 mm² (14 nm) | (12 nm) | (7 nm) | (7 nm) | (7 nm) | 3.84 mm² (5 nm) | 2.48 mm² (5 nm) | (4 nm) | (3 nm) | (2 nm) | (2 nm) |
Models
- Zen Series
- Zen
- Zen+
- Zen 2 (Valhalla)
Process Technology
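Zen 5 is produced on TSMC's 4 nm process (N4P). The Zen 5c chiplets used in Turin Dense are produced on a 3 nm process (N3E), while the Zen 5c cores in Strix Point are built on the same 4 nm monolithic die as its Zen 5 cores.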
Architecture
AMD released Zen 5 in July 2024. It is the seventh microarchitecture in the Zen series.
- Products based on it are codenamed Granite Ridge, Strix Point, and Turin; they are manufactured on TSMC 4 nm and 3 nm processes.
- LITTLE design
- - Improved IPC (16% on average) and clock speed
- - L3 cache per chiplet remains 32 MB
Key changes from Zen 4
- Core level (vs. Zen 4)
- Instruction set
- AVX-512 VP2INTERSECT support
- AVX-VNNI support
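Looking at architectural semantics is the easiest way to understand the two instruction-set additions above. The Python below is a minimal reference model written for illustration. The function names are hypothetical and are not an AMD or Intel API. It follows the documented results of the 512-bit form of VP2INTERSECTD and the 256-bit VEX-encoded (AVX-VNNI) form of VPDPBUSD. It says nothing about latency or throughput on Zen 5.

```python
# Illustrative scalar reference models of two instructions listed above.
# They model architectural results only, not Zen 5 latency or throughput.
# Function names are hypothetical; they are not an AMD or Intel API.


def vp2intersectd(a: list[int], b: list[int]) -> tuple[int, int]:
    """Model the 512-bit form of VP2INTERSECTD (16 lanes of 32-bit integers).

    Returns two 16-bit masks (k1, k2): bit i of k1 is set when a[i] equals
    any element of b, and bit j of k2 is set when b[j] equals any element of a.
    """
    if len(a) != 16 or len(b) != 16:
        raise ValueError("expected 16 lanes per operand")
    k1 = k2 = 0
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                k1 |= 1 << i
                k2 |= 1 << j
    return k1, k2


def vpdpbusd(acc: list[int], a: list[int], b: list[int]) -> list[int]:
    """Model the 256-bit AVX-VNNI form of VPDPBUSD (8 lanes of 32-bit sums).

    a holds unsigned bytes (0..255) and b holds signed bytes (-128..127).
    Each 32-bit lane adds four u8*s8 products to its accumulator and wraps
    modulo 2**32 (the saturating variant, VPDPBUSDS, is not modelled here).
    """
    if len(acc) != 8 or len(a) != 32 or len(b) != 32:
        raise ValueError("expected 8 accumulators and 32 bytes per source")
    out = []
    for lane, start in enumerate(acc):
        total = start + sum(a[4 * lane + k] * b[4 * lane + k] for k in range(4))
        total &= 0xFFFFFFFF          # keep the low 32 bits
        if total >= 1 << 31:         # reinterpret as a signed 32-bit value
            total -= 1 << 32
        out.append(total)
    return out


if __name__ == "__main__":
    a = list(range(16))
    b = [3, 100, 7, 200] + [1000 + i for i in range(12)]
    k1, k2 = vp2intersectd(a, b)
    print(f"k1={k1:#06x} k2={k2:#06x}")   # a[3], a[7] match; b[0], b[2] match
    print(vpdpbusd([0] * 8, [1] * 32, [2] * 32))
```

VP2INTERSECT produces a mask for each operand, so a set-intersection kernel can compress matching elements out of either vector without another comparison pass.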
- Front end
- • Branch prediction improvements
- - L1 BTB size increased significantly from 1.5K → 16K (10.7x)
- - L2 BTB size increased from 7K → 8K
- - Increased size of TAGE
- - Introduction of 2-ahead predictor structure
- - Return stack size increased from 32 → 52 entries (+62.5%)
- • Improved instruction cache latency and bandwidth
- - Instruction fetch bandwidth increased from 32B → 64B per cycle
- - L2 instruction TLB size increased from 512 → 2048 entries (4x)
- • Dual decode pipelines introduced
- - Decoder throughput scaled from 4 to 8 (2x4) instructions per cycle; each SMT thread uses one 4-wide decoder, so a single thread is still limited to 4 per cycle
- - Op cache throughput expanded from 9 → 12 (2x6) ops per cycle; each thread, including a lone single thread, gets 6 per cycle
- - Unlike Intel's E-cores, where a single thread can use multiple decode clusters, Zen 5 dedicates one cluster to each SMT thread.
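The larger return stack listed above matters most for deeply nested call chains. The sketch below is a generic textbook model of a circular return-address stack and is not AMD's implementation. The overwrite-on-overflow policy, the absence of speculative repair, and the example return addresses are all assumptions made for illustration. It compares the 32-entry (Zen 4) and 52-entry (Zen 5) sizes given in the list.

```python
from typing import Optional


class ReturnStack:
    """Toy circular return-address stack (RAS); not AMD's actual design.

    A call pushes its return address and a return pops the predicted target.
    When more calls are outstanding than there are entries, the oldest entry
    is overwritten, so the deepest returns later find the stack empty and
    cannot be predicted from it.
    """

    def __init__(self, entries: int) -> None:
        self.entries = entries
        self.buf = [0] * entries
        self.top = 0     # slot the next push writes to
        self.count = 0   # number of valid entries, never above `entries`

    def push(self, return_addr: int) -> None:
        self.buf[self.top] = return_addr
        self.top = (self.top + 1) % self.entries
        self.count = min(self.count + 1, self.entries)

    def pop(self) -> Optional[int]:
        if self.count == 0:
            return None  # underflow: no prediction available
        self.top = (self.top - 1) % self.entries
        self.count -= 1
        return self.buf[self.top]


def correct_returns(depth: int, entries: int) -> int:
    """Count correctly predicted returns when `depth` nested calls unwind."""
    ras = ReturnStack(entries)
    # Hypothetical return addresses, one per call level.
    addrs = [0x1000 + 4 * level for level in range(depth)]
    for addr in addrs:
        ras.push(addr)
    return sum(1 for addr in reversed(addrs) if ras.pop() == addr)


if __name__ == "__main__":
    for entries in (32, 52):   # Zen 4 and Zen 5 sizes listed above
        print(entries, "entries:", correct_returns(40, entries), "of 40 returns predicted")
```

In this model, a 40-deep call chain loses its 8 outermost return predictions with 32 entries and none with 52. Real predictors add mechanisms that this sketch omits.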
- Back end
- • Dispatch width of integer operations expanded from 6 → 8
- • The size of ROB (reorder buffer) has been expanded from 320 to 448 entries (+40%)
- • Integer register file capacity expanded from 224 → 240 entries (+7%)
- • Floating point register file capacity expanded from 192 to 384 entries (2x)
- • Flag register file capacity expanded to 192 entries
- • Increased size of integer scheduler
- - Scheduler size expanded from 4x24 (=96) → 88+56 (=144) entries (+50%)
- - Adoption of a unified scheduler organization similar to Intel's P-cores
- • Increased size of floating point scheduler
- - The size of the pre-scheduler queue has been expanded from 64 to 96 entries (+50%).
- - Scheduler size expanded from 2x32 (=64) → 3x38 (=114) entries (+78%)
- • Number of ALUs increased from 4 → 6 (+50%)
- • Number of multiplication units increased from 1 → 3 (3x)
- • Number of branch units increased from 2 → 3 (+50%)
- • Number of AGUs increased from 3 → 4 (+33%)
- - Number of loads that can be processed per cycle increased from 3 → 4 (unchanged at 2 for loads of 128 bits or wider)
- - Number of 128/256-bit stores that can be processed per cycle increased from 1 → 2
- Desktop and server products such as Granite Ridge have a full 512-bit datapath and can execute 512-bit AVX-512 operations in a single pass.
- Mobile products, like the previous Zen 4, use 256-bit datapaths and execute 512-bit operations as two 256-bit halves.
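Simple arithmetic is enough to check the percentage figures in the back-end list. The desktop-versus-mobile AVX-512 difference comes down to datapath width. The Python below is an illustrative sketch. The size table copies the numbers listed above. The datapath model assumes one fully pipelined pass per cycle per pipe and ignores real port, scheduling, and latency effects. The function names are hypothetical.

```python
# (old, new) values copied from the back-end list above (Zen 4 -> Zen 5).
BACK_END_CHANGES = {
    "ROB entries": (320, 448),
    "FP register file entries": (192, 384),
    "Integer scheduler entries": (4 * 24, 88 + 56),
    "FP pre-scheduler queue entries": (64, 96),
    "FP scheduler entries": (2 * 32, 3 * 38),
    "ALUs": (4, 6),
    "Multiply units": (1, 3),
    "Branch units": (2, 3),
    "AGUs": (3, 4),
}


def pct_increase(old: int, new: int) -> int:
    """Increase from old to new in percent, rounded to the nearest integer."""
    return round((new - old) * 100 / old)


def passes_per_512bit_op(datapath_bits: int) -> int:
    """Datapath passes a single 512-bit vector op needs (ceiling division)."""
    return -(-512 // datapath_bits)


def fp32_lanes_per_pass(datapath_bits: int) -> int:
    """32-bit floating-point elements handled in one pass of the datapath."""
    return datapath_bits // 32


if __name__ == "__main__":
    for name, (old, new) in BACK_END_CHANGES.items():
        print(f"{name}: {old} -> {new} (+{pct_increase(old, new)}%)")
    # Simplified model: assumes one fully pipelined pass per cycle per pipe.
    for label, width in (("desktop/server", 512), ("mobile", 256)):
        print(f"{label}: {passes_per_512bit_op(width)} pass(es) per 512-bit op, "
              f"{fp32_lanes_per_pass(width)} FP32 lanes per pass")
```

The "3x" growth in multiply units corresponds to +200% in this calculation. A 256-bit datapath needs two passes per 512-bit operation, which halves peak 512-bit throughput relative to the full-width desktop and server parts under the same simplifying assumption.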
- Memory subsystem
- • Load/Store Queue
- - Increased size
- • Prefetcher
- - Added 2D stride prefetcher
- - Improved stream & region prefetcher
- • L1 data cache
- - Capacity increased from 32 KB → 48 KB
- - Associativity increased from 8-way → 12-way
- - Bandwidth doubled
- • L2 cache
- - Associativity increased from 8-way → 16-way
- - Bandwidth increased from 32B → 64B per cycle
- • L3 cache
- - Slight improvement in latency
- - Maximum number of in-flight misses increased to 320
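The L1 data cache grew from 32 KB to 48 KB while associativity grew from 8 to 12 ways, so the number of sets stays the same. The Python sketch below works through that arithmetic. It assumes 64-byte cache lines and 4 KiB pages, neither of which is stated in the list above. With 64 sets, the set index and line offset together use 12 address bits, which fit inside the page offset. A commonly cited motivation for scaling a cache this way is that it can then be indexed with untranslated address bits, though AMD's exact rationale is not given here. The function name is hypothetical.

```python
PAGE_OFFSET_BITS = 12  # assumes 4 KiB pages


def cache_geometry(size_bytes: int, ways: int, line_bytes: int = 64) -> dict[str, int]:
    """Sets and address-bit split for a set-associative cache.

    Assumes power-of-two set counts and line sizes (64-byte lines by default).
    """
    sets = size_bytes // (ways * line_bytes)
    if sets <= 0 or sets & (sets - 1) or line_bytes & (line_bytes - 1):
        raise ValueError("expected power-of-two sets and line size")
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return {"sets": sets, "offset_bits": offset_bits, "index_bits": index_bits}


if __name__ == "__main__":
    for label, size_kb, ways in (("Zen 4 L1D", 32, 8), ("Zen 5 L1D", 48, 12)):
        g = cache_geometry(size_kb * 1024, ways)
        idx = g["offset_bits"] + g["index_bits"]
        print(f"{label}: {g['sets']} sets, index+offset = {idx} bits, "
              f"within 4 KiB page offset: {idx <= PAGE_OFFSET_BITS}")
```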
- Physical design
- Improved power gating technology
- Overall, the wider architecture improves performance per clock by an average of 16% compared with the previous generation.
Designers
- David Suggs, chief architect
Bibliography
See also
- AMD Zen • Ryzen
- Intel Meteor Lake