amd/microarchitectures/zen 5

	Zen 5 µarch
	General Info
Arch Type	CPU
Designer	AMD
Manufacturer	TSMC
Introduction	2024
Process	4 nm, N4P
Core Configs	256, 224, 192, 144, 128, 96, 72, 64, 56, 48, 32, 28, 36, 24, 18, 16, 8, 6
PE Configs	512, 448, 384, 288, 256, 192, 144, 128, 112, 96, 64, 56, 60, 40, 30, 20
	Pipeline
Type	Superscalar
OoOE	Yes
Speculative	Yes
Reg Renaming	Yes
	Instructions
ISA	AMD64, x86-64
Extensions	AVX, AVX2, AVX-512
	Cores
Core Names	Turin, Da Vinci, Granite Ridge, Strix Point
	Succession
Predecessor	Zen 4
Successor	Zen 6

Latest revision as of 19:37, 13 March 2025

Zen 5 is a microarchitecture designed by AMD as the successor to Zen 4. Products based on it have been released and are on sale.

History

Zen 5 was first mentioned by lead architect Michael Clark during a discussion on April 9, 2018.[1]

Codenames

Product Codenames:

Core	Model	C/T	Target
Turin	EPYC 9005	Up to 128/256	High-end 5th Gen EPYC server processors
Turin Dense	EPYC 9005	Up to 192/384
Shimada Peak	Ryzen Threadripper 9000	Up to ?/?	Threadripper workstation and enthusiast processors
Granite Ridge	Ryzen 9000	Up to ?/?	Mainstream to high-end desktop and enthusiast processors (gaming desktop CPUs)
Fire Range	Ryzen 9000	Up to ?/?
Strix Point	Ryzen AI 300	Up to ?/?	Mainstream mobile processors with integrated GPU (RDNA 3.5)
Strix Halo	Ryzen AI 300	Up to ?/?
Krackan Point	Ryzen AI 300	Up to ?/?
Sonoma Valley	Ryzen APU Family	Up to ?/?	AMD low-end Ryzen APU family on Samsung 4 nm (quad-core Zen 5c CPU, 2 CU RDNA 3 GPU, 35 W TDP)

The Zen 5 microarchitecture powers Ryzen 9000 series desktop processors (codenamed "Granite Ridge"), EPYC 9005 server processors (codenamed "Turin"), and Ryzen AI 300 thin-and-light mobile processors (codenamed "Strix Point").


Architectural Codenames:

Arch	Codename
Core	Nirvana
CCD	Eldora

Comparison

Core		Zen	Zen+	Zen 2	Zen 3	Zen 3+	Zen 4	Zen 4c	Zen 5	Zen 5c	Zen 6	Zen 6c
Codename	Core	-	-	Valhalla	Cerberus	-	Persephone	Dionysus	Nirvana	Prometheus	Morpheus	Monarch
Codename	CCD	-	-	Aspen Highlands	Breckenridge	-	Durango	Vindhya	Eldora	-	-	-
Cores (threads)	CCD	-	-	-	8 (16)	-	8 (16)	16 (32)	-	-	-	-
Cores (threads)	CCX	-	-	-	8 (16)	-	8 (16)	8 (16)	-	-	-	-
L3 cache	CCD	-	-	-	32 MB	-	32 MB	32 MB	32 MB	-	-	-
L3 cache	CCX	-	-	-	32 MB	-	32 MB	16 MB	32 MB	-	-	-
Die size	CCD area	44 mm²	-	-	-	-	66.3 mm²	72.7 mm²	70.6 mm²	-	-	-
Die size	Core area	7 mm² (14 nm)	(12 nm)	(7 nm)	(7 nm)	(6 nm)	3.84 mm² (5 nm)	2.48 mm² (5 nm)	(4 nm)	(3 nm)	(2 nm)	(2 nm)

Models

Zen Series
Zen
Zen+
Zen 2 (Valhalla)
Zen 3 (Cerberus)
Zen 3+
Zen 4 (Persephone)
Zen 4c (Dionysus)
Zen 5 (Nirvana)
Zen 5c (Prometheus)
Zen 6 (Morpheus)
Zen 6c (Monarch)

Process Technology

Zen 5 is manufactured on a 4 nm process; Zen 5c is manufactured on a 3 nm process.

Architecture

AMD released Zen 5 in July 2024. It is the seventh microarchitecture in the Zen series.

Zen 5 products, codenamed Granite Ridge, Strix Point, and Turin, are manufactured on TSMC 4 nm and 3 nm processes.

Hybrid design: Strix Point combines full Zen 5 cores with compact Zen 5c cores

- Average IPC improved by 16% over Zen 4

- L3 cache stays at 32 MB per CCD, unchanged from Zen 4

Key changes from Zen 4

Core level (vs. Zen 4)

Instruction set

AVX-512 VP2INTERSECT support

AVX-VNNI support
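For readers unfamiliar with the two instruction-set additions above, the following scalar Python models sketch what each instruction computes, using plain lists with one element per vector lane. They follow the architectural definitions of VP2INTERSECTD and the non-saturating VPDPBUSD as we understand them; the function names are hypothetical, and the models ignore masking, register width and performance entirely.

```python
# Hypothetical scalar models of VP2INTERSECTD (AVX-512) and VPDPBUSD (AVX-VNNI).
# They illustrate element-wise semantics only, not how Zen 5 implements them.


def vp2intersect(a: list[int], b: list[int]) -> tuple[int, int]:
    """Return two bitmasks: bit i of mask_a is set if a[i] equals any
    element of b; bit j of mask_b is set if b[j] equals any element of a."""
    mask_a = 0
    mask_b = 0
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                mask_a |= 1 << i
                mask_b |= 1 << j
    return mask_a, mask_b


def vpdpbusd(acc: list[int], a: list[int], b: list[int]) -> list[int]:
    """For each 32-bit lane k, multiply four unsigned bytes of `a`
    (0..255) by the four matching signed bytes of `b` (-128..127),
    add the products to acc[k] and wrap the result to signed 32 bits."""
    if len(a) != 4 * len(acc) or len(b) != 4 * len(acc):
        raise ValueError("a and b need exactly four bytes per accumulator lane")
    out = []
    for k, base in enumerate(acc):
        total = base + sum(a[4 * k + t] * b[4 * k + t] for t in range(4))
        total &= 0xFFFFFFFF          # keep the low 32 bits
        if total >= 1 << 31:         # reinterpret as a signed 32-bit value
            total -= 1 << 32
        out.append(total)
    return out
```

For example, `vp2intersect([1, 2, 3, 4], [4, 5, 1, 7])` flags lanes 0 and 3 of the first vector and lanes 0 and 2 of the second, while `vpdpbusd([10], [1, 2, 3, 4], [1, -1, 2, -2])` adds the four byte products to 10.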

Front end

• Branch prediction improvements

- L1 BTB size increased significantly from 1.5K → 16K (10.7x)

- L2 BTB size increased from 7K → 8K

- Increased size of TAGE

- Introduction of a 2-ahead branch predictor, which can predict two taken branches per cycle

- Return stack size increased from 32 → 52 entries (+62.5%)

• Improved instruction cache latency and bandwidth

- Instruction fetch bandwidth increased from 32B → 64B per cycle

- L2 instruction TLB size increased from 512 → 2048 entries (4x)

• Introducing a dual decode pipeline

- Decode width doubled from 4 → 8 instructions per cycle, split into two 4-wide clusters (2x4); each thread gets up to 4 per cycle, including when only one thread is running

- Op cache throughput expanded from 9 → 12 ops per cycle (2x6); each thread gets up to 6 per cycle, including when only one thread is running

- Unlike Intel's E-cores, where a single thread can use multiple decode clusters, Zen 5 assigns one cluster per SMT thread
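The per-thread figures for the dual decode pipeline can be summarised in a small peak-throughput model. This is a hypothetical sketch built only from the numbers listed above (two 4-wide decode clusters, 6 op-cache ops per thread, one cluster per SMT thread); it ignores fetch bandwidth, branch behaviour and op-cache hit rates, so real sustained throughput is lower. The `lone_thread_uses_all_clusters` flag is an illustrative contrast with designs where one thread may drive every cluster, not a model of any specific Intel core.

```python
# Hypothetical peak front-end model based on the figures listed above.

DECODE_WIDTH_PER_CLUSTER = 4  # x86 instructions decoded per cycle per cluster
NUM_DECODE_CLUSTERS = 2
OPCACHE_OPS_PER_THREAD = 6    # ops per cycle the op cache supplies to one thread


def peak_front_end(active_threads: int, from_op_cache: bool,
                   lone_thread_uses_all_clusters: bool = False) -> tuple[int, int]:
    """Return (per-thread peak, core-wide peak) per cycle.

    With the default flag, each SMT thread is tied to one decode cluster,
    as described for Zen 5. Setting the flag lets a lone thread use every
    cluster, purely for comparison. The flag affects decode only.
    """
    if active_threads not in (1, 2):
        raise ValueError("a Zen 5 core runs one or two SMT threads")
    if from_op_cache:
        per_thread = OPCACHE_OPS_PER_THREAD
    elif lone_thread_uses_all_clusters and active_threads == 1:
        per_thread = DECODE_WIDTH_PER_CLUSTER * NUM_DECODE_CLUSTERS
    else:
        per_thread = DECODE_WIDTH_PER_CLUSTER
    return per_thread, per_thread * active_threads
```

Under this model a single thread decoding from the legacy decoders peaks at 4 instructions per cycle, and the full 8-wide decode is only reached with two SMT threads active.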

Back end

• Dispatch width of integer operations expanded from 6 → 8

• Reorder buffer (ROB) expanded from 320 → 448 entries (+40%)

• Integer register file capacity expanded from 192 → 240 entries (+25%)

• Floating point register file capacity expanded from 192 to 384 entries (2x)

• Flag register file capacity expanded to 192 entries

• Increased size of integer scheduler

- Scheduler size expanded from 4x24 (=96) → 88+56 (=144) entries (+50%)

- Adopts a unified scheduler configuration similar to Intel's P-cores

• Increased size of floating point scheduler

- The size of the pre-scheduler queue has been expanded from 64 to 96 entries (+50%).

- Scheduler size expanded from 2x32 (=64) → 3x38 (=114) entries (+78%)

• Number of ALUs increased from 4 → 6 (+50%)

• Number of multiplication units increased from 1 → 3 (3x)

• Number of branch units increased from 2 → 3 (+50%)

• Number of AGU increased from 3 → 4 (+33%)

- Number of loads per cycle increased from 3 → 4 (for loads of 128 bits or wider, the limit remains 2 per cycle)

- Number of 128/256-bit stores per cycle increased from 1 → 2

Desktop and server products such as Granite Ridge have a full-width 512-bit datapath and execute an AVX-512 operation in a single pass.

However, mobile products use 256-bit datapaths and, like Zen 4, split each 512-bit operation into two 256-bit halves.
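The difference between the desktop/server and mobile AVX-512 implementations comes down to how many datapath passes a 512-bit operation needs. The sketch below is a deliberately simplified, hypothetical model: it assumes each pass handles one datapath-wide slice and ignores latency, port counts and clock effects. The product-class labels are illustrative.

```python
import math

# Hypothetical model: a vector operation wider than the SIMD datapath is
# split into datapath-wide slices, each taking one pass through the unit.
DATAPATH_BITS = {
    "desktop/server": 512,  # e.g. Granite Ridge: full-width AVX-512
    "mobile": 256,          # 512-bit ops split in two, as on Zen 4
}


def passes_per_op(vector_bits: int, datapath_bits: int) -> int:
    """Number of datapath passes needed to execute one vector operation."""
    if vector_bits not in (128, 256, 512):
        raise ValueError("x86 vector widths are 128, 256 or 512 bits")
    return math.ceil(vector_bits / datapath_bits)


def relative_512bit_throughput(product_class: str) -> float:
    """512-bit ops per pass, relative to a full 512-bit datapath."""
    return 1 / passes_per_op(512, DATAPATH_BITS[product_class])
```

In this model a 256-bit operation takes one pass on either product class; only 512-bit work sees the halved peak rate on mobile parts.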

Memory subsystem

• Load/Store Queue

- Increased size

• Prefetcher

- Added 2D stride prefetcher

- Improved stream & region prefetcher

• L1 data cache

- Capacity increased from 32 KB → 48 KB

- Associativity increased from 8-way → 12-way

- Bandwidth doubled

• L2 cache

- Associativity increased from 8-way → 16-way

- Bandwidth increased from 32B → 64B per cycle

• L3 cache

- Slight improvement in latency

- Maximum number of in-flight misses increased to 320
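The L1 data cache figures above can be checked with a little set-associative arithmetic. This sketch assumes 64-byte cache lines and 4 KiB pages, neither of which is stated in the text. With those assumptions, both Zen 4 (32 KB, 8-way) and Zen 5 (48 KB, 12-way) end up with 64 sets, so each way still spans one 4 KiB page. A plausible reading, not confirmed by the source, is that associativity grew together with capacity so the set index keeps fitting inside the untranslated page-offset bits.

```python
# Hypothetical worked example of set-associative cache geometry.
# Assumed values (not given in the text): 64-byte lines, 4 KiB pages.
LINE_BYTES = 64
PAGE_BYTES = 4096


def num_sets(capacity_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Sets = capacity / (ways * line size)."""
    if capacity_bytes % (ways * line_bytes):
        raise ValueError("capacity must be a multiple of ways * line size")
    return capacity_bytes // (ways * line_bytes)


def way_size(capacity_bytes: int, ways: int) -> int:
    """Bytes covered by one way (equal to sets * line size)."""
    return capacity_bytes // ways


def index_within_page_offset(capacity_bytes: int, ways: int,
                             page_bytes: int = PAGE_BYTES) -> bool:
    """True if the set-index and line-offset bits all lie inside the page
    offset, so the set can be selected before address translation ends."""
    return way_size(capacity_bytes, ways) <= page_bytes
```

A hypothetical 48 KB cache that had stayed 8-way would need 96 sets and a 6 KiB way, which no longer fits inside a 4 KiB page offset.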

Physical design

Improved power gating technology

Taken together, these enlargements improve performance per clock by an average of 16% over the previous generation.
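The 16% figure is a per-clock (IPC) gain. Under the simple, assumed model performance ≈ IPC × clock frequency, it becomes a 16% speedup only when both chips run at the same clock. The hypothetical helpers below make that explicit and compute the clock ratio at which the IPC gain would be cancelled out.

```python
# Hypothetical helpers for the simple model: performance ~ IPC x frequency.


def relative_performance(ipc_ratio: float, clock_ratio: float = 1.0) -> float:
    """Performance of the new core relative to the old one."""
    if ipc_ratio <= 0 or clock_ratio <= 0:
        raise ValueError("ratios must be positive")
    return ipc_ratio * clock_ratio


def break_even_clock_ratio(ipc_ratio: float) -> float:
    """Clock ratio (new/old) at which the IPC gain is exactly cancelled."""
    if ipc_ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1 / ipc_ratio
```

With `ipc_ratio=1.16` and equal clocks the model gives 1.16x; a hypothetical 5% lower clock (`clock_ratio=0.95`) would reduce that to about 1.10x, and a clock roughly 14% lower would erase the gain.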

Designers

David Suggs, chief architect

Bibliography

[1] Ryzen Processors: One Year Later


