From WikiChip
Difference between revisions of "amd/microarchitectures/zen"
Revision as of 00:11, 30 January 2017

Zen µarch
General Info
Arch Type: CPU
Designer: AMD
Manufacturer: GlobalFoundries
Introduction: 2017
Process: 14 nm
Core Configs: 2, 4, 8, 16, 32
Pipeline
Type: Superscalar
Speculative: Yes
Reg Renaming: Yes
Instructions
ISA: x86-16, x86-32, x86-64
Extensions: MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, RDRND, F16C, BMI, BMI2, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVE, SHA, CLZERO
Cache
L1I Cache: 64 KiB/core, 4-way set associative
L1D Cache: 32 KiB/core, 8-way set associative
L2 Cache: 512 KiB/core, 8-way set associative
L3 Cache: 2 MiB/core, up to 16-way set associative
Cores
Core Names: Raven Ridge, Summit Ridge, Snowy Owl, Naples
Succession

Zen (family 17h) is the microarchitecture developed by AMD as a successor to both Excavator and Puma. Zen is an entirely new design, built from the ground up to balance performance and power, and is capable of covering the entire computing spectrum from fanless notebooks to high-performance desktop computers. Zen is set to be released in early 2017 and to eventually be replaced by Zen+.

Etymology

The name Zen was picked by Michael Clark, AMD's senior fellow and lead architect, to represent the balance needed between the various competing aspects of a microprocessor: transistor allocation/die size, clock/frequency restriction, power limitations, and new instructions to implement.

Codenames

Preliminary Data! Information presented in this article deals with a microprocessor or chip that was recently announced or leaked and is thus missing information regarding its features and exact specification. Information may be incomplete and can change by final release.
Core | C/T | Target
Naples | 32/64 | High-end server multiprocessors
Snowy Owl | 16/32 | Mid-range server processors
Summit Ridge | 8/16 | High-end desktops & enthusiasts market
Raven Ridge | 4/8 | Mainstream desktop & mobile processors with GPU

Brands


Release Dates

The first set of processors, part of the Ryzen family, is expected to be officially launched before the end of Q1, likely in mid-February 2017 before the Game Developers Conference (GDC). Server processors are set to be released by the end of Q2 2017. Mobile processors are expected to be released by the end of 2017.

Process Technology

Zen is planned to be manufactured on GlobalFoundries' 14 nm process. AMD's previous microarchitectures were based on 32 nm and 28 nm processes. The jump to 14 nm is part of AMD's attempt to remain competitive against Intel (both Skylake and Kaby Lake are also manufactured on 14 nm, although by 2017 Intel plans on moving to Cannonlake and a 10 nm process). The move to 14 nm brings the usual benefits of a smaller node, such as reduced heat and power consumption for identical designs.

Compatibility

Microsoft announced that only Windows 10 will have support for Zen. Linux added initial support for Zen starting with Linux Kernel 4.1.

Vendor | OS | Version | Notes
Microsoft | Windows | Windows 7 | No Support
Microsoft | Windows | Windows 8 | No Support
Microsoft | Windows | Windows 10 | Support
Linux | Linux | Kernel 4.1 | Initial Support

Compiler support

Compiler | Arch-Specific | Arch-Favorable
GCC | -march=znver1 | -mtune=znver1
LLVM | -march=znver1 | -mtune=znver1
Visual Studio | /arch:AVX2 | ?

Architecture

AMD Zen is an entirely new design, built from the ground up, that introduces a considerable number of improvements and design changes over Excavator. Zen-based microprocessors will utilize AMD's Socket AM4 unified platform.

Key changes from Excavator

  • Zen was designed to succeed BOTH Excavator (High-performance) and Puma (Low-power) covering the entire range in one architecture
    • Cover the entire spectrum from fanless notebooks to high-performance desktops
    • More aggressive clock gating with multi-level regions
    • Power focus from design, employs low-power design methodologies
  • Utilizes 14 nm process (from 28 nm)
  • 40% improvement in IPC per core per single-thread (From Excavator)
  • Core engine
    • SMT support, 2 threads/core
    • Improved branch prediction
      • Better branch predictions with 2 branches per BTB entry
      • Lower miss latency penalty
    • Large Op cache
    • Wider μop dispatch (6, up from 4)
    • Larger instruction scheduler
      • Integer (84, up from 48)
      • Floating Point (96, up from 60)
    • Larger retire throughput (8, up from 4)
    • Larger Retire Queue (192, up from 128)
    • Larger Load Queue (72, up from 44)
    • Larger Store Queue (44, up from 32)
    • Quad-issue FPU
  • Cache system
    • Write-back L1 cache eviction policy (From write-through)
    • Faster L2 cache
    • Faster L3 cache
    • Large Op cache
    • Faster Load to FPU (down to 7, from 9 cycles)
    • Better L1$ and L2$ data prefetcher
    • 2x the L1 and L2 bandwidth
    • 5x L3 bandwidth
    • Move elimination block added
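
To put the "up from" figures in the list above on a common scale, the following minimal sketch computes the relative growth of each structure. The numbers are copied from the list; the dictionary keys and the helper name `growth_percent` are illustrative choices, not AMD terminology.

```python
# (Excavator, Zen) widths or entry counts, copied from the list above.
CHANGES = {
    "uop dispatch width": (4, 6),
    "integer scheduler entries": (48, 84),
    "FP scheduler entries": (60, 96),
    "retire width": (4, 8),
    "retire queue entries": (128, 192),
    "load queue entries": (44, 72),
    "store queue entries": (32, 44),
}


def growth_percent(old: int, new: int) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


for name, (excavator, zen) in CHANGES.items():
    pct = growth_percent(excavator, zen)
    print(f"{name:28s} {excavator:4d} -> {zen:4d}  (+{pct:.1f}%)")
```

Larger structures do not translate one-to-one into performance, so these percentages only describe how much each resource grew.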

New instructions

Zen introduced a number of new x86 instructions:

  • ADX - Multi-Precision Add-Carry Instruction extension
  • RdRand - Hardware-based RNG
  • SMAP - Supervisor Mode Access Prevention
  • SHA - SHA extensions
  • CLFLUSHOPT - Flush Cache Line
  • XSAVE - Privileged Save/Restore
  • CLZERO - Zero-out Cache Line (AMD exclusive)
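
On a Linux system, one way to check whether a given processor reports the extensions listed above is to read the `flags` line of `/proc/cpuinfo`. The sketch below is a hedged illustration: the mapping from extension names to kernel flag strings (for example `sha_ni` for SHA) is an assumption of this example, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumed mapping from the extensions above to Linux /proc/cpuinfo flag names.
FLAG_NAMES = {
    "ADX": "adx",
    "RdRand": "rdrand",
    "SMAP": "smap",
    "SHA": "sha_ni",
    "CLFLUSHOPT": "clflushopt",
    "XSAVE": "xsave",
    "CLZERO": "clzero",
}


def parse_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supported_extensions(cpuinfo_text: str) -> dict[str, bool]:
    """Map each extension name to whether its flag is present."""
    flags = parse_flags(cpuinfo_text)
    return {ext: flag in flags for ext, flag in FLAG_NAMES.items()}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        for ext, ok in supported_extensions(path.read_text()).items():
            print(f"{ext:11s} {'yes' if ok else 'no'}")
    else:
        print("/proc/cpuinfo not found; this sketch only works on Linux")
```

Parsing is kept separate from file access so the same function can be applied to captured `/proc/cpuinfo` text from another machine.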

Block Diagram

Individual Core

zen block diagram.svg

Memory Hierarchy

  • Cache
    • L1I Cache:
      • 64 KiB 4-way set associative
        • 64 B line size
        • shared by the two threads, per core
    • L1D Cache:
      • 32 KiB 8-way set associative
        • 64 B line size
        • write-back policy
    • L2 Cache:
      • 512 KiB 8-way set associative
      • 64 B line size
      • write-back policy
    • L3 Cache:
      • 2 MiB/core, shared across all cores.
      • Up to 16-way set associative
      • Write-back policy
    • System DRAM:
      • 2 Channels
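
The capacity and associativity figures above determine how many sets each cache has, using the standard relation sets = capacity / (ways × line size). The sketch below works this out for L1I, L1D and L2. The line size is a parameter; the 64 B used here is an assumption typical of x86 parts and should be set to whatever line size applies. L3 is left out because the list only gives a per-core capacity and an upper bound on associativity.

```python
def num_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Sets = capacity / (associativity * line size) for a set-associative cache."""
    sets, rem = divmod(size_bytes, ways * line_bytes)
    if rem:
        raise ValueError("capacity is not a multiple of ways * line size")
    return sets


KiB = 1024
LINE = 64  # bytes; assumed line size, adjust as needed

# (capacity, ways) taken from the list above.
caches = {
    "L1I": (64 * KiB, 4),
    "L1D": (32 * KiB, 8),
    "L2": (512 * KiB, 8),
}

for name, (size, ways) in caches.items():
    sets = num_sets(size, ways, LINE)
    index_bits = sets.bit_length() - 1  # valid because sets is a power of two here
    print(f"{name}: {sets} sets, {index_bits} index bits")
```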

Zen's TLB consists of a dedicated level-one TLB for the instruction cache and another for the data cache. Additionally, there is a unified second-level TLB.

  • TLBs
    • BP TLB
      • 8 entry L0 TLB, all page sizes
      • 64 entry L1 TLB, all page sizes
      • 512 entry L2 TLB, no 1G pages
    • DTLB
      • 64 entry, all page sizes
    • STLB
      • 1.5K entry, no 1G pages
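
A common way to read TLB entry counts is in terms of "reach": the amount of memory that can be mapped without a TLB miss, equal to entries × page size. The sketch below computes this for the DTLB and STLB sizes listed above. It assumes "1.5K" means 1536 entries and uses the standard x86-64 4 KiB and 2 MiB page sizes; neither detail is stated in the list itself.

```python
KiB, MiB = 1024, 1024 ** 2


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of address space covered when every entry maps a distinct page."""
    return entries * page_bytes


# Entry counts from the list above; "1.5K" is read as 1536 entries (assumption).
TLBS = {"DTLB": 64, "STLB": 1536}
PAGE_SIZES = {"4 KiB": 4 * KiB, "2 MiB": 2 * MiB}  # standard x86-64 page sizes

for tlb, entries in TLBS.items():
    for label, page in PAGE_SIZES.items():
        reach_mib = tlb_reach(entries, page) / MiB
        print(f"{tlb} with {label} pages: {reach_mib:g} MiB")
```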

Pipeline


Zen presents a major design departure from the previous couple of microarchitectures. In the pursuit of remaining competitive against Intel, AMD went with an approach similar to Intel's: a larger, beefier core with an SoC design that can scale from extremely low TDP (fanless devices) to supercomputers utilizing dozens of cores. As such, Zen is aimed at replacing both Excavator (AMD's previous performance microarchitecture) and Puma (AMD's previous ultra-low power arch). In addition to covering the entire computing spectrum through power efficiency and core scalability, another major design goal was a 40% uplift in single-thread performance (i.e. a 40% IPC increase) over Excavator. The large increase in performance is the result of major redesigns in all three areas of the core (the front end, the execution engine, and the memory subsystem) as well as Zen's new SoC CCX (CPU Complex) modular design. The improvement in power efficiency is the result of the 14 nm process used as well as many low-power design methodologies that were utilized early on in the design process (Excavator was manufactured on GF's 28 nm process).
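
Equating a single-thread performance uplift with an IPC uplift holds only when the clock frequency is unchanged, since a simple first-order model puts performance at IPC × frequency. The sketch below illustrates that relationship; the function name and the 10% lower clock in the second call are hypothetical, chosen only to show how the two factors combine.

```python
def relative_performance(ipc_ratio: float, clock_ratio: float = 1.0) -> float:
    """Single-thread speedup when performance is modelled as IPC * clock frequency."""
    return ipc_ratio * clock_ratio


# 40% IPC uplift at an unchanged clock: the speedup equals the IPC ratio.
print(relative_performance(1.40))

# Hypothetical: the same IPC uplift combined with a 10% lower clock.
print(relative_performance(1.40, 0.90))
```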

Broad Overview

At a very broad view, Zen shares some similarities with its predecessor but introduces new elements and major changes. Each core is composed of a front end (in-order area) that fetches instructions, decodes them, generates µOPs and fused µOPs, and sends them to the Execution Engine (out-of-order section). Instructions are either fetched from the L1I$ or the µOPs cache (on subsequent fetches). Zen decodes 4 instructions/cycle into the µOP Queue. The µOP Queue dispatches separate µOPs to the Integer side and the FP side.

Unlike many of Intel's recent microarchitectures (such as Skylake and Kaby Lake), which make use of a unified scheduler, AMD continues to use a split pipeline design. µOPs are decoupled at the µOP Queue and are sent through the two distinct pipelines to either the Integer side or the FP side. The two sections are completely separate, each featuring separate schedulers, queues, and execution units. The Integer side splits up the µOPs via a set of individual schedulers that feed the various ALU units. On the floating point side, there is a different scheduler to handle the 128-bit FP operations.

Data is fed into the execution units from the L1D$ through the load and store queues, with addresses computed by the two Address Generation Units (AGUs), at a rate of 2 loads and 1 store per cycle. Each core also has a 512 KiB level 2 cache. The L2 is connected to the L3 cache, which is shared across all cores.
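
The 2 loads and 1 store per cycle figure can be turned into a rough peak L1D bandwidth estimate. The sketch below assumes each access moves 16 bytes (matching the 128-bit FP width mentioned earlier) and uses a hypothetical 3.0 GHz clock; both values are assumptions of this example rather than figures from the text.

```python
def peak_l1d_bytes_per_cycle(loads: int = 2, stores: int = 1,
                             access_bytes: int = 16) -> int:
    """Upper bound on L1D traffic per cycle; access_bytes is an assumed width."""
    return (loads + stores) * access_bytes


def peak_l1d_gb_per_s(clock_ghz: float, **kwargs: int) -> float:
    """Bytes per cycle times cycles per nanosecond gives GB/s (10^9 bytes/s)."""
    return peak_l1d_bytes_per_cycle(**kwargs) * clock_ghz


print(peak_l1d_bytes_per_cycle())   # bytes per cycle per core
print(peak_l1d_gb_per_s(3.0))       # GB/s per core at a hypothetical 3.0 GHz
```

This is a ceiling, not a prediction: sustained bandwidth depends on bank conflicts, queue occupancy, and the access mix.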


Front End


Execution Engine

Integer
Floating Point

Memory Subsystem


Sockets/Platform

All Zen-based microprocessors utilize AMD's Socket AM4, a unified socket infrastructure.

Socket AM4 Platform
Segment | Chipset | USB 3.1 G1 | USB 3.1 G2 | USB 2.0 | SATA | SATAe | PCIe | RAID
400-series (Zen+)
Mainstream | B450 | 6 | 2 | 6 | 4 + 2x NVMe | 1 | 6x Gen2 | 0,1,10
Enthusiast | X470 | 6 | 2 | 6 | 6 + 2x NVMe | 2 | 8x Gen2 | 0,1,10
300-series (Zen)
Small Form Factor | A300, B300 | 4 | 0 | 0 | 2 + 2x NVMe | 1 | 4x Gen3 | 0,1
Small Form Factor | X300 | 4 | 0 | 0 | 2 + 2x NVMe | 1 | 4x Gen3 | 0,1
Entry-level | A320 | 6 | 1 | 6 | 4 + 2x NVMe | 2 | 4x Gen2 | 0,1,10
Mainstream | B350 | 6 | 2 | 6 | 4 + 2x NVMe | 2 | 6x Gen2 | 0,1,10
Enthusiast | X370 | 6 | 2 | 6 | 6 + 2x NVMe | 2 | 8x Gen2 | 0,1,10

Die Shot


All Zen Chips

Zen Chips
Model | Family | Core | Launched | Freq | Max Mem
200GE | Athlon | Raven Ridge | 6 September 2018 | 3.2 GHz | 64 GiB
220GE | Athlon | Raven Ridge | 21 December 2018 | 3.4 GHz | 64 GiB
240GE | Athlon | Raven Ridge | 21 December 2018 | 3.5 GHz | 64 GiB
3000G | Athlon | Dali, Raven Ridge | 20 November 2019 | 3.5 GHz | 64 GiB
300U | Athlon | Picasso | 6 January 2019 | 2.4 GHz | 64 GiB
3150U | Athlon Gold | Dali | 6 January 2020 | 2.4 GHz | 32 GiB
PRO 200GE | Athlon | Raven Ridge | 6 September 2018 | 3.2 GHz | 64 GiB
3050U | Athlon Silver | Dali | 6 January 2020 | 2.3 GHz | 32 GiB
7251 | EPYC | Naples | 20 June 2017 | 2.1 GHz | 2 TiB
7261 | EPYC | Naples | 14 June 2018 | 2.5 GHz | 2 TiB
7281 | EPYC | Naples | 20 June 2017 | 2.1 GHz | 2 TiB
7301 | EPYC | Naples | 20 June 2017 | 2.2 GHz | 2 TiB
7351 | EPYC | Naples | 20 June 2017 | 2.4 GHz | 2 TiB
7351P | EPYC | Naples | 20 June 2017 | 2.4 GHz | 2 TiB
7371 | EPYC | Naples | 2019 | 3.1 GHz | 2 TiB
7401 | EPYC | Naples | 20 June 2017 | 2 GHz | 2 TiB
7401P | EPYC | Naples | 20 June 2017 | 2 GHz | 2 TiB
7451 | EPYC | Naples | 20 June 2017 | 2.3 GHz | 2 TiB
7501 | EPYC | Naples | 20 June 2017 | 2 GHz | 2 TiB
7551 | EPYC | Naples | 20 June 2017 | 2 GHz | 2 TiB
7551P | EPYC | Naples | 20 June 2017 | 2 GHz | 2 TiB
7601 | EPYC | Naples | 20 June 2017 | 2.2 GHz | 2 TiB
3101 | EPYC Embedded | Snowy Owl | 21 February 2018 | 2.1 GHz | 512 GiB
3151 | EPYC Embedded | Snowy Owl | 21 February 2018 | 2.7 GHz | 512 GiB
3201 | EPYC Embedded | Snowy Owl | 21 February 2018 | 1.5 GHz | 512 GiB
3251 | EPYC Embedded | Snowy Owl | 21 February 2018 | 2.5 GHz | 512 GiB
3255 | EPYC Embedded | Snowy Owl | | 2.5 GHz | 512 GiB
3301 | EPYC Embedded | Snowy Owl | 21 February 2018 | 2 GHz | 1 TiB
3351 | EPYC Embedded | Snowy Owl | 21 February 2018 | 1.9 GHz | 1 TiB
3401 | EPYC Embedded | Snowy Owl | 21 February 2018 | 1.85 GHz | 1 TiB
3451 | EPYC Embedded | Snowy Owl | 21 February 2018 | 2.15 GHz | 1 TiB
FireFlight | | | 3 August 2018 | 3 GHz | 8 GiB
1200 | Ryzen 3 | Summit Ridge | 27 July 2017 | 3.1 GHz | 64 GiB
1300X | Ryzen 3 | Summit Ridge | 27 July 2017 | 3.5 GHz | 64 GiB
2200G | Ryzen 3 | Raven Ridge | 12 February 2018 | 3.5 GHz | 64 GiB
2200GE | Ryzen 3 | Raven Ridge | 19 April 2018 | 3.2 GHz | 64 GiB
2200U | Ryzen 3 | Raven Ridge | 8 January 2018 | 2.5 GHz | 32 GiB
2300U | Ryzen 3 | Raven Ridge | 8 January 2018 | 2 GHz | 32 GiB
3250U | Ryzen 3 | Dali | 6 January 2020 | 2.6 GHz | 32 GiB
PRO 1200 | Ryzen 3 | Summit Ridge | | 3.1 GHz | 64 GiB
PRO 1300 | Ryzen 3 | Summit Ridge | | 3.5 GHz | 64 GiB
PRO 2200G | Ryzen 3 | Raven Ridge | 10 May 2018 | 3.5 GHz | 64 GiB
PRO 2200GE | Ryzen 3 | Raven Ridge | 10 May 2018 | 3.2 GHz | 64 GiB
PRO 2300U | Ryzen 3 | Raven Ridge | 8 January 2018 | 2 GHz | 32 GiB
1400 | Ryzen 5 | Summit Ridge | 11 April 2017 | 3.2 GHz | 64 GiB
1500X | Ryzen 5 | Summit Ridge | 11 April 2017 | 3.5 GHz | 64 GiB
1600 | Ryzen 5 | Summit Ridge | 11 April 2017 | 3.2 GHz | 64 GiB
1600X | Ryzen 5 | Summit Ridge | 11 April 2017 | 3.6 GHz | 64 GiB
2400G | Ryzen 5 | Raven Ridge | 12 February 2018 | 3.6 GHz | 64 GiB
2400GE | Ryzen 5 | Raven Ridge | 19 April 2018 | 3.2 GHz | 64 GiB
Count: 79
