Zen - Microarchitectures - AMD

	Zen µarch
	General Info
Arch Type	CPU
Designer	AMD
Manufacturer	GlobalFoundries
Introduction	March 2, 2017
Process	14 nm
Core Configs	4, 6, 8, 32
	Pipeline
Type	Superscalar
Speculative	Yes
Reg Renaming	Yes
Stages	19
	Instructions
ISA	x86-16, x86-32, x86-64
Extensions	MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, RDRND, F16C, BMI, BMI2, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVE, SHA, CLZERO
	Cache
L1I Cache	64 KiB/core; 4-way set associative
L1D Cache	32 KiB/core; 8-way set associative
L2 Cache	512 KiB/core; 8-way set associative
L3 Cache	2 MiB/core; Up to 16-way set associative
	Cores
Core Names	Raven Ridge, Summit Ridge, Snowy Owl, Naples
	Succession
Predecessors	Excavator, Puma
Successor	Zen 2

Zen (family 17h) is the microarchitecture developed by AMD as a successor to both Excavator and Puma. Zen is an entirely new design, built from the ground up for an optimal balance of performance and power, and capable of covering the entire computing spectrum from fanless notebooks to high-performance desktop computers. Zen was officially launched on March 2, 2017. Zen is set to be eventually replaced by Zen 2.

For performance desktop and mobile computing, Zen is branded as Ryzen 3, Ryzen 5, and Ryzen 7 processors.

1 Etymology
2 Codenames
3 Brands
- 3.1 Identification
4 Release Dates
5 Process Technology
6 Compatibility
7 Compiler support
8 Architecture
9 Clock domains
10 Power
11 Features
- 11.1 Simultaneous MultiThreading (SMT)
- 11.2 SenseMI Technology
12 Scalability
- 12.1 CPU Complex (CCX)
- 12.2 Multiprocessors
13 Die
- 13.1 Core
- 13.2 Octa-Core Die
14 Sockets/Platform
15 All Zen Chips
16 References
17 Documents
18 See also

Etymology

The name Zen was picked by Michael Clark, AMD's senior fellow and lead architect, to represent the balance needed between the various competing aspects of a microprocessor: transistor allocation/die size, clock/frequency restriction, power limitations, and new instructions to implement.

Codenames

Preliminary Data! Information presented in this article deals with future products, data, features, and specifications that have yet to be finalized, announced, or released. Information may be incomplete and can change by final release.

Core	C/T	Target
Naples	Up to 32/64	High-end server multiprocessors
Snowy Owl	16/32	Mid-range server processors
Summit Ridge	Up to 8/16	Mainstream to high-end desktops & enthusiasts market processors
Raven Ridge	4/8	Mainstream desktop & mobile processors with GPU

Brands


AMD Zen-based processor brands
Logo	Family	General Description	Differentiating Features
Logo	Family	General Description	Cores	Unlocked	AVX2	SMT	XFR	IGP	ECC
	Ryzen 3	Low-end Performance	Quad	✔	✔	✘	✔/✘	✘	✔
	Ryzen 5	Mid-range Performance	Quad	✔	✔	✔	✔/✘	✘	✔
	Ryzen 5	Mid-range Performance	Hexa	✔	✔	✔	✔/✘	✘	✔
	Ryzen 7	High-end Performance / Enthusiasts	Octa	✔	✔	✔	✔/✘	✘	✔

Note: While a model has an unlocked multiplier, not all chipsets support overclocking. (see §Sockets)
Note: 'X' models have "Full XFR", providing an additional +100 MHz (+200 MHz for the 1500X) when sufficient thermal and electrical headroom is available. Non-X models are limited to just +50 MHz.
Note: All models have ECC support (including on Socket AM4); however, it is not officially supported since AMD did not validate Ryzen models for such capabilities.

Identification

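Ryzen 7 1700X: Brand Name = Ryzen, Market segment = 7, Generation = 1, Performance Level = 7, Model Number = 00, Power Segment = X
Ryzen 3 1200M: Brand Name = Ryzen, Market segment = 3, Generation = 1, Performance Level = 2, Model Number = 00, Power Segment = M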

Power Segment

(none)	Standard Desktop
U	Standard Mobile
X	High Performance, with XFR
G	Desktop + IGP
T	Low-power Desktop
S	Low-power Desktop + IGP
M	Low-power Mobile
H	High-performance Mobile

Model Number
Reserved for future speed bump/differentiator. Currently all models are "00".

Performance Level

8	Highest
6-7	High
4-5	Mid
1-3	Low

Generation

1	First generation Zen (2017)

Market segment

3	Low-end performance
5	Mid-range performance
7	Enthusiast / High-end performance

Brand Name

Ryzen
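The naming scheme above can be sketched as a small parser. This is only an illustration: the regular expression, the function name `parse_ryzen_model`, and the returned dictionary keys are assumptions made for this example, while the field meanings and suffix letters come from the tables above.

```python
import re

# Suffix letters and their meanings, as listed under "Power Segment" above.
POWER_SEGMENT = {
    "": "Standard Desktop",
    "U": "Standard Mobile",
    "X": "High Performance, with XFR",
    "G": "Desktop + IGP",
    "T": "Low-power Desktop",
    "S": "Low-power Desktop + IGP",
    "M": "Low-power Mobile",
    "H": "High-performance Mobile",
}

# Hypothetical pattern: "Ryzen <segment> <generation><level><model><suffix>".
MODEL_RE = re.compile(
    r"^Ryzen (?P<segment>[357]) "
    r"(?P<generation>\d)(?P<level>\d)(?P<model>\d{2})(?P<suffix>[UXGTSMH]?)$"
)

def parse_ryzen_model(name):
    """Split a first-generation Ryzen model name into its documented fields."""
    match = MODEL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised Ryzen model name: {name!r}")
    return {
        "brand": "Ryzen",
        "market_segment": int(match["segment"]),
        "generation": int(match["generation"]),
        "performance_level": int(match["level"]),
        "model_number": match["model"],
        "power_segment": POWER_SEGMENT[match["suffix"]],
    }

print(parse_ryzen_model("Ryzen 7 1700X"))
```

A name without a suffix, such as "Ryzen 5 1600", falls into the "(none)" row and is reported as Standard Desktop.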

Release Dates

The first set of processors, part of the Ryzen 7 family, was introduced at an AMD event on February 22, 2017, before the Game Developers Conference (GDC). However, initial models did not ship until March 2. Ryzen 5 hexa-core and quad-core variants were released on April 11, 2017. Server processors are set to be released by the end of Q2 2017. Mobile processors are expected to be released by the end of 2017.

Process Technology

See also: 14 nm process

Zen is manufactured on GlobalFoundries' 14 nm process, the same one used by IBM for their POWER9. AMD's previous microarchitectures were based on 32 and 28 nanometer processes. The jump to 14 nm is part of AMD's attempt to remain competitive against Intel (both Skylake and Kaby Lake are also manufactured on 14 nm, although by late 2017 Intel plans on moving to Cannon Lake and a 10 nm process). The move to 14 nm brings the usual benefits of a smaller node, such as reduced heat, reduced power consumption, and higher density for identical designs.

Compatibility

Linux added initial support for Zen starting with Linux Kernel 4.1. Microsoft will only support Windows 10 for Zen.

Vendor	OS	Version	Notes
Microsoft	Windows	Windows 7	No Support
		Windows 8	No Support
		Windows 10	Support
Linux	Linux	Kernel 4.1	Initial Support

Compiler support

Compiler	Arch-Specific	Arch-Favorable
GCC	`-march=znver1`	`-mtune=znver1`
LLVM	`-march=znver1`	`-mtune=znver1`
Visual Studio	`/arch:AVX2`	?

Architecture

AMD Zen is an entirely new design from the ground up which introduces a considerable number of improvements and design changes over Excavator. Zen-based microprocessors will utilize AMD's Socket AM4 unified platform.

Key changes from Excavator

Zen was designed to succeed BOTH Excavator (High-performance) and Puma (Low-power) covering the entire range in one architecture
- Cover the entire spectrum from fanless notebooks to high-performance desktops
- More aggressive clock gating with multi-level regions
- Power focus from design, employs low-power design methodologies
  - >15% switching capacitance (C_AC) improvement
Utilizes 14 nm process (from 28 nm)
52% improvement in IPC per core for a single-thread (From Excavator)
Up to 3.7x performance/watt improvement
Return to conventional high-performance x86 design
- Traditional design for cores without shared blocks (e.g. shared SIMD units)
- Larger, beefier core design
Core engine
- Simultaneous Multithreading (SMT) support, 2 threads/core (see § Simultaneous MultiThreading for details)
- Branch Predictor
  - Improved branch misprediction behavior
    - Better branch predictions with 2 branches per BTB entry
    - Lower miss latency penalty
  - BP is now decoupled from fetch stage
- Large Op cache (2K instructions)
- Wider μop dispatch (6, up from 4)
- Larger instruction scheduler
  - Integer (84, up from 48)
  - Floating Point (96, up from 60)
- Larger retire throughput (8, up from 4)
- Larger Retire Queue (192, up from 128)
  - duplicated for each thread
- Larger Load Queue (72, up from 44)
- Larger Store Queue (44, up from 32)
  - duplicated for each thread
- Quad-issue FPU (up from 3-issue)
- Faster Load to FPU (down to 7, from 9 cycles)
Cache system
- L1
  - 64 KiB (double from previous capacity of 32 KiB)
  - Write-back L1 cache eviction policy (From write-through)
  - 2x the bandwidth
- L2
  - 2x the bandwidth
  - Faster L2 cache
- Faster L3 cache
- Large Op cache
- Better L1$ and L2$ data prefetcher
- 5x L3 bandwidth
- Move elimination block added
- Page Table Entry (PTE) Coalescing

New instructions

Zen introduced a number of new x86 instructions:

ADX - Multi-Precision Add-Carry Instruction extension
RDSEED - Hardware-based RNG
SMAP - Supervisor Mode Access Prevention
SHA - SHA extensions
CLFLUSHOPT - Flush Cache Line
XSAVE - Privileged Save/Restore
CLZERO - Zero-out Cache Line (AMD exclusive)

While not new, Zen also supports AVX, AVX2, FMA3, BMI1, BMI2, AES, RdRand, SMEP. Note that with Zen, AMD dropped support for FMA4, XOP, TBM, and LWP.
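As a rough way to check for the instructions listed above on a Linux system, the sketch below scans the `flags` line of `/proc/cpuinfo`-style text. The flag spellings in `ZEN_NEW_FLAGS` (for example `sha_ni` for SHA) are assumed to match what the Linux kernel reports and are not taken from this article; the function names are made up for the example.

```python
# Assumed mapping from the extension names above to Linux cpuinfo flag names.
ZEN_NEW_FLAGS = {
    "ADX": "adx",
    "RDSEED": "rdseed",
    "SMAP": "smap",
    "SHA": "sha_ni",
    "CLFLUSHOPT": "clflushopt",
    "XSAVE": "xsave",
    "CLZERO": "clzero",
}

def flags_from_cpuinfo(text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def missing_zen_features(text):
    """List the extensions from ZEN_NEW_FLAGS that the flags line lacks."""
    present = flags_from_cpuinfo(text)
    return sorted(name for name, flag in ZEN_NEW_FLAGS.items()
                  if flag not in present)

# Hand-written sample text, not captured from real hardware.
sample = ("processor\t: 0\n"
          "flags\t\t: fpu sse2 avx2 xsave adx rdseed smap sha_ni clflushopt\n")
print(missing_zen_features(sample))
```

On a real machine the same functions could be fed the contents of `/proc/cpuinfo` instead of the hand-written sample.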

Block Diagram

Entire SoC Overview

Individual Core

Memory Hierarchy

Cache
- µOP cache
  - 2,048 µOPs, 8-way set associative
    - 32-sets, 8-µOP line size
- L1I Cache:
  - 64 KiB 4-way set associative
    - 256-sets, 64 B line size
    - shared by the two threads, per core
- L1D Cache:
  - 32 KiB 8-way set associative
    - 64-sets, 64 B line size
    - write-back policy
  - 4-5 cycles latency for Int
  - 7-8 cycles latency for FP
- L2 Cache:
  - 512 KiB 8-way set associative
  - 1,024-sets, 64 B line size
  - write-back policy
  - Inclusive of L1
  - 17 cycles latency
- L3 Cache:
  - Victim cache
  - 8 MiB/CCX, shared across all cores.
  - 16-way set associative
    - 8,192-sets, 64 B line size
  - 40 cycles latency
- System DRAM:
  - 2 Channels
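The set counts listed above follow from capacity, associativity, and line size (sets = capacity / (ways × line size)). A minimal sketch that recomputes them, with the helper name `cache_sets` chosen for this example:

```python
KIB = 1024
MIB = 1024 * KIB

def cache_sets(size_bytes, ways, line_bytes=64):
    """Number of sets in a set-associative cache of the given geometry."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("capacity is not a multiple of ways * line size")
    return sets

# (capacity, ways) pairs taken from the list above; all use 64 B lines.
caches = {
    "L1I": (64 * KIB, 4),
    "L1D": (32 * KIB, 8),
    "L2": (512 * KIB, 8),
    "L3 (per CCX)": (8 * MIB, 16),
}

for name, (size, ways) in caches.items():
    print(f"{name}: {cache_sets(size, ways)} sets")

# The µOP cache is sized in µOPs rather than bytes:
# 32 sets x 8 ways x 8 µOPs per line.
uop_cache_capacity = 32 * 8 * 8
print(f"µOP cache: {uop_cache_capacity} µOPs")
```

The results (256, 64, 1,024, and 8,192 sets, plus 2,048 µOPs) agree with the figures in the list.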

Zen's TLB hierarchy consists of a dedicated level-one TLB for instructions and another one for data.

TLBs
- ITLB
  - 8 entry L0 TLB, all page sizes
  - 64 entry L1 TLB, all page sizes
  - 512 entry L2 TLB, no 1G pages
- DTLB
  - 64 entry, all page sizes
- STLB
  - 1,536-entry data, no 1G pages
  - 512-entry instruction

Pipeline

Zen presents a major design departure from the previous couple of microarchitectures. In the pursuit of remaining competitive against Intel, AMD went with an approach similar to Intel's: a larger, beefier core with an SoC design that can scale from extremely low TDP (fanless devices) to supercomputers utilizing dozens of cores. As such, Zen is aimed at replacing both Excavator (AMD's previous performance microarchitecture) and Puma (AMD's previous ultra-low power microarchitecture). In addition to covering the entire computing spectrum through power efficiency and core scalability, another major design goal was a 40% uplift in single-thread performance (i.e., a 40% IPC increase) over Excavator. The large increase in performance is the result of major redesigns in all three areas of the core (the front end, the execution engine, and the memory subsystem) as well as Zen's new modular SoC design built around the CCX (CPU Complex). The core itself is wider and bigger all around (roughly every component had its capacity substantially increased). The improvement in power efficiency is the result of the 14 nm process as well as many low-power design methodologies that were applied early in the design process (Excavator was manufactured on GF's 28 nm process). AMD introduced various components (such as the new prediction flow and forwarding mechanisms) that eliminate the need for operations to go through the high-power ALUs and decoders, increasing overall power efficiency and throughput.

Broad Overview

While Zen is an entirely new design, AMD continued to maintain its traditional design philosophy, which shows throughout its design choices such as a split scheduler and split FP and integer/memory execution units. At a very broad view, Zen shares many similarities with its predecessor but introduces new elements and major changes. Each core is composed of a front end (in-order area) that fetches instructions, decodes them, generates µOPs and fused µOPs, and sends them to the Execution Engine (out-of-order section). Instructions are either fetched from the L1I$ or come from the µOP cache (on subsequent fetches), eliminating the decoding stage altogether. Zen decodes 4 instructions/cycle into the µOP Queue. The µOP Queue dispatches separate µOPs to the Integer side and the FP side (dispatching to both at the same time when possible).

The biggest departure from the previous generation is Zen's return to traditional core partitioning - every core is an independent core with its own floating-point/SIMD units and an L2 cache. Previously, those units were shared between two cores; they are now once again completely private.

Unlike many of Intel's recent microarchitectures (such as Skylake and Kaby Lake), which make use of a unified scheduler, AMD continues to use a split pipeline design. µOPs are decoupled at the µOP Queue and are sent through the two distinct pipelines to either the Integer side or the FP side. The two sections are completely separate, each featuring separate schedulers, queues, and execution units. The Integer side splits up the µOPs via a set of individual schedulers that feed the various ALU units. On the floating point side, there is a different scheduler to handle the 128-bit FP operations. Zen supports all modern x86 extensions including AVX/AVX2, BMI1/BMI2, and AES. Zen also supports SHA, secure hash implementation instructions that are currently only found in Intel's ultra-low power microarchitectures (e.g. Goldmont) but not in their mainstream processors.

From the memory subsystem point of view, data is fed into the execution units from the L1D$ via the load and store queues (both of which were substantially enlarged) by way of the two Address Generation Units (AGUs) at the rate of 2 loads and 1 store per cycle. Each core also has a 512 KiB level 2 cache. The L2 feeds both the level 1 data and level 1 instruction caches at 32B per cycle (32B can be sent in either direction each cycle over a bidirectional bus). The L2 is connected to the L3 cache, which is shared across all cores. As with the L1 to L2 transfers, the L2 also transfers data to the L3 and vice versa at 32B per cycle (32B in either direction each cycle).

Front End

The Front End of the Zen core deals with the in-order operations such as instruction fetch and instruction decode. The instruction fetch is composed of two paths: a traditional decode path, where instructions come from the instruction cache, and a µOP cache path; which path is taken is determined by the branch prediction (BP) unit. The instruction stream and the branch prediction unit track instructions in 64B windows. Zen is AMD's first design to feature a µOP cache, a unit that not only improves performance but also saves power (the µOP cache was first introduced by Intel in their Sandy Bridge microarchitecture).

The branch prediction unit is decoupled and can start working as soon as it receives a desired operation such as a redirect, ahead of traditional instruction fetches. AMD still uses a hashed perceptron system similar to the one used in Jaguar and Bobcat, albeit likely much more finely tuned. AMD stated it is also larger than in previous architectures but did not disclose actual sizes. Once the BP detects an indirect target operation, the branch is moved to the Indirect Target Array (ITA), which is 512 entries deep. The BP includes a 32-entry return stack.

In Zen, AMD moved the instruction TLB into the BP (i.e., much earlier in the pipeline than in previous architectures). This was done to allow for more aggressive prefetching by allowing the physical address to be retrieved at an earlier stage. The BP is capable of storing 2 branches per BTB (Branch Target Buffer) entry, reducing the number of BTB reads necessary. The ITLB is composed of:

8-entry L0 TLB, all page sizes
64-entry L1 TLB, all page sizes
512-entry L2 TLB, no 1G pages

Fetching

Instructions are fetched from the L2 cache at the rate of 32B/cycle. Zen's L1 caches are not evenly sized: the instruction cache is 64 KiB, double that of the data cache. Depending on the branch prediction decision, instructions may be fetched from the instruction cache or from the µOP cache, in which case costly decoding is avoided.

On the traditional side of decode, instructions are fetched from the L1I$ at 32B/cycle and go to the instruction byte buffer and through the pick stage to decode. The size of the instruction byte buffer was not given by AMD, but it is expected to be larger than the 16-entry structure found in the previous architecture.

µOP cache & x86 tax

Decoding is the biggest weakness of x86, with decoders being one of the most expensive and complicated aspects of the entire microarchitecture. Instructions can vary from a single byte up to fifteen. Determining instruction boundaries is a complex task in itself. The best way to avoid the x86 decoding tax is to not decode instructions at all. Ideally, most instructions get a hit from the BP and acquire a µOP tag, so they are retrieved directly from the µOP cache and sent to the µOP Queue. This bypasses most of the expensive fetching and decoding that would otherwise be needed. This caching mechanism is also a considerable power-saving feature.

The µOP cache used in Zen is not a trace cache and much more closely resembles the one used by Intel in its microarchitectures since Sandy Bridge. The µOP cache is an independent unit that is not part of the L1I$ and is not necessarily a subset of the L1I$ either; i.e., there are instances where there could be a hit in the µOP cache but a miss in the L1I$. This happens when an instruction that was stored in the µOP cache gets evicted from the L1I$. During the fetch stage, probing must be done on both paths. Zen has a specific unit called 'Micro-Tags' which does the probing and determines whether the instruction should be accessed from the µOP cache or from the L1I$. The µOP cache itself has dedicated tags for accessing those µOPs.

Decode

Because Zen must execute x86, some instructions actually include multiple operations. Some of those operations cannot be realized efficiently in an OoOE design and therefore must be converted into simpler operations. In the front end, complex x86 instructions are broken down into simpler fixed-length operations called macro-operations or MOPs (sometimes also called complex OPs or COPs). Those are often mistaken for being "RISCish" in nature, but they retain their CISC characteristics. MOPs can perform both an arithmetic operation and a memory operation (e.g. read, modify, and write in a single MOP). MOPs can be further cracked into smaller, simpler fixed-length operations called micro-operations (µOPs). A µOP performs just a single operation (i.e., only a single load, store, or arithmetic operation). Traditionally AMD distinguished between the two kinds of ops; with Zen, however, AMD simply refers to everything as µOPs, although internally they are still two separate concepts.

Decoding is done by Zen's 4 decoders. The decode stage allows four x86 instructions to be decoded per cycle, which are in turn sent to the µOP Queue. Previously, in the Bulldozer/Jaguar-based designs, AMD had two paths: a FastPath Single, which emitted a single MOP, and a FastPath Double, which emitted two MOPs; these were in turn sent down the pipe to the schedulers. Michael Clark (Zen's lead architect) noted that Zen has significantly denser MOPs, meaning almost all instructions will be FastPath Single (i.e., one-to-one transformations). What would normally get broken down into two MOPs in Bulldozer is now translated into a single dense MOP. It is for those reasons that, while up to 8 MOPs/cycle can be emitted, usually only 4 MOPs/cycle are emitted from the decoders.

Optimizations

At the decode stage Zen incorporates the microcode ROM and the Stack Engine Memfile (SEM). The new Memfile sits between the queue and dispatch monitoring the MOP traffic. The Memfile is capable of performing store-to-load forwarding right at dispatch for loads that trail behind known stores with physical addresses. Other things such as eliminating stack PUSH/POP operations are also done at this stage. This is a fairly effective low-power solution that off-loads some of the work that is usually done by the AGU.

Dispatch is capable of sending up to 6 MOPs to Integer EX and an additional 4 MOPs to the Floating Point (FP) EX. Zen can dispatch to both at the same time (i.e. for a maximum of 10 MOPs per cycle).

At this stage of the pipeline, Zen performs additional optimizations such as branch fusion - an operation where a comparison and a branch op get combined into a single µOP (resulting in a single schedule and a single execute). An almost identical optimization is also performed by Intel's competing microarchitectures. It is worth reiterating that branch fusion is actually done by the dispatch stage instead of decode. This is a bit unusual because that operation would normally be performed in decode in order to reduce the number of internal instructions. In Zen, the decoders can still end up emitting two ops just to have them fused together in the dispatch stage. This change can likely be attributed to the various optimizations that came along with the introduction of the µOP cache (which sits parallel to the decoders in the pipeline). It also implies that the decoders are of a simple design intended to be further translated later on in the pipe, thereby being limited to a number of key transformations such as instruction boundary detection (i.e., x86 instruction length and rearrangement).
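The idea of branch fusion can be illustrated with a toy model that merges a compare op and the conditional branch directly following it into one fused op. This is a conceptual sketch only: the op encoding, the set of fusible compare ops, and the rule that any `j*` mnemonic other than `jmp` is a conditional branch are all assumptions for the example, not a description of Zen's actual fusion rules.

```python
# Assumed set of compare-style ops that may fuse with a following branch.
COMPARE_OPS = {"cmp", "test"}

def is_conditional_branch(op):
    """Toy rule: any 'j*' mnemonic except the unconditional 'jmp'."""
    mnemonic = op[0]
    return mnemonic.startswith("j") and mnemonic != "jmp"

def fuse_branches(ops):
    """Merge each compare op with the conditional branch right after it."""
    fused = []
    i = 0
    while i < len(ops):
        if (i + 1 < len(ops)
                and ops[i][0] in COMPARE_OPS
                and is_conditional_branch(ops[i + 1])):
            # One fused op: a single schedule and a single execute.
            fused.append(("cmp+branch", ops[i], ops[i + 1]))
            i += 2
        else:
            fused.append(ops[i])
            i += 1
    return fused

stream = [
    ("add", "eax", "1"),
    ("cmp", "eax", "ecx"),
    ("jne", "loop"),
    ("mov", "ebx", "eax"),
]
for op in fuse_branches(stream):
    print(op)
```

In this model the four incoming ops leave as three, which is the saving in scheduler entries and execution slots that fusion is after.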

MSROM

A third path that may occasionally be reached is the Micro-code Sequencer ROM. Instructions that end up emitting more than two macro-ops will be redirected to microcode ROM. When this happens the OP Queue is stalled (possibly along with the decoders) and the MSROM gets to emit its MOPs.

Execution Engine

As mentioned earlier, Zen returns to a fully partitioned core design with a private L2 cache and private FP/SIMD units. Previously those units shared resources spanning two cores. Zen's Execution Engine (Back-End) is split into two major sections: integer & memory operations and floating point operations. The two sections are decoupled with independent schedulers and queues. Both the Integer and FP sections have access to the Retire Queue, which is 192 entries deep and can retire 8 instructions per cycle (independent of either Integer or FP). The wider-than-dispatch retire allows Zen to catch up and free resources much more quickly (previous architectures saw a bottleneck at this point in situations where an older op was stalling, causing a reduction in performance due to retire needing to catch up to the front of the machine).

Integer

The Integer Execute can receive up to 6 µOPs/cycle from Dispatch where it is mapped from logical registers to physical registers. Zen has a 168-entry physical integer register file, an identical size to that of Broadwell. Instead of a large scheduler, Zen has 6 distributed scheduling queues, each 14 entries deep (4xALU, 2xAGU). Zen includes a number of enhancements such as differential checkpoints tracking branch instructions and eliminating redundant values as well as move eliminations. Note that register moves are done internally by modifying the register mapping rather than through an execution of a µOP. While AMD stated that the ALUs are largely symmetric except for a number of exceptions, it's still unknown which operations are reserved to which units.

Two of Zen's ALUs are capable of performing a branch, therefore Zen can peak at 2 branches per cycle. The two branch units can simultaneously execute two branch instructions from the same thread or from two separate threads. Since Haswell, Intel has also had a second branch unit, but its reasoning differs significantly from AMD's. The second branch unit in Haswell was added largely in an effort to mitigate port contention. Prior to that change, code involving tight loops that performed SSE operations ended up fighting over the same port, as both the SSE operation and the actual branch ended up being scheduled on the same port. Zen doesn't actually have this issue. The addition of a second branch unit in its case serves purely to boost the performance of branch-heavy code. It's interesting to note that on Intel's side, there are constraints on the kind of branches that may execute. For example in Haswell, port 0 can only execute predicted "not-taken" branches whereas port 6 can perform both "taken" and "not taken". It's currently unknown whether Zen has similar restrictions.

Floating Point

The Floating Point side can receive up to 4 µOPs/cycle from Dispatch, where they are mapped from logical registers to physical registers. Zen has a 160-entry physical floating point register file, which is 8 entries smaller than the one used in Intel's Skylake/Kaby Lake architectures. The register file can perform direct transfers to the Integer register files as needed. Before ops go to the scheduling queue, they first go through the Non-Scheduling Queue (NSQ), which is essentially a wait buffer. Because FP instructions typically have higher latency, they can create a back-up at Dispatch. The non-scheduling queue attempts to reduce this by queuing more FP instructions, which lets Dispatch continue on as much as possible on the Integer side. Additionally, the NSQ can go ahead and start working on the memory components of the FP instructions so that they are ready once they go through the Scheduling Queue. The FP has a single pipe for 128-bit load operations. The FP scheduler has four pipes (1 more than that of Excavator) and operates on 128-bit floating point. In fact, the entire FP side is optimized for 128-bit operations. Zen supports all the latest instructions such as SSE and AVX1/2. The various 256-bit AVX1/2 operations are done by working on individual 128-bit chunks one at a time and fusing the results together; this means 256-bit instructions require twice the resources all around (i.e., 2x register and scheduler entries). This does put Zen behind Intel's latest architectures, which have dedicated 256-bit circuitry. Additionally, Zen also supports SHA and AES, with 2 AES units implemented in an attempt to improve encryption performance.
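The cost of running wide vector operations on a 128-bit datapath can be sketched with a little arithmetic. The helper below is an illustrative model, not AMD's documented behavior: it assumes each operation is split into ceil(width / 128) internal ops and that each internal op occupies one scheduler entry. The 96-entry FP scheduler size is the figure listed under the key changes from Excavator.

```python
import math

FP_DATAPATH_BITS = 128      # width of the FP pipes, per the text above
FP_SCHEDULER_ENTRIES = 96   # from the "Key changes from Excavator" list

def internal_ops(vector_bits):
    """Number of 128-bit internal ops needed for one vector operation."""
    return math.ceil(vector_bits / FP_DATAPATH_BITS)

def ops_in_flight(vector_bits, entries=FP_SCHEDULER_ENTRIES):
    """How many such operations fit if each internal op takes one entry."""
    return entries // internal_ops(vector_bits)

print(internal_ops(128), ops_in_flight(128))  # 1 96
print(internal_ops(256), ops_in_flight(256))  # 2 48
```

Under these assumptions a stream of 256-bit AVX operations sees an effective scheduler capacity of half the nominal size.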

Memory Subsystem

Loads and Stores are conducted via the two AGUs, which can operate simultaneously. Zen has a much larger load queue capable of supporting 72 out-of-order loads (same as Intel's Skylake). There is also a 44-entry Store Queue. Zen employs a split TLB-data pipe design which allows the TLB tag access to take place while the data cache is being accessed, in order to determine whether the data is available and to send the address to the L2 to start prefetching early on. Zen is capable of up to two loads per cycle (2x16B each) and up to one store per cycle (1x16B). The L1 TLB has 64 entries for all page sizes and the L2 TLB has 1,536 entries with no 1 GiB pages.

Zen incorporates a 64 KiB 4-way set associative L1 instruction cache and a 32 KiB 8-way set associative L1 data cache. Both the instruction cache and the data cache can fetch from the L2 cache at 32 bytes per cycle. The L2 cache is a 512 KiB 8-way set associative unified cache, inclusive, and private to the core. The L2 cache can fetch and write 32B/cycle into the L3 (32B in either direction each cycle, i.e. bidirectional bus).
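The per-cycle figures above translate into peak bandwidth once a clock is fixed. The sketch below assumes a 3,000 MHz core clock (the stock Ryzen 7 1700 value used in the clock-domain example in this article) and treats 1 GB as 10^9 bytes; these are theoretical peaks, not measured numbers.

```python
def peak_gb_per_s(bytes_per_cycle, clock_mhz):
    """Peak bandwidth in GB/s (10^9 bytes) for a given per-cycle width."""
    return bytes_per_cycle * clock_mhz / 1000

CCLK_MHZ = 3000  # assumed core clock for the example

l2_to_l1 = peak_gb_per_s(32, CCLK_MHZ)        # 32 B/cycle L2 <-> L1
l1d_loads = peak_gb_per_s(2 * 16, CCLK_MHZ)   # two 16 B loads per cycle
l1d_stores = peak_gb_per_s(1 * 16, CCLK_MHZ)  # one 16 B store per cycle

print(l2_to_l1, l1d_loads, l1d_stores)  # 96.0 96.0 48.0
```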

Infinity Fabric

Main article: AMD's Infinity Fabric


Clock domains

Zen is divided into a number of clock domains, each operating at a certain frequency:

UClk - UMC Clock - The frequency at which the Unified Memory Controller (UMC) operates. This frequency is identical to MemClk.
LClk - Link Clock - The clock at which the I/O Hub Controller communicates with the chip.
FClk - Fabric Clock - The clock at which the data fabric operates. This frequency is identical to MemClk.
MemClk - Memory Clock - Internal and external memory clock.
CClk - Core Clock - The frequency at which the CPU core and the caches operate (i.e. the advertised frequency).

For example, a stock Ryzen 7 1700 with 2400 MT/s DRAM will have a CClk = 3000 MHz, MemClk = FClk = UClk = 1200 MHz.
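The relationship in the example above can be written down directly: DDR memory performs two transfers per clock, so MemClk is half the DRAM transfer rate, and FClk and UClk follow MemClk. The function name and dictionary layout below are made up for this sketch; LClk is omitted because the text does not tie it to the other clocks.

```python
def clock_domains(cclk_mhz, dram_mt_per_s):
    """Derive the memory-side clocks (in MHz) from the DRAM transfer rate."""
    memclk = dram_mt_per_s / 2  # DDR: two transfers per memory clock
    return {
        "CClk": cclk_mhz,
        "MemClk": memclk,
        "FClk": memclk,  # fabric clock is identical to MemClk
        "UClk": memclk,  # memory controller clock is identical to MemClk
    }

# Stock Ryzen 7 1700 with 2400 MT/s DRAM, as in the example above.
print(clock_domains(3000, 2400))
```

With faster memory, for instance 3200 MT/s, the same relationship gives MemClk = FClk = UClk = 1600 MHz, which is why memory speed also affects the data fabric.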

Power

RDL - Redistribution layer
LDOs - Regulate RVDD to create VDD per core
RVDD - Ungated supply
VDD - Gated core supply
VDDM - L2/L3 SRAM supply

Zen presented AMD with a number of new challenges in the area of power, largely due to the decision to cover the entire spectrum of systems from ultra-low power to high performance. Previously AMD handled this by designing two independent architectures (i.e., Excavator and Puma). In Zen, SoC voltage coming from the Voltage Regulator Module (VRM) is fed to RVDD, a package metal plane that distributes the highest VID request from all cores. Each core has a digital low-dropout (LDO) regulator and a digital frequency synthesizer (DFS) to vary frequency and voltage across power states on an individual-core basis. The LDO regulates RVDD for each power domain and creates an optimal VDD per core using a system of sensors embedded across the entire chip; this is in addition to other properties such as countermeasures against droop. This is in contrast to some alternative solutions by Intel, which attempted to integrate the voltage regulator (FIVR) on die in Haswell (and consequently removed it in Skylake due to a number of thermal restrictions it created). Zen's new voltage control is an attempt at much finer power tuning on a per-core level based on the information it has about that core and the overall chip.

AMD uses a Metal-Insulator-Metal Capacitor (MIMCap) layer between the two upper-level metal layers for fast current injection in order to mitigate voltage droop. AMD stated that it covers roughly 45% of the core and a slightly smaller portion of the L3. In addition to the LDO circuit, each core integrates a low-latency power supply droop detector that can trigger the digital LDOs to turn on more drivers to counter droops.

A large number of sensors across the entire die are used to measure many of the CPU states including frequency, voltage, power, and temperature. The data is in turn used for workload characterization, adaptive voltage, frequency tuning, and dynamic clocking. Adaptive voltage and frequency scaling (AVFS) is an on-die closed-loop system that adjusts the voltage in real time based on the collected sensor data. This is part of AMD's "Precision Boost" technology, which offers a fine granularity of 25 MHz clock increments.
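The 25 MHz granularity can be pictured as snapping a requested frequency onto a grid of 25 MHz steps. The sketch below is a toy model: rounding down to the nearest step and the function names are assumptions for illustration, and nothing here reflects how Precision Boost actually selects a frequency.

```python
STEP_MHZ = 25  # clock increment quoted for Precision Boost

def snap_to_step(target_mhz, step=STEP_MHZ):
    """Round a requested frequency down to the nearest step (assumed policy)."""
    return (int(target_mhz) // step) * step

def steps_between(low_mhz, high_mhz, step=STEP_MHZ):
    """Number of distinct step increments between two grid-aligned clocks."""
    return (high_mhz - low_mhz) // step

print(snap_to_step(3712))         # 3700
print(steps_between(3000, 3700))  # 28
```

For comparison, a 100 MHz step over the same 3000 to 3700 MHz range would offer only 7 increments.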

Zen implements over 1300 sensors to monitor the state of the die over all critical paths including the CCX and external components such as the memory fabric. Additionally the CCX also incorporates 48 high-speed power supply monitors, 20 thermal diodes, and 9 high-speed droop detectors.

Features

AMD introduced a series of new features in their new Zen microarchitecture:

Simultaneous MultiThreading (SMT)

Perhaps the single biggest enhancement to Zen is the addition of full-fledged simultaneous multithreading (SMT) support (a technology similar to Hyper-Threading found in Intel processors). This is a departure from AMD's previous lightweight (and largely ineffective and to some degree misleading) Clustered Multithreading (CMT). Zen is a properly simultaneous multi-threaded machine capable of handling two threads of execution throughout the entire machine. Below is a breakdown of how the various core components work under SMT:

- Competitively shared structures
- Competitively shared and SMT tagged
- Competitively shared with Algorithmic Priority
- Statically Partitioned

The basics behind SMT are always the same: high utilization of resources through multiple threads of execution. When a single thread is running all structures become fully available to that thread as needed. With the introduction of SMT and a second thread, Zen attempts to share as much of the resources as possible in an attempt to balance out the throughput and deliver the appropriate structures to each thread as the software requires. The various structures can dynamically shift their resources depending on the kind of workload being executed. Structures that are competitively shared by the two threads (shaded in red in the diagram) include the execution units, schedulers, register file, the decode, and cache (including the µOP cache). The load queue, ITLB, and DTLB (shaded in dark cyan) are also competitively shared but require SMT tagging - resources (i.e. entries capacity) are shared between the threads but actual entry values (e.g. addresses) can only be accessed by the owning thread.

The branch predictor and the two register renaming/allocation units (shaded in blue) are competitively shared with algorithmic priority. Zen provides additional logic to give a certain thread temporary priority in resource allocation over the other thread. One such occasion is when the BP encounters a flush on one of the threads. Temporary priority is given to that thread in order to help it fetch as many instructions as it can so it can get going again. Additionally, similar logic can be found at dispatch to ensure good throughput by both threads and high utilization of the execution units.

The µOP Queue, Retire Queue, and Store Queue (shaded in green on the diagram) are statically partitioned, i.e., those units have duplicate logic to handle each thread independently. They were duplicated rather than shared simply because sharing them would have involved a high degree of complexity.
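As a rough illustration of the four sharing policies described above, the sketch below models them in Python. The structure-to-policy mapping follows the text; the `TaggedTLB` class, its capacity, and its eviction rule are hypothetical and only meant to show what "competitively shared and SMT tagged" means: capacity is shared between threads, but a lookup only hits entries owned by the requesting thread.

```python
from enum import Enum


class Sharing(Enum):
    COMPETITIVE = "competitively shared"
    COMPETITIVE_TAGGED = "competitively shared and SMT tagged"
    COMPETITIVE_PRIORITY = "competitively shared with algorithmic priority"
    STATIC = "statically partitioned"


# Mapping taken from the description above.
POLICY = {
    "execution units": Sharing.COMPETITIVE,
    "schedulers": Sharing.COMPETITIVE,
    "register file": Sharing.COMPETITIVE,
    "decode": Sharing.COMPETITIVE,
    "caches": Sharing.COMPETITIVE,
    "load queue": Sharing.COMPETITIVE_TAGGED,
    "ITLB": Sharing.COMPETITIVE_TAGGED,
    "DTLB": Sharing.COMPETITIVE_TAGGED,
    "branch predictor": Sharing.COMPETITIVE_PRIORITY,
    "rename/allocation": Sharing.COMPETITIVE_PRIORITY,
    "uop queue": Sharing.STATIC,
    "retire queue": Sharing.STATIC,
    "store queue": Sharing.STATIC,
}


class TaggedTLB:
    """Toy model of an SMT-tagged structure (hypothetical, not Zen's real design)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # (thread_id, virtual_page) -> physical_page

    def insert(self, thread_id, vpage, ppage):
        key = (thread_id, vpage)
        if key not in self.entries and len(self.entries) >= self.capacity:
            # Capacity is shared: evict the oldest entry regardless of its owner.
            oldest = next(iter(self.entries))
            del self.entries[oldest]
        self.entries[key] = ppage

    def lookup(self, thread_id, vpage):
        # The thread tag is part of the key, so a thread never sees
        # translations that belong to its sibling.
        return self.entries.get((thread_id, vpage))
```

In this toy model a busy thread can evict its sibling's entries (competitive sharing), yet it can never read them (tagging).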

SenseMI Technology

SenseMI Technology (pronounced Sense-Em-Eye) is an umbrella term for a number of features AMD added to Zen microprocessors that are designed to increase performance through self-tuning driven by a network of on-chip sensors:

Neural Net Prediction - This appears to be largely a marketing term for Zen's much beefier and more finely tuned branch prediction unit. Zen uses a hashed perceptron system to intelligently anticipate future code flows, allowing cold blocks to be warmed up in order to avoid possible waits. Most of that functionality is already found on every modern high-end microprocessor (including AMD's own previous microarchitectures). Because AMD has not disclosed any more specific information about the branch predictor, it can only be speculated that no groundbreaking new logic was introduced in Zen.
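Since AMD has not published the details of its predictor, the following is only a generic, textbook-style sketch of how a hashed perceptron predictor can work, not a description of Zen's implementation. Several small weight tables are indexed by a hash of the branch address and progressively longer slices of global history; the selected weights are summed and the sign of the sum gives the prediction. All names, table sizes, the hash, and the training threshold are assumptions chosen for illustration.

```python
class HashedPerceptron:
    """Toy hashed perceptron predictor; every size here is an illustrative assumption."""

    def __init__(self, num_tables=4, table_bits=8, hist_step=4, threshold=8):
        self.num_tables = num_tables
        self.table_bits = table_bits
        self.hist_step = hist_step    # table t sees the newest t * hist_step outcomes
        self.threshold = threshold    # keep training while |sum| is at most this
        self.tables = [[0] * (1 << table_bits) for _ in range(num_tables)]
        self.history = 0              # global history, newest outcome in bit 0

    def _indices(self, pc):
        mask = (1 << self.table_bits) - 1
        indices = []
        for t in range(self.num_tables):
            segment = self.history & ((1 << (t * self.hist_step)) - 1)
            folded = segment ^ (segment >> self.table_bits)  # fold long history
            indices.append((pc ^ folded) & mask)
        return indices

    def _sum(self, indices):
        return sum(self.tables[t][i] for t, i in enumerate(indices))

    def predict(self, pc):
        """Return True if the branch at pc is predicted taken."""
        return self._sum(self._indices(pc)) >= 0

    def update(self, pc, taken):
        """Train on the real outcome; return the prediction that was made."""
        indices = self._indices(pc)
        total = self._sum(indices)
        predicted = total >= 0
        if predicted != taken or abs(total) <= self.threshold:
            delta = 1 if taken else -1
            for t, i in enumerate(indices):
                # Saturating weight update.
                self.tables[t][i] = max(-32, min(31, self.tables[t][i] + delta))
        max_hist = (self.num_tables - 1) * self.hist_step
        self.history = ((self.history << 1) | int(taken)) & ((1 << max_hist) - 1)
        return predicted
```

A predictor of this kind learns, for instance, a strictly alternating taken/not-taken branch after a short warm-up, because the history-indexed tables separate the two cases.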

Smart Prefetch - As with Neural Net Prediction, this too appears to be a marketing term for a number of changes AMD introduced in the fetch stage, where the branch predictor can get a hit on the next µOP and retrieve it via the µOP cache directly to the µOP Queue, eliminating the costly decode pipeline stages. Additionally, Zen can detect various data patterns in the program's execution and predict future data requests, allowing data to be prefetched ahead of time to reduce latency.

Pure Power - A feature in Zen that allows for dynamic voltage and frequency scaling (DVFS), similar to AMD's PowerTune technology or Cool'n'Quiet, along with a number of other enhancements that extend beyond the core to the Infinity Fabric (AMD's new proprietary interconnect). Pure Power monitors the state of the processor (e.g., workload), which in turn allows it to downclock when not under load in order to save power. Zen incorporates a network of sensors across the entire chip to aid Pure Power in its monitoring.

Precision Boost - A feature that provides the ability to adjust the frequency of the processor on the fly given sufficient headroom (e.g., thermal limits based on the data collected by a network of sensors across the chip), i.e., a "turbo frequency". Precision Boost adjusts in 25 MHz increments, considerably more granular than Intel's Turbo Boost, which operates in 100 MHz bins. More granular boost increments could in theory allow it to clock slightly higher than competing products without reaching thermal limits (e.g., in complex workloads involving AVX2).
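To make the granularity argument concrete, the sketch below compares 25 MHz and 100 MHz steps. The function name and the notion of a single "sustainable" frequency are simplifying assumptions for illustration; the real boost algorithms weigh many sensor inputs.

```python
def highest_bin_mhz(base_mhz, sustainable_mhz, step_mhz):
    """Highest frequency reachable from base in whole steps without exceeding
    the (hypothetical) sustainable frequency."""
    if sustainable_mhz < base_mhz:
        return base_mhz
    steps = (sustainable_mhz - base_mhz) // step_mhz
    return base_mhz + steps * step_mhz


# Hypothetical case: thermals would allow 3,880 MHz on a 3,600 MHz base part.
fine = highest_bin_mhz(3600, 3880, 25)     # 25 MHz steps
coarse = highest_bin_mhz(3600, 3880, 100)  # 100 MHz bins
```

With these example numbers the 25 MHz steps land on 3,875 MHz while the 100 MHz bins stop at 3,800 MHz, which is the kind of small advantage the paragraph above refers to.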

Extended Frequency Range (XFR) - This is a fully automated solution that attempts to raise the upper limit on the maximum frequency based on the cooling solution used (e.g., air, water, LN2). Whenever the chip senses that conditions are suitable for a given frequency, it will attempt to increase that limit further. XFR is partially enabled on all models, providing an extra +50 MHz frequency boost whenever possible. For 'X' models, full XFR is enabled, providing twice the headroom, up to +100 MHz.

The AMD presentation slide on the right depicts a normal use case for the Ryzen 7 1800X. Under a normal workload, the processor will operate at around its base frequency of 3.6 GHz. When experiencing a heavier workload, Precision Boost will kick in and increment the frequency as necessary up to its maximum of 4 GHz. With adequate cooling, XFR will bump it up an additional 100 MHz. Under a light workload, the processor will reduce its frequency. As Pure Power senses the workload and CPU state, it can also drastically downclock the CPU when appropriate (such as during the mostly idle period in the graph).
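The frequency ceiling in this example can be written down as simple arithmetic. The sketch below uses the XFR offsets stated above (+50 MHz on all models, +100 MHz on 'X' models); the function and parameter names are hypothetical, and "adequate cooling" is reduced to a boolean for illustration.

```python
def max_frequency_mhz(max_boost_mhz, is_x_model, adequate_cooling):
    """Upper frequency limit once XFR is taken into account (simplified)."""
    xfr_mhz = 100 if is_x_model else 50
    return max_boost_mhz + (xfr_mhz if adequate_cooling else 0)


# Ryzen 7 1800X from the slide: 4 GHz Precision Boost ceiling, 'X' model.
ryzen_1800x_limit = max_frequency_mhz(4000, is_x_model=True, adequate_cooling=True)
```

For the 1800X this gives 4,100 MHz, matching the additional 100 MHz described above.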

Scalability

CPU Complex (CCX)

AMD organized Zen cores into groups called CPU Complexes (CCX). Each CCX consists of four cores connected to an L3 cache. The L3 cache is an 8 MiB 16-way set associative victim cache and is mostly exclusive of the L2. The L3 cache is made of four slices (providing a 2 MiB L3 slice per core) interleaved by low-order address bits. Every core can access every L3 cache slice with the same average latency. When a core starts working on a chunk of memory it will fill up its L2, and as it continues to execute and fetch new data, any spillover will find its way into the L3.
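Low-order address interleaving can be pictured as follows. This is a minimal sketch that assumes 64-byte cache lines and that the two lowest bits of the line address select the slice; the exact bits or hash Zen uses are not stated here, so treat the function as illustrative only.

```python
from collections import Counter

LINE_BYTES = 64   # assumed cache-line size
NUM_SLICES = 4    # four L3 slices per CCX, as described above


def l3_slice(phys_addr):
    """Slice selected by the low-order bits of the cache-line address (assumed scheme)."""
    return (phys_addr // LINE_BYTES) % NUM_SLICES


# Walking a contiguous 4 KiB buffer touches every slice equally often,
# which is why each core sees the same average latency to the L3.
usage = Counter(l3_slice(addr) for addr in range(0, 4096, LINE_BYTES))
```

Consecutive cache lines rotate through slices 0, 1, 2, 3, so a single core streaming through memory spreads its traffic across all four slices rather than hammering its "own" one.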

Depending on the exact processor model, there may be one or more CCXs joined together. For example, all mainstream Ryzen 3/5/7 models have two CCXs with up to 8 cores (with an equal number of cores disabled on each CCX when the chips are down-binned to 4 or 6 cores). It's important to note that the L3 in Zen is not a true last level cache (LLC), as the 16 MiB L3$ consists of two separate 8 MiB caches and not one unified L3. The separate CPU complexes communicate with each other via the Infinity Fabric, which connects the CCXs along with the memory controller and I/O. While the CCXs operate at core frequency (CClk), the fabric itself operates at MemClk (see § Clock domains). This design choice allows for scaling up to large high-performance multi-core systems (i.e., high scalability, particularly in the server segment, through high core count and large bandwidth), but it does mean that systems making use of Zen processors have to treat every CPU Complex as a processor of its own, i.e., schedule tasks using cache-coherent non-uniform memory access (ccNUMA) aware scheduling. This is important to ensure that threads are not moved from one CCX to the other, as doing so will likely incur unnecessary performance penalties (cache data would need to be transferred over the fabric from one CCX to the next, which has additional latency and lower bandwidth).
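A scheduler-side view of this can be sketched as below. The helper names are hypothetical, and the sketch assumes cores are numbered consecutively so that the first CCX holds the lowest IDs; real operating systems enumerate logical CPUs differently (SMT siblings, down-binned parts with three or two active cores per CCX), so the mapping should be read from the OS topology in practice. The optional pinning uses `os.sched_setaffinity`, which is only available on Linux.

```python
import os


def ccx_of(core_id, cores_per_ccx=4):
    """CCX index of a core under the assumed consecutive numbering."""
    return core_id // cores_per_ccx


def crosses_ccx(src_core, dst_core, cores_per_ccx=4):
    """True if migrating a thread between these cores leaves its CCX (and its L3)."""
    return ccx_of(src_core, cores_per_ccx) != ccx_of(dst_core, cores_per_ccx)


def cores_in_ccx(ccx_id, cores_per_ccx=4):
    first = ccx_id * cores_per_ccx
    return set(range(first, first + cores_per_ccx))


def pin_to_ccx(ccx_id, cores_per_ccx=4):
    """Restrict the calling process to one CCX where the OS supports it (Linux)."""
    wanted = cores_in_ccx(ccx_id, cores_per_ccx)
    if hasattr(os, "sched_setaffinity"):
        allowed = wanted & os.sched_getaffinity(0)
        if allowed:  # only pin if at least one of the CPUs is usable here
            os.sched_setaffinity(0, allowed)
    return wanted
```

Keeping a thread and its working set inside one `cores_in_ccx(...)` group avoids the cross-CCX transfers over the fabric described above.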

While specific worst-case performance tests have shown that rapid inter-CCX data movement incurs a substantial performance penalty, real-world tests have shown that the penalty is rather small in practice, as the operating system (e.g., Windows) schedules threads appropriately. Additionally, performance can be improved with faster memory kits, which in turn increase the frequency of the fabric as well (see § Clock domains).
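Because the fabric runs at MemClk, its clock follows directly from the memory speed. The short sketch below uses the standard DDR relationship (two transfers per memory clock); the function name is hypothetical.

```python
def memclk_mhz(ddr_rate_mts):
    """Memory clock for a DDR module rated in MT/s (two transfers per clock)."""
    return ddr_rate_mts / 2


fabric_ddr4_2400 = memclk_mhz(2400)  # fabric clock with DDR4-2400
fabric_ddr4_3200 = memclk_mhz(3200)  # fabric clock with DDR4-3200
speedup = fabric_ddr4_3200 / fabric_ddr4_2400
```

Moving from DDR4-2400 to DDR4-3200 therefore raises the fabric clock from 1,200 MHz to 1,600 MHz, which is why faster memory helps inter-CCX traffic and not just DRAM bandwidth.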

Multiprocessors

As part of the Zen architecture, AMD also developed a series of dual-socket multiprocessors. These chips are complete systems on a chip with the northbridge and southbridge integrated on-die (i.e., no chipset is required). Naples-based processors scale all the way up to 32 cores with 64 threads (for up to 64 cores and 128 threads per complete system). Every processor has 128 lanes of PCIe. This is considerably more than any comparable Intel model (either Broadwell EP or Skylake EP). The caveat is that in 2-way MP mode, half of the lanes are lost: 64 of the 128 PCIe lanes on each processor are allocated to inter-chip communication via AMD's Infinity Fabric protocol, with the remaining 64 lanes left for the system. This still leaves the system with considerably more lanes than Intel, but it is the same number of lanes for twice the CPUs.
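The lane accounting above works out as follows. This is just the arithmetic from the paragraph expressed in code; the function name is hypothetical.

```python
LANES_PER_SOCKET = 128
INTERCONNECT_LANES_2P = 64  # per socket, used for the socket-to-socket link


def system_pcie_lanes(sockets):
    """PCIe lanes left for I/O in a 1- or 2-socket Naples system."""
    if sockets == 1:
        return LANES_PER_SOCKET
    if sockets == 2:
        return sockets * (LANES_PER_SOCKET - INTERCONNECT_LANES_2P)
    raise ValueError("Naples supports one or two sockets")


def system_threads(sockets, cores_per_socket=32, threads_per_core=2):
    return sockets * cores_per_socket * threads_per_core
```

Both `system_pcie_lanes(1)` and `system_pcie_lanes(2)` come out at 128: a second socket doubles cores and threads but not the I/O lanes.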

To reach 32 cores, the high core count Naples chips utilize multiple CCXs (up to 8 for the 32-core model). This correctly implies that with eight CCXs there will be eight memory controllers of dual-channel ECC DDR4 memory for up to 16 DIMMs and 2 TiB of memory.
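The totals quoted here follow from the per-CCX figures given earlier (four cores and 8 MiB of L3 per CCX). A small sketch, with hypothetical names, that derives them:

```python
CORES_PER_CCX = 4
L3_PER_CCX_MIB = 8


def naples_totals(num_ccx, max_memory_gib=2048, dimm_slots=16):
    """Aggregate figures for a Naples package built from num_ccx CCXs."""
    cores = num_ccx * CORES_PER_CCX
    return {
        "cores": cores,
        "threads": cores * 2,                          # SMT: two threads per core
        "l3_mib": num_ccx * L3_PER_CCX_MIB,
        "gib_per_dimm": max_memory_gib / dimm_slots,   # DIMM size needed to reach the maximum
    }


top_model = naples_totals(8)
```

For eight CCXs this yields 32 cores, 64 threads, and 64 MiB of L3, and shows that reaching 2 TiB across 16 DIMMs requires 128 GiB modules.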

It's currently not known exactly how AMD is planning to package the high core count Naples-based chips, but due to the large die sizes, these chips will make use of a multi-chip module design whereby multiple smaller dies are joined and packaged together to form the final product.

Die

Core

Fabricated on a 14 nm process, using 12 metal layers.

Core

Area 7 mm²
L2 512 KiB; 1.5 mm²/core

CCX

Area 44 mm²
L3 8 MiB; 16 mm²
1,400,000,000 transistors

Octa-Core Die

14 nm process
12 metal layers
2,000 meters of signals
4,800,000,000 transistors
22.01 mm x 8.87 mm
~195.228 mm² die size
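The figures above can be cross-checked with a little arithmetic. The sketch below recomputes the die area from its dimensions and derives transistor densities; it assumes two CCXs per octa-core die, as described in the CCX section, and the variable names are only illustrative.

```python
DIE_WIDTH_MM = 22.01
DIE_HEIGHT_MM = 8.87
DIE_TRANSISTORS = 4_800_000_000
CCX_AREA_MM2 = 44
CCX_TRANSISTORS = 1_400_000_000

die_area_mm2 = DIE_WIDTH_MM * DIE_HEIGHT_MM               # about 195.23 mm²
die_density_mtr = DIE_TRANSISTORS / die_area_mm2 / 1e6    # million transistors per mm²
ccx_density_mtr = CCX_TRANSISTORS / CCX_AREA_MM2 / 1e6
ccx_share = 2 * CCX_AREA_MM2 / die_area_mm2               # fraction of the die taken by two CCXs
```

The two CCXs account for a bit under half of the die area; the remainder holds the memory controllers, I/O, and fabric.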

Preliminary Data! Information presented in this article deals with future products, data, features, and specifications that have yet to be finalized, announced, or released. Information may be incomplete and can change by final release.

Sockets/Platform

All Zen-based consumer microprocessors utilize AMD's Socket AM4, a unified socket infrastructure. It's interesting to note that every Ryzen 7 processor is actually a complete system on a chip, integrating the northbridge (memory controller) and the southbridge, including 16 PCIe lanes for the GPU and 4 PCIe lanes for I/O, along with an NVMe controller as well as USB 3.0 and SATA controllers. Therefore, in theory, Ryzen 7 processors do not even require a chipset. The role of the chipsets for Zen is simply to provide a number of additional connections beyond those offered by the SoC.

Socket AM4 Platform
Segment	Chipset	USB 3.1 Gen 1	USB 3.1 Gen 2	USB 2.0	SATA	SATAe	PCIe	RAID	Dual PCIe	Overclocking
500-series (Zen+, Zen 2, Zen 3)
Mainstream	B550	2	6	6	8 + 4x NVME	0	16x Gen4	0,1,10	✔	✔
Enthusiast	X570	0	8	4	14 + 4x NVME	0	16x Gen4	0,1,10	✔	✔
400-series (Zen+)
Mainstream	B450	2	2	6	6 + 4x NVME	1	6x Gen3	0,1,10	✘	✔
Enthusiast	X470	6	2	6	10 + 4x NVME	2	8x Gen3	0,1,10	✔	✔
300-series (Zen)
Small Form Factor	A300, B300	4	0	0	2 + 2x NVMe	1	4x Gen3	0,1	✘
Small Form Factor	X300	4	0	0	2 + 2x NVMe	1	4x Gen3	0,1	✔
Entry-level	A320	6	1	6	4 + 2x NVMe	2	4x Gen2	0,1,10	✘	✘
Mainstream	B350	6	2	6	4 + 2x NVMe	2	6x Gen2	0,1,10	✘	✔
Enthusiast	X370	6	2	6	6 + 2x NVMe	2	8x Gen2	0,1,10	✔	✔

All Zen Chips

List of all Zen-based Processors
Model	Price	Process	Launched	Family	Core name	Cores	Threads	L3$	L2$	L1$	Base freq	Turbo freq	TDP	Max mem	SMT	AMD-V	XFR
200GE	$55.00	14 nm	6 September 2018	Athlon	Raven Ridge	2	4	4 MiB	1 MiB	192 KiB	3.2 GHz		35 W	64 GiB	✔	✔	✘
220GE	$65.00	14 nm	21 December 2018	Athlon	Raven Ridge	2	4	4 MiB	1 MiB	192 KiB	3.4 GHz		35 W	64 GiB	✔	✔	✘
240GE	$75.00	14 nm	21 December 2018	Athlon	Raven Ridge	2	4	4 MiB	1 MiB	192 KiB	3.5 GHz		35 W	64 GiB	✔	✔	✘
3000G	$49.00	14 nm	20 November 2019	Athlon	Dali, Raven Ridge	2	4	4 MiB	1 MiB	192 KiB	3.5 GHz		35 W	64 GiB	✔	✔	✔
300U		14 nm	6 January 2019	Athlon	Picasso	2	4	4 MiB	1 MiB	192 KiB	2.4 GHz		15 W	64 GiB	✔	✔	✘
3150U		14 nm	6 January 2020	Athlon Gold	Dali	2	4	4 MiB	1 MiB	192 KiB	2.4 GHz		15 W	32 GiB	✔	✔	✘
PRO 200GE		14 nm	6 September 2018	Athlon	Raven Ridge	2	4	4 MiB	1 MiB	192 KiB	3.2 GHz		35 W	64 GiB	✔	✔	✘
3050U		14 nm	6 January 2020	Athlon Silver	Dali	2	2	4 MiB	1 MiB	192 KiB	2.3 GHz		15 W	32 GiB	✔	✔	✘
7251	$574.00	14 nm	20 June 2017	EPYC	Naples	8	16	32 MiB	4 MiB	768 KiB	2.1 GHz	2.9 GHz	120 W	2,048 GiB	✔	✔	✘
7261		14 nm	14 June 2018	EPYC	Naples	8	16	64 MiB	4 MiB	768 KiB	2.5 GHz	2.9 GHz	155 W / 170 W	2,048 GiB	✔	✔	✘
7281	$650.00	14 nm	20 June 2017	EPYC	Naples	16	32	32 MiB	8 MiB	1,536 KiB	2.1 GHz	2.7 GHz	155 W / 170 W	2,048 GiB	✔	✔	✘
7301	$825.00	14 nm	20 June 2017	EPYC	Naples	16	32	64 MiB	8 MiB	1,536 KiB	2.2 GHz	2.7 GHz	155 W / 170 W	2,048 GiB	✔	✔	✘
7351	$1,100.00	14 nm	20 June 2017	EPYC	Naples	16	32	64 MiB	8 MiB	1,536 KiB	2.4 GHz	2.9 GHz	155 W / 170 W	2,048 GiB	✔	✔	✘
7351P	$750.00	14 nm	20 June 2017	EPYC	Naples	16	32	64 MiB	8 MiB	1,536 KiB	2.4 GHz	2.9 GHz	155 W / 170 W	2,048 GiB	✔	✔	✘
7371	$1,550.00	14 nm	2019	EPYC	Naples	16	32	64 MiB	8 MiB	1,536 KiB	3.1 GHz	3.8 GHz	200 W	2,048 GiB	✔	✔	✘
7401	$1,850.00	14 nm	20 June 2017	EPYC	Naples	24	48	64 MiB	12 MiB	2,304 KiB	2 GHz	3 GHz	155 W / 170 W	2,048 GiB	✔	✔	✘
7401P	$1,075.00	14 nm	20 June 2017	EPYC	Naples	24	48	64 MiB	12 MiB	2,304 KiB	2 GHz	3 GHz	155 W / 170 W	2,048 GiB	✔	✔	✘
7451	$2,400.00	14 nm	20 June 2017	EPYC	Naples	24	48	64 MiB	12 MiB	2,304 KiB	2.3 GHz	3.2 GHz	180 W	2,048 GiB	✔	✔	✘
7501	$3,400.00	14 nm	20 June 2017	EPYC	Naples	32	64	64 MiB	16 MiB	3,072 KiB	2 GHz	3 GHz	155 W / 170 W	2,048 GiB	✔	✔	✘
7551	$3,400.00	14 nm	20 June 2017	EPYC	Naples	32	64	64 MiB	16 MiB	3,072 KiB	2 GHz	3 GHz	180 W	2,048 GiB	✔	✔	✘
7551P	$2,100.00	14 nm	20 June 2017	EPYC	Naples	32	64	64 MiB	16 MiB	3,072 KiB	2 GHz	3 GHz	180 W	2,048 GiB	✔	✔	✘
7601	$4,200.00	14 nm	20 June 2017	EPYC	Naples	32	64	64 MiB	16 MiB	3,072 KiB	2.2 GHz	3.2 GHz	180 W	2,048 GiB	✔	✔	✘
3101		14 nm	21 February 2018	EPYC Embedded	Snowy Owl	4	4	8 MiB	2 MiB	384 KiB	2.1 GHz	2.9 GHz	35 W	512 GiB	✘	✔	✘
3151		14 nm	21 February 2018	EPYC Embedded	Snowy Owl	4	8	16 MiB	2 MiB	384 KiB	2.7 GHz	2.9 GHz	45 W	512 GiB	✔	✔	✘
3201		14 nm	21 February 2018	EPYC Embedded	Snowy Owl	8	8	16 MiB	4 MiB	768 KiB	1.5 GHz	3.1 GHz	30 W	512 GiB	✘	✔	✘
3251	$315.00	14 nm	21 February 2018	EPYC Embedded	Snowy Owl	8	16	16 MiB	4 MiB	768 KiB	2.5 GHz	3.1 GHz	55 W	512 GiB	✔	✔	✘
3255		14 nm		EPYC Embedded	Snowy Owl	8	16	16 MiB	4 MiB	768 KiB	2.5 GHz	3.1 GHz	55 W	512 GiB	✔	✔	✘
3301	$450.00	14 nm	21 February 2018	EPYC Embedded	Snowy Owl	12	12	32 MiB	6 MiB	1,152 KiB	2 GHz	3 GHz	65 W	1,024 GiB	✘	✔	✘
3351		14 nm	21 February 2018	EPYC Embedded	Snowy Owl	12	24	32 MiB	6 MiB	1,152 KiB	1.9 GHz	3 GHz	80 W	1,024 GiB	✔	✔	✘
3401		14 nm	21 February 2018	EPYC Embedded	Snowy Owl	16	16	32 MiB	8 MiB	1,536 KiB	1.85 GHz	3 GHz	85 W	1,024 GiB	✘	✔	✘
3451	$880.00	14 nm	21 February 2018	EPYC Embedded	Snowy Owl	16	32	32 MiB	8 MiB	1,536 KiB	2.15 GHz	3 GHz	100 W	1,024 GiB	✔	✔	✘
FireFlight			3 August 2018			4	8	4 MiB	2 MiB	384 KiB	3 GHz			8 GiB	✔	✔	✔
1200	$109.00	14 nm	27 July 2017	Ryzen 3	Summit Ridge	4	4	8 MiB	2 MiB	384 KiB	3.1 GHz	3.4 GHz	65 W	64 GiB	✘	✔	✘
1300X	$129.00	14 nm	27 July 2017	Ryzen 3	Summit Ridge	4	4	8 MiB	2 MiB	384 KiB	3.5 GHz	3.7 GHz	65 W	64 GiB	✔	✔	✔
2200G	$99.00	14 nm	12 February 2018	Ryzen 3	Raven Ridge	4	4	4 MiB	2 MiB	384 KiB	3.5 GHz	3.7 GHz	65 W	64 GiB	✘	✔	✔
2200GE		14 nm	19 April 2018	Ryzen 3	Raven Ridge	4	4	4 MiB	2 MiB	384 KiB	3.2 GHz	3.6 GHz	35 W	64 GiB	✘	✔	✔
2200U		14 nm	8 January 2018	Ryzen 3	Raven Ridge	2	4	4 MiB	1 MiB	192 KiB	2.5 GHz	3.4 GHz	15 W	32 GiB	✔	✔	✘
2300U		14 nm	8 January 2018	Ryzen 3	Raven Ridge	4	4	4 MiB	2 MiB	384 KiB	2 GHz	3.4 GHz	15 W	32 GiB	✘	✔	✘
3250U		14 nm	6 January 2020	Ryzen 3	Dali	2	4	4 MiB	1 MiB	192 KiB	2.6 GHz		15 W	32 GiB	✔	✔	✘
PRO 1200		14 nm		Ryzen 3	Summit Ridge	4	4	8 MiB	2 MiB	384 KiB	3.1 GHz	3.4 GHz	65 W	64 GiB	✔	✔	✘
PRO 1300		14 nm		Ryzen 3	Summit Ridge	4	4	8 MiB	2 MiB	384 KiB	3.5 GHz	3.7 GHz	65 W	64 GiB	✔	✔	✘
PRO 2200G		14 nm	10 May 2018	Ryzen 3	Raven Ridge	4	4	4 MiB	2 MiB	384 KiB	3.5 GHz	3.7 GHz	65 W	64 GiB	✘	✔	✘
PRO 2200GE		14 nm	10 May 2018	Ryzen 3	Raven Ridge	4	4	4 MiB	2 MiB	384 KiB	3.2 GHz	3.6 GHz	35 W	64 GiB	✘	✔	✔
PRO 2300U		14 nm	8 January 2018	Ryzen 3	Raven Ridge	4	4	4 MiB	2 MiB	384 KiB	2 GHz	3.4 GHz	15 W	32 GiB	✘	✔	✘
1400	$169.00	14 nm	11 April 2017	Ryzen 5	Summit Ridge	4	8	8 MiB	2 MiB	384 KiB	3.2 GHz	3.4 GHz	65 W	64 GiB	✔	✔	✘
1500X	$189.00	14 nm	11 April 2017	Ryzen 5	Summit Ridge	4	8	16 MiB	2 MiB	384 KiB	3.5 GHz	3.7 GHz	65 W	64 GiB	✔	✔	✔
1600	$219.00	14 nm	11 April 2017	Ryzen 5	Summit Ridge	6	12	16 MiB	3 MiB	576 KiB	3.2 GHz	3.6 GHz	65 W	64 GiB	✔	✘	✘
1600X	$249.00	14 nm	11 April 2017	Ryzen 5	Summit Ridge	6	12	16 MiB	3 MiB	576 KiB	3.6 GHz	4 GHz	95 W	64 GiB	✔	✔	✔
2400G	$169.00	14 nm	12 February 2018	Ryzen 5	Raven Ridge	4	8	4 MiB	2 MiB	384 KiB	3.6 GHz	3.9 GHz	65 W	64 GiB	✔	✔	✘
2400GE		14 nm	19 April 2018	Ryzen 5	Raven Ridge	4	8	4 MiB	2 MiB	384 KiB	3.2 GHz	3.8 GHz	35 W	64 GiB	✔	✔	✘
Count: 79
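The L1$ and L2$ columns in the list above are totals across all cores, and they follow from the per-core cache sizes given in the infobox (64 KiB L1I, 32 KiB L1D, and 512 KiB L2 per core). A small sketch, with hypothetical function names, for checking a row:

```python
L1I_KIB_PER_CORE = 64
L1D_KIB_PER_CORE = 32
L2_KIB_PER_CORE = 512


def l1_total_kib(cores):
    """Combined L1 instruction + data cache across all cores, in KiB."""
    return cores * (L1I_KIB_PER_CORE + L1D_KIB_PER_CORE)


def l2_total_kib(cores):
    """Combined L2 cache across all cores, in KiB."""
    return cores * L2_KIB_PER_CORE
```

For example, a 2-core Athlon gives 192 KiB of L1 and 1,024 KiB (1 MiB) of L2, and a 32-core EPYC gives 3,072 KiB of L1 and 16,384 KiB (16 MiB) of L2. The L3 column cannot be derived this way because the amount of L3 enabled varies by model.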

References

Michael Clark, AMD's senior fellow and lead architect, Hot Chips 28
Lisa Su, AMD CEO, AMD: New Horizon Live Event
Lisa Su, AMD CEO, AMD Annual Meeting of Shareholders Q4 2016
Meet the AMD Experts - AMD Monthly Partner Training, January 2017
Zen: A Next-Generation High-Performance x86 Core, ISSCC 2017
AMD 'Tech Day', February 22, 2017
AMD Zen at GDC 2017, March 3, 2017
