Neoverse V1 µarch | |
General Info | |
Arch Type | CPU |
Designer | ARM Holdings |
Manufacturer | TSMC |
Introduction | 2021 |
Process | 7 nm |
Pipeline | |
OoOE | Yes |
Speculative | Yes |
Reg Renaming | Yes |
Succession | |
Neoverse V1 (codename Zeus) is a high-performance ARM microarchitecture designed by ARM Holdings for the high-performance computing market. This microarchitecture is designed as a synthesizable IP core and is sold to other semiconductor companies to be implemented in their own chips.
History
Definition for Zeus started in 2016.
Release Dates
Zeus was launched on April 27, 2021.
Architecture
The Neoverse V1 is an offshoot of the Neoverse N1 that has been optimized primarily for maximum performance.
Key changes from Neoverse N1
- Architecture
- Higher performance
- Arm self-reported around 48% higher performance, averaged over SPEC CPU2006 and SPEC CPU2017, at iso-power/process
- Front-end
- Branch-prediction
- Improved accuracy
- 6x nano BTB (96 entries, up from 16)
- 1.33x larger BTB (8K-entry, up from 6K-entry)
- Up to 90% reduction in branch mispredictions (for BTB misses)
- Up to 50% reduction in front-end stalls
- Faster fetch recovery
- 2x Runahead bandwidth (2x32B/cycle, up from 32B/cycle)
- 2x code regions that are able to be tracked in the front-end
- New L0 MOP cache
- 2x wider decoded instruction fetch (8 instrs/cycle, up from 4 traditional)
- 1 stage shorter
- Decode
- 1.25x wider decode (5-way decode, up from 4-way)
- Execution engine
- 2x ReOrder Buffer size (256-entry, up from 128)
- New compression capabilities
- Additional instruction fusion cases
- 2x-wide vector units (2x256b/clk, up from 2x128b/clk)
- 2x256b/cycle SVE or 4x128b/cycle Neon/FP
- Memory Subsystem
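The multipliers quoted in the list above follow directly from the before/after figures. The sketch below recomputes them and checks that the two vector configurations listed (2x256b SVE and 4x128b Neon/FP) move the same number of bits per cycle. The dictionary keys and function names are illustrative, and "K" is assumed to mean 1024 entries; the ratios are the same either way.

```python
# (Neoverse N1, Neoverse V1) figures copied from the list above.
# Key names are illustrative labels, not Arm terminology.
CHANGES = {
    "nano BTB entries": (16, 96),
    "main BTB entries": (6 * 1024, 8 * 1024),  # assumes K = 1024
    "decode width": (4, 5),
    "decoded instruction fetch width": (4, 8),
    "ROB entries": (128, 256),
}


def scaling_factor(before: int, after: int) -> float:
    """Return after/before, rounded to two decimal places."""
    return round(after / before, 2)


def vector_bits_per_cycle(pipes: int, width_bits: int) -> int:
    """Total vector datapath width processed per cycle, in bits."""
    return pipes * width_bits


if __name__ == "__main__":
    for name, (n1, v1) in CHANGES.items():
        print(f"{name}: {n1} -> {v1} ({scaling_factor(n1, v1)}x)")
    # 2x256b SVE and 4x128b Neon/FP are the same aggregate width.
    print("SVE bits/cycle: ", vector_bits_per_cycle(2, 256))
    print("Neon bits/cycle:", vector_bits_per_cycle(4, 128))
```

Both vector configurations come to 512 bits per cycle, twice the 256 bits per cycle of the 2x128b arrangement on the N1.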
Block Diagram
Typical SoC
The Neoverse V1 is also expected to be integrated alongside Neoverse E1 high-efficiency cores and possibly other custom IP blocks.
Individual Core
Memory Hierarchy
The Neoverse V1 has private L1I, L1D, and L2 caches.
- Cache
- L1I Cache
- 64 KiB, 4-way set associative
- 64-byte cache lines
- SECDED ECC
- Write-back
- L1D Cache
- 64 KiB, 4-way set associative
- 64-byte cache lines
- 4-cycle fastest load-to-use latency
- SECDED ECC
- Write-back
- L2 Cache
- 512 KiB OR 1 MiB (4 banks)
- 8-way set associative
- 10-cycle fastest load-to-use latency
- ECC protection per 64 bits
- Modified Exclusive Shared Invalid (MESI) coherency
- Strictly inclusive of the L1 data cache & non-inclusive of the L1 instruction cache
- Write-back
- System-level cache (SLC)
- 1 Bank per core duplex
- 2 MiB to 4 MiB, 16-way set associative
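The size, associativity, and line-size figures above determine how many sets each cache has, since sets = size / (ways x line size). The sketch below works this out for the L1 and L2 configurations. The `Cache` class is a hypothetical helper, the 64-byte line size for the L2 is an assumption (the list states it only for the L1 caches), and banking is ignored, so the real physical organization may differ.

```python
from dataclasses import dataclass

KIB = 1024


@dataclass(frozen=True)
class Cache:
    """Hypothetical helper describing a set-associative cache."""
    name: str
    size_bytes: int
    ways: int
    line_bytes: int = 64  # stated for L1I/L1D; assumed for L2

    @property
    def sets(self) -> int:
        # sets = capacity / (associativity * line size)
        return self.size_bytes // (self.ways * self.line_bytes)

    @property
    def index_bits(self) -> int:
        # Number of address bits needed to select a set (power-of-two sets).
        return (self.sets - 1).bit_length()


CACHES = [
    Cache("L1I", 64 * KIB, 4),
    Cache("L1D", 64 * KIB, 4),
    Cache("L2 (512 KiB)", 512 * KIB, 8),
    Cache("L2 (1 MiB)", 1024 * KIB, 8),
]

if __name__ == "__main__":
    for c in CACHES:
        print(f"{c.name}: {c.sets} sets, {c.index_bits} index bits")
```

Under these assumptions each 64 KiB L1 cache has 256 sets, and the L2 has 1024 or 2048 sets depending on the configured size.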
The Neoverse V1 TLB hierarchy consists of a dedicated L1 TLB for the instruction cache (ITLB) and another for the data cache (DTLB). Additionally, there is a unified L2 TLB (STLB).
- TLBs
- ITLB
- 4 KiB, 16 KiB, 64 KiB, 2 MiB, and 32 MiB page sizes
- 48-entry fully associative
- DTLB
- 48-entry fully associative
- 4 KiB, 16 KiB, 64 KiB, 2 MiB, and 512 MiB page sizes
- STLB
- 1280-entry 5-way set associative
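A common way to interpret the entry counts above is TLB reach: the amount of memory that can be mapped without a miss, which is entries x page size. The sketch below computes it for the 48-entry L1 TLBs and the 1280-entry STLB. The function names are illustrative, and the calculation assumes every entry maps a page of the same size, which real workloads rarely do.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of address space covered if every entry maps one page of this size."""
    return entries * page_bytes


def set_count(entries: int, ways: int) -> int:
    """Number of sets in a set-associative TLB."""
    return entries // ways


if __name__ == "__main__":
    for page, label in [(4 * KIB, "4 KiB"), (64 * KIB, "64 KiB"), (2 * MIB, "2 MiB")]:
        l1 = tlb_reach(48, page) / MIB
        l2 = tlb_reach(1280, page) / MIB
        print(f"{label} pages: L1 TLB reach {l1:g} MiB, STLB reach {l2:g} MiB")
    print("STLB sets:", set_count(1280, 5))
```

With 4 KiB pages the STLB covers 5 MiB, which is one reason larger page sizes matter for workloads with big working sets.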
Overview
Formerly known as Zeus, the Neoverse V1 is an offshoot of the initial Neoverse N1 microarchitecture that has been heavily modified and optimized for performance. Historically, Arm has imposed strict power and area restrictions on its cores in order to meet the market requirements of its client devices. With the Neoverse V1, those restrictions were relaxed in order to extract additional performance. In addition to general integer and floating-point performance, Zeus has also been optimized for HPC workloads with wider vector execution and support for the Scalable Vector Extension (SVE).
The Neoverse V1 is designed for the highest possible performance, such as that required in high-performance computing systems. It is an 11-stage out-of-order core with private L1 and L2 caches and a very wide front-end and back-end. The core is intended to use Arm's Coherent Mesh Network 700 (CMN-700) interconnect, which enables scaling from as few as four cores to as many as 256 cores, and from two DDR channels up to twelve, depending on the workload being addressed. The base design is extended by a framework for multiprocessing support and chiplet support, which can be used by companies looking to improve yield and manufacturability in large SoC designs. The V1 is also designed to work seamlessly with the Neoverse E1, which was introduced at the same time as the N1 but is optimized for high-throughput multithreaded workloads, and with other types of accelerators that may be integrated on the mesh network.
Core
All Neoverse V1 Processors
Die
- Die plot (core + 1 MiB L2 cache)
Bibliography
- Arm Neoverse Tech Day, 2021