From WikiChip
Neoverse V1 - Microarchitectures - ARM

Neoverse V1 µarch
General Info
Arch Type: CPU
Designer: ARM Holdings
Manufacturer: TSMC
Introduction: 2021
Process: 7 nm
Pipeline
OoOE: Yes
Speculative: Yes
Reg Renaming: Yes
Succession

Neoverse V1 (codename Zeus) is a high-performance ARM microarchitecture designed by ARM Holdings for the high-performance computing market. This microarchitecture is designed as a synthesizable IP core and is sold to other semiconductor companies to be implemented in their own chips.

History

Definition for Zeus started in 2016.

Release Dates

Zeus was launched on April 27, 2021.

Architecture

The Neoverse V1 is an offshoot of the Neoverse N1 that has been optimized primarily for the highest possible performance.

Key changes from Neoverse N1

  • Architecture
  • Higher performance
  • Front-end
    • Branch-prediction
    • Improved accuracy
      • 6x nano BTB (96 entries, up from 16)
      • 1.33x larger BTB (8K-entry, up from 6K-entry)
      • Up to 90% reduction in branch mispredictions (for BTB misses)
      • Up to 50% reduction in front-end stalls
      • Faster fetch recovery
      • 2x Runahead bandwidth (2x32B/cycle, up from 32B/cycle)
      • 2x code regions that are able to be tracked in the front-end
    • New L0 MOP cache
      • 2x wider decoded instruction fetch (8 instrs/cycle, up from 4 traditional)
      • 1 stage shorter
    • Decode
      • 1.25x wider decode (5-way decode, up from 4)
  • Execution engine
    • 2x ReOrder Buffer size (256-entry, up from 128)
      • New compression capabilities
      • Additional instruction fusion cases
    • 2x-wide vector units (2x256b/clk, up from 2x128b/clk)
      • 2x256b/cycle SVE or 4x128b/cycle Neon/FP
  • Memory Subsystem
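
The multipliers quoted in the list above follow directly from the entry counts and widths given in parentheses. As a minimal sketch, the following Python recomputes them. The dictionary keys and the helper name `scaling_factor` are illustrative labels chosen here, not Arm terminology; the numeric values are taken from the list.

```python
# (new value, old value) pairs as quoted in the list of changes from Neoverse N1.
# Key names are illustrative labels, not official Arm names.
CHANGES = {
    "nano_btb_entries": (96, 16),
    "main_btb_entries": (8 * 1024, 6 * 1024),
    "decode_width": (5, 4),
    "mop_fetch_width": (8, 4),
    "rob_entries": (256, 128),
    "vector_bits_per_cycle": (2 * 256, 2 * 128),
}


def scaling_factor(new, old):
    """Return the ratio new/old rounded to two decimal places."""
    return round(new / old, 2)


FACTORS = {name: scaling_factor(new, old) for name, (new, old) in CHANGES.items()}

# The two vector configurations in the list move the same number of bits per cycle.
SVE_BITS_PER_CYCLE = 2 * 256
NEON_BITS_PER_CYCLE = 4 * 128

if __name__ == "__main__":
    for name, factor in FACTORS.items():
        print(f"{name}: {factor}x")
```

Note that 2x256b SVE and 4x128b Neon/FP both amount to 512 bits per cycle, so the two figures describe the same datapath width organized differently.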


Block Diagram

Typical SoC

neoverse n1 soc block diagram.svg


The Neoverse N1 is also expected to be integrated along with Neoverse E1 high-efficiency cores and possibly other custom IP blocks.


neoverse e1 n1 soc example.svg

Individual Core

neoverse v1 block diagram.svg


Memory Hierarchy

The Neoverse V1 has private L1I, L1D, and L2 caches.

  • Cache
    • L1I Cache
      • 64 KiB, 4-way set associative
      • 64-byte cache lines
      • SECDED ECC
      • Write-back
    • L1D Cache
      • 64 KiB, 4-way set associative
      • 64-byte cache lines
      • 4-cycle fastest load-to-use latency
      • SECDED ECC
      • Write-back
    • L2 Cache
      • 512 KiB or 1 MiB (2 banks)
      • 8-way set associative
      • 9-11 cycles
        • 9-cycle fastest load-to-use latency
      • ECC protection per 64 bits
      • Modified Exclusive Shared Invalid (MESI) coherency
      • Strictly inclusive of the L1 data cache & non-inclusive of the L1 instruction cache
      • Write-back
    • System-level cache (SLC)
      • 1 bank per core duplex
      • 2 MiB to 4 MiB, 16-way set associative
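
The cache geometry listed above determines the number of sets in each cache, since sets = capacity / (ways x line size). The sketch below works this out in Python. The helper name `num_sets` is hypothetical, and the 64-byte line size for the L2 is an assumption carried over from the L1 entries, because the list does not state it for the L2.

```python
KIB = 1024
MIB = 1024 * KIB


def num_sets(size_bytes, ways, line_bytes=64):
    """Number of sets = capacity / (associativity * line size)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("capacity is not a whole number of sets")
    return sets


def index_bits(sets):
    """Number of address bits needed to select a set (sets must be a power of two)."""
    if sets <= 0 or sets & (sets - 1):
        raise ValueError("set count is not a power of two")
    return sets.bit_length() - 1


# L1I and L1D: 64 KiB, 4-way, 64-byte lines (all three values from the list).
L1_SETS = num_sets(64 * KIB, ways=4)

# L2: 8-way; the 64-byte line size is an assumption, not stated in the list.
L2_SETS_512K = num_sets(512 * KIB, ways=8)
L2_SETS_1M = num_sets(1 * MIB, ways=8)

if __name__ == "__main__":
    print("L1 sets:", L1_SETS, "index bits:", index_bits(L1_SETS))
    print("L2 sets (512 KiB):", L2_SETS_512K)
    print("L2 sets (1 MiB):", L2_SETS_1M)
```

Under these assumptions each L1 cache has 256 sets, and the L2 has 1024 or 2048 sets depending on the configured capacity.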

The Neoverse V1 TLB hierarchy consists of a dedicated L1 TLB for the instruction cache (ITLB) and another for the data cache (DTLB). Additionally, there is a unified L2 TLB (STLB).

  • TLBs
    • ITLB
      • 4 KiB, 16 KiB, 64 KiB, 2 MiB, and 32 MiB page sizes
      • 48-entry fully associative
    • DTLB
      • 48-entry fully associative
      • 4 KiB, 16 KiB, 64 KiB, 2 MiB, and 512 MiB page sizes
    • STLB
      • 1280-entry 5-way set associative
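
One way to read the TLB figures above is in terms of reach, that is, the amount of memory that can be mapped without a TLB miss (entries x page size), and in terms of set count for the set-associative STLB. The following Python is a rough sketch of that arithmetic. The names `tlb_reach` and `STLB_SETS` are illustrative, and the reach figures assume every entry holds a page of the same size, which real workloads will not guarantee.

```python
KIB = 1024
MIB = 1024 * KIB

L1_TLB_ENTRIES = 48   # ITLB and DTLB are both 48-entry, fully associative
STLB_ENTRIES = 1280
STLB_WAYS = 5


def tlb_reach(entries, page_bytes):
    """Bytes of address space covered if every entry maps one page of page_bytes."""
    return entries * page_bytes


# Reach of a 48-entry L1 TLB for three of the listed page sizes.
REACH_4K = tlb_reach(L1_TLB_ENTRIES, 4 * KIB)
REACH_64K = tlb_reach(L1_TLB_ENTRIES, 64 * KIB)
REACH_2M = tlb_reach(L1_TLB_ENTRIES, 2 * MIB)

# A 1280-entry, 5-way set-associative structure has 1280 / 5 sets.
STLB_SETS = STLB_ENTRIES // STLB_WAYS

if __name__ == "__main__":
    print("L1 TLB reach, 4 KiB pages:", REACH_4K // KIB, "KiB")
    print("L1 TLB reach, 64 KiB pages:", REACH_64K // MIB, "MiB")
    print("L1 TLB reach, 2 MiB pages:", REACH_2M // MIB, "MiB")
    print("STLB sets:", STLB_SETS)
```

The unusual 5-way associativity makes sense in this light: 1280 entries divide into 256 sets, a power of two, which keeps set indexing simple.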

Overview

Formerly known as Zeus, the Neoverse V1 is an offshoot of the initial Neoverse N1 microarchitecture design that has been heavily modified and optimized for performance. Historically, Arm has imposed strict power and area constraints on its cores in order to meet the requirements of the client-device market. With the Neoverse V1, those constraints were relaxed in order to extract additional performance. In addition to general integer and floating-point performance, Zeus has also been optimized for HPC workloads with wider vector execution as well as Scalable Vector Extension (SVE) support.

The Neoverse V1 targets the highest-performance designs, such as those found in high-performance computing systems. It is an 11-stage out-of-order core with private L1 and L2 caches and a wide front-end and back-end. The core is intended to use Arm's Coherent Mesh Network 700 (CMN-700) interconnect, which enables scaling from as few as four cores to as many as 256 cores and from two DDR channels up to twelve, depending on the workload being addressed. The base design is extended by a framework for multiprocessing and chiplet support, which can be used by companies looking to improve the yield and manufacturability of large SoC designs. The V1 is also designed to work with the Neoverse E1, which was introduced at the same time as the N1 and is optimized for high-throughput multithreaded workloads, as well as with other types of accelerators that may be integrated on the mesh network.

Core

All Neoverse V1 Processors

Die

  • Die plot (core + 1 MiB L2 cache)
neoverse v1 die.png

Bibliography

  • Arm Neoverse Tech Day, 2021