Cortex-X1 µarch | |
General Info | |
Arch Type | CPU |
Designer | ARM Holdings |
Manufacturer | TSMC |
Introduction | May 26, 2020 |
Process | 10 nm, 7 nm, 5 nm |
Core Configs | 1, 2, 4, 6, 8 |
Pipeline | |
Type | Superscalar, Pipelined |
OoOE | Yes |
Speculative | Yes |
Reg Renaming | Yes |
Stages | 13 |
Decode | 5-way |
Instructions | |
ISA | ARMv8.2 |
Extensions | FPU, NEON |
Cache | |
L1I Cache | 64 KiB/core 4-way set associative |
L1D Cache | 64 KiB/core 4-way set associative |
L2 Cache | 1 MiB/core 8-way set associative |
L3 Cache | 8 MiB/Cluster 16-way set associative |
Succession | |
Contemporary | |
Cortex-A78 |
Cortex-X1 (codename Hera) is a performance-enhanced version of the Cortex-A78, a low-power high-performance ARM microarchitecture designed by Arm for the mobile market. The Cortex-X1 was designed by Arm's Austin, Texas team. This microarchitecture is designed as a synthesizable IP core and is licensed to other semiconductor companies to be implemented in their own chips.
The Cortex-X1, which implements the ARMv8.2 ISA, is a higher-performance core designed to be combined with the Cortex-A78 in a DynamIQ big.LITTLE configuration in order to provide even higher single-thread performance. This core, along with the Cortex-A78, is often combined with a number of lower-power cores (e.g. Cortex-A55) in order to achieve a better balance of energy and performance.
Process Technology
Although the Cortex-X1 may be fabricated on various process nodes, it has been primarily designed for the 10 nm, 7 nm, and 5 nm process nodes with performance, power and area numbers mainly targeting the 5-nanometer node.
Architecture
Key changes from Cortex-A78
- See also: Cortex-A78 § Key changes from Cortex-A77
The Cortex-X1 is a custom performance-enhanced variant of the A78 and therefore inherits most of the changes that were made to the A78 relative to the A77.
- Higher performance (See § Performance claims)
- Arm self-reported around 30% higher peak performance over the A77 (compared to +20% with the A78)
- 2.0x machine learning (ML) performance
- Silicon area
- 15% more silicon area (on N5)
- Front-end
- 1.25x wider decode (5-way, up from 4-way)
- 1.33x wider decoded cache bandwidth (8 MOPs/cycle, up from 6 MOPs/cycle)
- Memory subsystem
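The front-end multipliers quoted above follow directly from the per-cycle widths. The short Python sketch below recomputes them; the dictionary and function names are illustrative only and do not come from any Arm documentation.

```python
from fractions import Fraction

# Per-cycle front-end widths quoted in the list above (A78 -> X1).
DECODE_WIDTH = {"A78": 4, "X1": 5}   # instructions decoded per cycle
MOP_CACHE_BW = {"A78": 6, "X1": 8}   # MOPs delivered per cycle


def widening(old: int, new: int) -> Fraction:
    """Return the exact widening factor new/old."""
    return Fraction(new, old)


decode_ratio = widening(DECODE_WIDTH["A78"], DECODE_WIDTH["X1"])
mop_ratio = widening(MOP_CACHE_BW["A78"], MOP_CACHE_BW["X1"])

print(f"decode: {float(decode_ratio):.2f}x, MOP cache: {float(mop_ratio):.2f}x")
# prints: decode: 1.25x, MOP cache: 1.33x
```

These are ratios of peak widths only; they say nothing about sustained throughput, which depends on the workload.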
Performance claims
Compared to the Cortex-A77, the X1 is said to be 30% faster in peak performance on SPEC CPU2006. The improvement comes from both architectural changes and a higher clock frequency, the latter helped by the move from the 7 nm to the 5 nm process node.
Performance | |
---|---|
Cortex-A77 | Cortex-X1 |
1.0x | 1.3x |
2,600 MHz | 3,000 MHz |
7 nm (N7) | 5 nm (N5) |
Arm says that, at ISO-process and ISO-frequency, the Cortex-X1 achieves 22% higher integer performance (SPEC CPU2006) than the Cortex-A78 and 30% higher integer performance than the Cortex-A77. Likewise, due to the doubling of the number of NEON units, the Cortex-X1 can achieve twice the ML performance of both the A77 and the A78.
Performance @ ISO-process/frequency | |
---|---|
Cortex-A77 | Cortex-X1 |
1.0x | 1.3x (integer performance) |
1.0x | 2.0x (ML performance) |
3,000 MHz | 3,000 MHz |
7 nm (N7) | 5 nm (N5) |
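The two ISO-process/frequency integer figures (X1 over A77 and X1 over A78) imply a ratio between the A78 and the A77. The sketch below derives it, assuming that both figures refer to the same SPEC CPU2006 integer measurement and that the ratios compose multiplicatively. The article states neither assumption explicitly, and the variable names are illustrative.

```python
# Arm's self-reported ISO-process/frequency integer speedups for the Cortex-X1.
X1_OVER_A77 = 1.30  # X1 is 30% faster than the A77
X1_OVER_A78 = 1.22  # X1 is 22% faster than the A78

# Assumption: both ratios describe the same benchmark, so they can be divided.
a78_over_a77 = X1_OVER_A77 / X1_OVER_A78
a78_uplift_pct = (a78_over_a77 - 1.0) * 100.0

print(f"Implied A78 over A77 at ISO-frequency: +{a78_uplift_pct:.1f}%")
# prints: Implied A78 over A77 at ISO-frequency: +6.6%
```

Under these assumptions the implied per-clock gain of the A78 over the A77 is about 7%. That is smaller than the +20% sustained figure quoted for the A78, which also includes frequency and process improvements.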
Overview
The Cortex-X1 is a high-performance synthesizable core designed by Arm. It is delivered as a Register Transfer Level (RTL) description in Verilog and is designed to be integrated into customers' SoCs. This core supports the ARMv8.2 extension as well as a number of other partial extensions. It is the first core from Arm's Cortex-X custom program. The X1 is a performance-enhanced version of the Cortex-A78 and therefore uses the A78 as the starting point for its modifications.
The Cortex-X1 is built on top of the Cortex-A78 but enhances it in order to extract additional performance, at the cost of slightly lower power efficiency and a larger area. To that end, whereas the A78 (codename Hercules) was said to provide a 20% sustained performance uplift over the A77, the Cortex-X1 offers up to 30% higher peak performance. In other words, whereas the A78 is designed for high sustained performance at high power efficiency, the Cortex-X1 is designed to supplement it with higher peak performance while relaxing the power and area constraints.
The Cortex-X1 is a wider, larger version of the A78 that relies on bigger buffers and a larger out-of-order window to extract further performance. To that end, the X1 features 5-way decode, twice as many NEON units, and larger buffers throughout, allowing more operations in flight. The X1 enlarges the pipeline while still retaining the higher frequency that was introduced in the A77. The X1 is intended to be combined with a number of A78 cores in a DynamIQ Shared Unit (DSU) cluster, possibly along with other lower-power cores such as the Cortex-A55, to more efficiently support a wide range of workloads at various performance and power levels beyond what is possible with any one core.
DSU Cluster
The Cortex-X1 provides additional peak performance beyond what the Cortex-A78 can offer. The X1 is therefore designed to be combined with a number of Cortex-A78 cores in a DynamIQ Shared Unit (DSU) cluster in order to balance power and performance. Compared to a quad-core A77 cluster on 7 nm, a quad-core A78 cluster provides a 20% sustained performance improvement while reducing the silicon area by about 15%. When one of those big A78 cores is replaced with a single Cortex-X1 core, the cluster can provide a peak single-thread performance improvement of up to 30% versus the A77 at the cost of 15% additional silicon area (that is, roughly area-neutral going from N7 to N5).
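The "area-neutral" remark can be checked with simple arithmetic. The text does not say whether the 15% area increase is measured against the quad-A78 cluster or against the original quad-A77 cluster, so the Python sketch below evaluates both readings. Both baselines are assumptions, and all names are illustrative.

```python
# Cluster area normalized to a quad-core A77 cluster on N7 (= 1.0).
A77_CLUSTER_AREA = 1.0
A78_AREA_REDUCTION = 0.15  # quad-A78 cluster on N5 is ~15% smaller
X1_AREA_INCREASE = 0.15    # swapping one A78 for an X1 adds ~15%

a78_cluster = A77_CLUSTER_AREA * (1.0 - A78_AREA_REDUCTION)

# Reading 1 (assumption): +15% is relative to the quad-A78 cluster.
mixed_relative = a78_cluster * (1.0 + X1_AREA_INCREASE)

# Reading 2 (assumption): +15% is relative to the quad-A77 baseline.
mixed_absolute = a78_cluster + A77_CLUSTER_AREA * X1_AREA_INCREASE

print(f"quad A78:                 {a78_cluster:.2f}x")
print(f"1x X1 + 3x A78 (reading 1): {mixed_relative:.2f}x")
print(f"1x X1 + 3x A78 (reading 2): {mixed_absolute:.2f}x")
```

Under either reading the mixed cluster lands at or just below the area of the original quad-A77 cluster, which is consistent with the area-neutral description.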