(Cortex-X1) |
(fixed) |
||
| (5 intermediate revisions by 3 users not shown) | |||
| Line 1: | Line 1: | ||
| − | {{armh title|Cortex-X1|arch}} | + | {{armh title|Cortex-X1 (Hera)|arch}} |
{{microarchitecture | {{microarchitecture | ||
| − | |atype=CPU | + | | atype = CPU |
| − | |name=Cortex-X1 | + | | name = Cortex-X1 (Hera) |
| − | |designer=ARM Holdings | + | | codename = Cortex-X1 |
| − | |manufacturer=TSMC | + | | core name = '''Cortex-X1''' |
| − | |introduction=May 26, 2020 | + | | designer = ARM Holdings |
| − | |process=10 nm | + | | manufacturer = TSMC |
| − | |process 2=7 nm | + | | introduction = May 26, 2020 |
| − | |process 3=5 nm | + | | process = 10 nm |
| − | |cores=1 | + | | process 2 = 7 nm |
| − | |cores 2=2 | + | | process 3 = 5 nm |
| − | |cores 3=4 | + | | cores = 1 |
| − | |cores 4=6 | + | | cores 2 = 2 |
| − | |cores 5=8 | + | | cores 3 = 4 |
| − | |type=Superscalar | + | | cores 4 = 6 |
| − | |type 2=Pipelined | + | | cores 5 = 8 |
| − | |oooe=Yes | + | | type = Superscalar |
| − | |speculative=Yes | + | | type 2 = Pipelined |
| − | |renaming=Yes | + | | oooe = Yes |
| − | |stages=13 | + | | speculative = Yes |
| − | |decode=5-way | + | | renaming = Yes |
| − | |isa=ARMv8.2 | + | | stages = 13 |
| − | |feature=Hardware virtualization | + | | decode = 5-way |
| − | |extension=FPU | + | | isa = ARMv8.2 |
| − | |extension 2=NEON | + | | feature = Hardware virtualization |
| − | |l1i= | + | | extension = FPU |
| − | |l1i per=core | + | | extension 2 = NEON |
| − | |l1i desc=4-way set associative | + | | l1i = 64 KiB |
| − | |l1d= | + | | l1i per = core |
| − | |l1d per=core | + | | l1i desc = 4-way set associative |
| − | |l1d desc=4-way set associative | + | | l1d = 64 KiB |
| − | |l2= | + | | l1d per = core |
| − | |l2 per=core | + | | l1d desc = 4-way set associative |
| − | |l2 desc=8-way set associative | + | | l2 = 1 MiB |
| − | |l3= | + | | l2 per = core |
| − | |l3 per= | + | | l2 desc = 8-way set associative |
| − | |l3 desc=16-way set associative | + | | l3 = 8 MiB |
| − | |contemporary=Cortex-A78 | + | | l3 per = cluster |
| − | |contemporary link=arm holdings/microarchitectures/cortex-a78 | + | | l3 desc = 16-way set associative |
| + | | successor = '''Cortex-X2''' (Matterhorn-ELP) | ||
| + | | successor link = arm holdings/microarchitectures/cortex-x2 | ||
| + | | successor 2 = '''Cortex-X3''' (Makalu-ELP) | ||
| + | | successor 2 link = arm holdings/microarchitectures/cortex-x3 | ||
| + | | successor 3 = '''Cortex-X4''' (Hunter-ELP) | ||
| + | | successor 3 link = arm holdings/microarchitectures/hunter-elp | ||
| + | | contemporary = '''Cortex-A78''' (Hercules) | ||
| + | | contemporary link = arm holdings/microarchitectures/cortex-a78 | ||
}} | }} | ||
| − | '''Cortex-X1''' (codename | + | '''Cortex-X1''' (codename ''Hera'') is a performance-enhanced version of the {{armh|Cortex-A78|l=arch}} ''(Hercules)'', a low-power high-performance [[ARM]] [[microarchitecture]] designed by [[Arm]] for the mobile market. The Cortex-X1 was designed by Arm's Austin, Texas team. This microarchitecture is designed as a synthesizable [[IP core]] and is licensed to other semiconductor companies to be implemented in their own chips. |
| − | The Cortex-X1, which implements the {{arm|ARMv8.2 | + | The '''Cortex-X1''', which implements the {{arm|ARMv8}}.2 ISA, is a higher performance core that is designed to be combined with the {{\\|Cortex-A78}} in a {{armh|big.LITTLE|DynamIQ big.LITTLE}} combination in order to provide even higher single-thread performance. This core, along with the {{\\|Cortex-A78}}, are often combined with a number of low(er) power cores (e.g. {{\\|Cortex-A55}}) in order to achieve better energy/performance. |
| + | |||
| + | === [[Cortex]]-X === | ||
| + | :;[[ARM]] • [[Cortex]] | ||
| + | {| class="wikitable" style="text-align: center; | ||
| + | |- | ||
| + | ! Year !! Cortex-X Core !! Cortex-A Core | ||
| + | |- | ||
| + | | [[2020]] || {{armh|Cortex-X1|l=arch}} (''{{armh|Hera|l=arch}}'') <br>{{armh|Cortex-X1C|l=arch}} (''{{armh|Hera-C|l=arch}}'') || {{armh|Cortex-A78|l=arch}} (''{{armh|Hercules|l=arch}}'') <!--<br>{{armh|Cortex-A78AE|l=arch}} (''{{armh|Hercules-AE|l=arch}}'')--> <br>{{armh|Cortex-A78C|l=arch}} (''{{armh|Hera Prime|l=arch}}'') | ||
| + | |- | ||
| + | | [[2021]] || {{armh|Cortex-X2|l=arch}} <br>(''{{armh|Matterhorn-ELP|l=arch}}'') || {{armh|Cortex-A710|l=arch}} (''{{armh|Matterhorn|l=arch}}'') <br>{{armh|Cortex-A510|l=arch}} (''{{armh|Klein|l=arch}}'') | ||
| + | |- | ||
| + | | [[2022]] || {{armh|Cortex-X3|l=arch}} (''{{armh|Makalu-ELP|l=arch}}'') || {{armh|Cortex-A715|l=arch}} (''{{armh|Makalu|l=arch}}'') | ||
| + | |- | ||
| + | | [[2023]] || {{armh|Cortex-X4|l=arch}} (''{{armh|Hunter-ELP|l=arch}}'') || {{armh|Cortex-A720|l=arch}} (''{{armh|Hunter|l=arch}}'') <br>{{armh|Cortex-A520|l=arch}} (''{{armh|Hayes|l=arch}}'') | ||
| + | |- | ||
| + | | [[2024]] || <s>{{armh|Cortex-X5|l=arch}} (''{{armh|Chaberton-ELP|l=arch}}'')</s> <br>{{armh|Cortex-X925|l=arch}} (''{{armh|Blackhawk|l=arch}}'') || {{armh|Cortex-A720AE|l=arch}} (''{{armh|Hunter-AE|l=arch}}'') <br>{{armh|Cortex-A725|l=arch}} (''{{armh|Chaberton|l=arch}}'') | ||
| + | |- | ||
| + | | [[2025]] || {{armh|Cortex-X930|l=arch}} (''{{armh|Travis|l=arch}}'') || {{armh|Cortex-A730|l=arch}} (''{{armh|Gelas|l=arch}}'') <br>{{armh|Cortex-A530|l=arch}} (''{{armh|Nevis|l=arch}}'') | ||
| + | |- | ||
| + | |} | ||
== Process Technology == | == Process Technology == | ||
| − | Although the Cortex-X1 may be fabricated on various [[process nodes]], it has been primarily designed for the [[10 nm]], [[7 nm]], and [[5 nm]] process nodes with performance, power and area numbers mainly targeting the [[5-nanometer node]]. | + | Although the Cortex-X1 may be fabricated on various [[process nodes]], it has been primarily designed for the [[10 nm]], [[7 nm]], |
| + | :and [[5 nm]] process nodes with performance, power and area numbers mainly targeting the [[5-nanometer node]]. | ||
| − | == | + | == Architecture == |
| − | |||
| − | |||
=== Key changes from {{\\|Cortex-A78}} === | === Key changes from {{\\|Cortex-A78}} === | ||
{{see also|arm_holdings/microarchitectures/cortex-a78#Key_changes_from_Cortex-A77|l1=Cortex-A78 § Key changes from Cortex-A77}} | {{see also|arm_holdings/microarchitectures/cortex-a78#Key_changes_from_Cortex-A77|l1=Cortex-A78 § Key changes from Cortex-A77}} | ||
| − | The Cortex-X1 is a custom performance-enhanced variant of the {{\\|Cortex- | + | The Cortex-X1 is a custom performance-enhanced variant of the {{\\|Cortex-A78}}, therefore it |
| + | :inherits most of the changes that were done to the {{\\|Cortex-A78}} from the {{\\|Cortex-A77}}. | ||
* Higher performance (See [[#Performance claims|§ Performance claims]]) | * Higher performance (See [[#Performance claims|§ Performance claims]]) | ||
| − | ** [[Arm]] self-reported around 30% performance over the A77 (compared to +20% with the A78) | + | ** [[Arm]] self-reported around 30% performance over the {{\\|Cortex-A77}} <br>(compared to +20% with the {{\\|Cortex-A78}}) |
| − | + | ** 2.0x (machine learning) performance | |
* Silicon area | * Silicon area | ||
** 15% more silicon area (on [[N5]]) | ** 15% more silicon area (on [[N5]]) | ||
* Front-end | * Front-end | ||
** 1.25x wider decode (5-way, up from 4-way) | ** 1.25x wider decode (5-way, up from 4-way) | ||
| − | ** 1.33x wider decoded cache bandwidth (8 MOPs/cycle, up from 6 MOPs/cycle) | + | ** 1.33x wider decoded cache bandwidth <br>(8 MOPs/cycle, up from 6 MOPs/cycle) |
* Memory subsystem | * Memory subsystem | ||
| − | ** Only 64 KiB | + | ** Only 64 KiB L1I cache option (from 32-64 KiB) |
| − | ** Only 64 KiB | + | ** Only 64 KiB L1D cache option (from 32-64 KiB) |
| − | ** Up to 1 MiB | + | ** Up to 1 MiB L2 cache option (from 512 KiB) |
| − | ** Up to 8 MiB | + | ** Up to 8 MiB L3 cache option (from 4 MiB) |
| − | {{ | + | === Comparison === |
| + | |||
| + | :;"Prime" core | ||
| + | {| class="wikitable sortable" cellpadding="3px" style="border: 1px solid black; border-spacing: 0px; width: 100%; text-align:center; | ||
| + | |- | ||
| + | ![[Microarchitecture|Architecture]] | ||
| + | !{{armh|Cortex-A78|l=arch}} | ||
| + | !{{armh|Cortex-X1|l=arch}} | ||
| + | !{{armh|Cortex-X2|l=arch}} | ||
| + | !{{armh|Cortex-X3|l=arch}} | ||
| + | !{{armh|Cortex-X4|l=arch}} | ||
| + | !{{armh|Cortex-X925|l=arch}} | ||
| + | !{{armh|Cortex-X930|l=arch}} | ||
| + | |- | ||
| + | !Code name | ||
| + | |''{{armh|Hercules|l=arch}}'' | ||
| + | |''Hera'' | ||
| + | |''{{armh|Matterhorn|l=arch}}-ELP'' | ||
| + | |''{{armh|Makalu|l=arch}}-ELP'' | ||
| + | |''{{armh|Hunter-ELP|l=arch}}'' | ||
| + | |''Blackhawk'' | ||
| + | |''Travis'' | ||
| + | |- | ||
| + | !ISA | ||
| + | | colspan="2" |[[ARMv8]].2-A | ||
| + | | colspan="2" |ARMv9.0-A | ||
| + | | colspan="3" |ARMv9.2-A | ||
| + | |- | ||
| + | !Peak clock speed | ||
| + | | colspan="3" |~3.0 GHz | ||
| + | |~3.3 GHz | ||
| + | |~3.4 GHz | ||
| + | |~3.8 GHz | ||
| + | |~4.2 GHz | ||
| + | |- | ||
| + | !Max in-flight | ||
| + | |2x 160 | ||
| + | |2x 224 | ||
| + | |2x 288 | ||
| + | |2x 320 | ||
| + | |2x 384 | ||
| + | |2x 768 | ||
| + | | | ||
| + | |- | ||
| + | !L0 (Mops entries) | ||
| + | |1536 <ref>{{cite book |title=Arm's New Cortex-A78 and Cortex-X1 Microarchitectures: An Efficiency and Performance Divergence |url=https://www.anandtech.com/show/15813/arm-cortex-a78-cortex-x1-cpu-ip-diverging }}</ref> | ||
| + | | colspan="2" |3072 | ||
| + | |1536 | ||
| + | |0 | ||
| + | | | ||
| + | | | ||
| + | |- | ||
| + | !L1-I + L1-D | ||
| + | |32+32 KiB | ||
| + | | colspan="2" |64+64 KiB | ||
| + | | colspan="2" |64+64 KiB | ||
| + | |64+64 KiB | ||
| + | | | ||
| + | |- | ||
| + | !L2 | ||
| + | |128–512 KiB | ||
| + | | colspan="3" |0.25–1 MiB | ||
| + | |0.5–2 MiB | ||
| + | |2–3 MiB | ||
| + | | | ||
| + | |- | ||
| + | !L3 | ||
| + | | colspan="2" |0–8 MiB <ref>{{cite book |last=Schor |first=David |date=2020-05-26 |title=Arm Cortex-X1: The First From The Cortex-X Custom Program |url=https://fuse.wikichip.org/news/3543/arm-cortex-x1-the-first-from-the-cortex-x-custom-program/ |website=WikiChip Fuse }}</ref> | ||
| + | | colspan="2" |0–16 MiB | ||
| + | | colspan="2" |0–32 MiB | ||
| + | | | ||
| + | |- | ||
| + | !Decode width | ||
| + | |4 | ||
| + | | colspan="2" |5 | ||
| + | |6 | ||
| + | |10 <ref>{{cite book |date=2023-05-29 |title=Arm Cortex-X4, A720, and A520: 2024 smartphone CPUs deep dive |url=https://www.androidauthority.com/arm-cortex-x4-explained-3328008/ |website=Android Authority}}</ref> | ||
| + | |10 | ||
| + | | | ||
| + | |- | ||
| + | !Dispatch | ||
| + | |6/cycle | ||
| + | | colspan="3" |8/cycle | ||
| + | | colspan="2" |10/cycle | ||
| + | | | ||
| + | |- | ||
| + | |} | ||
== Performance claims == | == Performance claims == | ||
| − | Compared to the {{\\|Cortex-A77}}, the X1 is said to be 30% faster in peak performance on [[SPEC CPU2006]]. The improvement comes from both architectural improvements and frequency improvement with the help of process improvement moving from the [[N7|7 nm]] to the [[N5|5 nm node]]. | + | *Compared to the {{\\|Cortex-A77}}, the Cortex-X1 is said to be 30% faster in peak performance on [[SPEC CPU2006]]. |
| + | :The improvement comes from both architectural improvements and frequency improvement with the help | ||
| + | :of process improvement moving from the [[N7|7 nm]] to the [[N5|5 nm node]]. | ||
{| class="wikitable" | {| class="wikitable" | ||
| Line 82: | Line 198: | ||
| 1.0x || 1.3x | | 1.0x || 1.3x | ||
|- | |- | ||
| − | | 2 | + | | 2.6 GHz || 3.0 GHz |
|- | |- | ||
| [[N7|7 nm (N7)]] || [[N5|5 nm(N5)]] | | [[N7|7 nm (N7)]] || [[N5|5 nm(N5)]] | ||
|- | |- | ||
| colspan="2" | | | colspan="2" | | ||
| − | * Cortex-X1 1 MiB L2, 8 MiB L3 cache | + | * '''Cortex-X1''' 1 MiB L2, 8 MiB L3 cache |
* {{\\|Cortex-A77}} 512 KiB L2 , 4 MiB L3 cache | * {{\\|Cortex-A77}} 512 KiB L2 , 4 MiB L3 cache | ||
|} | |} | ||
| − | Arm says that, at ISO-process and frequency, the Cortex-X1 achieves 22% higher integer performance ([[SPEC CPU2006]]) over the {{\\|Cortex-A78}} and 30% higher integer performance over the {{\\|Cortex-A77}}. Likewise, due to the doubling of the number of | + | *Arm says that, at ISO-process and frequency, the Cortex-X1 achieves 22% higher integer performance ([[SPEC CPU2006]]) |
| + | :over the {{\\|Cortex-A78}} and 30% higher integer performance over the {{\\|Cortex-A77}}. Likewise, due to the doubling | ||
| + | :of the number of ''NEON'' units, the Cortex-X1 can achieve twice the ML performance as both the {{\\|Cortex-A77|A77}} and {{\\|Cortex-A78|A78}}. | ||
{| class="wikitable" | {| class="wikitable" | ||
| Line 103: | Line 221: | ||
| 1.0x || 2.0x (ML performance) | | 1.0x || 2.0x (ML performance) | ||
|- | |- | ||
| − | | | + | | 3.0 GHz || 3.0 GHz |
|- | |- | ||
| [[N7|7 nm (N7)]] || [[N5|5 nm(N5)]] | | [[N7|7 nm (N7)]] || [[N5|5 nm(N5)]] | ||
|- | |- | ||
| colspan="2" | | | colspan="2" | | ||
| − | * Cortex-X1 1 MiB L2, 8 MiB L3 cache | + | * '''Cortex-X1''' 1 MiB L2, 8 MiB L3 cache |
* {{\\|Cortex-A77}} 512 KiB L2 , 4 MiB L3 cache | * {{\\|Cortex-A77}} 512 KiB L2 , 4 MiB L3 cache | ||
|} | |} | ||
== Overview == | == Overview == | ||
| − | The Cortex-X1 is a high-performance | + | *The Cortex-X1 is a high-performance synthesizable core designed by [[Arm]]. It is delivered as Register |
| + | :Transfer Level (RTL) description in Verilog and is designed to be integrated into customer's SoCs. | ||
| − | + | *This core supports the {{arm|ARMv8}}.2 extension as well as a number of other partial extensions. | |
| + | :This is the first from [[Arm]]'s [[Cortex]]-X custom program. The X1 is a performance-enhanced | ||
| + | :version of the {{\\|Cortex-A78|A78}}, it therefore uses the {{\\|Cortex-A78|A78}} as the starting point for its modifications. | ||
| − | The Cortex-X1 is a fatter version of the {{\\|Cortex- | + | *The Cortex-X1 is built on top of the {{\\|Cortex-A78}}, but enhances it in order to extract additional performance, |
| + | :albeit at a slight reduction in power efficiency and area. To that end, whereas the {{\\|Hercules}} was said to provide | ||
| + | :a 20% sustain performance uplift over the {{\\|Cortex-A77}}, the Cortex-X1 offers up to 30% peak performance. | ||
| + | *In other words, whereas the {{\\|Cortex-A78}} is designed for high sustained performance at high performance-efficiency, | ||
| + | :the Cortex-X1 is designed to supplement it with higher peak performance while relaxing the power and area constraints. | ||
| + | *The Cortex-X1 is a fatter version of the {{\\|Cortex-A78}}, relying on bigger buffers and a large out-of-order window | ||
| + | :in order to extract further performance. To that end, the X1 features a 5-way decode, twice as many NEON units, | ||
| + | :and larger overall buffers in order to allow for a bigger out-of-order window with more in-flight operations. | ||
| + | *The Cortex-X1 enlarges the pipeline while still retaining the higher frequency which was introduced in the {{\\|Cortex-A77}}. | ||
| + | *The Cortex-X1 is intended to be combined with a number of {{\\|Cortex-A78}} cores in ''DynamIQ Shared Unit'' (DSU) | ||
| + | :cluster along with possibly with other lower-power cores such as the {{\\|Cortex-A55}} to more efficiently support | ||
| + | :a wide range of workloads at various performance and power levels beyond what's possible with any one core. | ||
=== DSU Cluster === | === DSU Cluster === | ||
| − | The Cortex-X1 provides additional peak performance beyond what the {{\\|Cortex-A78}} can offer. Therefore the X1 is designed to be combined with a number of Cortex-A78 cores in | + | *The Cortex-X1 provides additional peak performance beyond what the {{\\|Cortex-A78}} can offer. |
| + | :Therefore the X1 is designed to be combined with a number of Cortex-A78 cores in ''DynamIQ | ||
| + | :Shared Unit'' (DSU) cluster in order to provide a balance in both power and performance. | ||
| + | |||
| + | *Compared to a quad-core {{\\|Cortex-A77}} cluster on [[N7|7 nm]], a quad-core {{\\|Cortex-A78}} cluster provides | ||
| + | :+20% sustained performance improvement while reducing the silicon area by about 15%. | ||
| + | |||
| + | *When replacing one of those [[big core|big]] {{\\|Cortex-A78}} cores with a single Cortex-X1 core, the cluster | ||
| + | :can now provide a peak single-thread performance of up to 30% versus the {{\\|Cortex-A77}} | ||
| + | :at the cost of 15% additional silicon area (or neural area-wise from [[N7]] to [[N5]]). | ||
| + | |||
| + | == References == | ||
Latest revision as of 19:43, 15 April 2025
| Cortex-X1 (Hera) µarch | |
| General Info | |
| Arch Type | CPU |
| Designer | ARM Holdings |
| Manufacturer | TSMC |
| Introduction | May 26, 2020 |
| Process | 10 nm, 7 nm, 5 nm |
| Core Configs | 1, 2, 4, 6, 8 |
| Pipeline | |
| Type | Superscalar, Pipelined |
| OoOE | Yes |
| Speculative | Yes |
| Reg Renaming | Yes |
| Stages | 13 |
| Decode | 5-way |
| Instructions | |
| ISA | ARMv8.2 |
| Extensions | FPU, NEON |
| Cache | |
| L1I Cache | 64 KiB/core 4-way set associative |
| L1D Cache | 64 KiB/core 4-way set associative |
| L2 Cache | 1 MiB/core 8-way set associative |
| L3 Cache | 8 MiB/cluster 16-way set associative |
| Cores | |
| Core Names | Cortex-X1 |
| Succession | |
| Contemporary | |
| Cortex-A78 (Hercules) | |
Cortex-X1 (codename Hera) is a performance-enhanced version of the Cortex-A78 (Hercules), a low-power high-performance ARM microarchitecture designed by Arm for the mobile market. The Cortex-X1 was designed by Arm's Austin, Texas team. This microarchitecture is designed as a synthesizable IP core and is licensed to other semiconductor companies to be implemented in their own chips.
The Cortex-X1, which implements the ARMv8.2 ISA, is a higher-performance core that is designed to be combined with the Cortex-A78 in a DynamIQ big.LITTLE combination in order to provide even higher single-thread performance. This core, along with the Cortex-A78, is often combined with a number of lower-power cores (e.g. Cortex-A55) in order to achieve better energy/performance.
Cortex-X
| Year | Cortex-X Core | Cortex-A Core |
|---|---|---|
| 2020 | Cortex-X1 (Hera), Cortex-X1C (Hera-C) | Cortex-A78 (Hercules), Cortex-A78C (Hera Prime) |
| 2021 | Cortex-X2 (Matterhorn-ELP) | Cortex-A710 (Matterhorn), Cortex-A510 (Klein) |
| 2022 | Cortex-X3 (Makalu-ELP) | Cortex-A715 (Makalu) |
| 2023 | Cortex-X4 (Hunter-ELP) | Cortex-A720 (Hunter), Cortex-A520 (Hayes) |
| 2024 | Cortex-X925 (Blackhawk) | Cortex-A720AE (Hunter-AE), Cortex-A725 (Chaberton) |
| 2025 | Cortex-X930 (Travis) | Cortex-A730 (Gelas), Cortex-A530 (Nevis) |
Process Technology
Although the Cortex-X1 may be fabricated on various process nodes, it has been primarily designed for the 10 nm, 7 nm, and 5 nm process nodes, with performance, power, and area numbers mainly targeting the 5-nanometer node.
Architecture
Key changes from Cortex-A78
See also: Cortex-A78 § Key changes from Cortex-A77
The Cortex-X1 is a custom performance-enhanced variant of the Cortex-A78; it therefore inherits most of the changes made to the Cortex-A78 relative to the Cortex-A77.
- Higher performance (see § Performance claims)
  - Arm self-reported around 30% higher performance over the Cortex-A77 (compared to +20% with the Cortex-A78)
  - 2.0x machine learning performance
- Silicon area
  - 15% more silicon area (on N5)
- Front-end
  - 1.25x wider decode (5-way, up from 4-way)
  - 1.33x wider decoded cache bandwidth (8 MOPs/cycle, up from 6 MOPs/cycle)
- Memory subsystem
  - Only 64 KiB L1I cache option (from 32-64 KiB)
  - Only 64 KiB L1D cache option (from 32-64 KiB)
  - Up to 1 MiB L2 cache option (from 512 KiB)
  - Up to 8 MiB L3 cache option (from 4 MiB)
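The widening factors quoted for the front-end follow directly from the widths in the list. A minimal Python sketch that recomputes them is given here; the dictionary and function names are illustrative labels chosen for this example and do not come from Arm documentation.

```python
# Widths quoted in the list above: (Cortex-A78 value, Cortex-X1 value).
# The keys are illustrative labels, not Arm terminology.
FRONT_END_WIDTHS = {
    "decode (instructions/cycle)": (4, 5),
    "decoded cache bandwidth (MOPs/cycle)": (6, 8),
}


def widening_factor(old: int, new: int) -> float:
    """Return how many times wider the new value is than the old one."""
    return new / old


# Round to two decimals, matching the precision used in the list.
ratios = {
    name: round(widening_factor(old, new), 2)
    for name, (old, new) in FRONT_END_WIDTHS.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```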
Comparison
"Prime" core
| Architecture | Cortex-A78 | Cortex-X1 | Cortex-X2 | Cortex-X3 | Cortex-X4 | Cortex-X925 | Cortex-X930 |
|---|---|---|---|---|---|---|---|
| Code name | Hercules | Hera | Matterhorn-ELP | Makalu-ELP | Hunter-ELP | Blackhawk | Travis |
| ISA | ARMv8.2-A | ARMv8.2-A | ARMv9.0-A | ARMv9.0-A | ARMv9.2-A | ARMv9.2-A | ARMv9.2-A |
| Peak clock speed | ~3.0 GHz | ~3.0 GHz | ~3.0 GHz | ~3.3 GHz | ~3.4 GHz | ~3.8 GHz | ~4.2 GHz |
| Max in-flight | 2x 160 | 2x 224 | 2x 288 | 2x 320 | 2x 384 | 2x 768 | |
| L0 (MOPs entries) | 1536 [1] | 3072 | 3072 | 1536 | 0 | | |
| L1-I + L1-D | 32+32 KiB | 64+64 KiB | 64+64 KiB | 64+64 KiB | 64+64 KiB | 64+64 KiB | |
| L2 | 128–512 KiB | 0.25–1 MiB | 0.25–1 MiB | 0.25–1 MiB | 0.5–2 MiB | 2–3 MiB | |
| L3 | 0–8 MiB [2] | 0–8 MiB [2] | 0–16 MiB | 0–16 MiB | 0–32 MiB | 0–32 MiB | |
| Decode width | 4 | 5 | 5 | 6 | 10 [3] | 10 | |
| Dispatch | 6/cycle | 8/cycle | 8/cycle | 8/cycle | 10/cycle | 10/cycle | |
Performance claims
- Compared to the Cortex-A77, the Cortex-X1 is said to be 30% faster in peak performance on SPEC CPU2006. The gain comes from both architectural improvements and a higher clock frequency, the latter enabled by the move from the 7 nm node to the 5 nm node.
| Performance | |
|---|---|
| Cortex-A77 | Cortex-X1 |
| 1.0x | 1.3x |
| 2.6 GHz | 3.0 GHz |
| 7 nm (N7) | 5 nm (N5) |
| 512 KiB L2, 4 MiB L3 cache | 1 MiB L2, 8 MiB L3 cache |
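One way to read the table above is to split the 1.3x figure into a clock-frequency part and a remainder. The sketch below assumes that performance scales as per-clock throughput times frequency, a simplification that the source does not state; `per_clock_gain` is a hypothetical helper name introduced only for this example.

```python
def per_clock_gain(total_speedup: float, old_ghz: float, new_ghz: float) -> float:
    """Speedup that remains after dividing out the frequency ratio.

    Assumes performance = per-clock throughput x frequency, which
    ignores memory latency and other frequency-independent effects.
    """
    return total_speedup / (new_ghz / old_ghz)


# Clock speeds and overall speedup taken from the table above.
freq_ratio = 3.0 / 2.6                    # Cortex-X1 clock over Cortex-A77 clock
residual = per_clock_gain(1.3, 2.6, 3.0)  # part not explained by frequency

print(f"frequency ratio: {freq_ratio:.3f}x")
print(f"residual per-clock gain: {residual:.3f}x")
```

Under this assumption the residual is only a rough indicator of the architectural contribution; it is not a figure reported by Arm.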
- Arm says that, at ISO-process and frequency, the Cortex-X1 achieves 22% higher integer performance (SPEC CPU2006) than the Cortex-A78 and 30% higher integer performance than the Cortex-A77. Likewise, because the number of NEON units is doubled, the Cortex-X1 can achieve twice the ML performance of both the A77 and A78.
| Performance @ ISO-process/frequency | |
|---|---|
| Cortex-A77 | Cortex-X1 |
| 1.0x | 1.3x (integer performance) |
| 1.0x | 2.0x (ML performance) |
| 3.0 GHz | 3.0 GHz |
| 7 nm (N7) | 5 nm (N5) |
| 512 KiB L2, 4 MiB L3 cache | 1 MiB L2, 8 MiB L3 cache |
Overview
- The Cortex-X1 is a high-performance synthesizable core designed by Arm. It is delivered as a Register Transfer Level (RTL) description in Verilog and is designed to be integrated into customers' SoCs.
- This core supports the ARMv8.2 extension as well as a number of other partial extensions. It is the first core from Arm's Cortex-X custom program. The X1 is a performance-enhanced version of the A78 and therefore uses the A78 as the starting point for its modifications.
- The Cortex-X1 is built on top of the Cortex-A78 but enhances it in order to extract additional performance, albeit at a slight cost in power efficiency and area. To that end, whereas the Cortex-A78 (Hercules) was said to provide a 20% sustained performance uplift over the Cortex-A77, the Cortex-X1 offers up to 30% higher peak performance.
- In other words, whereas the Cortex-A78 is designed for high sustained performance at high performance-efficiency, the Cortex-X1 is designed to supplement it with higher peak performance while relaxing the power and area constraints.
- The Cortex-X1 is a fatter version of the Cortex-A78, relying on bigger buffers and a larger out-of-order window to extract further performance. To that end, the X1 features a 5-way decode, twice as many NEON units, and larger overall buffers that allow for a bigger out-of-order window with more in-flight operations.
- The Cortex-X1 enlarges the pipeline while still retaining the higher frequency that was introduced with the Cortex-A77.
- The Cortex-X1 is intended to be combined with a number of Cortex-A78 cores in a DynamIQ Shared Unit (DSU) cluster, possibly along with other lower-power cores such as the Cortex-A55, to more efficiently support a wide range of workloads at various performance and power levels beyond what is possible with any one core.
DSU Cluster
- The Cortex-X1 provides additional peak performance beyond what the Cortex-A78 can offer. The X1 is therefore designed to be combined with a number of Cortex-A78 cores in a DynamIQ Shared Unit (DSU) cluster in order to balance power and performance.
- Compared to a quad-core Cortex-A77 cluster on 7 nm, a quad-core Cortex-A78 cluster provides a 20% sustained performance improvement while reducing the silicon area by about 15%.
- When one of those big Cortex-A78 cores is replaced with a single Cortex-X1 core, the cluster can provide up to 30% higher peak single-thread performance than the Cortex-A77 at the cost of 15% additional silicon area (or area-neutral when moving from N7 to N5).
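The area figures in this section can be combined to see why the cluster area stays roughly unchanged from N7 to N5. The sketch below normalizes the quad-core Cortex-A77 cluster to 1.0 and assumes that the 15% increase for the Cortex-X1 is measured relative to the quad-core Cortex-A78 cluster; that baseline is an interpretation, not something the source states explicitly, and the variable names are illustrative.

```python
# Normalized silicon area of the quad-core Cortex-A77 cluster on N7.
A77_CLUSTER_AREA = 1.0

# Quad-core Cortex-A78 cluster: about 15% smaller than the A77 cluster.
a78_cluster_area = A77_CLUSTER_AREA * (1 - 0.15)

# Swap one Cortex-A78 for a Cortex-X1: 15% more area.
# Assumption: the increase is relative to the A78 cluster.
x1_cluster_area = a78_cluster_area * (1 + 0.15)

print(f"A78 cluster area: {a78_cluster_area:.4f}")
print(f"X1 + A78 cluster area: {x1_cluster_area:.4f}")
```

Under these assumptions the two 15% changes nearly cancel, leaving the mixed cluster within a few percent of the original Cortex-A77 cluster area.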
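References
1. "Arm's New Cortex-A78 and Cortex-X1 Microarchitectures: An Efficiency and Performance Divergence". https://www.anandtech.com/show/15813/arm-cortex-a78-cortex-x1-cpu-ip-diverging
2. Schor, David (2020-05-26). "Arm Cortex-X1: The First From The Cortex-X Custom Program". WikiChip Fuse. https://fuse.wikichip.org/news/3543/arm-cortex-x1-the-first-from-the-cortex-x-custom-program/
3. "Arm Cortex-X4, A720, and A520: 2024 smartphone CPUs deep dive" (2023-05-29). Android Authority. https://www.androidauthority.com/arm-cortex-x4-explained-3328008/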