From WikiChip
|manufacturer=TSMC
|introduction=May 31, 2018
|process=7 nm
|cores=1
|cores 2=2
|cores 3=4
|oooe=Yes
|speculative=Yes
|l1d per=core
|l1d desc=4-way set associative
|l2=256-512 KiB
|l2 per=core
|l2 desc=8-way set associative
|l3=0-4 MiB
|l3 per=Cluster
|predecessor=Cortex-A75
|predecessor link=arm holdings/microarchitectures/cortex-a75
|successor=Deimos
|successor link=arm holdings/microarchitectures/deimos
|contemporary=Ares
|contemporary link=arm holdings/microarchitectures/ares
}}
'''Cortex-A76''' (codename '''Enyo''') is a low-power, high-performance [[ARM]] [[microarchitecture]] designed by [[ARM Holdings]] for the mobile market and the successor to the {{armh|Cortex-A75|l=arch}}. This microarchitecture is designed as a synthesizable [[IP core]] and is sold to other semiconductor companies to be implemented in their own chips. The Cortex-A76, which implements the {{arm|ARMv8.2}} ISA, is a high-performance core that is often combined with a number of lower-power cores (e.g. {{\\|Cortex-A55}}) in a {{armh|DynamIQ big.LITTLE}} configuration to balance energy efficiency and performance.
== History ==
Development of the Cortex-A76 started in 2013. [[Arm]] formally announced Enyo during Arm Tech Day on May 31, 2018.
== Process Technology ==
Though the Cortex-A76 may be fabricated on various [[process nodes]], it was primarily designed for the [[12 nm]], [[7 nm]], and [[5 nm]] process nodes.
== Architecture ==
=== Key changes from {{\\|Cortex-A75}} ===
=== Block Diagram ===
==== Typical SoC ====
*** 64 KiB, 4-way set associative
*** 64-byte cache lines
*** optional parity
** L1D Cache
The A76 TLB consists of a dedicated L1 TLB for the instruction cache (ITLB), another for the data cache (DTLB), and a unified L2 TLB (STLB).
*** 4 KiB, 16 KiB, 64 KiB, 2 MiB, and 32 MiB page sizes
*** 48-entry fully associative
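As a rough illustration of the page sizes and entry count listed above, TLB reach (entries × page size) is the amount of memory a TLB can map before it has to miss. The sketch below assumes every entry of the 48-entry fully associative TLB holds a page of the same size. Real workloads mix page sizes. `tlb_reach` is a hypothetical helper and is not part of any Arm tooling.

```python
KIB = 1024
MIB = 1024 * KIB

# Page sizes listed for the TLB above.
PAGE_SIZES = {
    "4 KiB": 4 * KIB,
    "16 KiB": 16 * KIB,
    "64 KiB": 64 * KIB,
    "2 MiB": 2 * MIB,
    "32 MiB": 32 * MIB,
}

def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes mapped when every entry holds a page of `page_size` bytes (assumption)."""
    return entries * page_size

if __name__ == "__main__":
    for name, size in PAGE_SIZES.items():
        print(f"48 entries x {name:>6} = {tlb_reach(48, size) // KIB} KiB")
```

With 4 KiB pages the 48 entries cover only 192 KiB. With 2 MiB pages they cover 96 MiB, which is why large pages matter for TLB-heavy code.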
== Overview ==
The Cortex-A76 is a high-performance synthesizable core designed by [[Arm]] as the successor to the {{\\|Cortex-A75}}. It is delivered as a Register Transfer Level (RTL) description in Verilog. This core supports the {{arm|ARMv8.2}} extension as well as a number of other partial extensions. The A76 is a 4-way superscalar out-of-order processor with private level 1 and level 2 caches. It is designed to be implemented inside the [[DynamIQ Shared Unit]] (DSU) cluster along with other cores (e.g., [[little cores]] such as the {{\\|Cortex-A55}}).
== Core ==
The Cortex-A76 succeeds the {{\\|Cortex-A75}}. It is designed to take advantage of the [[7 nm]] node to deliver up to 35% higher performance and up to 40% lower power (compared to the A75 on the [[10 nm]] node). Notably, the A76 achieves its higher performance by going wider, at the cost of a slight increase in area. On the [[7 nm process]], the Cortex-A76 targets frequencies of 3 GHz and higher.
=== Pipeline ===
The Cortex-A76 is a complex, 4-way superscalar out-of-order processor with an 8-issue back end. It has a 64 KiB [[level 1]] [[instruction cache]] and a 64 KiB [[level 1]] [[data cache]], along with a private [[level 2 cache]] that is configurable as either 256 KiB (1 bank) or 512 KiB (2 banks).
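The number of sets in each cache and the way an address is split between them follow from its capacity, associativity, and line size. This sketch derives them under some assumptions. It uses 64-byte lines for all three caches, although only the L1I line size is stated in this article. It takes the associativities from the infobox. It treats the two-bank 512 KiB L2 as a single array and ignores how addresses are interleaved across banks. `cache_geometry` is a hypothetical helper.

```python
KIB = 1024

def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    """Sets, byte-offset bits and set-index bits of a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    # A plain bit-slice index needs power-of-two set counts and line sizes.
    for value in (sets, line_bytes):
        if value <= 0 or value & (value - 1):
            raise ValueError(f"{value} is not a power of two")
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2(line size)
        "index_bits": sets.bit_length() - 1,         # log2(number of sets)
    }

if __name__ == "__main__":
    configs = {  # 64-byte lines assumed throughout
        "L1I 64 KiB, 4-way": (64 * KIB, 4, 64),
        "L1D 64 KiB, 4-way": (64 * KIB, 4, 64),
        "L2 256 KiB, 8-way": (256 * KIB, 8, 64),
        "L2 512 KiB, 8-way": (512 * KIB, 8, 64),
    }
    for name, (size, ways, line) in configs.items():
        print(f"{name}: {cache_geometry(size, ways, line)}")
```

For each 64 KiB, 4-way L1 cache this gives 256 sets, so 6 address bits select the byte within a line and the next 8 bits select the set.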
==== Front-end ====
Each cycle, up to 16 bytes are fetched from the [[L1 instruction cache]]. The instruction fetch works in tandem with the branch predictor to ensure that the instruction stream is ready to be fetched. The Cortex-A76 has a fixed 64 KiB L1I cache. It is 4-way set associative and supports optional parity protection.
From the instruction fetch, up to four 32-bit instructions are sent to the decode queue (DQ) each cycle. For narrower 16-bit instructions (i.e., {{arm|Thumb}}), this means up to eight instructions can be queued. The A76 features a 4-way decoder: up to four instructions may be decoded into [[macro-operations]] each cycle.
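A small model can relate the fetch and decode widths above. It assumes a fixed 16-byte fetch block every cycle, a 4-wide decoder, an unbounded decode queue, and no branches, misses, or stalls, so it only shows the trend. With 32-bit instructions, fetch and decode stay balanced. With 16-bit instructions, fetch supplies twice what decode consumes and the queue grows. In hardware, fetch would stall once the queue is full. All names are hypothetical.

```python
FETCH_BYTES = 16   # bytes fetched per cycle (from the text)
DECODE_WIDTH = 4   # instructions decoded per cycle (from the text)

def instrs_per_fetch(instr_bytes: int) -> int:
    """Instructions delivered by one 16-byte fetch block."""
    return FETCH_BYTES // instr_bytes

def queue_occupancy(instr_bytes: int, cycles: int) -> list[int]:
    """Decode-queue occupancy at the end of each simulated cycle (toy model)."""
    occupancy, history = 0, []
    for _ in range(cycles):
        occupancy += instrs_per_fetch(instr_bytes)  # fetch enqueues one block
        occupancy -= min(DECODE_WIDTH, occupancy)   # decode drains up to 4
        history.append(occupancy)
    return history

if __name__ == "__main__":
    print("32-bit:", instrs_per_fetch(4), queue_occupancy(4, 4))
    print("16-bit:", instrs_per_fetch(2), queue_occupancy(2, 4))
```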
==== Back-end ====
===== Renaming & Allocation =====
From the front-end, up to four [[macro-operations]] may be sent each cycle to be renamed. The reorder buffer (ROB) can hold up to 128 instructions in flight. [[Macro-operations]] are broken down into their [[µOP]] constituents and are scheduled for execution. From there, µOPs are sent to the instruction issue unit, which controls when they can be dispatched to the execution pipelines. µOPs are queued in eight independent issue queues (120 entries in total).
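The general behaviour of a reorder buffer can be sketched as a ring buffer. Entries are allocated in program order, may complete out of order, and retire in order. Only the 128-entry capacity comes from the text. The retire width of 4 and the class interface are assumptions for illustration and do not describe Arm's actual implementation.

```python
from typing import Optional

class ReorderBuffer:
    """Toy ROB: in-order allocate, out-of-order complete, in-order retire."""

    def __init__(self, capacity: int = 128) -> None:
        self.capacity = capacity
        self.slots: list = [None] * capacity
        self.head = 0   # index of the oldest in-flight entry
        self.count = 0  # number of in-flight entries

    def allocate(self, op: str) -> Optional[int]:
        """Allocate the next entry in program order; None means the ROB is full."""
        if self.count == self.capacity:
            return None  # allocation would stall until entries retire
        tail = (self.head + self.count) % self.capacity
        self.slots[tail] = {"op": op, "done": False}
        self.count += 1
        return tail

    def complete(self, slot: int) -> None:
        """Mark an entry as executed; completion order is unrestricted."""
        self.slots[slot]["done"] = True

    def retire(self, width: int = 4) -> list[str]:
        """Retire up to `width` executed entries, strictly from the head (width assumed)."""
        retired: list[str] = []
        while self.count and len(retired) < width and self.slots[self.head]["done"]:
            retired.append(self.slots[self.head]["op"])
            self.slots[self.head] = None
            self.head = (self.head + 1) % self.capacity
            self.count -= 1
        return retired

if __name__ == "__main__":
    rob = ReorderBuffer()
    a, b, c = rob.allocate("add"), rob.allocate("mul"), rob.allocate("ldr")
    rob.complete(c)
    rob.complete(b)
    print(rob.retire())  # [] because the oldest entry (add) has not finished
    rob.complete(a)
    print(rob.retire())  # ['add', 'mul', 'ldr']
```

In this model, once all 128 slots are occupied, the front-end cannot allocate further entries until the oldest instruction finishes. A long-latency operation at the head therefore stalls the machine even if younger work has completed.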
===== Execution Units =====
The A76 issue is 8-wide, allowing up to eight µOPs to be issued for execution each cycle. The execution units can be grouped into three categories: integer, advanced SIMD, and memory.
There are four pipelines in the integer cluster: three for general math operations and a dedicated branch ALU. All three general-purpose ports have a simple ALU, and the third port additionally supports complex arithmetic (e.g. MAC, DIV).
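A simplified reading of the port layout above gives a lower bound on the cycles needed to issue a batch of independent integer operations. Two limits apply: complex operations can only use the third port, and all ALU operations share three ports. The model assumes each port accepts one operation per cycle, although real dividers are not fully pipelined. It also ignores dependencies and latencies. The function name is hypothetical.

```python
def int_issue_cycles_lower_bound(n_simple: int, n_complex: int, n_branch: int = 0) -> int:
    """Lower bound on cycles to issue independent integer ops (toy model).

    Assumptions: all three general ports accept simple ALU ops, only the
    third accepts complex ops (e.g. MAC, DIV), branches use the separate
    branch pipeline, and every port accepts one op per cycle.
    """
    total_alu = n_simple + n_complex
    return max(
        n_complex,            # complex ops all queue for the third port
        (total_alu + 2) // 3, # ceil(total / 3): ALU ops share three ports
        n_branch,             # one branch pipeline
    )

if __name__ == "__main__":
    print(int_issue_cycles_lower_bound(9, 0))  # spread evenly over three ports
    print(int_issue_cycles_lower_bound(3, 6))  # bottlenecked on the third port
```

The second case shows why a mix that is heavy in multiplies or divides can leave two of the three general ports idle under this model.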
There are two ASIMD/FP execution pipelines. In the {{\\|Cortex-A75}}, each pipeline was 64 bits wide; on the A76, they were doubled to 128 bits. This means each pipeline is capable of 2 double-precision, 4 single-precision, 8 half-precision, or 16 8-bit integer operations.
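The lane counts above follow from dividing the pipeline width by the element width. This sketch reproduces them for the A76's 128-bit pipelines and, for comparison, for the Cortex-A75's 64-bit pipelines. Treating each lane as one operation per cycle is an assumption: a fused multiply-add is often counted as two floating-point operations.

```python
NUM_PIPES = 2  # two ASIMD/FP pipelines (from the text)
ELEMENT_BITS = {"FP64": 64, "FP32": 32, "FP16": 16, "INT8": 8}

def lanes(element_bits: int, pipe_bits: int) -> int:
    """Elements one pipeline processes per operation."""
    return pipe_bits // element_bits

if __name__ == "__main__":
    for name, bits in ELEMENT_BITS.items():
        a75 = lanes(bits, 64)   # Cortex-A75: 64-bit pipelines
        a76 = lanes(bits, 128)  # Cortex-A76: 128-bit pipelines
        print(f"{name}: A75 {a75}/pipe, A76 {a76}/pipe, "
              f"A76 both pipes {a76 * NUM_PIPES}/cycle")
```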
===== Memory subsystem =====
{{empty section}}
== Bibliography ==
* Arm Tech Day, 2018
{| class="wikitable"
|+ Facts about "Cortex-A76 - Microarchitectures - ARM"
! Property !! Value
|-
| codename || Cortex-A76
|-
| core count || 1, 2, 4, 6 and 8
|-
| designer || ARM Holdings
|-
| first launched || May 31, 2018
|-
| full page name || arm holdings/microarchitectures/cortex-a76
|-
| instance of || microarchitecture
|-
| instruction set architecture || ARMv8.2
|-
| manufacturer || TSMC
|-
| microarchitecture type || CPU
|-
| name || Cortex-A76
|-
| pipeline stages || 13
|-
| process || 12 nm (0.012 μm, 1.2e-5 mm), 7 nm (0.007 μm, 7.0e-6 mm) and 5 nm (0.005 μm, 5.0e-6 mm)
|}