From WikiChip
Editing arm holdings/microarchitectures/cortex-a76

Keeping the instruction stream fed is the task of the branch prediction unit (BPU). The BPU on the A76 is decoupled from the instruction fetch, allowing it to run ahead of and in parallel with the instruction fetch to hide branch prediction latency. To that end, it now operates on 32-byte instruction windows, twice the fetch size. The main [[branch target buffer]] (BTB) on the A76 is 6K entries deep. To reduce latency, the BPU comprises three stages, pairing the main BTB with a 64-entry micro-BTB and a smaller 16-entry nano-BTB.
 
The Cortex-A76 has a fixed 64 KiB L1I cache. It is [[virtually indexed, physically tagged]] (VIPT) but behaves as a [[physically indexed, physically tagged]] (PIPT), 4-way set-associative cache. The L1I$ supports optional parity protection and implements a [[pseudo-LRU]] [[cache replacement]] policy. The instruction cache has a 256-bit read interface from the L2 cache, so up to 32 bytes may be transferred from the shared L2 cache to the L1I cache each cycle.
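
The cache figures above imply a particular geometry, which the following rough sketch works out. The 64-byte line size and the 4 KiB page size are assumptions made for illustration (neither is stated in this section), and the function and key names are hypothetical. The sketch shows why a VIPT cache of this size needs hardware help to behave as PIPT: the index reaches past the page offset, so some index bits come from the virtual page number.

```python
def l1i_geometry(cache_bytes=64 * 1024, ways=4, line_bytes=64,
                 page_bytes=4096, fill_bytes_per_cycle=32):
    """Derive set count and index/alias bits for a set-associative cache.

    line_bytes and page_bytes are illustrative assumptions, not values
    taken from the text above.
    """
    sets = cache_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1       # log2(line size)
    index_bits = sets.bit_length() - 1              # log2(number of sets)
    page_offset_bits = page_bytes.bit_length() - 1  # log2(page size)
    # Index bits above the page offset differ between virtual and
    # physical addresses, so they can alias in a plain VIPT design.
    alias_bits = max(0, offset_bits + index_bits - page_offset_bits)
    # Cycles to fill one line over the L2 read interface (ceiling division).
    cycles_per_line_fill = -(-line_bytes // fill_bytes_per_cycle)
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "alias_bits": alias_bits,
        "cycles_per_line_fill": cycles_per_line_fill,
    }


geometry = l1i_geometry()
print(geometry)
```

Under these assumptions the cache has 256 sets, two index bits lie above the page offset, and a full line takes two cycles to arrive over the 256-bit interface.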
 
From the instruction fetch, up to four 32-bit instructions are sent to the decode queue (DQ) each cycle. For narrower 16-bit instructions (i.e., {{arm|Thumb}}), this means up to eight instructions get queued. The A76 features a 4-way decoder. Each cycle, up to four instructions may be decoded into relatively complex [[macro-operations]] (MOPs). On average, there are 6% more MOPs than instructions. In total, this operation takes two cycles: one for alignment and one for decode.
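
As a back-of-the-envelope check on the front-end numbers, the sketch below derives the per-cycle fetch width from "four 32-bit instructions" and applies the average 6% MOP expansion. The constant and function names are hypothetical, and treating the 6% figure as a flat multiplier is a simplifying assumption (real expansion depends on the instruction mix).

```python
FETCH_BYTES_PER_CYCLE = 4 * 4   # four 32-bit instructions per cycle
BPU_WINDOW_BYTES = 32           # branch prediction window size
DECODE_WIDTH = 4                # instructions decoded per cycle
MOP_EXPANSION = 1.06            # on average 6% more MOPs than instructions


def instructions_per_fetch(instruction_bytes):
    """Instructions delivered to the decode queue in one fetch cycle."""
    return FETCH_BYTES_PER_CYCLE // instruction_bytes


def expected_mops(instruction_count):
    """Average MOP count, assuming a flat 6% expansion (a simplification)."""
    return instruction_count * MOP_EXPANSION


print(instructions_per_fetch(4))                   # 32-bit instructions
print(instructions_per_fetch(2))                   # 16-bit Thumb instructions
print(BPU_WINDOW_BYTES // FETCH_BYTES_PER_CYCLE)   # window size relative to fetch
print(round(expected_mops(DECODE_WIDTH), 2))       # average MOPs at full decode width
```

At full decode width this works out to slightly more than four MOPs per cycle on average, while the branch predictor covers two fetch widths per window.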
 
codename: Cortex-A76
core count: 1, 2, 4, 6 and 8
designer: ARM Holdings
first launched: May 31, 2018
full page name: arm holdings/microarchitectures/cortex-a76
instance of: microarchitecture
instruction set architecture: ARMv8.2
manufacturer: TSMC
microarchitecture type: CPU
name: Cortex-A76
pipeline stages: 13
process: 12 nm, 7 nm and 5 nm