=== Front-end ===
 
Vulcan's front-end is tasked with fetching instructions from a ready thread's instruction stream and feeding them to the decoder so they can be delivered to the execution units. Since Vulcan supports up to four threads, a thread scheduler determines which thread's instruction stream to operate on. This determination is made every cycle with the help of the branch predictor, at no added cost.
 
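To visualize the per-cycle choice among ready threads, the following minimal C sketch picks a thread round-robin from those that are ready. The actual selection policy, and how the branch predictor feeds into it, is not publicly documented, so the names and the round-robin heuristic here are illustrative assumptions only.

<syntaxhighlight lang="c">
#include <stdbool.h>
#include <stdio.h>

#define NUM_THREADS 4

typedef struct {
    bool ready; /* thread has a fetchable instruction stream this cycle */
} thread_state_t;

/* Returns the thread to fetch from this cycle, or -1 if none are ready.
 * Rotating from last_picked keeps any ready thread from being starved. */
static int select_fetch_thread(const thread_state_t threads[NUM_THREADS],
                               int last_picked)
{
    for (int i = 1; i <= NUM_THREADS; i++) {
        int candidate = (last_picked + i) % NUM_THREADS;
        if (threads[candidate].ready)
            return candidate;
    }
    return -1; /* no ready thread this cycle */
}

int main(void)
{
    thread_state_t t[NUM_THREADS] = { {false}, {true}, {false}, {true} };
    int pick = select_fetch_thread(t, 0); /* picks thread 1 */
    printf("fetch from thread %d\n", pick);
    return 0;
}
</syntaxhighlight>
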
Vulcan has a 32 KiB, 8-way set associative L1 instruction cache and a dedicated L1 instruction TLB. It is worth noting that all the core caches on Vulcan are 8-way set associative to aid the [[branch predictor]], which works on cache-line stride patterns. Under normal flow, the data to be fetched has already been predicted and brought into the L1 from the L2. The bandwidth to the L1 instruction cache is 64 bytes per cycle.
 
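For a concrete sense of the L1I geometry, the short C program below derives the set count and address-bit split from the stated 32 KiB capacity and 8-way associativity. The 64-byte line size is an assumption made for illustration and is not stated in the text above.

<syntaxhighlight lang="c">
#include <stdio.h>

/* Returns floor(log2(v)) for v > 0. */
static unsigned log2u(unsigned v)
{
    unsigned bits = 0;
    while (v >>= 1)
        bits++;
    return bits;
}

int main(void)
{
    const unsigned cache_bytes = 32 * 1024; /* 32 KiB L1 instruction cache */
    const unsigned ways        = 8;         /* 8-way set associative */
    const unsigned line_bytes  = 64;        /* assumed line size */

    unsigned sets = cache_bytes / (ways * line_bytes);

    printf("sets:        %u\n", sets);              /* 64 */
    printf("offset bits: %u\n", log2u(line_bytes)); /* 6 */
    printf("index bits:  %u\n", log2u(sets));       /* 6 */
    return 0;
}
</syntaxhighlight>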
 
 
==== Fetch ====
 
[[Instruction fetch]] is done on a 32-byte window, or eight 4-byte [[ARM]] instructions. This is twice the fetch throughput of the previous microarchitecture and is designed to better absorb bubbles in the pipeline. The instruction stream is decomposed into its constituent instructions, which are queued for the [[instruction decode|decoder]]. The queue is shared by all threads; its size has not been disclosed.

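The sketch below shows, under stated assumptions, how a 32-byte fetch window can be carved into eight 4-byte instruction words and appended to a shared decode queue. The queue depth used here is a placeholder, since the real size is undisclosed, and the structure and function names are invented for the example.

<syntaxhighlight lang="c">
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FETCH_WINDOW_BYTES 32
#define INSN_BYTES          4
#define QUEUE_DEPTH         64 /* placeholder; actual depth undisclosed */

typedef struct {
    uint32_t insn[QUEUE_DEPTH];
    unsigned count;
} decode_queue_t;

/* Returns the number of instruction words actually enqueued (fewer than
 * eight if the shared queue is nearly full). */
static unsigned enqueue_fetch_window(decode_queue_t *q,
                                     const uint8_t window[FETCH_WINDOW_BYTES])
{
    unsigned pushed = 0;
    for (unsigned off = 0; off < FETCH_WINDOW_BYTES; off += INSN_BYTES) {
        if (q->count == QUEUE_DEPTH)
            break;
        uint32_t word;
        memcpy(&word, window + off, INSN_BYTES); /* 4 bytes -> one instruction word */
        q->insn[q->count++] = word;
        pushed++;
    }
    return pushed;
}

int main(void)
{
    decode_queue_t q = { .count = 0 };
    uint8_t window[FETCH_WINDOW_BYTES] = { 0 }; /* stand-in for fetched bytes */
    unsigned n = enqueue_fetch_window(&q, window);
    printf("enqueued %u instruction words\n", n); /* 8 */
    return 0;
}
</syntaxhighlight>
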
Facts about "Vulcan":
* codename: Vulcan
* core count: 16, 20, 24, 28, 30, 32
* designer: Cavium, Broadcom
* first launched: 2018
* instance of: microarchitecture
* instruction set architecture: ARMv8.1
* manufacturer: TSMC
* microarchitecture type: CPU
* name: Vulcan
* pipeline stages (max): 15
* pipeline stages (min): 13
* process: 16 nm