From WikiChip
Editing nvidia/microarchitectures/denver

Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.

This page supports semantic in-text annotations (e.g. "[[Is specified as::World Heritage Site]]") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help pages.

Latest revision Your text
Line 28: Line 28:
  
 
== Architecture ==
 
== Architecture ==
Denver is 7-wide in-order superscalar. It has ARMv8 hardware decoder (A32, T32, and A64 modes) which can generate up to 2 micro-ops per cycle. Also it can execute up to 7 micro-ops per-cycle directly from L1i cache. Denver has 7 execution units: 1 branch, 2 integer (1 has hardware multiply module), 2 FP/NEON (128-bit), 2 Load/Store units.
+
Denver is 7-wide in-order superscalar. It has ARMv8 hardware decoder which can generate up to 2 micro-ops per cycle. Also it can execute up to 7 micro-ops per-cycle directly from L1i cache. Denver has 7 execution units: 1 branch, 2 integer (1 has hardware multiply module), 2 FP/NEON (128-bit), 2 Load/Store units.
  
Denver 2 has dynamic branch prediction with Branch Target Buffer and Global History Buffer (Conditional Direction Predictor - gshare-agree). It also has Return Stack Buffer, Indirect Target Predictor and static predictor.
+
Denver 2 has dynamic branch prediction with Branch Target Buffer and Global History Buffer. It also has return stack buffer and indirect predictor.
  
 
Pipeline of Denver 1 has 15 stages, mispredict penalty is 13 cycles.
 
Pipeline of Denver 1 has 15 stages, mispredict penalty is 13 cycles.
Line 42: Line 42:
  
 
=== Dynamic Code Optimization ===
 
=== Dynamic Code Optimization ===
For often executed code optimization micro-interrupt can be generated and firmware-based optimizer is started. Using "Dynamic Profile Information" optimizer can translate ARMv8 instructions into optimized microcode sequence and save it into Optimization Cache. Then Denver will execute code directly from Optimization Cache (part of 128 MiB microcode carve-out) without using hardware ARMv8 decoder. Several microcode sequences may be chained.
+
For often executed code optimization micro-interrupt can be generated and firmware-based optimizer is started. Using "Dynamic Profile Information" optimizer can translate ARMv8 instructions into optimized microcode sequence and save it into Optimization Cache. Then Denver will execute code directly from Optimization Cache without using hardware ARMv8 decoder. Several microcode sequences may be chained
  
 
In 2014 Nvidia listed several optimizations for the dynamic code translation:
 
In 2014 Nvidia listed several optimizations for the dynamic code translation:

Please note that all contributions to WikiChip may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see WikiChip:Copyrights for details). Do not submit copyrighted work without permission!

Cancel | Editing help (opens in new window)
codenameDenver +
core count2 +
designerNvidia +
first launched2014 +
full page namenvidia/microarchitectures/denver +
instance ofmicroarchitecture +
instruction set architectureARMv8 +
l1$ size384 KiB (393,216 B, 0.375 MiB) +
l1d$ description4-way set associative +
l1d$ size128 KiB (131,072 B, 0.125 MiB) +
l1i$ description4-way set associative +
l1i$ size256 KiB (262,144 B, 0.25 MiB) +
l2$ description16-way set associative +
l2$ size2 MiB (2,048 KiB, 2,097,152 B, 0.00195 GiB) +
manufacturerTSMC +
microarchitecture typeCPU +
nameDenver +
process28 nm (0.028 μm, 2.8e-5 mm) + and 16 nm (0.016 μm, 1.6e-5 mm) +