From WikiChip
Difference between revisions of "nvidia/microarchitectures/denver"

m (A5b moved page nvidia/denver to nvidia/microarchitectures/denver: Move to microarchitectures subsection)
(Architecture: stealing pipeline table from section 12 of ibm/microarchitectures/power9)

Revision as of 12:28, 16 June 2018

Denver µarch
General Info
Arch Type: CPU
Designer: Nvidia
Manufacturer: TSMC
Introduction: 2014
Process: 28 nm
Core Configs: 2
Pipeline
Type: Superscalar
OoOE: No
Decode: 2-way
Instructions
ISA: ARMv8
Cache
L1I Cache: 128 KiB/core, 4-way set associative
L1D Cache: 64 KiB/core, 4-way set associative
L2 Cache: 2 MiB/core, 16-way set associative

Denver is a CPU microarchitecture from Nvidia, introduced in 2014, that executes ARMv8 code either natively or with the help of dynamic code optimization. The native ARM decoder can issue up to 2 instructions per cycle; when dynamic code translation is used, up to 7 micro-operations can start per cycle.

Architecture

Denver is a 7-wide superscalar design. Its hardware ARMv8 decoder can generate up to 2 micro-ops per cycle, and the core can also execute up to 7 micro-ops per cycle directly from the L1I cache. Denver has 7 execution units: 1 branch unit, 2 integer units (one of which has a hardware multiplier), 2 FP/NEON units (128-bit), and 2 load/store units.
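The unit counts above can be turned into a small issue-width check. The following is a minimal sketch, not a description of Denver's real issue logic: the unit-class names, the `fits_in_one_cycle` helper, and the treatment of a multiply as an integer op that also needs the single multiply-capable unit are all assumptions made for illustration.

```python
from collections import Counter

# Execution-unit counts quoted in the text above (1 + 2 + 2 + 2 = 7).
UNITS = {"branch": 1, "integer": 2, "fp_neon": 2, "load_store": 2}

# Only one of the two integer units has the hardware multiplier.
MULTIPLY_CAPABLE_INT_UNITS = 1


def fits_in_one_cycle(micro_ops):
    """Return True if a bundle of micro-ops could start in one cycle.

    micro_ops is a list of unit-class names (hypothetical labels).
    "multiply" is modelled as an integer op that additionally needs
    the multiply-capable integer unit.
    """
    counts = Counter(micro_ops)
    multiplies = counts.pop("multiply", 0)
    if multiplies > MULTIPLY_CAPABLE_INT_UNITS:
        return False
    # A multiply still occupies one of the two integer units.
    counts["integer"] += multiplies
    # Unknown unit classes have zero capacity and are rejected.
    return all(counts[kind] <= UNITS.get(kind, 0) for kind in counts)
```

A full 7-op bundle fits only when it uses every unit exactly once, for example one branch, one integer op, one multiply, two FP/NEON ops, and two loads or stores.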

The Denver 1 pipeline has 15 stages, and the branch mispredict penalty is 13 cycles.

Stage name:   | IP1  | IC2   | IW3     | IN4 | IN5 | SB1  | SB2 | EB0   | EB1    | EA2 | ED3 | EL4    | EE5 | ES6 | EW7
Stage action: | ITLB | I$ Rd | Way Sel | Dec | PB  | Pick | Sch | RF Rd | Bypass | -   | -   | Bypass | ALU | -   | RF wr
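For readers who want to work with the stage list programmatically, the table above can be written as data. This is a sketch only: the helper names are hypothetical, and grouping stages by the first letter of their name (I, S, E) is an assumption about the naming scheme, not something the source states.

```python
# Denver 1 pipeline stages as (name, action) pairs, copied from the table above.
# An action of None marks a stage whose action the table gives as "-".
PIPELINE = [
    ("IP1", "ITLB"), ("IC2", "I$ Rd"), ("IW3", "Way Sel"), ("IN4", "Dec"),
    ("IN5", "PB"), ("SB1", "Pick"), ("SB2", "Sch"), ("EB0", "RF Rd"),
    ("EB1", "Bypass"), ("EA2", None), ("ED3", None), ("EL4", "Bypass"),
    ("EE5", "ALU"), ("ES6", None), ("EW7", "RF wr"),
]


def stage_position(name):
    """Return the 1-based position of a stage in the pipeline."""
    names = [stage for stage, _ in PIPELINE]
    return names.index(name) + 1


def count_by_prefix():
    """Count stages by the first letter of their name (assumed grouping)."""
    counts = {}
    for stage, _ in PIPELINE:
        counts[stage[0]] = counts.get(stage[0], 0) + 1
    return counts
```

With this layout, `len(PIPELINE)` gives the 15 stages quoted above and `stage_position("EE5")` places the ALU stage 13th. Whether that position is related to the 13-cycle mispredict penalty is not stated in the source.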

Dynamic Code Optimization

For frequently executed code, an optimization micro-interrupt can be generated, which starts the firmware-based optimizer. Using "Dynamic Profile Information", the optimizer can translate ARMv8 instructions into an optimized microcode sequence and save it in the Optimization Cache. Denver then executes that code directly from the Optimization Cache without using the hardware ARMv8 decoder. Several microcode sequences may be chained.
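The flow described above (profile, optimize once hot, then run from the Optimization Cache) can be sketched as a toy model. Everything here is an assumption made for illustration: the `ToyDenver` class, the per-region execution counter standing in for "Dynamic Profile Information", and the `HOT_THRESHOLD` value, since the source gives no threshold and does not describe the optimizer's internals.

```python
HOT_THRESHOLD = 3  # hypothetical; the source does not give a threshold


class ToyDenver:
    """Toy model of the two execution paths described above."""

    def __init__(self):
        self.exec_counts = {}         # stand-in for "Dynamic Profile Information"
        self.optimization_cache = {}  # region address -> optimized sequence

    def optimize(self, arm_instructions):
        # Placeholder for the firmware optimizer: just tags each instruction.
        return tuple(f"uop({insn})" for insn in arm_instructions)

    def run_region(self, address, arm_instructions):
        """Execute one code region and return the path that was used."""
        if address in self.optimization_cache:
            return "optimization_cache"
        count = self.exec_counts.get(address, 0) + 1
        self.exec_counts[address] = count
        if count >= HOT_THRESHOLD:
            # Models the optimization micro-interrupt starting the optimizer.
            self.optimization_cache[address] = self.optimize(arm_instructions)
        return "hardware_decoder"
```

In this model a region runs through the hardware decoder until it has executed `HOT_THRESHOLD` times, and every later execution is served from the cache. Chaining of cached sequences is not modelled.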

In 2014 Nvidia listed several optimizations for the dynamic code translation:

  • Unrolls Loops
  • Renames registers
  • Reorders Loads and Stores
  • Improves control flow
  • Removes unused computation
  • Hoists redundant computation
  • Sinks uncommonly executed computation
  • Improves scheduling
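Two of the listed optimizations, hoisting redundant computation and loop unrolling, can be illustrated at source level. This is only an analogy under stated assumptions: Denver's optimizer works on microcode sequences, not on Python, and the function names below are hypothetical.

```python
def scaled_sum_naive(values, a, b):
    """Reference loop: recomputes a * b on every iteration."""
    total = 0
    for v in values:
        scale = a * b  # redundant: does not change inside the loop
        total += v * scale
    return total


def scaled_sum_optimized(values, a, b):
    """Same result with the invariant hoisted and the loop unrolled by 2."""
    scale = a * b  # hoisted out of the loop
    total = 0
    n = len(values)
    i = 0
    while i + 1 < n:  # unrolled body handles two elements per iteration
        total += values[i] * scale
        total += values[i + 1] * scale
        i += 2
    if i < n:  # leftover element when the length is odd
        total += values[i] * scale
    return total
```

Both functions return the same value for integer inputs; the second does less work per element, which is the kind of effect these optimizations aim for.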

Products

Denver is used in Nvidia's Tegra K1-64 (2014, 28 nm).

Denver 2 is used in Nvidia's Tegra Parker (2016, 16 nm TSMC). The Parker SoC also uses 4 Cortex-A57 cores.

Die

All Denver Chips

 List of all Denver Chips
 Main processor: Model | Launched | Designer | Family | Process | Core | C | T | L2$ | L3$ | Frequency | Max Mem
 IGP: Designer | Name | Frequency
Count: 0

