Difference between revisions of "nvidia/microarchitectures/denver"

	Denver µarch
	General Info
Arch Type	CPU
Designer	Nvidia
Manufacturer	TSMC
Introduction	2014
Process	28 nm
Core Configs	2
	Pipeline
Type	Superscalar
OoOE	No
Decode	2-way
	Instructions
ISA	ARMv8
	Cache
L1I Cache	128 KiB/core; 4-way set associative
L1D Cache	64 KiB/core; 4-way set associative
L2 Cache	2 MiB/core; 16-way set associative

Revision as of 13:58, 16 June 2018

Denver is a CPU microarchitecture from Nvidia, introduced in 2014, that executes ARMv8 code either natively through a hardware decoder or with the help of dynamic code optimization. The native ARM decoder can issue up to 2 instructions per cycle; when dynamic code translation is used, up to 7 micro-operations can be started per cycle.

Architecture

Denver is a 7-wide superscalar design. Its hardware ARMv8 decoder can generate up to 2 micro-ops per cycle, and the core can also execute up to 7 micro-ops per cycle directly from the L1 instruction cache. Denver has 7 execution units: 1 branch unit, 2 integer units (one of which has a hardware multiply module), 2 FP/NEON units (128-bit), and 2 load/store units.
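
The unit counts above put a simple lower bound on how quickly a group of micro-ops can be issued. The following is a minimal sketch, not a model of the real scheduler: it ignores dependencies, latencies, cache misses, and the fact that only one integer unit can multiply. The function name `min_cycles` and the example mix are hypothetical.

```python
from math import ceil

# Execution-unit counts as listed in the text above.
UNITS = {"branch": 1, "integer": 2, "fp_neon": 2, "load_store": 2}
ISSUE_WIDTH = sum(UNITS.values())  # 7 micro-ops per cycle


def min_cycles(uop_counts):
    """Lower bound on cycles needed to issue the given micro-op mix.

    Simplifying assumption: no dependencies, single-cycle occupancy per unit.
    """
    total = sum(uop_counts.values())
    # Bound 1: no more than ISSUE_WIDTH micro-ops start per cycle.
    width_bound = ceil(total / ISSUE_WIDTH)
    # Bound 2: each unit type can start at most UNITS[kind] micro-ops per cycle.
    unit_bound = max(
        (ceil(n / UNITS[kind]) for kind, n in uop_counts.items()), default=0
    )
    return max(width_bound, unit_bound)


# Hypothetical mix of 14 micro-ops dominated by loads and stores.
mix = {"branch": 1, "integer": 4, "fp_neon": 3, "load_store": 6}
print(min_cycles(mix))  # 3: limited by the two load/store units, not the 7-wide issue
```

In this example the 7-wide issue limit alone would allow 2 cycles, but 6 memory micro-ops on 2 load/store units need at least 3.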

The Denver 1 pipeline has 15 stages, and the branch mispredict penalty is 13 cycles.

Stage name:	IP1	IC2	IW3	IN4	IN5	SB1	SB2	EB0	EB1	EA2	ED3	EL4	EE5	ES6	EW7
Stage action:	ITLB	I$ Rd	Way Sel	Decode	Fetch Q	Pick	Sched	RF Rd	Bypass	Ld Addr	D$ Read	Bypass	ALU/Execute	St Addr	RF wr
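
The stage listing can be written down as data to check the figures quoted above. This is only a sketch: the pairing of names with actions follows the order in which they are listed, and the idea that branches resolve in the ALU/Execute stage (EE5) is an assumption used to illustrate where a 13-cycle penalty could come from, not something the source states.

```python
# Stage names and actions in pipeline order, as listed above.
STAGES = [
    ("IP1", "ITLB"), ("IC2", "I$ Rd"), ("IW3", "Way Sel"), ("IN4", "Decode"),
    ("IN5", "Fetch Q"), ("SB1", "Pick"), ("SB2", "Sched"), ("EB0", "RF Rd"),
    ("EB1", "Bypass"), ("EA2", "Ld Addr"), ("ED3", "D$ Read"), ("EL4", "Bypass"),
    ("EE5", "ALU/Execute"), ("ES6", "St Addr"), ("EW7", "RF wr"),
]


def stage_position(name):
    """Return the 1-based position of a stage in the pipeline."""
    names = [stage for stage, _ in STAGES]
    return names.index(name) + 1


print(len(STAGES))            # 15 stages in total
print(stage_position("EE5"))  # 13: ALU/Execute is the 13th stage (assumed branch resolution point)
```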

Dynamic Code Optimization

When code is executed frequently, an optimization micro-interrupt can be generated to start the firmware-based optimizer. Using "Dynamic Profile Information", the optimizer can translate ARMv8 instructions into an optimized microcode sequence and save it in the Optimization Cache. Denver then executes that code directly from the Optimization Cache, without using the hardware ARMv8 decoder. Several microcode sequences may be chained.
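
The flow described above (profile, translate once hot, then run from the Optimization Cache) can be illustrated with a toy model. Everything here is an assumption made for illustration: the class name, the per-address counter, and the threshold of 3 are hypothetical, and the real trigger condition and translation granularity are not given in the source.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical value; the real trigger condition is not documented here


class DynamicOptimizerModel:
    """Toy model: profile code regions, translate hot ones, then run them from a cache."""

    def __init__(self):
        self.profile = Counter()       # stands in for "Dynamic Profile Information"
        self.optimization_cache = {}   # region address -> translated sequence

    def translate(self, address):
        # Placeholder for the firmware optimizer; it only returns a label.
        return f"optimized@{address:#x}"

    def execute(self, address):
        """Return which path ran the region: 'optimized' or 'decoder'."""
        if address in self.optimization_cache:
            return "optimized"
        self.profile[address] += 1
        if self.profile[address] >= HOT_THRESHOLD:
            # Models the micro-interrupt that starts the optimizer.
            self.optimization_cache[address] = self.translate(address)
        return "decoder"


cpu = DynamicOptimizerModel()
paths = [cpu.execute(0x1000) for _ in range(5)]
print(paths)  # ['decoder', 'decoder', 'decoder', 'optimized', 'optimized']
```

The first three runs go through the decoder path while the profile counter climbs; once the region is cached, later runs bypass the decoder.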

In 2014 Nvidia listed several optimizations for the dynamic code translation:

Unrolls Loops
Renames registers
Reorders Loads and Stores
Improves control flow
Removes unused computation
Hoists redundant computation
Sinks uncommonly executed computation
Improves scheduling
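
As an illustration of one item in this list, "removes unused computation", the sketch below drops operations whose results are never read, using a backward liveness scan over a straight-line block. The tuple-based intermediate form and the function name `remove_unused` are hypothetical; Nvidia's optimizer and its internal representation are not described in the source.

```python
def remove_unused(ops, live_out):
    """Drop operations whose result is never read (backward liveness scan).

    Each op is a (dest, sources) pair. Assumes ops have no side effects.
    """
    live = set(live_out)
    kept = []
    for dest, sources in reversed(ops):
        if dest in live:
            # The result is needed: keep the op and mark its inputs as live.
            live.discard(dest)
            live.update(sources)
            kept.append((dest, sources))
    kept.reverse()
    return kept


block = [
    ("r1", ("r0",)),
    ("r2", ("r0",)),  # r2 is never read afterwards, so this op is removable
    ("r3", ("r1",)),
]
print(remove_unused(block, live_out={"r3"}))  # [('r1', ('r0',)), ('r3', ('r1',))]
```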

Products

Denver is used in Nvidia's Tegra K1-64 (2014, 28 nm).

Denver 2 is used in Nvidia's Tegra Parker (2016, 16 nm TSMC). The Parker SoC (Tegra X2) also uses 4 Cortex-A57 cores.

Die

All Denver Chips

	List of all Denver Chips
	Main processor										IGP
Model	Launched	Designer	Family	Process	Core	C	T	L2$	L3$	Frequency	Max Mem	Designer	Name	Frequency
Count: 0

References

Linley Gwennap, "Nvidia's First CPU Is a Winner: Denver Uses Dynamic Translation to Outperform Mobile Rivals" (August 18, 2014)
Darrell Boggs, "Nvidia's Denver Processor", IEEE HotChips 26 (HC26), 2014
https://www.anandtech.com/tag/project-denver

@@ Line 1: / Line 1: @@
 {{nvidia title|Denver|arch}}
 {{microarchitecture
-| atype         = CPU
+|atype=CPU
-| name          = Denver
+|name=Denver
-| designer      = Nvidia
+|designer=Nvidia
-| manufacturer  = TSMC <!-- ??? -->>
+|manufacturer=TSMC
-| introduction  = 2014
+|introduction=2014
-| phase-out     =
+|process=28 nm
-| process       = 28 nm
+|cores=2
-| cores         = 2
+|type=Superscalar
-| cores 2       =
+|oooe=No
-| cores N       =
+|decode=2-way
+|isa=ARMv8
-| type          = Superscalar
+|l1i=128 KiB
-| type 2        =
+|l1i per=core
-| type N        =
+|l1i desc=4-way set associative
-| oooe          = No
+|l1d=64 KiB
-| speculative   = <!-- Yes or No only -->
+|l1d per=core
-| renaming      = <!-- Yes or No only -->
+|l1d desc=4-way set associative
-| stages        = <!-- ONLY IF FIXED SIZE, otherwise use below for range -->
+|l2=2 MiB
-| stages min    =
+|l2 per=core
-| stages max    =
+|l2 desc=16-way set associative
-| decode        = 2-way
-| isa           = ARMv8
-| isa 2         =
-| isa N         =
-| feature       =
-| extension     =
-| extension 2   =
-| extension N   =
-| l1i           = 128 KiB
-| l1i per       = core
-| l1i desc      = 4-way set associative
-| l1d           = 64 KiB
-| l1d per       = core
-| l1d desc      = 4-way set associative
-| l2            = 2 MiB
-| l2 per        = core
-| l2 desc       = 16-way set associative
-| l3            =
-| l3 per        =
-| l3 desc       =
-| core name        =
-| core name 2      =
-| core name N      =
-| predecessor      =
-| predecessor link =
-| successor        =
-| successor link   =
-| successor 2      =
-| successor 2 link =
-| successor N      =
-| successor N link =
 }}
 '''Denver''' is a CPU microarchitecture from [[Nvidia]] introduced in 2014, capable of executing ARMv8 code natively and with help of dynamic code optimization. Native ARM decoder can issue up to 2 instructions per cycle, and up to 7 micro-operations are started per cycle when dynamic code translation is used.
