Revision as of 13:09, 16 June 2018
| Denver µarch | |
| General Info | |
| Arch Type | CPU | 
| Designer | Nvidia | 
| Manufacturer | TSMC | 
| Introduction | 2014 | 
| Process | 28 nm | 
| Core Configs | 2 | 
| Pipeline | |
| Type | Superscalar | 
| OoOE | No | 
| Decode | 2-way | 
| Instructions | |
| ISA | ARMv8 | 
| Cache | |
| L1I Cache | 128 KiB/core 4-way set associative | 
| L1D Cache | 64 KiB/core 4-way set associative | 
| L2 Cache | 2 MiB/core 16-way set associative | 
Denver is a CPU microarchitecture from Nvidia, introduced in 2014. It executes ARMv8 code either natively, through a hardware decoder, or with the help of dynamic code optimization. The native ARM decoder can issue up to 2 instructions per cycle; when dynamically translated code is used, up to 7 micro-operations can be started per cycle.
Architecture
Denver is a 7-wide superscalar design. Its hardware ARMv8 decoder can generate up to 2 micro-ops per cycle, and the core can also execute up to 7 micro-ops per cycle directly from the L1 instruction cache. Denver has 7 execution units: 1 branch unit, 2 integer units (one of which has a hardware multiplier), 2 FP/NEON units (128-bit), and 2 load/store units.
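The execution-unit mix described above can be turned into a small resource check. The following is only a sketch under simplifying assumptions: one micro-op per unit per cycle and no other issue constraints. The names `UNITS` and `fits_in_one_cycle` and the micro-op kind labels are hypothetical and do not come from Nvidia's documentation.

```python
from collections import Counter

# Execution-unit counts as listed in the text above.
UNITS = {"branch": 1, "int": 2, "fp": 2, "ldst": 2}
INT_UNITS_WITH_MULTIPLIER = 1  # only one integer unit has a multiplier

TOTAL_UNITS = sum(UNITS.values())  # the "7-wide" figure


def fits_in_one_cycle(ops):
    """Return True if all micro-ops could start in the same cycle.

    `ops` is a list of strings from {"branch", "int", "mul", "fp", "ldst"}.
    "mul" is an integer op that needs the multiplier-equipped unit.
    Assumption: one micro-op per unit per cycle, no other constraints.
    """
    counts = Counter(ops)
    muls = counts.pop("mul", 0)
    if muls > INT_UNITS_WITH_MULTIPLIER:
        return False
    counts["int"] += muls  # a multiply still occupies an integer unit
    # Unknown kinds have zero units available, so they never fit.
    return all(counts[kind] <= UNITS.get(kind, 0) for kind in counts)
```

Under this model, a bundle of one branch, one plain integer op, one multiply, two FP/NEON ops, and two loads or stores fills all 7 units, while two multiplies in the same cycle do not fit.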
The pipeline has 15 stages: IP1 (ITLB), IC2 (I$ Rd), IW3 (Way Sel), IN4 (Dec), IN5 (PB), SB1 (Pick), SB2 (Sch), EB0 (RF Rd), EB1 (Bypass), EA2, ED3, EL4 (Bypass), EE5 (ALU), ES6, EW7 (RF Wr). The branch mispredict penalty is 13 cycles.
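To make the cost of the 13-cycle mispredict penalty concrete, the sketch below lists the stage names from the text and applies the textbook estimate of cycles per instruction with branch mispredictions. The function `effective_cpi` and its input values are illustrative assumptions, not measured Denver figures.

```python
# Stage names as given in the text above.
STAGES = [
    "IP1", "IC2", "IW3", "IN4", "IN5",
    "SB1", "SB2",
    "EB0", "EB1", "EA2", "ED3", "EL4", "EE5", "ES6", "EW7",
]
MISPREDICT_PENALTY = 13  # cycles, from the text


def effective_cpi(base_cpi, branch_fraction, mispredict_rate):
    """Estimate cycles per instruction including mispredict stalls.

    base_cpi:        CPI with perfect branch prediction (assumed input)
    branch_fraction: fraction of instructions that are branches
    mispredict_rate: fraction of branches that are mispredicted
    """
    stall = branch_fraction * mispredict_rate * MISPREDICT_PENALTY
    return base_cpi + stall


# Hypothetical workload: 20% branches, 5% of them mispredicted.
example = effective_cpi(base_cpi=0.5, branch_fraction=0.2, mispredict_rate=0.05)
```

With these hypothetical inputs the mispredict term adds 0.2 × 0.05 × 13 = 0.13 cycles per instruction on top of the base CPI.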
Dynamic Code Optimization
For frequently executed code, a micro-interrupt can be generated to start a firmware-based optimizer. Using "Dynamic Profile Information", the optimizer can translate ARMv8 instructions into an optimized microcode sequence and save it in the Optimization Cache. Denver then executes that code directly from the Optimization Cache without using the hardware ARMv8 decoder. Several microcode sequences may be chained.
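The flow described above (profile, translate once, then run from the Optimization Cache) can be modeled in a few lines. This is a toy sketch, not Denver's actual mechanism: the execution-count threshold, the class name `DynamicOptimizerModel`, and the placeholder `optimize` step are all assumptions made for illustration, since the text does not say how hot code is detected.

```python
class DynamicOptimizerModel:
    """Toy model of hot-code detection plus an optimization cache."""

    def __init__(self, hot_threshold=3):
        # Assumption: a region becomes "hot" after this many executions.
        self.hot_threshold = hot_threshold
        self.exec_counts = {}         # address -> times run via the decoder
        self.optimization_cache = {}  # address -> translated sequence

    def optimize(self, arm_instructions):
        # Placeholder for the firmware optimizer: a real one would reorder,
        # rename, and unroll; here each instruction is just wrapped.
        return tuple(f"uop({insn})" for insn in arm_instructions)

    def execute(self, address, arm_instructions):
        """Return which path would run the code at `address`."""
        if address in self.optimization_cache:
            return "optimization-cache"
        self.exec_counts[address] = self.exec_counts.get(address, 0) + 1
        if self.exec_counts[address] >= self.hot_threshold:
            # Stands in for the micro-interrupt that starts the optimizer.
            self.optimization_cache[address] = self.optimize(arm_instructions)
        return "hardware-decoder"


model = DynamicOptimizerModel(hot_threshold=3)
paths = [model.execute(0x1000, ["add", "ldr"]) for _ in range(5)]
```

In this model the first three executions go through the hardware decoder, the third one triggers translation, and later executions are served from the cache.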
In 2014 Nvidia listed several optimizations for the dynamic code translation:
- Unrolls Loops
- Renames registers
- Reorders Loads and Stores
- Improves control flow
- Removes unused computation
- Hoists redundant computation
- Sinks uncommonly executed computation
- Improves scheduling
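Three of the listed transformations (hoisting redundant computation, removing unused computation, and loop unrolling) are easier to picture with a before-and-after pair. The sketch below shows them at source level in Python purely as an analogy; Denver's optimizer works on ARMv8 instructions and micro-ops, and the function names here are hypothetical.

```python
def scale_sum_naive(values, a, b):
    """Straightforward loop, as the code might originally be written."""
    total = 0
    for v in values:
        factor = a * b  # loop-invariant: recomputed on every iteration
        unused = v * 2  # result is never read: unused computation
        total += v * factor
    return total


def scale_sum_optimized(values, a, b):
    """Same result after hoisting, dead-code removal, and 2x unrolling."""
    factor = a * b  # hoisted out of the loop
    total = 0
    n = len(values)
    i = 0
    # Loop unrolled by two: half as many loop-control checks.
    while i + 1 < n:
        total += values[i] * factor
        total += values[i + 1] * factor
        i += 2
    # Remainder iteration for odd-length inputs.
    if i < n:
        total += values[i] * factor
    return total
```

Both functions return the same value for any input; only the amount of work per iteration differs.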
Products
Denver is used in Tegra K1-64.
Die
All Denver Chips
List of all Denver Chips

| Model | Launched | Designer | Family | Process | Core | Cores | Threads | L2$ | L3$ | Frequency | Max Mem | IGP Designer | IGP Name | IGP Frequency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Count: 0
References
- NVIDIA’S FIRST CPU IS A WINNER. Denver Uses Dynamic Translation to Outperform Mobile Rivals. - Linley Gwennap (August 18, 2014)
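- IEEE HotChips 26 (HC26), 2014 - Darrell Boggs, "Nvidia's Denver Processor": https://www.hotchips.org/wp-content/uploads/hc_archives/hc26/HC26-11-day1-epub/HC26.11-2-Mobile-Processors-epub/HC26.11.234-Denver-Darrell.Boggs-NVIDIA-rev4.pdf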
- https://www.anandtech.com/tag/project-denver