From WikiChip
Editing nvidia/microarchitectures/denver
Latest revision | Your text | ||
Line 1: | Line 1: | ||
− | {{nvidia title| | + | {{nvidia title|denver}} |
{{microarchitecture | {{microarchitecture | ||
− | |atype=CPU | + | | atype = CPU |
− | |name=Denver | + | | name = Denver |
− | |designer=Nvidia | + | | designer = Nvidia |
− | |manufacturer=TSMC | + | | manufacturer = TSMC <!-- ??? --> |
− | |introduction=2014 | + | | introduction = 2014 |
− | |process=28 nm | + | | phase-out = |
− | | | + | | process = 28 nm |
− | |cores=2 | + | | cores = 2 |
− | |type= | + | | cores 2 = |
− | |oooe=No | + | | cores N = |
− | |decode=2-way | + | |
− | |isa=ARMv8 | + | | type = Superscalar |
− | |l1i=128 KiB | + | | type 2 = |
− | |l1i per=core | + | | type N = |
− | |l1i desc=4-way set associative | + | | oooe = No |
− | |l1d=64 KiB | + | | speculative = <!-- Yes or No only --> |
− | |l1d per=core | + | | renaming = <!-- Yes or No only --> |
− | |l1d desc=4-way set associative | + | | stages = <!-- ONLY IF FIXED SIZE, otherwise use below for range --> |
− | |l2=2 MiB | + | | stages min = |
− | |l2 per= | + | | stages max = |
− | |l2 desc=16-way set associative | + | | decode = 2-way |
− | |successor= | + | |
− | |successor link= | + | | isa = ARMv8 |
+ | | isa 2 = | ||
+ | | isa N = | ||
+ | | feature = | ||
+ | | extension = | ||
+ | | extension 2 = | ||
+ | | extension N = | ||
+ | |||
+ | | l1i = 128 KiB | ||
+ | | l1i per = core | ||
+ | | l1i desc = 4-way set associative | ||
+ | | l1d = 64 KiB | ||
+ | | l1d per = core | ||
+ | | l1d desc = 4-way set associative | ||
+ | | l2 = 2 MiB | ||
+ | | l2 per = core | ||
+ | | l2 desc = 16-way set associative | ||
+ | | l3 = | ||
+ | | l3 per = | ||
+ | | l3 desc = | ||
+ | |||
+ | | core name = | ||
+ | | core name 2 = | ||
+ | | core name N = | ||
+ | |||
+ | | predecessor = | ||
+ | | predecessor link = | ||
+ | | successor = | ||
+ | | successor link = | ||
+ | | successor 2 = | ||
+ | | successor 2 link = | ||
+ | | successor N = | ||
+ | | successor N link = | ||
}} | }} | ||
'''Denver''' is a CPU microarchitecture from [[Nvidia]] introduced in 2014. It executes ARMv8 code either natively or with the help of dynamic code optimization. The native ARM decoder can issue up to 2 instructions per cycle, and up to 7 micro-operations can be started per cycle when dynamic code translation is used. | '''Denver''' is a CPU microarchitecture from [[Nvidia]] introduced in 2014. It executes ARMv8 code either natively or with the help of dynamic code optimization. The native ARM decoder can issue up to 2 instructions per cycle, and up to 7 micro-operations can be started per cycle when dynamic code translation is used. | ||
== Architecture == | == Architecture == | ||
− | Denver is 7-wide | + | Denver is a 7-wide superscalar design. It has an ARMv8 hardware decoder that can generate up to 2 micro-ops per cycle, and it can also execute up to 7 micro-ops per cycle directly from the L1i cache. Denver has 7 execution units: 1 branch, 2 integer (one with a hardware multiplier), 2 FP/NEON (128-bit), and 2 load/store. |
− | + | The pipeline has 15 stages: IP1 (ITLB), IC2 (I$ Rd), IW3 (Way Sel), IN4 (Dec), IN5 (PB), SB1 (Pick), SB2 (Sch), EB0 (RF Rd), EB1 (Bypass), EA2, ED3, EL4 (Bypass), EE5 (ALU), ES6, EW7 (RF Wr). The branch mispredict penalty is 13 cycles. |
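The stage list above can be checked with a short script. This is only an illustrative sketch: the stage names are copied from the text, while the idea that a branch resolves in EE5 (ALU) is an assumption made here, not something the cited sources state. Under that assumption the number of stages from IP1 through EE5 happens to equal the quoted 13-cycle penalty.

```python
# Stage names copied from the pipeline description; grouping is by name prefix only.
PIPELINE_STAGES = [
    "IP1", "IC2", "IW3", "IN4", "IN5",                        # I* stages
    "SB1", "SB2",                                             # S* stages
    "EB0", "EB1", "EA2", "ED3", "EL4", "EE5", "ES6", "EW7",   # E* stages
]


def stages_through(stage: str) -> int:
    """Count stages from the first stage up to and including `stage`."""
    return PIPELINE_STAGES.index(stage) + 1


total_stages = len(PIPELINE_STAGES)

# ASSUMPTION (not from the source): branches resolve in the EE5 (ALU) stage.
stages_to_alu = stages_through("EE5")

print(f"total stages: {total_stages}")
print(f"stages through EE5: {stages_to_alu}")
```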
=== Dynamic Code Optimization === | === Dynamic Code Optimization === | ||
− | For frequently executed code, an optimization micro-interrupt can be generated and a firmware-based optimizer is started. Using "Dynamic Profile Information", the optimizer can translate ARMv8 instructions into an optimized microcode sequence and save it in the Optimization Cache. Denver then executes the code directly from the Optimization Cache | + | For frequently executed code, an optimization micro-interrupt can be generated and a firmware-based optimizer is started. Using "Dynamic Profile Information", the optimizer can translate ARMv8 instructions into an optimized microcode sequence and save it in the Optimization Cache. Denver then executes the code directly from the Optimization Cache without using the hardware ARMv8 decoder. Several microcode sequences may be chained |
In 2014 Nvidia listed several optimizations for the dynamic code translation: | In 2014 Nvidia listed several optimizations for the dynamic code translation: | ||
Line 53: | Line 76: | ||
*Sinks uncommonly executed computation | *Sinks uncommonly executed computation | ||
*Improves scheduling | *Improves scheduling | ||
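The profile-then-translate flow described in this section can be illustrated with a toy model. This is a minimal sketch, not Nvidia's implementation: the class name, the hotness threshold, the region address, and the placeholder `optimize` step are all hypothetical. It only shows the control flow in which a region runs through the hardware decoder until it is seen often enough, after which it is served from an optimization cache.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical value; the source gives no threshold


class ToyDynamicOptimizer:
    """Toy model of profile-driven translation into an optimization cache."""

    def __init__(self, threshold: int = HOT_THRESHOLD):
        self.threshold = threshold
        self.profile = Counter()       # stand-in for "Dynamic Profile Information"
        self.optimization_cache = {}   # region address -> translated sequence

    def optimize(self, arm_instructions):
        # Placeholder for the firmware optimizer: it only wraps each
        # instruction and performs none of the listed optimizations.
        return tuple(f"uop({insn})" for insn in arm_instructions)

    def execute(self, address, arm_instructions):
        """Return the path that ran this region: 'optimized' or 'decoder'."""
        if address in self.optimization_cache:
            return "optimized"
        # Not translated yet: count the execution and run via the decoder.
        self.profile[address] += 1
        if self.profile[address] >= self.threshold:
            self.optimization_cache[address] = self.optimize(arm_instructions)
        return "decoder"


opt = ToyDynamicOptimizer()
region = ["ldr x0, [x1]", "add x0, x0, #1", "str x0, [x1]"]
paths = [opt.execute(0x1000, region) for _ in range(5)]
print(paths)
```

In this model the first three executions take the decoder path and later ones hit the cache; real hardware would also have to handle cache eviction and chaining of translated sequences, which the sketch leaves out.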
== Products == | == Products == | ||
− | Denver is used in Nvidia's Tegra K1-64 (2014, 28 nm | + | Denver is used in Nvidia's Tegra K1-64 (2014, 28 nm) |
− | Denver 2 is used in Nvidia's | + | Denver 2 is used in Nvidia's Tegra Parker (2016, 16 nm TSMC). The Parker SoC also uses 4 [[Cortex-A57]] cores. |
== Die == | == Die == | ||
Line 159: | Line 132: | ||
* NVIDIA’S FIRST CPU IS A WINNER. Denver Uses Dynamic Translation to Outperform Mobile Rivals. - Linley Gwennap (August 18, 2014) <!-- Nvidia_Denverreprint.pdf --> | * NVIDIA’S FIRST CPU IS A WINNER. Denver Uses Dynamic Translation to Outperform Mobile Rivals. - Linley Gwennap (August 18, 2014) <!-- Nvidia_Denverreprint.pdf --> | ||
* [https://www.hotchips.org/wp-content/uploads/hc_archives/hc26/HC26-11-day1-epub/HC26.11-2-Mobile-Processors-epub/HC26.11.234-Denver-Darrell.Boggs-NVIDIA-rev4.pdf IEEE HotChips 26 (HC26), 2014] - Darrell Boggs "Nvidia's Denver Processor" | * [https://www.hotchips.org/wp-content/uploads/hc_archives/hc26/HC26-11-day1-epub/HC26.11-2-Mobile-Processors-epub/HC26.11.234-Denver-Darrell.Boggs-NVIDIA-rev4.pdf IEEE HotChips 26 (HC26), 2014] - Darrell Boggs "Nvidia's Denver Processor" | ||
* https://www.anandtech.com/tag/project-denver | * https://www.anandtech.com/tag/project-denver | ||
[[category:nvidia]] | [[category:nvidia]] |
Facts about "Denver - Microarchitectures - Nvidia"
codename | Denver + |
core count | 2 + |
designer | Nvidia + |
first launched | 2014 + |
full page name | nvidia/microarchitectures/denver + |
instance of | microarchitecture + |
instruction set architecture | ARMv8 + |
l1$ size | 384 KiB (393,216 B, 0.375 MiB) + |
l1d$ description | 4-way set associative + |
l1d$ size | 128 KiB (131,072 B, 0.125 MiB) + |
l1i$ description | 4-way set associative + |
l1i$ size | 256 KiB (262,144 B, 0.25 MiB) + |
l2$ description | 16-way set associative + |
l2$ size | 2 MiB (2,048 KiB, 2,097,152 B, 0.00195 GiB) + |
manufacturer | TSMC + |
microarchitecture type | CPU + |
name | Denver + |
process | 28 nm (0.028 μm, 2.8e-5 mm) + and 16 nm (0.016 μm, 1.6e-5 mm) + |
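The aggregate L1 figures in the facts table appear to be the per-core infobox values multiplied by the core count. The sketch below reproduces that arithmetic; the variable names are illustrative, and the assumption that the totals are derived this way is inferred from the numbers rather than stated on the page.

```python
# Per-core values and core count as given in the infobox.
CORES = 2
L1I_PER_CORE_KIB = 128
L1D_PER_CORE_KIB = 64

# ASSUMPTION: totals are per-core size times core count.
l1i_total_kib = L1I_PER_CORE_KIB * CORES
l1d_total_kib = L1D_PER_CORE_KIB * CORES
l1_total_kib = l1i_total_kib + l1d_total_kib

l1_total_bytes = l1_total_kib * 1024
l1_total_mib = l1_total_kib / 1024

print(f"l1i$ size: {l1i_total_kib} KiB")
print(f"l1d$ size: {l1d_total_kib} KiB")
print(f"l1$ size:  {l1_total_kib} KiB ({l1_total_bytes:,} B, {l1_total_mib} MiB)")
```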