From WikiChip
Editing cavium/microarchitectures/vulcan
Latest revision | Your text | ||
Line 3: | Line 3: | ||
|atype=CPU | |atype=CPU | ||
|name=Vulcan | |name=Vulcan | ||
− | |designer= | + | |designer=Broadcom |
|designer 2=Cavium | |designer 2=Cavium | ||
|manufacturer=TSMC | |manufacturer=TSMC | ||
Line 38: | Line 38: | ||
|predecessor=XLP II | |predecessor=XLP II | ||
|predecessor link=broadcom/microarchitectures/larrabee | |predecessor link=broadcom/microarchitectures/larrabee | ||
− | |||
− | |||
}} | }} | ||
− | '''Vulcan''' is a [[16 nm]] high-performance {{arch|64}} [[ARM]] microarchitecture designed by [[Broadcom]] and later | + | '''Vulcan''' is a [[16 nm]] high-performance {{arch|64}} [[ARM]] microarchitecture designed by [[Broadcom]] and later [[Cavium]] for the server market. |
Introduced in [[2018]], Vulcan-based microprocessors are branded as part of the {{cavium|ThunderX2}} family. | Introduced in [[2018]], Vulcan-based microprocessors are branded as part of the {{cavium|ThunderX2}} family. | ||
== History == | == History == | ||
− | Vulcan can trace its roots all the way back to [[Raza Microelectronics]] {{raza|XLR}} family of [[MIPS]] processors from [[2006]]. With the introduction of their {{raza|XLR}} family in [[2009]], Raza (and later [[NetLogic]]) moved to a high-performance superscalar design with fine-grained 4-way multithreading support. In [[2011]], [[Broadcom]] acquired [[NetLogic Microsystems]] and integrated them | + | Vulcan can trace its roots all the way back to [[Raza Microelectronics]] {{raza|XLR}} family of [[MIPS]] processors from [[2006]]. With the introduction of their {{raza|XLR}} family in [[2009]], Raza (and later [[NetLogic]]) moved to a high-performance superscalar design with fine-grained 4-way multithreading support. In [[2011]], [[Broadcom]] acquired [[NetLogic Microsystems]] and integrated them into Broadcom's Embedded Processor Group. |
− | In [[2013]], Broadcom announced that they had licensed the ARMv7 and ARMv8 | + | In [[2013]], Broadcom announced that they had licensed the ARMv7 and ARMv8 architectures, allowing them to develop their own microarchitectures based on the ISA. Vulcan is the outcome of this effort, which involved adopting the [[ARM]] ISA instead of [[MIPS]] and enhancing the cores in various ways. Vulcan development started in early [[2012]] and was expected to enter mass production in mid-[[2015]]. |
− | In [[ | + | In [[2017]], [[Cavium]] acquired Vulcan from Broadcom and introduced it later that year. In early [[2018]], Vulcan-based microprocessors entered general availability under the {{cavium|ThunderX2}} brand. |
== Architecture == | == Architecture == | ||
Line 89: | Line 55: | ||
=== Key changes from {{broadcom|XLP II|l=arch}} === | === Key changes from {{broadcom|XLP II|l=arch}} === | ||
* Converted to [[ARM]] ISA (from [[MIPS]]) | * Converted to [[ARM]] ISA (from [[MIPS]]) | ||
− | ** AArch64 | + | ** AArch64, AArch32 |
* [[16 nm lithography process|16nm FinFET process]] (from [[28 nm|28 nm planar]]) | * [[16 nm lithography process|16nm FinFET process]] (from [[28 nm|28 nm planar]]) | ||
* 40% IPC improvement | * 40% IPC improvement | ||
Line 138: | Line 104: | ||
*** 256 KiB, 8-way set associative | *** 256 KiB, 8-way set associative | ||
** L3 Cache | ** L3 Cache | ||
− | |||
*** 1 MiB/core slice | *** 1 MiB/core slice | ||
*** Shared | *** Shared | ||
− | |||
− | |||
** System DRAM | ** System DRAM | ||
− | |||
*** 8 Channels | *** 8 Channels | ||
*** DDR4, up to 2666 MT/s | *** DDR4, up to 2666 MT/s | ||
Line 182: | Line 144: | ||
Each cycle, up to four instructions are sent to the [[instruction decode|decoder]]. In prior designs, [[Broadcom]]'s products decoded [[MIPS]] instructions. With Vulcan, the switch to ARM meant the decoder had to be replaced with much more complex logic that decodes the original [[instruction]] and emits [[micro-ops]]. For the most part, there is a 1:1 mapping between instructions and µOPs, with an average of 15% more µOPs emitted than instructions. The extra complexity has added another pipeline stage to decode. | Each cycle, up to four instructions are sent to the [[instruction decode|decoder]]. In prior designs, [[Broadcom]]'s products decoded [[MIPS]] instructions. With Vulcan, the switch to ARM meant the decoder had to be replaced with much more complex logic that decodes the original [[instruction]] and emits [[micro-ops]]. For the most part, there is a 1:1 mapping between instructions and µOPs, with an average of 15% more µOPs emitted than instructions. The extra complexity has added another pipeline stage to decode. | ||
==== Loop Buffer ==== | ==== Loop Buffer ==== | ||
− | Sitting between the [[instruction decode|decoder]] and the [[instruction scheduler|scheduler]] is a | + | Sitting between the [[instruction decode|decoder]] and the [[instruction scheduler|scheduler]] is a [[loop buffer]]. The loop buffer, in conjunction with the [[branch predictor]], queues the operations of recent tight loops. The buffer plays back the operations repeatedly until a branch exits the loop. While playback takes place, the front-end (instruction fetch, decode, etc.) is largely power-gated in order to save power. Although Broadcom originally told us the buffer had 48 entries, WikiChip was unable to confirm this number when the product was re-released by [[Cavium]] in late 2018. |
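The playback behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the function name is hypothetical, the 48-entry capacity is the unconfirmed figure quoted in the text, and the model assumes that the first iteration of a loop passes through fetch and decode to fill the buffer while every later iteration is replayed from it. The real detection and fill policy is not documented here.

```python
def replayed_uops(loop_body_uops, iterations, capacity=48):
    """Count micro-ops served from the loop buffer instead of the front-end.

    Assumption: a loop is captured only if its whole body fits in the
    buffer, and the first iteration is spent filling the buffer.
    """
    if loop_body_uops > capacity or iterations < 2:
        return 0  # loop too large to capture, or nothing to replay
    return loop_body_uops * (iterations - 1)


# A 10-µOP loop running 100 times replays 99 iterations from the buffer.
small_loop = replayed_uops(10, 100)
# A 64-µOP loop does not fit in a 48-entry buffer, so nothing is replayed.
large_loop = replayed_uops(64, 100)
```

In this model, every replayed µOP is one that the fetch and decode stages did not have to deliver, which is when the front-end can be power-gated.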
=== Execution engine === | === Execution engine === | ||
Line 196: | Line 158: | ||
==== Execution Units ==== | ==== Execution Units ==== | ||
− | Up to six µOPs can be sent into Vulcan's six execution units each cycle. For integer operations, up to three operations can be issued each cycle. One of the ALUs also handles branch instructions. | + | Up to six µOPs can be sent into Vulcan's six execution units each cycle. For integer operations, up to three operations can be issued each cycle. One of the ALUs also handles branch instructions. In the XLP II, there were two simple integer ALUs and a single complex integer ALU. Only the complex integer ALU was able to perform operations such as multiplication and division. Though unconfirmed, it's suspected that both ALUs can now do complex integer operations as well. |
− | Vulcan has doubled the number of [[floating point]] units to two and widened them to 128-bit to support [[ARM]]'s {{arm|NEON}} operations (prior design was only 64-bit wide). In theory, Vulcan's peak performance now stands at 8 | + | Vulcan has doubled the number of [[floating point]] units to two and widened them to 128-bit to support [[ARM]]'s {{arm|NEON}} operations (prior design was only 64-bit wide). In theory, Vulcan's peak performance now stands at 8 FLOPS/cycle or 8 GFLOPS at 1 GHz. |
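The 8 FLOPS/cycle figure can be reproduced with simple arithmetic. The sketch below assumes double-precision (64-bit) elements and that each lane can retire a fused multiply-add counted as two floating-point operations per cycle; neither assumption is stated in the text above, and the function name is purely illustrative.

```python
def peak_flops_per_cycle(fp_units, vector_bits, element_bits, flops_per_lane=2):
    """Theoretical peak floating-point operations per cycle for one core.

    flops_per_lane=2 is an assumption: one fused multiply-add per lane
    per cycle, counted as two operations.
    """
    lanes = vector_bits // element_bits
    return fp_units * lanes * flops_per_lane


# Two 128-bit units operating on 64-bit elements.
per_cycle = peak_flops_per_cycle(fp_units=2, vector_bits=128, element_bits=64)

# At 1 GHz (1e9 cycles per second) this is the per-core GFLOPS figure.
gflops_at_1ghz = per_cycle * 1e9 / 1e9
```

Under the same assumptions, single-precision (32-bit) elements would double the lane count and therefore the theoretical peak.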
− | |||
− | |||
=== Memory subsystem === | === Memory subsystem === | ||
− | Vulcan's memory subsystem handles load and store requests and their ordering. There are two [[load-store units]] each capable of moving 128 bits of data - double the bandwidth of the XLP II. The widening of the units was done in order to more efficiently support operations such as the Load Pair (<code>LDP</code>) and Store Pair (<code>STP</code>) instructions. In addition to the LSUs, there is a new dedicated Store Address unit. Similar to Intel's older architectures, the store operation is cracked into two distinct operations - a store address operation used to calculate the effective address and finally the store data operation. Vulcan can issue a store to the Store Address unit before the data is available where the address can be calculated and | + | Vulcan's memory subsystem handles load and store requests and their ordering. There are two [[load-store units]] each capable of moving 128 bits of data - double the bandwidth of the XLP II. The widening of the units was done in order to more efficiently support operations such as the Load Pair (<code>LDP</code>) and Store Pair (<code>STP</code>) instructions. In addition to the LSUs, there is a new dedicated Store Address unit. Similar to Intel's older architectures, the store operation is cracked into two distinct operations - a store address operation used to calculate the effective address and finally the store data operation. Vulcan can issue a store to the Store Address unit before the data is available, so that the address can be calculated and memory ordering conflicts can be detected early. Once the data is ready, the operation will be reissued to the LSU. The store buffer is 36 entries deep with the load buffer at 64 entries for a total of 100 simultaneous memory operations in-flight or roughly 55% of all µOPs. Note that the store buffer is considerably smaller than the load buffer because Vulcan can only sustain a single store operation per cycle and most workloads do far more loads than stores. |
− | Vulcan's L2 cache is 256 KiB, half that of the prior design, and has an | + | Vulcan's L2 cache is 256 KiB, half that of the prior design, and has an L2 to L1 bandwidth of 64 bytes per cycle in either direction. There is a 1 MiB L3 cache per core arranged as 2 MiB slices for a total of 32 MiB of cache shared by the entire chip. |
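The store cracking described above can be pictured with a small toy model. It is a sketch only: the <code>STA</code>/<code>STD</code> labels, the class, and the function names are hypothetical and do not reflect Vulcan's internal µOP encoding. The point it shows is that the address half of a store can issue as soon as its address register is ready, before the data register is.

```python
from dataclasses import dataclass


@dataclass
class MicroOp:
    kind: str       # "STA" (store address) or "STD" (store data); illustrative labels
    store_id: int   # which store this micro-op belongs to
    operand: str    # the source register this micro-op waits on


def crack_store(store_id, base_reg, data_reg):
    """Split one store into a store-address and a store-data micro-op."""
    return [
        MicroOp("STA", store_id, base_reg),
        MicroOp("STD", store_id, data_reg),
    ]


def issuable(uops, ready_regs):
    """Return the micro-ops whose source register is already available."""
    return [u for u in uops if u.operand in ready_regs]


uops = crack_store(0, base_reg="x1", data_reg="x2")

# Only the address register is ready: the address half can issue early,
# which lets ordering conflicts be checked before the data arrives.
early = issuable(uops, ready_regs={"x1"})

# The buffer sizes quoted in the text give the in-flight total.
in_flight_memory_ops = 36 + 64
```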
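The L3 figures above are related by simple arithmetic, worked through below as a sanity check. The variable names are illustrative; the derived counts follow only from the three numbers quoted in the text (1 MiB per core, 2 MiB slices, 32 MiB total) and are not separately confirmed.

```python
L3_PER_CORE_MIB = 1   # from the text: 1 MiB of L3 per core
SLICE_MIB = 2         # from the text: arranged as 2 MiB slices
TOTAL_L3_MIB = 32     # from the text: 32 MiB shared by the chip

# Derived values implied by the three figures above.
cores = TOTAL_L3_MIB // L3_PER_CORE_MIB          # cores needed to reach the total
slices = TOTAL_L3_MIB // SLICE_MIB               # number of 2 MiB slices
cores_per_slice = SLICE_MIB // L3_PER_CORE_MIB   # cores contributing to each slice
```

The implied count of 32 cores matches the largest core count listed in the page's fact box.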
− | |||
− | |||
− | |||
== System Architecture == | == System Architecture == | ||
Line 225: | Line 182: | ||
== Die == | == Die == | ||
* Broadcom's original die size was rumored to be around 600 mm². It's unknown how much the die changed when it was modified by Cavium. | * Broadcom's original die size was rumored to be around 600 mm². It's unknown how much the die changed when it was modified by Cavium. | ||
* TSMC's [[16 nm process]] | * TSMC's [[16 nm process]] | ||
− | |||
Line 240: | Line 190: | ||
:[[File:cavium vulcan die (annotated).png|600px]] | :[[File:cavium vulcan die (annotated).png|600px]] | ||
+ | |||
== All Vulcan Chips == | == All Vulcan Chips == | ||
− | + | {{empty section}} | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | {{ | ||
− | == | + | == References == |
− | * Broadcom | + | * ''Some information was obtained directly from Broadcom'' |
− | * | + | * ''Some information was obtained directly from Cavium'' |
− | |||
− | |||
== See also == | == See also == | ||
* Qualcomm's {{qualcomm|Falkor|l=arch}} | * Qualcomm's {{qualcomm|Falkor|l=arch}} | ||
* Intel's {{intel|Skylake (server)|Skylake|l=arch}} | * Intel's {{intel|Skylake (server)|Skylake|l=arch}} |
Facts about "Vulcan - Microarchitectures - Cavium"
codename | Vulcan + |
core count | 16 +, 20 +, 24 +, 28 +, 30 + and 32 + |
designer | Cavium + and Broadcom + |
first launched | 2018 + |
full page name | cavium/microarchitectures/vulcan + |
instance of | microarchitecture + |
instruction set architecture | ARMv8.1 + |
manufacturer | TSMC + |
microarchitecture type | CPU + |
name | Vulcan + |
pipeline stages (max) | 15 + |
pipeline stages (min) | 13 + |
process | 16 nm (0.016 μm, 1.6e-5 mm) + |