From WikiChip
Editing cavium/microarchitectures/vulcan
Latest revision | Your text | ||
Line 3: | Line 3: | ||
|atype=CPU | |atype=CPU | ||
|name=Vulcan | |name=Vulcan | ||
− | |designer= | + | |designer=Broadcom |
|designer 2=Cavium | |designer 2=Cavium | ||
|manufacturer=TSMC | |manufacturer=TSMC | ||
Line 38: | Line 38: | ||
|predecessor=XLP II | |predecessor=XLP II | ||
|predecessor link=broadcom/microarchitectures/larrabee | |predecessor link=broadcom/microarchitectures/larrabee | ||
− | |||
− | |||
}} | }} | ||
'''Vulcan''' is a [[16 nm]] high-performance {{arch|64}} [[ARM]] microarchitecture designed by [[Broadcom]] and later introduced by [[Cavium]] for the server market. | '''Vulcan''' is a [[16 nm]] high-performance {{arch|64}} [[ARM]] microarchitecture designed by [[Broadcom]] and later introduced by [[Cavium]] for the server market. | ||
Line 46: | Line 44: | ||
== History == | == History == | ||
− | Vulcan can trace its roots all the way back to [[Raza Microelectronics]] {{raza|XLR}} family of [[MIPS]] processors from [[2006]]. With the introduction of their {{raza|XLR}} family in [[2009]], Raza (and later [[NetLogic]]) moved to a high-performance superscalar design with fine-grained 4-way multithreading support. In [[2011]], [[Broadcom]] acquired [[NetLogic Microsystems]] and integrated them | + | Vulcan can trace its roots all the way back to [[Raza Microelectronics]] {{raza|XLR}} family of [[MIPS]] processors from [[2006]]. With the introduction of their {{raza|XLR}} family in [[2009]], Raza (and later [[NetLogic]]) moved to a high-performance superscalar design with fine-grained 4-way multithreading support. In [[2011]], [[Broadcom]] acquired [[NetLogic Microsystems]] and integrated them into Broadcom's Embedded Processor Group. |
− | In [[2013]], Broadcom announced that they had licensed the ARMv7 and ARMv8 | + | In [[2013]], Broadcom announced that they had licensed the ARMv7 and ARMv8 architectures, allowing them to develop their own microarchitectures based on the ISA. Vulcan is the outcome of this effort, which involved adopting the [[ARM]] ISA instead of [[MIPS]] and enhancing the cores in various ways. Vulcan development started in early [[2012]] and was expected to enter mass production in mid-[[2015]]. |
In [[2016]], [[Cavium]] acquired Vulcan from Broadcom and introduced it the following year. In early [[2018]], Vulcan-based microprocessors entered general availability under the {{cavium|ThunderX2}} brand. | In [[2016]], [[Cavium]] acquired Vulcan from Broadcom and introduced it the following year. In early [[2018]], Vulcan-based microprocessors entered general availability under the {{cavium|ThunderX2}} brand. | ||
Line 89: | Line 87: | ||
=== Key changes from {{broadcom|XLP II|l=arch}} === | === Key changes from {{broadcom|XLP II|l=arch}} === | ||
* Converted to [[ARM]] ISA (from [[MIPS]]) | * Converted to [[ARM]] ISA (from [[MIPS]]) | ||
− | ** AArch64 | + | ** AArch64, AArch32 |
* [[16 nm lithography process|16nm FinFET process]] (from [[28 nm|28 nm planar]]) | * [[16 nm lithography process|16nm FinFET process]] (from [[28 nm|28 nm planar]]) | ||
* 40% IPC improvement | * 40% IPC improvement | ||
Line 138: | Line 136: | ||
*** 256 KiB, 8-way set associative | *** 256 KiB, 8-way set associative | ||
** L3 Cache | ** L3 Cache | ||
− | |||
*** 1 MiB/core slice | *** 1 MiB/core slice | ||
*** Shared | *** Shared | ||
− | |||
− | |||
** System DRAM | ** System DRAM | ||
*** 2 TiB Max Memory / socket | *** 2 TiB Max Memory / socket | ||
Line 198: | Line 193: | ||
Up to six µOPs can be sent into Vulcan's six execution units each cycle. For integer operations, up to three can be issued each cycle. One of the ALUs also handles branch instructions. Note that only the ALU on port 1 can perform complex integer operations (i.e., [[multiplication]] and [[division]]) in addition to the simple integer operations. The other two ALUs can only perform simple integer operations. | Up to six µOPs can be sent into Vulcan's six execution units each cycle. For integer operations, up to three can be issued each cycle. One of the ALUs also handles branch instructions. Note that only the ALU on port 1 can perform complex integer operations (i.e., [[multiplication]] and [[division]]) in addition to the simple integer operations. The other two ALUs can only perform simple integer operations. | ||
− | Vulcan has doubled the number of [[floating point]] units to two and widened them to 128-bit to support [[ARM]]'s {{arm|NEON}} operations (prior design was only 64-bit wide). In theory, Vulcan's peak performance now stands at 8 | + | Vulcan has doubled the number of [[floating point]] units to two and widened them to 128-bit to support [[ARM]]'s {{arm|NEON}} operations (prior design was only 64-bit wide). In theory, Vulcan's peak performance now stands at 8 FLOPS/cycle or 8 GFLOPS at 1 GHz. |
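As a rough check of the peak figure quoted above, the following sketch recomputes it from the unit count and width. It assumes double-precision (64-bit) elements and that each lane can retire one fused multiply-add per cycle, counted as two floating-point operations; neither assumption is stated in the text, and the function name is purely illustrative.

```python
def peak_flops_per_cycle(fp_units: int, vector_bits: int,
                         element_bits: int, flops_per_lane: int) -> int:
    """Peak floating-point operations per cycle for a set of SIMD units."""
    lanes = vector_bits // element_bits  # elements processed per unit per cycle
    return fp_units * lanes * flops_per_lane


# Two 128-bit units (from the text); 64-bit elements and FMA = 2 FLOPs are assumptions.
dp_flops_per_cycle = peak_flops_per_cycle(
    fp_units=2, vector_bits=128, element_bits=64, flops_per_lane=2
)

# FLOPs/cycle multiplied by the clock in GHz gives GFLOPS.
clock_ghz = 1.0
dp_gflops = dp_flops_per_cycle * clock_ghz

print(dp_flops_per_cycle, dp_gflops)
```

Under the same assumptions, single-precision (32-bit) elements would double the lane count and therefore the peak figure.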
Port 1 has additional support for crypto operations supporting [[ARM]]'s crypto extension (e.g., ARM <code>AES</code>, <code>SHA1</code>, <code>SHA256</code> instructions). | Port 1 has additional support for crypto operations supporting [[ARM]]'s crypto extension (e.g., ARM <code>AES</code>, <code>SHA1</code>, <code>SHA256</code> instructions). | ||
=== Memory subsystem === | === Memory subsystem === | ||
− | Vulcan's memory subsystem handles load and store requests and their ordering. There are two [[load-store units]] each capable of moving 128 bits of data - double the bandwidth of the XLP II. The widening of the units was done in order to more efficiently support operations such as the Load Pair (<code>LDP</code>) and Store Pair (<code>STP</code>) instructions. In addition to the LSUs, there is a new dedicated Store Address unit. Similar to Intel's older architectures, the store operation is cracked into two distinct operations - a store address operation used to calculate the effective address and finally the store data operation. Vulcan can issue a store to the Store Address unit before the data is available, so that the address can be calculated and | + | Vulcan's memory subsystem handles load and store requests and their ordering. There are two [[load-store units]] each capable of moving 128 bits of data - double the bandwidth of the XLP II. The widening of the units was done in order to more efficiently support operations such as the Load Pair (<code>LDP</code>) and Store Pair (<code>STP</code>) instructions. In addition to the LSUs, there is a new dedicated Store Address unit. Similar to Intel's older architectures, the store operation is cracked into two distinct operations - a store address operation used to calculate the effective address and finally the store data operation. Vulcan can issue a store to the Store Address unit before the data is available, so that the address can be calculated and memory ordering conflicts can be detected. Once the data is ready, the operation will be reissued to the LSU. The store buffer is 36 entries deep and the load buffer is 64 entries deep, for a total of 100 simultaneous memory operations in flight, or roughly 55% of all µOPs. Note that the store buffer is considerably smaller than the load buffer because Vulcan can only sustain a single store operation per cycle and most workloads do far more loads than stores. |
− | Vulcan's L2 cache is 256 KiB, half that of the prior design, and has an | + | Vulcan's L2 cache is 256 KiB, half that of the prior design, and has an L2 to L1 bandwidth of 64 bytes per cycle in either direction. There is 1 MiB of L3 cache per core, arranged as 2 MiB slices, for a total of 32 MiB of cache shared by the entire chip. |
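The buffer and width figures in the paragraph above can be cross-checked with a few lines of arithmetic. This is only a sketch: the constant names are illustrative, the "implied window" is merely what the quoted 55% share suggests rather than a documented size, and the combined LSU figure assumes both units move their full width in the same cycle.

```python
# Figures taken from the text.
STORE_BUFFER_ENTRIES = 36
LOAD_BUFFER_ENTRIES = 64
IN_FLIGHT_SHARE = 0.55          # "roughly 55% of all µOPs"
LSU_COUNT = 2
LSU_WIDTH_BITS = 128

# Total memory operations that can be in flight at once.
memory_ops_in_flight = STORE_BUFFER_ENTRIES + LOAD_BUFFER_ENTRIES

# Size of the µOP window implied by the 55% share (an inference, not a spec value).
implied_uop_window = memory_ops_in_flight / IN_FLIGHT_SHARE

# Upper bound on bytes moved per cycle if both LSUs are fully used (assumption).
lsu_bytes_per_cycle = LSU_COUNT * LSU_WIDTH_BITS // 8

# A pair of 64-bit registers (as moved by LDP/STP) fits in one 128-bit access.
pair_fits_one_access = 2 * 64 <= LSU_WIDTH_BITS

print(memory_ops_in_flight, round(implied_uop_window), lsu_bytes_per_cycle,
      pair_fits_one_access)
```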
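The cache figures above fit together as follows. The sketch derives the core and slice counts from the stated sizes and converts the per-cycle L2 to L1 bandwidth into a per-second figure for a given clock; the clock value is a hypothetical parameter, and all names are illustrative.

```python
MIB = 1024 * 1024

# Figures taken from the text.
L3_PER_CORE_BYTES = 1 * MIB
L3_SLICE_BYTES = 2 * MIB
L3_TOTAL_BYTES = 32 * MIB
L2_L1_BYTES_PER_CYCLE = 64

# Derived layout of the shared L3.
cores = L3_TOTAL_BYTES // L3_PER_CORE_BYTES   # cores implied by 1 MiB per core
slices = L3_TOTAL_BYTES // L3_SLICE_BYTES     # number of 2 MiB slices
cores_per_slice = cores // slices


def l2_l1_bandwidth_gb_per_s(clock_ghz: float) -> float:
    """Peak L2-to-L1 bandwidth in one direction, in GB/s (10^9 bytes/s)."""
    return L2_L1_BYTES_PER_CYCLE * clock_ghz


print(cores, slices, cores_per_slice, l2_l1_bandwidth_gb_per_s(1.0))
```

At a hypothetical 1 GHz clock the peak works out to 64 GB/s in each direction, and it scales linearly with frequency.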
− | |||
− | |||
− | |||
== System Architecture == | == System Architecture == | ||
Line 272: | Line 264: | ||
{{comp table end}} | {{comp table end}} | ||
− | == | + | == References == |
− | * Broadcom | + | * ''Some information was obtained directly from Broadcom'' |
− | * | + | * ''Some information was obtained directly from Cavium'' |
− | |||
− | |||
== See also == | == See also == | ||
* Qualcomm's {{qualcomm|Falkor|l=arch}} | * Qualcomm's {{qualcomm|Falkor|l=arch}} | ||
* Intel's {{intel|Skylake (server)|Skylake|l=arch}} | * Intel's {{intel|Skylake (server)|Skylake|l=arch}} |
Facts about "Vulcan - Microarchitectures - Cavium"
codename | Vulcan |
core count | 16, 20, 24, 28, 30 and 32 |
designer | Cavium and Broadcom |
first launched | 2018 |
full page name | cavium/microarchitectures/vulcan |
instance of | microarchitecture |
instruction set architecture | ARMv8.1 |
manufacturer | TSMC |
microarchitecture type | CPU |
name | Vulcan |
pipeline stages (max) | 15 |
pipeline stages (min) | 13 |
process | 16 nm (0.016 μm, 1.6e-5 mm) |