From WikiChip
Editing cavium/microarchitectures/vulcan

Latest revision Your text
Line 3: Line 3:
 
|atype=CPU
 
|atype=CPU
 
|name=Vulcan
 
|name=Vulcan
|designer=Broadcom
+
|designer=Broadcomm
 
|designer 2=Cavium
 
|designer 2=Cavium
 
|manufacturer=TSMC
 
|manufacturer=TSMC
Line 38: Line 38:
 
|predecessor=XLP II
 
|predecessor=XLP II
 
|predecessor link=broadcom/microarchitectures/larrabee
 
|predecessor link=broadcom/microarchitectures/larrabee
|successor=Triton
 
|successor link=cavium/microarchitectures/triton
 
 
}}
 
}}
 
'''Vulcan''' is a [[16 nm]] high-performance {{arch|64}} [[ARM]] microarchitecture designed by [[Broadcom]] and later introduced by [[Cavium]] for the server market.
 
'''Vulcan''' is a [[16 nm]] high-performance {{arch|64}} [[ARM]] microarchitecture designed by [[Broadcom]] and later introduced by [[Cavium]] for the server market.
Line 46: Line 44:
  
 
== History ==
 
== History ==
Vulcan can trace its roots all the way back to [[Raza Microelectronics]]' {{raza|XLR}} family of [[MIPS]] processors from [[2006]]. With the introduction of the {{raza|XLP}} family in [[2009]], Raza (and later [[NetLogic]]) moved to a high-performance superscalar design with fine-grained 4-way multithreading support. In [[2011]], [[Broadcom]] acquired [[NetLogic Microsystems]] and integrated them into Broadcom's Embedded Processor Group.
+
Vulcan can trace its roots all the way back to [[Raza Microelectronics]]' {{raza|XLR}} family of [[MIPS]] processors from [[2006]]. With the introduction of the {{raza|XLP}} family in [[2009]], Raza (and later [[NetLogic]]) moved to a high-performance superscalar design with fine-grained 4-way multithreading support. In [[2011]], [[Broadcom]] acquired [[NetLogic Microsystems]] and integrated them Broadcom's Embedded Processor Group.
  
In [[2013]], Broadcom announced that they had licensed the ARMv7 and ARMv8 ISAs, allowing them to develop their own micro-architectures based on the ISA. Vulcan is the outcome of this effort, which involved adapting the existing core to the [[ARM]] ISA instead of [[MIPS]] and enhancing the cores in various ways. Vulcan development started in early [[2012]] and was expected to enter mass production in mid-[[2015]].
+
In [[2013]], Broadcom announced that they had licensed the ARMv7 and ARMv8 architectures, allowing them to develop their own microarchitectures based on the ISA. Vulcan is the outcome of this effort, which involved adopting the [[ARM]] ISA instead of [[MIPS]] and enhancing the cores in various ways. Vulcan development started in early [[2012]] and has was expected to enter mass production in mid-[[2015]].
  
 
In [[2016]], [[Cavium]] acquired Vulcan from Broadcom and introduced it the following year. In early [[2018]], Vulcan-based microprocessors entered general availability under the {{cavium|ThunderX2}} brand.
 
In [[2016]], [[Cavium]] acquired Vulcan from Broadcom and introduced it the following year. In early [[2018]], Vulcan-based microprocessors entered general availability under the {{cavium|ThunderX2}} brand.
Line 89: Line 87:
 
=== Key changes from {{broadcom|XLP II|l=arch}} ===
 
=== Key changes from {{broadcom|XLP II|l=arch}} ===
 
* Converted to [[ARM]] ISA (from [[MIPS]])
 
* Converted to [[ARM]] ISA (from [[MIPS]])
** AArch64
+
** AArch64, AArch32
 
* [[16 nm lithography process|16nm FinFET process]] (from [[28 nm|28 nm planar]])
 
* [[16 nm lithography process|16nm FinFET process]] (from [[28 nm|28 nm planar]])
 
* 40% IPC improvement
 
* 40% IPC improvement
Line 138: Line 136:
 
*** 256 KiB, 8-way set associative
 
*** 256 KiB, 8-way set associative
 
** L3 Cache
 
** L3 Cache
*** 16 tiles, line striped
 
 
*** 1 MiB/core slice
 
*** 1 MiB/core slice
 
*** Shared
 
*** Shared
*** Exclusive of L2
 
**** Victim cache
 
 
** System DRAM
 
** System DRAM
*** 2 TiB Max Memory / socket
 
 
*** 8 Channels
 
*** 8 Channels
 
*** DDR4, up to 2666 MT/s
 
*** DDR4, up to 2666 MT/s
Line 196: Line 190:
  
 
==== Execution Units ====
 
==== Execution Units ====
Up to six µOPs can be sent to Vulcan's six execution units each cycle. For integer operations, up to three can be issued each cycle. One of the ALUs also handles branch instructions. Note that only the ALU on port 1 can perform complex integer operations (i.e., [[multiplication]] and [[division]]) in addition to the simple integer operations. The other two ALUs can only perform simple integer operations.
+
Up to six µOPs can be sent to Vulcan's six execution units each cycle. For integer operations, up to three can be issued each cycle. One of the ALUs also handles branch instructions. In the XLP II, there were two simple integer ALUs and a single complex integer ALU unit. Only the complex integer ALU unit was able to perform operations such as multiplication and division. Though unconfirmed, it's suspected that both ALUs can now do complex integer operations as well.
  
Vulcan has doubled the number of [[floating point]] units to two and widened them to 128 bits to support [[ARM]]'s {{arm|NEON}} operations (the prior design was only 64 bits wide). In theory, Vulcan's peak performance now stands at 8 [[FLOPS]]/cycle or 8 GFLOPS at 1 GHz.
+
Vulcan has doubled the number of [[floating point]] units to two and widened them to 128 bits to support [[ARM]]'s {{arm|NEON}} operations (the prior design was only 64 bits wide). In theory, Vulcan's peak performance now stands at 8 FLOPS/cycle or 8 GFLOPS at 1 GHz.
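The 8 FLOPS/cycle figure can be reproduced with a short calculation. The sketch below is illustrative only: it assumes double-precision (64-bit) elements and that each lane performs a fused multiply-add counted as two floating point operations, neither of which is stated in the text above. The function names are hypothetical.

```python
def peak_flops_per_cycle(fp_units, vector_bits, element_bits, ops_per_lane):
    """Theoretical peak floating point operations per cycle for one core."""
    lanes = vector_bits // element_bits  # elements processed per unit per cycle
    return fp_units * lanes * ops_per_lane


def peak_gflops(flops_per_cycle, frequency_ghz):
    """Scale the per-cycle peak by the clock frequency in GHz."""
    return flops_per_cycle * frequency_ghz


# Two 128-bit units; assumed: 64-bit elements, FMA counted as 2 operations.
per_cycle = peak_flops_per_cycle(fp_units=2, vector_bits=128,
                                 element_bits=64, ops_per_lane=2)
print(per_cycle, peak_gflops(per_cycle, 1.0))
```

Under the same assumptions, the prior design (one 64-bit unit) would come out at a quarter of this figure.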
 
 
Port 1 additionally supports the crypto operations of [[ARM]]'s crypto extension (e.g., the ARM <code>AES</code>, <code>SHA1</code>, and <code>SHA256</code> instructions).
 
  
 
=== Memory subsystem ===
 
=== Memory subsystem ===
Vulcan's memory subsystem handles load and store requests and their ordering. There are two [[load-store units]], each capable of moving 128 bits of data - double the bandwidth of the XLP II. The units were widened in order to more efficiently support operations such as the Load Pair (<code>LDP</code>) and Store Pair (<code>STP</code>) instructions. In addition to the LSUs, there is a new dedicated Store Address unit. Similar to Intel's older architectures, the store operation is cracked into two distinct operations - a store address operation used to calculate the effective address and finally the store data operation. Vulcan can issue a store to the Store Address unit before the data is available, so the address can be calculated and [[memory ordering conflicts]] can be detected. Once the data is ready, the operation will be reissued to the LSU. The [[store buffer]] is 36 entries deep and the [[load buffer]] is 64 entries deep, for a total of 100 [[simultaneous memory operations]] in-flight or roughly 55% of all µOPs. Note that the store buffer is considerably smaller than the load buffer because Vulcan can only sustain a single store operation per cycle as most workloads do far more loads than stores.
+
Vulcan's memory subsystem handles load and store requests and their ordering. There are two [[load-store units]], each capable of moving 128 bits of data - double the bandwidth of the XLP II. The units were widened in order to more efficiently support operations such as the Load Pair (<code>LDP</code>) and Store Pair (<code>STP</code>) instructions. In addition to the LSUs, there is a new dedicated Store Address unit. Similar to Intel's older architectures, the store operation is cracked into two distinct operations - a store address operation used to calculate the effective address and finally the store data operation. Vulcan can issue a store to the Store Address unit before the data is available, so the address can be calculated and memory ordering conflicts can be detected. Once the data is ready, the operation will be reissued to the LSU. The store buffer is 36 entries deep and the load buffer is 64 entries deep, for a total of 100 simultaneous memory operations in-flight or roughly 55% of all µOPs. Note that the store buffer is considerably smaller than the load buffer because Vulcan can only sustain a single load operation per cycle as most workloads do far more loads than stores.
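The buffer arithmetic in the paragraph above can be checked with a few lines. This is only a sketch: the buffer depths and the 55% share come from the text, while the implied size of the overall µOP window is derived from them and is not stated in this section. The function names are hypothetical.

```python
STORE_BUFFER_ENTRIES = 36  # from the text
LOAD_BUFFER_ENTRIES = 64   # from the text


def in_flight_memory_ops(store_entries, load_entries):
    """Maximum number of memory operations tracked at once."""
    return store_entries + load_entries


def implied_uop_window(memory_ops, share_of_uops):
    """Total in-flight µOPs implied if memory_ops is the given share of them."""
    return memory_ops / share_of_uops


memory_ops = in_flight_memory_ops(STORE_BUFFER_ENTRIES, LOAD_BUFFER_ENTRIES)
# "Roughly 55%" is an approximation in the text, so the result is approximate too.
print(memory_ops, round(implied_uop_window(memory_ops, 0.55)))
```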
 
 
Vulcan's L2 cache is 256 KiB, half that of the prior design, and has an [[L2]] to [[L1]] bandwidth of 64 bytes per cycle in either direction. There is a 1 MiB L3 cache per core arranged as 2 MiB slices for a total of 32 MiB of cache shared by the entire chip. The L3 is an [[exclusive cache]], filling up with evicted L2 cache lines.
 
  
==== Miss sequence ====
+
Vulcan's L2 cache is 256 KiB, half that of the prior design, and has an L2 to L1 bandwidth of 64 bytes per cycle in either direction. There is a 1 MiB L3 cache per core arranged as 2 MiB slices for a total of 32 MiB of cache shared by the entire chip.
There are 16 tiles of L3 that are cache line striped. Though broken down into slices, there is no notion of L3 cache affinity to the cores. On an L2 miss, a hash is used to determine the home L3 cache tile. A check is done to determine whether the cache line is present in that L3 tile and, if it is, the data is returned. Cavium implemented an enhanced version of the [[MOESI protocol]]. If the line is not found, a [[snoop filter]] indicates whether the data is present in other cores. If it is present following a snoop, the owner transfers the line. On negative snoops for all cores, a DRAM request is issued to the memory controller.
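The lookup order described above can be sketched as a small model. This is a simplified illustration, not Cavium's implementation: the 64-byte line size, the hash function, and all names (<code>home_tile</code>, <code>handle_l2_miss</code>) are assumptions made for the example, and coherence state tracking is omitted entirely.

```python
NUM_TILES = 16   # from the text: 16 L3 tiles
LINE_BYTES = 64  # assumed cache line size


def home_tile(addr):
    """Hypothetical hash that maps a cache line to its home L3 tile."""
    line = addr // LINE_BYTES
    return (line ^ (line >> 4)) % NUM_TILES


def handle_l2_miss(addr, l3_tiles, snoop_filter, core_caches, dram):
    """Return (source, location, data) for a line that missed in the L2.

    l3_tiles:     list of dicts, one per tile, mapping line -> data
    snoop_filter: dict mapping line -> id of the core that owns it
    core_caches:  dict mapping core id -> dict of line -> data
    dram:         dict mapping line -> data
    """
    line = addr // LINE_BYTES
    tile = home_tile(addr)
    # 1. Check the home L3 tile chosen by the hash.
    if line in l3_tiles[tile]:
        return ("l3", tile, l3_tiles[tile][line])
    # 2. Consult the snoop filter; on a positive snoop the owner transfers the line.
    owner = snoop_filter.get(line)
    if owner is not None and line in core_caches.get(owner, {}):
        return ("core", owner, core_caches[owner][line])
    # 3. Negative snoops for all cores: issue a DRAM request.
    return ("dram", None, dram[line])
```

Because the home tile is chosen by a hash of the address rather than by the requesting core, any core's miss can land on any tile, which matches the statement that there is no L3 affinity to the cores.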
 
  
 
== System Architecture ==
 
== System Architecture ==
Line 242: Line 231:
  
 
== All Vulcan Chips ==
 
== All Vulcan Chips ==
<!-- NOTE:
+
{{empty section}}
          This table is generated automatically from the data in the actual articles.
 
          If a microprocessor is missing from the list, an appropriate article for it needs to be
 
          created and tagged accordingly.
 
 
 
          Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips
 
-->
 
{{comp table start}}
 
<table class="comptable sortable tc4">
 
{{comp table header|main|5:List of Vulcan-based Processors}}
 
{{comp table header|main|5:Main processor}}
 
{{comp table header|cols|Launched|Cores|Threads|%Frequency|PCIe Lanes}}
 
{{#ask: [[Category:microprocessor models by cavium]] [[microarchitecture::Vulcan]]
 
|?full page name
 
|?model number
 
|?first launched
 
|?core count
 
|?thread count
 
|?base frequency#GHz
 
|?Has subobject.max pcie lanes
 
|format=template
 
|template=proc table 3
 
|userparam=7
 
|mainlabel=-
 
|valuesep=,
 
}}
 
{{comp table count|ask=[[Category:microprocessor models by cavium]] [[microarchitecture::Vulcan]]}}
 
</table>
 
{{comp table end}}
 
  
== Bibliography ==
+
== References ==
* Broadcom, personal communication, March, 2017
+
* ''Some information was obtained directly from Broadcom''
* Cavium, personal communication, June, 2018
+
* ''Some information was obtained directly from Cavium''
* Cavium Booth No: E–1000 (2018, June). "ThunderX2 Processor Family". ISC 2018, Frankfurt, Germany.
 
* Schor, D (2018, June). ''[https://fuse.wikichip.org/news/1316/a-look-at-caviums-new-high-performance-arm-microprocessors-and-the-isambard-supercomputer/ A Look at Cavium’s New High-Performance ARM Microprocessors and the Isambard Supercomputer]''.
 
  
 
== See also ==
 
== See also ==
 
* Qualcomm's {{qualcomm|Falkor|l=arch}}
 
* Qualcomm's {{qualcomm|Falkor|l=arch}}
 
* Intel's {{intel|Skylake (server)|Skylake|l=arch}}
 
* Intel's {{intel|Skylake (server)|Skylake|l=arch}}

codename: Vulcan
core count: 16, 20, 24, 28, 30 and 32
designer: Cavium and Broadcom
first launched: 2018
full page name: cavium/microarchitectures/vulcan
instance of: microarchitecture
instruction set architecture: ARMv8.1
manufacturer: TSMC
microarchitecture type: CPU
name: Vulcan
pipeline stages (max): 15
pipeline stages (min): 13
process: 16 nm (0.016 μm, 1.6e-5 mm)