{{nec title|SX-Aurora|arch}}
{{microarchitecture
|atype=VPU
|name=SX-Aurora
|designer=NEC
|manufacturer=TSMC
|introduction=2018
|cores=8
|type=Superscalar
|type 2=Pipelined
|oooe=Yes
|speculative=Yes
|renaming=Yes
|stages=8
|decode=4-way
|l1i=32 KiB
|l1i per=core
|l1d=32 KiB
|l1d per=core
|l2=256 KiB
|l2 per=core
|l3=16 MiB
|l3 per=chip
|predecessor=SX-ACE
|predecessor link=nec/microarchitectures/sx-ace
}}
'''SX-Aurora''' is a [[16 nm]] microarchitecture for [[vector processors]] designed by [[NEC]], first introduced in [[2018]] as the successor to {{\\|SX-ACE}}.

== History ==
{{empty section}}

== Architecture ==
=== Key changes from {{\\|SX-ACE}} ===
* [[16 nm process]] (from [[28 nm]])
* 1.6x frequency (1.6 GHz, up from 1 GHz)
* 2x vector cores (8, up from 4)
* Vector core
** 1.5x FMA EUs (3, up from 2)
** 2x VPPs (32, up from 16)
** 3x [[FLOPs]]/cycle (192 FLOPs/cycle, up from 64 FLOPs/cycle)
* Memory
** 16 MiB L3 [[LLC]]
** 6x [[HBM2]] (from 12x [[DDR3]])
*** 4.7x memory bandwidth (1.2 TB/s, up from 256 GB/s)
{{expand list}}
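
The multipliers in the list above can be reproduced from the raw figures. The following is a minimal sketch, not NEC's own accounting: it assumes one FMA counts as 2 FLOPs, that per-core FLOPs/cycle can be modeled as VPPs x FMA units x 2, and that 1.2 TB/s means 1200 GB/s (decimal). The dictionary and function names are hypothetical.

```python
# Figures are taken from the list above; nothing here is measured.
# Assumption: one FMA is counted as 2 FLOPs (a multiply plus an add).
FLOPS_PER_FMA = 2

SX_ACE = {"freq_ghz": 1.0, "cores": 4, "fma_units": 2, "vpps": 16, "mem_bw_gbs": 256}
SX_AURORA = {"freq_ghz": 1.6, "cores": 8, "fma_units": 3, "vpps": 32, "mem_bw_gbs": 1200}


def flops_per_cycle(chip):
    """Per-core FLOPs/cycle modeled as VPPs x FMA units x FLOPs per FMA."""
    return chip["vpps"] * chip["fma_units"] * FLOPS_PER_FMA


def ratio(key):
    """SX-Aurora value divided by the SX-ACE value for one spec."""
    return SX_AURORA[key] / SX_ACE[key]


if __name__ == "__main__":
    for key in SX_ACE:
        print(f"{key}: {ratio(key):.2f}x")
    print("FLOPs/cycle:", flops_per_cycle(SX_ACE), "->", flops_per_cycle(SX_AURORA))
```

Under these assumptions the model gives 64 and 192 FLOPs/cycle, matching the listed values, and the bandwidth ratio of about 4.69 rounds to the listed 4.7x.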

== Block Diagram ==
=== Entire SoC ===
:[[File:sx-aurora block diagram.svg|700px]]

=== Vector core ===
:[[File:sx-aurora vector core block diagram.svg|1200px]]

== Memory Hierarchy ==
* Vector core
** SPU
*** L1I Cache:
**** 32 KiB
*** L1D Cache:
**** 32 KiB
*** L2 Cache:
**** 256 KiB
** VPU
*** 120 KiB [[load buffer]]
*** 64 KiB [[store buffer]]
* L3 Cache/LLC:
** 16 MiB
*** 8 x 2 MiB
*** [[write-back]]
*** inclusive of L1 & L2
*** 128 banks
*** 3 TB/s bandwidth
* System DRAM:
** 4Hi / 8Hi [[HBM2]]
** 6 SDRAM [[KGD]] stacks
** 1.2 TB/s
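
A few derived quantities follow from the hierarchy above. This is a rough sketch under stated assumptions: the bandwidth figures are read as decimal terabytes per second (as in the key-changes list), DRAM bandwidth is assumed to be split evenly across the six HBM2 stacks, and all LLC banks are assumed to have equal capacity. None of the derived values is quoted from NEC, and the names are hypothetical.

```python
# All inputs come from the list above.
# Assumptions: "1.2" and "3" are decimal TB/s, bandwidth is split evenly
# across the HBM2 stacks, and all LLC banks have equal capacity.
LLC_SLICES = 8
LLC_SLICE_MIB = 2
LLC_BANKS = 128
HBM2_STACKS = 6
DRAM_BW_GBS = 1200
LLC_BW_GBS = 3000


def llc_total_mib():
    """Total LLC capacity from the slice count and slice size."""
    return LLC_SLICES * LLC_SLICE_MIB


def llc_bank_kib():
    """Capacity of one LLC bank, assuming equally sized banks."""
    return llc_total_mib() * 1024 // LLC_BANKS


def dram_bw_per_stack_gbs():
    """DRAM bandwidth per HBM2 stack, assuming an even split."""
    return DRAM_BW_GBS / HBM2_STACKS


def llc_to_dram_bw_ratio():
    """How much more bandwidth the LLC offers than DRAM."""
    return LLC_BW_GBS / DRAM_BW_GBS


if __name__ == "__main__":
    print(llc_total_mib(), "MiB LLC,", llc_bank_kib(), "KiB per bank")
    print(dram_bw_per_stack_gbs(), "GB/s per HBM2 stack")
    print(llc_to_dram_bw_ratio(), "x LLC-to-DRAM bandwidth")
```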

== Overview ==
[[File:sx-aurora overview.svg|thumb|right|400px|Overview of the SX-Aurora]]
The SX-Aurora is [[NEC]]'s successor to the {{\\|SX-ACE}}, a [[vector processor]] designed for [[high-performance]] scientific/research applications and supercomputers. The SX-Aurora departs from all prior chips in the markets it is designed to address, so NEC made somewhat different design choices than in prior generations of vector processors. To broaden its market, NEC extended beyond supercomputers to the conventional server and workstation market. This is done through the use of [[PCIe]]-based [[accelerator cards]].

Moving to an accelerator card is not without its challenges. To keep the high memory bandwidth, and thus high [[bytes per FLOP]], while moving to a smaller [[form factor]], it was necessary to drop the large number of DDR memory channels. Instead, NEC opted for in-package [[high-bandwidth memory]]. The card itself is designed to communicate with other cards in the system in order to scale up from a single card for workstation use to a supercomputer with 64 cards per rack.

The chip itself consists of eight very [[big cores]] along with 16 MiB of [[last level cache]] on a 2-dimensional mesh. Attached to the LLC are the two memory controllers which interface with the six [[high-bandwidth memory]] stacks sitting on an [[interposer]]. Fabricated on [[TSMC]]'s [[16 nm process]], the SX-Aurora operates at up to 1.6 GHz, delivering up to 307.2 [[gigaFLOPS]] ([[double-precision]]) per core for a total of up to 2.45 [[teraFLOPS]].
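
The peak figures quoted above follow from simple arithmetic, sketched below. The inputs are the 192 FLOPs/cycle per core, 1.6 GHz clock, eight cores, and 64 cards per rack given on this page; the 1.2 TB/s memory bandwidth is read as a decimal value. The rack figure additionally assumes one chip per card and ignores any interconnect or host overhead, so it is only an upper bound on paper. Function names are hypothetical.

```python
# Integer MHz and MB/s are used to avoid floating-point rounding noise.
CORES = 8
FLOPS_PER_CYCLE = 192      # per core, double precision
FREQ_MHZ = 1600            # 1.6 GHz
MEM_BW_MBS = 1_200_000     # 1.2 TB/s, assumed decimal
CARDS_PER_RACK = 64        # assumption: one chip per card


def peak_mflops_per_core():
    """Peak per-core throughput in MFLOPS."""
    return FLOPS_PER_CYCLE * FREQ_MHZ


def peak_mflops_chip():
    """Peak chip throughput in MFLOPS."""
    return CORES * peak_mflops_per_core()


def bytes_per_flop():
    """Memory bandwidth divided by peak compute."""
    return MEM_BW_MBS / peak_mflops_chip()


def peak_tflops_rack():
    """Paper peak for a full rack, in TFLOPS."""
    return CARDS_PER_RACK * peak_mflops_chip() / 1_000_000


if __name__ == "__main__":
    print(peak_mflops_per_core() / 1000, "GFLOPS per core")
    print(peak_mflops_chip() / 1_000_000, "TFLOPS per chip")
    print(round(bytes_per_flop(), 2), "bytes per FLOP")
    print(round(peak_tflops_rack(), 1), "TFLOPS per rack")
```

Under these assumptions the chip sustains roughly half a byte of memory bandwidth per peak FLOP.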

== Package ==
The SX-Aurora chip uses six [[HBM2]] stacks, which are either 4 Hi or 8 Hi. The chip utilizes [[TSMC]]'s second-generation [[chip on wafer on substrate]] ([[CoWoS]]) technology with NEC's implementation developed in collaboration with [[TSMC]] and [[Broadcom]]. This chip became the world's first to utilize six HBM2 stacks.
  
<gallery widths=400px heights=300px>
File:sx-aurora chip.png
File:sx-aurora chip (annotated).png
</gallery>

The package itself is very large at 60 mm x 60 mm. The VE processor die measures roughly 15 mm x 33 mm and sits on a very large interposer with a total Si area of 1,235 mm² (32.5 mm x 38 mm).


:[[File:nec sx-aurora tsubasa package.svg|800px]]

Though other chips have reached very large interposer sizes before, the SX-Aurora is the first implementation with six HBM2 stacks. It uses the second-generation [[CoWoS]] packaging technology ([[CoWoS-XL2]]) to exceed the [[reticle size]] through the use of mask stitching.

:[[File:sx-aurora-package-xsection.svg|800px]]


== Vector engine (VE) card ==
{{empty section}}

== Die ==
* [[16 nm process]]
* 4,800,000,000 transistors
* 14.96 mm x 33.00 mm
** 493.68 mm² die size
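
The die figures above can be cross-checked, and an average transistor density derived, with the short sketch below. The interposer dimensions are taken from the Package section. The density is a simple chip-wide average computed here, not a figure published by NEC, and the function names are hypothetical.

```python
DIE_W_MM, DIE_H_MM = 14.96, 33.00
TRANSISTORS = 4_800_000_000
INTERPOSER_W_MM, INTERPOSER_H_MM = 32.5, 38.0  # from the Package section


def die_area_mm2():
    """Die area from its listed width and height."""
    return DIE_W_MM * DIE_H_MM


def density_mtr_per_mm2():
    """Average density in millions of transistors per mm^2 (derived, not quoted)."""
    return TRANSISTORS / die_area_mm2() / 1e6


def interposer_area_mm2():
    """Interposer silicon area from its listed dimensions."""
    return INTERPOSER_W_MM * INTERPOSER_H_MM


def die_share_of_interposer():
    """Fraction of the interposer area covered by the processor die."""
    return die_area_mm2() / interposer_area_mm2()


if __name__ == "__main__":
    print(round(die_area_mm2(), 2), "mm^2 die")
    print(round(density_mtr_per_mm2(), 2), "MTr/mm^2")
    print(interposer_area_mm2(), "mm^2 interposer")
    print(round(die_share_of_interposer() * 100), "% of interposer area")
```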

== Bibliography ==
* {{hcbib|30}}
* Supercomputing 2018, NEC Aurora Forum
* ''Some information was obtained directly from NEC''