
{{nervana title|Spring Crest|arch}}
{{microarchitecture
|atype=NPU
|name=Spring Crest
|designer=Nervana
|manufacturer=Intel
|introduction=2019
|process=16 nm
|processing elements=24
|predecessor=Lake Crest
|predecessor link=nervana/microarchitectures/lake crest
}}
'''Spring Crest''' ('''SCR''') is a planned [[neural processor]] microarchitecture designed by [[Intel Nervana]] as the successor to {{\\|Lake Crest}}.

Products based on Spring Crest are branded as the {{nervana|NNP|NNP L-1000 series}}.

== Process Technology ==
Spring Crest is fabricated on [[TSMC]]'s [[16 nm process]].

== Architecture ==
Spring Crest largely builds on the prior generation, adding more compute along with a number of enhancements.

=== Key changes from {{\\|Lake Crest}} ===
* [[16 nm process]] (from [[28 nm]])
* 2x compute clusters (24 CCs, up from 12)
** Support for [[Bfloat16]] (from [[Flexpoint]])
* 33% more InterChip Links (16 ICLs, up from 12)

{{expand list}}

=== Block Diagram ===
==== Chip ====
:[[File:spring crest block diagram.svg|700px]]

==== TPC ====
:[[File:spring crest tpc block diagram.svg|700px]]

== Overview ==
[[File:spring crest overview.svg|right|400px]]
Spring Crest is the successor to {{\\|Lake Crest}} and is Intel Nervana's first commercial [[neural processor]] to reach mass production. The chip is a data center training [[accelerator]], optimized for the fastest time-to-train and the highest power efficiency. To that end, it is offered both as a [[PCIe Gen 4]] x16 [[accelerator card]] and as an [[OCP Accelerator Module]] (OAM).

The chip features 24 high-performance ''tensor processing clusters'' (TPCs), each incorporating two ''MAC processing units'' (MPUs) along with a large pool of highly banked, high-bandwidth memory. Each MPU integrates a 32x32 array, for a chip-wide total of 98,304 [[FLOPs]] per cycle. Spring Crest uses [[bfloat16]] with a 32-bit (SP FP) accumulate. Bandwidth is favored over latency throughout the design. The entire chip is linked using a 2D mesh [[NoC]].
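The 98,304 FLOPs-per-cycle figure can be reproduced from the counts above. The following is a minimal sketch, assuming each multiply-accumulate counts as two floating-point operations; the function names and the <code>clock_ghz</code> parameter are hypothetical, and no clock frequency is given in this article.

```python
# Peak throughput arithmetic for the figures quoted in the Overview.
TPC_COUNT = 24        # tensor processing clusters per chip
MPUS_PER_TPC = 2      # MAC processing units per TPC
ARRAY_ROWS = 32       # rows of the 32x32 array in each MPU
ARRAY_COLS = 32       # columns of the 32x32 array in each MPU
FLOPS_PER_MAC = 2     # assumption: one multiply plus one add per MAC


def peak_flops_per_cycle() -> int:
    """Chip-wide floating-point operations issued per clock cycle."""
    macs = TPC_COUNT * MPUS_PER_TPC * ARRAY_ROWS * ARRAY_COLS
    return macs * FLOPS_PER_MAC


def peak_tflops(clock_ghz: float) -> float:
    """Peak TFLOPS at a hypothetical clock; the article states no frequency."""
    return peak_flops_per_cycle() * clock_ghz / 1e3


if __name__ == "__main__":
    print(peak_flops_per_cycle())
```

The per-cycle count is independent of frequency, so any peak TFLOPS number obtained from <code>peak_tflops</code> is only as good as the clock value supplied to it.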

Spring Crest is fabricated on [[TSMC]]'s [[16FF+|16-nanometer]] process and uses TSMC's [[CoWoS]] [[packaging technology]] to integrate four stacks of [[HBM2]] (8Hi) on an [[interposer]], for a total capacity of 32 GiB operating at 2400 MT/s.
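As a rough check on the memory figures, the sketch below derives the total capacity and an estimated peak bandwidth. It assumes the standard HBM2 interface width of 1024 bits per stack, which is not stated in this article; the function names are hypothetical.

```python
# Capacity and estimated peak bandwidth of the HBM2 subsystem.
HBM_STACKS = 4
GIB_PER_STACK = 8            # 8Hi stacks, 8 GiB each
TRANSFER_RATE_MT_S = 2400    # mega-transfers per second per pin
BUS_WIDTH_BITS = 1024        # assumption: standard HBM2 width per stack


def total_capacity_gib() -> int:
    """Total on-package memory capacity in GiB."""
    return HBM_STACKS * GIB_PER_STACK


def peak_bandwidth_gb_s() -> float:
    """Estimated aggregate peak bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = HBM_STACKS * BUS_WIDTH_BITS // 8
    return bytes_per_transfer * TRANSFER_RATE_MT_S * 1e6 / 1e9


if __name__ == "__main__":
    print(total_capacity_gib(), peak_bandwidth_gb_s())
```

Under that bus-width assumption, the estimate comes to roughly 1.2 TB/s across the four stacks.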

The chip also exposes four InterChip Link (ICL) ports, each comprising x16 (4×4) SerDes, for a total of 64 SerDes. The ICL ports operate at 112 Gbps for a total bidirectional bandwidth of 3.58 Tbps. A full system can incorporate up to 1024 Spring Crest processors and behave like a single chip with a consistent programming model.
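The ICL totals can be checked with a little arithmetic. The sketch below assumes that the 112 Gbps figure applies to each x4 group of SerDes (28 Gbps per lane); that reading is an assumption chosen because it reproduces the quoted 3.58 Tbps, and the function names are hypothetical.

```python
# InterChip Link (ICL) lane count and aggregate bandwidth.
ICL_PORTS = 4
GROUPS_PER_PORT = 4     # each port is x16, organized as 4x4
LANES_PER_GROUP = 4
GBPS_PER_GROUP = 112    # assumption: 112 Gbps per x4 group, per direction


def total_lanes() -> int:
    """Total number of SerDes lanes exposed by the chip."""
    return ICL_PORTS * GROUPS_PER_PORT * LANES_PER_GROUP


def gbps_per_lane() -> float:
    """Per-lane rate implied by the per-group assumption."""
    return GBPS_PER_GROUP / LANES_PER_GROUP


def bidirectional_tbps() -> float:
    """Aggregate bandwidth over all groups, counting both directions."""
    per_direction_gbps = ICL_PORTS * GROUPS_PER_PORT * GBPS_PER_GROUP
    return 2 * per_direction_gbps / 1e3


if __name__ == "__main__":
    print(total_lanes(), gbps_per_lane(), bidirectional_tbps())
```

With these inputs, the 16 x4 groups also match the count of 16 ICLs listed under the key changes.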

== Tensor Processing Cluster (TPC) ==
{{empty section}}

=== MAC Processing Unit (MPU) ===
{{empty section}}

=== Memory Subsystem ===
{{empty section}}

== Network-on-Chip (NoC) ==
{{empty section}}

== Scalability ==
{{empty section}}

== Package ==
* 60 mm x 60 mm package
** 6-2-6 layer stackup
** BGA with 3,325 pins
* 1 die, 4 HBM 8 GiB stacks
* 1200 mm² [[CoWoS]]


:[[File:intel nnp-l chip.png|500px]]

== Die ==
* [[16 nm process]]
* 680 mm²
* 27,000,000,000 transistors


:[[File:spring crest floorplan.png|600px]]