From WikiChip
Difference between revisions of "nervana/microarchitectures/spring crest"

Revision as of 21:24, 9 November 2019

Spring Crest µarch
General Info
Arch Type: NPU
Designer: Nervana
Manufacturer: Intel
Introduction: 2019
Process: 16 nm
PE Configs: 24
Succession

Spring Crest (SCR) is a planned neural processor microarchitecture designed by Intel Nervana and the successor to Lake Crest.

Products based on Spring Crest are branded as the NNP L-1000 series.

Process Technology

Spring Crest is fabricated on TSMC's 16 nm process.

Architecture

Spring Crest largely builds on the prior generation but introduces more enhancements and compute.

Key changes from Lake Crest

This list is incomplete; you can help by expanding it.

Block Diagram

Chip

spring crest block diagram.svg

TPC

spring crest tpc block diagram.svg

Overview

spring crest overview.svg

Spring Crest is the successor to Lake Crest and Intel Nervana's first commercial neural processor to reach mass production. It is a data center training accelerator, optimized for the fastest time-to-train and highest power efficiency. To that end, it is offered as a PCIe Gen 4 x16 accelerator card as well as an OCP Accelerator Module (OAM).

The chip features 24 high-performance tensor processing clusters (TPCs), each incorporating two MAC processing units (MPUs) along with a large pool of highly banked, high-bandwidth memory. Each MPU of the pair integrates a 32x32 array, for a chip-wide total of 98,304 FLOPs per cycle. Spring Crest uses bfloat16 with a 32-bit single-precision floating-point (SP FP) accumulate. Bandwidth is favored over latency throughout the design. The entire chip is linked by a 2D mesh NoC.
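The 98,304 FLOPs-per-cycle figure can be reproduced from the counts quoted above. The following is a minimal sketch, assuming that every element of each 32x32 array performs one multiply-accumulate per cycle and that a multiply-accumulate is counted as two floating-point operations; neither assumption is stated on this page, but together they match the quoted total. The `clock_ghz` parameter is hypothetical, since no clock frequency is given here.

```python
# Figures quoted in the Overview text.
TPC_COUNT = 24        # tensor processing clusters per chip
MPUS_PER_TPC = 2      # MAC processing units per TPC
ARRAY_ROWS = 32       # rows in each MPU array
ARRAY_COLS = 32       # columns in each MPU array

# Assumption (not from the source): one multiply-accumulate = 2 FLOPs.
FLOPS_PER_MAC = 2


def macs_per_cycle() -> int:
    """Multiply-accumulates issued chip-wide each cycle."""
    return TPC_COUNT * MPUS_PER_TPC * ARRAY_ROWS * ARRAY_COLS


def flops_per_cycle() -> int:
    """Chip-wide FLOPs per cycle under the 2-FLOPs-per-MAC assumption."""
    return macs_per_cycle() * FLOPS_PER_MAC


def peak_tflops(clock_ghz: float) -> float:
    """Peak TFLOPS for a hypothetical clock frequency in GHz."""
    return flops_per_cycle() * clock_ghz / 1000.0


if __name__ == "__main__":
    print(macs_per_cycle())   # 49152
    print(flops_per_cycle())  # 98304
```

Multiplying the per-cycle figure by an actual clock frequency would give peak throughput; that frequency would have to come from a separate source.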

Spring Crest is fabricated on TSMC's 16-nanometer process and uses TSMC's CoWoS packaging technology to integrate four stacks of HBM2 (8Hi) on an interposer, for a total capacity of 32 GiB operating at 2400 MT/s.
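As a rough check on the memory figures, the sketch below derives the total capacity from the per-stack size listed in the Package section (8 GiB per stack) and estimates peak memory bandwidth from the 2400 MT/s transfer rate. The 1024-bit interface width per stack is an assumption taken from the standard HBM2 configuration, not from this page, so the bandwidth result is an estimate rather than a quoted specification.

```python
# Figures quoted in the text.
HBM_STACKS = 4
GIB_PER_STACK = 8            # from the Package section
TRANSFER_RATE_MTS = 2400     # mega-transfers per second per pin

# Assumption (not from the source): standard HBM2 interface width per stack.
BUS_WIDTH_BITS_PER_STACK = 1024


def total_capacity_gib() -> int:
    """Total HBM2 capacity across all stacks, in GiB."""
    return HBM_STACKS * GIB_PER_STACK


def peak_bandwidth_gb_per_s() -> float:
    """Estimated peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = BUS_WIDTH_BITS_PER_STACK / 8
    per_stack_mb_per_s = TRANSFER_RATE_MTS * bytes_per_transfer
    return HBM_STACKS * per_stack_mb_per_s / 1000.0


if __name__ == "__main__":
    print(total_capacity_gib())        # 32
    print(peak_bandwidth_gb_per_s())   # 1228.8
```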

The chip also exposes four InterChip Link (ICL) ports, each comprising x16 (4×4) SerDes, for a total of 64 SerDes. The ICL ports operate at 112 Gbps for a total bidirectional bandwidth of 3.58 Tbps. A full system can incorporate up to 1024 Spring Crest processors and behave like a single chip with a consistent programming model.
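One way to reconcile the 112 Gbps and 3.58 Tbps figures is sketched below. It assumes that "x16 (4×4)" means each port is built from four groups of four SerDes lanes and that 112 Gbps is the rate of one four-lane group (28 Gbps per lane). That reading is an interpretation, not something this page states, but it reproduces the quoted totals.

```python
# Figures quoted in the text.
ICL_PORTS = 4
LANES_PER_PORT = 16          # "x16" per port

# Assumptions (interpretation of "4x4" and "112 Gbps", not from the source):
LANES_PER_GROUP = 4          # each port = 4 groups of 4 lanes
GROUP_RATE_GBPS = 112.0      # 112 Gbps per 4-lane group


def total_serdes() -> int:
    """Total SerDes lanes across all ICL ports."""
    return ICL_PORTS * LANES_PER_PORT


def lane_rate_gbps() -> float:
    """Per-lane rate implied by the per-group rate."""
    return GROUP_RATE_GBPS / LANES_PER_GROUP


def bidirectional_bandwidth_tbps() -> float:
    """Aggregate bandwidth counting both directions, in Tbps."""
    one_direction_gbps = total_serdes() * lane_rate_gbps()
    return 2 * one_direction_gbps / 1000.0


if __name__ == "__main__":
    print(total_serdes())                   # 64
    print(lane_rate_gbps())                 # 28.0
    print(bidirectional_bandwidth_tbps())   # 3.584
```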

Tensor Processing Cluster (TPC)

This section is empty; you can help add the missing info by editing this page.

MAC Processing Unit (MPU)

This section is empty; you can help add the missing info by editing this page.

Memory Subsystem

This section is empty; you can help add the missing info by editing this page.

Network-on-Chip (NoC)

This section is empty; you can help add the missing info by editing this page.

Scalability

This section is empty; you can help add the missing info by editing this page.

Package

  • 60 mm x 60 mm package
    • 6-2-6 layer stackup (BGA)
    • 3,325 pins
  • 1 die, 4 HBM2 stacks of 8 GiB each
intel nnp-l chip.png

Die


spring crest floorplan.png
codename: Spring Crest
designer: Nervana and Intel
first launched: 2019
full page name: nervana/microarchitectures/spring crest
instance of: microarchitecture
manufacturer: Intel
name: Spring Crest
process: 16 nm (0.016 μm, 1.6e-5 mm)
processing element count: 24