From WikiChip
Revision as of 22:29, 9 November 2019

Spring Crest µarch
General Info
Arch Type: NPU
Designer: Nervana
Manufacturer: Intel
Introduction: 2019
Process: 16 nm
PE Configs: 24
Succession

Spring Crest (SCR) is a planned neural processor microarchitecture designed by Intel Nervana as the successor to Lake Crest.

Products based on Spring Crest are branded as the NNP L-1000 series.

Process Technology

Spring Crest is fabricated on TSMC's 16 nm process.

Architecture

Spring Crest builds largely on the prior generation, Lake Crest, while adding enhancements and more compute.

Key changes from Lake Crest

This list is incomplete.

Block Diagram

Chip

spring crest block diagram.svg

TPC

spring crest tpc block diagram.svg

Overview

spring crest overview.svg

Spring Crest, the successor to Lake Crest, is Intel Nervana's first commercial neural processor to reach mass production. The chip is a data center training accelerator, optimized for time-to-train and power efficiency. To that end, it is offered both as a PCIe Gen 4 x16 accelerator card and as an OCP Accelerator Module (OAM).

The chip features 24 high-performance tensor processor clusters (TPCs), each incorporating two MAC processing units (MPUs) along with a large pool of highly banked, high-bandwidth memory. Each MPU in a pair integrates a 32x32 array, for a chip-wide total of 98,304 FLOPs per cycle. Spring Crest uses bfloat16 multiplies with 32-bit single-precision floating-point (FP32) accumulation. Bandwidth is favored over latency everywhere. The entire chip is linked using a 2D mesh NoC.
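The 98,304 FLOPs-per-cycle figure can be reproduced from the TPC, MPU, and array counts given above. The following is only a minimal sketch: it assumes each multiply-accumulate counts as two floating-point operations (a common convention, not stated in the text), and the function names and the clock frequency passed to `peak_tflops` are hypothetical, since the text gives no frequency.

```python
# Peak per-cycle throughput implied by the figures in the text.
TPC_COUNT = 24       # tensor processor clusters (from the text)
MPUS_PER_TPC = 2     # MAC processing units per TPC (from the text)
ARRAY_ROWS = 32      # 32x32 array per MPU (from the text)
ARRAY_COLS = 32
FLOPS_PER_MAC = 2    # assumption: one multiply-accumulate = 2 FLOPs


def peak_flops_per_cycle() -> int:
    """Total floating-point operations the whole chip can issue per cycle."""
    macs = TPC_COUNT * MPUS_PER_TPC * ARRAY_ROWS * ARRAY_COLS
    return macs * FLOPS_PER_MAC


def peak_tflops(clock_ghz: float) -> float:
    """Peak TFLOPS at a hypothetical clock; the text gives no frequency."""
    return peak_flops_per_cycle() * clock_ghz * 1e9 / 1e12


print(peak_flops_per_cycle())  # 98304
```

Under this reading, the 49,152 MAC units (24 x 2 x 1,024) account for the full figure, which suggests the 32x32 array is per MPU rather than per MPU pair.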

Spring Crest is fabricated on TSMC's 16-nanometer process and uses TSMC's CoWoS packaging technology to integrate four stacks of HBM2 (8Hi) on an interposer, for a total capacity of 32 GiB operating at 2400 MT/s.
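As a rough check of the memory figures, the sketch below derives the total capacity and an estimated peak bandwidth. The 1024-bit interface width per stack is an assumption taken from the standard HBM2 configuration and is not stated in the text; the resulting bandwidth is therefore an estimate, not a quoted specification. All names are illustrative.

```python
# HBM2 capacity and estimated peak bandwidth from the figures in the text.
STACKS = 4                 # HBM2 stacks (from the text)
GIB_PER_STACK = 8          # 8 GiB per stack (from the text)
TRANSFER_RATE_MTS = 2400   # mega-transfers per second (from the text)
BUS_WIDTH_BITS = 1024      # assumption: standard HBM2 width per stack


def total_capacity_gib() -> int:
    """Total HBM2 capacity across all stacks, in GiB."""
    return STACKS * GIB_PER_STACK


def peak_bandwidth_gbs() -> float:
    """Estimated aggregate peak bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = STACKS * BUS_WIDTH_BITS // 8
    return bytes_per_transfer * TRANSFER_RATE_MTS * 1e6 / 1e9


print(total_capacity_gib())             # 32
print(f"{peak_bandwidth_gbs():.1f}")    # 1228.8
```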

The chip also exposes four InterChip Link (ICL) ports, each comprising x16 (4×4) SerDes, for a total of 64 SerDes lanes. The ICL ports operate at 112 Gbps for a total bidirectional bandwidth of 3.58 Tbps. A full system can incorporate up to 1024 Spring Crest processors and behave like a single chip with a consistent programming model.
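The lane count and the 3.58 Tbps total can be reconciled with a short calculation. This sketch assumes that the 112 Gbps figure applies to each x4 link (so each port carries four such links) and that "bidirectional" means both directions are summed; neither assumption is spelled out in the text, and the names are illustrative.

```python
# One reading of the ICL figures under which the quoted numbers agree.
ICL_PORTS = 4          # ICL ports (from the text)
LINKS_PER_PORT = 4     # x16 per port as 4 x4 links ("4x4" in the text)
LANES_PER_LINK = 4
LINK_RATE_GBPS = 112   # assumption: 112 Gbps is the rate of one x4 link


def total_serdes_lanes() -> int:
    """SerDes lanes across all ICL ports."""
    return ICL_PORTS * LINKS_PER_PORT * LANES_PER_LINK


def bidirectional_tbps() -> float:
    """Aggregate bandwidth in Tbps, summing both directions."""
    one_way_gbps = ICL_PORTS * LINKS_PER_PORT * LINK_RATE_GBPS
    return 2 * one_way_gbps / 1000


print(total_serdes_lanes())             # 64
print(f"{bidirectional_tbps():.3f}")    # 3.584
```

Under the same assumptions, each lane would run at 112 / 4 = 28 Gbps.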

Tensor Processing Cluster (TPC)

This section is empty.

MAC Processing Unit (MPU)

This section is empty.

Memory Subsystem

This section is empty.

Network-on-Chip (NoC)

This section is empty.

Scalability

This section is empty.

Package

  • 60 mm x 60 mm package
    • 6-2-6 layer stackup
    • BGA with 3,325 pins
  • 1 die, 4 HBM 8 GiB stacks
  • 1200 mm² CoWoS


intel nnp-l chip.png

Die

  • 680 mm²
  • 27,000,000,000 transistors

spring crest floorplan.png

codename: Spring Crest
designer: Nervana
first launched: 2019
full page name: nervana/microarchitectures/spring crest
instance of: microarchitecture
manufacturer: Intel
name: Spring Crest
process: 16 nm (0.016 μm, 1.6e-5 mm)
processing element count: 24