From WikiChip
PEZY-SC2 - PEZY
< pezy‎ | pezy-scx

Edit Values
PEZY-SC2
General Info
DesignerPEZY
ManufacturerTSMC
Model NumberPEZY-SC2
MarketSupercomputer
Introduction2015 (announced)
2017 (launched)
General Specs
FamilyPEZY-SCx
Frequency1,000 MHz
Microarchitecture
Process16 nm
TechnologyCMOS
Die620 mm²
Cores2,048
Threads16,384
Electrical
Power dissipation180 W
Power dissipation (average)130 W
Vcore0.8 V

PEZY-SC2 (PEZY Super Computer 2) is a third generation many-core microprocessor developed by PEZY and introduced in early 2017. This chip, which operates at 1 GHz, incorporates 2,048 cores dissipating 180 W. The PEZY-SC2 powers the ZettaScaler-2.x series of supercomputers.

Overview[edit]

Introduced by PEZY along with their second-generation ZettaScaler-2.0 supercomputer series, the SC2 incorporates 2,048 cores along with 8-way SMT support for a total of 16,384 threads, twice as many cores as its predecessor. The PEZY-SC2 powers many of the top Green500 most efficient supercomputers with upward of 14 GFLOPS/watt in performance.

Operating at 1 GHz, the PEZY-SC2 has a peak performance of 8.192 TFLOPS (single-precision) and 4.1 TFLOPS (double-precision) while consuming around 180 Watts. The PEZY-SC2 is designed using over 2.4 billion gates and is manufactured on TSMC's 16FF+ process.

In attempt to increase adaptability in the field of deep learning and AI as well as to increase throughput for specialized workloads, the PEZY-SC2 introduced support for 16-bit half precision floating point support. At 1 GHz, the SC2 can peak at 16.4 TFLOPS for half precision.

Architecture[edit]

Managing the tiny PEZY cores are six P-Class P6600 MIPS (MIPS64R6) processors. Previously, the PEZY-SC relied on two lightweight ARM926 cores that proved to be too much of a performance bottleneck. The SC2 got rid of the four "Prefecture" units that incorporated 256 cities along with 2 MiB of L3 cache. Instead, the SC2 now has 40 MiB of shared last level cache shared not only by all the cities, but also by the MIPS cores. In order to improve performance further, the MIPS cores and the PEZY cores now share the same address space, reducing data transfer overhead. It's worth noting that the use of powerful MIPS cores mean they no longer require to rely on an external Intel Xeon E5 host processor.


pezy-sc2 main block.svg

Cache[edit]

The SC2 a hexa-core MIPS P6600 process which has its own separate cache:

[Edit/Modify Cache Info]

hierarchy icon.svg
Cache Organization
Cache is a hardware component containing a relatively small and extremely fast memory designed to speed up the performance of a CPU by preparing ahead of time the data it needs to read from a relatively slower medium such as main memory.

The organization and amount of cache can have a large impact on the performance, power consumption, die size, and consequently cost of the IC.

Cache is specified by its size, number of sets, associativity, block size, sub-block size, and fetch and write-back policies.

Note: All units are in kibibytes and mebibytes.
L1$768 KiB
786,432 B
0.75 MiB
L1I$384 KiB
393,216 B
0.375 MiB
6x64 KiB4-way set associative 
L1D$384 KiB
393,216 B
0.375 MiB
6x64 KiB8-way set associative 

L2$2 MiB
2,048 KiB
2,097,152 B
0.00195 GiB
  1x2 MiB8-way set associative 

The SC2 integrates a multi-level cache hierarchy:

[Edit/Modify Cache Info]

hierarchy icon.svg
Cache Organization
Cache is a hardware component containing a relatively small and extremely fast memory designed to speed up the performance of a CPU by preparing ahead of time the data it needs to read from a relatively slower medium such as main memory.

The organization and amount of cache can have a large impact on the performance, power consumption, die size, and consequently cost of the IC.

Cache is specified by its size, number of sets, associativity, block size, sub-block size, and fetch and write-back policies.

Note: All units are in kibibytes and mebibytes.
L1$12 MiB
12,288 KiB
12,582,912 B
L1I$8 MiB
8,192 KiB
8,388,608 B
   
L1D$4 MiB
4,096 KiB
4,194,304 B
   

L2$12 MiB
12,288 KiB
12,582,912 B
0.0117 GiB
     

L3$40 MiB
40,960 KiB
41,943,040 B
0.0391 GiB
  1x40 MiBshared LLC 

Additionally, there is another 40 MiB consisting of 20 KiB per PE of scratch pad memory. This was increased from 16 KiB in the Pezy-SC.

Memory controller[edit]

For main memory, the PEZY-SC2 supports 4 channels of 64-bit DDR4-3200 memory with ECC support for a total aggregated bandwidth of 95.37 GiB/s

[Edit/Modify Memory Info]

ram icons.svg
Integrated Memory Controller
Max TypeDDR4-3200
Supports ECCYes
Max Mem123 GiB
Controllers4
Channels4
Width64 bit
Max Bandwidth95.37 GiB/s
97,658.88 MiB/s
102.403 GB/s
102,402.758 MB/s
0.0931 TiB/s
0.102 TB/s
Bandwidth
Single 23.84GiB/s
Double 47.68 GiB/s
Quad 95.37 GiB/s

In addition to main memory bandwidth, the PEZY-SC2 supports Wide-IO with a width of 1,024 bit. The SC2 uses ThruChip Interface (TCI), a wireless near-field inductive coupling technology, in order to communicate with the TCI-DRAM chips which are packaged together. The SC2 features four TCI-DRAM interfaces, each providing a maximum bandwidth of 500 GB/s for a total aggregated bandwidth of 2 TB/s.

[Edit/Modify Memory Info]

ram icons.svg
Integrated Memory Controller
MemoryWide I/O
Rate2,000 MHz
Width1,024 bit
Channels4
Max Bandwidth1.863 TiB/s
1,907.712 GiB/s
1,953,497.088 MiB/s
2,048.39 GB/s
2,048,390.163 MB/s
2.048 TB/s

Expansions[edit]

The SC2 upgraded the PCIe interface to Gen4, supporting up to 64 GB/s. The number of lanes remain unchanged from the PEZY-SC, at 32.

[Edit/Modify Expansions Info]

ide icon.svg
Expansion Options
PCIeRevision: 4.0
Max Lanes: 32
Configuration: 4x8, 2x16

See also[edit]

Facts about "PEZY-SC2 - PEZY"
Has subobject
"Has subobject" is a predefined property representing a container construct and is provided by Semantic MediaWiki.
PEZY-SC2 - PEZY#pcie +
base frequency1,000 MHz (1 GHz, 1,000,000 kHz) +
core count2,048 +
core voltage0.8 V (8 dV, 80 cV, 800 mV) +
designerPEZY +
die area620 mm² (0.961 in², 6.2 cm², 620,000,000 µm²) +
familyPEZY-SCx +
first announced2015 +
first launched2017 +
full page namepezy/pezy-scx/pezy-sc2 +
has ecc memory supporttrue + and false +
instance ofmicroprocessor +
l1$ size768 KiB (786,432 B, 0.75 MiB) + and 12,288 KiB (12,582,912 B, 12 MiB) +
l1d$ description8-way set associative +
l1d$ size384 KiB (393,216 B, 0.375 MiB) + and 4,096 KiB (4,194,304 B, 4 MiB) +
l1i$ description4-way set associative +
l1i$ size384 KiB (393,216 B, 0.375 MiB) + and 8,192 KiB (8,388,608 B, 8 MiB) +
l2$ description8-way set associative +
l2$ size2 MiB (2,048 KiB, 2,097,152 B, 0.00195 GiB) + and 12 MiB (12,288 KiB, 12,582,912 B, 0.0117 GiB) +
l3$ descriptionshared LLC +
l3$ size40 MiB (40,960 KiB, 41,943,040 B, 0.0391 GiB) +
ldate3000 +
manufacturerTSMC +
market segmentSupercomputer +
max memory bandwidth95.37 GiB/s (97,658.88 MiB/s, 102.403 GB/s, 102,402.758 MB/s, 0.0931 TiB/s, 0.102 TB/s) + and 1,907.712 GiB/s (1,953,497.088 MiB/s, 2,048.39 GB/s, 2,048,390.163 MB/s, 1.863 TiB/s, 2.048 TB/s) +
max memory channels4 +
model numberPEZY-SC2 +
namePEZY-SC2 +
peak flops (double-precision)4,096,000,000,000 FLOPS (4,096,000,000 KFLOPS, 4,096,000 MFLOPS, 4,096 GFLOPS, 4.096 TFLOPS, 0.0041 PFLOPS, 4.096e-6 EFLOPS, 4.096e-9 ZFLOPS) +
peak flops (half-precision)16,384,000,000,000 FLOPS (16,384,000,000 KFLOPS, 16,384,000 MFLOPS, 16,384 GFLOPS, 16.384 TFLOPS, 0.0164 PFLOPS, 1.6384e-5 EFLOPS, 1.6384e-8 ZFLOPS) +
peak flops (single-precision)8,192,000,000,000 FLOPS (8,192,000,000 KFLOPS, 8,192,000 MFLOPS, 8,192 GFLOPS, 8.192 TFLOPS, 0.00819 PFLOPS, 8.192e-6 EFLOPS, 8.192e-9 ZFLOPS) +
power dissipation180 W (180,000 mW, 0.241 hp, 0.18 kW) +
power dissipation (average)130 W (130,000 mW, 0.174 hp, 0.13 kW) +
process16 nm (0.016 μm, 1.6e-5 mm) +
supported memory typeDDR4-3200 +
technologyCMOS +
thread count16,384 +