PEZY-SC2 (PEZY Super Computer 2) is a third-generation many-core microprocessor developed by PEZY and introduced in early 2017. Operating at 1 GHz, the chip integrates 2,048 cores and dissipates up to 180 W. The PEZY-SC2 powers the ZettaScaler-2.x series of supercomputers.
Introduced alongside PEZY's second-generation ZettaScaler-2.0 supercomputer series, the SC2 has 2,048 cores, twice as many as its predecessor. It supports 8-way SMT, for a total of 16,384 threads. The PEZY-SC2 powers many of the top systems on the Green500 list of the most energy-efficient supercomputers, with efficiencies upward of 14 GFLOPS/W.
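The thread count follows directly from the core count and the SMT width. In the sketch below, the predecessor's core count is inferred from the "twice as many cores" statement rather than taken from a spec sheet.

```python
# Figures quoted in the text; the predecessor value is inferred, not sourced.
CORES_SC2 = 2_048
SMT_WAYS = 8  # hardware threads per core (8-way SMT)

threads = CORES_SC2 * SMT_WAYS
predecessor_cores = CORES_SC2 // 2  # "twice as many cores as its predecessor"

print(f"{threads:,} hardware threads")
print(f"predecessor: {predecessor_cores:,} cores")
```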
At 1 GHz, the PEZY-SC2 has a peak performance of 8.192 TFLOPS in single precision and 4.096 TFLOPS in double precision, while dissipating around 180 W. The design comprises over 2.4 billion gates and is manufactured on TSMC's 16FF+ process.
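The peak figures are consistent with a simple cores × clock × FLOPs-per-cycle model. The sketch below back-calculates the per-core, per-cycle FLOP rate from the quoted peaks. It is derived arithmetic, not a published PEZY figure. It also computes a chip-only peak efficiency.

That efficiency figure is not comparable to the Green500 number. Green500 measures whole systems running HPL, so its results are necessarily lower.

```python
CORES = 2_048
CLOCK_HZ = 1.0e9      # 1 GHz
PEAK_SP = 8.192e12    # single-precision FLOPS (from the text)
PEAK_DP = 4.096e12    # double-precision FLOPS (from the spec table)
POWER_W = 180.0       # quoted power dissipation

def flops_per_core_cycle(peak_flops: float) -> float:
    """Back-calculate FLOPs per core per cycle from a chip-level peak."""
    return peak_flops / (CORES * CLOCK_HZ)

sp_per_cycle = flops_per_core_cycle(PEAK_SP)
dp_per_cycle = flops_per_core_cycle(PEAK_DP)

# Chip-only peak efficiency, ignoring memory, interconnect, and cooling.
sp_gflops_per_w = PEAK_SP / POWER_W / 1e9
dp_gflops_per_w = PEAK_DP / POWER_W / 1e9

print(sp_per_cycle, dp_per_cycle)
print(f"{sp_gflops_per_w:.1f} / {dp_gflops_per_w:.1f} GFLOPS/W (SP/DP, chip peak)")
```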
To improve its suitability for deep learning and AI, and to increase throughput for specialized workloads, the PEZY-SC2 added support for 16-bit half-precision floating point. At 1 GHz, the SC2 peaks at 16.384 TFLOPS in half precision.
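Applying the same back-calculation to the half-precision peak yields a 1:2:4 throughput ratio for double, single, and half precision. This ratio is inferred from the quoted peaks, not from PEZY documentation.

```python
CORES = 2_048
CLOCK_HZ = 1.0e9
PEAK_SP = 8.192e12    # single precision, from the text
PEAK_HP = 16.384e12   # half precision, from the spec table

hp_per_cycle = PEAK_HP / (CORES * CLOCK_HZ)  # FLOPs per core per cycle
hp_to_sp = PEAK_HP / PEAK_SP                 # speed-up over single precision

print(hp_per_cycle, hp_to_sp)
```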
The tiny PEZY cores are managed by six P-Class P6600 MIPS (MIPS64R6) processors. The PEZY-SC relied on two lightweight ARM926 cores, which proved to be a significant performance bottleneck. The SC2 also dropped the PEZY-SC's four "Prefecture" units, which together held its 1,024 PEs, each unit with 2 MiB of L3 cache. Instead, the SC2 has 40 MiB of last-level cache, shared not only by all the cities but also by the MIPS cores. To further improve performance, the MIPS cores and the PEZY cores now share the same address space, reducing data-transfer overhead. Because the MIPS cores are powerful enough to act as the host, the SC2 no longer needs an external Intel Xeon E5 host processor.
The SC2 integrates a multi-level cache hierarchy:
In addition, there is 40 MiB of scratchpad memory, 20 KiB per PE. This is up from 16 KiB per PE in the PEZY-SC.
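The 40 MiB total is simply the per-PE scratchpad multiplied by the PE count. The PEZY-SC total in the sketch assumes 1,024 PEs, inferred from the SC2 having twice as many cores. Treat it as an estimate.

```python
KIB = 1024
MIB = KIB * KIB

PES_SC2 = 2_048
PES_SC = 1_024              # assumption: half the SC2 core count
SPM_PER_PE_SC2 = 20 * KIB   # bytes of scratchpad per PE on the PEZY-SC2
SPM_PER_PE_SC = 16 * KIB    # bytes of scratchpad per PE on the PEZY-SC

total_sc2_mib = PES_SC2 * SPM_PER_PE_SC2 / MIB
total_sc_mib = PES_SC * SPM_PER_PE_SC / MIB  # estimate under the assumption above

print(total_sc2_mib, total_sc_mib)
```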
For main memory, the PEZY-SC2 supports four channels of 64-bit DDR4-3200 with ECC, for an aggregate bandwidth of 102.4 GB/s (95.37 GiB/s).
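The aggregate figure is channels × bus width × transfer rate. The sketch below reproduces it and shows that the quoted 95.37 value is in binary units (GiB/s), corresponding to 102.4 decimal GB/s. ECC check bits travel on extra lanes and are not counted as usable bandwidth.

```python
CHANNELS = 4
BUS_WIDTH_BITS = 64        # data width per channel; ECC check bits excluded
TRANSFERS_PER_S = 3_200e6  # DDR4-3200 = 3,200 MT/s

bytes_per_s = CHANNELS * (BUS_WIDTH_BITS // 8) * TRANSFERS_PER_S
gb_s = bytes_per_s / 1e9     # decimal gigabytes per second
gib_s = bytes_per_s / 2**30  # binary gibibytes per second

print(f"{gb_s:.1f} GB/s = {gib_s:.2f} GiB/s")
```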
Beyond main memory, the PEZY-SC2 supports a 1,024-bit-wide Wide-IO interface. The SC2 uses the ThruChip Interface (TCI), a wireless near-field inductive-coupling technology, to communicate with the TCI-DRAM chips packaged together with it. The SC2 has four TCI-DRAM interfaces, each providing up to 500 GB/s, for an aggregate bandwidth of 2 TB/s.
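The TCI aggregate can be put in perspective by comparing it with the DDR4 main-memory figure. The sketch below assumes both numbers are peak values in decimal units. The ratio is illustrative arithmetic, not a measured result.

```python
TCI_INTERFACES = 4
TCI_PEAK_GB_S = 500.0    # per interface, from the text
DDR4_PEAK_GB_S = 102.4   # 4 x 64-bit DDR4-3200 channels (95.37 GiB/s)

tci_total_gb_s = TCI_INTERFACES * TCI_PEAK_GB_S
tci_over_ddr4 = tci_total_gb_s / DDR4_PEAK_GB_S

print(f"{tci_total_gb_s / 1000:.1f} TB/s, {tci_over_ddr4:.1f}x DDR4 peak")
```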
The SC2 upgrades the PCIe interface to Gen4, supporting up to 64 GB/s. The number of lanes remains unchanged from the PEZY-SC, at 32.
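The 64 GB/s figure matches 32 lanes at the PCIe Gen4 signalling rate of 16 GT/s before line-coding overhead. The text does not say whether the figure is per direction or combined. The sketch below assumes it is the raw per-direction rate, and it also shows the effective rate after 128b/130b encoding.

```python
LANES = 32
GT_PER_S = 16e9        # PCIe Gen4 signalling rate per lane
ENCODING = 128 / 130   # 128b/130b line coding used by Gen3 and later

# Assumption: the quoted 64 GB/s is the raw per-direction rate.
raw_gb_s = LANES * GT_PER_S / 8 / 1e9
effective_gb_s = raw_gb_s * ENCODING

print(f"raw {raw_gb_s:.1f} GB/s, effective {effective_gb_s:.1f} GB/s per direction")
```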
| Property | Value |
|---|---|
| Model number | PEZY-SC2 |
| Instance of | Microprocessor |
| Market segment | Supercomputer |
| First announced | 2015 |
| First launched | 2017 |
| Process | 16 nm |
| Die area | 620 mm² (6.2 cm²) |
| Core voltage | 0.8 V (800 mV) |
| Base frequency | 1,000 MHz (1 GHz) |
| Core count | 2,048 |
| Thread count | 16,384 |
| Peak FLOPS (half precision) | 16.384 TFLOPS |
| Peak FLOPS (single precision) | 8.192 TFLOPS |
| Peak FLOPS (double precision) | 4.096 TFLOPS |
| Power dissipation | 180 W |
| Power dissipation (average) | 130 W |
| Supported memory type | DDR4-3200 |
| ECC memory support | Yes (DDR4 main memory) |
| L1$ size | 0.75 MiB and 12 MiB |
| L1I$ size | 0.375 MiB and 8 MiB (4-way set associative) |
| L1D$ size | 0.375 MiB and 4 MiB (8-way set associative) |
| L2$ size | 2 MiB and 12 MiB (8-way set associative) |
| L3$ size | 40 MiB (shared LLC) |