|Introduction|| 2014 (announced)|
|Process|| 28 nm (0.028 μm), 16 nm (0.016 μm), 7 nm (0.007 μm), 5 nm (0.005 μm)|
|Clock||733 MHz-1,600 MHz|
PEZY-SCx is a family of high-performance, low-power many-core microprocessors designed by PEZY Computing for a series of supercomputers developed in Japan. PEZY works closely with ExaScaler, a company that provides immersion cooling systems, and together the two companies have developed a series of supercomputers called ZettaScaler.
The basic architecture of all PEZY-SCx chips is fairly similar. At its heart is the Processing Element (PE); depending on the model, 1,000 or more PEs are integrated on a single die.
Processing Element (PE)
The cores are called processing elements (PEs). The PEs are designed to be very simple RISC cores configured as MIMD, so in principle each PE can run a different workload. Each PE is a 16-stage in-order superscalar core capable of issuing two instructions per cycle, with out-of-order completion whenever possible. Each PE supports 8-way fine-grained simultaneous multithreading (SMT) with a dedicated register file for each thread. Threads are interleaved every cycle; this switching reduces the need for forwarding and mitigates the lack of branch prediction. Active threads can also be switched explicitly in order to hide high-latency operations.
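The effect of per-cycle thread interleaving can be illustrated with a simplified model. The sketch below assumes a single-issue, round-robin scheduler over eight hardware threads that skips any thread stalled on a long-latency operation. The names `HwThread`, `stall_cycles` and `schedule` are hypothetical, and the model ignores dual issue, pipeline depth and the exact switching policy PEZY uses.

```python
from dataclasses import dataclass

NUM_THREADS = 8  # 8-way multithreading per PE, as described above


@dataclass
class HwThread:
    tid: int
    # Hypothetical model: > 0 while the thread waits on a long-latency operation.
    stall_cycles: int = 0


def schedule(threads, cycles):
    """Return the thread ID that issues on each cycle (None if all are stalled).

    Round-robin: each cycle the search starts at the thread after the one that
    issued last, skipping threads that are stalled.
    """
    trace = []
    last = NUM_THREADS - 1
    for _ in range(cycles):
        chosen = None
        for offset in range(1, NUM_THREADS + 1):
            t = threads[(last + offset) % NUM_THREADS]
            if t.stall_cycles == 0:
                chosen = t
                break
        # Every stalled thread moves one cycle closer to being ready again.
        for t in threads:
            if t.stall_cycles > 0:
                t.stall_cycles -= 1
        if chosen is not None:
            last = chosen.tid
        trace.append(None if chosen is None else chosen.tid)
    return trace


threads = [HwThread(i) for i in range(NUM_THREADS)]
threads[2].stall_cycles = 10  # thread 2 waits on a slow operation
print(schedule(threads, 10))
```

Because consecutive cycles come from different threads, an instruction rarely depends on the result of the instruction issued immediately before it, which is why this style of interleaving lets a simple core do without extensive forwarding or branch prediction.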
The PEs implement a proprietary instruction set architecture designed by PEZY. The instruction set supports operations such as data flushing, synchronization, ID acquisition, and thread switching. Each PE has an ID, which the code uses to track processes. The PEs do not maintain cache coherency, and there is no per-PE data cache. Complex instructions are processed by the Special Function Units (SFUs) located in each city. Many trade-offs were made to keep the cores small enough that a large number of them can be packed into a small area.
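One common way for code to use such IDs is to give each hardware thread a disjoint slice of the data, so that no two threads write the same element and no coherency traffic is needed between them. The sketch below is a hypothetical grid-stride partitioning; `global_thread_id` and `my_indices` are illustrative names, not PEZY's programming interface, and the flattening of (PE ID, thread ID) into one index is an assumption.

```python
NUM_PES = 1024       # PEZY-SC core count
THREADS_PER_PE = 8   # 8-way multithreading per PE


def global_thread_id(pe_id: int, thread_id: int) -> int:
    # Hypothetical flattening of (PE ID, thread ID) into a single index.
    return pe_id * THREADS_PER_PE + thread_id


def my_indices(pe_id: int, thread_id: int, n: int, num_pes: int = NUM_PES) -> range:
    """Indices of an n-element array handled by one hardware thread.

    Each thread starts at its global ID and strides by the total thread count,
    so every index in [0, n) belongs to exactly one thread.
    """
    stride = num_pes * THREADS_PER_PE
    return range(global_thread_id(pe_id, thread_id), n, stride)


# Thread 0 of PE 0 on a 20,000-element array:
print(list(my_indices(0, 0, 20_000)))
```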
Village & City
There is 2 KiB of level 1 data cache for every pair of PEs. A village consists of four processing elements, and each city is made up of four villages, 64 KiB of L2 cache, and a number of Special Function Units (SFUs), which execute complex instructions.
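The hierarchy can be made concrete with a little arithmetic. The sketch below uses the PEZY-SC counts (16 cities per prefecture, four prefectures) and assumes, purely for illustration, that PE IDs are assigned in order so that a flat ID can be split into prefecture, city, village and PE position; the actual ID layout is not documented here. The cache totals are derived from the per-pair, per-city and per-prefecture figures, not quoted specifications.

```python
PES_PER_VILLAGE = 4
VILLAGES_PER_CITY = 4
CITIES_PER_PREFECTURE = 16  # PEZY-SC
PREFECTURES = 4             # PEZY-SC

PES_PER_CITY = PES_PER_VILLAGE * VILLAGES_PER_CITY             # 16
PES_PER_PREFECTURE = PES_PER_CITY * CITIES_PER_PREFECTURE      # 256
TOTAL_PES = PES_PER_PREFECTURE * PREFECTURES                   # 1,024

# Derived cache totals (KiB / MiB) from the per-unit sizes.
L1_KIB_TOTAL = (TOTAL_PES // 2) * 2                            # 2 KiB per PE pair
L2_KIB_TOTAL = CITIES_PER_PREFECTURE * PREFECTURES * 64        # 64 KiB per city
L3_MIB_TOTAL = PREFECTURES * 2                                 # 2 MiB per prefecture


def locate(pe_id: int) -> tuple:
    """Split a flat PE ID into (prefecture, city, village, pe).

    Assumes IDs are numbered consecutively within each village, city and
    prefecture (a hypothetical layout).
    """
    pe = pe_id % PES_PER_VILLAGE
    village = (pe_id // PES_PER_VILLAGE) % VILLAGES_PER_CITY
    city = (pe_id // PES_PER_CITY) % CITIES_PER_PREFECTURE
    prefecture = pe_id // PES_PER_PREFECTURE
    return prefecture, city, village, pe


print(TOTAL_PES, L1_KIB_TOTAL, L2_KIB_TOTAL, L3_MIB_TOTAL)
print(locate(1023))
```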
The origin of the PEZY-SCx family is the PEZY-1, a 512-core chip.
- Main article: PEZY-SC
The first series of supercomputers, ZettaScaler-1.x, was based on the PEZY-SC. The PEZY-SC featured four "Prefectures", each consisting of 16 cities (256 PEs) along with 2 MiB of L3 cache, for a total of 1,024 cores and 8,192 threads. Operating at 733 MHz, the chip was capable of 3 TFLOPS (single precision) and 1.5 TFLOPS (double precision). PEZY addressed a number of signal-related issues, particularly PCIe signal failures, with the PEZY-SCnp, which used a new package ("np"). Although otherwise identical to the earlier model, the PEZY-SCnp has a slightly higher clock (766.66 MHz), resulting in slightly higher peak performance.
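The quoted peak figures are consistent with a simple cores × clock × FLOPs-per-cycle calculation. The per-PE rates below (4 single-precision and 2 double-precision FLOPs per cycle) are inferred by dividing the stated TFLOPS figures by 1,024 PEs × 733 MHz; they are not taken from a PEZY specification. Applying the same rates to the PEZY-SCnp clock gives an estimate, not a published figure.

```python
def peak_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: every core retires flops_per_cycle operations each cycle."""
    return cores * clock_hz * flops_per_cycle


SP_PER_CYCLE = 4  # inferred: 3 TFLOPS / (1,024 PEs x 733 MHz)
DP_PER_CYCLE = 2  # inferred: 1.5 TFLOPS / (1,024 PEs x 733 MHz)

sc_sp = peak_flops(1024, 733e6, SP_PER_CYCLE)
sc_dp = peak_flops(1024, 733e6, DP_PER_CYCLE)
scnp_sp = peak_flops(1024, 766.66e6, SP_PER_CYCLE)  # estimate for PEZY-SCnp
scnp_dp = peak_flops(1024, 766.66e6, DP_PER_CYCLE)  # estimate for PEZY-SCnp

print(f"PEZY-SC:   {sc_sp / 1e12:.2f} / {sc_dp / 1e12:.2f} TFLOPS (SP / DP)")
print(f"PEZY-SCnp: {scnp_sp / 1e12:.2f} / {scnp_dp / 1e12:.2f} TFLOPS (SP / DP)")
```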
- Main article: PEZY-SC2
|List of PEZY-SCx Processors|
|Model||Process||Launched||Cores||Threads||Die||Frequency||FLOPS (SP)||FLOPS (DP)|
|PEZY-SC||28 nm||September 2014||1,024||8,192||411.6 mm²||733 MHz||3 TFLOPS||1.5 TFLOPS|
|PEZY-SCnp||28 nm||6 May 2016||1,024||8,192||||766.66 MHz||||||
- Intel Xeon Phi