From WikiChip
Revision as of 08:25, 23 July 2018

PEZY-SCx
Developer PEZY Computing
Manufacturer TSMC
Type Microprocessors
Introduction 2014 (announced), 2014 (launch)
Architecture Many-core architecture
Process 28 nm, 16 nm, 7 nm, 5 nm
Technology CMOS
Clock 733 MHz-1,600 MHz

PEZY-SCx (PEZY-SuperComputerx) is a family of many-core microprocessors designed by PEZY. These processors power many of Japan's most efficient supercomputers.

Overview

PEZY-SCx is a family of high-performance, low-power many-core microprocessors designed by PEZY for a series of supercomputers developed in Japan. PEZY collaborates closely with ExaScaler, a company that provides immersion cooling systems. Together, they have developed a series of supercomputers called ZettaScaler.

Architecture

The basic architecture of all the PEZY-SCx chips is fairly similar. At the heart is the Processing Element (PE). Depending on the model, thousands of these PEs are integrated on a single die.

The PEZY-SCx chips are designed as accelerators: a host processor (typically an Intel Xeon E5) offloads code to the PEZY-SC chip for execution. These chips support an OpenCL-like programming model called PZCL.

Processing Element (PE)

pezy-sc pe.svg

The cores are called processing elements (PEs). The PEs are designed to be very simple RISC cores configured as MIMD, so in principle each PE can run a different workload. Each PE is a 16-stage in-order superscalar core capable of issuing two instructions per cycle, with out-of-order completion whenever possible. A PE supports 8-way fine-grain simultaneous multithreading (SMT) with a dedicated register file for each thread. Threads are interleaved each cycle; this switching reduces the need for forwarding and mitigates the lack of branch prediction. Explicit switching of active threads is also done in order to hide high-latency operations.
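The effect of per-cycle thread interleaving can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: it assumes all eight threads are active and are selected in strict round-robin order, which the text does not confirm, and the names `interleave`, `schedule`, and `gap` are hypothetical.

```python
THREADS_PER_PE = 8  # from the text: 8-way fine-grain multithreading


def interleave(num_cycles, threads=THREADS_PER_PE):
    """Return the thread ID selected on each cycle.

    Assumption: strict round-robin over all threads. The real selection
    policy of the PE is not described in the text.
    """
    return [cycle % threads for cycle in range(num_cycles)]


schedule = interleave(16)

# Number of cycles between two consecutive issue slots of thread 0.
first_slot = schedule.index(0)
second_slot = schedule.index(0, first_slot + 1)
gap = second_slot - first_slot
```

Under this assumption, consecutive instructions from the same thread are issued several cycles apart, so an earlier result or branch outcome has more time to resolve before the thread's next instruction needs it. This is the intuition behind reducing forwarding and tolerating the lack of branch prediction.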

The instruction set architecture is a proprietary one designed by PEZY. The instruction set supports various operations such as data flushing, synchronization, acquisition of IDs, and thread switching. Each PE has an ID which is used by the code to track processes. The PEs do not maintain cache coherency and there is no per-PE data cache. Complex instructions are processed by the Special Function Units (SFUs) located in each city. A fair number of sacrifices were made in order to keep the cores small enough that a large number of them can be packed into a small area.

Village & City

Every pair of PEs shares 2 KiB of level 1 data cache. Each City is made up of 64 KiB of L2 cache, a number of Special Function Units (SFUs), which are used to execute complex instructions, and four smaller blocks called "Villages". A Village consists of four processing elements.

pezy-sc city.svg

Models

The origin of the PEZY-SCx family is the PEZY-1, a 512-core chip.

1st generation

Main article: PEZY-SC

The first series of supercomputers, ZettaScaler-1.x, was based on the PEZY-SC. The PEZY-SC was organized into "Prefectures", each consisting of 16 cities for a total of 256 PEs per Prefecture, along with 2 MiB of L3 cache. The chip had four such Prefectures for a total of 1,024 cores and 8,192 threads. Operating at 733 MHz, the chip was capable of 3 TFLOPS (single-precision) and 1.5 TFLOPS (double-precision). A number of signal-related issues, particularly PCIe signal failures, were addressed by PEZY with the introduction of the PEZY-SCnp, which made use of a new package ("np"). The PEZY-SCnp was otherwise identical to the earlier model but ran at a slightly higher clock, resulting in slightly higher peak performance.
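The core and thread totals follow directly from the Village, City, and Prefecture hierarchy. The short sketch below reproduces that arithmetic using only the counts given in the text; the constant and variable names are illustrative.

```python
# Counts taken from the text for the first-generation PEZY-SC.
PES_PER_VILLAGE = 4
VILLAGES_PER_CITY = 4
CITIES_PER_PREFECTURE = 16
PREFECTURES_PER_CHIP = 4
THREADS_PER_PE = 8

pes_per_city = PES_PER_VILLAGE * VILLAGES_PER_CITY
pes_per_prefecture = pes_per_city * CITIES_PER_PREFECTURE
cores = pes_per_prefecture * PREFECTURES_PER_CHIP
threads = cores * THREADS_PER_PE

print(f"PEs per city:       {pes_per_city}")
print(f"PEs per prefecture: {pes_per_prefecture}")
print(f"Cores per chip:     {cores}")
print(f"Threads per chip:   {threads}")
```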


pezy-sc main block.svg

2nd generation

Main article: PEZY-SC2

The second series of supercomputers, ZettaScaler-2.x, was based on the PEZY-SC2.

pezy-sc2 main block.svg

Future generations

PEZY has laid out future generations based on TSMC's 7 nm and 5 nm processes.

Summary

List of PEZY-SCx Processors

| Model     | Process | Launched       | Cores  | Threads | Die       | Frequency     | FLOPS (SP)     | FLOPS (DP)    |
|-----------|---------|----------------|--------|---------|-----------|---------------|----------------|---------------|
| PEZY-SC4  | 5 nm    | 2020           | 16,384 | 131,072 | 740 mm²   | 1,600 MHz     | 104.858 TFLOPS | 52.429 TFLOPS |
| PEZY-SC3  | 7 nm    | 2019           | 8,192  | 65,536  | 700 mm²   | 1,333.333 MHz | 43.691 TFLOPS  | 21.845 TFLOPS |
| PEZY-SC2  | 16 nm   | 2017           | 2,048  | 16,384  | 620 mm²   | 1,000 MHz     | 8.192 TFLOPS   | 4.096 TFLOPS  |
| PEZY-SC   | 28 nm   | September 2014 | 1,024  | 8,192   | 411.6 mm² | 733.33 MHz    | 3.004 TFLOPS   | 1.502 TFLOPS  |
| PEZY-SCnp | 28 nm   | 6 May 2016     | 1,024  | 8,192   |           | 766.66 MHz    | 3.14 TFLOPS    | 1.57 TFLOPS   |

Count: 5
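The peak performance figures listed for each model are consistent with a simple product of core count, clock frequency, and floating-point operations per cycle per PE. The sketch below assumes 4 single-precision and 2 double-precision operations per PE per cycle; these per-cycle values are not stated in the text and are inferred only because they reproduce the listed numbers. The function name `peak_tflops` is hypothetical.

```python
# Assumption: per-PE throughput inferred from the listed figures,
# not stated anywhere in the text.
FLOPS_PER_CYCLE_SP = 4
FLOPS_PER_CYCLE_DP = 2


def peak_tflops(cores, mhz, flops_per_cycle):
    """Theoretical peak in TFLOPS: cores x clock x FLOPs per cycle."""
    return cores * mhz * 1e6 * flops_per_cycle / 1e12


# (cores, frequency in MHz) as listed for each model.
models = {
    "PEZY-SC": (1024, 733.33),
    "PEZY-SCnp": (1024, 766.66),
    "PEZY-SC2": (2048, 1000.0),
    "PEZY-SC3": (8192, 1333.333),
    "PEZY-SC4": (16384, 1600.0),
}

for name, (cores, mhz) in models.items():
    sp = peak_tflops(cores, mhz, FLOPS_PER_CYCLE_SP)
    dp = peak_tflops(cores, mhz, FLOPS_PER_CYCLE_DP)
    print(f"{name}: {sp:.3f} TFLOPS (SP), {dp:.3f} TFLOPS (DP)")
```

These are theoretical peaks; sustained performance on real workloads will be lower.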
