Latest revision as of 08:33, 9 May 2019
PEZY-SCx | |
Developer | PEZY Computing |
Manufacturer | TSMC |
Type | Microprocessors |
Introduction | 2014 (announced), 2014 (launched) |
Architecture | Many-core architecture |
Process | 28 nm, 16 nm, 7 nm, 5 nm |
Technology | CMOS |
Clock | 733 MHz-1,600 MHz |
PEZY-SCx (PEZY-SuperComputerx) is a family of many-core microprocessors designed by PEZY. These processors power many of Japan's most efficient supercomputers.
Overview
PEZY-SCx is a family of high-performance, low-power many-core microprocessors designed by PEZY for a series of supercomputers developed in Japan. PEZY collaborates closely with ExaScaler, a company that provides immersion cooling systems. Together, they have developed a series of supercomputers called ZettaScaler.
Architecture
The basic architecture of all PEZY-SCx chips is fairly similar. At the heart is the Processing Element (PE). Depending on the model, thousands of PEs are integrated on a single die.
The PEZY-SCx chips are designed as accelerators: a host processor (typically an Intel Xeon E5) off-loads code to the PEZY-SC for execution. The chips are programmed with PZCL, an OpenCL-like programming environment.
Processing Element (PE)
The cores are called processing elements (PEs). The PEs are very simple RISC cores configured as MIMD, so in principle each PE can run a different workload. Each PE is a 16-stage in-order superscalar core capable of issuing two instructions per cycle, with out-of-order completion whenever possible. Each PE supports 8-way fine-grain simultaneous multithreading (SMT) with a dedicated register file for each thread. Threads are interleaved each cycle; this switching reduces the need for forwarding and mitigates the lack of branch prediction. Explicit switching of the active thread is also used to hide high-latency operations.
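The effect of per-cycle thread interleaving can be illustrated with a toy model. The sketch below is not PEZY's actual scheduler, whose policy is not described here; it assumes a simple round-robin order over 8 hardware threads, a single issue slot per cycle, and a hypothetical `stalls` map that marks cycles in which a thread is waiting on a high-latency operation.

```python
def interleave(num_threads=8, cycles=16, stalls=None):
    """Return the thread id issued on each cycle (None means a bubble).

    stalls maps a thread id to the set of cycles in which that thread
    is not ready. This is an illustrative assumption, not a PEZY feature.
    """
    stalls = stalls or {}
    schedule = []
    next_thread = 0
    for cycle in range(cycles):
        issued = None
        # Scan the threads in round-robin order, skipping stalled ones.
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if cycle not in stalls.get(t, ()):
                issued = t
                next_thread = (t + 1) % num_threads
                break
        schedule.append(issued)
    return schedule


if __name__ == "__main__":
    # No stalls: each thread issues once every 8 cycles.
    print(interleave(8, 16))
    # Thread 2 is stalled in cycles 2 and 3, so the slot goes to thread 3.
    print(interleave(8, 8, stalls={2: {2, 3}}))
```

In this model, two consecutive instructions from the same thread are separated by instructions from the other seven threads, which is why result forwarding and branch prediction matter less than they would in a single-threaded pipeline.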
The instruction set architecture is a proprietary one designed by PEZY. It supports operations such as data flushing, synchronization, acquisition of IDs, and thread switching. Each PE has an ID, which code uses to track processes. The PEs do not maintain cache coherency, and there is no per-PE data cache. Complex instructions are processed by the Special Function Units (SFUs) located in each city. A fair number of sacrifices were made to keep the cores small enough that a large number of them can be packed into a small area.
Village & City
Every pair of PEs shares 2 KiB of level 1 data cache. A village consists of four PEs. A city consists of four villages, 64 KiB of L2 cache, and a number of Special Function Units (SFUs), which execute complex instructions.
Models
The origin of the PEZY-SCx family is the PEZY-1, a 512-core chip.
1st generation
- Main article: PEZY-SC
The first series of supercomputers, ZettaScaler-1.x, was based on the PEZY-SC. The PEZY-SC is organized into four "Prefectures", each consisting of 16 cities (256 PEs) along with 2 MiB of L3 cache, for a total of 1,024 cores and 8,192 threads per chip. Operating at 733 MHz, the chip is capable of 3 TFLOPS (single-precision) and 1.5 TFLOPS (double-precision). A number of signal-related issues, particularly PCIe signal failures, were addressed by PEZY with the introduction of the PEZY-SCnp, which uses a new package ("np"). The PEZY-SCnp is otherwise identical to the earlier model but runs at a slightly higher clock, resulting in slightly higher peak performance.
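The core, thread, and cache totals follow directly from the village, city, and prefecture hierarchy. The sketch below multiplies out the figures stated in the text for the PEZY-SC; the constant and function names are illustrative, and the cache totals are derived values rather than figures quoted from PEZY.

```python
# Hierarchy figures as stated in the text (names are illustrative).
PES_PER_VILLAGE = 4
VILLAGES_PER_CITY = 4
CITIES_PER_PREFECTURE = 16
PREFECTURES_PER_CHIP = 4      # PEZY-SC
THREADS_PER_PE = 8            # 8-way SMT

L1D_KIB_PER_PE_PAIR = 2
L2_KIB_PER_CITY = 64
L3_MIB_PER_PREFECTURE = 2


def pezy_sc_counts():
    """Multiply out the hierarchy to get per-chip totals."""
    pes_per_city = PES_PER_VILLAGE * VILLAGES_PER_CITY
    pes_per_prefecture = pes_per_city * CITIES_PER_PREFECTURE
    pes = pes_per_prefecture * PREFECTURES_PER_CHIP
    cities = CITIES_PER_PREFECTURE * PREFECTURES_PER_CHIP
    return {
        "pes_per_city": pes_per_city,
        "pes_per_prefecture": pes_per_prefecture,
        "pes_per_chip": pes,
        "threads_per_chip": pes * THREADS_PER_PE,
        # Derived cache totals (assuming every PE pair, city and
        # prefecture is populated as described).
        "l1d_total_kib": (pes // 2) * L1D_KIB_PER_PE_PAIR,
        "l2_total_kib": cities * L2_KIB_PER_CITY,
        "l3_total_mib": PREFECTURES_PER_CHIP * L3_MIB_PER_PREFECTURE,
    }


if __name__ == "__main__":
    for key, value in pezy_sc_counts().items():
        print(f"{key}: {value}")
```

The per-chip results of 1,024 PEs and 8,192 threads match the totals given for the PEZY-SC.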
2nd generation
- Main article: PEZY-SC2
The second series of supercomputers, ZettaScaler-2.x, was based on the PEZY-SC2.
Future generations
PEZY has laid out future generations based on TSMC's 7 nm and 5 nm processes.
Summary
List of PEZY-SCx Processors
Model | Process | Launched | Cores | Threads | Die | Frequency | FLOPS (SP) | FLOPS (DP) |
---|---|---|---|---|---|---|---|---|
PEZY-SC4 | 5 nm | 2020 | 16,384 | 131,072 | 740 mm² | 1,600 MHz | 104.858 TFLOPS | 52.429 TFLOPS |
PEZY-SC3 | 7 nm | 2019 | 8,192 | 65,536 | 700 mm² | 1,333.333 MHz | 43.691 TFLOPS | 21.845 TFLOPS |
PEZY-SC2 | 16 nm | 2017 | 2,048 | 16,384 | 620 mm² | 1,000 MHz | 8.192 TFLOPS | 4.096 TFLOPS |
PEZY-SC | 28 nm | September 2014 | 1,024 | 8,192 | 411.6 mm² | 733.33 MHz | 3.004 TFLOPS | 1.502 TFLOPS |
PEZY-SCnp | 28 nm | 6 May 2016 | 1,024 | 8,192 | | 766.66 MHz | 3.14 TFLOPS | 1.57 TFLOPS |
Count: 5 |
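The peak FLOPS figures in the table are consistent with a simple product of core count, clock frequency, and operations per cycle. The sketch below reproduces them under the assumption of 4 single-precision and 2 double-precision floating-point operations per PE per cycle; these per-cycle factors are inferred from the table values, not stated by PEZY, and the names used are illustrative.

```python
# (cores, base frequency in MHz) copied from the summary table.
MODELS = {
    "PEZY-SC":   (1024, 733.33),
    "PEZY-SCnp": (1024, 766.66),
    "PEZY-SC2":  (2048, 1000.0),
    "PEZY-SC3":  (8192, 1333.333),
    "PEZY-SC4":  (16384, 1600.0),
}

# Assumption: per-PE throughput inferred from the table, not from PEZY.
SP_FLOPS_PER_CYCLE = 4
DP_FLOPS_PER_CYCLE = 2


def peak_tflops(cores, freq_mhz, flops_per_cycle):
    """Peak throughput in TFLOPS = cores x frequency x FLOPs per cycle."""
    return cores * freq_mhz * 1e6 * flops_per_cycle / 1e12


if __name__ == "__main__":
    for name, (cores, mhz) in MODELS.items():
        sp = peak_tflops(cores, mhz, SP_FLOPS_PER_CYCLE)
        dp = peak_tflops(cores, mhz, DP_FLOPS_PER_CYCLE)
        print(f"{name}: {sp:.3f} TFLOPS (SP), {dp:.3f} TFLOPS (DP)")
```

For example, the PEZY-SC2 works out to 2,048 × 1,000 MHz × 4 = 8.192 TFLOPS in single precision, matching the table.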
See also
- Intel Xeon Phi