PEZY-SC (PEZY Super Computer) is a second-generation many-core microprocessor developed by PEZY in 2014. PEZY-SC contains 2 ARM926 cores (ARMv5TEJ) along with 1024 simpler RISC cores. Operating at 733 MHz, the processor is said to have a peak performance of 3.0 TFLOPS (single-precision) and 1.5 TFLOPS (double-precision). PEZY-SC was designed using 580 million gates and is manufactured on TSMC's 28HPC+ (28 nm) process. The PEZY-SC is used in a number of TOP500 and Green500 supercomputers, including some of the world's most energy-efficient systems.
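The quoted peak figures are consistent with a simple cores × clock × FLOPs-per-cycle estimate. The sketch below assumes a clock of 733,333,333 Hz (733.33 MHz) and 4 single-precision or 2 double-precision FLOPs per core per cycle. These per-cycle rates are inferred from the published totals rather than taken from a PEZY specification, and the function name `peak_flops` is purely illustrative.

```python
# Rough peak-throughput estimate: cores x clock x FLOPs per core per cycle.
# The per-cycle rates below are assumptions inferred from the published peaks.
CORES = 1024
CLOCK_HZ = 733_333_333  # assumed exact value behind the quoted "733 MHz"


def peak_flops(cores: int, clock_hz: int, flops_per_cycle: int) -> int:
    """Return the theoretical peak FLOPS of a homogeneous many-core chip."""
    return cores * clock_hz * flops_per_cycle


sp_peak = peak_flops(CORES, CLOCK_HZ, 4)  # assumed 4 SP FLOPs per core per cycle
dp_peak = peak_flops(CORES, CLOCK_HZ, 2)  # assumed 2 DP FLOPs per core per cycle

print(f"single-precision: {sp_peak / 1e12:.3f} TFLOPS")
print(f"double-precision: {dp_peak / 1e12:.3f} TFLOPS")
```

Under these assumptions the results round to the 3.0 and 1.5 TFLOPS figures quoted above.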
Overview
- See also: PEZY-1
The PEZY-SC (SC for "Super Computer") is PEZY's second-generation microprocessor, which builds upon the PEZY-1. The chip contains exactly twice as many cores as its predecessor and incorporates a large amount of cache, including 8 MB of L3$.
In June 2015, PEZY-SC-based supercomputers took the top 3 spots on the Green500 list as the 3 most energy-efficient supercomputers. PEZY-SC powers the Shoubu (1,181,952 cores, ? kW, 605.624 TFlop/s Linpack Rmax), Suiren Blue (262,656 cores, 40.86 kW, 247.752 TFlop/s Linpack Rmax), and Suiren (328,480 cores, 48.90 kW, 271.782 TFlop/s Linpack Rmax) supercomputers (ranked 1, 2, and 3 respectively).
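As a rough illustration of what the ranking measures, energy efficiency can be estimated as Linpack Rmax divided by power. The sketch below uses only the figures quoted in the paragraph above, so the results may not match the officially published Green500 values, which can rely on different measurement runs. Shoubu is left out because its power figure is not given here. The helper name `gflops_per_watt` is an assumption made for this example.

```python
# Energy efficiency (GFLOPS per watt) from the Rmax and power figures quoted above.
# Shoubu is omitted because its power figure is not given in the text.
systems = {
    "Suiren Blue": (247.752, 40.86),  # (Rmax in TFlop/s, power in kW)
    "Suiren": (271.782, 48.90),
}


def gflops_per_watt(rmax_tflops: float, power_kw: float) -> float:
    # TFlop/s -> GFlop/s and kW -> W both scale by 1000, so the factors cancel.
    return rmax_tflops / power_kw


efficiency = {name: gflops_per_watt(r, p) for name, (r, p) in systems.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.2f} GFLOPS/W")
```

With these inputs Suiren Blue comes out ahead of Suiren, which agrees with the relative ranking stated above.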
Architecture
The PEZY-SC microprocessor is made of 4 blocks called "Prefectures". Each Prefecture contains 2 MB of L3$ enclosed by 16 smaller blocks called "Cities". Each City is made of 64 KB of L2$, a number of special function units, and 4 smaller blocks called "Villages". A Village is a block of 4 execution units. For every 2 execution units there are 2 KB of L1d$.
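The hierarchy described above can be checked with a little arithmetic. The following sketch multiplies out the block counts to recover the total number of execution units and the aggregate cache sizes. It assumes that "KB" and "MB" in the description are binary units (KiB and MiB); the variable names are illustrative only.

```python
# Block hierarchy as described: Prefecture -> City -> Village -> execution unit.
PREFECTURES = 4
CITIES_PER_PREFECTURE = 16
VILLAGES_PER_CITY = 4
PES_PER_VILLAGE = 4

KIB = 1024          # assumption: KB/MB in the text are binary units
MIB = 1024 * KIB

cities = PREFECTURES * CITIES_PER_PREFECTURE
pes = cities * VILLAGES_PER_CITY * PES_PER_VILLAGE

l3_total = PREFECTURES * 2 * MIB   # 2 MB of L3$ per Prefecture
l2_total = cities * 64 * KIB       # 64 KB of L2$ per City
l1d_total = (pes // 2) * 2 * KIB   # 2 KB of L1d$ per 2 execution units

print(f"cities: {cities}, execution units: {pes}")
print(f"L3$: {l3_total // MIB} MiB, L2$: {l2_total // MIB} MiB, "
      f"L1d$: {l1d_total // MIB} MiB")
```

The totals agree with the 1024 cores and 8 MB of L3$ mentioned earlier in the article.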
Processor Element (PE)
The PEs are the individual execution cores.
Die Shot
Cache
PEZY-SC's cache is separate from the cache of the ARM926 cores, which have an L1$ of 32 KiB (2x) and a shared 64 KiB L2$.
| Cache | Total size | Organization |
| L1I$ | 2 MiB (2,048 KiB, 2,097,152 B) | 1024x2 KiB (per processor element) |
| L1D$ | 1 MiB (1,024 KiB, 1,048,576 B) | 512x2 KiB (per 2 processor elements) |
| L2$ | 4 MiB (4,096 KiB, 4,194,304 B, 0.00391 GiB) | 64x64 KiB (per city) |
| L3$ | 8 MiB (8,192 KiB, 8,388,608 B, 0.00781 GiB) | 4x2 MiB (per prefecture) |
Memory controller
| Integrated Memory Controller | |
| Type | DDR4-1333 |
| Controllers | 1 |
| Channels | 8 |
| Bandwidth (single) | 10,600 MB/s |
| Bandwidth (dual) | 21,200 MB/s |
| Bandwidth (quad) | 42,400 MB/s |
| Bandwidth (octa) | 84,800 MB/s |
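The multi-channel rows in the table follow from multiplying the single-channel figure by the number of channels. The sketch below reproduces that arithmetic. It assumes ideal linear scaling across channels, which is a theoretical upper bound rather than a measured value, and the function name `aggregate_bandwidth` is hypothetical.

```python
# Theoretical aggregate bandwidth, assuming ideal linear scaling across channels.
SINGLE_CHANNEL_MB_S = 10_600  # single-channel figure taken from the table


def aggregate_bandwidth(channels: int,
                        per_channel_mb_s: int = SINGLE_CHANNEL_MB_S) -> int:
    """Return the combined peak bandwidth in MB/s for the given channel count."""
    return channels * per_channel_mb_s


bandwidth_table = {n: aggregate_bandwidth(n) for n in (1, 2, 4, 8)}

for channels, mb_s in bandwidth_table.items():
    print(f"{channels} channel(s): {mb_s:,} MB/s")
```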
Expansions
External Links
| base frequency | 733.33 MHz (0.733 GHz, 733,330 kHz) |
| core count | 1,024 |
| core voltage | 1 V (10 dV, 100 cV, 1,000 mV) |
| designer | PEZY |
| die area | 411.6 mm² (0.638 in², 4.116 cm², 411,600,000 µm²) |
| die length | 19.5 mm (1.95 cm, 0.768 in, 19,500 µm) |
| die width | 21.1 mm (2.11 cm, 0.831 in, 21,100 µm) |
| family | PEZY-SCx |
| first announced | 2013 |
| first launched | September 2014 |
| full page name | pezy/pezy-scx/pezy-sc |
| has ecc memory support | true |
| instance of | microprocessor |
| l1$ size | 64 KiB (65,536 B, 0.0625 MiB) and 3,072 KiB (3,145,728 B, 3 MiB) |
| l1d$ description | per 2 processor elements |
| l1d$ size | 32 KiB (32,768 B, 0.0313 MiB) and 1,024 KiB (1,048,576 B, 1 MiB) |
| l1i$ description | per processor element |
| l1i$ size | 32 KiB (32,768 B, 0.0313 MiB) and 2,048 KiB (2,097,152 B, 2 MiB) |
| l2$ description | per city |
| l2$ size | 4 MiB (4,096 KiB, 4,194,304 B, 0.00391 GiB) and 0.0625 MiB (64 KiB, 65,536 B, 6.103516e-5 GiB) |
| l3$ description | per prefecture |
| l3$ size | 8 MiB (8,192 KiB, 8,388,608 B, 0.00781 GiB) |
| ldate | September 2014 |
| main image | |
| manufacturer | TSMC |
| market segment | Supercomputer |
| max memory bandwidth | 127.156 GiB/s (130,207.744 MiB/s, 136.533 GB/s, 136,532.715 MB/s, 0.124 TiB/s, 0.137 TB/s) |
| max memory channels | 8 |
| model number | PEZY-SC |
| name | PEZY-SC |
| package | FCBGA-2112 |
| peak flops (double-precision) | 1,501,866,665,984 FLOPS (1,501,866,665.984 KFLOPS, 1,501,866.666 MFLOPS, 1,501.867 GFLOPS, 1.502 TFLOPS, 0.0015 PFLOPS, 1.501867e-6 EFLOPS, 1.501867e-9 ZFLOPS) |
| peak flops (single-precision) | 3,003,733,331,968 FLOPS (3,003,733,331.968 KFLOPS, 3,003,733.332 MFLOPS, 3,003.733 GFLOPS, 3.004 TFLOPS, 0.003 PFLOPS, 3.003733e-6 EFLOPS, 3.003733e-9 ZFLOPS) |
| power dissipation | 100 W (100,000 mW, 0.134 hp, 0.1 kW) |
| power dissipation (average) | 70 W (70,000 mW, 0.0939 hp, 0.07 kW) |
| process | 28 nm (0.028 μm, 2.8e-5 mm) |
| supported memory type | DDR4-2133 |
| technology | CMOS |
| thread count | 8,192 |
| transistor count | 3,730,000,000 |