(20 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
{{pezy title|PEZY-SC2}} | {{pezy title|PEZY-SC2}} | ||
− | {{ | + | {{chip |
|future=Yes | |future=Yes | ||
|name=PEZY-SC2 | |name=PEZY-SC2 | ||
Line 9: | Line 9: | ||
|market=Supercomputer | |market=Supercomputer | ||
|first announced=2015 | |first announced=2015 | ||
− | |first launched= | + | |first launched=2017 |
|family=PEZY-SCx | |family=PEZY-SCx | ||
|frequency=1,000 MHz | |frequency=1,000 MHz | ||
Line 16: | Line 16: | ||
|die area=620 mm² | |die area=620 mm² | ||
|core count=2,048 | |core count=2,048 | ||
− | |power= | + | |thread count=16,384 |
+ | |power=180 W | ||
+ | |average power=130 W | ||
|v core=0.8 V | |v core=0.8 V | ||
}} | }} | ||
− | '''PEZY-SC2''' ('''PEZY Super Computer 2''') is third generation [[many-core microprocessor]] developed by [[PEZY]] | + | '''PEZY-SC2''' ('''PEZY Super Computer 2''') is a third generation [[many-core microprocessor]] developed by [[PEZY]] and introduced in early 2017. This chip, which operates at 1 GHz, incorporates 2,048 cores dissipating 180 W. The PEZY-SC2 powers the [[ZettaScaler]]-2.x series of supercomputers. |
− | PEZY-SC2 | + | == Overview == |
+ | Introduced by [[PEZY]] along with their second-generation [[ZettaScaler]]-2.0 supercomputer series, the SC2 incorporates 2,048 cores with 8-way [[simultaneous multithreading|SMT]] support for a total of 16,384 threads, twice as many cores as {{\\|PEZY-SC|its predecessor}}. The PEZY-SC2 powers many of the most energy-efficient supercomputers on the [[Green500]] list, delivering upward of 14 GFLOPS/W. | ||
+ | |||
+ | Operating at 1 GHz, the PEZY-SC2 has a peak performance of 8.192 [[TFLOPS]] (single-precision) and 4.096 TFLOPS (double-precision) while consuming around 180 W. The PEZY-SC2 is designed using over 2.4 billion gates and is manufactured on [[TSMC]]'s [[16 nm process|16FF+ process]]. | ||
+ | {{#set: | ||
+ | | peak flops (half-precision) = {{#expr:1000000000 * 8 * 2048}} FLOPS | ||
+ | | peak flops (single-precision) = {{#expr:1000000000 * 4 * 2048}} FLOPS | ||
+ | | peak flops (double-precision) = {{#expr:1000000000 * 2 * 2048}} FLOPS | ||
+ | }} | ||
+ | In an attempt to increase adaptability in the field of deep learning and AI, as well as to increase throughput for specialized workloads, the PEZY-SC2 introduced support for 16-bit half-precision floating point. At 1 GHz, the SC2 can peak at 16.4 TFLOPS for half precision. | ||
+ | |||
+ | === Architecture === | ||
+ | Managing the tiny PEZY cores are six {{mips|P-Class}} {{mips|P6600}} [[MIPS]] (MIPS64R6) processors. Previously, the {{\\|PEZY-SC}} relied on two lightweight {{arm|ARM9|ARM926|l=arch}} [[physical core|cores]], which proved to be a performance bottleneck. The SC2 got rid of the four "Prefecture" units that incorporated 256 cities along with 2 MiB of L3 cache. Instead, the SC2 now has 40 MiB of [[last level cache]] shared not only by all the cities, but also by the MIPS cores. To improve performance further, the MIPS cores and the PEZY cores now share the same address space, reducing data transfer overhead. It's worth noting that the use of powerful MIPS cores means the SC2 no longer needs to rely on an external [[Intel]] {{intel|Xeon E5}} host processor. | ||
+ | |||
+ | |||
+ | : [[File:pezy-sc2 main block.svg|750px]] | ||
+ | |||
+ | == Cache == | ||
+ | The SC2 includes a [[hexa-core]] MIPS {{mips|P6600}} processor, which has its own separate cache: | ||
+ | |||
+ | {{cache size | ||
+ | |l1 cache=768 KiB | ||
+ | |l1i cache=384 KiB | ||
+ | |l1i break=6x64 KiB | ||
+ | |l1i desc=4-way set associative | ||
+ | |l1d cache=384 KiB | ||
+ | |l1d break=6x64 KiB | ||
+ | |l1d desc=8-way set associative | ||
+ | |l2 cache=2 MiB | ||
+ | |l2 break=1x2 MiB | ||
+ | |l2 desc=8-way set associative | ||
+ | }} | ||
+ | |||
+ | The SC2 integrates a multi-level cache hierarchy: | ||
+ | |||
+ | {{cache size | ||
+ | |l1 cache=12 MiB | ||
+ | |l1i cache=8 MiB | ||
+ | |l1d cache=4 MiB | ||
+ | |l2 cache=12 MiB | ||
+ | |l3 cache=40 MiB | ||
+ | |l3 break=1x40 MiB | ||
+ | |l3 desc=shared LLC | ||
+ | }} | ||
+ | |||
+ | Additionally, there is another 40 MiB consisting of 20 KiB per PE of scratch pad memory. This was increased from 16 KiB in the {{\\|Pezy-SC}}. | ||
== Memory controller == | == Memory controller == | ||
+ | For main memory, the PEZY-SC2 supports 4 channels of 64-bit DDR4-3200 memory with ECC support for a total aggregated bandwidth of 95.37 GiB/s. | ||
{{memory controller | {{memory controller | ||
− | |type=DDR4- | + | |type=DDR4-3200 |
|ecc=Yes | |ecc=Yes | ||
+ | |max mem=123 GiB | ||
+ | |controllers=4 | ||
+ | |channels=4 | ||
+ | |width=64 bit | ||
+ | |max bandwidth=95.37 GiB/s | ||
+ | |bandwidth schan=23.84GiB/s | ||
+ | |bandwidth dchan=47.68 GiB/s | ||
+ | |bandwidth qchan=95.37 GiB/s | ||
}} | }} | ||
+ | |||
+ | In addition to main memory bandwidth, the PEZY-SC2 supports Wide-IO with a width of 1,024 bits. The SC2 uses [[ThruChip Interface]] (TCI), a wireless near-field inductive coupling technology, to communicate with the TCI-DRAM chips, which are packaged together with it. The SC2 features four TCI-DRAM interfaces, each providing a maximum bandwidth of 500 GB/s for a total aggregated bandwidth of 2 TB/s. | ||
{{memory controller | {{memory controller | ||
Line 37: | Line 95: | ||
== Expansions == | == Expansions == | ||
− | {{expansions | + | The SC2 upgraded the PCIe interface to Gen4, supporting up to 64 GB/s. The number of lanes remains unchanged from the {{\\|PEZY-SC}}, at 32. |
− | | pcie revision | + | {{expansions main |
− | | pcie lanes | + | | |
− | | pcie config | + | {{expansions entry |
− | | pcie config 2 | + | |type=PCIe |
− | + | |pcie revision=4.0 | |
− | + | |pcie lanes=32 | |
− | + | |pcie config=4x8 | |
+ | |pcie config 2=2x16 | ||
+ | }} | ||
}} | }} | ||
+ | == See also == | ||
+ | * [https://fuse.wikichip.org/news/191/the-2048-core-pezy-sc2-sets-a-green500-record/ The 2,048-core PEZY-SC2 sets a Green500 record - WikiChip Fuse] |
Latest revision as of 10:15, 22 September 2018
PEZY-SC2 | |
General Info | |
Designer | PEZY |
Manufacturer | TSMC |
Model Number | PEZY-SC2 |
Market | Supercomputer |
Introduction | 2015 (announced) 2017 (launched) |
General Specs | |
Family | PEZY-SCx |
Frequency | 1,000 MHz |
Microarchitecture | |
Process | 16 nm |
Technology | CMOS |
Die | 620 mm² |
Cores | 2,048 |
Threads | 16,384 |
Electrical | |
Power dissipation | 180 W |
Power dissipation (average) | 130 W |
Vcore | 0.8 V |
PEZY-SC2 (PEZY Super Computer 2) is a third generation many-core microprocessor developed by PEZY and introduced in early 2017. This chip, which operates at 1 GHz, incorporates 2,048 cores dissipating 180 W. The PEZY-SC2 powers the ZettaScaler-2.x series of supercomputers.
Overview
Introduced by PEZY along with their second-generation ZettaScaler-2.0 supercomputer series, the SC2 incorporates 2,048 cores with 8-way SMT support for a total of 16,384 threads, twice as many cores as its predecessor. The PEZY-SC2 powers many of the most energy-efficient supercomputers on the Green500 list, delivering upward of 14 GFLOPS/W.
Operating at 1 GHz, the PEZY-SC2 has a peak performance of 8.192 TFLOPS (single-precision) and 4.096 TFLOPS (double-precision) while consuming around 180 W. The PEZY-SC2 is designed using over 2.4 billion gates and is manufactured on TSMC's 16FF+ process.
In an attempt to increase adaptability in the field of deep learning and AI, as well as to increase throughput for specialized workloads, the PEZY-SC2 introduced support for 16-bit half-precision floating point. At 1 GHz, the SC2 can peak at 16.4 TFLOPS for half precision.
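The peak figures quoted above follow from the clock rate, the core count, and a per-core FLOPs-per-cycle factor. The sketch below reproduces that arithmetic using the factors of 8, 4, and 2 that appear in the page's own property expressions; the function and constant names are illustrative only and are not part of any PEZY software.

```python
# Peak-throughput arithmetic for the PEZY-SC2, using values stated on this page.
CLOCK_HZ = 1_000_000_000   # 1 GHz
CORES = 2_048
SMT_WAYS = 8               # 8-way SMT per core

# FLOPs per core per cycle, taken from the page's property expressions.
FLOPS_PER_CYCLE = {"half": 8, "single": 4, "double": 2}


def peak_flops(precision: str) -> int:
    """Return the theoretical peak FLOPS for the given precision."""
    return CLOCK_HZ * FLOPS_PER_CYCLE[precision] * CORES


def thread_count() -> int:
    """Return the total number of hardware threads."""
    return CORES * SMT_WAYS


if __name__ == "__main__":
    for precision in ("half", "single", "double"):
        print(f"{precision:>6}: {peak_flops(precision) / 1e12:.3f} TFLOPS")
    print(f"threads: {thread_count():,}")
```

These are theoretical peaks; sustained throughput on real workloads will be lower.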
Architecture
Managing the tiny PEZY cores are six P-Class P6600 MIPS (MIPS64R6) processors. Previously, the PEZY-SC relied on two lightweight ARM926 cores, which proved to be a performance bottleneck. The SC2 got rid of the four "Prefecture" units that incorporated 256 cities along with 2 MiB of L3 cache. Instead, the SC2 now has 40 MiB of last level cache shared not only by all the cities, but also by the MIPS cores. To improve performance further, the MIPS cores and the PEZY cores now share the same address space, reducing data transfer overhead. It's worth noting that the use of powerful MIPS cores means the SC2 no longer needs to rely on an external Intel Xeon E5 host processor.
Cache
The SC2 includes a hexa-core MIPS P6600 processor, which has its own separate cache:
Cache Organization
Cache is a hardware component containing a relatively small and extremely fast memory designed to speed up the performance of a CPU by preparing ahead of time the data it needs to read from a relatively slower medium such as main memory. The organization and amount of cache can have a large impact on the performance, power consumption, die size, and consequently cost of the IC. Cache is specified by its size, number of sets, associativity, block size, sub-block size, and fetch and write-back policies. Note: All units are in kibibytes and mebibytes. |
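L1$ | 768 KiB |
L1I$ | 384 KiB | 6x64 KiB | 4-way set associative |
L1D$ | 384 KiB | 6x64 KiB | 8-way set associative |
L2$ | 2 MiB | 1x2 MiB | 8-way set associative |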
The SC2 integrates a multi-level cache hierarchy:
Cache Organization
L1$ | 12 MiB |
L1I$ | 8 MiB |
L1D$ | 4 MiB |
L2$ | 12 MiB |
L3$ | 40 MiB | 1x40 MiB | shared LLC |
Additionally, there is another 40 MiB of scratchpad memory, consisting of 20 KiB per PE. This was increased from 16 KiB per PE in the PEZY-SC.
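The scratchpad total can be checked with a short calculation. The sketch below assumes that one PE corresponds to one of the 2,048 cores, which is consistent with the 40 MiB total quoted above; the names are illustrative.

```python
# Scratchpad arithmetic for the PEZY-SC2, assuming one PE per core.
KIB = 1024
MIB = 1024 * KIB

PE_COUNT = 2_048
SCRATCHPAD_KIB_PER_PE = 20            # PEZY-SC2
PREVIOUS_SCRATCHPAD_KIB_PER_PE = 16   # PEZY-SC, per the text above


def total_scratchpad_bytes(pe_count: int, kib_per_pe: int) -> int:
    """Return the aggregate scratchpad capacity in bytes."""
    return pe_count * kib_per_pe * KIB


total_mib = total_scratchpad_bytes(PE_COUNT, SCRATCHPAD_KIB_PER_PE) / MIB
per_pe_growth = SCRATCHPAD_KIB_PER_PE / PREVIOUS_SCRATCHPAD_KIB_PER_PE

if __name__ == "__main__":
    print(f"total scratchpad: {total_mib:.0f} MiB")
    print(f"per-PE growth over PEZY-SC: {per_pe_growth:.2f}x")
```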
Memory controller
For main memory, the PEZY-SC2 supports 4 channels of 64-bit DDR4-3200 memory with ECC support for a total aggregated bandwidth of 95.37 GiB/s.
Integrated Memory Controller
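Type | DDR4-3200 |
Supports ECC | Yes |
Max Mem | 123 GiB |
Controllers | 4 |
Channels | 4 |
Width | 64 bit |
Max Bandwidth | 95.37 GiB/s |
Bandwidth | Single 23.84 GiB/s | Double 47.68 GiB/s | Quad 95.37 GiB/s |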
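The 95.37 GiB/s figure is the theoretical DDR4-3200 transfer rate expressed in binary units. The sketch below shows the conversion, using the standard DDR4-3200 rate of 3,200 mega-transfers per second per channel; the function name is illustrative.

```python
# Theoretical DDR4 bandwidth: channels x bus width (bytes) x transfer rate.
GIB = 2**30


def ddr_bandwidth_bytes_per_s(channels: int, width_bits: int, mega_transfers: int) -> int:
    """Return peak bandwidth in bytes per second."""
    return channels * (width_bits // 8) * mega_transfers * 1_000_000


if __name__ == "__main__":
    for channels in (1, 2, 4):
        bw = ddr_bandwidth_bytes_per_s(channels, 64, 3200)
        print(f"{channels} channel(s): {bw / 1e9:.1f} GB/s = {bw / GIB:.2f} GiB/s")
```

With four channels this gives 102.4 GB/s, which is 95.37 GiB/s after dividing by 2^30; the single- and dual-channel rows scale linearly.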
In addition to main memory bandwidth, the PEZY-SC2 supports Wide-IO with a width of 1,024 bits. The SC2 uses ThruChip Interface (TCI), a wireless near-field inductive coupling technology, to communicate with the TCI-DRAM chips, which are packaged together with it. The SC2 features four TCI-DRAM interfaces, each providing a maximum bandwidth of 500 GB/s for a total aggregated bandwidth of 2 TB/s.
Integrated Memory Controller
Supports ECC | No |
Max Bandwidth | 1,907.712 GiB/s |
Expansions
The SC2 upgraded the PCIe interface to Gen4, supporting up to 64 GB/s. The number of lanes remains unchanged from the PEZY-SC, at 32.
Expansion Options |
Type | PCIe |
Revision | 4.0 |
Max Lanes | 32 |
Configs | 4x8, 2x16 |
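The 64 GB/s figure matches the raw signalling rate of 32 PCIe 4.0 lanes. The sketch below assumes the standard PCIe 4.0 parameters (16 GT/s per lane, 128b/130b encoding) and treats the figure as per direction, which the text above does not state explicitly; the names are illustrative.

```python
# PCIe 4.0 bandwidth estimate per direction.
GT_PER_S_PER_LANE = 16          # PCIe 4.0 signalling rate, in GT/s
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def raw_bytes_per_s(lanes: int) -> float:
    """Raw line rate in bytes per second, before encoding overhead."""
    return lanes * GT_PER_S_PER_LANE * 1e9 / 8


def effective_bytes_per_s(lanes: int) -> float:
    """Line rate after 128b/130b encoding overhead, in bytes per second."""
    return raw_bytes_per_s(lanes) * ENCODING_EFFICIENCY


if __name__ == "__main__":
    for lanes in (8, 16, 32):
        print(f"x{lanes}: raw {raw_bytes_per_s(lanes) / 1e9:.0f} GB/s, "
              f"effective {effective_bytes_per_s(lanes) / 1e9:.2f} GB/s")
```

Protocol overhead above the encoding layer (packet headers, flow control) reduces achievable throughput further and is not modeled here.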
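See also
* The 2,048-core PEZY-SC2 sets a Green500 record - WikiChip Fuse (https://fuse.wikichip.org/news/191/the-2048-core-pezy-sc2-sets-a-green500-record/)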
Has subobject | PEZY-SC2 - PEZY#pcie + |
base frequency | 1,000 MHz (1 GHz, 1,000,000 kHz) + |
core count | 2,048 + |
core voltage | 0.8 V (8 dV, 80 cV, 800 mV) + |
designer | PEZY + |
die area | 620 mm² (0.961 in², 6.2 cm², 620,000,000 µm²) + |
family | PEZY-SCx + |
first announced | 2015 + |
first launched | 2017 + |
full page name | pezy/pezy-scx/pezy-sc2 + |
has ecc memory support | true + and false + |
instance of | microprocessor + |
l1$ size | 768 KiB (786,432 B, 0.75 MiB) + and 12,288 KiB (12,582,912 B, 12 MiB) + |
l1d$ description | 8-way set associative + |
l1d$ size | 384 KiB (393,216 B, 0.375 MiB) + and 4,096 KiB (4,194,304 B, 4 MiB) + |
l1i$ description | 4-way set associative + |
l1i$ size | 384 KiB (393,216 B, 0.375 MiB) + and 8,192 KiB (8,388,608 B, 8 MiB) + |
l2$ description | 8-way set associative + |
l2$ size | 2 MiB (2,048 KiB, 2,097,152 B, 0.00195 GiB) + and 12 MiB (12,288 KiB, 12,582,912 B, 0.0117 GiB) + |
l3$ description | shared LLC + |
l3$ size | 40 MiB (40,960 KiB, 41,943,040 B, 0.0391 GiB) + |
ldate | 3000 + |
manufacturer | TSMC + |
market segment | Supercomputer + |
max memory bandwidth | 95.37 GiB/s (97,658.88 MiB/s, 102.403 GB/s, 102,402.758 MB/s, 0.0931 TiB/s, 0.102 TB/s) + and 1,907.712 GiB/s (1,953,497.088 MiB/s, 2,048.39 GB/s, 2,048,390.163 MB/s, 1.863 TiB/s, 2.048 TB/s) + |
max memory channels | 4 + |
model number | PEZY-SC2 + |
name | PEZY-SC2 + |
peak flops (double-precision) | 4,096,000,000,000 FLOPS (4,096,000,000 KFLOPS, 4,096,000 MFLOPS, 4,096 GFLOPS, 4.096 TFLOPS, 0.0041 PFLOPS, 4.096e-6 EFLOPS, 4.096e-9 ZFLOPS) + |
peak flops (half-precision) | 16,384,000,000,000 FLOPS (16,384,000,000 KFLOPS, 16,384,000 MFLOPS, 16,384 GFLOPS, 16.384 TFLOPS, 0.0164 PFLOPS, 1.6384e-5 EFLOPS, 1.6384e-8 ZFLOPS) + |
peak flops (single-precision) | 8,192,000,000,000 FLOPS (8,192,000,000 KFLOPS, 8,192,000 MFLOPS, 8,192 GFLOPS, 8.192 TFLOPS, 0.00819 PFLOPS, 8.192e-6 EFLOPS, 8.192e-9 ZFLOPS) + |
power dissipation | 180 W (180,000 mW, 0.241 hp, 0.18 kW) + |
power dissipation (average) | 130 W (130,000 mW, 0.174 hp, 0.13 kW) + |
process | 16 nm (0.016 μm, 1.6e-5 mm) + |
supported memory type | DDR4-3200 + |
technology | CMOS + |
thread count | 16,384 + |