Line 9: | Line 9: | ||
|market=Supercomputer | |market=Supercomputer | ||
|first announced=2015 | |first announced=2015 | ||
− | |first launched= | + | |first launched=2017 |
|family=PEZY-SCx | |family=PEZY-SCx | ||
|frequency=1,000 MHz | |frequency=1,000 MHz | ||
Line 16: | Line 16: | ||
|die area=620 mm² | |die area=620 mm² | ||
|core count=2,048 | |core count=2,048 | ||
− | |thread count= | + | |thread count=16,384 |
|power=180 W | |power=180 W | ||
|v core=0.8 V | |v core=0.8 V | ||
}} | }} | ||
− | '''PEZY-SC2''' ('''PEZY Super Computer 2''') is third generation [[many-core microprocessor]] developed by [[PEZY]] | + | '''PEZY-SC2''' ('''PEZY Super Computer 2''') is a third-generation [[many-core microprocessor]] developed by [[PEZY]] and introduced in early 2017. This chip, which operates at 1 GHz, incorporates 2,048 cores dissipating 180 W. |
+ | |||
+ | == Overview == | ||
+ | Introduced by [[PEZY]] along with their second-generation [[ZettaScaler]]-2.0 supercomputer series, the SC2 incorporates 2,048 cores along with 8-way [[simultaneous multithreading|SMT]] support for a total of 16,384 threads, twice as many cores as {{\\|PEZY-SC|its predecessor}}. | ||
+ | |||
+ | Operating at 1 GHz, the PEZY-SC2 has a peak performance of 8.192 TFLOPS (single-precision) and 4.1 TFLOPS (double-precision) while consuming around 180 Watts. The PEZY-SC2 is designed using over 2.4 billion gates and is manufactured on [[TSMC]]'s [[16 nm process|16FF+ process]]. | ||
+ | {{#set: | ||
+ | | peak flops (half-precision) = {{#expr:1000000000 * 8 * 2048}} FLOPS | ||
+ | | peak flops (single-precision) = {{#expr:1000000000 * 4 * 2048}} FLOPS | ||
+ | | peak flops (double-precision) = {{#expr:1000000000 * 2 * 2048}} FLOPS | ||
+ | }} | ||
+ | In an attempt to increase adaptability in the field of deep learning and AI, as well as to increase throughput for specialized workloads, the PEZY-SC2 introduced support for 16-bit half-precision floating point. At 1 GHz, the SC2 can peak at 16.384 TFLOPS for half precision. | ||
+ | |||
+ | == Cache == | ||
+ | The SC2 includes a [[hexa-core]] MIPS {{mips|P6600}} processor, which has its own separate cache: | ||
+ | |||
+ | {{cache size | ||
+ | |l1 cache=768 KiB | ||
+ | |l1i cache=384 KiB | ||
+ | |l1i break=6x64 KiB | ||
+ | |l1i desc=4-way set associative | ||
+ | |l1d cache=384 KiB | ||
+ | |l1d break=6x64 KiB | ||
+ | |l1d desc=8-way set associative | ||
+ | |l2 cache=2 MiB | ||
+ | |l2 break=1x2 MiB | ||
+ | |l2 desc=8-way set associative | ||
+ | }} | ||
+ | |||
+ | The SC2 integrates a multi-level cache hierarchy: | ||
+ | |||
+ | {{cache size | ||
+ | |l1 cache=12 MiB | ||
+ | |l1i cache=8 MiB | ||
+ | |l1i break= | ||
+ | |l1i desc=per processor element | ||
+ | |l1d cache=4 MiB | ||
+ | |l1d break=512x2 KiB | ||
+ | |l1d desc=per 2 processor elements | ||
+ | |l1d policy= | ||
+ | |l2 cache=12 MiB | ||
+ | |l2 break= | ||
+ | |l2 desc=per city | ||
+ | |l2 policy= | ||
+ | |l3 cache=40 MiB | ||
+ | |l3 break=1x40 MiB | ||
+ | |l3 desc=shared LLC | ||
+ | |l3 policy= | ||
+ | }} | ||
− | |||
== Memory controller == | == Memory controller == |
Revision as of 21:50, 2 November 2017
PEZY-SC2 (PEZY Super Computer 2) is a third-generation many-core microprocessor developed by PEZY and introduced in early 2017. This chip, which operates at 1 GHz, incorporates 2,048 cores dissipating 180 W.
Overview
Introduced by PEZY along with their second-generation ZettaScaler-2.0 supercomputer series, the SC2 incorporates 2,048 cores, twice as many as its predecessor. Each core supports 8-way SMT, for a total of 16,384 threads.
Operating at 1 GHz, the PEZY-SC2 has a peak performance of 8.192 TFLOPS (single-precision) and 4.096 TFLOPS (double-precision) while consuming around 180 W. The PEZY-SC2 is designed using over 2.4 billion gates and is manufactured on TSMC's 16FF+ process.
In an attempt to increase adaptability in the field of deep learning and AI, as well as to increase throughput for specialized workloads, the PEZY-SC2 introduced support for 16-bit half-precision floating point. At 1 GHz, the SC2 can peak at 16.384 TFLOPS for half precision.
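The peak figures above follow from clock frequency times FLOPs per cycle per core times core count. The short Python sketch below reproduces that arithmetic. The per-cycle factors (8, 4 and 2) are taken from the page's own property expressions; the function and constant names are hypothetical and exist only for illustration.

```python
# Peak throughput = clock (Hz) x FLOPs per cycle per core x core count.
CLOCK_HZ = 1_000_000_000  # 1 GHz, as stated in the article
CORES = 2_048             # core count, as stated in the article
SMT_WAYS = 8              # 8-way SMT, as stated in the article

# FLOPs per cycle per core, as used in the page's property expressions.
FLOPS_PER_CYCLE = {"half": 8, "single": 4, "double": 2}


def peak_flops(precision: str) -> int:
    """Return the theoretical peak FLOPS for the given precision."""
    return CLOCK_HZ * FLOPS_PER_CYCLE[precision] * CORES


if __name__ == "__main__":
    for precision in ("half", "single", "double"):
        print(f"{precision}: {peak_flops(precision) / 1e12:.3f} TFLOPS")
    print(f"threads: {CORES * SMT_WAYS}")
```

These are theoretical peaks only; sustained throughput on real workloads is not something the page reports.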
Cache
The SC2 includes a hexa-core MIPS P6600 processor, which has its own separate cache:
Cache Organization
Cache is a hardware component containing a relatively small and extremely fast memory designed to speed up the performance of a CPU by preparing ahead of time the data it needs to read from a relatively slower medium such as main memory. The organization and amount of cache can have a large impact on the performance, power consumption, die size, and consequently cost of the IC. Cache is specified by its size, number of sets, associativity, block size, sub-block size, and fetch and write-back policies. Note: All units are in kibibytes and mebibytes.
Level | Size | Breakdown | Description
L1$ | 768 KiB | - | -
L1I$ | 384 KiB | 6x64 KiB | 4-way set associative
L1D$ | 384 KiB | 6x64 KiB | 8-way set associative
L2$ | 2 MiB | 1x2 MiB | 8-way set associative
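The P6600 L1 totals recorded for this page (384 KiB instruction, 384 KiB data, 768 KiB combined) are per-core sizes multiplied by the six cores. A minimal Python sketch of that breakdown, using the 6x64 KiB figures from the page's cache template; the helper name is hypothetical:

```python
KIB = 1024  # bytes per kibibyte


def total_kib(instances: int, size_kib: int) -> int:
    """Total capacity in KiB for `instances` caches of `size_kib` each."""
    return instances * size_kib


# Six P6600 cores, each with a 64 KiB L1I and a 64 KiB L1D.
l1i_kib = total_kib(6, 64)
l1d_kib = total_kib(6, 64)
l1_kib = l1i_kib + l1d_kib

print(f"L1I: {l1i_kib} KiB, L1D: {l1d_kib} KiB, L1 total: {l1_kib} KiB")
print(f"L1 total in bytes: {l1_kib * KIB}")
```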
The SC2 integrates a multi-level cache hierarchy:
Cache Organization
Level | Size | Breakdown | Description
L1$ | 12 MiB | - | -
L1I$ | 8 MiB | - | per processor element
L1D$ | 4 MiB | 512x2 KiB | per 2 processor elements
L2$ | 12 MiB | - | per city
L3$ | 40 MiB | 1x40 MiB | shared LLC
Memory controller
Integrated Memory Controller
Expansions
Expansion Options
Has subobject | PEZY-SC2 - PEZY#pcie + |
has ecc memory support | true + and false + |
l1$ size | 768 KiB (786,432 B, 0.75 MiB) + and 12,288 KiB (12,582,912 B, 12 MiB) + |
l1d$ description | 8-way set associative + and per 2 processor elements + |
l1d$ size | 384 KiB (393,216 B, 0.375 MiB) + and 4,096 KiB (4,194,304 B, 4 MiB) + |
l1i$ description | 4-way set associative + and per processor element + |
l1i$ size | 384 KiB (393,216 B, 0.375 MiB) + and 8,192 KiB (8,388,608 B, 8 MiB) + |
l2$ description | 8-way set associative + and per city + |
l2$ size | 2 MiB (2,048 KiB, 2,097,152 B, 0.00195 GiB) + and 12 MiB (12,288 KiB, 12,582,912 B, 0.0117 GiB) + |
l3$ description | shared LLC + |
l3$ size | 40 MiB (40,960 KiB, 41,943,040 B, 0.0391 GiB) + |
max memory bandwidth | 1,907.712 GiB/s (1,953,497.088 MiB/s, 2,048.39 GB/s, 2,048,390.163 MB/s, 1.863 TiB/s, 2.048 TB/s) + |
max memory channels | 4 + |
peak flops (double-precision) | 4,096,000,000,000 FLOPS (4,096,000,000 KFLOPS, 4,096,000 MFLOPS, 4,096 GFLOPS, 4.096 TFLOPS, 0.0041 PFLOPS, 4.096e-6 EFLOPS, 4.096e-9 ZFLOPS) + |
peak flops (half-precision) | 16,384,000,000,000 FLOPS (16,384,000,000 KFLOPS, 16,384,000 MFLOPS, 16,384 GFLOPS, 16.384 TFLOPS, 0.0164 PFLOPS, 1.6384e-5 EFLOPS, 1.6384e-8 ZFLOPS) + |
peak flops (single-precision) | 8,192,000,000,000 FLOPS (8,192,000,000 KFLOPS, 8,192,000 MFLOPS, 8,192 GFLOPS, 8.192 TFLOPS, 0.00819 PFLOPS, 8.192e-6 EFLOPS, 8.192e-9 ZFLOPS) + |
supported memory type | DDR4-2666 + |
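
The alternative units listed beside each property value above are plain binary/decimal conversions of the first figure. As a rough illustration, the Python sketch below reproduces the bandwidth conversions from the 1,907.712 GiB/s value. The function names are hypothetical and are not part of Semantic MediaWiki.

```python
GIB = 2**30  # bytes per gibibyte (binary prefix)
GB = 10**9   # bytes per gigabyte (decimal prefix)


def gib_to_gb(value_gib: float) -> float:
    """Convert GiB (or GiB/s) to GB (or GB/s)."""
    return value_gib * GIB / GB


def gib_to_mib(value_gib: float) -> float:
    """Convert GiB (or GiB/s) to MiB (or MiB/s)."""
    return value_gib * 1024


def gib_to_tib(value_gib: float) -> float:
    """Convert GiB (or GiB/s) to TiB (or TiB/s)."""
    return value_gib / 1024


bandwidth_gib_s = 1907.712  # max memory bandwidth from the property list
print(f"{gib_to_gb(bandwidth_gib_s):.2f} GB/s")
print(f"{gib_to_mib(bandwidth_gib_s):.3f} MiB/s")
print(f"{gib_to_tib(bandwidth_gib_s):.3f} TiB/s")
```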