From WikiChip
Difference between revisions of "pezy/pezy-scx/pezy-sc2"

 
(26 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
{{pezy title|PEZY-SC2}}
 
{{pezy title|PEZY-SC2}}
{{mpu
+
{{chip
 
|future=Yes
 
|future=Yes
 
|name=PEZY-SC2
 
|name=PEZY-SC2
Line 9: Line 9:
 
|market=Supercomputer
 
|market=Supercomputer
 
|first announced=2015
 
|first announced=2015
|first launched=2016
+
|first launched=2017
 +
|family=PEZY-SCx
 
|frequency=1,000 MHz
 
|frequency=1,000 MHz
 
|process=16 nm
 
|process=16 nm
Line 15: Line 16:
 
|die area=620 mm²
 
|die area=620 mm²
 
|core count=2,048
 
|core count=2,048
|power=200 W
+
|thread count=16,384
 +
|power=180 W
 +
|average power=130 W
 
|v core=0.8 V
 
|v core=0.8 V
 
}}
 
}}
'''PEZY-SC2''' ('''PEZY Super Computer 2''') is third generation [[many-core microprocessor]] developed by [[PEZY]] released in early 2017. The SC2 incorporates 2,048 cores, twice as many cores as its predecessor.
+
'''PEZY-SC2''' ('''PEZY Super Computer 2''') is a third generation [[many-core microprocessor]] developed by [[PEZY]] and introduced in early 2017. This chip, which operates at 1 GHz, incorporates 2,048 cores dissipating 180 W. The PEZY-SC2 powers the [[ZettaScaler]]-2.x series of supercomputers.  
  
PEZY-SC2 operates at 1 GHz and consume around 100 W while delivering performance in the order of 16.4 TFLOPS (HP), 8.2 TFLOPS (SP), and 4.1 TFLOPS (DP). The PEZY-SC2 is designed using over 2.4 billion gates and will be manufactured on TSMC's [[16 nm process]].
+
== Overview ==
 +
Introduced by [[PEZY]] along with their second-generation [[ZettaScaler]]-2.0 supercomputer series, the SC2 incorporates 2,048 cores along with 8-way [[simultaneous multithreading|SMT]] support for a total of 16,384 threads, twice as many cores as {{\\|PEZY-SC|its predecessor}}. The PEZY-SC2 powers many of the top [[Green500]] most efficient supercomputers with upward of 14 GFLOPS/watt in performance.
  
 +
Operating at 1 GHz, the PEZY-SC2 has a peak performance of 8.192 [[TFLOPS]] (single-precision) and 4.1 TFLOPS (double-precision) while consuming around 180 Watts. The PEZY-SC2 is designed using over 2.4 billion gates and is manufactured on [[TSMC]]'s [[16 nm process|16FF+ process]].
 +
{{#set:
 +
| peak flops (half-precision)  = {{#expr:1000000000 * 8 * 2048}} FLOPS
 +
| peak flops (single-precision) = {{#expr:1000000000 * 4 * 2048}} FLOPS
 +
| peak flops (double-precision) = {{#expr:1000000000 * 2 * 2048}} FLOPS
 +
}}
 +
In attempt to increase adaptability in the field of deep learning and AI as well as to increase throughput for specialized workloads, the PEZY-SC2 introduced support for 16-bit half precision floating point support. At 1 GHz, the SC2 can peak at 16.4 TFLOPS for half precision.
 +
 +
=== Architecture ===
 +
Managing the tiny PEZY cores are six {{mips|P-Class}} {{mips|P6600}} [[MIPS]] (MIPS64R6) processors. Previously, the {{\\|PEZY-SC}} relied on two lightweight {{arm|ARM9|ARM926|l=arch}} [[physical core|cores]] that proved to be too much of a performance bottleneck. The SC2 got rid of the four "Prefecture" units that incorporated 256 cities along with 2 MiB of L3 cache. Instead, the SC2 now has 40 MiB of shared [[last level cache]] shared not only by all the cities, but also by the MIPS cores. In order to improve performance further, the MIPS cores and the PEZY cores now share the same address space, reducing data transfer overhead. It's worth noting that the use of powerful MIPS cores mean they no longer require to rely on an external [[Intel]] {{intel|Xeon E5}} host processor.
 +
 +
 +
: [[File:pezy-sc2 main block.svg|750px]]
 +
 +
== Cache ==
 +
The SC2 a [[hexa-core]] MIPS {{mips|P6600}} process which has its own separate cache:
 +
 +
{{cache size
 +
|l1 cache=768 KiB
 +
|l1i cache=384 KiB
 +
|l1i break=6x64 KiB
 +
|l1i desc=4-way set associative
 +
|l1d cache=384 KiB
 +
|l1d break=6x64 KiB
 +
|l1d desc=8-way set associative
 +
|l2 cache=2 MiB
 +
|l2 break=1x2 MiB
 +
|l2 desc=8-way set associative
 +
}}
  
{{unknown features}}
+
The SC2 integrates a multi-level cache hierarchy:
 +
 
 +
{{cache size
 +
|l1 cache=12 MiB
 +
|l1i cache=8 MiB
 +
|l1d cache=4 MiB
 +
|l2 cache=12 MiB
 +
|l3 cache=40 MiB
 +
|l3 break=1x40 MiB
 +
|l3 desc=shared LLC
 +
}}
 +
 
 +
Additionally, there is another 40 MiB consisting of 20 KiB per PE of scratch pad memory. This was increased from 16 KiB in the {{\\|Pezy-SC}}.
  
 
== Memory controller ==
 
== Memory controller ==
 +
For main memory, the PEZY-SC2 supports 4 channels of 64-bit DDR4-3200 memory with ECC support for a total aggregated bandwidth of 95.37 GiB/s
 
{{memory controller
 
{{memory controller
|type=DDR4-2666
+
|type=DDR4-3200
 
|ecc=Yes
 
|ecc=Yes
|controllers=8
+
|max mem=123 GiB
|channels=8
+
|controllers=4
|max bandwidth=158.95 GiB/s
+
|channels=4
|bandwidth schan=19.89 GiB/s
+
|width=64 bit
|bandwidth dchan=39.72 GiB/s
+
|max bandwidth=95.37 GiB/s
|bandwidth qchan=79.47 GiB/s
+
|bandwidth schan=23.84GiB/s
|bandwidth ochan=158.95 GiB/s
+
|bandwidth dchan=47.68 GiB/s
|bandwidth hchan=119.21 GiB/s
+
|bandwidth qchan=95.37 GiB/s
 
}}
 
}}
 +
 +
In addition to main memory bandwidth, the PEZY-SC2 supports Wide-IO with a width of 1,024 bit. The SC2 uses [[ThruChip Interface]] (TCI), a wireless near-field inductive coupling technology, in order to communicate with the TCI-DRAM chips which are packaged together. The SC2 features four TCI-DRAM interfaces, each providing a maximum bandwidth of 500 GB/s for a total aggregated bandwidth of 2 TB/s.
  
 
{{memory controller
 
{{memory controller
Line 45: Line 93:
 
|max bandwidth=1.863 TiB/s
 
|max bandwidth=1.863 TiB/s
 
}}
 
}}
 +
 +
== Expansions ==
 +
The SC2 upgraded the PCIe interface to Gen4, supporting up to 64 GB/s. The number of lanes remain unchanged from the {{\\|PEZY-SC}}, at 32.
 +
{{expansions main
 +
|
 +
{{expansions entry
 +
|type=PCIe
 +
|pcie revision=4.0
 +
|pcie lanes=32
 +
|pcie config=4x8
 +
|pcie config 2=2x16
 +
}}
 +
}}
 +
== See also ==
 +
* [https://fuse.wikichip.org/news/191/the-2048-core-pezy-sc2-sets-a-green500-record/ The 2,048-core PEZY-SC2 sets a Green500 record - WikiChip Fuse]

Latest revision as of 11:15, 22 September 2018

PEZY-SC2
General Info
Designer: PEZY
Manufacturer: TSMC
Model Number: PEZY-SC2
Market: Supercomputer
Introduction: 2015 (announced), 2017 (launched)
General Specs
Family: PEZY-SCx
Frequency: 1,000 MHz
Microarchitecture
Process: 16 nm
Technology: CMOS
Die: 620 mm²
Cores: 2,048
Threads: 16,384
Electrical
Power dissipation: 180 W
Power dissipation (average): 130 W
Vcore: 0.8 V

PEZY-SC2 (PEZY Super Computer 2) is a third-generation many-core microprocessor developed by PEZY and introduced in early 2017. The chip operates at 1 GHz and incorporates 2,048 cores, dissipating 180 W. The PEZY-SC2 powers the ZettaScaler-2.x series of supercomputers.

Overview

Introduced by PEZY along with their second-generation ZettaScaler-2.0 supercomputer series, the SC2 incorporates 2,048 cores, twice as many as its predecessor, along with 8-way SMT support for a total of 16,384 threads. The PEZY-SC2 powers many of the most efficient supercomputers on the Green500 list, delivering upward of 14 GFLOPS/watt.

Operating at 1 GHz, the PEZY-SC2 has a peak performance of 8.192 TFLOPS (single-precision) and 4.096 TFLOPS (double-precision) while consuming around 180 W. The PEZY-SC2 is designed using over 2.4 billion gates and is manufactured on TSMC's 16FF+ process.

To improve adaptability for deep learning and AI and to increase throughput for specialized workloads, the PEZY-SC2 introduced support for 16-bit half-precision floating point. At 1 GHz, the SC2 peaks at 16.384 TFLOPS for half precision.
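
The thread count and peak throughput figures quoted in this section follow from simple multiplication. The sketch below reproduces them; the per-core operations-per-cycle factors (8, 4, 2) are an assumption inferred from the quoted peaks, and the names are illustrative only.

```python
# Peak-throughput sketch for the PEZY-SC2 using the figures quoted above.
CLOCK_HZ = 1_000_000_000  # 1 GHz
CORES = 2_048
SMT_WAYS = 8

# Assumption: floating-point operations per core per cycle, inferred from
# the quoted peak numbers rather than stated directly in the text.
FLOPS_PER_CYCLE = {"half": 8, "single": 4, "double": 2}


def peak_flops(precision: str) -> int:
    """Return the peak FLOPS for the given precision."""
    return CLOCK_HZ * FLOPS_PER_CYCLE[precision] * CORES


threads = CORES * SMT_WAYS
print(f"threads: {threads:,}")
for precision in ("half", "single", "double"):
    print(f"{precision}: {peak_flops(precision) / 1e12:.3f} TFLOPS")
```

With these inputs the script reports 16,384 threads and peaks of 16.384, 8.192, and 4.096 TFLOPS for half, single, and double precision.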

Architecture

Managing the tiny PEZY cores are six P-Class P6600 MIPS (MIPS64R6) processors. Previously, the PEZY-SC relied on two lightweight ARM926 cores, which proved to be a performance bottleneck. The SC2 drops the four "Prefecture" units that incorporated 256 cities along with 2 MiB of L3 cache. Instead, the SC2 has a 40 MiB last level cache shared not only by all the cities but also by the MIPS cores. To improve performance further, the MIPS cores and the PEZY cores now share the same address space, which reduces data transfer overhead. Notably, with the more powerful MIPS cores, the chip no longer needs to rely on an external Intel Xeon E5 host processor.


[Figure: PEZY-SC2 main block diagram]

Cache

The SC2 includes a hexa-core MIPS P6600 processor, which has its own separate cache:

Cache Organization
Cache is a hardware component containing a relatively small and extremely fast memory designed to speed up the performance of a CPU by preparing ahead of time the data it needs to read from a relatively slower medium such as main memory.

The organization and amount of cache can have a large impact on the performance, power consumption, die size, and consequently cost of the IC.

Cache is specified by its size, number of sets, associativity, block size, sub-block size, and fetch and write-back policies.

Note: All units are in kibibytes and mebibytes.
L1$: 768 KiB (786,432 B, 0.75 MiB)
L1I$: 384 KiB (393,216 B, 0.375 MiB), 6x64 KiB, 4-way set associative
L1D$: 384 KiB (393,216 B, 0.375 MiB), 6x64 KiB, 8-way set associative
L2$: 2 MiB (2,048 KiB, 2,097,152 B, 0.00195 GiB), 1x2 MiB, 8-way set associative
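
The L1 totals listed for the MIPS cores are per-core sizes multiplied by the core count. A minimal sketch of that arithmetic follows, using only the values listed above; the variable names are illustrative.

```python
# MIPS P6600 L1 cache totals, derived from the per-core sizes listed above.
KIB = 1024
MIPS_CORES = 6
L1I_PER_CORE_KIB = 64
L1D_PER_CORE_KIB = 64

l1i_total_kib = MIPS_CORES * L1I_PER_CORE_KIB
l1d_total_kib = MIPS_CORES * L1D_PER_CORE_KIB
l1_total_kib = l1i_total_kib + l1d_total_kib

print(f"L1I: {l1i_total_kib} KiB, L1D: {l1d_total_kib} KiB")
print(f"L1 total: {l1_total_kib} KiB ({l1_total_kib * KIB:,} B)")
```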

The SC2 integrates a multi-level cache hierarchy:

L1$: 12 MiB (12,288 KiB, 12,582,912 B)
L1I$: 8 MiB (8,192 KiB, 8,388,608 B)
L1D$: 4 MiB (4,096 KiB, 4,194,304 B)
L2$: 12 MiB (12,288 KiB, 12,582,912 B, 0.0117 GiB)
L3$: 40 MiB (40,960 KiB, 41,943,040 B, 0.0391 GiB), 1x40 MiB, shared LLC

Additionally, each PE has 20 KiB of scratchpad memory, for another 40 MiB in total. This is an increase from 16 KiB per PE in the PEZY-SC.
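
The scratchpad total can be checked with a short calculation. The sketch below assumes one PE per core (2,048 PEs) and, for the comparison, assumes the PEZY-SC had 1,024 PEs, which follows from the statement that the SC2 doubles the core count; the function name is illustrative.

```python
# Total scratchpad memory across all processing elements (PEs).
KIB_PER_MIB = 1024


def scratchpad_total_mib(pe_count: int, kib_per_pe: int) -> float:
    """Return the aggregate scratchpad size in MiB."""
    return pe_count * kib_per_pe / KIB_PER_MIB


# PEZY-SC2: 2,048 PEs with 20 KiB each.
sc2_total_mib = scratchpad_total_mib(2_048, 20)
# PEZY-SC (assumption: 1,024 PEs) with 16 KiB each.
sc_total_mib = scratchpad_total_mib(1_024, 16)

print(f"PEZY-SC2 scratchpad: {sc2_total_mib:.0f} MiB")
print(f"PEZY-SC scratchpad (assumed 1,024 PEs): {sc_total_mib:.0f} MiB")
```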

Memory controller

For main memory, the PEZY-SC2 supports four channels of 64-bit DDR4-3200 memory with ECC support, for an aggregate bandwidth of 95.37 GiB/s.

Integrated Memory Controller
Max Type: DDR4-3200
Supports ECC: Yes
Max Mem: 123 GiB
Controllers: 4
Channels: 4
Width: 64 bit
Max Bandwidth: 95.37 GiB/s (97,658.88 MiB/s, 102.403 GB/s, 102,402.758 MB/s, 0.0931 TiB/s, 0.102 TB/s)
Bandwidth:
Single: 23.84 GiB/s
Double: 47.68 GiB/s
Quad: 95.37 GiB/s
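
The per-channel and aggregate DDR4 figures above can be reproduced from the transfer rate and bus width. This is a minimal sketch assuming DDR4-3200 means 3,200 million transfers per second and that each transfer moves the full 64-bit data width (ECC bits excluded); names are illustrative.

```python
# DDR4-3200 bandwidth: transfers/s x bytes per transfer x channels.
TRANSFERS_PER_SEC = 3_200_000_000  # DDR4-3200 = 3,200 MT/s
BUS_WIDTH_BITS = 64                # data width per channel, excluding ECC
GIB = 2**30


def ddr_bandwidth_bytes(channels: int) -> int:
    """Return the peak bandwidth in bytes per second for `channels` channels."""
    return TRANSFERS_PER_SEC * (BUS_WIDTH_BITS // 8) * channels


for label, channels in (("Single", 1), ("Double", 2), ("Quad", 4)):
    bw = ddr_bandwidth_bytes(channels)
    print(f"{label}: {bw / 1e9:.1f} GB/s = {bw / GIB:.2f} GiB/s")
```

The quad-channel result is 102.4 GB/s, which is 95.37 GiB/s when expressed in binary units.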

In addition to main memory, the PEZY-SC2 supports Wide-IO with a width of 1,024 bits. The SC2 uses ThruChip Interface (TCI), a wireless near-field inductive coupling technology, to communicate with the TCI-DRAM chips packaged together with it. The SC2 features four TCI-DRAM interfaces, each providing a maximum bandwidth of roughly 500 GB/s, for an aggregate bandwidth of about 2 TB/s.

Integrated Memory Controller
Memory: Wide I/O
Rate: 2,000 MHz
Width: 1,024 bit
Channels: 4
Max Bandwidth: 1.863 TiB/s (1,907.712 GiB/s, 1,953,497.088 MiB/s, 2,048.39 GB/s, 2,048,390.163 MB/s, 2.048 TB/s)
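
The prose quotes the TCI-DRAM bandwidth in rounded decimal units, while the table lists it in binary units. The sketch below converts the listed 1.863 TiB/s back to decimal units and splits it evenly across the four interfaces; the even split is an assumption, and the names are illustrative.

```python
# Convert the listed Wide I/O bandwidth from binary to decimal units.
TIB = 2**40
MAX_BANDWIDTH_TIB_S = 1.863
INTERFACES = 4

total_bytes_per_sec = MAX_BANDWIDTH_TIB_S * TIB
total_gb_s = total_bytes_per_sec / 1e9
# Assumption: bandwidth is divided evenly across the four interfaces.
per_interface_gb_s = total_gb_s / INTERFACES

print(f"total: {total_gb_s:.2f} GB/s")
print(f"per interface: {per_interface_gb_s:.1f} GB/s")
```

Under these assumptions each interface works out to about 512 GB/s, consistent with the rounded 500 GB/s per interface and 2 TB/s total.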

Expansions

The SC2 upgrades the PCIe interface to Gen4, supporting up to 64 GB/s. The number of lanes remains unchanged from the PEZY-SC, at 32.

Expansion Options
PCIe
Revision: 4.0
Max Lanes: 32
Configuration: 4x8, 2x16
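
The 64 GB/s figure matches the raw PCIe 4.0 signaling rate across 32 lanes. The sketch below shows that calculation and the slightly lower figure after 128b/130b line encoding. It assumes the quoted number is per direction, which the text does not state; names are illustrative.

```python
# PCIe 4.0 bandwidth per direction for a given lane count.
GT_PER_LANE = 16e9              # PCIe 4.0 raw rate: 16 GT/s per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding
LANES = 32


def raw_bandwidth_gb_s(lanes: int) -> float:
    """Raw signaling bandwidth in GB/s, ignoring encoding overhead."""
    return GT_PER_LANE / 8 * lanes / 1e9


def effective_bandwidth_gb_s(lanes: int) -> float:
    """Bandwidth in GB/s after 128b/130b encoding overhead."""
    return raw_bandwidth_gb_s(lanes) * ENCODING_EFFICIENCY


print(f"raw: {raw_bandwidth_gb_s(LANES):.1f} GB/s")
print(f"after encoding: {effective_bandwidth_gb_s(LANES):.2f} GB/s")
```

For the listed configurations, a x16 link carries half of these figures and a x8 link a quarter.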

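See also

* The 2,048-core PEZY-SC2 sets a Green500 record - WikiChip Fuse: https://fuse.wikichip.org/news/191/the-2048-core-pezy-sc2-sets-a-green500-record/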

Facts about "PEZY-SC2 - PEZY"
base frequency: 1,000 MHz (1 GHz, 1,000,000 kHz)
core count: 2,048
core voltage: 0.8 V (8 dV, 80 cV, 800 mV)
designer: PEZY
die area: 620 mm² (0.961 in², 6.2 cm², 620,000,000 µm²)
family: PEZY-SCx
first announced: 2015
first launched: 2017
full page name: pezy/pezy-scx/pezy-sc2
has ecc memory support: true and false
instance of: microprocessor
l1$ size: 768 KiB (786,432 B, 0.75 MiB) and 12,288 KiB (12,582,912 B, 12 MiB)
l1d$ description: 8-way set associative
l1d$ size: 384 KiB (393,216 B, 0.375 MiB) and 4,096 KiB (4,194,304 B, 4 MiB)
l1i$ description: 4-way set associative
l1i$ size: 384 KiB (393,216 B, 0.375 MiB) and 8,192 KiB (8,388,608 B, 8 MiB)
l2$ description: 8-way set associative
l2$ size: 2 MiB (2,048 KiB, 2,097,152 B, 0.00195 GiB) and 12 MiB (12,288 KiB, 12,582,912 B, 0.0117 GiB)
l3$ description: shared LLC
l3$ size: 40 MiB (40,960 KiB, 41,943,040 B, 0.0391 GiB)
ldate: 3000
manufacturer: TSMC
market segment: Supercomputer
max memory bandwidth: 95.37 GiB/s (97,658.88 MiB/s, 102.403 GB/s, 102,402.758 MB/s, 0.0931 TiB/s, 0.102 TB/s) and 1,907.712 GiB/s (1,953,497.088 MiB/s, 2,048.39 GB/s, 2,048,390.163 MB/s, 1.863 TiB/s, 2.048 TB/s)
max memory channels: 4
model number: PEZY-SC2
name: PEZY-SC2
peak flops (double-precision): 4,096,000,000,000 FLOPS (4,096,000,000 KFLOPS, 4,096,000 MFLOPS, 4,096 GFLOPS, 4.096 TFLOPS, 0.0041 PFLOPS, 4.096e-6 EFLOPS, 4.096e-9 ZFLOPS)
peak flops (half-precision): 16,384,000,000,000 FLOPS (16,384,000,000 KFLOPS, 16,384,000 MFLOPS, 16,384 GFLOPS, 16.384 TFLOPS, 0.0164 PFLOPS, 1.6384e-5 EFLOPS, 1.6384e-8 ZFLOPS)
peak flops (single-precision): 8,192,000,000,000 FLOPS (8,192,000,000 KFLOPS, 8,192,000 MFLOPS, 8,192 GFLOPS, 8.192 TFLOPS, 0.00819 PFLOPS, 8.192e-6 EFLOPS, 8.192e-9 ZFLOPS)
power dissipation: 180 W (180,000 mW, 0.241 hp, 0.18 kW)
power dissipation (average): 130 W (130,000 mW, 0.174 hp, 0.13 kW)
process: 16 nm (0.016 μm, 1.6e-5 mm)
supported memory type: DDR4-3200
technology: CMOS
thread count: 16,384