From WikiChip
Difference between revisions of "pezy/pezy-scx/pezy-sc2"
< pezy‎ | pezy-scx

(Memory controller)
(Cache)
Line 62: Line 62:
 
}}
 
}}
  
Additionally, there is another 40 MiB consisting of 20 KiB per PE of scratch pad memory. This was increased from 16 KiB in the {{\\|Pezy-SC2}}.
+
Additionally, there is another 40 MiB consisting of 20 KiB per PE of scratch pad memory. This was increased from 16 KiB in the {{\\|Pezy-SC}}.
  
 
== Memory controller ==
 
== Memory controller ==

Revision as of 23:10, 2 November 2017

Template:mpu PEZY-SC2 (PEZY Super Computer 2) is third generation many-core microprocessor developed by PEZY and introduced in early 2017. This chip, which operates at 1 GHz, incorporates 2,048 cores dissipating 180 W.

Overview

Introduced by PEZY along with their second-generation ZettaScaler-2.0 supercomputer series, the SC2 incorporates 2,048 cores along with 8-way SMT support for a total of 16,384 threads, twice as many cores as its predecessor.

Operating at 1 GHz, the PEZY-SC2 has a peak performance of 8.192 TFLOPS (single-precision) and 4.1 TFLOPS (double-precision) while consuming around 180 Watts. The PEZY-SC2 is designed using over 2.4 billion gates and is manufactured on TSMC's 16FF+ process.

In attempt to increase adaptability in the field of deep learning and AI as well as to increase throughput for specialized workloads, the PEZY-SC2 introduced support for 16-bit half precision floating point support. At 1 GHz, the SC2 can peak at 16.4 TFLOPS for half precision.

Cache

The SC2 a hexa-core MIPS P6600 process which has its own separate cache:

[Edit/Modify Cache Info]

hierarchy icon.svg
Cache Organization
Cache is a hardware component containing a relatively small and extremely fast memory designed to speed up the performance of a CPU by preparing ahead of time the data it needs to read from a relatively slower medium such as main memory.

The organization and amount of cache can have a large impact on the performance, power consumption, die size, and consequently cost of the IC.

Cache is specified by its size, number of sets, associativity, block size, sub-block size, and fetch and write-back policies.

Note: All units are in kibibytes and mebibytes.
L1$768 KiB
786,432 B
0.75 MiB
L1I$384 KiB
393,216 B
0.375 MiB
6x64 KiB4-way set associative 
L1D$384 KiB
393,216 B
0.375 MiB
6x64 KiB8-way set associative 

L2$2 MiB
2,048 KiB
2,097,152 B
0.00195 GiB
  1x2 MiB8-way set associative 

The SC2 integrates a multi-level cache hierarchy:

[Edit/Modify Cache Info]

hierarchy icon.svg
Cache Organization
Cache is a hardware component containing a relatively small and extremely fast memory designed to speed up the performance of a CPU by preparing ahead of time the data it needs to read from a relatively slower medium such as main memory.

The organization and amount of cache can have a large impact on the performance, power consumption, die size, and consequently cost of the IC.

Cache is specified by its size, number of sets, associativity, block size, sub-block size, and fetch and write-back policies.

Note: All units are in kibibytes and mebibytes.
L1$12 MiB
12,288 KiB
12,582,912 B
L1I$8 MiB
8,192 KiB
8,388,608 B
   
L1D$4 MiB
4,096 KiB
4,194,304 B
   

L2$12 MiB
12,288 KiB
12,582,912 B
0.0117 GiB
     

L3$40 MiB
40,960 KiB
41,943,040 B
0.0391 GiB
  1x40 MiBshared LLC 

Additionally, there is another 40 MiB consisting of 20 KiB per PE of scratch pad memory. This was increased from 16 KiB in the Pezy-SC.

Memory controller

For main memory, the PEZY-SC2 supports 4 channels of 64-bit DDR4-3200 memory with ECC support for a total aggregated bandwidth of 95.37 GiB/s

[Edit/Modify Memory Info]

ram icons.svg
Integrated Memory Controller
Max TypeDDR4-3200
Supports ECCYes
Controllers4
Channels4
Width64 bit
Max Bandwidth95.37 GiB/s
97,658.88 MiB/s
102.403 GB/s
102,402.758 MB/s
0.0931 TiB/s
0.102 TB/s
Bandwidth
Single 23.84GiB/s
Double 47.68 GiB/s
Quad 95.37 GiB/s

In addition to main memory bandwidth, the PEZY-SC2 supports Wide-IO with a width of 1,024 bit. The SC2 uses ThruChip Interface (TCI), a wireless near-field inductive coupling technology, in order to communicate with the TCI-DRAM chips which are packaged together. The SC2 features four TCI-DRAM interfaces, each providing a maximum bandwidth of 500 GB/s for a total aggregated bandwidth of 2 TB/s.

[Edit/Modify Memory Info]

ram icons.svg
Integrated Memory Controller
MemoryWide I/O
Rate2,000 MHz
Width1,024 bit
Channels4
Max Bandwidth1.863 TiB/s
1,907.712 GiB/s
1,953,497.088 MiB/s
2,048.39 GB/s
2,048,390.163 MB/s
2.048 TB/s

Expansions

[Edit/Modify Expansions Info]

ide icon.svg
Expansion Options
PCIeRevision: 3.0
Max Lanes: 32
Configuration: 4x8
Facts about "PEZY-SC2 - PEZY"
Has subobject
"Has subobject" is a predefined property representing a container construct and is provided by Semantic MediaWiki.
PEZY-SC2 - PEZY#pcie +
has ecc memory supporttrue + and false +
l1$ size768 KiB (786,432 B, 0.75 MiB) + and 12,288 KiB (12,582,912 B, 12 MiB) +
l1d$ description8-way set associative +
l1d$ size384 KiB (393,216 B, 0.375 MiB) + and 4,096 KiB (4,194,304 B, 4 MiB) +
l1i$ description4-way set associative +
l1i$ size384 KiB (393,216 B, 0.375 MiB) + and 8,192 KiB (8,388,608 B, 8 MiB) +
l2$ description8-way set associative +
l2$ size2 MiB (2,048 KiB, 2,097,152 B, 0.00195 GiB) + and 12 MiB (12,288 KiB, 12,582,912 B, 0.0117 GiB) +
l3$ descriptionshared LLC +
l3$ size40 MiB (40,960 KiB, 41,943,040 B, 0.0391 GiB) +
max memory bandwidth95.37 GiB/s (97,658.88 MiB/s, 102.403 GB/s, 102,402.758 MB/s, 0.0931 TiB/s, 0.102 TB/s) + and 1,907.712 GiB/s (1,953,497.088 MiB/s, 2,048.39 GB/s, 2,048,390.163 MB/s, 1.863 TiB/s, 2.048 TB/s) +
max memory channels4 +
peak flops (double-precision)4,096,000,000,000 FLOPS (4,096,000,000 KFLOPS, 4,096,000 MFLOPS, 4,096 GFLOPS, 4.096 TFLOPS, 0.0041 PFLOPS, 4.096e-6 EFLOPS, 4.096e-9 ZFLOPS) +
peak flops (half-precision)16,384,000,000,000 FLOPS (16,384,000,000 KFLOPS, 16,384,000 MFLOPS, 16,384 GFLOPS, 16.384 TFLOPS, 0.0164 PFLOPS, 1.6384e-5 EFLOPS, 1.6384e-8 ZFLOPS) +
peak flops (single-precision)8,192,000,000,000 FLOPS (8,192,000,000 KFLOPS, 8,192,000 MFLOPS, 8,192 GFLOPS, 8.192 TFLOPS, 0.00819 PFLOPS, 8.192e-6 EFLOPS, 8.192e-9 ZFLOPS) +
supported memory typeDDR4-3200 +