From WikiChip
Difference between revisions of "pezy/pezy-scx/pezy-sc"

(fixed typo)
 
(25 intermediate revisions by 5 users not shown)
Line 1: Line 1:
 
{{pezy title|PEZY-SC}}
 
{{pezy title|PEZY-SC}}
{{mpu
+
{{chip
| name               = PEZY-SC
+
|name=PEZY-SC
| no image           =
+
|image=pezy-sc (front).png
| image              = pezy sc.jpg
+
|designer=PEZY
| image size          =
+
|manufacturer=TSMC
| caption            =
+
|model number=PEZY-SC
| designer           = PEZY
+
|market=Supercomputer
| manufacturer       = TSMC
+
|first announced=2013
| model number       = PEZY-SC
+
|first launched=September, 2014
| part number        =
+
|family=PEZY-SCx
| market             = Industrial
+
|frequency=733.33 MHz
| first announced     = 2013
+
|process=28 nm
| first launched     = September, 2014
+
|transistors=3,730,000,000
| last order          =
+
|technology=CMOS
| last shipment      =
+
|die area=411.6 mm²
 
+
|die length=19.5 mm
| family             =
+
|die width=21.1 mm
| series              =
+
|core count=1,024
| locked              =  
+
|thread count=8,192
| frequency           = 733.33 MHz
+
|power=100 W
| bus type            =
+
|average power=70 W
| bus speed          = 66.66 MHz
+
|v core=1.0 V
| bus rate            =
+
|package module 1={{packages/pezy/fcbga-2112}}
| clock multiplier    = 11
 
 
 
| microarch          =
 
| platform            =
 
| chipset            =
 
| core name          =
 
| core family        =
 
| core model          =
 
| core stepping      =
 
| process             = 28 nm
 
| transistors         =  
 
| technology         = CMOS
 
| die area           = 411.6 mm²
 
| die width          = 21.1 mm
 
| die length          = 19.5 mm
 
| word size          =
 
| core count         = 1024
 
| thread count       =  
 
| max cpus            =  
 
| max memory          =
 
| max memory addr    =
 
 
 
| electrical          = Yes
 
| power               = 70 W
 
| v core             = 1.0 V
 
| v core tolerance    =
 
| v io                =
 
| v io tolerance      =
 
| sdp                =
 
| tdp                =
 
| ctdp down          =
 
| ctdp down frequency =
 
| ctdp up            =
 
| ctdp up frequency  =
 
| temp min            =
 
| temp max            =
 
| tjunc min          = <!-- °C -->
 
| tjunc max          =
 
| tcase min          =
 
| tcase max          =
 
| tstorage min        =
 
| tstorage max        =
 
 
 
| packaging          = Yes
 
| package 0          = fcBGA-2112
 
| package 0 type      = fcBGA
 
| package 0 pins      = 2112
 
| package 0 pitch    = 1 mm
 
| package 0 width    = 47.5 mm
 
| package 0 length    = 47.5 mm
 
| package 0 height    = 4.05 mm
 
| socket 0            = BGA-2112
 
| socket 0 type      = BGA
 
 
}}
 
}}
'''PEZY-SC''' ('''PEZY Super Computer''') is second generation [[many-core microprocessor]] developed by [[PEZY]] in 2014. PEZY-SC contains 2 {{armh|ARM926}} cores ({{arm|ARMv5TEJ}}) along with 1024 simpler RISC cores. Operating at 733 MHz, the processor is said to have peak performance of 3.0 TFLOPS (single-precision) and 1.5 TFLOPS (double-precision). PEZY-SC was designed using 580 million gates and manufactured on TSMC's 28HPC+ ([[28 nm process]]). The PEZY-SC is used in a number of [[TOP500]] & [[Green500]] supercomputers as the world's most efficient supercomputers.
+
'''PEZY-SC''' ('''PEZY Super Computer''') is a second generation [[many-core microprocessor]] developed by [[PEZY]] and introduced in 2014. This chip, which operates at 733 MHz, incorporates 1,024 cores dissipating 100 W. The PEZY-SC powers the [[ZettaScaler]]-1.x series of supercomputers. The PEZY-SC is used in a number of [[TOP500]] & [[Green500]] supercomputers as the world's most efficient supercomputers.
  
 
== Overview ==
 
== Overview ==
 
{{see also|pezy/pezy-1|l1=PEZY-1}}
 
{{see also|pezy/pezy-1|l1=PEZY-1}}
The PEZY-SC (SC for "Super Computer") is [[PEZY]]'s second generation microprocessors which builds upon the {{pezy|PEZY-1}}. The chip contains exactly twice as many cores and incorporates a large amount of cache including 8 MB of L3$.
+
The PEZY-SC (SC for "Super Computer") is [[PEZY]]'s second generation microprocessors which builds upon the {{pezy|PEZY-1}}. The chip contains exactly twice as many cores and incorporates a large amount of cache including 8 MB of L3$. The chip contains 2 {{armh|ARM926}} cores ({{arm|ARMv5TEJ}}) along with 1,024 simpler cores supporting 8-way [[simultaneous multithreading|SMT]] for a total of 8,192 [[logical core|threads]]. Operating at 733 MHz, the processor has a peak performance of 3.0 [[TFLOPS]] (single-precision) and 1.5 TFLOPS (double-precision). PEZY-SC was designed using 580 million gates and manufactured on [[28 nm process|TSMC's 28HPC+]].
 +
{{#set:
 +
| peak flops (single-precision) = {{#expr:733333333 * 4 * 1024}} FLOPS
 +
| peak flops (double-precision) = {{#expr:733333333 * 2 * 1024}} FLOPS
 +
}}
 +
The chip has a peak power dissipation of 100 W with a typical power consumption of 70 W which consists of 10 W [[static power|leakage]] + 60 W [[dynamic power|dynamic]].
  
 
In June of 2015, PEZY-SC-based [[supercomputer]]s took all top 3 spots on the [[Green500]] listing as the 3 most efficient supercomputers. PEZY-SC powers [[Shoubu]] (1,181,952 cores, ? kW, 605.624 TFlop/s [[Linpack]] Rmax), and [[Suiren Blue]] (262,656 cores, 40.86 kW, 247.752 TFlop/s Linpack Rmax), and [[Suiren]] (328,480 cores, 48.90 kW, 271.782 TFlop/s Linpack Rmax) supercomputers (ranked 1, 2, and 3 respectively).
 
In June of 2015, PEZY-SC-based [[supercomputer]]s took all top 3 spots on the [[Green500]] listing as the 3 most efficient supercomputers. PEZY-SC powers [[Shoubu]] (1,181,952 cores, ? kW, 605.624 TFlop/s [[Linpack]] Rmax), and [[Suiren Blue]] (262,656 cores, 40.86 kW, 247.752 TFlop/s Linpack Rmax), and [[Suiren]] (328,480 cores, 48.90 kW, 271.782 TFlop/s Linpack Rmax) supercomputers (ranked 1, 2, and 3 respectively).
  
 
== Architecture ==
 
== Architecture ==
The PEZY-SC microprocessors is made of 4 blocks called "Prefectures". The Prefecture contains 2 MB of L3$ enclosed by 16 smaller blocks called "Cities".  Each City is made of 64 KB of L2$, a number of special function units, and 4 smaller blocks called "Villages". A village is a block of 4 execution units. For ever 2 execution units there are 2 KB of L1d$.
+
{{further|pezy/pezy-scx#Architecture|l1=PEZY-SCx § Architecture}}
 +
The PEZY-SC microprocessors is made of 4 blocks called "Prefectures". The Prefecture contains 2 MiB of L3$ enclosed by 16 smaller blocks called "Cities".  Each City is made of 64 KiB of L2$, a number of special function units, and 4 smaller blocks called "Villages". A village is a block of 4 execution units. For every 2 execution units there is 2 KiB of L1D$.
  
 
[[File:pezy-sc arch.svg|700px]]
 
[[File:pezy-sc arch.svg|700px]]
 
=== Processor Element (PE) ===
 
The PE are the individual execution cores.
 
{{expand section}}
 
 
== Die Shot ==
 
[[File:pezy sc die shot.jpg|650px]]
 
 
 
[[File:pezy-sc die shot (annotated).png|650px]]
 
  
 
== Cache ==
 
== Cache ==
 
PEZY-SC's cache is separate from the {{armh|ARM926}}'s cache which has an L1$ of 32 KiB (2x) and 64 KiB L2$ (shared).
 
PEZY-SC's cache is separate from the {{armh|ARM926}}'s cache which has an L1$ of 32 KiB (2x) and 64 KiB L2$ (shared).
{{cache info
+
{{cache size
 +
|l1 cache=64 KiB
 +
|l1i cache=32 KiB
 +
|l1i break=2x16 KiB
 +
|l1d cache=32 KiB
 +
|l1d break=2x16 KiB
 +
|l2 cache=64 KiB
 +
|l2 break=1x64 KiB
 +
}}
 +
 
 +
The chip integrates a multi-level cache hierarchy:
 +
{{cache size
 +
|l1 cache=3 MiB
 
|l1i cache=2 MiB
 
|l1i cache=2 MiB
 
|l1i break=1024x2 KiB
 
|l1i break=1024x2 KiB
|l1i extra=(per processor element)
+
|l1i desc=per processor element
 
|l1d cache=1 MiB
 
|l1d cache=1 MiB
 
|l1d break=512x2 KiB
 
|l1d break=512x2 KiB
|l1d extra=(per 2 processor elements)
+
|l1d desc=per 2 processor elements
 +
|l1d policy=
 
|l2 cache=4 MiB
 
|l2 cache=4 MiB
 
|l2 break=4x2 MiB
 
|l2 break=4x2 MiB
|l2 extra=(per city)
+
|l2 desc=per city
 +
|l2 policy=write-back
 
|l3 cache=8 MiB
 
|l3 cache=8 MiB
 
|l3 break=4x2 MiB
 
|l3 break=4x2 MiB
|l3 extra=(per prefecture)
+
|l3 desc=per prefecture
 +
|l3 policy=
 
}}
 
}}
 +
 +
Additionally, there is another 16 MiB of scratch-pad memory consisting of 16 KiB per PE.
  
 
== Memory controller ==
 
== Memory controller ==
{{integrated memory controller
+
{{memory controller
| type               = DDR4-1333
+
|type=DDR4-2133
| controllers       = 1
+
|ecc=Yes
| channels           = 8
+
|controllers=8
| ecc support        = <!-- ?? -->
+
|channels=8
| bandwidth schan   = 10,600 MB/s
+
|width=64 bit
| bandwidth dchan   = 21,200 MB/s
+
|max bandwidth=127.156 GiB/s
| bandwidth qchan   = 42,400 MB/s
+
|bandwidth schan=15.89 GiB/s
| bandwidth ochan    = 84,800 MB/s
+
|bandwidth dchan=31.79 GiB/s
| max memory        =  
+
|bandwidth qchan=63.58 GiB/s
 +
|bandwidth hchan=95.37 GiB/s
 +
|bandwidth ochan=127.156 GiB/s
 
}}
 
}}
  
 
== Expansions ==
 
== Expansions ==
{{mpu expansions
+
{{expansions main
| pcie revision     = 2.0
+
|
| pcie lanes         = 8
+
{{expansions entry
| pcie config        =  
+
|type=PCIe
| pcie config 1      =
+
|pcie revision=2.0
| pcie config 2      =
+
|pcie lanes=32
| usb revision      =
+
|pcie config=4x8
| usb revision 2    =
 
| usb ports          =
 
| sata revision      =
 
| sata ports        =
 
| integrated lan    =
 
| uart              = Yes
 
| gp io              = Yes
 
 
}}
 
}}
 +
}}
 +
 +
== Die Shot ==
 +
* [[28 nm process]]
 +
* 19.5 mm × 21.1 mm
 +
* 411.6 mm² die size
 +
 +
:[[File:pezy sc die shot.jpg|650px]]
 +
 +
 +
:[[File:pezy-sc die shot (annotated).png|650px]]
 +
 +
=== Floorplan ===
 +
 +
:[[File:pezy-sc floorplan.png|650px]]
  
 
== External Links ==
 
== External Links ==
Line 151: Line 121:
 
* [https://www.top500.org/system/178542 Shoubu Supercomputer on TOP500]
 
* [https://www.top500.org/system/178542 Shoubu Supercomputer on TOP500]
 
* [https://www.top500.org/system/178540 Suiren Blue on TOP500]
 
* [https://www.top500.org/system/178540 Suiren Blue on TOP500]
* [http://pc.watch.impress.co.jp/docs/news/714768.html Japanese supercomputer "PEZY system" has monopolized the 1 to 3-position of the Green500]
 
* [https://www.youtube.com/watch?v=_9Z1q0Kn2Qs 省エネ 液浸冷却小型スーパーコンピュータ ZettaScaler1.5]
 

Latest revision as of 10:14, 22 September 2018

PEZY-SC
pezy-sc (front).png
General Info
Designer: PEZY
Manufacturer: TSMC
Model Number: PEZY-SC
Market: Supercomputer
Introduction: 2013 (announced); September 2014 (launched)
General Specs
Family: PEZY-SCx
Frequency: 733.33 MHz
Microarchitecture
Process: 28 nm
Transistors: 3,730,000,000
Technology: CMOS
Die: 411.6 mm² (19.5 mm × 21.1 mm)
Cores: 1,024
Threads: 8,192
Electrical
Power dissipation: 100 W
Power dissipation (average): 70 W
Vcore: 1.0 V
Packaging
Package: FCBGA-2112 (BGA)
Dimension: 47.5 mm × 47.5 mm × 4.05 mm
Pitch: 1 mm
Contacts: 2,112

PEZY-SC (PEZY Super Computer) is a second-generation many-core microprocessor developed by PEZY and introduced in 2014. The chip operates at 733 MHz and incorporates 1,024 cores, with a peak power dissipation of 100 W. The PEZY-SC powers the ZettaScaler-1.x series of supercomputers and is used in a number of TOP500 and Green500 systems, including some ranked among the world's most energy-efficient supercomputers.

Overview

See also: PEZY-1

The PEZY-SC (SC for "Super Computer") is PEZY's second-generation microprocessor and builds upon the PEZY-1. The chip contains exactly twice as many cores as its predecessor and incorporates a large amount of cache, including 8 MiB of L3$. It contains 2 ARM926 cores (ARMv5TEJ) along with 1,024 simpler cores, each supporting 8-way SMT, for a total of 8,192 threads. Operating at 733 MHz, the processor has a peak performance of 3.0 TFLOPS (single-precision) and 1.5 TFLOPS (double-precision). PEZY-SC was designed using 580 million gates and is manufactured on TSMC's 28HPC+ process.
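
The peak-performance figures can be reproduced with a short calculation. This is a minimal sketch, assuming the same inputs as the page's stored peak-FLOPS properties (a 733,333,333 Hz clock, 1,024 cores, and per-core factors of 4 and 2 operations per cycle for single and double precision); the function and constant names are illustrative only.

```python
# Peak throughput sketch: FLOPS = clock (Hz) * FLOPs per cycle per core * cores.
CLOCK_HZ = 733_333_333      # 733.33 MHz, as used in the page's stored values
CORES = 1024                # simpler cores (processor elements)
SP_FLOPS_PER_CYCLE = 4      # per-core factor implied by the single-precision figure
DP_FLOPS_PER_CYCLE = 2      # per-core factor implied by the double-precision figure


def peak_flops(clock_hz: int, flops_per_cycle: int, cores: int) -> int:
    """Return the theoretical peak in FLOPS for the whole chip."""
    return clock_hz * flops_per_cycle * cores


sp = peak_flops(CLOCK_HZ, SP_FLOPS_PER_CYCLE, CORES)
dp = peak_flops(CLOCK_HZ, DP_FLOPS_PER_CYCLE, CORES)

print(f"single-precision: {sp:,} FLOPS ({sp / 1e12:.3f} TFLOPS)")
print(f"double-precision: {dp:,} FLOPS ({dp / 1e12:.3f} TFLOPS)")
```

These are theoretical peaks; sustained performance on real workloads such as Linpack is lower.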

The chip has a peak power dissipation of 100 W and a typical power consumption of 70 W, consisting of 10 W of leakage and 60 W of dynamic power.
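
As a rough illustration of the power figures, the sketch below adds the two components and derives a theoretical peak efficiency. It assumes the double-precision peak computed from a 733,333,333 Hz clock, 2 operations per cycle, and 1,024 cores, divided by the 100 W peak dissipation; this is a paper figure, not a measured one, and the variable names are illustrative.

```python
# Power breakdown and a theoretical efficiency figure (not a measurement).
LEAKAGE_W = 10.0
DYNAMIC_W = 60.0
PEAK_W = 100.0
PEAK_DP_FLOPS = 733_333_333 * 2 * 1024   # assumed theoretical double-precision peak

typical_w = LEAKAGE_W + DYNAMIC_W        # typical consumption
leakage_share = LEAKAGE_W / typical_w    # fraction of typical power lost to leakage
peak_dp_gflops_per_watt = PEAK_DP_FLOPS / 1e9 / PEAK_W

print(f"typical power: {typical_w:.0f} W")
print(f"leakage share: {leakage_share:.1%}")
print(f"theoretical peak DP efficiency: {peak_dp_gflops_per_watt:.2f} GFLOPS/W")
```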

In June 2015, PEZY-SC-based supercomputers took the top three spots on the Green500 list of the most energy-efficient supercomputers. PEZY-SC powers Shoubu (1,181,952 cores, ? kW, 605.624 TFlop/s Linpack Rmax), Suiren Blue (262,656 cores, 40.86 kW, 247.752 TFlop/s Linpack Rmax), and Suiren (328,480 cores, 48.90 kW, 271.782 TFlop/s Linpack Rmax), which ranked 1, 2, and 3 respectively.

Architecture

Further information: PEZY-SCx § Architecture

The PEZY-SC microprocessor is made of 4 blocks called "Prefectures". Each Prefecture contains 2 MiB of L3$ surrounded by 16 smaller blocks called "Cities". Each City is made of 64 KiB of L2$, a number of special function units, and 4 smaller blocks called "Villages". A Village is a block of 4 execution units (processor elements). For every 2 execution units there is 2 KiB of L1D$.
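
The block counts in this hierarchy multiply out to the chip's core and thread totals. The following is a small sketch of that arithmetic; the constant names are illustrative, and the 8-way SMT factor is taken from the Overview.

```python
# Hierarchy sketch: Prefecture -> City -> Village -> execution unit (PE).
PREFECTURES = 4
CITIES_PER_PREFECTURE = 16
VILLAGES_PER_CITY = 4
PES_PER_VILLAGE = 4
PES_PER_L1D = 2          # one 2 KiB L1D$ is shared by 2 execution units
THREADS_PER_PE = 8       # 8-way SMT, per the Overview

cities = PREFECTURES * CITIES_PER_PREFECTURE
villages = cities * VILLAGES_PER_CITY
pes = villages * PES_PER_VILLAGE
l1d_blocks = pes // PES_PER_L1D
threads = pes * THREADS_PER_PE

print(f"cities={cities} villages={villages} PEs={pes} "
      f"L1D blocks={l1d_blocks} threads={threads}")
```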

pezy-sc arch.svg

Cache

The PEZY-SC's cache hierarchy is separate from that of the ARM926 cores, which have an L1$ of 32 KiB (2x) and a shared 64 KiB L2$.

Cache Organization
Cache is a hardware component containing a relatively small and extremely fast memory designed to speed up the performance of a CPU by preparing ahead of time the data it needs to read from a relatively slower medium such as main memory.

The organization and amount of cache can have a large impact on the performance, power consumption, die size, and consequently cost of the IC.

Cache is specified by its size, number of sets, associativity, block size, sub-block size, and fetch and write-back policies.

Note: All units are in kibibytes and mebibytes.
L1$: 64 KiB (65,536 B; 0.0625 MiB)
L1I$: 32 KiB (32,768 B; 0.0313 MiB), 2x16 KiB
L1D$: 32 KiB (32,768 B; 0.0313 MiB), 2x16 KiB
L2$: 64 KiB (65,536 B; 0.0625 MiB; 6.103516e-5 GiB), 1x64 KiB

The chip integrates a multi-level cache hierarchy:

L1$: 3 MiB (3,072 KiB; 3,145,728 B)
L1I$: 2 MiB (2,048 KiB; 2,097,152 B), 1024x2 KiB, per processor element
L1D$: 1 MiB (1,024 KiB; 1,048,576 B), 512x2 KiB, per 2 processor elements

L2$: 4 MiB (4,096 KiB; 4,194,304 B; 0.00391 GiB), 64x64 KiB, per city, write-back

L3$: 8 MiB (8,192 KiB; 8,388,608 B; 0.00781 GiB), 4x2 MiB, per prefecture

Additionally, there is another 16 MiB of scratch-pad memory consisting of 16 KiB per PE.
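
The per-level totals follow from multiplying each block size by its instance count. Below is a minimal sketch of that check; the L2 instance count of 64 is an assumption derived from the Architecture section (4 Prefectures × 16 Cities, 64 KiB per City), and the dictionary layout is illustrative only.

```python
# Cache and scratch-pad totals: instance count * size per instance.
KIB = 1024
MIB = 1024 * KIB

levels = {
    # name: (instances, bytes per instance)
    "L1I$": (1024, 2 * KIB),          # per processor element
    "L1D$": (512, 2 * KIB),           # per 2 processor elements
    "L2$": (64, 64 * KIB),            # per City (count assumed from the hierarchy)
    "L3$": (4, 2 * MIB),              # per Prefecture
    "scratch-pad": (1024, 16 * KIB),  # per processor element
}

totals = {name: count * size for name, (count, size) in levels.items()}

for name, total in totals.items():
    print(f"{name}: {total // MIB} MiB")
```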

Memory controller

Integrated Memory Controller
Max Type: DDR4-2133
Supports ECC: Yes
Controllers: 8
Channels: 8
Width: 64 bit
Max Bandwidth: 127.156 GiB/s (130,207.744 MiB/s; 136.533 GB/s; 136,532.715 MB/s; 0.124 TiB/s; 0.137 TB/s)
Bandwidth
Single: 15.89 GiB/s
Double: 31.79 GiB/s
Quad: 63.58 GiB/s
Hexa: 95.37 GiB/s
Octa: 127.156 GiB/s
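
The bandwidth rows can be approximated from the transfer rate and bus width. This is a sketch under two stated assumptions: that DDR4-2133 means 6400/3 (about 2133.33) million transfers per second, and that each channel is 64 bits wide as listed above. Results agree with the table to within rounding.

```python
# Theoretical DRAM bandwidth: transfers/s * bytes per transfer * channels.
TRANSFERS_PER_S = 6400e6 / 3   # DDR4-2133, assumed to be 2133.33 MT/s
BUS_BYTES = 64 // 8            # 64-bit channel width
GIB = 2 ** 30


def bandwidth_gib_s(channels: int) -> float:
    """Return the theoretical bandwidth in GiB/s for a given channel count."""
    return TRANSFERS_PER_S * BUS_BYTES * channels / GIB


for label, channels in [("Single", 1), ("Double", 2), ("Quad", 4),
                        ("Hexa", 6), ("Octa", 8)]:
    print(f"{label}: {bandwidth_gib_s(channels):.2f} GiB/s")
```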

Expansions

Expansion Options
PCIe
Revision: 2.0
Max Lanes: 32
Configuration: 4x8
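
For a sense of scale, the raw PCIe bandwidth can be estimated from the lane count. This is a sketch assuming standard PCIe 2.0 signalling (5 GT/s per lane with 8b/10b encoding) and reading "4x8" as four x8 links, which is an interpretation rather than something the page states; figures are per direction and exclude protocol overhead.

```python
# PCIe 2.0 raw bandwidth estimate, per direction, before protocol overhead.
TRANSFERS_PER_S = 5_000_000_000   # 5 GT/s per lane (PCIe 2.0)
PAYLOAD_BITS, LINE_BITS = 8, 10   # 8b/10b line encoding


def lane_bytes_per_s() -> int:
    """Usable bytes per second on one lane in one direction."""
    return TRANSFERS_PER_S * PAYLOAD_BITS // LINE_BITS // 8


def link_gb_per_s(lanes: int) -> float:
    """Usable GB/s (decimal) for a link of the given width."""
    return lane_bytes_per_s() * lanes / 1e9


LINKS, LANES_PER_LINK = 4, 8      # "4x8" read as four x8 links (assumption)
per_link = link_gb_per_s(LANES_PER_LINK)
total = link_gb_per_s(LINKS * LANES_PER_LINK)

print(f"per x8 link: {per_link:.1f} GB/s, all 32 lanes: {total:.1f} GB/s")
```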


Die Shot

pezy sc die shot.jpg


pezy-sc die shot (annotated).png

Floorplan

pezy-sc floorplan.png

External Links

Facts about "PEZY-SC - PEZY"
has subobject: PEZY-SC - PEZY#package and PEZY-SC - PEZY#pcie
base frequency: 733.33 MHz (0.733 GHz, 733,330 kHz)
core count: 1,024
core voltage: 1 V (10 dV, 100 cV, 1,000 mV)
designer: PEZY
die area: 411.6 mm² (0.638 in², 4.116 cm², 411,600,000 µm²)
die length: 19.5 mm (1.95 cm, 0.768 in, 19,500 µm)
die width: 21.1 mm (2.11 cm, 0.831 in, 21,100 µm)
family: PEZY-SCx
first announced: 2013
first launched: September 2014
full page name: pezy/pezy-scx/pezy-sc
has ecc memory support: true
instance of: microprocessor
l1$ size: 64 KiB (65,536 B, 0.0625 MiB) and 3,072 KiB (3,145,728 B, 3 MiB)
l1d$ description: per 2 processor elements
l1d$ size: 32 KiB (32,768 B, 0.0313 MiB) and 1,024 KiB (1,048,576 B, 1 MiB)
l1i$ description: per processor element
l1i$ size: 32 KiB (32,768 B, 0.0313 MiB) and 2,048 KiB (2,097,152 B, 2 MiB)
l2$ description: per city
l2$ size: 0.0625 MiB (64 KiB, 65,536 B, 6.103516e-5 GiB) and 4 MiB (4,096 KiB, 4,194,304 B, 0.00391 GiB)
l3$ description: per prefecture
l3$ size: 8 MiB (8,192 KiB, 8,388,608 B, 0.00781 GiB)
ldate: September 2014
main image: File:pezy-sc (front).png
manufacturer: TSMC
market segment: Supercomputer
max memory bandwidth: 127.156 GiB/s (130,207.744 MiB/s, 136.533 GB/s, 136,532.715 MB/s, 0.124 TiB/s, 0.137 TB/s)
max memory channels: 8
model number: PEZY-SC
name: PEZY-SC
package: FCBGA-2112
peak flops (double-precision): 1,501,866,665,984 FLOPS (1,501,866,665.984 KFLOPS, 1,501,866.666 MFLOPS, 1,501.867 GFLOPS, 1.502 TFLOPS, 0.0015 PFLOPS, 1.501867e-6 EFLOPS, 1.501867e-9 ZFLOPS)
peak flops (single-precision): 3,003,733,331,968 FLOPS (3,003,733,331.968 KFLOPS, 3,003,733.332 MFLOPS, 3,003.733 GFLOPS, 3.004 TFLOPS, 0.003 PFLOPS, 3.003733e-6 EFLOPS, 3.003733e-9 ZFLOPS)
power dissipation: 100 W (100,000 mW, 0.134 hp, 0.1 kW)
power dissipation (average): 70 W (70,000 mW, 0.0939 hp, 0.07 kW)
process: 28 nm (0.028 μm, 2.8e-5 mm)
supported memory type: DDR4-2133
technology: CMOS
thread count: 8,192
transistor count: 3,730,000,000