From WikiChip
Difference between revisions of "hisilicon/microarchitectures/taishan v110"

(Memory Hierarchy)
(Fixes a little typo)
 
(22 intermediate revisions by one other user not shown)
Line 1: Line 1:
{{hisilicon title|TaiShan|arch}}
+
{{hisilicon title|TaiShan v110|arch}}
 
{{microarchitecture
 
{{microarchitecture
 
|atype=CPU
 
|atype=CPU
|name=TaiShan
+
|name=TaiShan v110
 
|designer=HiSilicon
 
|designer=HiSilicon
 
|manufacturer=TSMC
 
|manufacturer=TSMC
Line 28: Line 28:
 
|l3=1 MiB
 
|l3=1 MiB
 
|l3 per=core
 
|l3 per=core
|core name=TaiShan
+
|predecessor=TaiShan v100
 +
|predecessor link=hisilicon/microarchitectures/taishan_v100
 
}}
 
}}
'''TaiShan''' is a high-performance [[ARM]] server microarchitecture designed by [[HiSilicon]] for [[Huawei]]'s own TaiShan servers.
+
'''TaiShan v110''' is the successor to the {{\\|TaiShan v100}}, a high-performance [[ARM]] server microarchitecture designed by [[HiSilicon]] for [[Huawei]]'s own TaiShan servers.
  
 
== Brands ==
 
== Brands ==
Line 39: Line 40:
  
 
== Architecture ==
 
== Architecture ==
* [[TSMC]] [[7 nm|7 nm HPC process]]
+
[[File:hi1620 overview.png|500px|thumb|right|Overview]]
 +
=== Key changes from {{\\|TaiShan v100}} ===
 +
* [[TSMC]] [[7 nm|7 nm HPC process]] (from [[16 nm]])
 +
* 2x [[core count]] (64, up from 32)
 +
** Custom cores (from {{armh|Cortex-A72|l=arch}})
 +
*** ASIMD
 +
**** double SP Vector throughput (2 inst/cycle, up from 1)
 +
* Memory
 +
** 2x memory channels (8, up from 4)
 +
* I/O
 +
** PCIe Gen 4 (from Gen 3)
 
{{expand list}}
 
{{expand list}}
  
 
=== Block Diagram ===
 
=== Block Diagram ===
 
==== Entire Chip ====
 
==== Entire Chip ====
:[[File:taishan soc block diagram.svg|900px]]
+
:[[File:taishan v110 soc block diagram.svg|900px]]
  
 
=== Memory Hierarchy ===
 
=== Memory Hierarchy ===
Line 50: Line 61:
 
** L1I Cache
 
** L1I Cache
 
*** 64 KiB/core, private
 
*** 64 KiB/core, private
 +
*** 64-byte cache lines
 
** L1D Cache
 
** L1D Cache
 
*** 64 KiB/core, private
 
*** 64 KiB/core, private
 +
*** 64-byte cache lines
 
** L2 Cache
 
** L2 Cache
 
*** 512 KiB/core, private
 
*** 512 KiB/core, private
Line 63: Line 76:
 
**** 1 DPC and 2 DPC support
 
**** 1 DPC and 2 DPC support
 
*** 8 B/cycle/channel (@ memory clock)
 
*** 8 B/cycle/channel (@ memory clock)
*** ECC
+
*** ECC, SDDC, DDDC
  
 
== Overview ==
 
== Overview ==
{{empty section}}
+
[[File:taishan v110 overview.svg|right|500px|thumb|Overview]]
 +
Though HiSilicon has a history of designing Arm processors. The TaiShan v110 core is HiSilicons' first custom homegrown high-performance [[ARM]] core and SoC design. The chip, which incorporates multiple compute dies and an I/O is a multi-chip package, is fabricated on [[TSMC]]'s [[7 nm|7-nanometers HPC process]] and integrates up to 64 cores and up to 64 MiB of [[last level cache]].
 +
 
 +
The SoC also incorporates a number of [[hardware accelerators]]. There is a crypto engine that supports AES, DES/3DES, MD5, SHA1, SHA2, HMAC, CMAC with throughputs of up to 100 Gbit/s. Additionally, there is also a compression engine supporting GZIP, LZS, LZ4 with compression throughputs of up to 40 Gbit/s and decompression of up to 100 Gbit/s.
 +
 
 +
Marketed as the Kunpeng 920, this SoC supports up to 4-way multiprocessing support through HiSilicon's Hydra interface. In order to keep the cores fed, eight [[DDR4]] [[memory channels]] are incorporated per socket. Additionally, designed to facilitate an easy [[accelerator]] platform, there are 40 PCIe Gen 4 lanes provided per socket with [[CCIX]] support, enabling cache coherency.
  
 
== Core ==
 
== Core ==
{{empty section}}
+
Each core is a 4-way out-of-order superscalar that implements the [[ARMv8.2]]-A ISA. Huawei stated that the core supports almost all the [[ARMv8.4]] features with a few exceptions, including dot product and the FP16 FML extension. It features private 64 KiB L1 instruction and data caches as well as 512 KiB of private L2. Though light on details, Huawei says that compared to Arm's {{armh|Cortex}} cores, their core features an improved memory subsystem, a larger number of execution units, and a better branch predictor.
 +
 
 +
=== ASIMD ===
 +
Each core features a single 128-bit {{arm|NEON}} unit. It is capable of executing single double-precision FMA vector instruction per cycle or two single-precision vector instructions per cycle. Operating at 2 GHz, a 64-core chip will have a peak compute of 512 GigaFLOPS of [[double-precision floating point]]. It's worth noting that compared to the {{\\|TaiShan v100}}, the throughput for single-precision vector has been doubled from 1 to 2 instructions per cycle.
 +
 
 +
== MCP physical design ==
 +
The SoC itself comprises 3 dies - two '''Super CPU Cluster''' ('''SCCL''') compute dies and a '''Super IO Cluster''' ('''SICL'''). The SCCL compute dies contains 8 CPU Clusters (CCLs), memory controllers, and the L3 cache block. There are eight CCLs on each of the SICL dies for a total of 64 cores. The CCLs are TaiShan V110 quadplex along with the L3 cache tags partition. The Super IO Clusters include the various I/O peripherals including PCIe Gen 4, SAS, the network interface controllers, and the Hydra links.
 +
 
 +
:[[File:taishan v110 soc details.svg|700px]]
  
 
== Scalability ==
 
== Scalability ==
{{empty section}}
+
{{see also|hisilicon/hydra|l1=Hydra Interface}}
 +
Each chip incorporates three Hydra interface ports. The Hydra interface facilitates the cache coherency between the dies on the chip. Every link supports 240 Gb/s (30 GB/s) of peak bandwidth for a total aggregated bandwidth of 720 Gb/s (90 GB/s) in a 2-way [[symmetric multiprocessing]] configuration.
 +
 
 +
:[[File:Kunpeng 920 2smp.svg|600px]]
 +
 
 +
With all three links, there is also support for 4-way SMP. In this configuration, one link from each socket is connected to another socket for an all-for-all connection.
 +
 
 +
 
 +
:[[File:Kunpeng 920 4smp.svg|600px]]
 +
 
 +
== Chipset ==
 +
Along with the Hi1620 SoC, HiSilicon developed a number of integrated circuits as part of the chipset platform.
 +
 
 +
{| class="wikitable"
 +
|-
 +
! Chip !! Description
 +
|-
 +
| Hi1620 || CPU, Kunpeng 920 series Chip
 +
|-
 +
| Hi1503 || CPU interconnect chip, supports scaling-up to 32 sockets
 +
|-
 +
| Hi1812 || SSD  storage controller, for read/write I/O acceleration
 +
|-
 +
| Hi1822 || Network controller chip, DC high-speed flexible interconnect
 +
|-
 +
| Hi1710 || BMC management chip + enhanced RAS features chip
 +
|}
 +
 
 +
:[[File:hi1620 chipset.png|600px]]
  
 
== Die ==
 
== Die ==
 
* TSMC [[7 nm|7 nm HPC]]
 
* TSMC [[7 nm|7 nm HPC]]
 
* 20,000,000,000 transistors
 
* 20,000,000,000 transistors
 +
** 3-4 dies
  
== All TaiShan Chips ==
+
== All TaiShan v110 Chips ==
{{empty section}}
+
{{comp table start}}
 +
<table class="comptable sortable tc4">
 +
{{comp table header|main|6:List of TaiShan v110-based Processors}}
 +
{{comp table header|cols|Launched|Cores|Arch|%Frequency|L3|TDP}}
 +
{{#ask: [[Category:microprocessor models by hisilicon]] [[core name::TaiShan v110]]
 +
|?full page name
 +
|?model number
 +
|?first launched
 +
|?core count
 +
|?core name
 +
|?base frequency#GHz
 +
|?l3$ size
 +
|?tdp
 +
|format=template
 +
|template=proc table 3
 +
|userparam=8
 +
|mainlabel=-
 +
}}
 +
{{comp table count|ask=[[Category:microprocessor models by hisilicon]] [[core name::TaiShan v110]]}}
 +
</table>
 +
{{comp table end}}
  
 
== Bibliography ==
 
== Bibliography ==
Line 85: Line 160:
 
* Huawei Connect 2018. October 2018
 
* Huawei Connect 2018. October 2018
 
* HiSilicon Event. January 7, 2019
 
* HiSilicon Event. January 7, 2019
 +
* Huawei, Supercomputing 2018

Latest revision as of 09:20, 9 September 2022

TaiShan v110 µarch
General Info
Arch Type: CPU
Designer: HiSilicon
Manufacturer: TSMC
Introduction: 2019
Process: 7 nm
Core Configs: 32, 48, 64
Pipeline
Type: Superscalar, Superpipeline
OoOE: Yes
Speculative: Yes
Reg Renaming: Yes
Decode: 4-way
Instructions
ISA: ARMv8.2-A
Extensions: NEON
Cache
L1I Cache: 64 KiB/core
L1D Cache: 64 KiB/core
L2 Cache: 512 KiB/core
L3 Cache: 1 MiB/core
Succession
Predecessor: TaiShan v100

TaiShan v110 is the successor to the TaiShan v100, a high-performance ARM server microarchitecture designed by HiSilicon for Huawei's own TaiShan servers.

Brands

TaiShan v110-based CPUs are branded as the Kunpeng 920 series.

Release Dates

Kunpeng 920 CPUs were officially launched in early 2019.

Architecture

[Figure: Overview]

Key changes from TaiShan v100

  • TSMC 7 nm HPC process (from 16 nm)
  • 2x core count (64, up from 32)
    • Custom cores (from Cortex-A72)
      • ASIMD
        • double SP Vector throughput (2 inst/cycle, up from 1)
  • Memory
    • 2x memory channels (8, up from 4)
  • I/O
    • PCIe Gen 4 (from Gen 3)


Block Diagram

Entire Chip

[Figure: TaiShan v110 SoC block diagram]

Memory Hierarchy

  • Cache
    • L1I Cache
      • 64 KiB/core, private
      • 64-byte cache lines
    • L1D Cache
      • 64 KiB/core, private
      • 64-byte cache lines
    • L2 Cache
      • 512 KiB/core, private
    • L3 Cache
      • 1 MiB/core
      • Shared by all cores
    • System DRAM
      • 1 TiB Max Memory / socket
      • 8 Channels
      • DDR4, up to 2933 MT/s
        • 1 DPC and 2 DPC support
      • 8 B/cycle/channel (@ memory clock)
      • ECC, SDDC, DDDC
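
As a rough sanity check on the DRAM figures above, the peak theoretical bandwidth per socket can be estimated from the channel count and the transfer rate. The sketch below assumes that the 8 B/channel figure applies to each transfer at the maximum rate of 2933 MT/s; the function name is illustrative and does not come from the source.

```python
def peak_dram_bandwidth_gbs(channels: int,
                            mega_transfers_per_s: float,
                            bytes_per_transfer: int = 8) -> float:
    """Peak theoretical DRAM bandwidth in decimal GB/s.

    Assumption: each channel moves `bytes_per_transfer` bytes per transfer
    (a 64-bit DDR4 data bus, ECC bits not counted).
    """
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


per_channel = peak_dram_bandwidth_gbs(1, 2933)
per_socket = peak_dram_bandwidth_gbs(8, 2933)
print(f"{per_channel:.1f} GB/s per channel, {per_socket:.1f} GB/s per socket")
```

Sustained bandwidth in practice will be lower than this upper bound, and 2 DPC configurations may run at a reduced transfer rate.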

Overview

[Figure: Overview]

Though HiSilicon has a history of designing Arm processors, the TaiShan v110 core is HiSilicon's first custom, homegrown high-performance ARM core and SoC design. The chip, a multi-chip package incorporating multiple compute dies and an I/O die, is fabricated on TSMC's 7-nanometer HPC process and integrates up to 64 cores and up to 64 MiB of last level cache.

The SoC also incorporates a number of hardware accelerators. A crypto engine supports AES, DES/3DES, MD5, SHA1, SHA2, HMAC, and CMAC with throughputs of up to 100 Gbit/s. A compression engine supports GZIP, LZS, and LZ4 with compression throughputs of up to 40 Gbit/s and decompression throughputs of up to 100 Gbit/s.

Marketed as the Kunpeng 920, this SoC supports up to 4-way multiprocessing through HiSilicon's Hydra interface. To keep the cores fed, each socket incorporates eight DDR4 memory channels. Additionally, to make it easy to attach accelerators, each socket provides 40 PCIe Gen 4 lanes with CCIX support, enabling cache coherency.
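
To put the 40 PCIe Gen 4 lanes in perspective, the usable link bandwidth can be estimated from the standard PCIe 4.0 parameters (16 GT/s per lane with 128b/130b encoding). These parameters come from the PCIe specification rather than from the source, and the function name is hypothetical; treat the result as an upper bound that ignores packet and protocol overhead.

```python
PCIE4_GT_PER_S = 16.0            # raw signalling rate per lane (PCIe 4.0)
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def pcie4_bandwidth_gbs(lanes: int) -> float:
    """Approximate bandwidth per direction in GB/s, before packet overhead."""
    return lanes * PCIE4_GT_PER_S * ENCODING_EFFICIENCY / 8  # bits -> bytes


print(f"x16: {pcie4_bandwidth_gbs(16):.1f} GB/s per direction")
print(f"x40: {pcie4_bandwidth_gbs(40):.1f} GB/s per direction")
```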

Core

Each core is a 4-way out-of-order superscalar that implements the ARMv8.2-A ISA. Huawei stated that the core supports almost all the ARMv8.4 features with a few exceptions, including dot product and the FP16 FML extension. It features private 64 KiB L1 instruction and data caches as well as 512 KiB of private L2. Though light on details, Huawei says that compared to Arm's Cortex cores, their core features an improved memory subsystem, a larger number of execution units, and a better branch predictor.

ASIMD

Each core features a single 128-bit NEON unit. It is capable of executing one double-precision FMA vector instruction per cycle or two single-precision vector instructions per cycle. Operating at 2 GHz, a 64-core chip has a peak compute of 512 gigaFLOPS of double-precision floating point. Compared to the TaiShan v100, the throughput for single-precision vector instructions has doubled from 1 to 2 instructions per cycle.
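
The 512 gigaFLOPS figure can be reproduced from the numbers in this section: a 128-bit vector holds two 64-bit lanes, and each FMA counts as two floating-point operations. The following is a minimal sketch of that arithmetic; the function and parameter names are illustrative, not from the source.

```python
def peak_dp_gflops(cores: int, ghz: float,
                   vector_bits: int = 128,
                   fma_per_cycle: int = 1) -> float:
    """Peak double-precision GFLOPS for a chip of identical cores."""
    lanes = vector_bits // 64                    # 64-bit lanes per vector
    flops_per_cycle = fma_per_cycle * lanes * 2  # FMA = multiply + add
    return cores * ghz * flops_per_cycle


print(peak_dp_gflops(cores=64, ghz=2.0))  # 512.0
```

At the 2.6 GHz frequency listed for the shipping parts, the same formula gives a proportionally higher peak.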

MCP physical design

The SoC itself comprises 3 dies: two Super CPU Cluster (SCCL) compute dies and a Super IO Cluster (SICL) die. Each SCCL compute die contains 8 CPU Clusters (CCLs), memory controllers, and the L3 cache block. With eight CCLs on each of the two SCCL dies, the chip has a total of 64 cores. Each CCL is a TaiShan v110 quadplex (four cores) along with an L3 cache tag partition. The Super IO Cluster includes the various I/O peripherals, including PCIe Gen 4, SAS, the network interface controllers, and the Hydra links.

[Figure: TaiShan v110 SoC details]
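
The core count follows from the die hierarchy: two SCCL compute dies, eight CCLs per compute die, and four cores per CCL (a quadplex). The short sketch below works through that arithmetic together with the 1 MiB/core L3 figure from the memory hierarchy; the constant names are illustrative only.

```python
SCCL_DIES = 2          # compute dies per package
CCLS_PER_SCCL = 8      # CPU clusters per compute die
CORES_PER_CCL = 4      # each CCL is a quadplex
L3_MIB_PER_CORE = 1    # from the memory hierarchy section

cores_per_die = CCLS_PER_SCCL * CORES_PER_CCL
total_cores = SCCL_DIES * cores_per_die
total_l3_mib = total_cores * L3_MIB_PER_CORE

print(f"{cores_per_die} cores per SCCL die")
print(f"{total_cores} cores and {total_l3_mib} MiB of L3 per package")
```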

Scalability

See also: Hydra Interface

Each chip incorporates three Hydra interface ports. The Hydra interface facilitates the cache coherency between the dies on the chip. Every link supports 240 Gb/s (30 GB/s) of peak bandwidth for a total aggregated bandwidth of 720 Gb/s (90 GB/s) in a 2-way symmetric multiprocessing configuration.

[Figure: Kunpeng 920 2-way SMP]

With all three links, there is also support for 4-way SMP. In this configuration, each socket connects one link to each of the other three sockets, forming an all-to-all topology.

[Figure: Kunpeng 920 4-way SMP]
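
The link arithmetic and the 4-way topology can be sketched as follows. This assumes that in the 2-way configuration all three links run between the two sockets (which is what the 720 Gb/s aggregate implies) and that in the 4-way configuration every pair of sockets shares exactly one link; the helper name is hypothetical.

```python
from itertools import combinations

LINK_GBIT_S = 240      # peak bandwidth per Hydra link, from the text
PORTS_PER_SOCKET = 3   # Hydra ports per chip, from the text


def full_mesh(sockets: int) -> list[tuple[int, int]]:
    """Socket pairs that need a direct link in an all-to-all topology."""
    if sockets - 1 > PORTS_PER_SOCKET:
        raise ValueError("not enough Hydra ports for a direct all-to-all mesh")
    return list(combinations(range(sockets), 2))


# 2-way: all three links connect the same pair of sockets.
two_way_gbit = PORTS_PER_SOCKET * LINK_GBIT_S
print(f"2-way: {two_way_gbit} Gb/s ({two_way_gbit // 8} GB/s) between the sockets")

# 4-way: one link per socket pair.
links = full_mesh(4)
print(f"4-way: {len(links)} links, {LINK_GBIT_S} Gb/s "
      f"({LINK_GBIT_S // 8} GB/s) per socket pair")
for a, b in links:
    print(f"  socket {a} <-> socket {b}")
```

A direct mesh beyond four sockets would need more than three ports per socket, which is why the sketch raises an error in that case.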

Chipset

Along with the Hi1620 SoC, HiSilicon developed a number of integrated circuits as part of the chipset platform.

Chip    Description
Hi1620  CPU, Kunpeng 920 series chip
Hi1503  CPU interconnect chip, supports scaling up to 32 sockets
Hi1812  SSD storage controller, for read/write I/O acceleration
Hi1822  Network controller chip, DC high-speed flexible interconnect
Hi1710  BMC management chip + enhanced RAS features chip

[Figure: Hi1620 chipset]

Die

  • TSMC 7 nm HPC
  • 20,000,000,000 transistors
    • 3-4 dies

All TaiShan v110 Chips

List of TaiShan v110-based Processors
Model     Launched        Cores  Arch          Frequency  L3      TDP
920-3226  26 April 2019   32     TaiShan v110  2.6 GHz    32 MiB  120 W
920-4826  26 April 2019   48     TaiShan v110  2.6 GHz    48 MiB  158 W
920-6426  7 January 2019  64     TaiShan v110  2.6 GHz    64 MiB  195 W
Count: 3

Bibliography

  • Huawei. Personal Communication. 2019
  • Huawei Connect 2018. October 2018
  • HiSilicon Event. January 7, 2019
  • Huawei, Supercomputing 2018