Revision as of 21:14, 5 May 2019
| TaiShan v110 µarch | |
| General Info | |
| Arch Type | CPU | 
| Designer | HiSilicon | 
| Manufacturer | TSMC | 
| Introduction | 2019 | 
| Process | 7 nm | 
| Core Configs | 32, 48, 64 | 
| Pipeline | |
| Type | Superscalar, Superpipeline | 
| OoOE | Yes | 
| Speculative | Yes | 
| Reg Renaming | Yes | 
| Decode | 4-way | 
| Instructions | |
| ISA | ARMv8.2-A | 
| Extensions | NEON | 
| Cache | |
| L1I Cache | 64 KiB/core | 
| L1D Cache | 64 KiB/core | 
| L2 Cache | 512 KiB/core | 
| L3 Cache | 1 MiB/core | 
| Succession | |
TaiShan v110 is the successor to the TaiShan v100, a high-performance ARM server microarchitecture designed by HiSilicon for Huawei's own TaiShan servers.
Brands
TaiShan-based CPUs are branded as the Kunpeng 920 series.
Release Dates
Kunpeng 920 CPUs were officially launched in early 2019.
Architecture
Key changes from TaiShan v100
- TSMC 7 nm HPC process (from 16 nm)
- 2x core count (64, up from 32)
- Custom cores (from Cortex-A72)
- Memory
  - 2x memory channels (8, up from 4)
- I/O
  - PCIe Gen 4 (from Gen 3)
Block Diagram
Entire Chip
Memory Hierarchy
- Cache
  - L1I Cache
    - 64 KiB/core, private
  - L1D Cache
    - 64 KiB/core, private
  - L2 Cache
    - 512 KiB/core, private
  - L3 Cache
    - 1 MiB/core
    - Shared by all cores
- System DRAM
  - 1 TiB max memory per socket
  - 8 channels
  - DDR4, up to 2933 MT/s
    - 1 DPC and 2 DPC support
  - 8 B/cycle/channel (@ memory clock)
  - ECC, SDDC, DDDC
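The memory-hierarchy figures above can be combined into two back-of-the-envelope numbers: theoretical peak DRAM bandwidth per socket and total on-chip cache for the 64-core configuration. This is a minimal sketch. It assumes (this is not stated on this page) that each DDR4 channel is 64 bits wide, so 8 bytes move per transfer at 2933 MT/s. The constant and function names are illustrative only.

```python
# Back-of-the-envelope memory figures for a 64-core Kunpeng 920 socket.
# Assumption (not from the source): each DDR4 channel is 64 bits wide,
# i.e. 8 bytes are moved per transfer.

CORES = 64
L1I_KIB = 64          # private, per core
L1D_KIB = 64          # private, per core
L2_KIB = 512          # private, per core
L3_MIB_PER_CORE = 1   # shared L3, 1 MiB per core

CHANNELS = 8
MT_PER_S = 2933           # DDR4-2933, mega-transfers per second
BYTES_PER_TRANSFER = 8    # assumed 64-bit channel


def peak_dram_gb_per_s(channels: int = CHANNELS, mts: int = MT_PER_S,
                       width_bytes: int = BYTES_PER_TRANSFER) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10**9 bytes per second)."""
    return channels * mts * 1e6 * width_bytes / 1e9


def total_cache_mib(cores: int = CORES) -> float:
    """Sum of all L1I, L1D, L2 and L3 capacity on one chip, in MiB."""
    private_kib = cores * (L1I_KIB + L1D_KIB + L2_KIB)
    return private_kib / 1024 + cores * L3_MIB_PER_CORE


print(f"peak DRAM bandwidth: {peak_dram_gb_per_s():.1f} GB/s")
print(f"total cache: {total_cache_mib():.0f} MiB")
```

Under these assumptions the sketch reports about 187.7 GB/s of theoretical peak DRAM bandwidth per socket and 104 MiB of total cache (40 MiB of private L1/L2 plus 64 MiB of L3). Sustained bandwidth in practice is lower than the theoretical peak.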
Overview
Although HiSilicon has a history of designing Arm processors, the TaiShan v110 core is HiSilicon's first custom, homegrown high-performance ARM core and SoC design. The chip is a multi-chip package that incorporates multiple compute dies and an I/O die. It is fabricated on TSMC's 7 nm HPC process and integrates up to 64 cores and up to 64 MiB of last level cache.
The SoC also incorporates a number of hardware accelerators. A crypto engine supports AES, DES/3DES, MD5, SHA1, SHA2, HMAC, and CMAC with throughputs of up to 100 Gbit/s. A compression engine supports GZIP, LZS, and LZ4 with compression throughput of up to 40 Gbit/s and decompression throughput of up to 100 Gbit/s.
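Peak bit rates are easier to reason about as time per unit of data. The sketch below uses a hypothetical `best_case_seconds` helper to convert the quoted peak throughputs into a lower bound on the time a single engine needs to process a buffer. It assumes the engine sustains its quoted peak for the whole buffer; real timings depend on the algorithm, buffer size, and driver overhead, and will be longer.

```python
# Hypothetical helper: best-case time for one accelerator engine to process
# a buffer, using the peak throughputs quoted for the Kunpeng 920.
# These are upper bounds on throughput, so the results are lower bounds on time.

PEAK_GBIT_PER_S = {
    "crypto": 100.0,
    "compress": 40.0,
    "decompress": 100.0,
}


def best_case_seconds(engine: str, size_bytes: int) -> float:
    """Lower bound on processing time: bits to process / peak bit rate."""
    bits = size_bytes * 8
    return bits / (PEAK_GBIT_PER_S[engine] * 1e9)


one_gb = 10**9
for engine in PEAK_GBIT_PER_S:
    ms = best_case_seconds(engine, one_gb) * 1e3
    print(f"{engine:>10}: {ms:.0f} ms per GB")
```

At the quoted peaks, encrypting or decompressing 1 GB takes at least about 80 ms, and compressing it at least about 200 ms.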
Marketed as the Kunpeng 920, this SoC supports up to 4-way multiprocessing through HiSilicon's Hydra interface. To keep the cores fed, eight DDR4 memory channels are incorporated per socket. Additionally, to make the chip an easy platform for accelerators, each socket provides 40 PCIe Gen 4 lanes with CCIX support, enabling cache coherency with attached accelerators.
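The per-socket resources scale with the number of sockets in a Hydra-connected system. The sketch below totals them. The `Socket` record and `system_totals` function are hypothetical; the per-socket values come from this page (64 cores for the largest configuration, 8 DDR4 channels, 40 PCIe Gen 4 lanes, 1 TiB maximum memory). Restricting the socket count to 1, 2, or 4 is an assumption based on the single-socket, 2-way, and 4-way configurations this page describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Socket:
    """Per-socket resources of a 64-core Kunpeng 920 (values from this page)."""
    cores: int = 64
    ddr4_channels: int = 8
    pcie4_lanes: int = 40
    max_memory_tib: int = 1


# Assumption: only 1-, 2- and 4-socket systems are modeled, since those are
# the configurations described on this page.
SUPPORTED_SOCKETS = (1, 2, 4)


def system_totals(sockets: int, per_socket: Socket = Socket()) -> dict:
    """Multiply each per-socket resource by the number of sockets."""
    if sockets not in SUPPORTED_SOCKETS:
        raise ValueError(f"unsupported socket count: {sockets}")
    return {
        "cores": sockets * per_socket.cores,
        "ddr4_channels": sockets * per_socket.ddr4_channels,
        "pcie4_lanes": sockets * per_socket.pcie4_lanes,
        "max_memory_tib": sockets * per_socket.max_memory_tib,
    }


for n in SUPPORTED_SOCKETS:
    print(n, system_totals(n))
```

For a 4-way system this gives 256 cores, 32 DDR4 channels, 160 PCIe Gen 4 lanes, and up to 4 TiB of memory.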
Core
Each core is a 4-way out-of-order superscalar that implements the ARMv8.2-A ISA. Huawei stated that the core supports almost all the ARMv8.4 features with a few exceptions, including dot product and the FP16 FML extension. It features private 64 KiB L1 instruction and data caches as well as 512 KiB of private L2. Though light on details, Huawei says that compared to Arm's Cortex cores, their core features an improved memory subsystem, a larger number of execution units, and a better branch predictor.
ASIMD
Each core features a single 128-bit NEON unit. It can execute either one double-precision FMA vector instruction or two single-precision vector instructions per cycle. At 2 GHz, a 64-core chip therefore has a peak double-precision floating-point throughput of 512 GigaFLOPS.
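The 512 GigaFLOPS figure follows from the vector width: a 128-bit register holds two 64-bit doubles, and an FMA counts as two floating-point operations, giving 4 double-precision FLOPs per cycle per core. The sketch below (the `peak_gflops` name is hypothetical) reproduces that arithmetic. It models only the double-precision FMA case, since the source does not say whether the two single-precision instructions per cycle are FMAs.

```python
VECTOR_BITS = 128   # width of the single NEON unit per core
FLOPS_PER_FMA = 2   # a fused multiply-add counts as a multiply plus an add


def peak_gflops(cores: int, ghz: float, element_bits: int = 64,
                fma_per_cycle: int = 1) -> float:
    """Peak GFLOPS = cores * clock (GHz) * FLOPs per core per cycle."""
    lanes = VECTOR_BITS // element_bits   # 2 lanes for 64-bit doubles
    flops_per_cycle = lanes * FLOPS_PER_FMA * fma_per_cycle
    return cores * ghz * flops_per_cycle


print(peak_gflops(cores=64, ghz=2.0))  # → 512.0
```

Per core, this works out to 8 double-precision GFLOPS at 2 GHz; the function can be re-run with other core counts or clocks.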
MCP physical design
The SoC itself comprises three dies: two Super CPU Cluster (SCCL) compute dies and a Super IO Cluster (SICL). Each SCCL compute die contains eight CPU Clusters (CCLs), memory controllers, and the L3 cache block. With eight CCLs on each of the two SCCL dies, the chip has a total of 64 cores. Each CCL is a quadplex of TaiShan v110 cores along with its partition of the L3 cache tags. The Super IO Cluster includes the various I/O peripherals, including PCIe Gen 4, SAS, the network interface controllers, and the Hydra links.
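The die hierarchy above multiplies out to the chip's core count and L3 capacity. As a minimal sketch, the code below models it with illustrative constants and adds a hypothetical `locate` function that maps a linear core index to its position. The consecutive numbering scheme is an assumption made for illustration; the numbering used by real firmware or operating systems may differ.

```python
from __future__ import annotations

SCCL_PER_CHIP = 2    # compute dies per package
CCL_PER_SCCL = 8     # CPU clusters per compute die
CORES_PER_CCL = 4    # each CCL is a quadplex of TaiShan v110 cores
L3_MIB_PER_CORE = 1  # from the 1 MiB/core L3 figure

CORES_PER_SCCL = CCL_PER_SCCL * CORES_PER_CCL


def total_cores() -> int:
    return SCCL_PER_CHIP * CORES_PER_SCCL


def locate(core_id: int) -> tuple[int, int, int]:
    """Map a linear core index to (SCCL, CCL within SCCL, core within CCL).

    Hypothetical numbering: cores are assumed to be numbered consecutively,
    die by die and cluster by cluster.
    """
    if not 0 <= core_id < total_cores():
        raise ValueError(f"core_id out of range: {core_id}")
    sccl, rest = divmod(core_id, CORES_PER_SCCL)
    ccl, slot = divmod(rest, CORES_PER_CCL)
    return sccl, ccl, slot


print(total_cores(), "cores,", total_cores() * L3_MIB_PER_CORE, "MiB L3")
print(locate(37))  # → (1, 1, 1)
```

The totals (64 cores, 64 MiB of L3) match the chip-level figures given in the overview.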
Scalability
- See also: Hydra Interface
Each chip incorporates three Hydra interface ports. The Hydra interface maintains cache coherency between chips. Each link supports 240 Gb/s (30 GB/s) of peak bandwidth, for a total aggregate bandwidth of 720 Gb/s (90 GB/s) in a 2-way symmetric multiprocessing configuration.
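The aggregate figure is the per-link rate times the number of links, divided by eight to convert gigabits to gigabytes. A minimal sketch, assuming (as the 2-way figure implies) that all three links connect the same pair of sockets; the function names are illustrative:

```python
from __future__ import annotations

LINKS_PER_CHIP = 3
GBIT_PER_LINK = 240  # peak per Hydra link, gigabits per second


def gbit_to_gbyte(gbit: float) -> float:
    """Convert gigabits to gigabytes (8 bits per byte)."""
    return gbit / 8


def two_socket_bandwidth(links: int = LINKS_PER_CHIP) -> tuple[int, float]:
    """Aggregate peak (Gb/s, GB/s) when all links join the same two sockets."""
    gbit = links * GBIT_PER_LINK
    return gbit, gbit_to_gbyte(gbit)


print(gbit_to_gbyte(GBIT_PER_LINK))  # → 30.0 (per link)
print(two_socket_bandwidth())        # → (720, 90.0)
```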
Using all three links, the chip also supports 4-way SMP. In this configuration, each socket dedicates one link to each of the other three sockets, forming an all-to-all (fully connected) topology.
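Connecting four sockets all-to-all needs one link per pair of sockets, which is 4 × 3 / 2 = 6 links, and each socket terminates exactly three of them, matching its three Hydra ports. The sketch below enumerates the links and checks the per-socket port count; the helper names and the 0 to 3 socket numbering are hypothetical. In this topology every pair of sockets is a single hop apart.

```python
from __future__ import annotations

from itertools import combinations


def full_mesh(sockets: int) -> list[tuple[int, int]]:
    """One Hydra link per unordered pair of sockets."""
    return list(combinations(range(sockets), 2))


def ports_per_socket(links: list[tuple[int, int]], sockets: int) -> list[int]:
    """Count how many link endpoints land on each socket."""
    counts = [0] * sockets
    for a, b in links:
        counts[a] += 1
        counts[b] += 1
    return counts


links = full_mesh(4)
print(len(links), "links:", links)
print("ports used per socket:", ports_per_socket(links, 4))  # → [3, 3, 3, 3]
```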
Chipset
Die
- TSMC 7 nm HPC
-  20,000,000,000 transistors
- 3-4 dies
 
All TaiShan Chips
Bibliography
- Huawei. Personal Communication. 2019
- Huawei Connect 2018. October 2018
- HiSilicon Event. January 7, 2019