Line 5: | Line 5: | ||
|designer=Zhaoxin | |designer=Zhaoxin | ||
|manufacturer=HLMC | |manufacturer=HLMC | ||
+ | |manufacturer 2=SMIC | ||
|introduction=December 28, 2017 | |introduction=December 28, 2017 | ||
|process=28 nm | |process=28 nm | ||
Line 14: | Line 15: | ||
|speculative=Yes | |speculative=Yes | ||
|renaming=Yes | |renaming=Yes | ||
+ | |stages=18 | ||
|isa=x86-64 | |isa=x86-64 | ||
+ | |feature=SM3 | ||
+ | |feature 2=SM4 | ||
+ | |extension=MMX | ||
+ | |extension 2=SSE | ||
+ | |extension 3=SSE2 | ||
+ | |extension 4=SSE3 | ||
+ | |extension 5=SSSE3 | ||
+ | |extension 6=SSE4.1 | ||
+ | |extension 7=SSE4.2 | ||
+ | |extension 8=AVX | ||
+ | |extension 9=AVX2 | ||
+ | |extension 10=AES | ||
+ | |extension 11=RDRND | ||
+ | |extension 12=BMI | ||
+ | |extension 13=BMI2 | ||
+ | |extension 14=TXT | ||
+ | |extension 15=RDSEED | ||
+ | |l1i=32 KiB | ||
+ | |l1i per=core | ||
+ | |l1i desc=8-way set associative | ||
+ | |l1d=32 KiB | ||
+ | |l1d per=core | ||
+ | |l1d desc=8-way set associative | ||
+ | |l2=4 MiB | ||
+ | |l2 per=cluster | ||
+ | |l2 desc=8-way set associative | ||
|predecessor=Zhangjiang | |predecessor=Zhangjiang | ||
|predecessor link=zhaoxin/microarchitectures/zhangjiang | |predecessor link=zhaoxin/microarchitectures/zhangjiang | ||
Line 21: | Line 49: | ||
}} | }} | ||
'''WuDaoKou''' is the successor to {{\\|Zhangjiang}}, a [[28 nm]] [[x86]] microarchitecture designed by [[Zhaoxin]] for mainstream laptops, desktops, and servers. | '''WuDaoKou''' is the successor to {{\\|Zhangjiang}}, a [[28 nm]] [[x86]] microarchitecture designed by [[Zhaoxin]] for mainstream laptops, desktops, and servers. | ||
+ | |||
+ | == Etymology == | ||
+ | WuDaoKou is named after the [[wikipedia:Wudaokou Station|Wudaokou Station]] of the Beijing Subway in China. | ||
== Brands == | == Brands == | ||
Line 28: | Line 59: | ||
| {{zhaoxin|KaiXian}} || KX (5000) || Desktop, Laptops | | {{zhaoxin|KaiXian}} || KX (5000) || Desktop, Laptops | ||
|- | |- | ||
− | | {{zhaoxin| | + | | {{zhaoxin|KaisHeng}} || KH (20000) || Storage, Servers |
|} | |} | ||
+ | |||
+ | <div> | ||
+ | <div style="float:left;">[[File:hk-20000.png|450px]] </div> | ||
+ | <div style="float:left;">[[File:kx-5000.png|450px]]</div> | ||
+ | </div> | ||
+ | |||
+ | {{clear}} | ||
+ | |||
+ | == Release Dates == | ||
+ | [[File:zhaoxin roadmap (2017).png|right|400px]] | ||
+ | Development of WuDaoKou started in August 2013. The basic architecture design was completed by June 2014, with the basic design done in July 2015. Hardware implementation was completed in April 2016, and the chip [[taped out]] in August 2016. Final verification was done in October 2016, and mass production started in October 2017. The KX-5000 (formerly ZX-D) was announced at Semicon China 2017. The architecture and SKUs were officially unveiled at a conference on December 28, 2017. | ||
+ | |||
+ | [[File:wudaokou timeline.png|500px]] | ||
+ | |||
+ | WuDaoKou is said to be the result of 9,000 engineering months. Development data exceeded 200 TB, and 4,000 cores were used for simulations. Ten hardware emulators were used for verification, simulating a total of 150 billion instructions and testing more than 300 different kinds of software across the CPU, GPU, memory controller, and bus. | ||
+ | |||
+ | {{clear}} | ||
== Process Technology == | == Process Technology == | ||
Line 37: | Line 85: | ||
=== Key changes from {{\\|Zhangjiang}} === | === Key changes from {{\\|Zhangjiang}} === | ||
+ | [[File:WuDaoKou performance.png|right|450px]] | ||
+ | * 25% higher [[IPC]] | ||
+ | * 140% higher performance in multi-threaded workloads | ||
* [[8 cores]] per die (up from 4) | * [[8 cores]] per die (up from 4) | ||
* SoC design | * SoC design | ||
Line 43: | Line 94: | ||
*** [[PCIe]] 3.0 (from 2.0) | *** [[PCIe]] 3.0 (from 2.0) | ||
*** [[DDR4]] (From [[DDR3]]) | *** [[DDR4]] (From [[DDR3]]) | ||
− | ** New integrated graphics processor | + | ** New [[integrated graphics processor]] |
*** HD Audio Output/Codec | *** HD Audio Output/Codec | ||
+ | *** DirectX 11.1 | ||
+ | *** Up to 3 displays | ||
+ | **** DP (1.2a) / eDP (1.3) / HDMI (1.4b) / VGA | ||
* Core | * Core | ||
+ | ** Improved OoOE algorithm | ||
** Pipeline was reduced by 5 stages | ** Pipeline was reduced by 5 stages | ||
** Execution engines were re-balanced | ** Execution engines were re-balanced | ||
Line 55: | Line 110: | ||
** USB 3.1 Gen2 (Type-C) ports | ** USB 3.1 Gen2 (Type-C) ports | ||
** SATA 3.0 ports | ** SATA 3.0 ports | ||
+ | * Formal OS certification | ||
+ | ** Windows Hardware Quality Labs (WHQL) certification | ||
+ | *** Windows 7/10 | ||
{{expand list}} | {{expand list}} | ||
+ | |||
+ | === Block Diagram === | ||
+ | :[[File:wudaokou soc block diagram.svg|550px]] | ||
+ | |||
+ | === Memory Hierarchy === | ||
+ | * Cache | ||
+ | ** L1D Cache | ||
+ | *** 32 KiB, 8-way set associative | ||
+ | *** Per core | ||
+ | ** L1I Cache | ||
+ | *** 32 KiB, 8-way set associative | ||
+ | *** Per core | ||
+ | ** L2 Cache | ||
+ | *** 4/8 MiB, 16/32-way set associative | ||
+ | *** Per quad-core cluster | ||
+ | * System DRAM | ||
+ | ** 2 Channels | ||
+ | ** DDR4, Up to 2400 MT/s | ||
+ | |||
+ | == Overview == | ||
+ | [[File:wudaokou overview.svg|right|350px]] | ||
+ | WuDaoKou is largely a brand-new architecture designed by Zhaoxin. This is a departure from earlier microarchitectures such as {{\\|ZhangJiang}}, which were lightly modified versions of the [[VIA Technologies]] ([[Centaur Technology|Centaur]]) architecture. WuDaoKou is a new and complete [[SoC]] design. Whereas prior processors had separate [[dies]] connected over the legacy [[front-side bus]], the new design is a single-die [[system-on-a-chip]] that features [[8 cores|8]] integrated [[x86]] cores, organized as two clusters of four cores each and connected over a new point-to-point crossbar, which considerably improves internal bandwidth and latency. The new chip also integrates the memory controller and the rest of the [[north-bridge]] on-die, which further improves latency, bandwidth, and performance. It also has an [[integrated graphics processor]] supporting 4K resolution and up to three screens via an array of display ports. | ||
+ | |||
+ | Overall, [[Zhaoxin]] reports that the new microarchitecture delivers a 25% improvement in [[IPC]], a 140% improvement in multi-core workloads, and 120% higher memory access bandwidth. | ||
+ | |||
+ | === Uncore === | ||
+ | WuDaoKou features a new point-to-point high-speed interconnect [[crossbar]] which replaces the [[front-side bus]] of prior architectures. The new crossbar reduces latency and provides facilities for control flow and cache coherency. The newly integrated graphics processor and the memory controller also connect through the crossbar. The new memory controller supports up to dual-channel [[DDR4]] with data rates of up to 2400 MT/s (although current SKUs only seem to support up to 2133 MT/s). [[Zhaoxin]] has stated that this is the first domestic CPU to have a dual-channel DDR4 memory controller. | ||
+ | |||
+ | == Core == | ||
+ | === Pipeline === | ||
+ | WuDaoKou features an 18-stage pipeline with a 15-cycle misprediction penalty. | ||
+ | :[[File:wudaokou pipeline.svg|800px]] | ||
+ | |||
+ | == Graphics == | ||
+ | The exact architecture of the [[GPU]] has not been disclosed, but there is some evidence suggesting that it may use [[S3 Graphics]] IP (originally owned by [[VIA Technologies]] as well, but since purchased by HTC). The GPU supports up to three displays using [[HDMI]] 1.4b, [[DisplayPort]] 1.2a, [[Embedded DisplayPort]] 1.3, and [[VGA]]. The GPU supports DirectX 11.1 and up to [[4K]] resolution. | ||
+ | |||
+ | == Sockets/Platform == | ||
+ | [[File:zhaoxin zx-200 chipset.png|right|200px]] | ||
+ | All parts use an HFCBGA 37.5×37.5 mm package and are effectively a [[system on a chip]]. However, most parts are paired with a chipset that serves as an I/O extension chip. The chipset communicates with the microprocessor over a standard PCIe 3.0 x4 link. | ||
+ | |||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! colspan="11" | Chipset | ||
+ | |- | ||
+ | ! rowspan="2" | Chipset !! rowspan="2" | TDP !! colspan="2" | PCIe || SATA || colspan="3" | USB || rowspan="2" | Network || rowspan="2" | Process || rowspan="2" | Package | ||
+ | |- | ||
+ | ! 2.0 !! 3.0 || 3.0 || 2.0 || 3.1 Gen 1 || 3.1 Gen 2 | ||
+ | |- | ||
+ | | ZX-200 || 6 W || 9 lanes || - || 4 || 6 || 3 || 2 || 10/100M/1 Gbps || [[40 nm]] || FCBGA (21mm x 21mm) | ||
+ | |} | ||
+ | |||
+ | [[File:zx-200 slide.png|400px]] | ||
+ | |||
+ | {{clear}} | ||
+ | |||
+ | == Die == | ||
+ | [[File:wudaokou floorplan at conference.png|right|250px]] | ||
+ | |||
+ | === Core module === | ||
+ | : [[File:wudaokou core.png|500px]] | ||
+ | |||
+ | |||
+ | : [[File:wudaokou core (annotated).png|500px]] | ||
+ | |||
+ | === Octa-core die === | ||
+ | * [[HLMC]] [[28 nm process]] | ||
+ | * 187 mm² die size | ||
+ | * 2,100,000,000 transistors | ||
+ | |||
+ | : [[File:wudaokou die shot.png|class=wikichip_ogimage|650px]] | ||
+ | |||
+ | |||
+ | : [[File:wudaokou die shot (annotated).png|650px]] | ||
+ | |||
+ | == All WuDaoKou Processors == | ||
+ | <!-- NOTE: | ||
+ | This table is generated automatically from the data in the actual articles. | ||
+ | If a microprocessor is missing from the list, an appropriate article for it needs to be | ||
+ | created and tagged accordingly. | ||
+ | |||
+ | Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips | ||
+ | --> | ||
+ | {{comp table start}} | ||
+ | <table class="comptable sortable tc4 tc8"> | ||
+ | {{comp table header|main|7:List of WuDaoKou-based Processors}} | ||
+ | {{comp table header|main|7:Main processor}} | ||
+ | {{comp table header|cols|Family|Launched|Cores|L2|%Frequency|Max Memory|ECC}} | ||
+ | {{#ask: [[Category:microprocessor models by zhaoxin]] [[microarchitecture::WuDaoKou]] | ||
+ | |?full page name | ||
+ | |?model number | ||
+ | |?family | ||
+ | |?first launched | ||
+ | |?core count | ||
+ | |?l2$ size | ||
+ | |?base frequency#GHz | ||
+ | |?max memory#GiB | ||
+ | |?has ecc memory support | ||
+ | |format=template | ||
+ | |template=proc table 3 | ||
+ | |userparam=9:9 | ||
+ | |mainlabel=- | ||
+ | }} | ||
+ | {{comp table count|ask=[[Category:microprocessor models by zhaoxin]] [[microarchitecture::WuDaoKou]]}} | ||
+ | </table> | ||
+ | {{comp table end}} | ||
+ | |||
+ | == Documents == | ||
+ | * [[:File:wudaokou.pdf|WuDaoKou]] | ||
+ | |||
+ | == References == | ||
+ | * Information was obtained directly from Zhaoxin | ||
+ | * [https://fuse.wikichip.org/news/733/zhaoxin-launches-their-highest-performance-chinese-x86-chips/ Zhaoxin launches their highest-performance Chinese x86 chips] |
Latest revision as of 12:02, 17 June 2019
WuDaoKou µarch | |
General Info | |
Arch Type | CPU |
Designer | Zhaoxin |
Manufacturer | HLMC, SMIC |
Introduction | December 28, 2017 |
Process | 28 nm |
Core Configs | 2, 4, 8 |
Pipeline | |
Type | Superscalar |
OoOE | Yes |
Speculative | Yes |
Reg Renaming | Yes |
Stages | 18 |
Instructions | |
ISA | x86-64 |
Extensions | MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AES, RDRND, BMI, BMI2, TXT, RDSEED |
Cache | |
L1I Cache | 32 KiB/core 8-way set associative |
L1D Cache | 32 KiB/core 8-way set associative |
L2 Cache | 4 MiB/cluster 8-way set associative |
Succession | |
WuDaoKou is a 28 nm x86 microarchitecture designed by Zhaoxin for mainstream laptops, desktops, and servers. It is the successor to Zhangjiang.
Etymology
WuDaoKou is named after the Wudaokou Station of the Beijing Subway in China.
Brands
Family | Series | Description |
---|---|---|
KaiXian | KX (5000) | Desktop, Laptops |
KaisHeng | KH (20000) | Storage, Servers |
Release Dates
Development of WuDaoKou started in August 2013. The basic architecture design was completed by June 2014, with the basic design done in July 2015. Hardware implementation was completed in April 2016, and the chip taped out in August 2016. Final verification was done in October 2016, and mass production started in October 2017. The KX-5000 (formerly ZX-D) was announced at Semicon China 2017. The architecture and SKUs were officially unveiled at a conference on December 28, 2017.
WuDaoKou is said to be the result of 9,000 engineering months. Development data exceeded 200 TB, and 4,000 cores were used for simulations. Ten hardware emulators were used for verification, simulating a total of 150 billion instructions and testing more than 300 different kinds of software across the CPU, GPU, memory controller, and bus.
Process Technology
WuDaoKou is manufactured on HLMC's 28 nm process.
Architecture
Key changes from Zhangjiang
- 25% higher IPC
- 140% higher performance in multi-threaded workloads
- 8 cores per die (up from 4)
- SoC design
  - New Uncore
    - northbridge moved on-die
    - PCIe 3.0 (from 2.0)
    - DDR4 (from DDR3)
  - New integrated graphics processor
    - HD Audio Output/Codec
    - DirectX 11.1
    - Up to 3 displays
      - DP (1.2a) / eDP (1.3) / HDMI (1.4b) / VGA
- Core
  - Improved OoOE algorithm
  - Pipeline was reduced by 5 stages
  - Execution engines were re-balanced
  - Branch prediction unit was reworked and optimized
- FSB removed
  - x4 PCIe 3.0 communication with southbridge chipset
- Chipset
  - Gigabit Ethernet port (RGMII)
  - USB 3.1 Gen2 (Type-C) ports
  - SATA 3.0 ports
- Formal OS certification
  - Windows Hardware Quality Labs (WHQL) certification
    - Windows 7/10
Block Diagram
Memory Hierarchy
- Cache
  - L1D Cache
    - 32 KiB, 8-way set associative
    - Per core
  - L1I Cache
    - 32 KiB, 8-way set associative
    - Per core
  - L2 Cache
    - 4/8 MiB, 16/32-way set associative
    - Per quad-core cluster
- System DRAM
  - 2 Channels
  - DDR4, up to 2400 MT/s
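As a rough illustration of what the cache figures above imply, the number of sets in a set-associative cache is its size divided by (ways × line size). The sketch below assumes a 64-byte cache line, which this article does not state, and `cache_sets` is a hypothetical helper name used only for this example.

```python
KIB = 1024
MIB = 1024 * KIB


def cache_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Return the number of sets in a set-associative cache.

    line_bytes=64 is an assumption; the article does not give the line size.
    """
    if size_bytes % (ways * line_bytes) != 0:
        raise ValueError("size must be a multiple of ways * line size")
    return size_bytes // (ways * line_bytes)


# L1I / L1D: 32 KiB, 8-way set associative (figures from the list above).
l1_sets = cache_sets(32 * KIB, ways=8)

# L2: both listed configurations, 4 MiB / 16-way and 8 MiB / 32-way.
l2_sets_4m = cache_sets(4 * MIB, ways=16)
l2_sets_8m = cache_sets(8 * MIB, ways=32)

print(l1_sets, l2_sets_4m, l2_sets_8m)
```

Under the 64-byte assumption, both listed L2 configurations work out to the same number of sets, since doubling the capacity and doubling the associativity cancel out.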
Overview
WuDaoKou is largely a brand-new architecture designed by Zhaoxin. This is a departure from earlier microarchitectures such as Zhangjiang, which were lightly modified versions of the VIA Technologies (Centaur) architecture. WuDaoKou is a new and complete SoC design. Whereas prior processors had separate dies connected over the legacy front-side bus, the new design is a single-die system-on-a-chip that features 8 integrated x86 cores, organized as two clusters of four cores each and connected over a new point-to-point crossbar, which considerably improves internal bandwidth and latency. The new chip also integrates the memory controller and the rest of the north-bridge on-die, which further improves latency, bandwidth, and performance. It also has an integrated graphics processor supporting 4K resolution and up to three screens via an array of display ports.
Overall, Zhaoxin reports that the new microarchitecture delivers a 25% improvement in IPC, a 140% improvement in multi-core workloads, and 120% higher memory access bandwidth.
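One way to sanity-check how the reported figures relate to each other is to compare them with an idealized estimate. The sketch below assumes that "140% higher" means 2.4× the predecessor's multi-core throughput, that clock frequency is unchanged, and that throughput scales as cores × IPC. None of these assumptions comes from Zhaoxin, and `ideal_speedup` is a hypothetical name.

```python
def ideal_speedup(core_ratio: float, ipc_gain: float) -> float:
    """Idealized multi-core speedup: core-count ratio times IPC ratio.

    Assumes equal clock frequency and perfect scaling across cores.
    """
    return core_ratio * (1.0 + ipc_gain)


# 8 cores per die (up from 4) and 25% higher IPC, as reported in the article.
ideal = ideal_speedup(core_ratio=8 / 4, ipc_gain=0.25)

# "140% improvement" read as 1 + 1.40 = 2.4x (an interpretation, not a source fact).
reported = 1.0 + 1.40

scaling_efficiency = reported / ideal
print(f"ideal {ideal:.2f}x, reported {reported:.2f}x, "
      f"efficiency {scaling_efficiency:.0%}")
```

Under these assumptions the idealized figure is 2.5×, so the reported 2.4× would be broadly consistent with doubling the core count plus the stated IPC gain.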
Uncore
WuDaoKou features a new point-to-point high-speed interconnect crossbar which replaces the front-side bus of prior architectures. The new crossbar reduces latency and provides facilities for control flow and cache coherency. The newly integrated graphics processor and the memory controller also connect through the crossbar. The new memory controller supports up to dual-channel DDR4 with data rates of up to 2400 MT/s (although current SKUs only seem to support up to 2133 MT/s). Zhaoxin has stated that this is the first domestic CPU to have a dual-channel DDR4 memory controller.
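For a sense of scale, the theoretical peak DRAM bandwidth follows from channels × transfer rate × bytes per transfer. The sketch below assumes the standard 64-bit (8-byte) DDR4 channel width, which the article does not state explicitly for this controller. `peak_bandwidth_gbs` is a hypothetical helper.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers: int,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second).

    bytes_per_transfer=8 assumes a standard 64-bit DDR4 channel.
    """
    return channels * mega_transfers * bytes_per_transfer / 1000


# Controller maximum per the article: dual-channel DDR4-2400.
max_supported = peak_bandwidth_gbs(channels=2, mega_transfers=2400)

# Speed that current SKUs appear to support: dual-channel DDR4-2133.
current_skus = peak_bandwidth_gbs(channels=2, mega_transfers=2133)

print(f"{max_supported:.1f} GB/s vs {current_skus:.1f} GB/s")
```

These are upper bounds; sustained bandwidth in real workloads is lower.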
Core
Pipeline
WuDaoKou features an 18-stage pipeline with a 15-cycle misprediction penalty.
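To illustrate how a fixed misprediction penalty feeds into performance, a common first-order model adds (branch fraction × misprediction rate × penalty) to the base cycles per instruction. Only the 15-cycle penalty comes from the article. The base CPI, branch fraction, and misprediction rate below are made-up illustrative values, and `effective_cpi` is a hypothetical name.

```python
MISPREDICT_PENALTY_CYCLES = 15  # from the article


def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float,
                  penalty: int = MISPREDICT_PENALTY_CYCLES) -> float:
    """First-order CPI model including branch misprediction stalls."""
    return base_cpi + branch_fraction * mispredict_rate * penalty


# Illustrative workload (assumed values, not measurements):
# 20% of instructions are branches, 5% of those are mispredicted.
cpi = effective_cpi(base_cpi=1.0, branch_fraction=0.20, mispredict_rate=0.05)
added_cycles = cpi - 1.0

print(f"effective CPI {cpi:.2f} (+{added_cycles:.2f} from mispredictions)")
```

With these assumed inputs, mispredictions add about 0.15 cycles per instruction, which shows why both the penalty and the accuracy of the branch predictor matter.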
Graphics
The exact architecture of the GPU has not been disclosed, but there is some evidence suggesting that it may use S3 Graphics IP (originally owned by VIA Technologies as well, but since purchased by HTC). The GPU supports up to three displays using HDMI 1.4b, DisplayPort 1.2a, Embedded DisplayPort 1.3, and VGA. The GPU supports DirectX 11.1 and up to 4K resolution.
Sockets/Platform
All parts use an HFCBGA 37.5×37.5 mm package and are effectively a system on a chip. However, most parts are paired with a chipset that serves as an I/O extension chip. The chipset communicates with the microprocessor over a standard PCIe 3.0 x4 link.
Chipset | TDP | PCIe 2.0 | PCIe 3.0 | SATA 3.0 | USB 2.0 | USB 3.1 Gen 1 | USB 3.1 Gen 2 | Network | Process | Package |
---|---|---|---|---|---|---|---|---|---|---|
ZX-200 | 6 W | 9 lanes | - | 4 | 6 | 3 | 2 | 10/100M/1 Gbps | 40 nm | FCBGA (21 mm × 21 mm) |
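The PCIe 3.0 x4 link between the processor and the chipset bounds the total I/O throughput of everything attached to the ZX-200. A minimal sketch of the theoretical link bandwidth follows, using the PCIe 3.0 signaling rate of 8 GT/s per lane and its 128b/130b line encoding. It ignores packet and protocol overhead, and `pcie3_bandwidth_gbs` is a hypothetical helper.

```python
PCIE3_GT_PER_S = 8.0          # PCIe 3.0 raw rate per lane, in GT/s
PCIE3_ENCODING = 128 / 130    # 128b/130b line encoding efficiency


def pcie3_bandwidth_gbs(lanes: int) -> float:
    """Theoretical PCIe 3.0 bandwidth per direction, in GB/s.

    Ignores transaction-layer and data-link-layer overhead.
    """
    bits_per_second = PCIE3_GT_PER_S * PCIE3_ENCODING * lanes
    return bits_per_second / 8  # convert Gbit/s to GB/s


uplink = pcie3_bandwidth_gbs(lanes=4)
print(f"x4 uplink: about {uplink:.2f} GB/s per direction")
```

The result is a little under 4 GB/s per direction, shared by the SATA, USB, network, and PCIe 2.0 devices behind the chipset.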
Die
Core module
Octa-core die
- HLMC 28 nm process
- 187 mm² die size
- 2,100,000,000 transistors
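The two die figures above can be combined into an average transistor density. This is simple arithmetic on the quoted numbers rather than a figure published by Zhaoxin or HLMC, and the variable names are only illustrative.

```python
DIE_AREA_MM2 = 187               # octa-core die size from the list above
TRANSISTORS = 2_100_000_000      # transistor count from the list above

# Average density across the whole die, in millions of transistors per mm².
density_mtr_per_mm2 = TRANSISTORS / DIE_AREA_MM2 / 1e6

print(f"about {density_mtr_per_mm2:.2f} MTr/mm²")
```

This is an average over the whole die, so dense regions such as the L2 cache arrays will sit above it and I/O regions below it.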
All WuDaoKou Processors
List of WuDaoKou-based Processors (main processor)
Model | Family | Launched | Cores | L2 | Frequency | Max Memory | ECC |
---|---|---|---|---|---|---|---|
KH-25800 | KaisHeng | 28 December 2017 | 8 | 8 MiB | 1.8 GHz | 128 GiB | ✔ |
KH-26800 | KaisHeng | 28 December 2017 | 8 | 8 MiB | 2 GHz | 128 GiB | ✔ |
KX-5540 | KaiXian | 28 December 2017 | 4 | 4 MiB | 1.8 GHz | 64 GiB | ✘ |
KX-5640 | KaiXian | 28 December 2017 | 4 | 4 MiB | 2 GHz | 64 GiB | ✘ |
KX-U5580 | KaiXian | 28 December 2017 | 8 | 8 MiB | 1.8 GHz | 64 GiB | ✘ |
KX-U5580M | KaiXian | 28 December 2017 | 8 | 8 MiB | 1.8 GHz | 64 GiB | ✘ |
KX-U5680 | KaiXian | 28 December 2017 | 8 | 8 MiB | 2 GHz | 64 GiB | ✘ |
Count: 7
Documents
References
- Information was obtained directly from Zhaoxin
- Zhaoxin launches their highest-performance Chinese x86 chips
codename | WuDaoKou |
core count | 2, 4 and 8 |
designer | Zhaoxin |
first launched | December 28, 2017 |
full page name | zhaoxin/microarchitectures/wudaokou |
instance of | microarchitecture |
instruction set architecture | x86-64 |
manufacturer | HLMC |
microarchitecture type | CPU |
name | WuDaoKou |
process | 28 nm (0.028 μm, 2.8e-5 mm) |