From WikiChip
Difference between revisions of "zhaoxin/microarchitectures/wudaokou"

Revision as of 11:21, 2 February 2018

WuDaoKou µarch
General Info
Arch Type: CPU
Designer: Zhaoxin
Manufacturer: HLMC
Introduction: December 28, 2017
Process: 28 nm
Core Configs: 2, 4, 8
Pipeline
Type: Superscalar
OoOE: Yes
Speculative: Yes
Reg Renaming: Yes
Stages: 18
Instructions
ISA: x86-64
Extensions: MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AES, RDRND, BMI, BMI2, TXT, RDSEED
Succession

WuDaoKou is a 28 nm x86 microarchitecture designed by Zhaoxin for mainstream laptops, desktops, and servers. It is the successor to Zhangjiang.

Etymology

WuDaoKou is named after the Wudaokou Station of the Beijing Subway in China.

Brands

Family     Series       Description
KaiXian    KX (5000)    Desktop, Laptops
KaisHeng   KH (20000)   Storage, Servers
hk-20000.png 
kx-5000.png

Release Dates

zhaoxin roadmap (2017).png

Development of WuDaoKou started in August 2013. The basic architecture design was completed by June 2014, with the basic design done in July 2015. Hardware implementation was completed in April 2016, and the design taped out in August 2016. Final verification was done in October 2016, and mass production started in October 2017. The KX-5000 (formerly ZX-D) was announced at Semicon China 2017. The architecture and SKUs were officially unveiled at a conference on December 28, 2017.

wudaokou timeline.png

WuDaoKou is said to be the result of 9,000 engineer-months of work. Development data exceeded 200 TB. 4,000 cores were used for simulation, and ten hardware emulators were used for verification, simulating a total of 150 billion instructions and testing more than 300 different kinds of software across the CPU, GPU, memory controller, and bus.

Process Technology

WuDaoKou is manufactured on HLMC's 28 nm process.

Architecture

Key changes from Zhangjiang

WuDaoKou performance.png
  • 25% higher IPC
  • 140% higher performance in multi-threaded workloads
  • 8 cores per die (up from 4)
  • SoC design
  • Core
    • Improved OoOE algorithm
    • Pipeline was reduced by 5 stages
    • Execution engines were re-balanced
    • Branch prediction unit was reworked and optimized
  • FSB removed
  • Chipset
    • Gigabit Ethernet port (RGMII)
    • USB 3.1 Gen2 (Type-C) ports
    • SATA 3.0 ports
  • Formal OS certification
    • Windows Hardware Quality Labs (WHQL) certification
      • Windows 7/10



Memory Hierarchy

  • Cache
    • L1D Cache
      • 32 KiB, 8-way set associative
      • Per core
    • L1I Cache
      • 32 KiB, 8-way set associative
      • Per core
    • L2 Cache
      • 4 MiB, 32-way set associative
      • Per quad-core cluster
  • System DRAM
    • 2 Channels
    • DDR4, Up to 2400 MT/s
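The figures above imply a cache geometry and a peak memory bandwidth that can be worked out with simple arithmetic. The following is a minimal sketch, not a description of the actual implementation: the 64-byte cache line size is an assumption (the source does not state it), and the function names are hypothetical. The 8-byte bus width reflects the standard 64-bit DDR4 channel.

```python
KIB = 1024
MIB = 1024 * KIB


def cache_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets = capacity / (ways * line size).

    line_bytes=64 is an assumption; the source does not give the line size.
    """
    return size_bytes // (ways * line_bytes)


def peak_dram_bandwidth_gbs(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second).

    Each DDR4 channel is 64 bits (8 bytes) wide.
    """
    return channels * mt_per_s * 1e6 * bus_bytes / 1e9


l1_sets = cache_sets(32 * KIB, ways=8)    # L1D or L1I, per core
l2_sets = cache_sets(4 * MIB, ways=32)    # L2, per quad-core cluster
dram_gbs = peak_dram_bandwidth_gbs(channels=2, mt_per_s=2400)

print(l1_sets, l2_sets, dram_gbs)
```

Under the 64-byte line assumption, each L1 cache has 64 sets and each L2 has 2,048 sets; two channels of DDR4-2400 give a theoretical peak of 38.4 GB/s, which sustained workloads will not reach.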


Core

Pipeline

WuDaoKou features an 18-stage pipeline with a 15-cycle misprediction penalty.

wudaokou pipeline.svg
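To give a sense of what a 15-cycle misprediction penalty costs, the usual first-order model adds the expected stall cycles per instruction to a baseline CPI. The sketch below is illustrative only: the baseline CPI, branch fraction, and misprediction rate are hypothetical values chosen for the example, not measurements of WuDaoKou, and `effective_cpi` is a hypothetical helper.

```python
def effective_cpi(base_cpi: float,
                  branch_fraction: float,
                  mispredict_rate: float,
                  penalty_cycles: int = 15) -> float:
    """First-order model: CPI = base + (branches/instr) * (miss rate) * penalty."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: baseline CPI of 1.0, 20% branches, 5% mispredicted.
cpi = effective_cpi(base_cpi=1.0, branch_fraction=0.20, mispredict_rate=0.05)
slowdown = cpi / 1.0

print(f"effective CPI ~ {cpi:.2f}, slowdown ~ {slowdown:.2f}x")
```

With these assumed inputs the penalty adds about 0.15 cycles per instruction. The model ignores overlap with other stalls, so it is a rough bound rather than a prediction.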

Graphics

The exact architecture of the GPU has not been disclosed, but there is some evidence suggesting that it may use S3 Graphics IP (S3 Graphics was formerly owned by VIA Technologies and has since been purchased by HTC). The GPU supports up to three displays over HDMI 1.4b, DisplayPort 1.2a, Embedded DisplayPort 1.3, and VGA. It supports DirectX 11.1 and resolutions up to 4K.

Sockets/Platform

zhaoxin zx-200 chipset.png

All parts use an HFCBGA 37.5×37.5 mm package and are effectively systems on a chip. For the most part, however, they are paired with a chipset that serves as an I/O extension chip. The chipset communicates with the microprocessor over a standard PCIe 3.0 x4 link.
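The bandwidth available between the processor and the chipset follows from the PCIe 3.0 signaling rate (8 GT/s per lane with 128b/130b encoding). The sketch below computes the theoretical per-direction figure for a x4 connection; `pcie_bandwidth_gbs` is a hypothetical helper, and the result ignores packet and protocol overhead, so real throughput is lower.

```python
def pcie_bandwidth_gbs(lanes: int,
                       gt_per_s: float = 8.0,
                       payload_bits: int = 128,
                       encoded_bits: int = 130) -> float:
    """Theoretical per-direction bandwidth in GB/s.

    Defaults describe PCIe 3.0: 8 GT/s per lane, 128b/130b line encoding.
    """
    usable_gbit_per_s = lanes * gt_per_s * payload_bits / encoded_bits
    return usable_gbit_per_s / 8  # bits to bytes


chipset_link_gbs = pcie_bandwidth_gbs(lanes=4)
print(f"PCIe 3.0 x4: ~{chipset_link_gbs:.2f} GB/s per direction")
```

This works out to roughly 3.94 GB/s in each direction, shared by all devices attached to the chipset.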

Chipset
Chipset   TDP   PCIe 2.0   PCIe 3.0   SATA 3.0   USB 2.0   USB 3.1 Gen 1   USB 3.1 Gen 2   Network          Process   Package
ZX-200    6 W   9 lanes    -          4          6         3               2               10/100M/1 Gbps   40 nm     FCBGA (21 mm × 21 mm)

zx-200 slide.png

Die

wudaokou floorplan at conference.png

Core

wudaokou core.png
wudaokou core (annotated).png

Octa-core die

wudaokou die shot.png
wudaokou die shot (annotated).png

All WuDaoKou Processors

List of WuDaoKou-based Processors
Main processor
Model       Family     Launched           Cores   L2      Frequency   Max Memory   ECC
KH-25800    KaisHeng   28 December 2017   8       8 MiB   1.8 GHz     128 GiB
KH-26800    KaisHeng   28 December 2017   8       8 MiB   2 GHz       128 GiB
KX-5540     KaiXian    28 December 2017   4       4 MiB   1.8 GHz     64 GiB
KX-5640     KaiXian    28 December 2017   4       4 MiB   2 GHz       64 GiB
KX-U5580    KaiXian    28 December 2017   8       8 MiB   1.8 GHz     64 GiB
KX-U5580M   KaiXian    28 December 2017   8       8 MiB   1.8 GHz     64 GiB
KX-U5680    KaiXian    28 December 2017   8       8 MiB   2 GHz       64 GiB
Count: 7


codename: WuDaoKou
core count: 2, 4, and 8
designer: Zhaoxin
first launched: December 28, 2017
full page name: zhaoxin/microarchitectures/wudaokou
instance of: microarchitecture
instruction set architecture: x86-64
manufacturer: HLMC
microarchitecture type: CPU
name: WuDaoKou
pipeline stages: 18
process: 28 nm (0.028 μm, 2.8e-5 mm)