From WikiChip
Cascade Lake - Microarchitectures - Intel

Cascade Lake µarch
General Info
Arch Type: CPU
Designer: Intel
Manufacturer: Intel
Introduction: 2018
Process: 14 nm
Pipeline
Type: Superscalar
OoOE: Yes
Speculative: Yes
Reg Renaming: Yes
Stages: 14-19
Instructions
ISA: x86-64
Extensions: MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA3, F16C, BMI, BMI2, VT-x, VT-d, TXT, TSX, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVE, SGX, MPX, AVX-512
Cache
L1I Cache: 32 KiB/core, 8-way set associative
L1D Cache: 32 KiB/core, 8-way set associative
L2 Cache: 1 MiB/core, 16-way set associative
L3 Cache: 1.375 MiB/core, 11-way set associative
Cores
Core Names: Cascade Lake X, Cascade Lake SP, Cascade Lake AP
Succession
Contemporary: Coffee Lake

Cascade Lake (CSL/CLX) is Intel's successor to Skylake, a 14 nm microarchitecture for enthusiasts and servers. Cascade Lake is the "Optimization" phase of Intel's Process-Architecture-Optimization (PAO) model.

For desktop enthusiasts, Cascade Lake is branded as Core i7 and Core i9 processors (under the Core X series). For scalable server-class processors, Intel brands it as Xeon Bronze, Xeon Silver, Xeon Gold, and Xeon Platinum.

Codenames

Core | Abbrev | Target
Cascade Lake X | CSL-X | High-end desktops & enthusiasts market
Cascade Lake W | CSL-W | Enterprise/Business workstations
Cascade Lake SP | CSL-SP | Server Scalable Processors
Cascade Lake AP | CSL-AP | Server Advanced Processors

Brands

Cascade Lake is sold under eight different families.

Preliminary Data! Information presented in this article deals with future products, data, features, and specifications that have yet to be finalized, announced, or released. Information may be incomplete and can change by final release.
Family | General Description | Cores | HT | AVX | AVX2 | AVX-512 | TBT | ECC
Core i7 | Enthusiasts/High Performance (X) | ?-? | | | | | |
Core i9 | Enthusiasts/High Performance | ?-? | | | | | |

Family | General Description | Cores | HT | TBT | AVX-512 | AVX-512 Units | UPI links | Scalability
Xeon D | Dense servers / edge computing | ?-? | | | | 1 | |
Xeon W | Business workstations | ?-? | | | | 2 | |
Xeon Bronze | Entry-level performance / Cost-sensitive | ?-? | | | | 1 | 2 | Up to 2
Xeon Silver | Mid-range performance / Efficient lower power | ?-? | | | | 1 | 2 | Up to 2
Xeon Gold 5000 | High performance | ?-? | | | | 1 | 2 | Up to 4
Xeon Gold 6000 | Higher performance | ?-? | | | | 2 | 3 | Up to 4
Xeon Platinum | Highest performance / flexibility | ?-? | | | | 2 | 3 | Up to 8

Release Dates

Cascade Lake is expected to be released in late 2018.

Process Technology

Cascade Lake is fabricated on Intel's enhanced 14 nm process.

Architecture

As with Skylake, Cascade Lake is based on the Purley platform and is designed as a drop-in upgrade.

Key changes from Skylake

  • System Architecture
  • Core
  • Integrated Memory Controller
  • Memory
    • Support for up to 3 TiB per socket (up from 1.5 TiB)

New instructions

Cascade Lake introduced a number of new instructions:

Block Diagram

Entire SoC Overview

[Figure: LCC SoC block diagram (Skylake SP)]
[Figure: HCC SoC block diagram (Skylake SP)]
[Figure: XCC SoC block diagram (Skylake SP)]
Individual Core

The high-level core architecture is identical to that of Skylake.

[Figure: Skylake server core block diagram]

Memory Hierarchy

The memory hierarchy of Cascade Lake is identical to that of Skylake.

  • Cache
    • L0 µOP cache:
      • 1,536 µOPs/core, 8-way set associative
        • 32 sets, 6-µOP line size
        • statically divided between threads, inclusive with L1I
    • L1I Cache:
      • 32 KiB/core, 8-way set associative
        • 64 sets, 64 B line size
        • competitively shared by the threads/core
    • L1D Cache:
      • 32 KiB/core, 8-way set associative
      • 64 sets, 64 B line size
      • competitively shared by threads/core
      • 4 cycles for fastest load-to-use (simple pointer accesses)
        • 5 cycles for complex addresses
      • 128 B/cycle load bandwidth
      • 64 B/cycle store bandwidth
      • Write-back policy
    • L2 Cache:
      • 1 MiB/core, 16-way set associative
      • 64 B line size
      • Inclusive
      • 64 B/cycle bandwidth to L1$
      • Write-back policy
      • 14 cycles latency
    • L3 Cache:
      • 1.375 MiB/core, 11-way set associative, shared across all cores
        • Note that a few models have non-default cache sizes due to disabled cores
      • 2,048 sets, 64 B line size
      • Non-inclusive victim cache
      • Write-back policy
      • 50-70 cycles latency
    • Snoop Filter (SF):
      • 2,048 sets, 12-way set associative
  • DRAM
    • 6 channels of DDR4, up to 2666 MT/s
      • RDIMM and LRDIMM
      • bandwidth of 21.33 GB/s per channel
      • aggregate bandwidth of 128 GB/s across all six channels
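
The figures in the list above can be cross-checked with simple arithmetic. The following is a minimal sketch, not taken from the original page: the function names are hypothetical, and it assumes the usual relation capacity = sets × ways × line size as well as a 64-bit (8-byte) DDR4 data channel with DDR4-2666 running at a nominal 2666.67 MT/s.

```python
KIB = 1024
MIB = 1024 * KIB

def cache_bytes(sets: int, ways: int, line_bytes: int) -> int:
    """Capacity of a set-associative cache: sets x ways x line size."""
    return sets * ways * line_bytes

# Figures taken from the list above.
l1_bytes = cache_bytes(sets=64, ways=8, line_bytes=64)          # L1I and L1D
l3_slice_bytes = cache_bytes(sets=2048, ways=11, line_bytes=64)  # per-core L3 slice

# The L2 set count is not listed; derive it from capacity, ways, and line size.
l2_sets = (1 * MIB) // (16 * 64)

# The uop cache is sized in uops rather than bytes: sets x ways x uops per line.
uop_cache_entries = 32 * 8 * 6

def channel_gb_per_s(mega_transfers: float, bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth of one memory channel in GB/s (assumes 8 bytes per transfer)."""
    return mega_transfers * 1e6 * bytes_per_transfer / 1e9

per_channel = channel_gb_per_s(2666.67)  # nominal DDR4-2666 rate (assumption)
aggregate = 6 * per_channel              # six channels per socket

print(f"L1: {l1_bytes // KIB} KiB, L3 slice: {l3_slice_bytes / MIB} MiB")
print(f"L2 sets: {l2_sets}, uop cache entries: {uop_cache_entries}")
print(f"DRAM: {per_channel:.2f} GB/s per channel, {aggregate:.0f} GB/s aggregate")
```

The derived values agree with the listed 32 KiB L1 caches, the 1.375 MiB L3 slice, the 1,536-µOP cache, and the 21.33 GB/s and 128 GB/s DRAM figures.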

The Cascade Lake TLB hierarchy consists of a dedicated L1 TLB for instructions (ITLB) and another for data (DTLB). Additionally, there is a unified L2 TLB (STLB).

  • TLBs:
    • ITLB
      • 4 KiB page translations:
        • 128 entries; 8-way set associative
        • dynamic partitioning
      • 2 MiB / 4 MiB page translations:
        • 8 entries per thread; fully associative
        • Duplicated for each thread
    • DTLB
      • 4 KiB page translations:
        • 64 entries; 4-way set associative
        • fixed partition
      • 2 MiB / 4 MiB page translations:
        • 32 entries; 4-way set associative
        • fixed partition
      • 1 GiB page translations:
        • 4 entries; fully associative
        • fixed partition
    • STLB
      • 4 KiB + 2 MiB page translations:
        • 1536 entries; 12-way set associative
        • fixed partition
      • 1 GiB page translations:
        • 16 entries; 4-way set associative
        • fixed partition
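
To put the entry counts above in perspective, one can compute the "reach" of each TLB, i.e. how much address space it covers when every entry maps a distinct page. This is a minimal sketch under that assumption; the helper name `tlb_reach` is hypothetical and not from the original page.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of address space covered when every entry maps a distinct page."""
    return entries * page_bytes

# DTLB figures from the list above.
dtlb_4k = tlb_reach(64, 4 * KIB)
dtlb_2m = tlb_reach(32, 2 * MIB)
dtlb_1g = tlb_reach(4, 1 * GIB)

# STLB figures from the list above (the 1536 entries are shared by 4 KiB and 2 MiB pages).
stlb_4k = tlb_reach(1536, 4 * KIB)
stlb_2m = tlb_reach(1536, 2 * MIB)
stlb_1g = tlb_reach(16, 1 * GIB)

# Number of sets in the 12-way STLB.
stlb_sets = 1536 // 12

print(f"DTLB reach: {dtlb_4k // KIB} KiB (4 KiB pages), "
      f"{dtlb_2m // MIB} MiB (2 MiB pages), {dtlb_1g // GIB} GiB (1 GiB pages)")
print(f"STLB reach: {stlb_4k // MIB} MiB (4 KiB pages), "
      f"{stlb_2m // GIB} GiB (2 MiB pages), {stlb_1g // GIB} GiB (1 GiB pages)")
```

The gap between the 4 KiB and 2 MiB reach figures illustrates why large pages are commonly used for workloads with large working sets.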

Overview

[Figure: Skylake server overview]

Cascade Lake is Intel's direct successor to the Skylake server microarchitecture. It is designed to be compatible with the Skylake parts (LGA-3647) and utilize the Purley platform. To that end, Cascade Lake shares the same socket and pinout as well as the same core count, cache size, and I/O capabilities.

Cascade Lake introduces initial in-hardware Spectre and Meltdown mitigations, including for Variant 2, Variant 3, and L1TF. Chips are fabricated on an enhanced 14 nm process, which allows Intel to extract additional power efficiency and clock these processors higher. Intel noted that targeted performance improvements were applied to some of the critical paths to make this possible. Although the core architecture is largely identical to that of Skylake, Cascade Lake introduces support for AVX512 VNNI, which is designed to improve the performance of artificial intelligence workloads by improving the throughput of tight inner convolutional loops.

The chief modification to Cascade Lake is the overhaul of the integrated memory controller (IMC) to introduce support for persistent memory. The IMC on Cascade Lake is capable of interfacing with both DDR4 DIMMs and Intel's Optane DC DIMMs. Memory channels can be shared between DDR4 and Optane DC modules. For example, a single channel can have one regular DDR4 DIMM while the other DIMM can be an Optane DC DIMM. All in all, Optane DC DIMMs allow for greater than 3 TiB of system memory per socket.

A superset model is shown on the right. Cascade Lake-based servers make use of Intel's mesh interconnect architecture. In this configuration, the cores, caches, and memory controllers are organized in rows and columns, with dedicated connections running through each row and column. This allows for the shortest path between any two tiles, reducing latency and improving bandwidth. These processors are offered with 4 to 28 cores and 8 to 56 threads.
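
The "shortest path between any two tiles" idea can be illustrated with a small model. The sketch below is not a description of Intel's actual routing: the (row, column) tile coordinates are hypothetical, and the row-first, dimension-ordered route is an assumption chosen only because it is a simple way to realize a shortest path on a 2D mesh.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # (row, column); hypothetical coordinates

def mesh_hops(src: Tile, dst: Tile) -> int:
    """Minimum number of hops between two tiles on a 2D mesh (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

def dimension_ordered_route(src: Tile, dst: Tile) -> List[Tile]:
    """Tiles visited when travelling along the row first, then along the column."""
    row, col = src
    path = [src]
    step = 1 if dst[1] > col else -1
    while col != dst[1]:          # move horizontally to the target column
        col += step
        path.append((row, col))
    step = 1 if dst[0] > row else -1
    while row != dst[0]:          # then move vertically to the target row
        row += step
        path.append((row, col))
    return path

route = dimension_ordered_route((0, 0), (2, 3))
print(route)
print("hops:", len(route) - 1, "minimum:", mesh_hops((0, 0), (2, 3)))
```

In this model the route never takes more hops than the Manhattan distance, which is the sense in which a mesh offers a shortest path between tiles.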

All models incorporate 6 channels of DDR4 supporting up to 12 DIMMs for a total of 1.5 TiB (extended-memory models support 3 TiB). For I/O, all models incorporate 48 lanes (3x16) of PCIe 3.0. An additional 4 lanes of PCIe 3.0 are reserved exclusively for DMI for the Lewisburg (LBG) chipset. A select number of models, specifically those with the F suffix, have an Omni-Path Host Fabric Interface (HFI) on-package (see Integrated Omni-Path).
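
The capacity and lane counts in the paragraph above follow from simple arithmetic. The sketch below is illustrative only: the per-DIMM sizes are derived by dividing the stated totals by 12 DIMMs and are not figures given by the source, and the function name is hypothetical.

```python
GIB_PER_TIB = 1024
CHANNELS = 6
MAX_DIMMS = 12

# Two DIMM slots per channel follows from 12 DIMMs over 6 channels.
dimms_per_channel = MAX_DIMMS // CHANNELS

def implied_dimm_gib(total_tib: float) -> float:
    """Per-DIMM capacity implied by a fully populated socket (derived, not from the source)."""
    return total_tib * GIB_PER_TIB / MAX_DIMMS

standard_dimm_gib = implied_dimm_gib(1.5)  # standard models
extended_dimm_gib = implied_dimm_gib(3.0)  # extended-memory models

# PCIe 3.0 lanes: three x16 groups, plus four lanes reserved for DMI.
pcie_lanes = 3 * 16
lanes_including_dmi = pcie_lanes + 4

print(f"{dimms_per_channel} DIMMs per channel")
print(f"implied DIMM size: {standard_dimm_gib:.0f} GiB (1.5 TiB), "
      f"{extended_dimm_gib:.0f} GiB (3 TiB)")
print(f"PCIe lanes: {pcie_lanes} general purpose, {lanes_including_dmi} including DMI")
```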

Cascade Lake processors are designed for scalability, supporting 2-way, 4-way, and 8-way multiprocessing through Intel's Ultra Path Interconnect (UPI) links, with two to three links being offered (see § Scalability). High-end models have node controller support, allowing for even higher-way configurations (e.g., 32-way multiprocessing).

Core

The Cascade Lake core is largely identical to that of Skylake. For in-depth detail of the Skylake core/pipeline see Skylake (client) § Pipeline.

New Technologies

AVX-512 Vector Neural Network Instructions

Main article: AVX-512 Vector Neural Network Instructions

Cascade Lake added support for AVX-512 Vector Neural Network Instructions (AVX512 VNNI). This extension introduces new instructions for accelerating inner convolutional neural network loops. Operations on both 8-bit and 16-bit values are supported. The new extension reduces the memory bandwidth required to perform a scalar-pair multiply followed by the summation of horizontal pairs and an accumulate. For 16-bit operations, two common operations were fused into a single instruction, while for 8-bit operations, three common operations were fused into one.
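
The fused multiply, horizontal sum, and accumulate described above can be modelled in a few lines. The following is a sketch of the per-lane arithmetic as commonly documented for the non-saturating VNNI instructions VPDPBUSD (8-bit) and VPDPWSSD (16-bit); the Python function names are hypothetical, and a real instruction applies this to every 32-bit lane of a vector register at once.

```python
from typing import Sequence

def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def vpdpbusd_lane(acc: int, a_u8: Sequence[int], b_s8: Sequence[int]) -> int:
    """One 32-bit lane: acc + sum of four (unsigned 8-bit x signed 8-bit) products."""
    assert len(a_u8) == len(b_s8) == 4
    total = acc + sum((a & 0xFF) * to_signed(b, 8) for a, b in zip(a_u8, b_s8))
    return to_signed(total, 32)  # the non-saturating form wraps at 32 bits

def vpdpwssd_lane(acc: int, a_s16: Sequence[int], b_s16: Sequence[int]) -> int:
    """One 32-bit lane: acc + sum of two (signed 16-bit x signed 16-bit) products."""
    assert len(a_s16) == len(b_s16) == 2
    total = acc + sum(to_signed(a, 16) * to_signed(b, 16) for a, b in zip(a_s16, b_s16))
    return to_signed(total, 32)

# 8-bit case: 10 + (1*1 + 2*-1 + 3*2 + 4*-2)
result_8bit = vpdpbusd_lane(10, [1, 2, 3, 4], [1, -1, 2, -2])
# 16-bit case: 0 + (300*100 + -2*50)
result_16bit = vpdpwssd_lane(0, [300, -2], [100, 50])
print(result_8bit, result_16bit)
```

Without the extension, the 8-bit case is typically written as a multiply-add to 16 bits, a second multiply-add to 32 bits, and a separate vector add, which is where the "three operations fused into one" figure comes from.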

Persistent memory support


Scalability

See also: Ultra Path Interconnect

Cascade Lake continues to use Ultra Path Interconnect (UPI), which was first introduced with Skylake. UPI is a high-efficiency coherent interconnect for scalable systems, allowing multiple processors to share a single address space. Depending on the exact model, each processor can have either two or three UPI links connecting to the other processors.

Depending on the exact model, Cascade Lake processors can scale from 2-way all the way up to 8-way multiprocessing. Note that the high-end models that support 8-way multiprocessing come only with three UPI links, while the lower-end processors can have either two or three UPI links. Below are the typical configurations for those processors.


[Figure: 2-way SMP; 2 UPI links]
[Figure: 2-way SMP; 3 UPI links]
[Figure: 4-way SMP; 2 UPI links]
[Figure: 4-way SMP; 3 UPI links]
[Figure: 8-way SMP; 3 UPI links]
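
The relationship between link count and socket count can be explored with a small graph model. This sketch is illustrative only: the ring and fully connected topologies are generic textbook shapes, not necessarily the exact wiring shown in the configurations above, and all function names are hypothetical. Parallel links between the same pair of sockets (as in a 2-way system with two or three links) are not modelled.

```python
from collections import deque
from typing import Dict, Set

Topology = Dict[int, Set[int]]  # socket id -> directly linked socket ids

def ring(n: int) -> Topology:
    """Each socket links to its two neighbours."""
    return {s: {(s - 1) % n, (s + 1) % n} for s in range(n)}

def fully_connected(n: int) -> Topology:
    """Each socket links directly to every other socket."""
    return {s: set(range(n)) - {s} for s in range(n)}

def links_needed(topology: Topology) -> int:
    """Largest number of distinct peers any single socket must link to."""
    return max(len(peers) for peers in topology.values())

def worst_case_hops(topology: Topology) -> int:
    """Longest shortest path between any two sockets (breadth-first search)."""
    worst = 0
    for start in topology:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for peer in topology[node]:
                if peer not in dist:
                    dist[peer] = dist[node] + 1
                    queue.append(peer)
        worst = max(worst, max(dist.values()))
    return worst

for name, topo in [("4-way ring", ring(4)),
                   ("4-way fully connected", fully_connected(4)),
                   ("8-way fully connected", fully_connected(8))]:
    print(f"{name}: {links_needed(topo)} links/socket, "
          f"{worst_case_hops(topo)} hop(s) worst case")
```

In this model, a 4-way system fits a ring with two links per socket or a fully connected layout with three, whereas an 8-way system would need seven links per socket to be fully connected, so with three links some socket pairs are necessarily more than one hop apart.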

All Cascade Lake Chips

List of Cascade Lake Processors

Model | Launched | Family | Core Name | Cores | Threads | L2$ | L3$ | TDP | Frequency | Max Turbo
Uniprocessors
Multiprocessors (2-way)
Multiprocessors (4-way)
5215 | December 2018 | Xeon Gold | Cascade Lake SP | 10 | 20 | 10 MiB | 13.75 MiB | | 2.2 GHz | 3.4 GHz
5220 | December 2018 | Xeon Gold | Cascade Lake SP | 18 | 36 | 18 MiB | 24.75 MiB | | 2.2 GHz | 3.9 GHz
6230 | December 2018 | Xeon Gold | Cascade Lake SP | 20 | 40 | 20 MiB | 27.5 MiB | | 2.1 GHz | 3.9 GHz
6240 | December 2018 | Xeon Gold | Cascade Lake SP | 18 | 36 | 18 MiB | 24.75 MiB | | 2.6 GHz | 3.9 GHz
6240C | December 2018 | Xeon Gold | Cascade Lake SP | 18 | 36 | 18 MiB | 24.75 MiB | | 2.6 GHz | 3.9 GHz
6242 | December 2018 | Xeon Gold | Cascade Lake SP | 16 | 32 | 16 MiB | 22 MiB | | 2.8 GHz | 3.9 GHz
6252 | December 2018 | Xeon Gold | Cascade Lake SP | 24 | 48 | 24 MiB | 33 MiB | | 2.1 GHz | 3.7 GHz
6254 | December 2018 | Xeon Gold | Cascade Lake SP | 18 | 36 | 18 MiB | 24.75 MiB | | 3.1 GHz | 4 GHz
Multiprocessors (8-way)
8260 | December 2018 | Xeon Platinum | Cascade Lake SP | 24 | 48 | 24 MiB | 33 MiB | 165 W | 2.4 GHz | 3.9 GHz
8260C | December 2018 | Xeon Platinum | Cascade Lake SP | 24 | 48 | 24 MiB | 33 MiB | 165 W | 2.4 GHz | 3.9 GHz
8260L | December 2018 | Xeon Platinum | Cascade Lake SP | 24 | 48 | 24 MiB | 33 MiB | 165 W | 2.4 GHz | 3.9 GHz
8268 | December 2018 | Xeon Platinum | Cascade Lake SP | 24 | 48 | 24 MiB | 33 MiB | 205 W | 2.9 GHz | 3.9 GHz
8270 | December 2018 | Xeon Platinum | Cascade Lake SP | 26 | 52 | 26 MiB | 35.75 MiB | 205 W | 2.7 GHz | 4 GHz
8276 | December 2018 | Xeon Platinum | Cascade Lake SP | 28 | 56 | 28 MiB | 38.5 MiB | 165 W | 2.3 GHz | 4 GHz
8276L | December 2018 | Xeon Platinum | Cascade Lake SP | 28 | 56 | 28 MiB | 38.5 MiB | 165 W | 2.3 GHz | 4 GHz
8276M | December 2018 | Xeon Platinum | Cascade Lake SP | 28 | 56 | 28 MiB | 38.5 MiB | 165 W | 2.3 GHz | 4 GHz
8280 | December 2018 | Xeon Platinum | Cascade Lake SP | 28 | 56 | 28 MiB | 38.5 MiB | 205 W | 2.7 GHz | 3.9 GHz
8280L | December 2018 | Xeon Platinum | Cascade Lake SP | 28 | 56 | 28 MiB | 38.5 MiB | 205 W | 2.7 GHz | 3.9 GHz
8280M | December 2018 | Xeon Platinum | Cascade Lake SP | 28 | 56 | 28 MiB | 38.5 MiB | 205 W | 2.7 GHz | 3.9 GHz
Count: 19
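
The thread and cache columns in the list above follow directly from the core count: two threads per core, 1 MiB of L2 per core, and 1.375 MiB of L3 per core. The sketch below recomputes a few rows as a consistency check; the function name is hypothetical, and it assumes no L3 slices are disabled or added independently of cores, which the cache section notes is not true for every model.

```python
from typing import Tuple

L2_PER_CORE_MIB = 1.0
L3_PER_CORE_MIB = 1.375
THREADS_PER_CORE = 2

def derived_columns(cores: int) -> Tuple[int, float, float]:
    """Return (threads, total L2 in MiB, total L3 in MiB) for a given core count."""
    return (cores * THREADS_PER_CORE,
            cores * L2_PER_CORE_MIB,
            cores * L3_PER_CORE_MIB)

# Core counts copied from the list above.
core_counts = {"5215": 10, "6242": 16, "8270": 26, "8280": 28}

for model, cores in core_counts.items():
    threads, l2_mib, l3_mib = derived_columns(cores)
    print(f"{model}: {cores} cores, {threads} threads, "
          f"{l2_mib:g} MiB L2, {l3_mib:g} MiB L3")
```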

Documents
