From WikiChip
Difference between revisions of "intel/microarchitectures/cascade lake"

{{intel title|Cascade Lake|arch}}
 
{{microarchitecture
 
|atype=CPU
 
|name=Cascade Lake
 
|designer=Intel
 
|manufacturer=Intel
 
|introduction=2019
 
|process=14 nm
 
|cores=2
 
|cores 2=4
 
|cores 3=6
 
|cores 4=8
 
|cores 5=10
 
|cores 6=12
 
|cores 7=16
 
|cores 8=18
 
|cores 9=20
 
|cores 10=22
 
|cores 11=24
 
|cores 12=26
 
|cores 13=28
 
|cores 14=32
 
|cores 15=48
 
|cores 16=56
 
|type=Superscalar
 
|oooe=Yes
 
|speculative=Yes
 
|renaming=Yes
 
|stages min=14
 
|stages max=19
 
|isa=x86-64
 
|extension=MOVBE
 
|extension 2=MMX
 
|extension 3=SSE
 
|extension 4=SSE2
 
|extension 5=SSE3
 
|extension 6=SSSE3
 
|extension 7=SSE4.1
 
|extension 8=SSE4.2
 
|extension 9=POPCNT
 
|extension 10=AVX
 
|extension 11=AVX2
 
|extension 12=AES
 
|extension 13=PCLMUL
 
|extension 14=FSGSBASE
 
|extension 15=RDRND
 
|extension 16=FMA3
 
|extension 17=F16C
 
|extension 18=BMI
 
|extension 19=BMI2
 
|extension 20=VT-x
 
|extension 21=VT-d
 
|extension 22=TXT
 
|extension 23=TSX
 
|extension 24=RDSEED
 
|extension 25=ADCX
 
|extension 26=PREFETCHW
 
|extension 27=CLFLUSHOPT
 
|extension 28=XSAVE
 
|extension 29=SGX
 
|extension 30=MPX
 
|extension 31=AVX-512
 
|l1i=32 KiB
 
|l1i per=core
 
|l1i desc=8-way set associative
 
|l1d=32 KiB
 
|l1d per=core
 
|l1d desc=8-way set associative
 
|l2=1 MiB
 
|l2 per=core
 
|l2 desc=16-way set associative
 
|l3=1.375 MiB
 
|l3 per=core
 
|l3 desc=11-way set associative
 
|core name=Cascade Lake X
 
|core name 2=Cascade Lake SP
 
|core name 3=Cascade Lake AP
 
|predecessor=Skylake (server)
 
|predecessor link=intel/microarchitectures/skylake (server)
 
|successor=Cooper Lake
 
|successor link=intel/microarchitectures/cooper lake
 
|contemporary=Coffee Lake
 
|contemporary link=intel/microarchitectures/coffee lake
 
}}
 
[[File:cascade lake chip.JPG|right|thumb|Cascade Lake]]
 
'''Cascade Lake''' ('''CSL'''/'''CLX''') is [[Intel]]'s successor to {{\\|Skylake (server)|Skylake}}, a [[14 nm]] [[microarchitecture]] for enthusiasts and servers. Cascade Lake is the "Optimization" phase of Intel's {{intel|PAO}} model.
 
  
For desktop enthusiasts, Cascade Lake is branded as {{intel|Core i7}} and {{intel|Core i9}} processors (under the {{intel|Core X}} series). For scalable server-class processors, Intel branded it as {{intel|Xeon Bronze}}, {{intel|Xeon Silver}}, {{intel|Xeon Gold}}, and {{intel|Xeon Platinum}}.
 
 
== Codenames ==
 
{| class="wikitable"
 
|-
 
! Core !! Abbrev !! Platform !! Target
 
|-
 
| {{intel|Cascade Lake X|l=core}} || CSL-X || Glacier Falls || High-end desktops & enthusiasts market
 
|-
 
| {{intel|Cascade Lake W|l=core}} || CSL-W || || Enterprise/Business workstations
 
|-
 
| {{intel|Cascade Lake SP|l=core}} || CSL-SP || || Server Scalable Processors
 
|-
 
| {{intel|Cascade Lake AP|l=core}} || CSL-AP || || Server Advanced Processors
 
|}
 
 
== Brands ==
 
Cascade Lake is sold under five different families.
 
 
{| class="wikitable tc4 tc5 tc6 tc7 tc8" style="text-align: center;"
 
|-
 
! rowspan="2" | Logo !! rowspan="2" | Family !! rowspan="2" | General Description !! colspan="7" | Differentiating Features
 
|-
 
! Cores !! {{intel|Hyper-Threading|HT}} !! {{intel|Turbo Boost|TBT}} !! {{x86|AVX-512}} !! AVX-512 Units !! {{intel|Ultra Path Interconnect|UPI}} links !! Scalability
 
|-
 
| || {{intel|Xeon W}}  || style="text-align: left;" | High-performance Workstations || 8-28 || {{tchk|yes}} || {{tchk|yes}} || {{tchk|yes}} || 1 || - || -
 
|-
 
| colspan="11" |
 
|-
 
| [[File:xeon bronze (2017).png|50px]] || {{intel|Xeon Bronze}} || style="text-align: left;" | Entry-level performance / <br>Cost-sensitive || 6 || {{tchk|no}} || {{tchk|no}} || {{tchk|yes}} || 1 || 2 || Up to 2
 
|-
 
| [[File:xeon silver (2017).png|50px]] || {{intel|Xeon Silver}} || style="text-align: left;" | Mid-range performance / <br>Efficient lower power || 8-16 || {{tchk|yes}} || {{tchk|yes}} || {{tchk|yes}} || 1 || 2 || Up to 2
 
|-
 
| rowspan="2" | [[File:xeon gold (2017).png|50px]] || {{intel|Xeon Gold}} 5000 || style="text-align: left;" | High performance || 4-18 || {{tchk|yes}} || {{tchk|yes}} || {{tchk|yes}} || 1 || 2 || Up to 4
 
|-
 
| {{intel|Xeon Gold}} 6000 || style="text-align: left;" | Higher performance || 8-24 || {{tchk|yes}} || {{tchk|yes}} || {{tchk|yes}} || 2  || 3 || Up to 4
 
|-
 
| [[File:xeon platinum (2017).png|50px]] || {{intel|Xeon Platinum}} || style="text-align: left;" | Highest performance / flexibility || 4-28 || {{tchk|yes}} || {{tchk|yes}} || {{tchk|yes}} || 2 || 3 || Up to 8
 
|}
 
 
:[[File:xeon sp naming change.svg|800px]]
 
 
=== Identification ===
 
:[[File:cascade lake naming scheme.svg|750px]]
 
 
 
Where,
 
 
* "''F''" suffix integrates the {{intel|Omni-Path}} Host Fabric Interface (HFI) die on-package
 
* "''L''" suffix indicates the SKU is a large memory (4.5 TiB) tier SKU
 
* "''M''" suffix indicates the SKU is a medium memory (2 TiB) tier SKU
 
* "''N''" suffix indicates the SKU is a networking-specialized model
 
* "''S''" suffix indicates the SKU is a search application-specialized model
 
* "''T''" suffix indicates that the SKU has extended-lifetime (10-year use) guarantees and a [[NEBS]]-friendly packing specification
 
* "''U''" suffix indicates the SKU is a single-socket model (even if part of the [[Xeon Gold]] family that normally supports up to 4-way [[SMP]])
 
* "''V''" suffix indicates the SKU targets the VM density value market
 
* "''Y''" suffix indicates the SKU has {{intel|Speed Select Technology}} (SST)
 
 
Note that Speed Select (SST) SKUs were originally suffixed with 'C' but were later changed to 'Y'. Some of the early engineering samples in circulation are still suffixed with a 'C'.
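
The suffix list above maps naturally onto a lookup table. The following Python sketch is illustrative only: the function name <code>describe_suffixes</code> is hypothetical, and it assumes that suffix letters directly follow a four-digit model number (for example 8280L), which is a simplification of the naming scheme shown in the figure.

```python
import re

# Suffix meanings as listed above for Cascade Lake Xeon Scalable SKUs.
SUFFIXES = {
    "F": "integrated Omni-Path Host Fabric Interface (HFI)",
    "L": "large memory tier (4.5 TiB)",
    "M": "medium memory tier (2 TiB)",
    "N": "networking-specialized",
    "S": "search application-specialized",
    "T": "extended lifetime (10-year use), NEBS-friendly",
    "U": "single-socket",
    "V": "VM density value",
    "Y": "Speed Select Technology (SST)",
}


def describe_suffixes(model: str) -> list[str]:
    """Describe the suffix letters that trail a four-digit model number.

    Assumption: the model string looks like '8280L' or '6230'; anything
    else is rejected rather than guessed at.
    """
    match = re.fullmatch(r"(\d{4})([A-Z]*)", model.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized model number: {model!r}")
    # Letters not in the table (such as the early 'C' samples) are reported as unknown.
    return [SUFFIXES.get(letter, "unknown suffix") for letter in match.group(2)]
```

Unknown letters are reported rather than rejected so that early engineering samples with the older 'C' suffix do not raise an error.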
 
 
== Release Dates ==
 
Cascade Lake was released on April 2, 2019. {{intel|Cascade Lake W|l=core}} parts for workstations were released on June 3, 2019.
 
 
== Process Technology ==
 
Cascade Lake is fabricated on Intel's enhanced [[14 nm process]].
 
 
== Compiler support ==
 
 
=== CPUID ===
 
{| class="wikitable tc1 tc2 tc3 tc4 tc5"
 
! Core !! Extended<br>Family !! Family !! Extended<br>Model !! Model !! Stepping
 
|-
 
| rowspan="2" | Cascade Lake || 0 || 0x6 || 0x5 || 0x5 || 0x5..0x7
 
|-
 
| colspan="5" | Family 6 Model 85 Stepping 5..7
 
|}
 
 
Some early CPUs report stepping 4, which normally identifies Skylake X.
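
The table above can be related to the raw processor signature returned in EAX by CPUID leaf 1. The sketch below is a minimal Python decoder, assuming the standard x86 signature layout (stepping in bits 3:0, model in 7:4, family in 11:8, extended model in 19:16, extended family in 27:20). The function names are hypothetical, and the check deliberately ignores the early stepping-4 parts mentioned above.

```python
def decode_signature(eax: int) -> dict:
    """Split a CPUID leaf 1 EAX value into display family, model, and stepping."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    display_family = family + ext_family if family == 0xF else family
    # The extended model supplies the high nibble for families 6 and 0xF.
    display_model = (ext_model << 4) | model if family in (0x6, 0xF) else model
    return {"family": display_family, "model": display_model, "stepping": stepping}


def is_cascade_lake(eax: int) -> bool:
    """True for Family 6, Model 85, Stepping 5..7 as given in the table."""
    sig = decode_signature(eax)
    return sig["family"] == 6 and sig["model"] == 85 and 5 <= sig["stepping"] <= 7
```

For instance, a signature of <code>0x50657</code> decodes to family 6, model 85 (0x55), stepping 7.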
 
 
== Architecture ==
 
[[File:intel cascade core changes.png|right|thumb|Cascade Lake core changes]]
 
As with {{\\|Skylake (server)|Skylake}}, Cascade Lake is also based on the {{intel|Purley|l=platform}} platform and is designed as a drop-in upgrade.
 
=== Key changes from {{\\|Skylake (server)|Skylake}} ===
 
* System Architecture
 
** New [[multi-chip packaged]] SKUs
 
*** Up to 56 cores, 12 DDR4 channels
 
* Core
 
** Execution units
 
*** New {{x86|avx512vnni|VNNI}} logic on Port 0 and Port 1 as part of the FMAs
 
** Higher frequency (100-300 MHz higher for both base/turbo)
 
** Security
 
*** Hardware mitigations for {{cve|CVE-2017-5715}} (Spectre, Variant 2)
 
*** Hardware mitigations for {{cve|CVE-2017-5754}} (Meltdown, Variant 3)
 
*** Hardware mitigations for {{cve|CVE-2018-3640}} (Rogue System Register Read (RSRE), Variant 3a)
 
*** Hardware mitigations for {{cve|CVE-2018-3620}}/{{cve|CVE-2018-3646}} (L1 Terminal Fault, Foreshadow)
 
*** Hardware mitigations for {{cve|CVE-2018-12130}}/{{cve|CVE-2018-12126}}/{{cve|CVE-2018-12127}}/{{cve|CVE-2019-11091}} (MDS; MFBDS, RIDL, MSBDS, Fallout, MLPDS, MDSUM)
 
**** ''note that while steppings 6 & 7 are fully mitigated, the earlier stepping 5 is not protected against MSBDS, MLPDS, or MDSUM''
 
** New {{x86|CPUID}} Level Type field for "die"
 
* Integrated Memory Controller
 
** Added support for [[persistent memory]]
 
*** Support for DDR-T / Optane DIMMs
 
**** Apache Pass DIMMs
 
* Memory
 
** Higher data rate (2933 MT/s, up from 2666 MT/s)
 
** Standard support for up to 1 TiB per socket (up from 768 GiB)
 
** Extended memory support for up to 2 TiB per socket (up from 1.5 TiB)
 
** Large memory support for up to 4.5 TiB per socket
 
* I/O
 
** x64 PCIe lanes exposed to the platform (up from x48) (''{{intel|Xeon W}} only'')
 
 
{{expand list}}
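
To put the memory data-rate increase in the list above into perspective, the theoretical peak bandwidth of a DDR4 channel is the data rate multiplied by the bus width. The Python sketch below assumes a 64-bit (8-byte) data bus per channel with ECC bits excluded; the function name is hypothetical, and sustained bandwidth in practice is lower than this peak.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit DDR4 data bus, ECC bits excluded


def peak_bandwidth_gbs(data_rate_mts: int, channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s) for a number of channels."""
    return data_rate_mts * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


if __name__ == "__main__":
    for rate in (2666, 2933):
        per_channel = peak_bandwidth_gbs(rate)
        six_channels = peak_bandwidth_gbs(rate, channels=6)
        print(f"DDR4-{rate}: {per_channel:.2f} GB/s per channel, "
              f"{six_channels:.1f} GB/s across 6 channels")
```

Under these assumptions, moving from 2666 MT/s to 2933 MT/s raises the per-channel peak by roughly 10%.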
 
==== New instructions ====
 
Cascade Lake introduced a number of {{x86|extensions|new instructions}}:
 
 
* {{x86|avx512vnni|<code>AVX-512 VNNI</code>}} - AVX-512 Vector Neural Network Instructions
 
 
=== Block Diagram ===
 
==== Entire SoC Overview ====
 
===== LCC SoC =====
 
:[[File:skylake sp lcc block diagram.svg|500px]]
 
===== HCC SoC =====
 
:[[File:skylake sp hcc block diagram.svg|600px]]
 
===== XCC SoC =====
 
:[[File:skylake sp xcc block diagram.svg|800px]]
 
===== Individual Core =====
 
<small>The high-level core architecture is identical to that of {{\\|Skylake (server)#Individual Core|Skylake}}.</small>
 
:[[File:skylake server block diagram.svg|950px]]
 
 
=== Memory Hierarchy ===
 
The memory hierarchy of Cascade Lake is identical to {{\\|Skylake (Server)|Skylake's}}.
 
 
* Cache
 
** L0 µOP cache:
 
*** 1,536 µOPs/core, 8-way set associative
 
**** 32 sets, 6-µOP line size
 
**** statically divided between threads, inclusive with L1I
 
** L1I Cache:
 
*** 32 [[KiB]]/core, 8-way set associative
 
**** 64 sets, 64 B line size
 
**** competitively shared by the threads/core
 
** L1D Cache:
 
*** 32 KiB/core, 8-way set associative
 
*** 64 sets, 64 B line size
 
*** competitively shared by threads/core
 
*** 4 cycles for fastest load-to-use (simple pointer accesses)
 
**** 5 cycles for complex addresses
 
*** 128 B/cycle load bandwidth
 
*** 64 B/cycle store bandwidth
 
*** Write-back policy
 
** L2 Cache:
 
*** 1 MiB/core, 16-way set associative
 
*** 64 B line size
 
*** Inclusive
 
*** 64 B/cycle bandwidth to L1$
 
*** Write-back policy
 
*** 14 cycles latency
 
** L3 Cache:
 
*** 1.375 MiB/core, 11-way set associative, shared across all cores
 
**** Note that a few models have non-default cache sizes due to disabled cores
 
*** 2,048 sets, 64 B line size
 
*** Non-inclusive victim cache
 
*** Write-back policy
 
*** 50-70 cycles latency
 
** Snoop Filter (SF):
 
*** 2,048 sets, 12-way set associative
 
* DRAM
 
** 6 channels of DDR4, up to 2933 MT/s
 
*** RDIMM and LRDIMM
 
*** bandwidth of 23.46 GB/s per channel
 
*** aggregated bandwidth of 140.8 GB/s
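
The cache entries in the list above are related by capacity = sets × ways × line size. The Python sketch below checks the listed figures against that relation and derives the L2 set count, which the list does not state; the helper names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def cache_bytes(sets: int, ways: int, line_bytes: int) -> int:
    """Capacity of a set-associative cache in bytes."""
    return sets * ways * line_bytes


def cache_sets(capacity_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets implied by a capacity, associativity, and line size."""
    return capacity_bytes // (ways * line_bytes)


if __name__ == "__main__":
    # L1I / L1D: 64 sets, 8-way, 64 B lines
    print("L1:", cache_bytes(64, 8, 64) // KIB, "KiB")
    # L3 slice: 2,048 sets, 11-way, 64 B lines
    print("L3 per core:", cache_bytes(2048, 11, 64) / MIB, "MiB")
    # L2: 1 MiB, 16-way, 64 B lines (set count derived, not listed above)
    print("L2 sets:", cache_sets(1 * MIB, 16, 64))
    # The µOP cache uses the same relation with a line size counted in µOPs.
    print("µOP cache:", 32 * 8 * 6, "µOPs")
```

The 11-way associativity is what produces the unusual 1.375 MiB per-core L3 slice: 2,048 sets × 11 ways × 64 B.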
 
 
The Cascade Lake TLB hierarchy consists of a dedicated L1 TLB for the instruction cache (ITLB) and another for the data cache (DTLB). Additionally, there is a unified L2 TLB (STLB).
 
* TLBs:
 
** ITLB
 
*** 4 KiB page translations:
 
**** 128 entries; 8-way set associative
 
**** dynamic partitioning
 
*** 2 MiB / 4 MiB page translations:
 
**** 8 entries per thread; fully associative
 
**** Duplicated for each thread
 
** DTLB
 
*** 4 KiB page translations:
 
**** 64 entries; 4-way set associative
 
**** fixed partition
 
*** 2 MiB / 4 MiB page translations:
 
**** 32 entries; 4-way set associative
 
**** fixed partition
 
*** 1 GiB page translations:
 
**** 4 entries; 4-way set associative
 
**** fixed partition
 
** STLB
 
*** 4 KiB + 2 MiB page translations:
 
**** 1536 entries; 12-way set associative
 
**** fixed partition
 
*** 1 GiB page translations:
 
**** 16 entries; 4-way set associative
 
**** fixed partition
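
One way to read the TLB figures above is in terms of reach, meaning the amount of memory that can be mapped without a page walk (entries × page size). The Python sketch below is a back-of-the-envelope calculation with hypothetical helper names; it ignores partitioning between threads.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of address space covered when every entry maps a distinct page."""
    return entries * page_bytes


def tlb_sets(entries: int, ways: int) -> int:
    """Number of sets in a set-associative TLB."""
    if entries % ways:
        raise ValueError("entries must be a multiple of the associativity")
    return entries // ways


if __name__ == "__main__":
    print("DTLB, 4 KiB pages:", tlb_reach(64, 4 * KIB) // KIB, "KiB")
    print("STLB, 4 KiB pages:", tlb_reach(1536, 4 * KIB) // MIB, "MiB")
    print("STLB, 2 MiB pages:", tlb_reach(1536, 2 * MIB) // GIB, "GiB")
    print("STLB, 1 GiB pages:", tlb_reach(16, GIB) // GIB, "GiB")
    print("STLB sets:", tlb_sets(1536, 12))
```

The gap between the 4 KiB and 2 MiB reach of the STLB illustrates why large pages are often used for workloads with big working sets.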
 
 
== Overview ==
 
[[File:skylake server overview.svg|right|550px]]
 
Cascade Lake is Intel's direct successor to the {{\\|Skylake (server)|Skylake server microarchitecture}}. It is designed to be compatible with the Skylake parts ({{intel|LGA-3647}}) and utilize the {{intel|Purley|Purley platform|l=platform}}. To that end, Cascade Lake shares the same socket and pinout as well as the same core count, cache size, and I/O capabilities.
 
 
Cascade Lake introduces initial in-hardware Spectre and Meltdown mitigations, including for Variants 2 and 3 and L1TF. Chips are fabricated on an enhanced [[14 nm process]] which allows Intel to extract additional power efficiency and clock those processors higher. Intel noted that targeted performance improvements were applied to some of the critical paths to make this possible. Although the core architecture is largely identical to that of Skylake, Cascade Lake introduces support for {{x86|AVX512VNNI}} which is designed to improve the performance of [[Artificial Intelligence]] workloads by improving the throughput of tight inner convolutional loop operations.
 
 
The chief modification to Cascade Lake is the overhauling of the [[integrated memory controller]] in order to introduce support for [[persistent memory]]. The IMC on Cascade Lake is capable of interfacing with both DDR4 DIMMs and Intel's Optane DC DIMMs. Memory channels can be shared between DDR4 and Optane DC modules. For example, a single channel can have one regular DDR4 DIMM while the other DIMM can be an Optane DC DIMM. All in all, Optane DC DIMMs allow for greater than 3 TiB of system memory per socket.
 
 
A superset model is shown on the right. Cascade Lake-based servers make use of Intel's {{intel|mesh interconnect}} architecture. In this configuration, the cores, caches, and the memory controllers are organized in rows and columns, with dedicated connections running along each row and column. This allows for the shortest path between any two tiles, reducing latency and improving bandwidth. Those processors are offered with [[4 cores]] up to [[28 cores]], for 8 to 56 threads.
 
 
All models incorporate 6 channels of DDR4 supporting up to 12 DIMMs for a total of 1.5 TiB (with extended models supporting 3 TiB). For I/O, all models incorporate 48 (3x16) lanes of PCIe 3.0. There are an additional four PCIe 3.0 lanes reserved exclusively for DMI to the {{intel|Lewisburg|l=chipset}} (LBG) chipset. A select number of models, specifically those with the ''F'' suffix, have an {{intel|Omni-Path}} Host Fabric Interface (HFI) on-package (see [[#Integrated_Omni-Path|Integrated Omni-Path]]).
 
 
Cascade Lake processors are designed for scalability, supporting 2-way, 4-way, and 8-way multiprocessing through Intel's {{intel|Ultra Path Interconnect}} (UPI) links, with two to three links being offered (see [[#Scalability|§ Scalability]]). High-end models have node controller support allowing for even higher-way configurations (e.g., 32-way multiprocessing).
 
 
== Core ==
 
The Cascade Lake core is largely identical to {{\\|skylake_(server)#Core|Skylake's}}. For in-depth detail of the Skylake core/pipeline, see {{\\|skylake_(client)#Pipeline|Skylake (client) § Pipeline}}.
 
 
=== Execution engine ===
 
==== Scheduler Ports & Execution Units ====
 
[[File:cascade lake scheduler.svg|right|500px]]
 
Although the core is largely identical to {{\\|skylake_(server)#Core|Skylake's}}, there are a few minor changes in Cascade Lake. In order to accommodate the new {{x86|AVX512-VNNI}} instructions, new logic was added on Port 0 and Port 1. Where there were previously two FMA units for doing [[fused multiply-add]] [[floating-point]] operations, in Cascade Lake, new VNNI logic was added to that block which performs a similar operation but works on [[integer]] data types. Support for 8-bit and 16-bit integers was added. It's worth noting that since the dynamic range of these integers is quite low, the accumulation is performed on a 32-bit integer destination.
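
The integer multiply-accumulate described above can be modeled for a single 32-bit lane. The Python sketch below emulates the 8-bit form, following the behavior Intel documents for the non-saturating <code>VPDPBUSD</code> instruction (unsigned bytes times signed bytes, four products summed into a signed 32-bit accumulator). The instruction name is not taken from the text above, and the helper names are hypothetical.

```python
def wrap_int32(value: int) -> int:
    """Wrap an arbitrary Python integer to a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def vpdpbusd_lane(acc: int, a_u8: list[int], b_s8: list[int]) -> int:
    """Emulate one 32-bit lane: acc += sum(a[i] * b[i]) over four byte pairs.

    a_u8 holds unsigned 8-bit values, b_s8 holds signed 8-bit values.
    """
    if len(a_u8) != 4 or len(b_s8) != 4:
        raise ValueError("each lane consumes exactly four byte pairs")
    if any(not 0 <= a <= 255 for a in a_u8):
        raise ValueError("a_u8 values must fit in an unsigned byte")
    if any(not -128 <= b <= 127 for b in b_s8):
        raise ValueError("b_s8 values must fit in a signed byte")
    dot = sum(a * b for a, b in zip(a_u8, b_s8))
    return wrap_int32(acc + dot)
```

With the largest inputs, <code>vpdpbusd_lane(0, [255] * 4, [127] * 4)</code> yields 129540, which already exceeds the signed 16-bit range and illustrates why the accumulator is 32 bits wide.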
 
 
 
{{clear}}
 
 
== Higher core-count multi-chip processors ==
 
[[File:cascade lake ap overview.png|thumb|right|Overview slide]]
 
[[File:intel cascade lake ap chip with heatsink.JPG|right|thumb|Cascade Lake AP]]
 
[[File:cascade lake ap board.JPG|right|thumb|Cascade Lake AP board]]
 
{{main|intel/cores/cascade_lake_ap|l1=Cascade Lake Advanced Performance}}
 
Intel introduced a number of new products, codenamed {{intel|Cascade Lake Advanced Performance|l=core}}, which doubled the core count. Intel achieved this by packaging two extreme core count (XCC) dies together on the same substrate in a BGA package, which paved the way for models with up to 56 cores. The two dies are interconnected through a single {{intel|UPI}} link. This was done to support a fully connected topology in 2-way SMP systems, which is where those chips are designed to go. In those systems, every die (four in total) is interconnected to every other die over a {{intel|UPI}} link. With two dies per package, each CPU exposes 12 DDR4 memory channels and x40 PCIe Gen3 lanes.
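
The link budget implied by the paragraph above can be worked out by enumerating die pairs. The Python sketch below assumes 28 cores and 6 DDR4 channels per XCC die and a fully connected 2-way system; the function and key names are hypothetical.

```python
from itertools import combinations

DIES_PER_PACKAGE = 2
CORES_PER_XCC_DIE = 28      # largest XCC configuration
DDR4_CHANNELS_PER_DIE = 6


def fully_connected_links(packages: int = 2) -> dict:
    """Count the UPI links needed so that every die reaches every other die directly."""
    dies = [(pkg, die) for pkg in range(packages) for die in range(DIES_PER_PACKAGE)]
    links = list(combinations(dies, 2))
    on_package = [pair for pair in links if pair[0][0] == pair[1][0]]
    off_package = [pair for pair in links if pair[0][0] != pair[1][0]]
    return {
        "dies": len(dies),
        "links_total": len(links),
        "links_on_package": len(on_package),
        "links_between_packages": len(off_package),
        "links_per_die": len(dies) - 1,
        # Every off-package link has one endpoint in each of two packages.
        "external_links_per_package": len(off_package) * 2 // packages,
    }


if __name__ == "__main__":
    print(fully_connected_links())
    print("cores per package:", DIES_PER_PACKAGE * CORES_PER_XCC_DIE)
    print("DDR4 channels per package:", DIES_PER_PACKAGE * DDR4_CHANNELS_PER_DIE)
```

In the 2-way case each die uses three UPI links: one to its on-package neighbor and two to the dies in the other package, which leaves four links leaving each package.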
 
 
 
:[[File:cascade lake ap block diagram (2-way).svg|600px]]
 
 
 
=== Package ===
 
Cascade Lake AP uses a high-density high-performance ball grid array (BGA) package. It has 5,903 balls at a 0.99 mm pitch. There are two separate power corridors for performance and power reasons. It uses a single heat spreader designed to cover the entire TDP range for all processor models. In total, each package exposes 12 channels of DDR4 supporting rates of up to 2933 MT/s as well as 4 {{intel|UPI}} links operating at 10.4 GT/s.
 
 
 
<div>
 
<div style="float: left;">'''Package Top side view'''<br>[[File:cascade lake ap package.png|400px]]</div>
 
<div style="float: left;">'''Package bottom side view'''<br>[[File:cascade lake ap package bottom.jpg|400px]]</div>
 
</div>
 
 
{{clear}}
 
 
=== CPUID level type ===
 
Because there are multiple dies in the same package for {{intel|Cascade Lake AP|l=core}} processors, software that wants to take this into account for better optimization can do so through a new {{x86|CPUID}} level type value, ''5 = die''.
 
 
{| class="wikitable"
 
|-
 
! Initial EAX Value !! Register !! Level Type
 
|-
 
| 0x1F || ECX[15:8]
 
|
 
* 0 - Invalid
 
* 1 - SMT
 
* 2 - Core
 
* 3 - Module
 
* 4 - Tile
 
* '''5 - Die'''
 
* 6+ - Reserved
 
|}
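
The level type values in the table above can be decoded from a raw register value as follows. This Python sketch assumes the layout given in Intel's documentation for CPUID leaf 0x1F, where ECX bits 7:0 hold the level number and ECX bits 15:8 hold the level type; the function names are hypothetical, and obtaining the register value itself is outside the scope of the sketch.

```python
LEVEL_TYPES = {
    0: "Invalid",
    1: "SMT",
    2: "Core",
    3: "Module",
    4: "Tile",
    5: "Die",
}


def level_number(ecx: int) -> int:
    """Sub-leaf level number, ECX bits 7:0."""
    return ecx & 0xFF


def level_type(ecx: int) -> str:
    """Level type name, ECX bits 15:8; values of 6 and above are reserved."""
    return LEVEL_TYPES.get((ecx >> 8) & 0xFF, "Reserved")
```

Software would typically iterate over sub-leaves until the level type reads as Invalid, collecting the shift widths for each level along the way.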
 
 
== New Technologies ==
 
=== AVX-512 Vector Neural Network Instructions ===
 
{{main|x86/avx512vnni|l1=AVX-512 Vector Neural Network Instructions}}
 
Cascade Lake added support for AVX-512 Vector Neural Network Instructions (AVX512 VNNI). This extension introduces new instructions for accelerating inner [[convolutional neural network]] loops. Operations on both 8-bit and 16-bit pairs are supported. The new extension reduces the memory bandwidth required to perform a scalar-pair multiply followed by the summation of horizontal pairs and an accumulation. For 16-bit operations, two common operations were fused into a single instruction, while for 8-bit operations, three common operations were fused into one.
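
The fusion described above can be illustrated for the 16-bit case on a single 32-bit lane. The Python sketch below models the two-instruction sequence as <code>VPMADDWD</code> followed by <code>VPADDD</code> and the fused form as <code>VPDPWSSD</code>; those mnemonics come from Intel's instruction set reference rather than from the text above, and the lane-level helpers are hypothetical simplifications of the vector instructions.

```python
def wrap_int32(value: int) -> int:
    """Wrap an arbitrary Python integer to a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def vpmaddwd_lane(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Multiply two signed 16-bit pairs and add the two products."""
    return wrap_int32(a[0] * b[0] + a[1] * b[1])


def vpaddd_lane(x: int, y: int) -> int:
    """32-bit integer add with wrap-around."""
    return wrap_int32(x + y)


def vpdpwssd_lane(acc: int, a: tuple[int, int], b: tuple[int, int]) -> int:
    """Fused form: multiply the pairs, sum the products, and accumulate."""
    return wrap_int32(acc + a[0] * b[0] + a[1] * b[1])


if __name__ == "__main__":
    acc, a, b = 100, (300, -200), (50, 70)
    two_step = vpaddd_lane(acc, vpmaddwd_lane(a, b))
    fused = vpdpwssd_lane(acc, a, b)
    print(two_step, fused)
```

Because all arithmetic wraps modulo 2<sup>32</sup>, the fused lane produces the same result as the two-step sequence in this model.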
 
 
=== Speed Select Technology ===
 
{{main|intel/speed select technology|l1=Speed Select Technology}}
 
One of the other features added in Cascade Lake is Speed Select Technology (SST), which allows a chip to be configured in the field for various workloads, such as throughput or single-core performance. This is done through configurations that allow the end user to determine how the power budget is spent. For example, single-thread performance can be further increased on fewer cores by further decreasing the base frequency of the more inactive cores. Alternatively, higher throughput can be achieved by increasing the base frequency of some cores while reducing the base frequency of other cores. All in all, the configurations are ultimately bound by the power budget of the chip; however, the custom tweaking of frequency (base, turbo, or otherwise) enables custom tuning of affinitized workloads.
 
 
== Specialized SKUs ==
 
Though Intel has been providing various customers with specialized SKUs for a long time, with Cascade Lake, Intel started doubling down on specialized SKUs. Many SKUs that were sought after by customers were introduced as mainstream SKUs in Intel's standard Xeon lineup. New SKUs can be grouped into the following categories:
 
 
* Speed Select Technology SKUs
 
* Specialized for network function virtualization
 
* Specialized for networking and IoT (NEBS Friendly)
 
* Specialized Search Application
 
* Extended Memory
 
 
=== Speed Select Technology (SST) ===
 
{{main|intel/speed_select_technology|l1=Speed Select Technology (SST)}}
 
Speed Select Technology is a new feature found on SST-enabled SKUs that allows for finer per-core power/performance configuration. SST-enabled SKUs come with additional controls that allow system administrators to change the turbo and base frequencies of certain cores. Those cores (called prioritized cores) can then have certain applications with higher priority affinitized to them. Since the power budget of the processor is fixed, with fewer prioritized cores, it’s possible to increase the base or turbo frequency. This can be further improved by reducing the frequencies of lower-priority cores (below their pre-defined base frequencies). In other words, SST allows for higher performance for priority workloads through the sacrifice of lower-priority workloads.
 
 
=== VM Density Value Specialized SKUs ===
 
VM density value-optimized SKUs are optimized to provide higher ROI for customers that benefit more from a higher thread count. There are two new VM density SKUs. Those models are suffixed with a ‘V’.
 
 
=== Network Function Virtualization (NFV) SKUs ===
 
NFV models are optimized for dynamic density VMs with additional headroom for higher subscriber capacity. NFV SKUs also feature speed select profiles with configurable priority based on the kind of workloads that are running. There are three new NFV SKUs. Those models are suffixed with an ‘N’.
 
 
=== Search Application Value specialized ===
 
The search-optimized SKU is mainly designed for cloud search applications. It is designed to have predictable performance and latencies. There is one new search-optimized SKU. It is suffixed with an ‘S’.
 
 
=== Specialized for networking and IoT (NEBS Friendly) ===
 
Models that are suffixed with “T” have extended-lifetime (10-year use) guarantees and a NEBS-friendly packing specification. There are four new NEBS-friendly SKUs.
 
 
=== Extended Memory SKUs ===
 
Beyond the normal supported memory, Intel also offers fourteen additional SKUs with extended memory support. Xeon Scalable models with the ‘M’ suffix offer medium DDR memory tier support with up to 2 TiB of memory per socket. Above those SKUs are ‘L’-suffixed SKUs which offer large memory support of up to 4.5 TiB of memory per socket.
 
 
== Persistent memory support ==
 
{{see also|x86/persistent_memory_extensions|snia/npm|l1=Persistent Memory Extensions|l2=NVM Programming Model}}
 
Cascade Lake introduces support for persistent memory. For Cascade Lake, this comes in the form of first-generation Optane DC Persistent Memory Modules (DCPMM), codenamed Apache Pass. PMMs are designed to improve overall data center system performance by bringing a larger amount of data closer to the processor in terms of latency, though still behind DRAM.
 
 
For first-generation Optane DC DIMMs, Intel supports capacities of 128, 256, and 512 GiB. The DIMMs are DDR4 pin-compatible and although they are slightly slower than DRAM, they are considerably faster than a typical SSD, fast enough to double as "slow main memory". Optane DC DIMMs are designed to work with direct byte-addressable load/store accesses and have built-in encryption. They allow cache line access (i.e., 64 B granularity) and offer idle latency close to that of DDR4 DIMMs.
 
 
[[File:cascade-presistence.png|right|thumb|Persistence domain]]
 
Although Optane DC DIMMs are DDR4 pin-compatible, meaning they use the same electrical and mechanical interface as DDR4, they are not a direct drop-in replacement. Those DIMMs have different characteristics and therefore they interface with the CPU over proprietary protocol extensions. For this reason, Cascade Lake features an overhauled memory controller capable of interfacing with both DDR4 DIMMs and Optane DC DIMMs. Memory channels can be shared between DDR4 and Optane DC modules. For example, a single channel can have one regular DDR4 DIMM while the other DIMM can be an Optane DC DIMM. All in all, Optane DC DIMMs allow for greater than 3 TiB of system memory per socket.
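
The capacity claim above can be checked with simple arithmetic. The Python sketch below assumes six memory channels per socket, each populated with one DDR4 DIMM and one Optane DC DIMM as in the example above; the 64 GiB DDR4 DIMM size used in the usage lines is an arbitrary illustrative choice, and the function name is hypothetical.

```python
CHANNELS_PER_SOCKET = 6
OPTANE_DIMM_GIB = (128, 256, 512)  # first-generation capacities


def socket_capacity_gib(optane_gib: int, ddr4_gib: int,
                        channels: int = CHANNELS_PER_SOCKET) -> dict:
    """Per-socket capacity with one DDR4 and one Optane DC DIMM on every channel."""
    if optane_gib not in OPTANE_DIMM_GIB:
        raise ValueError(f"unsupported Optane DC DIMM capacity: {optane_gib} GiB")
    optane_total = channels * optane_gib
    dram_total = channels * ddr4_gib
    return {
        "optane": optane_total,
        "dram": dram_total,
        "combined": optane_total + dram_total,
    }


if __name__ == "__main__":
    totals = socket_capacity_gib(optane_gib=512, ddr4_gib=64)
    print(f"Optane: {totals['optane'] / 1024} TiB, "
          f"combined: {totals['combined'] / 1024} TiB")
```

Six 512 GiB modules alone provide 3 TiB, so any DDR4 populated alongside them pushes the combined figure past 3 TiB per socket.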
 
 
Although the instructions to support persistent memory were already introduced in {{\\|Skylake (Server)|Skylake}}, Cascade Lake can now actually make use of them. The bare minimum for persistent memory support can be realized by simply putting the data in the write pending queue (WPQ) of the integrated memory controller within the persistence domain. The persistence domain is a unique checkpoint. Once data makes it to that point, its persistence is guaranteed by the platform interface. This is shown in the diagram on the right in the bottom dotted box. Any data within that box is either saved on the DIMM, on the way to the DIMM, or in the WPQ in the IMC. Regardless of where it is, the platform is required to store enough energy (e.g., through on-board supercapacitors) to save everything within that box in the event of a power loss.
 
 
=== Encryption ===
 
[[File:cascade-lake-optane-dimm-encryption.png|right|thumb|PMM Encryption]]
 
One of the unique problems associated with persistent memory is the security of the persistent data itself. To that end, Optane DIMMs protect all data with 256-bit AES-XTS. There are two supported modes. The first one is Memory Mode, where the DIMM is treated like DRAM (with the DRAM acting as one big cache) and as such, the key is regenerated on each boot and data is lost on a power cycle. The second mode is App Direct, which keeps the key across a power cycle. In this mode, the passphrase must be securely stored to unlock the data. Interestingly, one of the features currently missing is support for any form of virtualization. Currently, in a virtualized environment, the DIMM is unlocked using a single passphrase, meaning the host has access to all the data. It’s reasonable to expect that in future Xeons, support for virtualization will be added such that data can remain encrypted and private to each VM.
 
 
== Scalability ==
 
{{see also|intel/ultra path interconnect|l1=Ultra Path Interconnect}}
 
Cascade Lake continues to use {{intel|Ultra Path Interconnect}} (UPI) which was {{\\|Skylake (server)#Scalability|first introduced}} with {{\\|Skylake (server)|Skylake}}. UPI is a high-efficiency coherent interconnect for scalable systems, allowing multiple processors to share a single shared address space. Depending on the exact model, each processor can have either two or three UPI links connecting to the other processors.
 
 
Depending on the exact model, Cascade Lake processors can scale from 2-way all the way up to 8-way multiprocessing. Note that the high-end models that support 8-way multiprocessing come only with three UPI links for this purpose, while the lower-end processors can have either two or three UPI links. Below are the typical configurations for those processors.
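
How the number of UPI links constrains the topology can be expressed with two small rules: a system can be fully connected only if each socket has at least one link per peer, and a connected system in which every socket uses exactly two links to distinct neighbors forms a ring. The Python sketch below encodes just those two rules with hypothetical function names; it does not model the actual 8-way wiring, which is shown in the diagrams.

```python
def can_fully_connect(sockets: int, upi_links: int) -> bool:
    """True if every socket can have a direct link to every other socket."""
    if sockets < 2 or upi_links < 1:
        raise ValueError("need at least two sockets and one link per socket")
    return upi_links >= sockets - 1


def ring_max_hops(sockets: int) -> int:
    """Worst-case hop count between two sockets arranged in a ring."""
    if sockets < 2:
        raise ValueError("a ring needs at least two sockets")
    return sockets // 2


if __name__ == "__main__":
    for sockets, links in ((2, 2), (4, 2), (4, 3), (8, 3)):
        direct = can_fully_connect(sockets, links)
        print(f"{sockets}-way with {links} UPI links: fully connected = {direct}")
    print("4-way ring worst case:", ring_max_hops(4), "hops")
```

Under these rules, a 4-way system needs three links per socket to avoid multi-hop accesses, while an 8-way system with three links per socket necessarily has some socket pairs that are more than one hop apart.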
 
 
 
<div style="display: inline-block;">
 
<div style="float: left; margin: 15px; text-align: center;">'''2-way SMP; 2 UPI links'''<br><br>[[File:cascade lake sp 2-way 2 upi.svg|400px]]</div>
 
<div style="float: left; margin: 15px; text-align: center;">'''2-way SMP; 3 UPI links'''<br><br>[[File:cascade lake sp 2-way 3 upi.svg|400px]]</div>
 
</div>
 
 
 
<div style="display: inline-block;">
 
<div style="float: left; margin: 15px; text-align: center;">'''4-way SMP; 2 UPI links'''<br><br>[[File:cascade lake sp 4-way 2 upi.svg|400px]]</div>
 
<div style="float: left; margin: 15px; text-align: center;">'''4-way SMP; 3 UPI links'''<br><br>[[File:cascade lake sp 4-way 3 upi.svg|400px]]</div>
 
</div>
 
 
 
<div style="display: inline-block;">
 
<div style="text-align: center;">'''8-way SMP; 3 UPI links'''<br><br>[[File:cascade lake sp 8-way 3 upi.svg|400px]]</div>
 
</div>
 
 
== All Cascade Lake Chips ==
 
<!-- NOTE:
 
          This table is generated automatically from the data in the actual articles.
 
          If a microprocessor is missing from the list, an appropriate article for it needs to be
 
          created and tagged accordingly.
 
 
          Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips
 
-->
 
{{comp table start}}
 
<table class="comptable sortable tc6 tc7 tc14 tc15">
 
<tr class="comptable-header"><th>&nbsp;</th><th colspan="24">List of Cascade Lake Processors</th></tr>
 
<tr class="comptable-header"><th>&nbsp;</th><th colspan="9">Main processor</th><th colspan="2">Frequency/{{intel|Turbo Boost|Turbo}}</th><th>Mem</th><th colspan="7">Major Feature Diff</th></tr>
 
{{comp table header 1|cols=Launched, Price, Family, Core Name, Cores, Threads, %L2$, %L3$, TDP, %Frequency, %Max Turbo, Max Mem, Turbo, SMT}}
 
<tr class="comptable-header comptable-header-sep"><th>&nbsp;</th><th colspan="25">[[Uniprocessors]]</th></tr>
 
{{#ask: [[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Cascade Lake]] [[max cpu count::1]]
 
|?full page name
 
|?model number
 
|?first launched
 
|?release price
 
|?microprocessor family
 
|?core name
 
|?core count
 
|?thread count
 
|?l2$ size
 
|?l3$ size
 
|?tdp
 
|?base frequency#GHz
 
|?turbo frequency (1 core)#GHz
 
|?max memory#GiB
 
|?has intel turbo boost technology 2_0
 
|?has simultaneous multithreading
 
|format=template
 
|template=proc table 3
 
|searchlabel=
 
|sort=microprocessor family, model number
 
|order=asc,asc
 
|userparam=16:15
 
|mainlabel=-
 
|limit=200
 
}}
 
<tr class="comptable-header comptable-header-sep"><th>&nbsp;</th><th colspan="25">[[Multiprocessors]] (2-way)</th></tr>
 
{{#ask:
 
[[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Cascade Lake]] [[max cpu count::2]]
 
|?full page name
 
|?model number
 
|?first launched
 
|?release price
 
|?microprocessor family
 
|?core name
 
|?core count
 
|?thread count
 
|?l2$ size
 
|?l3$ size
 
|?tdp
 
|?base frequency#GHz
 
|?turbo frequency (1 core)#GHz
 
|?max memory#GiB
 
|?has intel turbo boost technology 2_0
 
|?has simultaneous multithreading
 
|format=template
 
|template=proc table 3
 
|searchlabel=
 
|sort=microprocessor family, model number
 
|order=asc,asc
 
|userparam=16:15
 
|mainlabel=-
 
|limit=60
 
}}
 
<tr class="comptable-header comptable-header-sep"><th>&nbsp;</th><th colspan="25">[[Multiprocessors]] (4-way)</th></tr>
 
{{#ask:
 
[[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Cascade Lake]] [[max cpu count::4]]
 
|?full page name
 
|?model number
 
|?first launched
 
|?release price
 
|?microprocessor family
 
|?core name
 
|?core count
 
|?thread count
 
|?l2$ size
 
|?l3$ size
 
|?tdp
 
|?base frequency#GHz
 
|?turbo frequency (1 core)#GHz
 
|?max memory#GiB
 
|?has intel turbo boost technology 2_0
 
|?has simultaneous multithreading
 
|format=template
 
|template=proc table 3
 
|searchlabel=
 
|sort=microprocessor family, model number
 
|order=asc,asc
 
|userparam=16:15
 
|mainlabel=-
 
|limit=60
 
}}
 
<tr class="comptable-header comptable-header-sep"><th>&nbsp;</th><th colspan="25">[[Multiprocessors]] (8-way)</th></tr>
 
{{#ask:
 
[[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Cascade Lake]] [[max cpu count::8]]
 
|?full page name
 
|?model number
 
|?first launched
 
|?release price
 
|?microprocessor family
 
|?core name
 
|?core count
 
|?thread count
 
|?l2$ size
 
|?l3$ size
 
|?tdp
 
|?base frequency#GHz
 
|?turbo frequency (1 core)#GHz
 
|?max memory#GiB
 
|?has intel turbo boost technology 2_0
 
|?has simultaneous multithreading
 
|format=template
 
|template=proc table 3
 
|searchlabel=
 
|sort=microprocessor family, model number
 
|order=asc,asc
 
|userparam=16:15
 
|mainlabel=-
 
|limit=60
 
}}
 
{{comp table count|ask=[[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Cascade Lake]]}}
 
</table>
 
{{comp table end}}
 
 
=== SKU Comparison ===
 
Below are a number of SKU comparison graphs based on their specifications.
 
 
<div style="float: left; margin: 10px">
 
{{#ask: [[Category:microprocessor models by intel]] [[microarchitecture::Cascade Lake]]
 
|?core count
 
|?base frequency
 
|charttitle=Cores vs. Base Frequency
 
|numbersaxislabel=Frequency (MHz)
 
|labelaxislabel=Core Count
 
|height=400
 
|width=400
 
|theme=vector
 
|group=property
 
|grouplabel=subject
 
|charttype=scatter
 
|format=jqplotseries
 
|mainlabel=-
 
}}
 
</div>
 
 
<div style="float: left; margin: 10px">
 
{{#ask: [[Category:microprocessor models by intel]] [[microarchitecture::Cascade Lake]]
 
|?core count
 
|?turbo frequency (1 core)
 
|charttitle=Cores vs. Turbo Frequency
 
|numbersaxislabel=Frequency (MHz)
 
|labelaxislabel=Core Count
 
|height=400
 
|width=400
 
|theme=vector
 
|group=property
 
|grouplabel=subject
 
|charttype=scatter
 
|format=jqplotseries
 
|mainlabel=-
 
}}
 
</div>
 
 
<div style="float: left; margin: 10px">
 
{{#ask: [[Category:microprocessor models by intel]] [[microarchitecture::Cascade Lake]]
 
|?core count
 
|?tdp
 
|charttitle=Cores vs. TDP
 
|numbersaxislabel=TDP (W)
 
|labelaxislabel=Core Count
 
|height=400
 
|width=400
 
|theme=vector
 
|group=property
 
|grouplabel=subject
 
|charttype=scatter
 
|format=jqplotseries
 
|mainlabel=-
 
}}
 
</div>
 
 
<div style="float: left; margin: 10px;">
 
{{#ask: [[Category:microprocessor models by intel]] [[microarchitecture::Cascade Lake]]
 
|?turbo frequency (1 core)
 
|?tdp
 
|charttitle=Frequency vs. TDP
 
|numbersaxislabel=TDP (W)
 
|labelaxislabel=Frequency (MHz)
 
|height=400
 
|width=90%
 
|theme=vector
 
|group=property
 
|grouplabel=subject
 
|charttype=scatter
 
|format=jqplotseries
 
|mainlabel=-
 
}}
 
</div>
 
 
{{clear}}
 
 
== Bibliography ==
 
* Intel DC Tech Day, May 2019
 
* Intel. ''personal communication''.
 
 
== Documents ==
 
* [[:File:cascade-lake-advanced-performance-press-deck.pdf|Cascade Lake Advanced Performance]]
 

Revision as of 18:42, 23 March 2020
