|stages min=14
|stages max=19
|isa=x86-64
|extension=MOVBE
|extension 2=MMX
|l3 desc=11-way set associative
|core name=Skylake X
|core name 2=Skylake W
|core name 3=Skylake SP
|predecessor=Broadwell
|predecessor link=intel/microarchitectures/broadwell
{| class="wikitable"
|-
! Core !! Abbrev !! Platform !! Target
|-
| {{intel|Skylake SP|l=core}} || SKL-SP || {{intel|Purley|l=platform}} || Server Scalable Processors
|-
| {{intel|Skylake X|l=core}} || SKL-X || {{intel|Basin Falls|l=platform}} || High-end desktops & enthusiasts market
|-
| {{intel|Skylake W|l=core}} || SKL-W || {{intel|Basin Falls|l=platform}} || Enterprise/Business workstations
|-
| {{intel|Skylake DE|l=core}} || SKL-DE || || Dense server/edge computing
|}
|-
! Cores !! {{intel|Hyper-Threading|HT}} !! {{intel|Turbo Boost|TBT}} !! {{x86|AVX-512}} !! AVX-512 Units !! {{intel|Ultra Path Interconnect|UPI}} links !! Scalability
|-
| [[File:xeon logo (2015).png|50px|link=intel/xeon d]] || {{intel|Xeon D}} || style="text-align: left;" | Dense servers / edge computing || [[4 cores|4]]-[[18 cores|18]] || {{tchk|yes}} || {{tchk|yes}} || {{tchk|yes}} || 1 || colspan="2" {{tchk|no}}
|-
| [[File:xeon logo (2015).png|50px|link=intel/xeon w]] || {{intel|Xeon W}} || style="text-align: left;" | Business workstations || [[4 cores|4]]-[[18 cores|18]] || {{tchk|yes}} || {{tchk|yes}} || {{tchk|yes}} || 2 || colspan="2" {{tchk|no}}
|-
| [[File:xeon bronze (2017).png|50px]] || {{intel|Xeon Bronze}} || style="text-align: left;" | Entry-level performance / <br>Cost-sensitive || [[6 cores|6]] - [[8 cores|8]] || {{tchk|no}} || {{tchk|no}} || {{tchk|yes}} || 1 || 2 || Up to 2

== Process Technology ==
{{main|14 nm lithography process}}
Unlike mainstream Skylake models, all Skylake server configuration models are fabricated on Intel's [[14 nm process#Intel|enhanced 14+ nm process]], the same process used by {{\\|Kaby Lake}}.
  
|-
| Linux || Linux || style="background-color: #d6ffd8;" | Kernel 3.19 || Initial Support (MPX support)
|-
| Apple || macOS || style="background-color: #d6ffd8;" | 10.12.3 || iMac Pro
|}
  
! Core !! Extended<br>Family !! Family !! Extended<br>Model !! Model
|-
| rowspan="2" | {{intel|Skylake X|X|l=core}}, {{intel|Skylake SP|SP|l=core}}, {{intel|Skylake DE|DE|l=core}}, {{intel|Skylake W|W|l=core}} || 0 || 0x6 || 0x5 || 0x5
|-
| colspan="4" | Family 6 Model 85
|}
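These fields come from CPUID leaf 1 (EAX=01H). As a minimal sketch of the decoding (using GCC/Clang's <code>cpuid.h</code> helper; the base/extended combining rules follow the standard x86 convention):

<source lang="c">
#include <stdio.h>
#include <cpuid.h>  // GCC/Clang helper; MSVC would use __cpuid instead

int main(void) {
    unsigned eax, ebx, ecx, edx;
    __get_cpuid(1, &eax, &ebx, &ecx, &edx);

    unsigned model      = (eax >> 4)  & 0xF;
    unsigned family     = (eax >> 8)  & 0xF;
    unsigned ext_model  = (eax >> 16) & 0xF;
    unsigned ext_family = (eax >> 20) & 0xFF;

    // Displayed values combine the base and extended fields.
    unsigned disp_family = (family == 0xF) ? family + ext_family : family;
    unsigned disp_model  = (family == 0x6 || family == 0xF)
                               ? (ext_model << 4) | model : model;

    // Skylake server parts report Family 6 Model 85 (0x55).
    printf("Family %u Model %u\n", disp_family, disp_model);
    return 0;
}
</source>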

== Architecture ==
Skylake server configuration introduces a number of significant changes from both Intel's previous microarchitecture, {{\\|Broadwell}}, as well as the {{\\|Skylake (client)}} architecture. Unlike client models, Skylake servers and HEDT models still incorporate the fully integrated voltage regulator (FIVR) on-die. Those chips also have an entirely new multi-core system architecture that brought a new {{intel|mesh interconnect}} network (from [[ring topology]]).

=== Key changes from {{\\|Broadwell}} ===
* Improved "14 nm+" process (see {{\\|kaby_lake#Process_Technology|Kaby Lake § Process Technology}})
* {{intel|Omni-Path Architecture}} (OPA)
* {{intel|Mesh architecture}} (from {{intel|Ring architecture|ring}})
** {{intel|Sub-NUMA Clustering}} (SNC) support (replaces the {{intel|Cluster-on-Die}} (COD) implementation)
* Chipset
** DMI upgraded to Gen3
* Core
** All the changes from Skylake Client (For full list, see {{\\|Skylake (Client)#Key changes from Broadwell|Skylake (Client) § Key changes from Broadwell}})
** Front End
*** LSD is disabled (Likely due to a bug; see [[#Front-end|§ Front-end]] for details)
** Back-end
*** Port 4 now performs 512b stores (from 256b)
*** Store is now 64B/cycle (from 32B/cycle)
*** Load is now 2x64B/cycle (from 2x32B/cycle)
*** New Features
**** Adaptive Double Device Data Correction (ADDDC)
* Memory
** L2$
*** Increased to 1 MiB/core (from 256 KiB/core)
*** Latency increased from 12 to 14 cycles
** L3$
*** Reduced to 1.375 MiB/core (from 2.5 MiB/core)
*** Now non-inclusive (was inclusive)
** DRAM
*** hex-channel DDR4-2666 (from quad-channel)

==== CPU changes ====
See {{\\|Skylake (Client)#CPU changes|Skylake (Client) § CPU changes}}

==== New instructions ====
Skylake server introduced a number of {{x86|extensions|new instructions}}:

* {{x86|MPX|<code>MPX</code>}} - Memory Protection Extensions
* {{x86|XSAVEC|<code>XSAVEC</code>}} - Save processor extended states with compaction to memory
* {{x86|XSAVES|<code>XSAVES</code>}} - Save processor supervisor-mode extended states to memory
* {{x86|AVX-512|<code>AVX-512</code>}} - Advanced Vector Extensions 512:
** {{x86|AVX512F|<code>AVX512F</code>}} - AVX-512 Foundation
** {{x86|AVX512CD|<code>AVX512CD</code>}} - AVX-512 Conflict Detection
** {{x86|AVX512BW|<code>AVX512BW</code>}} - AVX-512 Byte and Word
** {{x86|AVX512DQ|<code>AVX512DQ</code>}} - AVX-512 Doubleword and Quadword
** {{x86|AVX512VL|<code>AVX512VL</code>}} - AVX-512 Vector Length
* {{x86|PKU|<code>PKU</code>}} - Memory Protection Keys for Userspace
* {{x86|PCOMMIT|<code>PCOMMIT</code>}} - Persistent commit (since deprecated)
* {{x86|CLWB|<code>CLWB</code>}} - Force cache line write-back without flush
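As an illustration of how software typically uses <code>CLWB</code> (e.g., making stores to a persistent-memory region durable without evicting the lines), here is a minimal sketch using the compiler intrinsic from <code>immintrin.h</code> (compile with <code>-mclwb</code>; the helper name and fallback policy are illustrative, not from Intel):

<source lang="c">
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

// Write back every cache line backing [buf, buf+len) without invalidating
// it, then fence so the write-backs are ordered before subsequent stores.
// Sketch only: real persistent-memory code (e.g. PMDK) also detects CLWB
// support at runtime and falls back to CLFLUSHOPT/CLFLUSH.
static void flush_range(const void *buf, size_t len) {
    const uintptr_t line = 64;  // cache line size on Skylake
    uintptr_t p = (uintptr_t)buf & ~(line - 1);
    for (; p < (uintptr_t)buf + len; p += line)
        _mm_clwb((void *)p);
    _mm_sfence();
}
</source>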
  
 
=== Block Diagram ===
==== Entire SoC Overview ====
===== LCC SoC =====
:[[File:skylake sp lcc block diagram.svg|500px]]

===== HCC SoC =====
:[[File:skylake sp hcc block diagram.svg|600px]]

===== XCC SoC =====
:[[File:skylake sp xcc block diagram.svg|800px]]

===== Individual Core =====
:[[File:skylake server block diagram.svg|850px]]
  
 
=== Memory Hierarchy ===
* Cache
** L0 µOP cache:
*** 1,536 µOPs/core, 8-way set associative
**** 32 sets, 6-µOP line size
**** statically divided between threads, inclusive with L1I
** L1I Cache:
*** 32 [[KiB]]/core, 8-way set associative
**** 64 sets, 64 B line size
**** competitively shared by the threads/core
** L1D Cache:
*** 32 KiB/core, 8-way set associative
*** 64 sets, 64 B line size
*** competitively shared by threads/core
*** 4 cycles for fastest load-to-use (simple pointer accesses)
**** 5 cycles for complex addresses
*** Write-back policy
** L2 Cache:
*** 1 MiB/core, 16-way set associative
*** 64 B line size
*** Inclusive
*** 64 B/cycle bandwidth to L1$
*** Write-back policy
*** 14 cycles latency
 
** L3 Cache:
*** 1.375 MiB/core, 11-way set associative, shared across all cores
**** Note that a few models have non-default cache sizes due to disabled cores
*** 2,048 sets, 64 B line size
*** Non-inclusive victim cache
*** Write-back policy
*** 50-70 cycles latency
** Snoop Filter (SF):
*** 2,048 sets, 12-way set associative
* DRAM
** 6 channels of DDR4, up to 2666 MT/s
*** RDIMM and LRDIMM
*** 21.33 GB/s bandwidth per channel
*** aggregated bandwidth of 128 GB/s
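The per-channel figure is straightforward arithmetic: 2666 MT/s × 8 bytes per transfer ≈ 21.33 GB/s, and six channels give the ≈128 GB/s aggregate. The latency tiers listed above can be observed in software with a dependent pointer chase; below is a rough sketch (deliberately simplified — a real benchmark would randomize the chain to defeat the hardware prefetchers and pin the thread to one core):

<source lang="c">
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static volatile size_t sink;  // keeps the chase loop from being optimized out

// Dependent pointer chase: each load's address depends on the previous one,
// so the time per step approximates the load-to-use latency of whichever
// cache level the working set fits in (~4-5 cycles L1D, ~14 cycles L2,
// ~50-70 cycles L3 per the hierarchy above).
static double ns_per_load(size_t entries) {
    size_t *next = malloc(entries * sizeof *next);
    for (size_t i = 0; i < entries; i++)
        next[i] = (i + 8) % entries;  // 8 entries * 8 B = one 64 B line ahead
    size_t idx = 0;
    const size_t iters = 100000000;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++)
        idx = next[idx];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    sink = idx;
    free(next);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / iters;
}

int main(void) {
    printf("L1-sized  (16 KiB): %.2f ns/load\n", ns_per_load(1u << 11));
    printf("L2-sized (512 KiB): %.2f ns/load\n", ns_per_load(1u << 16));
    printf("L3-sized   (8 MiB): %.2f ns/load\n", ns_per_load(1u << 20));
    return 0;
}
</source>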
 
  
 
Skylake's TLB consists of a dedicated L1 TLB for the instruction cache (ITLB) and another for the data cache (DTLB). Additionally, there is a unified L2 TLB (STLB).
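For a sense of scale, the 1,536-entry STLB listed below can cover 1,536 × 4 KiB = 6 MiB of address space with standard pages, or up to 1,536 × 2 MiB = 3 GiB with large pages, before a page walk is required.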
**** fixed partition
*** 1G page translations:
**** 4 entries; 4-way set associative
**** fixed partition
** STLB
*** 4 KiB + 2 MiB page translations:
**** 1536 entries; 12-way set associative. (Note: STLB is incorrectly reported as "6-way" by CPUID leaf 2 (EAX=02H). Skylake erratum SKL148 recommends software to simply ignore that value.)
**** fixed partition
*** 1 GiB page translations:

== Overview ==
[[File:skylake server overview.svg|right|550px]]
The Skylake server architecture marks a significant departure from the previous decade of multi-core system architecture at Intel. Since {{\\|Westmere (server)|Westmere}}, Intel has been using a {{intel|ring bus interconnect}} to interlink multiple cores together. As Intel continued to add more I/O, increase the memory bandwidth, and add more cores, data traffic grew and that architecture started to show its weaknesses. With the introduction of the Skylake server architecture, the interconnect was entirely re-architected into a 2-dimensional {{intel|mesh interconnect}}.

A superset model is shown on the right. Skylake-based servers are the first mainstream servers to make use of Intel's new {{intel|mesh interconnect}} architecture, an architecture that was previously explored, experimented with, and enhanced with Intel's {{intel|Phi}} [[many-core processors]]. In this configuration, the cores, caches, and memory controllers are organized in rows and columns, each with dedicated connections going through the rows and columns, allowing for the shortest path between any two tiles, reducing latency and improving bandwidth. Those processors are offered from [[4 cores]] up to [[28 cores]] with 8 to 56 threads. In addition to the system-level architectural changes, with Skylake, Intel now has a separate core architecture for those chips, which incorporates a plethora of new technologies and features, including support for the new {{x86|AVX-512}} instruction set extension.

All models incorporate 6 channels of DDR4 supporting up to 12 DIMMs for a total of 768 GiB (with extended models supporting 1.5 TiB). For I/O, all models incorporate 48 (3×16) lanes of PCIe 3.0. There is an additional x4 PCIe 3.0 link reserved exclusively for DMI to the {{intel|Lewisburg|l=chipset}} (LBG) chipset. A select number of models, specifically those with an ''F'' suffix, have an {{intel|Omni-Path}} Host Fabric Interface (HFI) on-package (see [[#Integrated_Omni-Path|Integrated Omni-Path]]).

Skylake processors are designed for scalability, supporting 2-way, 4-way, and 8-way multiprocessing through Intel's new {{intel|Ultra Path Interconnect}} (UPI) links, with two to three links being offered (see [[#Scalability|§ Scalability]]). High-end models have node controller support, allowing for even higher-way configurations (e.g., 32-way multiprocessing).
  
 
== Core ==
Intel has been experiencing a growing divergence in functionality over the last number of iterations of [[intel/microarchitectures|their microarchitecture]] between their mainstream consumer products and their high-end HPC/server models. Traditionally, Intel has been using the same exact core design for everything from their lowest-end value models (e.g. {{intel|Celeron}}) all the way up to the highest-performance enterprise models (e.g. {{intel|Xeon E7}}). While the two have fundamentally different chip architectures, they use the same exact CPU core architecture as the building block.

This design philosophy has changed with Skylake. In order to better accommodate the different functionalities of each segment without sacrificing features or making unnecessary compromises, Intel went with a configurable core. The Skylake core is a single development project, making up a master superset core. The project results in two derivatives: one for servers (the substance of this article) and {{\\|skylake (client)|one for clients}}. All mainstream models (from {{intel|Celeron}}/{{intel|Pentium (2009)|Pentium}} all the way up to {{intel|Core i7}}/{{intel|Xeon E3}}) use {{\\|skylake (client)|the client core configuration}}. Server models (e.g. {{intel|Xeon Gold}}/{{intel|Xeon Platinum}}) use the new server configuration instead.

The server core is considerably larger than the client one, featuring [[Advanced Vector Extensions 512]] (AVX-512). Skylake servers support what was formerly called AVX3.2 (AVX512F + AVX512CD + AVX512BW + AVX512DQ + AVX512VL). The server core also incorporates a number of new technologies not found in the client configuration. In addition to the execution units that were added, the cache hierarchy has changed for the server core as well, incorporating a larger L2 and a portion of the LLC, as well as the caching and home agent and the snoop filter that accommodate the new cache changes.

Below is a visual that helps show how the server core evolved from the client core.

:[[File:skylake sp mesh core tile zoom with client shown.png|1000px]]
  
 
=== Pipeline ===

[[File:skylake sp added cach and vpu.png|left|300px]]
This is the first implementation to incorporate {{x86|AVX-512}}, a 512-bit [[SIMD]] [[x86]] instruction set extension. AVX-512 operations can take place on every port. For 512-bit wide FMA SIMD operations, Intel introduced two different mechanisms:

In the simple implementation, the variant used in the {{intel|Xeon Bronze|entry-level}} and {{intel|Xeon Silver|mid-range}} Xeon servers, AVX-512 fuses Port 0 and Port 1 to form a 512-bit FMA unit. Since those two ports are 256 bits wide, an AVX-512 operation that is dispatched by the scheduler to port 0 will execute on both ports. Note that unrelated operations can still execute in parallel. For example, an AVX-512 operation and an Int ALU operation may execute in parallel - the AVX-512 operation is dispatched on port 0 and uses the AVX unit on port 1 as well, while the Int ALU operation executes independently in parallel on port 1.

In the {{intel|Xeon Gold|high-end}} and {{intel|Xeon Platinum|highest}}-performance Xeons, Intel added a second dedicated 512-bit wide AVX-512 FMA unit in addition to the fused Port 0-1 operation described above. The dedicated unit is situated on Port 5.

Physically, Intel added 768 KiB of L2 cache and the second AVX-512 VPU externally to the core.
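To make the widths concrete, below is a small sketch of a 512-bit FMA kernel using the AVX-512F intrinsics from <code>immintrin.h</code> (compile with <code>-mavx512f</code>; the function itself is illustrative, not from Intel's documentation). Each <code>_mm512_fmadd_ps</code> operates on 16 single-precision lanes, and on the dual-FMA parts (fused ports 0+1 plus port 5) two such operations can be in flight per cycle:

<source lang="c">
#include <immintrin.h>
#include <stddef.h>

// y[i] = a * x[i] + y[i] over n floats, 16 lanes per 512-bit operation.
// Assumes n is a multiple of 16 to keep the sketch short.
void saxpy512(float a, const float *x, float *y, size_t n) {
    __m512 va = _mm512_set1_ps(a);
    for (size_t i = 0; i < n; i += 16) {
        __m512 vx = _mm512_loadu_ps(x + i);
        __m512 vy = _mm512_loadu_ps(y + i);
        _mm512_storeu_ps(y + i, _mm512_fmadd_ps(va, vx, vy));  // one 512-bit FMA
    }
}
</source>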

== New Technologies ==

=== Memory Protection Extension (MPX) ===

=== Mode-Based Execute (MBE) Control ===
'''Mode-Based Execute''' ('''MBE''') is an enhancement to the Extended Page Tables (EPT) that provides a finer level of control over execute permissions. With MBE, the previous Execute Enable (''X'') bit is split into an Execute Userspace page (XU) bit and an Execute Supervisor page (XS) bit. The processor selects the mode based on the guest page permission. With proper software support, hypervisors can take advantage of this to ensure the integrity of kernel-level code.
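As a conceptual sketch of what this looks like at the EPT-entry level (bit positions per the Intel SDM's EPT format as best understood; an illustration, not hypervisor-ready code):

<source lang="c">
#include <stdint.h>

// EPT leaf-entry permission bits (per the Intel SDM). With mode-based
// execute control enabled, bit 2 governs supervisor-mode execution and
// bit 10 governs user-mode execution; without MBE, bit 2 is the lone X bit.
#define EPT_READ       (1ULL << 0)
#define EPT_WRITE      (1ULL << 1)
#define EPT_EXEC_SUP   (1ULL << 2)   // XS: execute for supervisor mode
#define EPT_EXEC_USER  (1ULL << 10)  // XU: execute for user mode (MBE only)

// Example policy: guest page is readable and executable from user mode,
// while the guest kernel can neither write nor execute it.
static inline uint64_t ept_user_exec_ro(uint64_t entry) {
    return (entry & ~(EPT_WRITE | EPT_EXEC_SUP)) | EPT_READ | EPT_EXEC_USER;
}
</source>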
  
 
== Mesh Architecture ==
{{main|intel/mesh interconnect architecture|l1=Intel's Mesh Interconnect Architecture}}

[[File:skylake sp xcc die config.png|right|400px]]
Over the {{intel|microarchitectures|previous generations}}, Intel has been adding cores onto the die and connecting them via a {{intel|ring architecture}}. This was sufficient until recently. With each generation, the added cores increased the access latency while lowering the available bandwidth per core. Intel mitigated this problem by splitting up the die into two halves, each on its own ring. This reduced hopping distance and added additional bandwidth, but it did not solve the growing fundamental inefficiencies of the ring architecture.

This was addressed with the new {{intel|mesh architecture}} implemented in the Skylake server processors. The mesh consists of a 2-dimensional array of half rings going in the vertical and horizontal directions, which allows communication to take the shortest path to the correct node. The new mesh architecture implements a modular design for the routing resources in order to remove the various bottlenecks; that is, the mesh architecture now integrates the caching agent, the home agent, and the I/O subsystem on the mesh interconnect, distributed across all the cores. Each core now has its own associated LLC slice as well as the snoop filter and the Caching and Home Agent (CHA). Additional nodes such as the two memory controllers, the {{intel|Ultra Path Interconnect}} (UPI) nodes, and PCIe are now independent nodes on the mesh as well, and they behave identically to any other node/core in the network. This means that in addition to the performance increase expected in core-to-core and core-to-memory latency, there should be a substantial increase in I/O performance. The CHA, which is found on each of the LLC slices, maps addresses being accessed to the specific LLC bank, memory controller, or I/O subsystem. This provides the necessary information required for the routing to take place.
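As a toy illustration of the scaling argument (assuming simple dimension-ordered routing, which is an assumption — Intel's actual traversal rules are more involved), the worst-case hop count on a mesh grows with the Manhattan distance between tiles rather than with half the total stop count as on a single ring:

<source lang="c">
#include <stdlib.h>

// Toy model: hop counts on an R x C mesh (dimension-ordered routing)
// versus a single bi-directional ring with R*C stops. Illustrative only.
static int mesh_hops(int r0, int c0, int r1, int c1) {
    return abs(r0 - r1) + abs(c0 - c1);    // Manhattan distance
}

static int ring_hops(int a, int b, int stops) {
    int d = abs(a - b);
    return d < stops - d ? d : stops - d;  // take the shorter direction
}

// For a 6x5 grid (30 stops): worst-case mesh_hops = 5 + 4 = 9 hops,
// while a single 30-stop ring's worst case is ring_hops = 15 hops.
</source>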
 
 
=== Organization ===
[[File:skylake (server) half rings.png|right|400px]]
Each die has a grid of converged mesh stops (CMS). For example, for the XCC die, there are 36 CMSs. As the name implies, the CMS is a block that effectively interfaces between all the various subsystems and the mesh interconnect. The locations of the CMSes for the large core count die are shown in the diagram below. It should be pointed out that although the CMS appears to be inside the core tiles, most of the mesh is likely routed above the cores, in a similar fashion to how Intel did it with the ring interconnect, which was wired above the caches in order to reduce the die area.

:[[File:skylake server cms units.svg|450px]]

Each core tile interfaces with the mesh via its associated converged mesh stop (CMS). The CMSs at the very top are for the UPI links and PCIe links to interface with the mesh, as annotated in the diagram above. Additionally, the two integrated memory controllers have their own CMS they use to interface with the mesh as well.

Every stop at each tile is directly connected to its immediate four neighbors – north, south, east, and west.

::[[File:skylake sp cms links.svg|300px]]

Every vertical column of CMSs forms a bi-directional half ring. Similarly, every horizontal row forms a bi-directional half ring.

::[[File:skylake sp mesh half rings.png|1000px]]

{{clear}}
 
  
 
=== Cache Coherency ===
[[File:snc clusters.png|right|350px]]
It should be pointed out that the directory-based coherency optimizations introduced in previous generations have been further improved with Skylake - particularly OSB, the {{intel|HitME}} cache, and the I/O directory cache. Skylake maintains support for {{intel|Opportunistic Snoop Broadcast}} (OSB), which allows the network to opportunistically make use of the UPI links when idle or lightly loaded, thereby avoiding an expensive memory directory lookup. With the mesh network and distributed CHAs, HitME is now distributed and scales with the CHAs, speeding up cache-to-cache transfers (those are your migratory cache lines that frequently get transferred between nodes). Specifically for I/O operations, the I/O directory cache (IODC), which was introduced with {{intel|Haswell|l=arch}}, improves stream throughput by eliminating directory reads for InvItoE from the snoop caching agent. Previously this was implemented as a 64-entry directory cache to complement the directory in memory. In Skylake, with a distributed CHA at each node, the IODC is implemented as an eight-entry directory cache per CHA.
  
 
==== Sub-NUMA Clustering ====
In previous generations Intel had a feature called {{intel|cluster-on-die}} (COD), which was introduced with {{intel|Haswell|l=arch}}. With Skylake, there's a similar feature called {{intel|sub-NUMA cluster}} (SNC). With a memory controller physically located on each side of the die, SNC allows for the creation of two localized domains, with each memory controller belonging to one domain. The processor can then map the addresses from the controller to the distributed home agents and LLC in its domain. This allows executing code to experience lower LLC and memory latency within its domain compared to accesses outside of the domain.

It should be pointed out that in contrast to COD, SNC has a unique location for every address in the LLC and is never duplicated across LLC banks (previously, COD cache lines could have copies). Additionally, on multiprocessor systems, addresses mapped to memory on remote sockets are still uniformly distributed across all LLC banks irrespective of the localized SNC domain.
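With SNC enabled, each socket is exposed to the operating system as two NUMA nodes, so ordinary NUMA-aware programming applies. A minimal sketch using Linux's libnuma (link with <code>-lnuma</code>; node numbering is machine-specific, so "node 0" here is an assumption):

<source lang="c">
#include <numa.h>
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available\n");
        return 1;
    }
    // With SNC on, a 2-socket system typically shows 4 nodes (2 per socket).
    printf("NUMA nodes: %d\n", numa_max_node() + 1);

    // Run on node 0 and allocate from node 0 so accesses stay within one
    // SNC domain, getting the lower local LLC/memory latency.
    numa_run_on_node(0);
    double *buf = numa_alloc_onnode(64 << 20, 0);  // 64 MiB on node 0
    if (buf) {
        buf[0] = 1.0;                              // first touch on node 0
        numa_free(buf, 64 << 20);
    }
    return 0;
}
</source>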
  
 
== Scalability ==
{{see also|intel/quickpath interconnect|intel/ultra path interconnect|l1=QuickPath Interconnect|l2=Ultra Path Interconnect}}
In the last couple of generations, Intel has been utilizing {{intel|QuickPath Interconnect}} (QPI), which served as a high-speed point-to-point interconnect. QPI has been replaced by the {{intel|Ultra Path Interconnect}} (UPI), a higher-efficiency coherent interconnect for scalable systems, allowing multiple processors to share a single address space. Depending on the exact model, each processor can have either two or three UPI links connecting to the other processors.

UPI links eliminate some of the scalability limitations that surfaced in QPI over the past few microarchitecture iterations. They use a directory-based home snoop coherency protocol and operate at either 10.4 GT/s or 9.6 GT/s. This is quite a bit different from previous generations. In addition to the various improvements done to the protocol layer, {{intel|Skylake SP|l=core}} now implements a distributed CHA that is situated along with the LLC bank on each core. It is in charge of tracking the various requests from the core as well as responding to snoop requests from both local and remote agents. The ease of distributing the home agent is a result of Intel getting rid of the requirement on preallocation of resources at the home agent. This also means that future architectures should be able to scale up well.
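For a rough sense of the raw link bandwidth (assuming the same 2-byte data payload per transfer per direction as QPI, which is an assumption): 10.4 GT/s × 2 B ≈ 20.8 GB/s in each direction per link, up from 9.6 GT/s × 2 B ≈ 19.2 GB/s on the fastest QPI links.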
  
 
Depending on the exact model, Skylake processors can scale from 2-way all the way up to 8-way multiprocessing. Note that the high-end models that support 8-way multiprocessing come with three UPI links for this purpose, while the lower-end processors can have either two or three UPI links. Below are the typical configurations for those processors.
| HCC
|}

== Floorplan ==
[[File:skylake sp major blocks.svg|right|400px]]
All Skylake server dies consist of three major blocks:

* DDR PHYs
* North Cap
* Mesh Tiles

Those blocks are found on all die configurations and form the base for Intel's highly configurable floorplan. Depending on the market segment and model specification targets, Intel can add and remove rows of tiles.

<div style="text-align: center;">
<div style="float: left;">'''XCC Die'''<br>[[File:skylake (server) die major blocks (xcc).png|250px]]</div>
<div style="float: left; margin-left: 30px;">'''HCC Die'''<br>[[File:skylake (server) die major blocks (hcc).png|175px]]</div>
</div>

{{clear}}

=== Physical Layout ===
==== North Cap ====
The '''North Cap''' at the very top of the die contains all the I/O agents and PHYs as well as serial IP ports and the fuse unit. For the most part this configuration is largely the same for all the dies; for the smaller dies, the extras are removed (e.g., the in-package PCIe link is not needed).

At the very top of the North Cap is the various I/O connectivity. There are a total of 128 high-speed I/O lanes – 3×16 (48) PCIe lanes operating at 8 GT/s, x4 DMI lanes for hooking up the Lewisburg chipset, 16 on-package PCIe lanes (operating at 2.5/5/8 GT/s), and 3×20 (60) {{intel|Ultra-Path Interconnect}} (UPI) lanes operating at 10.4 GT/s for [[multiprocessing]] support.

At the south-west corner of the North Cap are the clock generator unit (CGU) and the Global Power Management Unit (Global PMU). The CGU contains an all-digital (AD) filter phase-locked loop (PLL) and an all-digital uncore PLL. The filter ADPLL is dedicated to the generation of the on-die reference clock used for all the core PLLs and one uncore PLL. The power management unit also has its own dedicated all-digital PLL.

At the bottom part of the North Cap are the {{intel|mesh interconnect architecture#Overview|mesh stops}} for the various I/O to interface with the mesh.

==== DDR PHYs ====
There are two DDR4 PHYs which are identical for all the dies (albeit in the low-end models, the extra channel is simply disabled). There are two independent and identical physical sections of 3 DDR4 channels each, which reside on the east and west edges of the die. Each channel is 72-bit (64-bit data and an 8-bit ECC), supporting 2 DIMMs per channel with a data rate of up to 2666 MT/s for a bandwidth of 21.33 GB/s per channel and an aggregated bandwidth of 128 GB/s. RDIMM and LRDIMM are supported.

The location of the PHYs was carefully chosen in order to ease the package design; specifically, they were chosen in order to maintain escape routing and pin-out order matching between the CPU and the DIMM slots, shortening package and PCB routing length in order to improve signal integrity.

==== Layout ====
:[[File:skylake (server) die area layout.svg|600px]]

==== Evolution ====
The original Skylake large die started out as a 5 by 5 core tile (25 tiles, 25 cores) as shown by the image from Intel on the left side. The memory controllers were next to the PHYs on the east and west side. An additional row was inserted to get to a 5 by 6 grid. Two core tiles, one from each side, were then replaced by the new memory controller modules which can interface with the mesh just like any other core tile. The final die is shown in the image below as well on the right side.

:[[File:skylaake server layout evoluation.png|800px]]
 
  
 
== Die ==
{{see also|intel/microarchitectures/skylake_(client)#Die|l1=Client Skylake's Die}}
[[File:intel xeon skylake sp.jpg|right|300px|thumb|Skylake SP chips and wafer.]]
Skylake server class models and high-end desktop (HEDT) models are built from 3 different dies:

* 12 tiles (3x4), 10-core, Low Core Count (LCC)
* 20 tiles (5x4), 18-core, High Core Count (HCC)
* 30 tiles (5x6), 28-core, Extreme Core Count (XCC)

=== North Cap ===
'''HCC:'''

:[[File:skylake (server) northcap (hcc).png|700px]]

:[[File:skylake (server) northcap (hcc) (annotated).png|700px]]

'''XCC:'''

:[[File:skylake (server) northcap (xcc).png|900px]]

:[[File:skylake (server) northcap (xcc) (annotated).png|900px]]

=== Memory PHYs ===
Data bytes are located on the north and south sub-sections of the channel layout. Command, control, and clock signals, along with process, supply voltage, and temperature (PVT) compensation circuitry, are located in the middle section of the channels.

:[[File:skylake sp memory phys (annotated).png|700px]]

=== Core Tile ===
* ~4.8375 mm x 3.7163 mm
* ~17.978 mm² die area

:[[File:skylake sp core.png|500px]]

:[[File:skylake sp mesh core tile zoom.png|700px]]
 
  
 
=== Low Core Count (LCC) ===
* [[14 nm process]]
* 12 metal layers
* ~22.26 mm x ~14.62 mm
* ~325.44 mm² die size
* [[10 cores]]
* 12 tiles (3x4)

: (NOT official die shot, artist's rendering based on the larger die)
: [[File:skylake lcc die shot.jpg|650px]]
 
=== High Core Count (HCC) ===
* [[14 nm process]]
* 13 metal layers
* ~485 mm² die size (estimated)
* [[18 cores]]
* 20 tiles (5x4)

: [[File:skylake (octadeca core).png|650px]]

=== Extreme Core Count (XCC) ===
* [[14 nm process]]
* 13 metal layers
* ~694 mm² die size (estimated)
* [[28 cores]]
* 30 tiles (5x6)

: [[File:skylake-sp hcc die shot.png|class=wikichip_ogimage|650px]]

{{comp table header 1|cols=Launched, Price, Family, Core Name, Cores, Threads, %L2$, %L3$, TDP, %Frequency, %Max Turbo, Max Mem, Turbo, SMT}}
<tr class="comptable-header comptable-header-sep"><th>&nbsp;</th><th colspan="25">[[Uniprocessors]]</th></tr>
{{#ask: [[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Skylake (server)]] [[max cpu count::1]]
  |?full page name
  |?model number

== References ==
* Intel Unveils Powerful Intel Xeon Scalable Processors, Live Event, July 11, 2017
* [[:File:intel xeon scalable processor architecture deep dive.pdf|Intel Xeon Scalable Processor Architecture Deep Dive]], Akhilesh Kumar & Malay Trivedi, Skylake-SP CPU & Lewisburg PCH Architects, June 12, 2017
* IEEE Hot Chips 29 (HC29), 2017
* IEEE ISSCC 2018

== Documents ==
* [[:File:intel-xeon-scalable-processors-product-brief.pdf|Intel Xeon (Skylake SP) Processors Product Brief]]
* [[:File:intel-xeon-scalable-processors-overview.pdf|Intel Xeon (Skylake SP) Processors Product Overview]]
* [[:File:intel-skylake-w-overview.pdf|Xeon (Skylake W) Workstations Overview]]
* [[:File:optimal hpc solutions for scalable xeons.pdf|Optimal HPC solutions with Intel Scalable Xeons]]
 
