From WikiChip
Editing intel/microarchitectures/skylake (server)
Latest revision | Your text | ||
Line 205: | Line 205: | ||
** DMI upgraded to Gen3 | ** DMI upgraded to Gen3 | ||
* Core | * Core | ||
− | |||
** Front End | ** Front End | ||
*** LSD is disabled (Likely due to a bug; see [[#Front-end|§ Front-end]] for details) | *** LSD is disabled (Likely due to a bug; see [[#Front-end|§ Front-end]] for details) | ||
+ | *** Larger legacy pipeline delivery (5 µOPs, up from 4) | ||
+ | **** Another simple decoder has been added. | ||
+ | *** Allocation Queue (IDQ) | ||
+ | **** Larger delivery (6 µOPs, up from 4) | ||
+ | **** 2.28x larger buffer (64 entries/thread, 128 total, up from 56) | ||
+ | **** Partitioned for each active thread (from unified) | ||
+ | *** Improved [[branch prediction unit]] | ||
+ | **** Reduced penalty for wrong direct jump target | ||
+ | **** No specifics were disclosed | ||
+ | *** µOP Cache | ||
+ | **** Instruction window is now 64 bytes (from 32) | ||
+ | **** 1.5x bandwidth (6 µOPs/cycle, up from 4) | ||
+ | ** Execution Engine | ||
+ | *** Larger [[re-order buffer]] (224 entries, up from 192) | ||
+ | *** Larger scheduler (97 entries, up from 64) | ||
+ | **** Larger Integer Register File (180 entries, up from 168) | ||
** Back-end | ** Back-end | ||
*** Port 4 now performs 512b stores (from 256b) | *** Port 4 now performs 512b stores (from 256b) | ||
Line 241: | Line 256: | ||
==== CPU changes ==== | ==== CPU changes ==== | ||
− | + | * Most ALU operations have a throughput of 4 op/cycle for 8- and 32-bit registers. 64-bit ops are still limited to 3 op/cycle (16-bit throughput varies per op and can be 4, 3.5 or 2 op/cycle). | |
+ | * MOVSX and MOVZX have 4 op/cycle throughput for 16->32 and 32->64 forms, in addition to Haswell's 8->32, 8->64 and 16->64 bit forms. | ||
+ | * ADC and SBB have throughput of 1 op/cycle, same as Haswell. | ||
+ | * Vector moves have throughput of 4 op/cycle (move elimination). | ||
+ | * Not only zeroing vector vpXORxx and vpSUBxx ops, but also vPCMPxxx on the same register, have throughput of 4 op/cycle. | ||
+ | * Vector ALU ops are often "standardized" to a latency of 4. For example, vADDPS and vMULPS used to have latencies of 3 and 5; now both are 4. | ||
+ | * Fused multiply-add ops have a latency of 4 and a throughput of 2 op/cycle (0.5 cycles/op). | ||
+ | * Throughput of vADDps, vSUBps, vCMPps, vMAXps, their scalar and double analogs is increased to 2 op/cycle. | ||
+ | * Throughput of vPSLxx and vPSRxx with immediate (i.e. fixed vector shifts) is increased to 2 op/cycle. | ||
+ | * Throughput of vANDps, vANDNps, vORps, vXORps, their scalar and double analogs, vPADDx, vPSUBx is increased to 3 op/cycle. | ||
+ | * vDIVPD and vSQRTPD have roughly twice the throughput: the cost drops from 8 to 4 and from 28 to 12 cycles/op, respectively. | ||
+ | * Throughput of some MMX ALU ops (such as PAND mm1, mm2) is decreased to 2 or 1 op/cycle (users are expected to use wider SSE/AVX registers instead). | ||
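The list above mixes two conventions: throughput in op/cycle (higher is better) and reciprocal throughput in cycles/op (lower is better). The following is a minimal sketch of converting between them and of checking the "approximately twice" claim for vDIVPD and vSQRTPD. The helper names are hypothetical and chosen only for illustration; the figures come from the list above.

```python
from fractions import Fraction

def cycles_per_op(ops_per_cycle):
    """Convert throughput (op/cycle) to reciprocal throughput (cycles/op)."""
    return 1 / Fraction(ops_per_cycle)

def speedup(old_cycles_per_op, new_cycles_per_op):
    """Throughput gain when the cost per op drops from old to new cycles."""
    return Fraction(old_cycles_per_op, new_cycles_per_op)

# Figures quoted in the list above (cycles/op, before and after).
vdivpd_gain = speedup(8, 4)      # exactly 2x
vsqrtpd_gain = speedup(28, 12)   # 7/3, a little over 2x

# An op listed at 4 op/cycle costs a quarter of a cycle per op.
alu_cost = cycles_per_op(4)
```

With these numbers, vDIVPD gains exactly 2x and vSQRTPD gains 7/3 (about 2.33x), which is consistent with "approximately twice".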
====New instructions ==== | ====New instructions ==== | ||
Line 477: | Line 503: | ||
=== Mode-Based Execute (MBE) Control === | === Mode-Based Execute (MBE) Control === | ||
− | '''Mode-Based Execute''' ('''MBE''') is an enhancement to the Extended Page Tables (EPT) that provides finer level of control of execute permissions. With MBE the previous Execute Enable (''X'') bit is turned into | + | '''Mode-Based Execute''' ('''MBE''') is an enhancement to the Extended Page Tables (EPT) that provides a finer level of control over execute permissions. With MBE the previous Execute Enable (''X'') bit is turned into Execute Userspace page (XU) and Execute Supervisor page (XS). The processor selects the mode based on the guest page permission. With proper software support, hypervisors can take advantage of this as well to ensure integrity of kernel-level code. |
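The permission selection described above can be sketched as a small model. This is an illustration, not Intel's implementation: the function name is hypothetical, and the bit positions (XS reusing the legacy ''X'' bit at position 2, XU at position 10) are an assumption based on the EPT entry layout in the Intel SDM.

```python
# Assumed bit positions within an EPT entry (illustrative only).
EPT_XS_BIT = 2    # execute for supervisor pages; the legacy X bit when MBE is off
EPT_XU_BIT = 10   # execute for user pages; only consulted when MBE is on

def may_execute(ept_entry: int, guest_page_is_user: bool, mbe_enabled: bool) -> bool:
    """Return True if an instruction fetch is allowed by this EPT entry."""
    xs = bool((ept_entry >> EPT_XS_BIT) & 1)
    xu = bool((ept_entry >> EPT_XU_BIT) & 1)
    if not mbe_enabled:
        # Without MBE there is a single execute bit covering both modes.
        return xs
    # With MBE the guest page permission (user vs. supervisor) picks the bit.
    return xu if guest_page_is_user else xs

# A page the hypervisor marks executable for user mode only:
user_only = 1 << EPT_XU_BIT
```

In this model, a hypervisor that sets only XU on a page lets guest user code run from it while blocking supervisor-mode execution, which is the kind of kernel-code integrity policy the paragraph alludes to.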
== Mesh Architecture == | == Mesh Architecture == | ||
Line 655: | Line 681: | ||
=== Core Tile === | === Core Tile === | ||
− | |||
− | |||
− | |||
:[[File:skylake sp core.png|500px]] | :[[File:skylake sp core.png|500px]] | ||
Facts about "Skylake (server) - Microarchitectures - Intel"
codename | Skylake (server) + |
core count | 4 +, 6 +, 8 +, 10 +, 12 +, 14 +, 16 +, 18 +, 20 +, 22 +, 24 +, 26 + and 28 + |
designer | Intel + |
first launched | May 4, 2017 + |
full page name | intel/microarchitectures/skylake (server) + |
instance of | microarchitecture + |
instruction set architecture | x86-64 + |
manufacturer | Intel + |
microarchitecture type | CPU + |
name | Skylake (server) + |
pipeline stages (max) | 19 + |
pipeline stages (min) | 14 + |
process | 14 nm (0.014 μm, 1.4e-5 mm) + |