From WikiChip
Editing intel/microarchitectures/skylake (server)

Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.

This page supports semantic in-text annotations (e.g. "[[Is specified as::World Heritage Site]]") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help pages.

Latest revision Your text
Line 205: Line 205:
 
** DMI upgraded to Gen3
 
** DMI upgraded to Gen3
 
* Core
 
* Core
** All the changes from Skylake Client (For full list, see {{\\|Skylake (Client)#Key changes from Broadwell|Skylake (Client) § Key changes from Broadwell}})
 
 
** Front End
 
** Front End
 
*** LSD is disabled (Likely due to a bug; see [[#Front-end|§ Front-end]] for details)
 
*** LSD is disabled (Likely due to a bug; see [[#Front-end|§ Front-end]] for details)
 +
*** Larger legacy pipeline delivery (5 µOPs, up from 4)
 +
**** Another simple decoder has been added.
 +
*** Allocation Queue (IDQ)
 +
**** Larger delivery (6 µOPs, up from 4)
 +
**** 2.28x larger buffer (64/thread, up from 56)
 +
**** Partitioned for each active thread (from unified)
 +
*** Improved [[branch prediction unit]]
 +
**** reduced penalty for wrong direct jump target
 +
**** No specifics were disclosed
 +
*** µOP Cache
 +
**** instruction window is now 64 Bytes (from 32)
 +
**** 1.5x bandwidth (6 µOPs/cycle, up from 4)
 +
** Execution Engine
 +
*** Larger [[re-order buffer]] (224 entries, up from 192)
 +
*** Larger scheduler (97 entries, up from 64)
 +
**** Larger Integer Register File (180 entries, up from 168)
 
** Back-end
 
** Back-end
 
*** Port 4 now performs 512b stores (from 256b)
 
*** Port 4 now performs 512b stores (from 256b)
Line 241: Line 256:
  
 
==== CPU changes ====
 
==== CPU changes ====
See {{\\|Skylake (Client)#CPU changes|Skylake (Client) § CPU changes}}
+
* Most ALU operations have 4 op/cycle 1 for 8 and 32-bit registers. 64-bit ops are still limited to 3 op/cycle. (16-bit throughput varies per op, can be 4, 3.5 or 2 op/cycle).
 +
* MOVSX and MOVZX have 4 op/cycle throughput for 16->32 and 32->64 forms, in addition to Haswell's 8->32, 8->64 and 16->64 bit forms.
 +
* ADC and SBB have throughput of 1 op/cycle, same as Haswell.
 +
* Vector moves have throughput of 4 op/cycle (move elimination).
 +
* Not only zeroing vector vpXORxx and vpSUBxx ops, but also vPCMPxxx on the same register, have throughput of 4 op/cycle.
 +
* Vector ALU ops are often "standardized" to latency of 4. for example, vADDPS and vMULPS used to have L of 3 and 5, now both are 4.
 +
* Fused multiply-add ops have latency of 4 and throughput of 0.5 op/cycle.
 +
* Throughput of vADDps, vSUBps, vCMPps, vMAXps, their scalar and double analogs is increased to 2 op/cycle.
 +
* Throughput of vPSLxx and vPSRxx with immediate (i.e. fixed vector shifts) is increased to 2 op/cycle.
 +
* Throughput of vANDps, vANDNps, vORps, vXORps, their scalar and double analogs, vPADDx, vPSUBx is increased to 3 op/cycle.
 +
* vDIVPD, vSQRTPD have approximately twice as good throughput: from 8 to 4 and from 28 to 12 cycles/op.
 +
* Throughput of some MMX ALU ops (such as PAND mm1, mm2) is decreased to 2 or 1 op/cycle (users are expected to use wider SSE/AVX registers instead).
  
 
====New instructions ====
 
====New instructions ====
Line 477: Line 503:
  
 
=== Mode-Based Execute (MBE) Control ===
 
=== Mode-Based Execute (MBE) Control ===
'''Mode-Based Execute''' ('''MBE''') is an enhancement to the Extended Page Tables (EPT) that provides finer level of control of execute permissions. With MBE the previous Execute Enable (''X'') bit is turned into Execute Userspace page (XU) and Execute Supervisor page (XS). The processor selects the mode based on the guest page permission. With proper software support, hypervisors can take advantage of this as well to ensure integrity of kernel-level code.
+
'''Mode-Based Execute''' ('''MBE''') is an enhancement to the Extended Page Tables (EPT) that provides finer level of control of execute permissions. With MBE the previous Execute Enable (''X'') bit is turned into Excuse Userspace page (XU) and Execute Supervisor page (XS). The processor selects the mode based on the guest page permission. With proper software support, hypervisors can take advantage of this as well to ensure integrity of kernel-level code.
  
 
== Mesh Architecture ==
 
== Mesh Architecture ==
Line 655: Line 681:
  
 
=== Core Tile ===
 
=== Core Tile ===
* ~4.8375 x 3.7163
 
* ~ 17.978 mm² die area
 
 
 
:[[File:skylake sp core.png|500px]]
 
:[[File:skylake sp core.png|500px]]
  

Please note that all contributions to WikiChip may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see WikiChip:Copyrights for details). Do not submit copyrighted work without permission!

Cancel | Editing help (opens in new window)
codenameSkylake (server) +
core count4 +, 6 +, 8 +, 10 +, 12 +, 14 +, 16 +, 18 +, 20 +, 22 +, 24 +, 26 + and 28 +
designerIntel +
first launchedMay 4, 2017 +
full page nameintel/microarchitectures/skylake (server) +
instance ofmicroarchitecture +
instruction set architecturex86-64 +
manufacturerIntel +
microarchitecture typeCPU +
nameSkylake (server) +
pipeline stages (max)19 +
pipeline stages (min)14 +
process14 nm (0.014 μm, 1.4e-5 mm) +