From WikiChip
intel/microarchitectures/skylake (server)

** DMI upgraded to Gen3
* Core
** All the changes from Skylake Client (for the full list, see {{\\|Skylake (Client)#Key changes from Broadwell|Skylake (Client) § Key changes from Broadwell}})
** Front End
*** The loop stream detector (LSD) is disabled (likely due to a bug; see [[#Front-end|§ Front-end]] for details)
  
 
==== CPU changes ====
See {{\\|Skylake (Client)#CPU changes|Skylake (Client) § CPU changes}}
* Most ALU operations have a throughput of 4 op/cycle for 8- and 32-bit registers. 64-bit ops are still limited to 3 op/cycle. (16-bit throughput varies per op: 4, 3.5 or 2 op/cycle.)
* MOVSX and MOVZX have a throughput of 4 op/cycle for the 16->32 and 32->64 forms, in addition to Haswell's 8->32, 8->64 and 16->64 forms.
* ADC and SBB have a throughput of 1 op/cycle, the same as Haswell.
* Vector register-to-register moves have a throughput of 4 op/cycle thanks to move elimination.
* Not only the zeroing idioms vpXORxx and vpSUBxx, but also vPCMPxxx with the same register as both sources, have a throughput of 4 op/cycle.
* Vector ALU ops are often "standardized" to a latency of 4. For example, vADDPS and vMULPS previously had latencies of 3 and 5; both are now 4.
* Fused multiply-add ops have a latency of 4 and a reciprocal throughput of 0.5 cycles (i.e. 2 op/cycle).
* The throughput of vADDps, vSUBps, vCMPps, vMAXps and their scalar and double-precision analogs is increased to 2 op/cycle.
* The throughput of vPSLxx and vPSRxx with an immediate operand (i.e. shifts by a fixed count) is increased to 2 op/cycle.
* The throughput of vANDps, vANDNps, vORps, vXORps and their scalar and double-precision analogs, as well as vPADDx and vPSUBx, is increased to 3 op/cycle.
* vDIVPD and vSQRTPD have roughly twice the throughput: reciprocal throughput improves from 8 to 4 and from 28 to 12 cycles/op, respectively.
* The throughput of some MMX ALU ops (such as PAND mm1, mm2) is decreased to 2 or 1 op/cycle; users are expected to use the wider SSE/AVX registers instead.
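The latency and throughput figures above interact. By Little's law, a pipelined unit only stays saturated if roughly latency × throughput independent operations are in flight, so a reduction written as a single dependency chain runs at latency speed rather than throughput speed. The Python sketch below is an illustrative calculation, not a benchmark: it uses only the vADDPS, vDIVPD and vSQRTPD figures quoted in the list above, and the helper names (<code>chains_needed</code>, <code>speedup</code>, <code>multi_accumulator_sum</code>) are hypothetical. The last function only shows the loop structure (one partial sum per independent chain) that compiled SIMD code would use; Python itself will not exhibit the hardware effect.

```python
import math
from fractions import Fraction


def chains_needed(latency_cycles, ops_per_cycle):
    """Independent dependency chains needed to keep a pipelined unit busy.

    Little's law: operations in flight = latency (cycles) * throughput (ops/cycle).
    """
    return math.ceil(latency_cycles * Fraction(ops_per_cycle))


def speedup(old_cycles_per_op, new_cycles_per_op):
    """Throughput gain when the reciprocal throughput drops from old to new."""
    return Fraction(old_cycles_per_op) / Fraction(new_cycles_per_op)


def multi_accumulator_sum(values, chains):
    """Sum using `chains` independent partial sums (illustrates loop structure only).

    Element i is added to partial[i % chains], so consecutive additions do not
    depend on each other; a compiled SIMD loop would keep each partial sum in
    its own vector register.
    """
    partial = [0.0] * chains
    for i, v in enumerate(values):
        partial[i % chains] += v
    return sum(partial)


# vADDPS, per the list above: latency 4, throughput 2 op/cycle.
print(chains_needed(4, 2))    # 8
# vDIVPD: 8 -> 4 cycles/op; vSQRTPD: 28 -> 12 cycles/op.
print(speedup(8, 4))          # 2
print(speedup(28, 12))        # 7/3 (about 2.33x)
print(multi_accumulator_sum([float(i) for i in range(100)], chains_needed(4, 2)))  # 4950.0
```

With a single accumulator, each vADDPS must wait 4 cycles for the previous result, so such a loop would reach only 1/8 of the 2 op/cycle peak implied by the figures above.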
  
 
==== New instructions ====

* codename: Skylake (server)
* core count: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26 and 28
* designer: Intel
* first launched: May 4, 2017
* full page name: intel/microarchitectures/skylake (server)
* instance of: microarchitecture
* instruction set architecture: x86-64
* manufacturer: Intel
* microarchitecture type: CPU
* name: Skylake (server)
* pipeline stages (max): 19
* pipeline stages (min): 14
* process: 14 nm (0.014 μm, 1.4e-5 mm)