From WikiChip
Editing intel/microarchitectures/skylake (client)
Latest revision | Your text | ||
Line 1: | Line 1: | ||
− | {{intel title|Skylake | + | {{intel title|Skylake|arch}} |
{{microarchitecture | {{microarchitecture | ||
− | |atype=CPU | + | | atype = CPU |
− | |name=Skylake | + | | name = Skylake |
− | |designer=Intel | + | | designer = Intel |
− | |manufacturer=Intel | + | | manufacturer = Intel |
− | |introduction=August 5, 2015 | + | | introduction = August 5, 2015 |
− | |process=14 nm | + | | phase-out = |
− | |cores=2 | + | | process = 14 nm |
− | |cores 2=4 | + | | cores = 2 |
− | | | + | | cores 2 = 4 |
− | |type | + | | cores 3 = 6 |
− | | | + | | cores 4 = 8 |
− | |speculative=Yes | + | | cores 5 = 10 |
− | |renaming=Yes | + | |
− | |stages min=14 | + | | pipeline = Yes |
− | |stages max=19 | + | | type = Superscalar |
− | | | + | | OoOE = Yes |
− | |extension 2=MMX | + | | speculative = Yes |
− | |extension 3=SSE | + | | renaming = Yes |
− | |extension 4=SSE2 | + | | isa = IA-32 |
− | |extension 5=SSE3 | + | | isa 2 = x86-64 |
− | |extension 6=SSSE3 | + | | stages min = 14 |
− | |extension 7=SSE4.1 | + | | stages max = 19 |
− | |extension 8=SSE4.2 | + | | issues = 5 |
− | |extension 9=POPCNT | + | |
− | |extension 10=AVX | + | | inst = Yes |
− | |extension 11=AVX2 | + | | feature = |
− | |extension 12=AES | + | | extension = MOVBE |
− | |extension 13=PCLMUL | + | | extension 2 = MMX |
− | |extension 14=FSGSBASE | + | | extension 3 = SSE |
− | |extension 15=RDRND | + | | extension 4 = SSE2 |
− | |extension 16=FMA3 | + | | extension 5 = SSE3 |
− | |extension 17=F16C | + | | extension 6 = SSSE3 |
− | |extension 18=BMI | + | | extension 7 = SSE4.1 |
− | |extension 19=BMI2 | + | | extension 8 = SSE4.2 |
− | |extension 20=VT-x | + | | extension 9 = POPCNT |
− | |extension 21=VT-d | + | | extension 10 = AVX |
− | |extension 22=TXT | + | | extension 11 = AVX2 |
− | |extension 23=TSX | + | | extension 12 = AES |
− | |extension 25=ADCX | + | | extension 13 = PCLMUL |
− | |extension 27=CLFLUSHOPT | + | | extension 14 = FSGSBASE |
− | |extension 28=XSAVE | + | | extension 15 = RDRND |
− | |l1i=32 KiB | + | | extension 16 = FMA3 |
− | |l1i per=core | + | | extension 17 = F16C |
− | |l1i desc=8-way set associative | + | | extension 18 = BMI |
− | |l1d=32 KiB | + | | extension 19 = BMI2 |
− | |l1d per=core | + | | extension 20 = VT-x |
− | |l1d desc=8-way set associative | + | | extension 21 = VT-d |
− | |l2=256 KiB | + | | extension 22 = TXT |
− | |l2 per=core | + | | extension 23 = TSX |
− | |l2 desc=4-way set associative | + | | extension 24 = RDSEED |
− | |l3=2 MiB | + | | extension 25 = ADCX |
− | |l3 per=core | + | | extension 26 = PREFETCHW |
− | |l3 desc=Up to 16-way set associative | + | | extension 27 = CLFLUSHOPT |
− | | | + | | extension 28 = XSAVE |
− | | | + | | extension 29 = SGX |
− | | | + | | extension 30 = MPX |
− | |core name=Skylake Y | + | | extension 31 = AVX-512 |
− | |core name 2=Skylake U | + | |
− | |core name 3=Skylake H | + | | cache = Yes |
− | |core name 4=Skylake S | + | | l1i = 32 KiB |
− | |core name 5=Skylake | + | | l1i per = core |
− | |predecessor=Broadwell | + | | l1i desc = 8-way set associative |
− | |predecessor link=intel/microarchitectures/broadwell | + | | l1d = 32 KiB |
− | |successor=Kaby Lake | + | | l1d per = core |
− | |successor link=intel/microarchitectures/kaby lake | + | | l1d desc = 8-way set associative |
− | + | | l2 = 256 KiB | |
− | + | | l2 per = core | |
− | + | | l2 desc = 4-way set associative | |
− | + | | l3 = 2 MiB | |
− | + | | l3 per = core | |
− | + | | l3 desc = Up to 16-way set associative | |
+ | | l4 = 128 MiB | ||
+ | | l4 per = package | ||
+ | | l4 desc = on Iris Pro GPUs only | ||
+ | |||
+ | | core names = Yes | ||
+ | | core name = Skylake Y | ||
+ | | core name 2 = Skylake U | ||
+ | | core name 3 = Skylake H | ||
+ | | core name 4 = Skylake S | ||
+ | | core name 5 = Skylake X | ||
+ | | core name 6 = Skylake W | ||
+ | |||
+ | | succession = Yes | ||
+ | | predecessor = Broadwell | ||
+ | | predecessor link = intel/microarchitectures/broadwell | ||
+ | | successor = Kaby Lake | ||
+ | | successor link = intel/microarchitectures/kaby lake | ||
}} | }} | ||
− | '''Skylake''' ('''SKL''') | + | '''Skylake''' ('''SKL''') is [[Intel]]'s successor to {{\\|Broadwell}}, a [[14 nm process]] [[microarchitecture]] for mainstream desktops, servers, and mobile devices. Skylake succeeded the short-lived {{\\|Broadwell}} which experienced severe delays. Skylake is the "Architecture" phase as part of Intel's {{intel|PAO}} model. The microarchitecture was developed by Intel's R&D center in [[wikipedia:Haifa, Israel|Haifa, Israel]]. |
− | For desktop and mobile, Skylake is branded as 6th Generation Intel {{intel|Core i3}}, {{intel|Core i5}} | + | For desktop and mobile, Skylake is branded as 6th Generation Intel {{intel|Core i3}}, {{intel|Core i5}}, and {{intel|Core i7}} processors. For workstations it is branded as {{intel|Xeon E3|Xeon E3 v5}}. For server-class processors, Intel branded it as {{intel|Xeon Bronze}}, {{intel|Xeon Silver}}, {{intel|Xeon Gold}}, and {{intel|Xeon Platinum}}. |
== Codenames == | == Codenames == | ||
− | |||
{| class="wikitable" | {| class="wikitable" | ||
|- | |- | ||
− | ! Core !! Abbrev | + | ! Core !! Abbrev !! Target |
|- | |- | ||
− | | + | | Skylake Y || SKL-Y || 2-in-1 detachables, tablets, and computer sticks |
|- | |- | ||
− | | | + | | Skylake U || SKL-U || Light notebooks, portable All-in-Ones (AiOs), Minis, and conference room |
|- | |- | ||
− | | | + | | Skylake H || SKL-H || Ultimate mobile performance, mobile workstations |
|- | |- | ||
− | | | + | | Skylake S || SKL-S || Desktop performance to value, AiOs, and minis |
|- | |- | ||
− | | | + | | Skylake X || SKL-X || High-end desktops & enthusiasts market |
|- | |- | ||
− | | | + | | Skylake W || SKL-W || Workstations |
|} | |} | ||
Line 132: | Line 117: | ||
== Process Technology == | == Process Technology == | ||
{{main|intel/microarchitectures/broadwell#Process_Technology|l1=Broadwell § Process Technology}} | {{main|intel/microarchitectures/broadwell#Process_Technology|l1=Broadwell § Process Technology}} | ||
− | Skylake uses the same [[14 nm process]] used for the Broadwell microarchitecture | + | Skylake uses the same [[14 nm process]] used for the Broadwell microarchitecture. |
== Compiler support == | == Compiler support == | ||
Line 165: | Line 131: | ||
|- | |- | ||
| [[Visual Studio]] || <code>/arch:AVX2</code> || <code>/tune:skylake</code> | | [[Visual Studio]] || <code>/arch:AVX2</code> || <code>/tune:skylake</code> | ||
|} | |} | ||
== Architecture == | == Architecture == | ||
− | Overall Skylake builds upon Intel's previous microarchitecture, {{\\|Broadwell}}, but includes a | + | Overall, Skylake builds upon Intel's previous microarchitecture, {{\\|Broadwell}}, but includes a beefed-up front end, a more optimized execution engine, and a number of smaller enhancements. Intel designed Skylake to span a wide range of devices and applications, with a large emphasis on mobile and with models ranging from as low as 4.5 W to as high as 100 W. |
=== Key changes from {{\\|Broadwell}} === | === Key changes from {{\\|Broadwell}} === | ||
− | |||
* 8x performance/watt over {{\\|Nehalem}} (Up from 3.5x in {{\\|Haswell}}) | * 8x performance/watt over {{\\|Nehalem}} (Up from 3.5x in {{\\|Haswell}}) | ||
− | + | * Bus/Interface to Chipset | |
− | + | ** {{intel|Direct Media Interface|DMI 3.0}} (from 2.0) | |
− | + | *** Skylake S and Skylake H cores, connected by 4-lane DMI 3.0 | |
− | + | *** Skylake Y and Skylake U cores have chipset in the same package (simplified {{intel|on Package I/O|OPIO}}) | |
− | | + | *** Increase in transfer rate from 5.0 GT/s to 8.0 GT/s per lane (~3.93 GB/s, up from 2 GB/s, across the 4-lane link) |
− | *** | + | *** Limits motherboard trace design to 7 inches max (down from 8) from the CPU to the chipset |
− | + | * Front End | |
− | + | ** Larger legacy pipeline delivery (5 µOPs, up from 4) | |
− | + | *** Another simple decoder has been added. | |
− | + | ** Larger IDQ delivery (6 µOPs, up from 4) | |
− | + | ** 2.28x larger allocation queue (64/thread, up from 28/thread) | |
− | + | ** Improved [[branch prediction unit]] | |
− | + | * Execution Engine | |
− | + | ** Larger [[re-order buffer]] (224 entries, up from 192) | |
− | + | ** Larger scheduler (97 entries, up from 64) | |
− | + | *** Larger Integer Register File (180 entries, up from 160) | |
− | + | ** Larger store buffer (56 entries, up from 42) | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | ** | ||
− | |||
− | **** Larger delivery (6 µOPs, up from 4) | ||
* Memory | * Memory | ||
** Support for faster DDR-2400 memory | ** Support for faster DDR-2400 memory | ||
+ | ** [[L2$]] was changed from 8-way to 4-way set associative | ||
** [[L3$]] re-gained 512 KiB/core (See [[#eDRAM architectural changes|§eDRAM architectural changes]] for the reason) | ** [[L3$]] re-gained 512 KiB/core (See [[#eDRAM architectural changes|§eDRAM architectural changes]] for the reason) | ||
** A new coherent [[cache]] fabric implementation | ** A new coherent [[cache]] fabric implementation | ||
Line 245: | Line 171: | ||
** The fully integrated voltage regulator (FIVR) is moved back to the motherboard | ** The fully integrated voltage regulator (FIVR) is moved back to the motherboard | ||
*** Originally intended to be a cost-cutting measure by moving the FIVR on-die as well as making it more efficient, the move resulted in unintentionally making the FIVR the limiting factor when it came to overclocking. | *** Originally intended to be a cost-cutting measure by moving the FIVR on-die as well as making it more efficient, the move resulted in unintentionally making the FIVR the limiting factor when it came to overclocking. | ||
− | |||
* Testability | * Testability | ||
** New support for {{intel|Direct Connect Interface}} (DCI), a new debugging transport protocol designed to allow debugging of closed cases (e.g. laptops, embedded) by accessing things such as [[JTAG]] through any [[USB 3]] port. | ** New support for {{intel|Direct Connect Interface}} (DCI), a new debugging transport protocol designed to allow debugging of closed cases (e.g. laptops, embedded) by accessing things such as [[JTAG]] through any [[USB 3]] port. | ||
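
The DMI bandwidth figures in the list above (~3.93 GB/s, up from 2 GB/s) can be reproduced with a short calculation. The sketch below assumes that DMI 2.0 uses the same 8b/10b line encoding as PCIe 2.0 and that DMI 3.0 uses the 128b/130b encoding of PCIe 3.0; the encodings are not stated on this page, and the function name is illustrative only.

```python
def link_bandwidth_gbs(gt_per_s, lanes, payload_bits, encoded_bits):
    """Usable bandwidth in GB/s (10^9 bytes/s) of a serial link.

    gt_per_s      -- raw transfer rate per lane in GT/s
    lanes         -- number of lanes in the link
    payload_bits  -- data bits carried per encoded symbol group
    encoded_bits  -- bits actually sent on the wire per symbol group
    """
    usable_bits_per_s = gt_per_s * 1e9 * lanes * payload_bits / encoded_bits
    return usable_bits_per_s / 8 / 1e9


# Assumed encodings: 8b/10b for DMI 2.0, 128b/130b for DMI 3.0.
dmi2 = link_bandwidth_gbs(5.0, lanes=4, payload_bits=8, encoded_bits=10)
dmi3 = link_bandwidth_gbs(8.0, lanes=4, payload_bits=128, encoded_bits=130)

print(f"DMI 2.0 x4: {dmi2:.3f} GB/s")
print(f"DMI 3.0 x4: {dmi3:.3f} GB/s")
```

Under these assumptions the 4-lane link roughly doubles in usable bandwidth even though the raw per-lane rate only rises by 60%, because the newer encoding wastes far fewer bits.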
==== CPU changes ==== | ==== CPU changes ==== | ||
− | * | + | * Most ALU operations have a throughput of 4 op/cycle for 8-bit and 32-bit registers. 64-bit ops are still limited to 3 op/cycle. (16-bit throughput varies per op and can be 4, 3.5 or 2 op/cycle.) |
− | * | + | * MOVSX and MOVZX have 4 op/cycle throughput for 16->32 and 32->64 forms, in addition to Haswell's 8->32, 8->64 and 16->64 bit forms. |
− | * Vector moves have throughput of 4 op/cycle ( | + | * ADC and SBB have throughput of 1 op/cycle, same as Haswell. |
− | * | + | * Vector moves have throughput of 4 op/cycle (move elimination). |
− | * Vector ALU ops are often "standardized" to latency of 4. for example, vADDPS and vMULPS used to have L of 3 and 5 | + | * Not only zeroing vector vpXORxx and vpSUBxx ops, but also vPCMPxxx on the same register, have throughput of 4 op/cycle. |
− | * Fused multiply-add ops have latency of 4 and throughput of 0.5 op/cycle | + | * Vector ALU ops are often "standardized" to a latency of 4. For example, vADDPS and vMULPS used to have latencies of 3 and 5; now both are 4. |
− | * Throughput of vADDps, vSUBps, vCMPps, vMAXps, their scalar and double analogs is increased to 2 op/cycle | + | * Fused multiply-add ops have a latency of 4 and a reciprocal throughput of 0.5 cycles/op (i.e. 2 op/cycle). |
− | * Throughput of vPSLxx and vPSRxx with immediate (i.e. fixed vector shifts) is increased to 2 op/cycle | + | * Throughput of vADDps, vSUBps, vCMPps, vMAXps, their scalar and double analogs is increased to 2 op/cycle. |
+ | * Throughput of vPSLxx and vPSRxx with immediate (i.e. fixed vector shifts) is increased to 2 op/cycle. | ||
* Throughput of vANDps, vANDNps, vORps, vXORps, their scalar and double analogs, vPADDx, vPSUBx is increased to 3 op/cycle. | * Throughput of vANDps, vANDNps, vORps, vXORps, their scalar and double analogs, vPADDx, vPSUBx is increased to 3 op/cycle. | ||
* vDIVPD, vSQRTPD have approximately twice as good throughput: from 8 to 4 and from 28 to 12 cycles/op. | * vDIVPD, vSQRTPD have approximately twice as good throughput: from 8 to 4 and from 28 to 12 cycles/op. | ||
* Throughput of some MMX ALU ops (such as PAND mm1, mm2) is decreased to 2 or 1 op/cycle (users are expected to use wider SSE/AVX registers instead). | * Throughput of some MMX ALU ops (such as PAND mm1, mm2) is decreased to 2 or 1 op/cycle (users are expected to use wider SSE/AVX registers instead). | ||
+ | |||
+ | ===== New GPU Features & Changes ===== | ||
+ | * Adaptive scalable texture compression (ASTC) | ||
+ | * 16x multi-sample anti-aliasing (MSAA) | ||
+ | * Post depth test coverage mask | ||
+ | * Floating point atomics (min/max/cmpexch) | ||
+ | * Min/max texture filtering | ||
+ | * Multi-plane overlays | ||
+ | |||
+ | ==== Graphics ==== | ||
+ | * Improved underlying implementation of the memory QoS for higher resolution displays and the integrated [[image signal processor]] (ISP) | ||
+ | ** Allow for higher concurrent bandwidth | ||
+ | * Skylake retires VGA support; multi-monitor support covers up to 3 displays via HDMI 1.4, DP 1.2, and eDP 1.3 interfaces. | ||
+ | * DirectX 12 | ||
+ | * OpenCL 2.0 | ||
+ | * OpenGL 4.4 | ||
+ | * Up to 24 EUs for GT2 (same as {{\\|Haswell}}), 48 EUs for GT3, and up to 72 EUs on {{intel|Iris Pro Graphics}} | ||
+ | ** 1,152 GFLOPS | ||
+ | |||
+ | :{| class="wikitable" | ||
+ | |- | ||
+ | ! [[integrated graphics processor|IGP]] !! Execution Units !! GT !! eDRAM !! Series (Y/U/H/S) | ||
+ | |- | ||
+ | | {{intel|HD Graphics}} || 12 || 2+1 || - || Y | ||
+ | |- | ||
+ | | {{intel|HD Graphics 510}} || 12 || 2+2 || - || U/S | ||
+ | |- | ||
+ | | {{intel|HD Graphics 515}} || 24 || 2+2 || - || Y | ||
+ | |- | ||
+ | | {{intel|HD Graphics 520}} || 24 || 4+2<br>2+2 || - || U | ||
+ | |- | ||
+ | | {{intel|HD Graphics 530}} || 24 || 4+2<br>2+2 || - || H/S | ||
+ | |- | ||
+ | | {{intel|HD Graphics P530}} || 24 || 4+2 || - || H | ||
+ | |- | ||
+ | | {{intel|Iris Graphics 540}} || 48 || 2+3e || 64 MiB || U | ||
+ | |- | ||
+ | | {{intel|Iris Graphics 550}} || 48 || 2+3e || 64 MiB || U | ||
+ | |- | ||
+ | | {{intel|Iris Pro Graphics 580}} || 72 || 4+4e || 128 MiB || H | ||
+ | |} | ||
====New instructions ==== | ====New instructions ==== | ||
− | {{ | + | {{main|#Added instructions|l1=See §Added instructions for the complete list}} |
− | Skylake introduced a number of {{x86| | + | Skylake introduced a number of new instructions: |
+ | * {{x86|SGX|<code>SGX</code>}} - Software Guard Extensions | ||
+ | * {{x86|MPX|<code>MPX</code>}} - Memory Protection Extensions | ||
+ | * {{x86|AVX-512|<code>AVX-512</code>}} - Advanced Vector Extensions 512 (Only on high-end {{intel|Xeon}} models (SKX)) | ||
− | + | ===="Speed Shift" (new power management)==== | |
− | | + | Ever since the introduction of the modern power management unit on a microprocessor, it was effectively the role of the operating system to determine the desired [[operating frequency]] and [[voltage]] (i.e. a [[p-state]]) for the current workload. When the CPU utilization peaked, it was the role of the operating system to bump up the frequency to help cope with it. The issue has always been the limitations of the operating system. One major limitation is the granularity of the operating system's response time, usually in the tens of [[milliseconds]] (anything lower than that would likely be too intensive and would not yield better results). A second major issue is that the operating system doesn't have an instantaneous observation of the microarchitectural behavior of the workload. |
− | + | ||
− | | + | Intel introduced '''{{intel|Speed Shift}}''' with Skylake, a new methodology for quickly alternating core frequencies in response to power loads. Intel introduced a new unit called the [[Package Control Unit]] (PCU), which is effectively a full-fledged [[microcontroller]] (containing power management logic and [[firmware]]) that collects and tracks many internal SoC statistics as well as external power telemetry (e.g. Psys and iMon). The PCU is also capable of interfacing with the OS, [[BIOS]], and {{intel|Dynamic Platform and Thermal Framework|DPTF}}. Speed Shift improves the performance of frequency shifting by off-loading the control from the [[operating system]] to the PCU. |
− | + | ||
+ | Speed Shift effectively eliminates the need for the OS to manage the P-states, though the OS does have the final say (unless special exceptions occur such as thermal throttling). Intel calls this "autonomous P-state", allowing Speed Shift to kick in in a matter of just ~1 millisecond (whereas operating system-based P-state control can be as slow as 30 ms). Speed Shift reduces the time needed to reach peak frequency to around 30 ms, down from over 100 ms with the earlier OS-based implementation. While Speed Shift is capable of full range shift by default, the operating system can set the minimum QoS, maximum frequency and power/performance hints when desired. The final result should be higher performance and especially higher responsiveness in power-constrained form factors. | ||
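
Speed Shift is exposed to software as hardware-controlled P-states (HWP), and on Linux the corresponding CPU feature flags appear in <code>/proc/cpuinfo</code>. The following is a minimal sketch of how one might check for them; it parses text passed in as a string so it runs anywhere, and the sample input is made up for illustration rather than captured from a real Skylake system.

```python
def hwp_flags(cpuinfo_text):
    """Return the HWP-related feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Everything after the first ':' is a space-separated flag list.
            flags = line.partition(":")[2].split()
            return {f for f in flags if f == "hwp" or f.startswith("hwp_")}
    return set()


# Illustrative sample only; on a real Linux system one would pass
# open("/proc/cpuinfo").read() instead.
sample = (
    "processor\t: 0\n"
    "flags\t\t: fpu vme sse2 avx2 hwp hwp_notify hwp_act_window hwp_epp\n"
)

found = hwp_flags(sample)
print("Speed Shift (HWP) reported:", "hwp" in found)
print("Related flags:", sorted(found))
```

An empty result simply means the flag list does not mention HWP; it may still be disabled in firmware on hardware that supports it.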
+ | |||
+ | ===== Power of System (Psys) ===== | ||
+ | Psys (Power of System) is a way for the PCU to monitor the performance and the total platform power provided to the chip. The chip uses a number of autonomous algorithms (one for "Low Range" and one for "High Range"). The Low Range algorithm lowers the frequency to conserve energy. The algorithm is capable of overriding the low P-state, a state calculated every millisecond based on the active workload and system characteristics. The High Range algorithm deals with elevating the frequency for the benefit of increased performance (at the cost of increased energy use/inefficiency). The exact ratio of ΔPower/ΔPerformance ≤ αPreference can be finely controlled via the OS and user preferences. | ||
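
The ΔPower/ΔPerformance ≤ αPreference condition can be read as a simple acceptance test for a proposed frequency increase. The sketch below is purely illustrative: Intel has not published the PCU's actual algorithm, and the function name, units, and sample numbers are all hypothetical.

```python
def should_raise_frequency(delta_power_w, delta_perf, alpha):
    """Accept a frequency step only if its marginal power cost per unit of
    performance gained stays at or below the preference threshold alpha.

    delta_power_w -- extra power the step would cost (hypothetical unit: watts)
    delta_perf    -- performance the step would gain (hypothetical unit)
    alpha         -- preference threshold set via OS/user hints
    """
    if delta_perf <= 0:
        # No performance gain: spending extra power is never worthwhile.
        return False
    return delta_power_w / delta_perf <= alpha


# A performance-biased preference tolerates a costlier step than an
# energy-biased one (all numbers are made up).
print(should_raise_frequency(1.5, 1.0, alpha=2.0))
print(should_raise_frequency(1.5, 1.0, alpha=1.0))
```

A larger α corresponds to a preference for performance, while a smaller α makes the controller more reluctant to spend power.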
+ | |||
+ | ==== Other Power Optimization ==== | ||
+ | Skylake includes a number of additional power optimization changes: | ||
− | | + | * {{x86|AVX2}} is now power gated - prior to Skylake, AVX2 was not power gated, which meant it was susceptible to [[leakage]]. Starting with Skylake, the AVX2 logic is fully power gated and turns off when not in use. |
+ | * Many older/legacy underused resources have been downscaled. | ||
+ | * Various scenario-based power optimizations were done, including: | ||
+ | ** Idle power is reduced further | ||
+ | ** C1 state power reduction (improved dynamic capacitance C<sub>dyn</sub>) | ||
− | | + | Overall, Skylake achieves better performance/watt per core, reaching 8x performance/watt over {{\\|Nehalem}}. |
− | |||
− | ==== Entire SoC Overview | + | === Block Diagram === |
+ | ==== Client SoC ==== | ||
+ | ====== Entire SoC Overview ====== | ||
[[File:skylake soc block diagram.svg|900px]] | [[File:skylake soc block diagram.svg|900px]] | ||
− | ==== Individual Core ==== | + | ====== Individual Core ====== |
− | [[File:skylake block diagram.svg | + | [[File:skylake block diagram.svg]] |
− | ==== Gen9 ==== | + | ====== Gen9 ====== |
See {{intel|Gen9#Gen9|l=arch}}. | See {{intel|Gen9#Gen9|l=arch}}. | ||
+ | |||
+ | ==== Server MPUs ==== | ||
+ | {{future information}} | ||
+ | |||
+ | Intel has not disclosed the details of the Skylake server configuration. | ||
=== Memory Hierarchy === | === Memory Hierarchy === | ||
Other than a few organizational changes (e.g. L2$ went from 8-way to 4-way set associative), the overall memory structure is identical to {{\\|Broadwell}}/{{\\|Haswell}}. | Other than a few organizational changes (e.g. L2$ went from 8-way to 4-way set associative), the overall memory structure is identical to {{\\|Broadwell}}/{{\\|Haswell}}. | ||
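
The effect of the associativity change can be illustrated by computing the number of sets, which equals capacity divided by (ways × line size). This is a small sketch that assumes a 64-byte line for every level; the page states that line size for the L1 caches only, so applying it to the L2 is an assumption.

```python
KIB = 1024


def num_sets(size_bytes, ways, line_bytes=64):
    """Number of sets in a set-associative cache."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("capacity is not a multiple of ways * line size")
    return sets


print("L1 (32 KiB, 8-way):        ", num_sets(32 * KIB, 8), "sets")
print("L2 (256 KiB, 8-way, older):", num_sets(256 * KIB, 8), "sets")
print("L2 (256 KiB, 4-way):       ", num_sets(256 * KIB, 4), "sets")
```

At equal capacity, halving the associativity doubles the number of sets, so each lookup compares against fewer ways.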
− | |||
− | |||
* Cache | * Cache | ||
− | |||
− | |||
− | |||
− | |||
** L1I Cache: | ** L1I Cache: | ||
− | *** 32 [[KiB]] | + | *** 32 [[KiB]] 8-way set associative |
− | **** | + | **** 64 B line size |
**** shared by the two threads, per core | **** shared by the two threads, per core | ||
** L1D Cache: | ** L1D Cache: | ||
− | *** 32 KiB | + | *** 32 KiB 8-way set associative |
− | *** | + | *** 64 B line size |
*** shared by the two threads, per core | *** shared by the two threads, per core | ||
*** 4 cycles for fastest load-to-use (simple pointer accesses) | *** 4 cycles for fastest load-to-use (simple pointer accesses) | ||
**** 5 cycles for complex addresses | **** 5 cycles for complex addresses | ||
− | *** 64 | + | *** 64 Bytes/cycle load bandwidth |
− | *** 32 | + | *** 32 Bytes/cycle store bandwidth |
*** Write-back policy | *** Write-back policy | ||
** L2 Cache: | ** L2 Cache: | ||
− | *** | + | *** unified, 256 KiB 4-way set associative |
− | |||
− | |||
*** 12 cycles for fastest load-to-use | *** 12 cycles for fastest load-to-use | ||
− | *** | + | *** 64B/cycle bandwidth to L1$ |
*** Write-back policy | *** Write-back policy | ||
** L3 Cache/LLC: | ** L3 Cache/LLC: | ||
*** Up to 2 MiB Per core, shared across all cores | *** Up to 2 MiB Per core, shared across all cores | ||
*** Up to 16-way set associative | *** Up to 16-way set associative | ||
− | |||
− | |||
*** Write-back policy | *** Write-back policy | ||
*** Per each core: | *** Per each core: | ||
Line 337: | Line 306: | ||
*** Per package | *** Per package | ||
*** Only on the Iris Pro GPUs | *** Only on the Iris Pro GPUs | ||
− | *** Read: | + | *** Read: 32B/cycle (@ [[eDRAM]] clock) |
− | *** Write: | + | *** Write: 32B/cycle (@ eDRAM clock) |
** System [[DRAM]]: | ** System [[DRAM]]: | ||
*** 2 Channels | *** 2 Channels | ||
− | *** | + | *** 8B/cycle/channel (@ memory clock) |
*** 42 cycles + 51 ns latency | *** 42 cycles + 51 ns latency | ||
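
The system DRAM figures above translate into a peak theoretical bandwidth of channels × bytes per transfer × transfer rate. The sketch below interprets "memory clock" as the data transfer rate in MT/s and uses 2400 MT/s, matching the DDR-2400 support mentioned earlier on this page; both choices are assumptions made for illustration.

```python
def dram_peak_gbs(channels, bytes_per_transfer, mega_transfers_per_s):
    """Peak theoretical DRAM bandwidth in GB/s (10^9 bytes/s)."""
    return channels * bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


# 2 channels, 8 bytes per transfer per channel, assumed 2400 MT/s.
peak = dram_peak_gbs(channels=2, bytes_per_transfer=8, mega_transfers_per_s=2400)
print(f"Dual-channel peak: {peak:.1f} GB/s")
```

Sustained bandwidth in practice is lower than this ceiling because of refresh, bank conflicts, and read/write turnaround.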
− | Skylake TLB consists of dedicated | + | Skylake's TLB consists of a dedicated level-one TLB for the instruction cache and another one for the data cache. Additionally, there is a unified second-level TLB. |
* TLBs: | * TLBs: | ||
** ITLB | ** ITLB | ||
*** 4 KiB page translations: | *** 4 KiB page translations: | ||
**** 128 entries; 8-way set associative | **** 128 entries; 8-way set associative | ||
− | **** dynamic | + | **** dynamic partition; divided between the two threads |
*** 2 MiB / 4 MiB page translations: | *** 2 MiB / 4 MiB page translations: | ||
− | **** 8 entries | + | **** 8 entries; fully associative |
**** Duplicated for each thread | **** Duplicated for each thread | ||
** DTLB | ** DTLB | ||
*** 4 KiB page translations: | *** 4 KiB page translations: | ||
**** 64 entries; 4-way set associative | **** 64 entries; 4-way set associative | ||
− | **** fixed partition | + | **** fixed partition; divided between the two threads |
*** 2 MiB / 4 MiB page translations: | *** 2 MiB / 4 MiB page translations: | ||
**** 32 entries; 4-way set associative | **** 32 entries; 4-way set associative | ||
Line 370: | Line 339: | ||
**** 16 entries; 4-way set associative | **** 16 entries; 4-way set associative | ||
**** fixed partition | **** fixed partition | ||
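
A useful way to read the entry counts above is as "TLB reach": the amount of memory that can be mapped without a miss, which is entries × page size. The sketch below uses only the first-level entry counts listed on this page; the helper name is illustrative.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries, page_size):
    """Bytes of address space covered when every entry maps a distinct page."""
    return entries * page_size


first_level = {
    "ITLB, 4 KiB pages": (128, 4 * KIB),
    "ITLB, 2 MiB pages (per thread)": (8, 2 * MIB),
    "DTLB, 4 KiB pages": (64, 4 * KIB),
    "DTLB, 2 MiB pages": (32, 2 * MIB),
}

for name, (entries, page_size) in first_level.items():
    print(f"{name}: {tlb_reach(entries, page_size) // KIB} KiB")
```

The comparison shows why large pages matter: 32 entries of 2 MiB pages cover far more memory than 64 entries of 4 KiB pages.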
== Overview == | == Overview == | ||
Line 397: | Line 362: | ||
The Skylake [[system on a chip]] consists of five major components: CPU core, [[last level cache|LLC]], Ring interconnect, System agent, and the [[integrated graphics]]. The image shown on the right, presented by Intel at the Intel Developer Forum in 2015, represents a hypothetical model incorporating all available features Skylake has to offer (i.e. [[superset]] of features). Skylake features an improved core (see [[#Pipeline|§ Pipeline]]) with higher performance per watt and higher performance per clock. The number of cores depends on the model, but mainstream mobile models are typically [[dual-core]] while mainstream desktop models are typically [[quad-core]] with dual-core desktop models still offered for value models (e.g. {{intel|Celeron}}). Accompanying the cores is the LLC ([[last level cache]] or [[L3$]] as seen from the CPU perspective). On mainstream parts the LLC consists of 2 MiB for each core with lower amounts for value models. Connecting the cores together is the ring interconnect. The ring extends to the GPU and the system agent as well. Intel further optimized the ring in Skylake for low-power and higher bandwidth. | The Skylake [[system on a chip]] consists of five major components: CPU core, [[last level cache|LLC]], Ring interconnect, System agent, and the [[integrated graphics]]. The image shown on the right, presented by Intel at the Intel Developer Forum in 2015, represents a hypothetical model incorporating all available features Skylake has to offer (i.e. [[superset]] of features). Skylake features an improved core (see [[#Pipeline|§ Pipeline]]) with higher performance per watt and higher performance per clock. The number of cores depends on the model, but mainstream mobile models are typically [[dual-core]] while mainstream desktop models are typically [[quad-core]] with dual-core desktop models still offered for value models (e.g. {{intel|Celeron}}). 
Accompanying the cores is the LLC ([[last level cache]] or [[L3$]] as seen from the CPU perspective). On mainstream parts the LLC consists of 2 MiB for each core with lower amounts for value models. Connecting the cores together is the ring interconnect. The ring extends to the GPU and the system agent as well. Intel further optimized the ring in Skylake for low-power and higher bandwidth. | ||
− | Accompanying the cores is the {{\\|Gen9}} [[integrated graphics]] unit which comes in a number of different tiers ranging from just 12 execution units (used in the ultra-low power models) all the way the GT4 ({{\\|gen9#Scalability|Gen9 § Pipeline}}) with 72 execution units boasting a peak performance of up to 2,534.4 GFLOPS (HF) / 1,267.2 GFLOPS (SP) on the highest-end workstation model. The two highest-tier models are also accompanied by dedicated [[eDRAM]] ranging from 64 to | + | Accompanying the cores is the {{\\|Gen9}} [[integrated graphics]] unit which comes in a number of different tiers ranging from just 12 execution units (used in the ultra-low power models) all the way to the GT4 ({{\\|gen9#Scalability|Gen9 § Pipeline}}) with 72 execution units boasting a peak performance of up to 2,534.4 GFLOPS (HF) / 1,267.2 GFLOPS (SP) on the highest-end workstation model. The two highest-tier models are also accompanied by dedicated [[eDRAM]] ranging from 64 MiB to 128 MiB in capacity. The eDRAM is packaged along with the SoC in the same package. |
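
The peak GFLOPS figures quoted for the 72-EU configuration follow from EUs × FLOPs per EU per clock × clock frequency. The sketch below assumes each Gen9 execution unit retires 16 single-precision (or 32 half-precision) FLOPs per clock, and uses GPU clocks of 1.0 GHz and 1.1 GHz chosen because they reproduce the numbers on this page; the clocks are assumptions, not values taken from the text.

```python
def peak_gflops(execution_units, flops_per_eu_per_clock, clock_ghz):
    """Theoretical peak throughput in GFLOPS for an EU-based GPU."""
    return execution_units * flops_per_eu_per_clock * clock_ghz


SP_FLOPS_PER_EU = 16  # assumed: 2 SIMD-4 FPUs x 2 FLOPs per fused multiply-add
HF_FLOPS_PER_EU = 32  # assumed: half precision doubles the per-EU rate

sp_at_1_0 = peak_gflops(72, SP_FLOPS_PER_EU, 1.0)
sp_at_1_1 = peak_gflops(72, SP_FLOPS_PER_EU, 1.1)
hf_at_1_1 = peak_gflops(72, HF_FLOPS_PER_EU, 1.1)

print(f"72 EUs, SP @ 1.0 GHz: {sp_at_1_0:.1f} GFLOPS")
print(f"72 EUs, SP @ 1.1 GHz: {sp_at_1_1:.1f} GFLOPS")
print(f"72 EUs, HF @ 1.1 GHz: {hf_at_1_1:.1f} GFLOPS")
```

The same formula scales down linearly for the 24-EU and 48-EU tiers at the same clock.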
On the other side is the {{intel|System Agent}} (SA) which houses the various functionality that's not directly related to the cores or graphics. Skylake features an upgraded [[integrated memory controller]] (IMC) with most mainstream models supporting faster memory and dual-channel [[DDR4]]. The SA in Skylake also includes the [[Display Controller]] which now supports higher resolution displays with up to three displays for all mainstream models. | On the other side is the {{intel|System Agent}} (SA) which houses the various functionality that's not directly related to the cores or graphics. Skylake features an upgraded [[integrated memory controller]] (IMC) with most mainstream models supporting faster memory and dual-channel [[DDR4]]. The SA in Skylake also includes the [[Display Controller]] which now supports higher resolution displays with up to three displays for all mainstream models. | ||
Line 416: | Line 381: | ||
Intel has been experiencing a growing divergence in functionality over the last number of iterations of [[intel/microarchitectures|their microarchitecture]] between their mainstream consumer products and their high-end HPC/server models. Traditionally, Intel has been using the same exact core design for everything from their lowest end value models (e.g. {{intel|Celeron}}) all the way up to the highest-performance enterprise models (e.g. {{intel|Xeon E7}}). While the two have fundamentally different chip architectures, they use the same exact CPU core architecture as the building block. | Intel has been experiencing a growing divergence in functionality over the last number of iterations of [[intel/microarchitectures|their microarchitecture]] between their mainstream consumer products and their high-end HPC/server models. Traditionally, Intel has been using the same exact core design for everything from their lowest end value models (e.g. {{intel|Celeron}}) all the way up to the highest-performance enterprise models (e.g. {{intel|Xeon E7}}). While the two have fundamentally different chip architectures, they use the same exact CPU core architecture as the building block. | ||
− | This design philosophy has changed with Skylake. In order to better accommodate the different functionalities of each segment without sacrificing features or making unnecessary compromises Intel went with a configurable core. The Skylake core is a single development project, making up a master superset core. The project result in two derivatives: | + | This design philosophy has changed with Skylake. In order to better accommodate the different functionalities of each segment without sacrificing features or making unnecessary compromises, Intel went with a configurable core. The Skylake core is a single development project, making up a master superset core. The project resulted in two derivatives: one for servers and one for clients. All mainstream models (from {{intel|Celeron}}/{{intel|Pentium (2009)|Pentium}} all the way up to {{intel|Core i7}}/{{intel|Xeon E3}}) use the client core configuration. Server models (e.g. {{intel|Xeon E5}}/{{intel|Xeon E7}}) will be using the new server configuration. |
− | |||
− | |||
=== Pipeline === | === Pipeline === | ||
Line 424: | Line 387: | ||
==== Broad Overview ==== | ==== Broad Overview ==== | ||
− | At a | + | From a 10,000-foot view, Skylake represents the logical evolution from {{\\|Haswell}} and {{\\|Broadwell}}. Therefore, despite some significant differences from the previous microarchitecture, the overall design is fundamentally the same and can be seen as an enhancement over Broadwell rather than a complete change. |
[[File:intel common arch post ucache.svg|left|250px]] | [[File:intel common arch post ucache.svg|left|250px]] | ||
The pipeline can be broken down into three areas: the front-end, back-end or execution engine, and the memory subsystem. The goal of the [[front-end]] is to feed the back-end with a sufficient stream of operations which it gets by [[decoding instructions]] coming from memory. The front-end has two major pathways: the [[µOPs cache]] path and the legacy path. The legacy path is the traditional path whereby variable-length [[x86]] instructions are fetched from the [[level 1 instruction cache]], queued, and consequently get decoded into simpler, fixed-length [[µOPs]]. The alternative and much more desired path is the µOPs cache path whereby a [[cache]] containing already decoded µOPs receives a hit allowing the µOPs to be sent directly to the decode queue. | The pipeline can be broken down into three areas: the front-end, back-end or execution engine, and the memory subsystem. The goal of the [[front-end]] is to feed the back-end with a sufficient stream of operations which it gets by [[decoding instructions]] coming from memory. The front-end has two major pathways: the [[µOPs cache]] path and the legacy path. The legacy path is the traditional path whereby variable-length [[x86]] instructions are fetched from the [[level 1 instruction cache]], queued, and consequently get decoded into simpler, fixed-length [[µOPs]]. The alternative and much more desired path is the µOPs cache path whereby a [[cache]] containing already decoded µOPs receives a hit allowing the µOPs to be sent directly to the decode queue. | ||
Line 434: | Line 397: | ||
Some µOPs deal with memory access (e.g. [[instruction load|load]] & [[instruction store|store]]). Those will be sent on dedicated scheduler ports that can perform those memory operations. Store operations go to the store buffer which is also capable of performing forwarding when needed. Likewise, Load operations come from the load buffer. Skylake features a dedicated 32 KiB level 1 data cache and a dedicated 32 KiB level 1 instruction cache. It also features a core-private 256 KiB L2 cache that is shared by both of the L1 caches. | Some µOPs deal with memory access (e.g. [[instruction load|load]] & [[instruction store|store]]). Those will be sent on dedicated scheduler ports that can perform those memory operations. Store operations go to the store buffer which is also capable of performing forwarding when needed. Likewise, Load operations come from the load buffer. Skylake features a dedicated 32 KiB level 1 data cache and a dedicated 32 KiB level 1 instruction cache. It also features a core-private 256 KiB L2 cache that is shared by both of the L1 caches. | ||
− | Each core enjoys a slice of a third level of cache that is shared by all the core. | + | Each core has a slice of the third-level cache, which is shared by all the cores. In the client configuration for Skylake, either [[two cores]] or [[four cores]] are connected, while in the server configuration up to [[28 cores]] may be connected on a single chip. |
{{clear}} | {{clear}} | ||
==== Front-end ==== | ==== Front-end ==== | ||
− | + | Skylake increases the front-end bandwidth of µOPs delivered to the execution engine. The µOPs cache now delivers 6 µOPs per clock (previously only 4 µOPs/clock); likewise, the decoders now deliver 5 µOPs/clock (previously only 4 µOPs/clock). The [[branch predictor]] has also been improved: it now has a reduced penalty (i.e. lower latency) for a wrongly predicted direct jump target. Additionally, the instruction fetch unit is capable of looking much deeper into the stream of bytes. Finally, the allocation queue, which interfaces between the front-end ([[in-order]]) and the [[execution engine]] ([[out-of-order]]), has been more than doubled to 64 entries/thread (from 28/thread in Broadwell). |
− | ==== | + | ==== Execution engine ==== |
− | + | As with the front-end, the execution engine's buffers were enlarged. The reorder buffer has been increased to 224 entries (from 192 in Broadwell) in order to extract more [[instruction-level parallelism]]. Likewise, the scheduler was increased considerably to 97 entries (from 64 in Broadwell). The [[integer]] [[register file]] was also increased slightly, from 160 entries to 180. |
− | [[ | + | The scheduler had its ports rearranged to better balance various instructions. For example, the latency and throughput of divide and [[sqrt]] instructions were improved. The latency of [[floating point]] ADD, MUL, and FMA was made uniform at 4 cycles, with a throughput of 2 ops/clock. Likewise, the latency of {{x86|AES|AES instructions}} was significantly reduced from 7 cycles down to 4. |
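The 4-cycle latency and 2 ops/clock throughput quoted above imply how many independent operations must be in flight to keep both FMA-capable ports busy (in-flight operations = latency × throughput). The following is a minimal sketch of that arithmetic; the function and constant names are illustrative and do not come from any Intel tool.

```python
def independent_chains_needed(latency_cycles, throughput_per_clock):
    """Independent dependency chains needed to hide the latency.

    With fewer chains than this, each new operation has to wait for the
    result of an earlier one and the execution ports sit partly idle.
    """
    return latency_cycles * throughput_per_clock


# Figures quoted in the text for FP ADD, MUL, and FMA on Skylake.
SKYLAKE_FP_CHAINS = independent_chains_needed(4, 2)
print(SKYLAKE_FP_CHAINS)  # prints 8
```

In practice this suggests that a floating-point reduction loop needs roughly eight independent accumulators before it can approach the quoted peak throughput.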
− | + | {| class="wikitable" style="text-align: center;" | |
− | + | |- | |
− | + | ! colspan="8" | Dispatch Ports | |
− | + | |- | |
− | + | ! Port 0 !! Port 1 !! Port 2 !! Port 3 !! Port 4 !! Port 5 !! Port 6 !! Port 7 | |
− | | | + | |- |
+ | | ALU<br>Vec ALU || ALU<br>Fast LEA<br>Vec ALU || Load Addr<br>Store Addr || Load Addr<br>Store Addr || Store Data || ALU<br>Fast LEA<br>Vec ALU || ALU<br>Shift || Store Addr | ||
+ | |- | ||
+ | | Vec Shift<br>Vec Add || Vec Shift<br>Vec Add || || || || Vec Shuffle || Branch || | ||
|- | |- | ||
− | | | + | | Vec Mul<br>FMA || Vec Mul<br>FMA || || || || || || |
− | < | ||
− | | | ||
|- | |- | ||
− | | | + | | DIV || Slow Int || || || || || || |
|- | |- | ||
− | | | + | | Branch2 || Slow LEA || || || || || || |
|} | |} | ||
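The dispatch port table can also be read as a lookup structure: for a given class of operation, which ports may issue it. The sketch below transcribes the table into a Python dictionary; the names `DISPATCH_PORTS` and `ports_for` are hypothetical, and the unit labels are copied from the table rather than from any official encoding.

```python
# Unit classes listed under each dispatch port in the table above.
DISPATCH_PORTS = {
    0: ["ALU", "Vec ALU", "Vec Shift", "Vec Add", "Vec Mul", "FMA", "DIV", "Branch2"],
    1: ["ALU", "Fast LEA", "Vec ALU", "Vec Shift", "Vec Add", "Vec Mul", "FMA",
        "Slow Int", "Slow LEA"],
    2: ["Load Addr", "Store Addr"],
    3: ["Load Addr", "Store Addr"],
    4: ["Store Data"],
    5: ["ALU", "Fast LEA", "Vec ALU", "Vec Shuffle"],
    6: ["ALU", "Shift", "Branch"],
    7: ["Store Addr"],
}


def ports_for(unit):
    """Return the sorted list of ports whose column lists the given unit."""
    return sorted(port for port, units in DISPATCH_PORTS.items() if unit in units)


print(ports_for("ALU"))          # prints [0, 1, 5, 6]
print(ports_for("Vec Shuffle"))  # prints [5]
```

A lookup like this makes port pressure visible: operations that map to a single port (for example vector shuffles on port 5) can become a bottleneck even when the other ports are idle.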
− | |||
− | + | ==== Execution Units ==== | |
− | + | {| class="wikitable" | |
− | {| class="wikitable | ||
|- | |- | ||
! colspan="3" | Execution Units | ! colspan="3" | Execution Units | ||
Line 558: | Line 443: | ||
| Slow Int || 1 || mul, imul, bsr, rcl, shld, mulx, pdep, etc... | | Slow Int || 1 || mul, imul, bsr, rcl, shld, mulx, pdep, etc... | ||
|- | |- | ||
− | | Bit Manipulation || 2 || andn, bextr, blsi, blsmsk, bzhi, etc | + | | BM<info>Bit Manipulation</info> || 2 || andn, bextr, blsi, blsmsk, bzhi, etc |
|- | |- | ||
| FP Mov || 1 || (v)movsd/ss, (v)movd gpr | | FP Mov || 1 || (v)movsd/ss, (v)movd gpr | ||
Line 571: | Line 456: | ||
|- | |- | ||
| Vec Mul || 2 || (v)mul*, (v)pmul*, (v)pmadd* | | Vec Mul || 2 || (v)mul*, (v)pmul*, (v)pmadd* | ||
− | |||
− | |||
|} | |} | ||
− | |||
− | |||
− | |||
==== Memory subsystem ==== | ==== Memory subsystem ==== | ||
− | + | Skylake has had its store buffer enlarged to 56 entries (up from 42 in {{\\|Broadwell}}). Special care was taken to reduce the penalty for page-split loads: previously, scenarios involving page-split loads were thought to be rarer than they actually are. Skylake addresses this by treating page-split loads the same as other split loads; the page-split load penalty is expected to drop to 5 cycles, from 100 cycles in Broadwell. The average store-to-load forwarding latency has also been improved, and stores that miss in the L1$ generate requests to the next-level cache (L2$) much earlier in Skylake than before. |
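To make the term concrete: a load is a split load when the bytes it reads straddle a boundary, and a page-split load when that boundary is a page boundary. The sketch below classifies an access under two assumptions that are not stated in the text: a 4 KiB base page and a 64-byte cache line. All names are illustrative.

```python
PAGE_SIZE = 4096       # assumption: 4 KiB base page
CACHE_LINE_SIZE = 64   # assumption: 64-byte cache line


def crosses_boundary(address, size, boundary):
    """True if the bytes [address, address + size) fall in two different blocks."""
    return address // boundary != (address + size - 1) // boundary


def classify_load(address, size):
    """Classify a load of `size` bytes (size >= 1) starting at `address`."""
    if crosses_boundary(address, size, PAGE_SIZE):
        return "page-split"
    if crosses_boundary(address, size, CACHE_LINE_SIZE):
        return "line-split"
    return "no-split"


print(classify_load(4092, 8))  # prints page-split
print(classify_load(60, 8))    # prints line-split
print(classify_load(4088, 8))  # prints no-split
```

Every page-split load is also a line-split load, which is why the page check comes first.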
− | Skylake | ||
− | The | + | The bandwidth from L2$ to L3$ has been improved and write bandwidth from L2$ to L3$ has also been increased from 4 cycles/line to 2 cycles/line. |
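The cycles-per-line figures can be converted to bytes per cycle once a line size is fixed. The sketch below assumes a 64-byte cache line, which the text does not state; the function name is illustrative.

```python
CACHE_LINE_BYTES = 64  # assumption: 64-byte cache line


def write_bandwidth_bytes_per_cycle(cycles_per_line):
    """Convert a cycles-per-line cost into bytes transferred per cycle."""
    return CACHE_LINE_BYTES / cycles_per_line


print(write_bandwidth_bytes_per_cycle(4))  # prints 16.0
print(write_bandwidth_bytes_per_cycle(2))  # prints 32.0
```

Under that assumption, halving the cost from 4 to 2 cycles per line doubles the L2$ to L3$ write bandwidth from 16 to 32 bytes per cycle.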
=== eDRAM architectural changes === | === eDRAM architectural changes === | ||
Line 598: | Line 477: | ||
With the new changes, the eDRAM is no longer architectural: it is capable of caching any data (including "unreachable memory", display engines, and effectively any memory transfer not bound by software restrictions) and is entirely invisible to software (one exception noted later) in terms of coherency (note that no flushing is thus necessary to maintain coherency), ordering, or other organizational details. For optimal graphics performance, the graphics driver may decide to limit certain memory accesses to only the eDRAM, only the LLC, or both. | With the new changes, the eDRAM is no longer architectural: it is capable of caching any data (including "unreachable memory", display engines, and effectively any memory transfer not bound by software restrictions) and is entirely invisible to software (one exception noted later) in terms of coherency (note that no flushing is thus necessary to maintain coherency), ordering, or other organizational details. For optimal graphics performance, the graphics driver may decide to limit certain memory accesses to only the eDRAM, only the LLC, or both. | ||
== Clock domains == | == Clock domains == | ||
Skylake is divided into a number of [[clock domains]], each controlling the clock frequency of their respective unit in the processor. All clock domains are some multiple of the [virtual] bus clock ([[BCLK]]). | Skylake is divided into a number of [[clock domains]], each controlling the clock frequency of their respective unit in the processor. All clock domains are some multiple of the [virtual] bus clock ([[BCLK]]). | ||
− | * '''BCLK''' - Bus | + | * '''BCLK''' - Bus Clock - The system bus interface frequency (it once referred to the actual [[FSB]] speed; it now serves only as a base clock reference for all other clock domains). The bus clock is 100 MHz. |
* '''Core Clock''' - The frequency at which the core and the [[L1]]/[[L2]] caches operate. (Frequency depends on the model and is represented as a multiple of BCLK). | * '''Core Clock''' - The frequency at which the core and the [[L1]]/[[L2]] caches operate. (Frequency depends on the model and is represented as a multiple of BCLK). | ||
* '''Ring Clock''' - The frequency at which the ring interconnect and [[L3$|LLC]] operate. Data from/to the individual cores are read/written into the L3 at a rate of 32 B/cycle at the Ring Clock frequency. | * '''Ring Clock''' - The frequency at which the ring interconnect and [[L3$|LLC]] operate. Data from/to the individual cores are read/written into the L3 at a rate of 32 B/cycle at the Ring Clock frequency. | ||
Line 666: | Line 489: | ||
[[File:skylake soc clock domain block diagram.svg|850px]] | [[File:skylake soc clock domain block diagram.svg|850px]] | ||
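Because every clock domain is a multiple of BCLK, a domain's frequency follows directly from its multiplier, and the ring's per-core L3 bandwidth follows from the 32 B/cycle figure. The sketch below illustrates this; the multiplier of 40 is a hypothetical value chosen for the example, not a specification of any particular model.

```python
BCLK_MHZ = 100  # bus clock stated in the text


def domain_frequency_mhz(multiplier):
    """Frequency of a clock domain given its BCLK multiplier."""
    return multiplier * BCLK_MHZ


def ring_bandwidth_gb_per_s(ring_multiplier, bytes_per_cycle=32):
    """Per-core L3 read/write bandwidth in GB/s (1 GB = 10**9 bytes)."""
    cycles_per_second = domain_frequency_mhz(ring_multiplier) * 1_000_000
    return cycles_per_second * bytes_per_cycle / 1e9


# Hypothetical multiplier of 40, used for illustration only.
print(domain_frequency_mhz(40))     # prints 4000
print(ring_bandwidth_gb_per_s(40))  # prints 128.0
```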
== Die == | == Die == | ||
− | + | === Client Die === | |
Skylake desktop and mobile come with [[2 cores|2]] and [[4 cores|4]] cores. Each variant has its own die. One of the most noticeable changes on die is the amount of die space allocated to the [[GPU]]. The major components of the die are: | Skylake desktop and mobile come with [[2 cores|2]] and [[4 cores|4]] cores. Each variant has its own die. One of the most noticeable changes on die is the amount of die space allocated to the [[GPU]]. The major components of the die are: | ||
Line 807: | Line 499: | ||
* Memory Controller | * Memory Controller | ||
− | === System Agent === | + | ==== System Agent ==== |
The System Agent (SA) contains the Image Processing Unit (IPU), the Display Engine (DE), the I/O bus and various other shared functionality. Note that the mainstream desktop (i.e., [[quad-core]] die) does not have an IPU (The memory controller actually occupies a portion of where it would otherwise be). | The System Agent (SA) contains the Image Processing Unit (IPU), the Display Engine (DE), the I/O bus and various other shared functionality. Note that the mainstream desktop (i.e., [[quad-core]] die) does not have an IPU (The memory controller actually occupies a portion of where it would otherwise be). | ||
Line 824: | Line 516: | ||
{{clear}} | {{clear}} | ||
− | = | + | ==== Integrated Graphics ==== |
− | === Integrated Graphics === | ||
The [[integrated graphics]] takes up the largest portion of the die. The normal [[dual-core]] and [[quad-core]] dies come with 24 EU {{\\|Gen9.5}} GPU (with 12 units disabled on the low end models). | The [[integrated graphics]] takes up the largest portion of the die. The normal [[dual-core]] and [[quad-core]] dies come with 24 EU {{\\|Gen9.5}} GPU (with 12 units disabled on the low end models). | ||
Line 859: | Line 525: | ||
{{clear}} | {{clear}} | ||
− | === Dual-core === | + | ==== Dual-core ==== |
− | Die shot of the [[dual-core]] | + | Die shot of the [[dual-core]] Skylake processors. Those are found in mobile models, and entry-level/budget processors: |
* [[14 nm process]] | * [[14 nm process]] | ||
* 11 metal layers | * 11 metal layers | ||
* ~1,750,000,000 transistors | * ~1,750,000,000 transistors | ||
− | * ~ | + | * ~95.33 mm² |
− | |||
− | |||
: [[File:skylake (dual core).png|650px]] | : [[File:skylake (dual core).png|650px]] | ||
Line 874: | Line 538: | ||
: [[File:skylake (dual core) (annotated).png|650px]] | : [[File:skylake (dual core) (annotated).png|650px]] | ||
− | === Quad-core === | + | ==== Quad-core ==== |
− | Die shot of the [[quad-core]] | + | Die shot of the [[quad-core]] Skylake processors. Those are found in almost all mainstream desktop processors. |
* [[14 nm process]] | * [[14 nm process]] | ||
* 11 metal layers | * 11 metal layers | ||
− | + | * ~122 mm² | |
− | * ~122 | ||
− | |||
− | : [[File:skylake (quad-core).png | + | : [[File:skylake (quad-core).png|650px]] |
: [[File:skylake (quad-core) (annotated).png|650px]] | : [[File:skylake (quad-core) (annotated).png|650px]] | ||
+ | |||
+ | === Server Die === | ||
+ | Skylake Server class models consist of 3 different dies: Low Core Count (LCC), Medium Core Count (MCC), and High Core Count (HCC). | ||
+ | |||
+ | ==== High Core Count (HCC) ==== | ||
+ | * [[14 nm process]] | ||
+ | * [[28 cores]] | ||
+ | [[File:skylake-ep-hcc die shot.png|650px]] | ||
+ | |||
+ | == Added instructions == | ||
+ | '''{{x86|SGX}}''' - Software Guard Extensions | ||
+ | |||
+ | {| class="wikitable collapsible collapsed" | ||
+ | ! Full list | ||
+ | |- | ||
+ | | | ||
+ | {{collist | ||
+ | | count = 4 | ||
+ | | width = 650px | ||
+ | | | ||
+ | * {{x86|AEX}} | ||
+ | * {{x86|EACCEPT}} | ||
+ | * {{x86|EACCEPTCOPY}} | ||
+ | * {{x86|EADD}} | ||
+ | * {{x86|EAUG}} | ||
+ | * {{x86|EBLOCK}} | ||
+ | * {{x86|ECREATE}} | ||
+ | * {{x86|EDBGRD}} | ||
+ | * {{x86|EDBGWR}} | ||
+ | * {{x86|EENTER}} | ||
+ | * {{x86|EEXIT}} | ||
+ | * {{x86|EEXTEND}} | ||
+ | * {{x86|EGETKEY}} | ||
+ | * {{x86|EINIT}} | ||
+ | * {{x86|ELDB}} | ||
+ | * {{x86|ELDU}} | ||
+ | * {{x86|EMODPE}} | ||
+ | * {{x86|EMODPR}} | ||
+ | * {{x86|EMODT}} | ||
+ | * {{x86|EPA}} | ||
+ | * {{x86|EREMOVE}} | ||
+ | * {{x86|EREPORT}} | ||
+ | * {{x86|ERESUME}} | ||
+ | * {{x86|ETRACK}} | ||
+ | * {{x86|EWB}} | ||
+ | }} | ||
+ | |} | ||
+ | |||
+ | '''{{x86|MPX}}''' - Memory Protection Extensions | ||
+ | |||
+ | {| class="wikitable collapsible collapsed" | ||
+ | ! Full list | ||
+ | |- | ||
+ | | | ||
+ | {{collist | ||
+ | | count = 4 | ||
+ | | width = 650px | ||
+ | | | ||
+ | * {{x86|BNDCL}} | ||
+ | * {{x86|BNDCN}} | ||
+ | * {{x86|BNDCU}} | ||
+ | * {{x86|BNDLDX}} | ||
+ | * {{x86|BNDMK}} | ||
+ | * {{x86|BNDMOV}} | ||
+ | * {{x86|BNDSTX}} | ||
+ | }} | ||
+ | |} | ||
+ | |||
+ | '''{{x86|AVX-512}}''' - Advanced Vector Extensions 512; These instructions can only be found on selected high-end {{intel|Xeon}} models (codename '''SKX''') | ||
+ | |||
+ | {| class="wikitable collapsible collapsed" | ||
+ | ! Full list | ||
+ | |- | ||
+ | | | ||
+ | {{collist | ||
+ | | count = 5 | ||
+ | | width = 850px | ||
+ | | | ||
+ | * {{x86|VADDPD}} | ||
+ | * {{x86|VADDPS}} | ||
+ | * {{x86|VADDSD}} | ||
+ | * {{x86|VADDSS}} | ||
+ | * {{x86|VALIGND}} | ||
+ | * {{x86|VALIGNQ}} | ||
+ | * {{x86|VANDNPD}} | ||
+ | * {{x86|VANDNPS}} | ||
+ | * {{x86|VANDPD}} | ||
+ | * {{x86|VANDPS}} | ||
+ | * {{x86|VBLENDMPD}} | ||
+ | * {{x86|VBLENDMPS}} | ||
+ | * {{x86|VBROADCASTF32X2}} | ||
+ | * {{x86|VBROADCASTF32X4}} | ||
+ | * {{x86|VBROADCASTF32X8}} | ||
+ | * {{x86|VBROADCASTF64X2}} | ||
+ | * {{x86|VBROADCASTF64X4}} | ||
+ | * {{x86|VBROADCASTI32X2}} | ||
+ | * {{x86|VBROADCASTI32X4}} | ||
+ | * {{x86|VBROADCASTI32X8}} | ||
+ | * {{x86|VBROADCASTI64X2}} | ||
+ | * {{x86|VBROADCASTI64X4}} | ||
+ | * {{x86|VBROADCASTSD}} | ||
+ | * {{x86|VBROADCASTSS}} | ||
+ | * {{x86|VCMPPD}} | ||
+ | * {{x86|VCMPPS}} | ||
+ | * {{x86|VCMPSD}} | ||
+ | * {{x86|VCMPSS}} | ||
+ | * {{x86|VCOMISD}} | ||
+ | * {{x86|VCOMISS}} | ||
+ | * {{x86|VCOMPRESSPD}} | ||
+ | * {{x86|VCOMPRESSPS}} | ||
+ | * {{x86|VCVTDQ2PD}} | ||
+ | * {{x86|VCVTDQ2PS}} | ||
+ | * {{x86|VCVTPD2DQ}} | ||
+ | * {{x86|VCVTPD2PS}} | ||
+ | * {{x86|VCVTPD2QQ}} | ||
+ | * {{x86|VCVTPD2UDQ}} | ||
+ | * {{x86|VCVTPD2UQQ}} | ||
+ | * {{x86|VCVTPH2PS}} | ||
+ | * {{x86|VCVTPS2DQ}} | ||
+ | * {{x86|VCVTPS2PD}} | ||
+ | * {{x86|VCVTPS2PH}} | ||
+ | * {{x86|VCVTPS2QQ}} | ||
+ | * {{x86|VCVTPS2UDQ}} | ||
+ | * {{x86|VCVTPS2UQQ}} | ||
+ | * {{x86|VCVTQQ2PD}} | ||
+ | * {{x86|VCVTQQ2PS}} | ||
+ | * {{x86|VCVTSD2SI}} | ||
+ | * {{x86|VCVTSD2SS}} | ||
+ | * {{x86|VCVTSD2USI}} | ||
+ | * {{x86|VCVTSI2SD}} | ||
+ | * {{x86|VCVTSI2SS}} | ||
+ | * {{x86|VCVTSS2SD}} | ||
+ | * {{x86|VCVTSS2SI}} | ||
+ | * {{x86|VCVTSS2USI}} | ||
+ | * {{x86|VCVTTPD2DQ}} | ||
+ | * {{x86|VCVTTPD2QQ}} | ||
+ | * {{x86|VCVTTPD2UDQ}} | ||
+ | * {{x86|VCVTTPD2UQQ}} | ||
+ | * {{x86|VCVTTPS2DQ}} | ||
+ | * {{x86|VCVTTPS2QQ}} | ||
+ | * {{x86|VCVTTPS2UDQ}} | ||
+ | * {{x86|VCVTTPS2UQQ}} | ||
+ | * {{x86|VCVTTSD2SI}} | ||
+ | * {{x86|VCVTTSD2USI}} | ||
+ | * {{x86|VCVTTSS2SI}} | ||
+ | * {{x86|VCVTTSS2USI}} | ||
+ | * {{x86|VCVTUDQ2PD}} | ||
+ | * {{x86|VCVTUDQ2PS}} | ||
+ | * {{x86|VCVTUQQ2PD}} | ||
+ | * {{x86|VCVTUQQ2PS}} | ||
+ | * {{x86|VCVTUSI2SD}} | ||
+ | * {{x86|VCVTUSI2SS}} | ||
+ | * {{x86|VDBPSADBW}} | ||
+ | * {{x86|VDIVPD}} | ||
+ | * {{x86|VDIVPS}} | ||
+ | * {{x86|VDIVSD}} | ||
+ | * {{x86|VDIVSS}} | ||
+ | * {{x86|VEXP2PD}} | ||
+ | * {{x86|VEXP2PS}} | ||
+ | * {{x86|VEXPANDPD}} | ||
+ | * {{x86|VEXPANDPS}} | ||
+ | * {{x86|VEXTRACTF32X4}} | ||
+ | * {{x86|VEXTRACTF32X8}} | ||
+ | * {{x86|VEXTRACTF64X2}} | ||
+ | * {{x86|VEXTRACTF64X4}} | ||
+ | * {{x86|VEXTRACTI32X4}} | ||
+ | * {{x86|VEXTRACTI32X8}} | ||
+ | * {{x86|VEXTRACTI64X2}} | ||
+ | * {{x86|VEXTRACTI64X4}} | ||
+ | * {{x86|VEXTRACTPS}} | ||
+ | * {{x86|VFIXUPIMMPD}} | ||
+ | * {{x86|VFIXUPIMMPS}} | ||
+ | * {{x86|VFIXUPIMMSD}} | ||
+ | * {{x86|VFIXUPIMMSS}} | ||
+ | * {{x86|VFMADD132PD}} | ||
+ | * {{x86|VFMADD132PS}} | ||
+ | * {{x86|VFMADD132SD}} | ||
+ | * {{x86|VFMADD132SS}} | ||
+ | * {{x86|VFMADD213PD}} | ||
+ | * {{x86|VFMADD213PS}} | ||
+ | * {{x86|VFMADD213SD}} | ||
+ | * {{x86|VFMADD213SS}} | ||
+ | * {{x86|VFMADD231PD}} | ||
+ | * {{x86|VFMADD231PS}} | ||
+ | * {{x86|VFMADD231SD}} | ||
+ | * {{x86|VFMADD231SS}} | ||
+ | * {{x86|VFMADDSUB132PD}} | ||
+ | * {{x86|VFMADDSUB132PS}} | ||
+ | * {{x86|VFMADDSUB213PD}} | ||
+ | * {{x86|VFMADDSUB213PS}} | ||
+ | * {{x86|VFMADDSUB231PD}} | ||
+ | * {{x86|VFMADDSUB231PS}} | ||
+ | * {{x86|VFMSUB132PD}} | ||
+ | * {{x86|VFMSUB132PS}} | ||
+ | * {{x86|VFMSUB132SD}} | ||
+ | * {{x86|VFMSUB132SS}} | ||
+ | * {{x86|VFMSUB213PD}} | ||
+ | * {{x86|VFMSUB213PS}} | ||
+ | * {{x86|VFMSUB213SD}} | ||
+ | * {{x86|VFMSUB213SS}} | ||
+ | * {{x86|VFMSUB231PD}} | ||
+ | * {{x86|VFMSUB231PS}} | ||
+ | * {{x86|VFMSUB231SD}} | ||
+ | * {{x86|VFMSUB231SS}} | ||
+ | * {{x86|VFMSUBADD132PD}} | ||
+ | * {{x86|VFMSUBADD132PS}} | ||
+ | * {{x86|VFMSUBADD213PD}} | ||
+ | * {{x86|VFMSUBADD213PS}} | ||
+ | * {{x86|VFMSUBADD231PD}} | ||
+ | * {{x86|VFMSUBADD231PS}} | ||
+ | * {{x86|VFNMADD132PD}} | ||
+ | * {{x86|VFNMADD132PS}} | ||
+ | * {{x86|VFNMADD132SD}} | ||
+ | * {{x86|VFNMADD132SS}} | ||
+ | * {{x86|VFNMADD213PD}} | ||
+ | * {{x86|VFNMADD213PS}} | ||
+ | * {{x86|VFNMADD213SD}} | ||
+ | * {{x86|VFNMADD213SS}} | ||
+ | * {{x86|VFNMADD231PD}} | ||
+ | * {{x86|VFNMADD231PS}} | ||
+ | * {{x86|VFNMADD231SD}} | ||
+ | * {{x86|VFNMADD231SS}} | ||
+ | * {{x86|VFNMSUB132PD}} | ||
+ | * {{x86|VFNMSUB132PS}} | ||
+ | * {{x86|VFNMSUB132SD}} | ||
+ | * {{x86|VFNMSUB132SS}} | ||
+ | * {{x86|VFNMSUB213PD}} | ||
+ | * {{x86|VFNMSUB213PS}} | ||
+ | * {{x86|VFNMSUB213SD}} | ||
+ | * {{x86|VFNMSUB213SS}} | ||
+ | * {{x86|VFNMSUB231PD}} | ||
+ | * {{x86|VFNMSUB231PS}} | ||
+ | * {{x86|VFNMSUB231SD}} | ||
+ | * {{x86|VFNMSUB231SS}} | ||
+ | * {{x86|VFPCLASSPD}} | ||
+ | * {{x86|VFPCLASSPS}} | ||
+ | * {{x86|VFPCLASSSD}} | ||
+ | * {{x86|VFPCLASSSS}} | ||
+ | * {{x86|VGATHERDPD}} | ||
+ | * {{x86|VGATHERDPS}} | ||
+ | * {{x86|VGATHERPF0DPD}} | ||
+ | * {{x86|VGATHERPF0DPS}} | ||
+ | * {{x86|VGATHERPF0QPD}} | ||
+ | * {{x86|VGATHERPF0QPS}} | ||
+ | * {{x86|VGATHERPF1DPD}} | ||
+ | * {{x86|VGATHERPF1DPS}} | ||
+ | * {{x86|VGATHERPF1QPD}} | ||
+ | * {{x86|VGATHERPF1QPS}} | ||
+ | * {{x86|VGATHERQPD}} | ||
+ | * {{x86|VGATHERQPS}} | ||
+ | * {{x86|VGETEXPPD}} | ||
+ | * {{x86|VGETEXPPS}} | ||
+ | * {{x86|VGETEXPSD}} | ||
+ | * {{x86|VGETEXPSS}} | ||
+ | * {{x86|VGETMANTPD}} | ||
+ | * {{x86|VGETMANTPS}} | ||
+ | * {{x86|VGETMANTSD}} | ||
+ | * {{x86|VGETMANTSS}} | ||
+ | * {{x86|VINSERTF32X4}} | ||
+ | * {{x86|VINSERTF32X8}} | ||
+ | * {{x86|VINSERTF64X2}} | ||
+ | * {{x86|VINSERTF64X4}} | ||
+ | * {{x86|VINSERTI32X4}} | ||
+ | * {{x86|VINSERTI32X8}} | ||
+ | * {{x86|VINSERTI64X2}} | ||
+ | * {{x86|VINSERTI64X4}} | ||
+ | * {{x86|VINSERTPS}} | ||
+ | * {{x86|VMAXPD}} | ||
+ | * {{x86|VMAXPS}} | ||
+ | * {{x86|VMAXSD}} | ||
+ | * {{x86|VMAXSS}} | ||
+ | * {{x86|VMINPD}} | ||
+ | * {{x86|VMINPS}} | ||
+ | * {{x86|VMINSD}} | ||
+ | * {{x86|VMINSS}} | ||
+ | * {{x86|VMOVAPD}} | ||
+ | * {{x86|VMOVAPS}} | ||
+ | * {{x86|VMOVD}} | ||
+ | * {{x86|VMOVDDUP}} | ||
+ | * {{x86|VMOVDQA32}} | ||
+ | * {{x86|VMOVDQA64}} | ||
+ | * {{x86|VMOVDQU16}} | ||
+ | * {{x86|VMOVDQU32}} | ||
+ | * {{x86|VMOVDQU64}} | ||
+ | * {{x86|VMOVDQU8}} | ||
+ | * {{x86|VMOVHLPS}} | ||
+ | * {{x86|VMOVHPD}} | ||
+ | * {{x86|VMOVHPS}} | ||
+ | * {{x86|VMOVLHPS}} | ||
+ | * {{x86|VMOVLPD}} | ||
+ | * {{x86|VMOVLPS}} | ||
+ | * {{x86|VMOVNTDQ}} | ||
+ | * {{x86|VMOVNTDQA}} | ||
+ | * {{x86|VMOVNTPD}} | ||
+ | * {{x86|VMOVNTPS}} | ||
+ | * {{x86|VMOVQ}} | ||
+ | * {{x86|VMOVSD}} | ||
+ | * {{x86|VMOVSHDUP}} | ||
+ | * {{x86|VMOVSLDUP}} | ||
+ | * {{x86|VMOVSS}} | ||
+ | * {{x86|VMOVUPD}} | ||
+ | * {{x86|VMOVUPS}} | ||
+ | * {{x86|VMULPD}} | ||
+ | * {{x86|VMULPS}} | ||
+ | * {{x86|VMULSD}} | ||
+ | * {{x86|VMULSS}} | ||
+ | * {{x86|VORPD}} | ||
+ | * {{x86|VORPS}} | ||
+ | * {{x86|VPABSB}} | ||
+ | * {{x86|VPABSD}} | ||
+ | * {{x86|VPABSQ}} | ||
+ | * {{x86|VPABSW}} | ||
+ | * {{x86|VPACKSSDW}} | ||
+ | * {{x86|VPACKSSWB}} | ||
+ | * {{x86|VPACKUSDW}} | ||
+ | * {{x86|VPACKUSWB}} | ||
+ | * {{x86|VPADDB}} | ||
+ | * {{x86|VPADDD}} | ||
+ | * {{x86|VPADDQ}} | ||
+ | * {{x86|VPADDSB}} | ||
+ | * {{x86|VPADDSW}} | ||
+ | * {{x86|VPADDUSB}} | ||
+ | * {{x86|VPADDUSW}} | ||
+ | * {{x86|VPADDW}} | ||
+ | * {{x86|VPALIGNR}} | ||
+ | * {{x86|VPANDD}} | ||
+ | * {{x86|VPANDND}} | ||
+ | * {{x86|VPANDNQ}} | ||
+ | * {{x86|VPANDQ}} | ||
+ | * {{x86|VPAVGB}} | ||
+ | * {{x86|VPAVGW}} | ||
+ | * {{x86|VPBLENDMB}} | ||
+ | * {{x86|VPBLENDMD}} | ||
+ | * {{x86|VPBLENDMQ}} | ||
+ | * {{x86|VPBLENDMW}} | ||
+ | * {{x86|VPBROADCASTB}} | ||
+ | * {{x86|VPBROADCASTD}} | ||
+ | * {{x86|VPBROADCASTMB2Q}} | ||
+ | * {{x86|VPBROADCASTMW2D}} | ||
+ | * {{x86|VPBROADCASTQ}} | ||
+ | * {{x86|VPBROADCASTW}} | ||
+ | * {{x86|VPCMPB}} | ||
+ | * {{x86|VPCMPD}} | ||
+ | * {{x86|VPCMPEQB}} | ||
+ | * {{x86|VPCMPEQD}} | ||
+ | * {{x86|VPCMPEQQ}} | ||
+ | * {{x86|VPCMPEQW}} | ||
+ | * {{x86|VPCMPGTB}} | ||
+ | * {{x86|VPCMPGTD}} | ||
+ | * {{x86|VPCMPGTQ}} | ||
+ | * {{x86|VPCMPGTW}} | ||
+ | * {{x86|VPCMPQ}} | ||
+ | * {{x86|VPCMPUB}} | ||
+ | * {{x86|VPCMPUD}} | ||
+ | * {{x86|VPCMPUQ}} | ||
+ | * {{x86|VPCMPUW}} | ||
+ | * {{x86|VPCMPW}} | ||
+ | * {{x86|VPCOMPRESSD}} | ||
+ | * {{x86|VPCOMPRESSQ}} | ||
+ | * {{x86|VPCONFLICTD}} | ||
+ | * {{x86|VPCONFLICTQ}} | ||
+ | * {{x86|VPERMB}} | ||
+ | * {{x86|VPERMD}} | ||
+ | * {{x86|VPERMI2B}} | ||
+ | * {{x86|VPERMI2D}} | ||
+ | * {{x86|VPERMI2PD}} | ||
+ | * {{x86|VPERMI2PS}} | ||
+ | * {{x86|VPERMI2Q}} | ||
+ | * {{x86|VPERMI2W}} | ||
+ | * {{x86|VPERMILPD}} | ||
+ | * {{x86|VPERMILPS}} | ||
+ | * {{x86|VPERMPD}} | ||
+ | * {{x86|VPERMPS}} | ||
+ | * {{x86|VPERMQ}} | ||
+ | * {{x86|VPERMT2B}} | ||
+ | * {{x86|VPERMT2D}} | ||
+ | * {{x86|VPERMT2PD}} | ||
+ | * {{x86|VPERMT2PS}} | ||
+ | * {{x86|VPERMT2Q}} | ||
+ | * {{x86|VPERMT2W}} | ||
+ | * {{x86|VPERMW}} | ||
+ | * {{x86|VPEXPANDD}} | ||
+ | * {{x86|VPEXPANDQ}} | ||
+ | * {{x86|VPEXTRB}} | ||
+ | * {{x86|VPEXTRD}} | ||
+ | * {{x86|VPEXTRQ}} | ||
+ | * {{x86|VPEXTRW}} | ||
+ | * {{x86|VPGATHERDD}} | ||
+ | * {{x86|VPGATHERDQ}} | ||
+ | * {{x86|VPGATHERQD}} | ||
+ | * {{x86|VPGATHERQQ}} | ||
+ | * {{x86|VPINSRB}} | ||
+ | * {{x86|VPINSRD}} | ||
+ | * {{x86|VPINSRQ}} | ||
+ | * {{x86|VPINSRW}} | ||
+ | * {{x86|VPLZCNTD}} | ||
+ | * {{x86|VPLZCNTQ}} | ||
+ | * {{x86|VPMADD52HUQ}} | ||
+ | * {{x86|VPMADD52LUQ}} | ||
+ | * {{x86|VPMADDUBSW}} | ||
+ | * {{x86|VPMADDWD}} | ||
+ | * {{x86|VPMAXSB}} | ||
+ | * {{x86|VPMAXSD}} | ||
+ | * {{x86|VPMAXSQ}} | ||
+ | * {{x86|VPMAXSW}} | ||
+ | * {{x86|VPMAXUB}} | ||
+ | * {{x86|VPMAXUD}} | ||
+ | * {{x86|VPMAXUQ}} | ||
+ | * {{x86|VPMAXUW}} | ||
+ | * {{x86|VPMINSB}} | ||
+ | * {{x86|VPMINSD}} | ||
+ | * {{x86|VPMINSQ}} | ||
+ | * {{x86|VPMINSW}} | ||
+ | * {{x86|VPMINUB}} | ||
+ | * {{x86|VPMINUD}} | ||
+ | * {{x86|VPMINUQ}} | ||
+ | * {{x86|VPMINUW}} | ||
+ | * {{x86|VPMOVB2M}} | ||
+ | * {{x86|VPMOVD2M}} | ||
+ | * {{x86|VPMOVDB}} | ||
+ | * {{x86|VPMOVDW}} | ||
+ | * {{x86|VPMOVM2B}} | ||
+ | * {{x86|VPMOVM2D}} | ||
+ | * {{x86|VPMOVM2Q}} | ||
+ | * {{x86|VPMOVM2W}} | ||
+ | * {{x86|VPMOVQ2M}} | ||
+ | * {{x86|VPMOVQB}} | ||
+ | * {{x86|VPMOVQD}} | ||
+ | * {{x86|VPMOVQW}} | ||
+ | * {{x86|VPMOVSDB}} | ||
+ | * {{x86|VPMOVSDW}} | ||
+ | * {{x86|VPMOVSQB}} | ||
+ | * {{x86|VPMOVSQD}} | ||
+ | * {{x86|VPMOVSQW}} | ||
+ | * {{x86|VPMOVSWB}} | ||
+ | * {{x86|VPMOVSXBD}} | ||
+ | * {{x86|VPMOVSXBQ}} | ||
+ | * {{x86|VPMOVSXBW}} | ||
+ | * {{x86|VPMOVSXDQ}} | ||
+ | * {{x86|VPMOVSXWD}} | ||
+ | * {{x86|VPMOVSXWQ}} | ||
+ | * {{x86|VPMOVUSDB}} | ||
+ | * {{x86|VPMOVUSDW}} | ||
+ | * {{x86|VPMOVUSQB}} | ||
+ | * {{x86|VPMOVUSQD}} | ||
+ | * {{x86|VPMOVUSQW}} | ||
+ | * {{x86|VPMOVUSWB}} | ||
+ | * {{x86|VPMOVW2M}} | ||
+ | * {{x86|VPMOVWB}} | ||
+ | * {{x86|VPMOVZXBD}} | ||
+ | * {{x86|VPMOVZXBQ}} | ||
+ | * {{x86|VPMOVZXBW}} | ||
+ | * {{x86|VPMOVZXDQ}} | ||
+ | * {{x86|VPMOVZXWD}} | ||
+ | * {{x86|VPMOVZXWQ}} | ||
+ | * {{x86|VPMULDQ}} | ||
+ | * {{x86|VPMULHRSW}} | ||
+ | * {{x86|VPMULHUW}} | ||
+ | * {{x86|VPMULHW}} | ||
+ | * {{x86|VPMULLD}} | ||
+ | * {{x86|VPMULLQ}} | ||
+ | * {{x86|VPMULLW}} | ||
+ | * {{x86|VPMULTISHIFTQB}} | ||
+ | * {{x86|VPMULUDQ}} | ||
+ | * {{x86|VPORD}} | ||
+ | * {{x86|VPORQ}} | ||
+ | * {{x86|VPROLD}} | ||
+ | * {{x86|VPROLQ}} | ||
+ | * {{x86|VPROLVD}} | ||
+ | * {{x86|VPROLVQ}} | ||
+ | * {{x86|VPRORD}} | ||
+ | * {{x86|VPRORQ}} | ||
+ | * {{x86|VPRORVD}} | ||
+ | * {{x86|VPRORVQ}} | ||
+ | * {{x86|VPSADBW}} | ||
+ | * {{x86|VPSCATTERDD}} | ||
+ | * {{x86|VPSCATTERDQ}} | ||
+ | * {{x86|VPSCATTERQD}} | ||
+ | * {{x86|VPSCATTERQQ}} | ||
+ | * {{x86|VPSHUFB}} | ||
+ | * {{x86|VPSHUFD}} | ||
+ | * {{x86|VPSHUFHW}} | ||
+ | * {{x86|VPSHUFLW}} | ||
+ | * {{x86|VPSLLD}} | ||
+ | * {{x86|VPSLLDQ}} | ||
+ | * {{x86|VPSLLQ}} | ||
+ | * {{x86|VPSLLVD}} | ||
+ | * {{x86|VPSLLVQ}} | ||
+ | * {{x86|VPSLLVW}} | ||
+ | * {{x86|VPSLLW}} | ||
+ | * {{x86|VPSRAD}} | ||
+ | * {{x86|VPSRAQ}} | ||
+ | * {{x86|VPSRAVD}} | ||
+ | * {{x86|VPSRAVQ}} | ||
+ | * {{x86|VPSRAVW}} | ||
+ | * {{x86|VPSRAW}} | ||
+ | * {{x86|VPSRLD}} | ||
+ | * {{x86|VPSRLDQ}} | ||
+ | * {{x86|VPSRLQ}} | ||
+ | * {{x86|VPSRLVD}} | ||
+ | * {{x86|VPSRLVQ}} | ||
+ | * {{x86|VPSRLVW}} | ||
+ | * {{x86|VPSRLW}} | ||
+ | * {{x86|VPSUBB}} | ||
+ | * {{x86|VPSUBD}} | ||
+ | * {{x86|VPSUBQ}} | ||
+ | * {{x86|VPSUBSB}} | ||
+ | * {{x86|VPSUBSW}} | ||
+ | * {{x86|VPSUBUSB}} | ||
+ | * {{x86|VPSUBUSW}} | ||
+ | * {{x86|VPSUBW}} | ||
+ | * {{x86|VPTERNLOGD}} | ||
+ | * {{x86|VPTERNLOGQ}} | ||
+ | * {{x86|VPTESTMB}} | ||
+ | * {{x86|VPTESTMD}} | ||
+ | * {{x86|VPTESTMQ}} | ||
+ | * {{x86|VPTESTMW}} | ||
+ | * {{x86|VPTESTNMB}} | ||
+ | * {{x86|VPTESTNMD}} | ||
+ | * {{x86|VPTESTNMQ}} | ||
+ | * {{x86|VPTESTNMW}} | ||
+ | * {{x86|VPUNPCKHBW}} | ||
+ | * {{x86|VPUNPCKHDQ}} | ||
+ | * {{x86|VPUNPCKHQDQ}} | ||
+ | * {{x86|VPUNPCKHWD}} | ||
+ | * {{x86|VPUNPCKLBW}} | ||
+ | * {{x86|VPUNPCKLDQ}} | ||
+ | * {{x86|VPUNPCKLQDQ}} | ||
+ | * {{x86|VPUNPCKLWD}} | ||
+ | * {{x86|VPXORD}} | ||
+ | * {{x86|VPXORQ}} | ||
+ | * {{x86|VRANGEPD}} | ||
+ | * {{x86|VRANGEPS}} | ||
+ | * {{x86|VRANGESD}} | ||
+ | * {{x86|VRANGESS}} | ||
+ | * {{x86|VRCP14PD}} | ||
+ | * {{x86|VRCP14PS}} | ||
+ | * {{x86|VRCP14SD}} | ||
+ | * {{x86|VRCP14SS}} | ||
+ | * {{x86|VRCP28PD}} | ||
+ | * {{x86|VRCP28PS}} | ||
+ | * {{x86|VRCP28SD}} | ||
+ | * {{x86|VRCP28SS}} | ||
+ | * {{x86|VREDUCEPD}} | ||
+ | * {{x86|VREDUCEPS}} | ||
+ | * {{x86|VREDUCESD}} | ||
+ | * {{x86|VREDUCESS}} | ||
+ | * {{x86|VRNDSCALEPD}} | ||
+ | * {{x86|VRNDSCALEPS}} | ||
+ | * {{x86|VRNDSCALESD}} | ||
+ | * {{x86|VRNDSCALESS}} | ||
+ | * {{x86|VRSQRT14PD}} | ||
+ | * {{x86|VRSQRT14PS}} | ||
+ | * {{x86|VRSQRT14SD}} | ||
+ | * {{x86|VRSQRT14SS}} | ||
+ | * {{x86|VRSQRT28PD}} | ||
+ | * {{x86|VRSQRT28PS}} | ||
+ | * {{x86|VRSQRT28SD}} | ||
+ | * {{x86|VRSQRT28SS}} | ||
+ | * {{x86|VSCALEFPD}} | ||
+ | * {{x86|VSCALEFPS}} | ||
+ | * {{x86|VSCALEFSD}} | ||
+ | * {{x86|VSCALEFSS}} | ||
+ | * {{x86|VSCATTERDPD}} | ||
+ | * {{x86|VSCATTERDPS}} | ||
+ | * {{x86|VSCATTERPF0DPD}} | ||
+ | * {{x86|VSCATTERPF0DPS}} | ||
+ | * {{x86|VSCATTERPF0QPD}} | ||
+ | * {{x86|VSCATTERPF0QPS}} | ||
+ | * {{x86|VSCATTERPF1DPD}} | ||
+ | * {{x86|VSCATTERPF1DPS}} | ||
+ | * {{x86|VSCATTERPF1QPD}} | ||
+ | * {{x86|VSCATTERPF1QPS}} | ||
+ | * {{x86|VSCATTERQPD}} | ||
+ | * {{x86|VSCATTERQPS}} | ||
+ | * {{x86|VSHUFF32X4}} | ||
+ | * {{x86|VSHUFF64X2}} | ||
+ | * {{x86|VSHUFI32X4}} | ||
+ | * {{x86|VSHUFI64X2}} | ||
+ | * {{x86|VSHUFPD}} | ||
+ | * {{x86|VSHUFPS}} | ||
+ | * {{x86|VSQRTPD}} | ||
+ | * {{x86|VSQRTPS}} | ||
+ | * {{x86|VSQRTSD}} | ||
+ | * {{x86|VSQRTSS}} | ||
+ | * {{x86|VSUBPD}} | ||
+ | * {{x86|VSUBPS}} | ||
+ | * {{x86|VSUBSD}} | ||
+ | * {{x86|VSUBSS}} | ||
+ | * {{x86|VUCOMISD}} | ||
+ | * {{x86|VUCOMISS}} | ||
+ | * {{x86|VUNPCKHPD}} | ||
+ | * {{x86|VUNPCKHPS}} | ||
+ | * {{x86|VUNPCKLPD}} | ||
+ | * {{x86|VUNPCKLPS}} | ||
+ | * {{x86|VXORPD}} | ||
+ | * {{x86|VXORPS}} | ||
+ | }} | ||
+ | |} | ||
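Since these extensions are not present on every model, software usually checks for them at run time. The sketch below is one way to do that on Linux by parsing the `flags` line of `/proc/cpuinfo`; it assumes the Linux flag names `sgx`, `mpx`, and `avx512f` (the AVX-512 Foundation subset), and the helper names are hypothetical.

```python
# Assumed Linux /proc/cpuinfo flag names for the extensions listed above.
EXTENSION_FLAGS = {
    "SGX": "sgx",
    "MPX": "mpx",
    "AVX-512 Foundation": "avx512f",
}


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supported_extensions(cpuinfo_text):
    """Map each extension name to True/False depending on the reported flags."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in EXTENSION_FLAGS.items()}
```

On a Linux system the functions could be fed the contents of `/proc/cpuinfo`, for example `supported_extensions(open("/proc/cpuinfo").read())`. Other operating systems expose the same information through the CPUID instruction instead.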
+ | |||
+ | == Cores == | ||
+ | {{empty section}} | ||
== All Skylake Chips == | == All Skylake Chips == | ||
Line 894: | Line 1,158: | ||
created and tagged accordingly. | created and tagged accordingly. | ||
− | Missing a chip? please dump its name here: | + | Missing a chip? please dump its name here: http://en.wikichip.org/wiki/WikiChip:wanted_chips |
--> | --> | ||
− | + | <table class="wikitable sortable"> | |
− | <table class=" | + | <tr><th colspan="12" style="background:#D6D6FF;">Skylake Chips</th></tr> |
− | <tr | + | <tr><th colspan="9">Main processor</th><th colspan="3">IGP</th></tr> |
− | <tr | + | <tr><th>Model</th><th>µarch</th><th>Platform</th><th>Core</th><th>Launched</th><th>SDP</th><th>TDP</th><th>Freq</th><th>Max Mem</th><th>Name</th><th>Freq</th><th>Max Freq</th></tr> |
− | + | {{#ask: [[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Skylake]] | |
− | < | ||
− | {{#ask: [[Category:microprocessor models by intel]] [[instance of::microprocessor]] [[microarchitecture::Skylake | ||
|?full page name | |?full page name | ||
|?model number | |?model number | ||
+ | |?microarchitecture | ||
+ | |?platform | ||
+ | |?core name | ||
|?first launched | |?first launched | ||
− | |? | + | |?sdp |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
|?tdp | |?tdp | ||
|?base frequency#GHz | |?base frequency#GHz | ||
− | + | |?max memory#GB | |
− | |||
− | |||
− | |||
− | |?max memory# | ||
|?integrated gpu | |?integrated gpu | ||
|?integrated gpu base frequency | |?integrated gpu base frequency | ||
|?integrated gpu max frequency | |?integrated gpu max frequency | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
|format=template | |format=template | ||
− | |template=proc table | + | |template=proc table 2 |
|searchlabel= | |searchlabel= | ||
− | + | |userparam=13 | |
− | |||
− | |userparam= | ||
|mainlabel=- | |mainlabel=- | ||
− | |||
}} | }} | ||
− | {{ | + | <tr><th colspan="12">Count: {{#ask:[[Category:microprocessor models by intel]][[instance of::microprocessor]][[microarchitecture::Skylake]]|format=count}}</th></tr> |
</table> | </table> | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
== Documents == | == Documents == | ||
Line 957: | Line 1,193: | ||
* [[:File:6th Gen Intel® Core™ vPro™ Processor Family Product Brief.pdf|6th Gen Intel® Core™ vPro™ Processor Family Product Brief]] | * [[:File:6th Gen Intel® Core™ vPro™ Processor Family Product Brief.pdf|6th Gen Intel® Core™ vPro™ Processor Family Product Brief]] | ||
* [[:File:6th Generation Intel® Core™ Desktop Processors i7-6700K and i5-6600K Product Brief.pdf|6th Generation Intel® Core™ Desktop Processors i7-6700K and i5-6600K Product Brief]] | * [[:File:6th Generation Intel® Core™ Desktop Processors i7-6700K and i5-6600K Product Brief.pdf|6th Generation Intel® Core™ Desktop Processors i7-6700K and i5-6600K Product Brief]] | ||
− | |||
− | |||
− | |||
− | |||
− | |||
== See also == | == See also == | ||
* AMD {{amd|Zen}} | * AMD {{amd|Zen}} |
Facts about "Skylake (client) - Microarchitectures - Intel"
codename | Skylake (client)
core count | 2 and 4
designer | Intel
first launched | August 5, 2015
full page name | intel/microarchitectures/skylake (client)
instance of | microarchitecture
instruction set architecture | x86-64
manufacturer | Intel
microarchitecture type | CPU
name | Skylake (client)
pipeline stages (max) | 19
pipeline stages (min) | 14
process | 14 nm (0.014 μm, 1.4e-5 mm)