
{{intel title|Palm Cove|arch}}
{{microarchitecture
|atype=CPU
|name=Palm Cove
|designer=Intel
|manufacturer=Intel
|introduction=2018
|process=10 nm
|isa=x86-64
|predecessor=Skylake
|predecessor link=intel/microarchitectures/skylake
|successor=Sunny Cove
|successor link=intel/microarchitectures/sunny cove
}}
'''Palm Cove''' is a high-performance [[10 nm]] [[x86]] core microarchitecture designed by [[Intel]] for an array of server and client products.

== Process Technology ==
Palm Cove is designed to take advantage of Intel's [[10 nm process]].

== Architecture ==
=== Key changes from {{\\|Skylake (Server)}}===
* [[10 nm process]] (From [[14 nm]])
{{expand list}}

==== New instructions ====
Palm Cove, as shipped in {{\\|Cannon Lake}}, introduced a number of {{x86|extensions|new instructions}}:

* {{x86|AVX-512|<code>AVX-512</code>}}, specifically:
** {{x86|AVX512F|<code>AVX512F</code>}} - AVX-512 Foundation
** {{x86|AVX512CD|<code>AVX512CD</code>}} - AVX-512 Conflict Detection
** {{x86|AVX512BW|<code>AVX512BW</code>}} - AVX-512 Byte and Word
** {{x86|AVX512DQ|<code>AVX512DQ</code>}} - AVX-512 Doubleword and Quadword 
** {{x86|AVX512VL|<code>AVX512VL</code>}} - AVX-512 Vector Length
** {{x86|AVX512IFMA|<code>AVX512IFMA</code>}} - AVX-512 Integer Fused Multiply-Add
** {{x86|AVX512VBMI|<code>AVX512VBMI</code>}} - AVX-512 Vector Bit Manipulation
* {{x86|SHA|<code>SHA</code>}} - [[Hardware acceleration]] for SHA hashing operations
* {{x86|UMIP|<code>UMIP</code>}} - User-Mode Instruction Prevention extension
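
Software usually detects these extensions at run time through <code>CPUID</code> leaf 7, sub-leaf 0. The following is a minimal sketch that decodes already-captured <code>EBX</code> and <code>ECX</code> register values into the extensions listed above. The bit positions follow Intel's documented <code>CPUID</code> layout; the function names and the sample register values are hypothetical and were not read from real Palm Cove hardware.

```python
# Bit positions in CPUID leaf 7, sub-leaf 0 (EAX=7, ECX=0).
# Only the extensions listed in this section are covered.
LEAF7_EBX_BITS = {
    "AVX512F": 16,
    "AVX512DQ": 17,
    "AVX512IFMA": 21,
    "AVX512CD": 28,
    "SHA": 29,
    "AVX512BW": 30,
    "AVX512VL": 31,
}
LEAF7_ECX_BITS = {
    "AVX512VBMI": 1,
    "UMIP": 2,
}


def decode_leaf7(ebx, ecx):
    """Map each listed extension name to True/False from raw register values."""
    flags = {name: bool((ebx >> bit) & 1) for name, bit in LEAF7_EBX_BITS.items()}
    flags.update(
        {name: bool((ecx >> bit) & 1) for name, bit in LEAF7_ECX_BITS.items()}
    )
    return flags


def missing_extensions(ebx, ecx):
    """Return the sorted names of listed extensions that are not reported."""
    return sorted(name for name, ok in decode_leaf7(ebx, ecx).items() if not ok)


if __name__ == "__main__":
    # Illustrative values with every listed bit set (an assumption, not a dump).
    sample_ebx, sample_ecx = 0xF0230000, 0x6
    print(missing_extensions(sample_ebx, sample_ecx))
```

Obtaining the raw register values is platform specific (for example, a compiler intrinsic or an operating system interface) and is deliberately left out of this sketch.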

== Overview ==
Palm Cove is the core microarchitecture found in Intel's {{\\|Cannon Lake}} SoCs. Although it was originally intended to be mass-manufactured for all client and server markets, Intel's prolonged [[10 nm process]] problems meant that Palm Cove was skipped, with the exception of a single chip.

== See also ==
* {{intel|Cannon Lake|l=arch}}
@@ Line 7: / Line 7: @@
 |introduction=2018
 |process=10 nm
-|cores=2
-|type=Superscalar
-|oooe=Yes
-|speculative=Yes
-|renaming=Yes
-|stages min=14
-|stages max=19
 |isa=x86-64
-|extension=MOVBE
-|extension 2=MMX
-|extension 3=SSE
-|extension 4=SSE2
-|extension 5=SSE3
-|extension 6=SSSE3
-|extension 7=SSE4.1
-|extension 8=SSE4.2
-|extension 9=POPCNT
-|extension 10=AVX
-|extension 11=AVX2
-|extension 12=AES
-|extension 13=PCLMUL
-|extension 14=FSGSBASE
-|extension 15=RDRND
-|extension 16=FMA3
-|extension 17=F16C
-|extension 18=BMI
-|extension 19=BMI2
-|extension 20=VT-x
-|extension 21=VT-d
-|extension 22=TXT
-|extension 23=TSX
-|extension 24=RDSEED
-|extension 25=ADCX
-|extension 26=PREFETCHW
-|extension 27=CLFLUSHOPT
-|extension 28=XSAVE
-|extension 29=SGX
-|extension 30=MPX
-|extension 31=AVX-512
-|l1i=32 KiB
-|l1i per=core
-|l1i desc=8-way set associative
-|l1d=32 KiB
-|l1d per=core
-|l1d desc=8-way set associative
-|l2=256 KiB
-|l2 per=core
-|l2 desc=4-way set associative
-|l3=2 MiB
-|l3 per=core
-|l3 desc=16-way set associative
 |predecessor=Skylake
 |predecessor link=intel/microarchitectures/skylake
@@ Line 69: / Line 19: @@
 == Architecture ==
-=== Key changes from {{\\|Skylake (Client)}}===
+=== Key changes from {{\\|Skylake (Server)}}===
 * [[10 nm process]] (From [[14 nm]])
-* Front End
-** LSD is re-enabled (See {{\\|skylake_(server)#Front-end|Skylake § Front-end}} for details)
-** 50% smaller L1 instruction cache 4K page TLB (64-entry, down from 128)
-* Back-end
-** Execution units
-*** Port 4 now performs 512b stores (from 256b)
-*** New 512b FMA unit on Port 0
-*** New iDIV unit
-* Memory subsystem
-** Store is now 64B/cycle (from 32B/cycle)
-** Load is now 2x64B/cycle (from 2x32B/cycle)
 {{expand list}}
@@ Line 98: / Line 36: @@
 * {{x86|SHA|<code>SHA</code>}} - [[Hardware acceleration]] for SHA hashing operations
 * {{x86|UMIP|<code>UMIP</code>}} - User-Mode Instruction Prevention extension
-=== Memory Hierarchy ===
-Other than a few organizational changes (e.g. L2$ went from 8-way to 4-way set associative), the overall memory structure is identical to {{\\|Broadwell}}/{{\\|Haswell}}.
-* Cache
-** L0 µOP cache:
-*** 1,536 µOPs, 8-way set associative
-**** 32 sets, 6-µOP line size
-**** statically divided between threads, per core, inclusive with L1I
-** L1I Cache:
-*** 32 [[KiB]], 8-way set associative
-**** 64 sets, 64 B line size
-**** shared by the two threads, per core
-** L1D Cache:
-*** 32 KiB, 8-way set associative
-*** 64 sets, 64 B line size
-*** shared by the two threads, per core
-*** 4 cycles for fastest load-to-use (simple pointer accesses)
-**** 5 cycles for complex addresses
-*** 128 B/cycle load bandwidth
-*** 64 B/cycle store bandwidth
-*** Write-back policy
-** L2 Cache:
-*** Unified, 256 KiB, 4-way set associative
-*** 1024 sets, 64 B line size
-*** Non-inclusive
-*** 12 cycles for fastest load-to-use
-*** 64 B/cycle bandwidth to L1$
-*** Write-back policy
-** L3 Cache/LLC:
-*** Up to 2 MiB Per core, shared across all cores
-*** Up to 16-way set associative
-*** Inclusive
-*** 64 B line size
-*** Write-back policy
-*** Per each core:
-**** Read: 32 B/cycle (@ ring [[clock]])
-**** Write: 32 B/cycle (@ ring clock)
-*** 42 cycles for fastest load-to-use
-** System [[DRAM]]:
-*** 2 Channels
-*** 8 B/cycle/channel (@ memory clock)
-Palm Cove TLB consists of dedicated L1 TLB for instruction cache (ITLB) and another one for data cache (DTLB). Additionally there is a unified L2 TLB (STLB).
-* TLBs:
-** ITLB
-*** 4 KiB page translations:
-**** 64 entries; 8-way set associative
-**** dynamic partitioning
-*** 2 MiB / 4 MiB page translations:
-**** 8 entries per thread; fully associative
-**** Duplicated for each thread
-** DTLB
-*** 4 KiB page translations:
-**** 64 entries; 4-way set associative
-**** fixed partition
-*** 2 MiB / 4 MiB page translations:
-**** 32 entries; 4-way set associative
-**** fixed partition
-*** 1G page translations:
-**** 4 entries; 4-way set associative
-**** fixed partition
-** STLB
-*** 4 KiB + 2 MiB page translations:
-**** 1536 entries; 12-way set associative
-**** fixed partition
-*** 1 GiB page translations:
-**** 16 entries; 4-way set associative
-**** fixed partition
 == Overview ==
codename	Palm Cove
core count	2
designer	Intel
first launched	2018
full page name	intel/microarchitectures/palm cove
instance of	microarchitecture
instruction set architecture	x86-64
manufacturer	Intel
microarchitecture type	CPU
name	Palm Cove
pipeline stages (max)	19
pipeline stages (min)	14
process	10 nm (0.01 μm, 1.0e-5 mm)