From WikiChip
Editing intel/microarchitectures/palm cove


 
|introduction=2018
 
 
|process=10 nm
 
|cores=2
 
|type=Superscalar
 
|oooe=Yes
 
|speculative=Yes
 
|renaming=Yes
 
|stages min=14
 
|stages max=19
 
 
|isa=x86-64
 
|extension=MOVBE
 
|extension 2=MMX
 
|extension 3=SSE
 
|extension 4=SSE2
 
|extension 5=SSE3
 
|extension 6=SSSE3
 
|extension 7=SSE4.1
 
|extension 8=SSE4.2
 
|extension 9=POPCNT
 
|extension 10=AVX
 
|extension 11=AVX2
 
|extension 12=AES
 
|extension 13=PCLMUL
 
|extension 14=FSGSBASE
 
|extension 15=RDRND
 
|extension 16=FMA3
 
|extension 17=F16C
 
|extension 18=BMI
 
|extension 19=BMI2
 
|extension 20=VT-x
 
|extension 21=VT-d
 
|extension 22=TXT
 
|extension 23=TSX
 
|extension 24=RDSEED
 
|extension 25=ADCX
 
|extension 26=PREFETCHW
 
|extension 27=CLFLUSHOPT
 
|extension 28=XSAVE
 
|extension 29=SGX
 
|extension 30=MPX
 
|extension 31=AVX-512
 
|l1i=32 KiB
 
|l1i per=core
 
|l1i desc=8-way set associative
 
|l1d=32 KiB
 
|l1d per=core
 
|l1d desc=8-way set associative
 
|l2=256 KiB
 
|l2 per=core
 
|l2 desc=4-way set associative
 
|l3=2 MiB
 
|l3 per=core
 
|l3 desc=16-way set associative
 
 
|predecessor=Skylake
 
 
|predecessor link=intel/microarchitectures/skylake
 
  
 
== Architecture ==
 
=== Key changes from {{\\|Skylake (Client)}}===
+
=== Key changes from {{\\|Skylake (Server)}}===
 
* [[10 nm process]] (From [[14 nm]])
 
* Front End
 
** LSD is re-enabled (See {{\\|skylake_(server)#Front-end|Skylake § Front-end}} for details)
 
** L1 instruction TLB for 4 KiB pages is 50% smaller (64 entries, down from 128)
 
* Back-end
 
** Execution units
 
*** Port 4 now performs 512-bit stores (up from 256-bit)

*** New 512-bit FMA unit on Port 0
 
*** New iDIV unit
 
* Memory subsystem
 
** Store is now 64 B/cycle (from 32 B/cycle)

** Load is now 2x64 B/cycle (from 2x32 B/cycle)
 
 
 
{{expand list}}
 
  
 
* {{x86|SHA|<code>SHA</code>}} - [[Hardware acceleration]] for SHA hashing operations
 
 
* {{x86|UMIP|<code>UMIP</code>}} - User-Mode Instruction Prevention extension
 
 
=== Memory Hierarchy ===
 
Other than a few organizational changes (e.g. L2$ went from 8-way to 4-way set associative), the overall memory structure is identical to {{\\|Broadwell}}/{{\\|Haswell}}.
 
 
* Cache
 
** L0 µOP cache:
 
*** 1,536 µOPs, 8-way set associative
 
**** 32 sets, 6-µOP line size
 
**** statically divided between threads, per core, inclusive with L1I
 
** L1I Cache:
 
*** 32 [[KiB]], 8-way set associative
 
**** 64 sets, 64 B line size
 
**** shared by the two threads, per core
 
** L1D Cache:
 
*** 32 KiB, 8-way set associative
 
*** 64 sets, 64 B line size
 
*** shared by the two threads, per core
 
*** 4 cycles for fastest load-to-use (simple pointer accesses)
 
**** 5 cycles for complex addresses
 
*** 128 B/cycle load bandwidth
 
*** 64 B/cycle store bandwidth
 
*** Write-back policy
 
** L2 Cache:
 
*** Unified, 256 KiB, 4-way set associative
 
*** 1024 sets, 64 B line size
 
*** Non-inclusive
 
*** 12 cycles for fastest load-to-use
 
*** 64 B/cycle bandwidth to L1$
 
*** Write-back policy
 
** L3 Cache/LLC:
 
*** Up to 2 MiB per core, shared across all cores
 
*** Up to 16-way set associative
 
*** Inclusive
 
*** 64 B line size
 
*** Write-back policy
 
*** Per core:
 
**** Read: 32 B/cycle (@ ring [[clock]])
 
**** Write: 32 B/cycle (@ ring clock)
 
*** 42 cycles for fastest load-to-use
 
** System [[DRAM]]:
 
*** 2 Channels
 
*** 8 B/cycle/channel (@ memory clock)
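
The set counts in the cache list above follow from capacity divided by (ways × line size), and the µOP cache capacity is sets × ways × µOPs per line. The following is a minimal Python sketch that cross-checks those figures. The helper name <code>num_sets</code> is illustrative only and does not come from any vendor tool.

```python
KIB = 1024

def num_sets(capacity_bytes: int, ways: int, line_bytes: int) -> int:
    """Return the number of sets in a set-associative cache.

    Hypothetical helper: sets = capacity / (ways * line size).
    """
    sets, remainder = divmod(capacity_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("capacity is not a multiple of ways * line size")
    return sets

# L1I and L1D: 32 KiB, 8-way, 64 B lines
l1_sets = num_sets(32 * KIB, ways=8, line_bytes=64)

# L2: 256 KiB, 4-way, 64 B lines
l2_sets = num_sets(256 * KIB, ways=4, line_bytes=64)

# L0 µOP cache: 32 sets, 8 ways, 6 µOPs per line
uop_capacity = 32 * 8 * 6

print(l1_sets, l2_sets, uop_capacity)
```

With the values listed on this page, the sketch yields 64 sets for each L1 cache, 1024 sets for the L2, and 1,536 µOPs for the µOP cache, which agrees with the list.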
 
 
The Palm Cove TLB hierarchy consists of a dedicated L1 TLB for the instruction cache (ITLB) and another for the data cache (DTLB), backed by a unified second-level TLB (STLB).
 
* TLBs:
 
** ITLB
 
*** 4 KiB page translations:
 
**** 64 entries; 8-way set associative
 
**** dynamic partitioning
 
*** 2 MiB / 4 MiB page translations:
 
**** 8 entries per thread; fully associative
 
**** Duplicated for each thread
 
** DTLB
 
*** 4 KiB page translations:
 
**** 64 entries; 4-way set associative
 
**** fixed partition
 
*** 2 MiB / 4 MiB page translations:
 
**** 32 entries; 4-way set associative
 
**** fixed partition
 
*** 1 GiB page translations:
 
**** 4 entries; 4-way set associative
 
**** fixed partition
 
** STLB
 
*** 4 KiB + 2 MiB page translations:
 
**** 1536 entries; 12-way set associative
 
**** fixed partition
 
*** 1 GiB page translations:
 
**** 16 entries; 4-way set associative
 
**** fixed partition
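
One way to read the entry counts above is as "TLB reach", meaning the amount of memory that can be mapped without a page walk: entries × page size. The sketch below is an illustration under that assumption. The function name <code>tlb_reach</code> is hypothetical, and the 2 MiB STLB figure assumes every shared entry holds a 2 MiB translation, which is a best case rather than a documented behavior.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of address space covered when every entry is in use."""
    return entries * page_bytes

# DTLB, 4 KiB pages: 64 entries
dtlb_4k = tlb_reach(64, 4 * KIB)

# STLB, shared 4 KiB + 2 MiB array: 1536 entries
stlb_4k = tlb_reach(1536, 4 * KIB)   # all entries hold 4 KiB pages
stlb_2m = tlb_reach(1536, 2 * MIB)   # assumption: all entries hold 2 MiB pages

# STLB, 1 GiB pages: 16 entries
stlb_1g = tlb_reach(16, 1 * GIB)

# The 12-way, 1536-entry STLB implies this many sets
stlb_sets = 1536 // 12

print(dtlb_4k // KIB, "KiB;", stlb_4k // MIB, "MiB;",
      stlb_2m // GIB, "GiB;", stlb_1g // GIB, "GiB;", stlb_sets, "sets")
```

Under these assumptions the DTLB covers 256 KiB with 4 KiB pages, while the STLB covers 6 MiB with 4 KiB pages and up to 3 GiB with 2 MiB pages, which shows why large pages matter for workloads with big working sets.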
 
  
 
== Overview ==
 
