From WikiChip
Difference between revisions of "intel/microarchitectures/sunny cove"

(Pipeline)
(Sunny cove has "Larger scheduler (160, up from 97 entries)".)
(32 intermediate revisions by 13 users not shown)
Line 53: Line 53:
 
|l1d per=core
 
|l1d per=core
 
|l1d desc=12-way set associative
 
|l1d desc=12-way set associative
|l2=512 MiB
+
|l2=512 KiB
 
|l2 per=core
 
|l2 per=core
 
|l2 desc=8-way set associative
 
|l2 desc=8-way set associative
Line 64: Line 64:
 
|successor link=intel/microarchitectures/willow cove
 
|successor link=intel/microarchitectures/willow cove
 
}}
 
}}
'''Sunny Cove''' is the successor to {{\\|Palm Cove}}, a high-performance [[10 nm]] [[x86]] core microarchitecture designed by [[Intel]] for an array of server and client products, including {{\\|Ice Lake (Client)}}, {{\\|Ice Lake (Server)}}, {{\\|Lakefield}}, and the Nervana {{nervana|NNP-I}}. The microarchitecture was developed by Intel's R&D Center (IDC) in Haifa, Israel.
+
'''Sunny Cove''' ('''SNC'''), the successor to {{\\|Palm Cove}}, is a high-performance [[10 nm]] [[x86]]-64 core microarchitecture designed by [[Intel]] for an array of server and client products, including {{\\|Ice Lake (Client)}}, {{\\|Ice Lake (Server)}}, {{\\|Lakefield}}, and the Nervana {{nervana|NNP-I}}. The microarchitecture was developed by Intel's R&D Center (IDC) in Haifa, Israel.
  
 
== History ==
 
== History ==
[[File:sunny cove roadmap.png|thumb|right|400px|Intel Core roadmap]]
+
[[File:sunny cove roadmap.png|thumb|right|200px|Intel Core roadmap]]
Sunny Cove was originally unveiled by Intel at their 2018 architecture day. Intel originally intended for Sunny Cove to succeed {{\\|Palm Cove}} in late 2017 which was it was intended to be the first [[10 nm]]-based core and the proper successor to {{\\|Skylake (client)|Skylake}}. Prolong delays and problems with their [[10 nm process]] and resulted in a number of improvised derivatives of {{\\|Skylake (client)|Skylake}} including {{\\|Kaby Lake}}, {{\\|Coffee Lake}}, and {{\\|Comet Lake}}. For all practical purposes, {{\\|Palm Cove}} has been skipped and Intel has gone directly to Sunny Cove. Sunny Cove is expected to debut in mid-2019.
+
Sunny Cove was originally unveiled by Intel at their 2018 architecture day. Intel originally intended for Sunny Cove to succeed {{\\|Palm Cove}} in late 2017 which was intended to be the first [[10 nm]]-based core and the proper successor to {{\\|Skylake (client)|Skylake}}. Prolonged delays and problems with their [[10 nm process]] resulted in a number of improvised derivatives of {{\\|Skylake (client)|Skylake}} including {{\\|Kaby Lake}}, {{\\|Coffee Lake}}, and {{\\|Comet Lake}}. For all practical purposes, {{\\|Palm Cove}} has been skipped and Intel has gone directly to Sunny Cove. Sunny Cove debuted in mid-2019.
  
 
:[[File:14nm improv 10 delays.svg|500px]]
 
:[[File:14nm improv 10 delays.svg|500px]]
  
 
== Process Technology ==
 
== Process Technology ==
Sunny Cove is designed to take advantage of Intel's [[10 nm process]].
+
Sunny Cove is designed to take advantage of Intel's [[10 nm+ process]].
  
 
== Architecture ==
 
== Architecture ==
 
=== Key changes from {{\\|Palm Cove}}/{{\\|Skylake}}===
 
=== Key changes from {{\\|Palm Cove}}/{{\\|Skylake}}===
[[File:skylake - sunny cove changes block.jpg|thumb|right|Skylake to Sunny Cove changes]][[File:sunny cove enhancements.jpg|thumb|right|Sunny Cove enhancements]]
+
[[File:skylake - sunny cove changes block.jpg|thumb|right|Skylake to Sunny Cove changes]][[File:sunny cove enhancements.jpg|thumb|right|Sunny Cove enhancements]][[File:sunny cove buffer capacities.png|thumb|right|Sunny Cove buffers]]
 +
* Performance
 +
** [[IPC]] uplift ([[Intel]] self-reported average 18% IPC across proxy benchmarks such as [[SPEC CPU2006]]/[[SPEC CPU2017]])
 
* Front-end
 
* Front-end
** Larger µOP cache (?, up from 1536)
+
** 1.5x larger µOP cache (2.25k entries, up from 1536)
 +
** Smarter [[prefetchers]]
 +
** Improved [[branch predictor]]
 +
** ITLB
 +
*** Double 2M page entries (16 entries, up from 8)
 +
** Larger IDQ (70 µOPs, up from 64)
 +
** LSD can detect up to 70 µOP loops (up from 64)
 
* Back-end
 
* Back-end
 
** Wider allocation (5-way, up from 4-way)
 
** Wider allocation (5-way, up from 4-way)
** Larger ROB (?, up from 224 entries)
+
** 1.6x larger ROB (352, up from 224 entries)
 
** Scheduler
 
** Scheduler
*** Larger scheduler (?, up from 97 entries)
+
*** Larger scheduler (160, up from 97 entries)
 
*** Larger dispatch (10-way, up from 8-way)
 
*** Larger dispatch (10-way, up from 8-way)
*** Execution ports rebalanced
+
* Execution Engine
*** New store data port
+
** Execution ports rebalanced
*** New store AGU port
+
** 2x store data ports (up from 1)
 +
** 2x store address AGU (up from 1)
 +
** New paired store capabilities
 +
** Replaced 2 generic AGUs with two load AGUs
 
* Memory subsystem
 
* Memory subsystem
 
** LSU
 
** LSU
*** Deeper load queue (?, up from 72 entries)
+
*** 1.8x more inflight loads (128, up from 72 entries)
*** Deeper store queue (?, up from 42 entries)
+
*** 1.3x more inflight stores (72, up from 56 entries)
** Larger L1 data cache (48 KiB, up from 32 KiB)
+
** 1.5x larger L1 data cache (48 KiB, up from 32 KiB)
** Larger L2 cache (512 KiB, up from 256 KiB)
+
** 2x larger L2 cache (512 KiB, up from 256 KiB)
 
*** Larger STLBs
 
*** Larger STLBs
 +
**** Larger 1G table (1024-entry, up from 16)
 +
**** Larger 4k table (2048 entries, up from 1536)
 +
**** New 1,024-entry 2M/4M table
 
** 5-Level Paging
 
** 5-Level Paging
 
*** Large virtual address (57 bits, up from 48 bits)
 
*** Large virtual address (57 bits, up from 48 bits)
Line 104: Line 118:
 
Sunny Cove introduced a number of {{x86|extensions|new instructions}}:
 
Sunny Cove introduced a number of {{x86|extensions|new instructions}}:
  
 +
* {{x86|SHA|<code>SHA</code>}} - [[Hardware acceleration]] for SHA hashing operations
 
* {{x86|CLWB|<code>CLWB</code>}} - Force cache line write-back without flush
 
* {{x86|CLWB|<code>CLWB</code>}} - Force cache line write-back without flush
 
* {{x86|RDPID|<code>RDPID</code>}} - Read Processor ID
 
* {{x86|RDPID|<code>RDPID</code>}} - Read Processor ID
Line 118: Line 133:
 
* Split Lock Detection - detection and cause an exception for split locks
 
* Split Lock Detection - detection and cause an exception for split locks
 
* Fast Short REP MOV
 
* Fast Short REP MOV
 +
 +
Only on server parts ({{\\|Ice Lake (Server)}}):
 +
 +
* {{x86|TME|<code>TME</code>}} - Total Memory Encryption
 +
* {{x86|PCONFIG|<code>PCONFIG</code>}} Platform Configuration
 +
* {{x86|WBNOINVD|<code>WBNOINVD</code>}} Write-back and do not invalidate cache
 +
* {{x86|ENCLV|<code>ENCLV</code>}} - SGX oversubscription instructions
 +
 +
=== Block diagram ===
 +
:[[File:sunny cove block diagram.svg|950px]]
  
 
== Overview ==
 
== Overview ==
Sunny Cove is Intel's core microarchitecture for a series of client and server chips that succeed {{\\|Palm Cove}} (and effectively the {{\\|Skylake (client)|Skylake}} series of derivatives). Sunny Cove is just the core which is implemented in a numerous chips made by Intel including {{\\|Lakefield}}, {{\\|Ice Lake (Client)}}, {{\\|Ice Lake (Server)}}, and the [[Nervana]] {{nervana|NNP}} accelerator. Sunny Cove introduces a large set of enhancements that significantly improves the performance of legacy code and new code through the extraction of parallelism as well as new features. Those include a significantly deep [[out-of-window]] pipeline, a wider execution back-end, higher load-store bandwidth, lower effective access latencies, and bigger caches.
+
Sunny Cove is Intel's microarchitecture for the CPU core which is incorporated into a number of client and server chips that succeed {{\\|Palm Cove}} (and effectively the {{\\|Skylake (client)|Skylake}} series of derivatives). Sunny Cove is just the core which is implemented in a numerous chips made by Intel including {{\\|Lakefield}}, {{\\|Ice Lake (Client)}}, {{\\|Ice Lake (Server)}}, and the [[Nervana]] {{nervana|NNP}} accelerator. Sunny Cove introduces a large set of enhancements that significantly improves the performance of legacy code and new code through the extraction of parallelism as well as new features. Those include a significantly deep [[out-of-window]] pipeline, a wider execution back-end, higher load-store bandwidth, lower effective access latencies, and bigger caches.
  
 
== Pipeline ==
 
== Pipeline ==
Like it's predecessors, Sunny Cove focuses on extracting performance and reducing power through a number of key ways. Intel builds Sunny Cove on previous microarchitectures, descendants of {{\\|Sandy Bridge}}. For the core to increase the overall performance, Intel focused on extracting additional parallelism.
+
Like its predecessors, Sunny Cove focuses on extracting performance and reducing power through a number of key ways. Intel builds Sunny Cove on previous microarchitectures, descendants of {{\\|Sandy Bridge}}. For the core to increase the overall performance, Intel focused on extracting additional parallelism.
  
 
==== Broad Overview ====
 
==== Broad Overview ====
Line 136: Line 161:
 
Some µOPs deal with memory access (e.g. [[instruction load|load]] & [[instruction store|store]]). Those will be sent on dedicated scheduler ports that can perform those memory operations. Store operations go to the store buffer which is also capable of performing forwarding when needed. Likewise, Load operations come from the load buffer. Sunny Cove features a dedicated 48 KiB level 1 data cache and a dedicated 32 KiB level 1 instruction cache. It also features a core-private 512 KiB L2 cache that is shared by both of the L1 caches.
 
Some µOPs deal with memory access (e.g. [[instruction load|load]] & [[instruction store|store]]). Those will be sent on dedicated scheduler ports that can perform those memory operations. Store operations go to the store buffer which is also capable of performing forwarding when needed. Likewise, Load operations come from the load buffer. Sunny Cove features a dedicated 48 KiB level 1 data cache and a dedicated 32 KiB level 1 instruction cache. It also features a core-private 512 KiB L2 cache that is shared by both of the L1 caches.
  
Each core enjoys a slice of a third level of cache that is shared by all the core. For Sunny Cove, there are either [[two cores]] or [[four cores]] connected together on a single chip.
+
Each core enjoys a slice of a third level of cache that is shared by all the core. For {{\\|Ice Lake (Client)}} which incorporates Sunny Cove cores, there are either [[two cores]] or [[four cores]] connected together on a single chip.
 
{{clear}}
 
{{clear}}
 +
 +
=== Front-end ===
 +
{{empty section}}
 +
 +
{{work-in-progress}}
 +
 +
=== Back-end ===
 +
{{empty section}}
 +
 +
{{work-in-progress}}
  
 
== Die ==
 
== Die ==
Line 143: Line 178:
 
* [[10 nm process|10nm+ process]]
 
* [[10 nm process|10nm+ process]]
 
* Core from an {{\\|Ice Lake (client)|Ice Lake}} SoC
 
* Core from an {{\\|Ice Lake (client)|Ice Lake}} SoC
 
+
* ~6.91 mm² die size
 +
** ~3.5 mm x ~1.97 mm
  
 
:[[File:ice lake die core.png|400px]]
 
:[[File:ice lake die core.png|400px]]
Line 149: Line 185:
  
 
:[[File:ice lake die core (annotated).png|400px]]
 
:[[File:ice lake die core (annotated).png|400px]]
 +
 +
 +
:[[File:ice lake die core 2.png|500px]]
  
 
=== Core group ===
 
=== Core group ===
 
* [[10 nm process|10nm+ process]]
 
* [[10 nm process|10nm+ process]]
 
* Quad-core from an {{\\|Ice Lake (client)|Ice Lake}} SoC
 
* Quad-core from an {{\\|Ice Lake (client)|Ice Lake}} SoC
 +
* ~30.73 mm² die size
 +
** ~7.86 mm x ~3.91 mm
  
  
:[[File:ice lake die core group.png|700px]]
+
:[[File:ice lake die core group.png|class=wikichip_ogimage|700px]]
  
  
 
:[[File:ice lake die core group (annotated).png|700px]]
 
:[[File:ice lake die core group (annotated).png|700px]]
 +
 +
 +
:[[File:ice lake die core group 2.png|800px]]
  
 
== Bibliography ==
 
== Bibliography ==
 
* Intel Architecture Day 2018, December 11, 2018
 
* Intel Architecture Day 2018, December 11, 2018

Revision as of 00:12, 20 August 2020

Sunny Cove µarch
General Info
Arch Type: CPU
Designer: Intel
Manufacturer: Intel
Introduction: 2019
Process: 10 nm
Core Configs: 2, 4
Pipeline
Type: Superscalar
OoOE: Yes
Speculative: Yes
Reg Renaming: Yes
Stages: 14-19
Instructions
ISA: x86-64
Extensions: MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA3, F16C, BMI, BMI2, VT-x, VT-d, TXT, TSX, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVE, SGX, MPX, AVX-512
Cache
L1I Cache: 32 KiB/core, 8-way set associative
L1D Cache: 48 KiB/core, 12-way set associative
L2 Cache: 512 KiB/core, 8-way set associative
L3 Cache: 2 MiB/core, 16-way set associative
Succession

Sunny Cove (SNC), the successor to Palm Cove, is a high-performance 10 nm x86-64 core microarchitecture designed by Intel for an array of server and client products, including Ice Lake (Client), Ice Lake (Server), Lakefield, and the Nervana NNP-I. The microarchitecture was developed by Intel's R&D Center (IDC) in Haifa, Israel.

History

[Figure: Intel Core roadmap]

Sunny Cove was originally unveiled by Intel at their 2018 architecture day. Intel originally intended for Sunny Cove to succeed Palm Cove in late 2017; Palm Cove was intended to be the first 10 nm-based core and the proper successor to Skylake. Prolonged delays and problems with their 10 nm process resulted in a number of improvised derivatives of Skylake including Kaby Lake, Coffee Lake, and Comet Lake. For all practical purposes, Palm Cove has been skipped and Intel has gone directly to Sunny Cove. Sunny Cove debuted in mid-2019.

[Figure: 14nm improv 10 delays.svg]

Process Technology

Sunny Cove is designed to take advantage of Intel's 10 nm+ process.

Architecture

Key changes from Palm Cove/Skylake

[Figures: Skylake to Sunny Cove changes; Sunny Cove enhancements; Sunny Cove buffers]
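  • Performance
    • IPC uplift (Intel self-reported average of 18% across proxy benchmarks such as SPEC CPU2006/SPEC CPU2017)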
  • Front-end
    • 1.5x larger µOP cache (2.25k entries, up from 1536)
    • Smarter prefetchers
    • Improved branch predictor
    • ITLB
      • Double 2M page entries (16 entries, up from 8)
    • Larger IDQ (70 µOPs, up from 64)
    • LSD can detect up to 70 µOP loops (up from 64)
  • Back-end
    • Wider allocation (5-way, up from 4-way)
    • 1.6x larger ROB (352, up from 224 entries)
    • Scheduler
      • Larger scheduler (160, up from 97 entries)
      • Larger dispatch (10-way, up from 8-way)
  • Execution Engine
    • Execution ports rebalanced
    • 2x store data ports (up from 1)
    • 2x store address AGU (up from 1)
    • New paired store capabilities
    • Replaced 2 generic AGUs with two load AGUs
  • Memory subsystem
    • LSU
      • 1.8x more inflight loads (128, up from 72 entries)
      • 1.3x more inflight stores (72, up from 56 entries)
    • 1.5x larger L1 data cache (48 KiB, up from 32 KiB)
    • 2x larger L2 cache (512 KiB, up from 256 KiB)
      • Larger STLBs
        • Larger 1G table (1024-entry, up from 16)
        • Larger 4k table (2048 entries, up from 1536)
        • New 1,024-entry 2M/4M table
    • 5-Level Paging
      • Larger virtual address (57 bits, up from 48 bits)
      • Significantly larger virtual address space (128 PiB, up from 256 TiB)
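The multipliers and address-space sizes quoted in this list can be cross-checked with a little arithmetic. The following is a minimal sketch rather than anything published by Intel: the dictionary name `BUFFERS` and both helper functions are hypothetical, and the µOP cache figure assumes that "2.25k" means 2.25 × 1024 = 2304 entries.

```python
# Figures copied from the key-changes list: (Sunny Cove, Skylake).
# Assumption: "2.25k" µOP cache entries is read as 2.25 * 1024 = 2304.
BUFFERS = {
    "uop_cache_entries": (2304, 1536),
    "rob_entries": (352, 224),
    "load_queue_entries": (128, 72),
    "store_queue_entries": (72, 56),
    "l1d_kib": (48, 32),
    "l2_kib": (512, 256),
}

TIB = 1 << 40  # bytes in one tebibyte
PIB = 1 << 50  # bytes in one pebibyte


def growth(new, old):
    """Return the new/old ratio rounded to one decimal place."""
    return round(new / old, 1)


def virtual_address_space_bytes(bits):
    """Size in bytes of a flat address space addressed with `bits` bits."""
    return 1 << bits


if __name__ == "__main__":
    for name, (new, old) in BUFFERS.items():
        print(f"{name}: {growth(new, old)}x")
    print("48-bit:", virtual_address_space_bytes(48) // TIB, "TiB")
    print("57-bit:", virtual_address_space_bytes(57) // PIB, "PiB")
```

The nine extra virtual address bits that 5-level paging adds multiply the address space by 2^9 = 512, which is the step from 256 TiB to 128 PiB.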


New instructions

Sunny Cove introduced a number of new instructions:

  • SHA - Hardware acceleration for SHA hashing operations
  • CLWB - Force cache line write-back without flush
  • RDPID - Read Processor ID
  • Additional AVX-512 extensions:
  • SSE_GFNI - SSE-based Galois Field New Instructions
  • AVX_GFNI - AVX-based Galois Field New Instructions
  • Split Lock Detection - detects split locks and raises an exception
  • Fast Short REP MOV

Only on server parts (Ice Lake (Server)):

  • TME - Total Memory Encryption
  • PCONFIG - Platform Configuration
  • WBNOINVD - Write-back and do not invalidate cache
  • ENCLV - SGX oversubscription instructions

Block diagram

[Figure: Sunny Cove block diagram]

Overview

Sunny Cove is Intel's microarchitecture for the CPU core which is incorporated into a number of client and server chips that succeed Palm Cove (and effectively the Skylake series of derivatives). Sunny Cove is just the core, which is implemented in numerous chips made by Intel including Lakefield, Ice Lake (Client), Ice Lake (Server), and the Nervana NNP accelerator. Sunny Cove introduces a large set of enhancements that significantly improve the performance of legacy code and new code through the extraction of parallelism as well as new features. Those include a significantly deeper out-of-window pipeline, a wider execution back-end, higher load-store bandwidth, lower effective access latencies, and bigger caches.

Pipeline

Like its predecessors, Sunny Cove focuses on extracting performance and reducing power in a number of key ways. Intel builds Sunny Cove on previous microarchitectures, descendants of Sandy Bridge. To increase the overall performance of the core, Intel focused on extracting additional parallelism.

Broad Overview

At a 5,000 foot view, Sunny Cove represents the logical evolution from Skylake and Haswell. Therefore, despite some significant differences from the previous microarchitecture, the overall design is fundamentally the same and can be seen as an enhancement over Skylake rather than a complete change.

[Figure: intel common arch post ucache.svg]

The pipeline can be broken down into three areas: the front-end, back-end or execution engine, and the memory subsystem. The goal of the front-end is to feed the back-end with a sufficient stream of operations which it gets by decoding instructions coming from memory. The front-end has two major pathways: the µOPs cache path and the legacy path. The legacy path is the traditional path whereby variable-length x86 instructions are fetched from the level 1 instruction cache, queued, and subsequently decoded into simpler, fixed-length µOPs. The alternative and much more desired path is the µOPs cache path whereby a cache containing already decoded µOPs receives a hit, allowing the µOPs to be sent directly to the decode queue.

Regardless of which path an instruction ends up taking, it will eventually arrive at the decode queue. The IDQ represents the end of the front-end and the in-order part of the machine and the start of the execution engine which operates out-of-order.
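The two front-end pathways can be illustrated with a toy software model. This is only a sketch under loud assumptions: the class `FrontEnd`, the function `legacy_decode`, the '+'-separated instruction strings, and the unbounded dictionary standing in for the µOP cache are all hypothetical and say nothing about how the hardware is actually organized. Only the 70-µOP IDQ capacity comes from the key-changes list.

```python
from collections import deque

IDQ_CAPACITY = 70  # µOPs, from the key-changes list


def legacy_decode(instruction):
    """Hypothetical decoder: split a '+'-separated string into µOPs."""
    return instruction.split("+")


class FrontEnd:
    """Toy model: µOP cache path on a hit, legacy decode path on a miss."""

    def __init__(self):
        self.uop_cache = {}  # address -> decoded µOPs (unbounded toy cache)
        self.idq = deque()   # decode queue feeding the back-end
        self.hits = 0
        self.misses = 0

    def fetch(self, address, instruction):
        """Queue the µOPs for one instruction; return False if the IDQ is full."""
        uops = self.uop_cache.get(address)
        if uops is None:
            # Legacy path: decode, then fill the µOP cache for next time.
            self.misses += 1
            uops = legacy_decode(instruction)
            self.uop_cache[address] = uops
        else:
            # µOP cache path: the decoders are bypassed entirely.
            self.hits += 1
        if len(self.idq) + len(uops) > IDQ_CAPACITY:
            return False  # stall; the lookup above has still been counted
        self.idq.extend(uops)
        return True


if __name__ == "__main__":
    fe = FrontEnd()
    for addr, ins in [(0x10, "load+add"), (0x14, "store"), (0x10, "load+add")]:
        fe.fetch(addr, ins)
    print(fe.hits, fe.misses, list(fe.idq))
```

In this model, the second visit to address 0x10 is served from the µOP cache, so the same µOPs reach the IDQ without the decoder being invoked again.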

In the back-end, the micro-operations visit the reorder buffer. It is there that register allocation, renaming, and retiring take place. At this stage a number of other optimizations are also done. From the reorder buffer, µOPs are sent to the unified scheduler. The scheduler has a number of exit ports, each wired to a set of different execution units. Some units can perform basic ALU operations, others can do multiplication and division, with some units capable of more complex operations such as various vector operations. The scheduler is effectively in charge of queuing the µOPs on the appropriate port so they can be executed by the appropriate unit.

Some µOPs deal with memory access (e.g. load & store). Those will be sent on dedicated scheduler ports that can perform those memory operations. Store operations go to the store buffer which is also capable of performing forwarding when needed. Likewise, load operations come from the load buffer. Sunny Cove features a dedicated 48 KiB level 1 data cache and a dedicated 32 KiB level 1 instruction cache. It also features a core-private 512 KiB L2 cache that is shared by both of the L1 caches.
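The cache sizes above, combined with the associativities listed in the infobox, determine how many sets each cache has. The sketch below works that out, assuming a 64-byte cache line (a value this article does not state); the names `CACHES` and `num_sets` are hypothetical.

```python
LINE_BYTES = 64  # assumption: cache line size, not stated in this article

# (size in KiB, ways) as listed in the infobox.
CACHES = {
    "L1I": (32, 8),
    "L1D": (48, 12),
    "L2": (512, 8),
}


def num_sets(size_kib, ways, line_bytes=LINE_BYTES):
    """Number of sets in a set-associative cache of the given geometry."""
    total_lines, rem = divmod(size_kib * 1024, line_bytes)
    if rem:
        raise ValueError("cache size is not a multiple of the line size")
    sets, rem = divmod(total_lines, ways)
    if rem:
        raise ValueError("line count is not a multiple of the way count")
    return sets


if __name__ == "__main__":
    for name, (size_kib, ways) in CACHES.items():
        print(f"{name}: {num_sets(size_kib, ways)} sets")
```

Under that line-size assumption, the 12-way 48 KiB L1D and the 8-way 32 KiB L1I come out with the same number of sets, while the L2 has sixteen times as many.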

Each core enjoys a slice of a third level of cache that is shared by all the cores. For Ice Lake (Client), which incorporates Sunny Cove cores, there are either two cores or four cores connected together on a single chip.

Front-end


Back-end


Die

Core

  • 10nm+ process
  • Core from an Ice Lake SoC
  • ~6.91 mm² die size
    • ~3.5 mm x ~1.97 mm

[Figures: Ice Lake die core; Ice Lake die core (annotated); Ice Lake die core 2]

Core group


  • 10nm+ process
  • Quad-core from an Ice Lake SoC
  • ~30.73 mm² die size
    • ~7.86 mm x ~3.91 mm

[Figures: Ice Lake die core group; Ice Lake die core group (annotated); Ice Lake die core group 2]

Bibliography

  • Intel Architecture Day 2018, December 11, 2018