From WikiChip
Coffee Lake - Microarchitectures - Intel

Coffee Lake µarch
General Info
Arch Type: CPU
Designer: Intel
Manufacturer: Intel
Introduction: October 5, 2017
Process: 14 nm
Core Configs: 4, 6
Pipeline
OoOE: Yes
Speculative: Yes
Reg Renaming: Yes
Stages: 14-19
Decode: 5-way
Instructions
ISA: x86-16, x86-32, x86-64
Extensions: MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA3, F16C, BMI, BMI2, VT-x, VT-d, TXT, TSX, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVE, SGX, MPX
Cache
L1I Cache: 32 KiB/core, 8-way set associative
L1D Cache: 32 KiB/core, 8-way set associative
L2 Cache: 256 KiB/core, 4-way set associative
L3 Cache: 2 MiB/core, up to 16-way set associative
L4 Cache: 128 MiB/package, on Iris Pro GPUs only
Cores
Core Names: Coffee Lake U, Coffee Lake H, Coffee Lake S
Succession

Coffee Lake (CFL) is a microarchitecture designed by Intel as a successor to Kaby Lake for desktops and high-performance mobile devices. Coffee Lake was announced in the third quarter of 2017, with the first products available on October 5, 2017, and is manufactured on Intel's mature 14 nm process. Coffee Lake features the first series of mainstream hexa-core processors from Intel.

Codenames

Core | Abbrev | Description | Graphics | Target
Coffee Lake U | CFL-U | Ultra-low power | GT2 | Light notebooks, portable All-in-Ones (AiOs), Minis, and conference room
Coffee Lake H | CFL-H | High-performance graphics | GT3e | Ultimate mobile performance, mobile workstations
Coffee Lake S | CFL-S | Mainstream performance | GT2 | Desktop performance to value, AiOs, and minis
Coffee Lake X | CFL-X | Extreme Performance | | High performance desktops

Brands

Intel released Coffee Lake under 3 main brand families:

Family | General Description | Cores | HT | AVX | AVX2 | TBT | ECC
Core i3 | Low-end Performance | Quad | | | | |
Core i5 | Mid-range Performance | Hexa | | | | |
Core i7 | High-end Performance | Hexa | | | | |

Release Dates

2016 to 2018 kaby cannon coffee roadmap.jpg

Early roadmaps indicated that Coffee Lake would be introduced around the second quarter of 2018. In early 2017, Intel announced that 8th generation processors would be available starting in the third quarter of 2017. While the exact reason for the early release is unknown, it is likely attributable to market forces, particularly AMD's introduction of Zen and the Ryzen family.

Intel announced Coffee Lake-based SKUs on September 24, 2017, with products available beginning October 5, 2017 and OEM systems starting in Q4 2017.

Technology

intel 14nm++.png
See also: Broadwell § Process Technology and 14 nm lithography process

Coffee Lake is manufactured on Intel's 3rd generation 14 nm process, called "14nm++". It is the second enhanced version of the original 14 nm process used for the Broadwell microarchitecture (the first enhanced version, "14nm+", was first used for Kaby Lake). The enhancements improve performance without increasing capacitance (i.e., active power characteristics). 14nm++ allows for 23-24% higher drive current. Intel claims its 14nm++ process provides 26% more performance for 52% less power.


intel 14nm++ (nmos).png intel 14nm++ (pmos).png


Note that while both "14nm" and "14nm+" use the same transistor geometry, "14nm++" uses a more relaxed contacted poly pitch of 84 nm (up from 70 nm). Despite this, there is no real change in density, likely due to design techniques such as removing fins where they are unnecessary.

Parameter | Kaby Lake | Coffee Lake | Δ
Process | 14 nm | 14 nm |
Gate Pitch | 70 nm | 84 nm | 1.20x
Interconnect Pitch | 52 nm | 52 nm | 1.00x
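
The Δ column is the ratio of the Coffee Lake pitch to the Kaby Lake pitch. As a minimal sketch of that arithmetic (the helper name `pitch_ratio` is purely illustrative and not taken from any Intel material):

```python
def pitch_ratio(new_nm: float, old_nm: float) -> float:
    """Return the new pitch expressed as a multiple of the old pitch."""
    return new_nm / old_nm


# Values taken from the table above (Kaby Lake -> Coffee Lake).
gate = pitch_ratio(84, 70)          # contacted poly (gate) pitch
interconnect = pitch_ratio(52, 52)  # interconnect pitch

print(f"Gate pitch:         {gate:.2f}x")          # 1.20x
print(f"Interconnect pitch: {interconnect:.2f}x")  # 1.00x
```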

Compiler support

Compiler | Arch-Specific | Arch-Favorable
ICC | -march=skylake | -mtune=skylake
GCC | -march=skylake | -mtune=skylake
LLVM | -march=skylake | -mtune=skylake
Visual Studio | /arch:AVX2 | /tune:skylake

CPUID

Core | Extended Family | Family | Extended Model | Model | Summary
U | 0 | 0x6 | 0x9 | 0xE | Family 6 Model 158
S/H | 0 | 0x6 | 0x? | 0x? | Family 6 Model ???
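
The "Family 6 Model 158" summary follows from how the CPUID fields are combined: for family 0x6 (and 0xF) the extended model supplies the upper four bits of the displayed model, and the extended family is only added when the base family is 0xF. A short sketch of that rule, using the field values listed above (the function name `display_family_model` is an assumption made for illustration):

```python
from typing import Tuple


def display_family_model(ext_family: int, family: int,
                         ext_model: int, model: int) -> Tuple[int, int]:
    """Combine raw CPUID fields into the displayed family and model."""
    # The extended family is only added when the base family is 0xF.
    display_family = family + ext_family if family == 0xF else family
    # The extended model forms the high nibble for families 0x6 and 0xF.
    if family in (0x6, 0xF):
        display_model = (ext_model << 4) | model
    else:
        display_model = model
    return display_family, display_model


fam, mod = display_family_model(0x0, 0x6, 0x9, 0xE)
print(f"Family {fam} Model {mod} (0x{mod:X})")  # Family 6 Model 158 (0x9E)
```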

Architecture

Coffee Lake is 8th Generation Core

While there is no change in pure IPC over Skylake and the core microarchitecture is largely the same, Intel introduced a number of major system-level changes in Coffee Lake. In addition to the improved performance brought by the uplift in binning as a result of the enhanced process, Coffee Lake also increased the number of cores by 50% (from four to six), enabling much higher multi-threaded performance. The enhanced manufacturing process should also allow Coffee Lake chips to be highly overclockable.

Key changes from Kaby Lake

  • Enhanced "14nm++" process results in higher turbo frequencies
  • IPC improvement from the larger cache for various workloads, but the actual core is unchanged
  • Core
    • LSD has been re-enabled (previously disabled)
  • Memory
    • Faster memory for mainstream desktops (i.e., Coffee Lake S): DDR4-2666 (up from DDR4-2400)
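
To put the memory speed bump in perspective, peak theoretical DRAM bandwidth can be estimated as transfer rate × bytes per transfer × channels. The sketch below assumes the dual-channel, 8 B-per-transfer configuration listed under Memory Hierarchy; the function name `peak_bandwidth_gbs` is illustrative, and sustained bandwidth in practice will be lower.

```python
def peak_bandwidth_gbs(mega_transfers_per_s: int,
                       channels: int = 2,
                       bytes_per_transfer: int = 8) -> float:
    """Peak theoretical bandwidth in GB/s (10^9 bytes per second)."""
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


old = peak_bandwidth_gbs(2400)  # DDR4-2400
new = peak_bandwidth_gbs(2666)  # DDR4-2666

print(f"DDR4-2400: {old:.1f} GB/s")  # 38.4 GB/s
print(f"DDR4-2666: {new:.1f} GB/s ({(new / old - 1) * 100:.1f}% higher)")
```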

Block Diagram

Entire SoC Overview (quad)

kaby lake soc block diagram.svg

Entire SoC Overview (hexa)

coffee lake soc block diagram.svg

Individual Core

(Core identical to Skylake (client))

skylake block diagram.svg

Gen9.5

See Gen9.5#Gen9.5.

Memory Hierarchy

The overall memory structure is identical to Skylake.

  • Cache
    • L0 µOP cache:
      • 1,536 µOPs, 8-way set associative
        • 32 sets, 6-µOP line size
        • statically divided between threads, per core, inclusive with L1I
    • L1I Cache:
      • 32 KiB, 8-way set associative
        • 64 sets, 64 B line size
        • shared by the two threads, per core
    • L1D Cache:
      • 32 KiB, 8-way set associative
      • 64 sets, 64 B line size
      • shared by the two threads, per core
      • 4 cycles for fastest load-to-use (simple pointer accesses)
        • 5 cycles for complex addresses
      • 64 B/cycle load bandwidth
      • 32 B/cycle store bandwidth
      • Write-back policy
    • L2 Cache:
      • Unified, 256 KiB, 4-way set associative
      • Non-inclusive
      • 1024 sets, 64 B line size
      • 12 cycles for fastest load-to-use
      • 64 B/cycle bandwidth to L1$
      • Write-back policy
    • L3 Cache/LLC:
      • Up to 2 MiB Per core, shared across all cores
      • Up to 16-way set associative
      • Inclusive
      • 64 B line size
      • Write-back policy
      • Per each core:
        • Read: 32 B/cycle (@ ring clock)
        • Write: 32 B/cycle (@ ring clock)
      • 42 cycles for fastest load-to-use
    • Side Cache:
      • 64 MiB & 128 MiB eDRAM
      • Per package
      • Only on the Iris Pro GPUs
      • Read: 32 B/cycle (@ eDRAM clock)
      • Write: 32 B/cycle (@ eDRAM clock)
    • System DRAM:
      • 2 Channels
      • 8 B/cycle/channel (@ memory clock)
      • 42 cycles + 51 ns latency
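
The set counts listed above follow directly from each cache's size, associativity, and line size (sets = size / (ways × line size)). A minimal sketch that re-derives them; the helper name `cache_sets` is an assumption for illustration:

```python
KIB = 1024


def cache_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets in a set-associative cache."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("size is not a multiple of ways * line size")
    return sets


l1_sets = cache_sets(32 * KIB, ways=8)   # L1I and L1D
l2_sets = cache_sets(256 * KIB, ways=4)  # L2

# The µOP cache is specified in µOPs rather than bytes:
# 32 sets x 8 ways x 6 µOPs per line.
uop_capacity = 32 * 8 * 6

print(f"L1 sets: {l1_sets}")                  # 64
print(f"L2 sets: {l2_sets}")                  # 1024
print(f"µOP cache capacity: {uop_capacity}")  # 1536
```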

Coffee Lake's TLB consists of a dedicated L1 TLB for the instruction cache (ITLB) and another for the data cache (DTLB). Additionally, there is a unified L2 TLB (STLB).

  • TLBs:
    • ITLB
      • 4 KiB page translations:
        • 128 entries; 8-way set associative
        • dynamic partitioning
      • 2 MiB / 4 MiB page translations:
        • 8 entries per thread; fully associative
        • Duplicated for each thread
    • DTLB
      • 4 KiB page translations:
        • 64 entries; 4-way set associative
        • fixed partition
      • 2 MiB / 4 MiB page translations:
        • 32 entries; 4-way set associative
        • fixed partition
      • 1 GiB page translations:
        • 4 entries; fully associative
        • fixed partition
    • STLB
      • 4 KiB + 2 MiB page translations:
        • 1536 entries; 12-way set associative
        • fixed partition
      • 1 GiB page translations:
        • 16 entries; 4-way set associative
        • fixed partition


  • Note: STLB is incorrectly reported as "6-way" by CPUID leaf 2 (EAX=02H). Coffee Lake erratum CFL084 recommends that software simply ignore that value.
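
A useful way to read the TLB figures is in terms of "reach", i.e. how much memory can be mapped without a TLB miss (entries × page size). The sketch below computes this for a few of the structures listed above; `tlb_reach` is a hypothetical helper, and the calculation ignores partitioning between threads.

```python
KIB, MIB, GIB = 1024, 1024 ** 2, 1024 ** 3


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of address space covered when every entry is in use."""
    return entries * page_bytes


dtlb_4k = tlb_reach(64, 4 * KIB)    # L1 DTLB, 4 KiB pages
stlb_4k = tlb_reach(1536, 4 * KIB)  # STLB, 4 KiB pages
stlb_2m = tlb_reach(1536, 2 * MIB)  # STLB, 2 MiB pages
stlb_1g = tlb_reach(16, 1 * GIB)    # STLB, 1 GiB pages

print(f"DTLB (4 KiB pages): {dtlb_4k // KIB} KiB")  # 256 KiB
print(f"STLB (4 KiB pages): {stlb_4k // MIB} MiB")  # 6 MiB
print(f"STLB (2 MiB pages): {stlb_2m // GIB} GiB")  # 3 GiB
print(f"STLB (1 GiB pages): {stlb_1g // GIB} GiB")  # 16 GiB

# 1536 entries at 12 ways gives 128 sets.
stlb_sets = 1536 // 12
```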

Overview

coffee lake overview.svg

The Coffee Lake system on a chip consists of five major components: CPU cores, LLC, ring interconnect, System Agent, and the integrated graphics. The core architecture in Coffee Lake, like in Kaby Lake, has not changed from Skylake. This is also true for the integrated graphics, which is identical to the one incorporated in Kaby Lake. From a platform point of view, the I/O has not changed either (supporting up to 3 displays and providing 16 PCIe Gen 3 lanes). Coffee Lake has, however, brought a relatively large change to the overall system architecture by introducing two additional physical cores into its mainstream processor die. Those two cores also come with up to 2 MiB of LLC slice per core (for up to 4 MiB of additional last level cache).

In addition to considerably improving multi-threaded performance by adding two more cores (50% more) and up to four additional threads, the up to 4 MiB of extra cache should have a positive impact on single-threaded performance in most workloads.
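
The core, thread, and cache deltas quoted above can be reproduced with simple arithmetic. This is a sketch under the stated "up to 2 MiB of LLC per core" and two threads per core with Hyper-Threading; the `Config` class is illustrative, and individual SKUs may enable less cache or fewer threads.

```python
from dataclasses import dataclass


@dataclass
class Config:
    cores: int
    llc_slice_mib: int = 2     # "up to" 2 MiB of LLC per core
    threads_per_core: int = 2  # with Hyper-Threading enabled

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def llc_mib(self) -> int:
        return self.cores * self.llc_slice_mib


quad, hexa = Config(cores=4), Config(cores=6)

core_gain = (hexa.cores / quad.cores - 1) * 100
print(f"Cores:   {quad.cores} -> {hexa.cores} (+{core_gain:.0f}%)")
print(f"Threads: {quad.threads} -> {hexa.threads} (+{hexa.threads - quad.threads})")
print(f"LLC:     {quad.llc_mib} MiB -> {hexa.llc_mib} MiB (+{hexa.llc_mib - quad.llc_mib} MiB)")
```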

Historical Trend

Coffee Lake presents the largest change in the system architecture of Intel's mainstream microarchitecture since the introduction of Sandy Bridge in 2011. In 2006 Intel introduced the first mainstream quad-core processor, the Core 2 Extreme QX6700, which was based on the Kentsfield core. Those initial quad-cores consisted of two separate dies interconnected in a multi-chip package. A coherent communication link was lacking, and the aging front-side bus was used as the die-to-die link. This configuration did not change through Penryn, up until the introduction of the Core i7 Extreme based on the Nehalem microarchitecture in 2008. Nehalem leveraged Moore's Law and Intel's 45 nm process to incorporate all four cores onto a single die along with a large number of changes, particularly enhancing the uncore (now known as the System Agent).


penryn-nehalem overview change.svg


With the introduction of Sandy Bridge in 2011, the entire system architecture was reworked. A particular goal of Sandy Bridge was its configurability. Intel wanted to be able to use a single design across multiple market segments without having to spend extra resources on multiple physical designs. A large part of its modularity came from the ring interconnect Sandy Bridge implemented. It's worth pointing out that the ring implementation in Sandy Bridge is an enhanced version largely based on an implementation first incorporated into the Nehalem-EX server parts. The ring allowed Intel to integrate the System Agent and the integrated graphics on-die in Sandy Bridge.

sandy bridge ring scalability.svg

Each of those components had its own ring agent (in addition to the individual cores), allowing for efficient transfer of data between the GPU, the SA, and the individual cores and caches. The final result was a complete system on a chip with four cores and a 12 EU GPU, consisting of 1.16 billion transistors on a single 216 mm² die.


nehalem-sandy bridge overview change.svg


From Sandy Bridge through 22 nm Haswell and through 14 nm Skylake, the die shrank considerably, even after a large number of enhancements (and thus transistors) were added to the microarchitecture. With the aid of Moore's Law, the quad-core die in Coffee Lake's predecessor, Kaby Lake, has reached 126 mm², 42% smaller than the quad-core Sandy Bridge while packing over 3 times as many transistors.

Since Coffee Lake utilizes Intel's 3rd generation enhanced 14nm++ process, which has reached maturity and healthy yield, Intel can afford to increase the number of cores by 50%, from 4 to 6. This is also possible thanks to the existing ring interconnect, which was designed specifically to be able to support this configuration. In addition to the two added cores, there are two additional LLC slices, each 2 MiB in size.


sandy bridge-coffee lake overview change.svg


quad to hexa mainstream die areas.svg

The natural evolution of Moore's Law and its effect on the die size of Intel's mainstream platform enables the addition of two more cores and their associated cache slices without sacrificing yield due to a bigger die. In fact, the hexa-core die at 149 mm² is still considerably smaller than even the quad-core Haswell-based chips. The pair of cores with their associated cache slices contributed an extra ~25 mm². By the same measure, even an 8-core Coffee Lake, at around 174 mm², would be smaller than Haswell's quad-core. It's worth noting that Coffee Lake is released concurrently with Cannonlake, a 10 nm-based microarchitecture for low-power mobile devices. Given Intel's roughly 2.7x density improvement from the die shrink, an identical hexa-core Coffee Lake die on 10 nm would be smaller than any of the 14 nm quad-core dies, possibly even smaller than the dual-core dies.
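
The area figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and the 8-core and 10 nm results are rough estimates (ideal scaling, no new die overhead), not measured die sizes.

```python
# Die areas in mm², as quoted in the text.
sandy_bridge_quad = 216.0
kaby_lake_quad = 126.0
coffee_lake_hexa = 149.0
core_pair_quoted = 25.0  # "~25 mm²" for two cores plus their LLC slices
density_gain_10nm = 2.7  # quoted 14 nm -> 10 nm density improvement

# Quad-core shrink from Sandy Bridge to Kaby Lake.
shrink_pct = (1 - kaby_lake_quad / sandy_bridge_quad) * 100

# Difference between the quoted hexa-core and quad-core die areas.
core_pair_from_dies = coffee_lake_hexa - kaby_lake_quad

# Hypothetical 8-core die: hexa-core plus one more pair of cores.
octa_estimate = coffee_lake_hexa + core_pair_quoted

# Hypothetical hexa-core die on 10 nm, assuming ideal density scaling.
hexa_on_10nm = coffee_lake_hexa / density_gain_10nm

print(f"Kaby Lake quad vs Sandy Bridge quad: {shrink_pct:.1f}% smaller")
print(f"Hexa minus quad die area: {core_pair_from_dies:.0f} mm²")
print(f"Estimated 8-core die: {octa_estimate:.0f} mm²")
print(f"Estimated hexa-core die on 10 nm: {hexa_on_10nm:.1f} mm²")
```

Note that the difference between the two quoted die sizes works out to 23 mm², slightly below the rounded "~25 mm²" figure used for the 8-core estimate.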

Core

Pipeline

Main article: Skylake § Pipeline

Coffee Lake's pipeline is identical to Skylake's.

Front-end

Note that a bug associated with the Loop Stream Detector (LSD) has been fixed with Coffee Lake. See Skylake (server) § Front-end.

Scheduler Ports & Execution Units

Scheduler Port | Designation
Port 0 | Integer/Vector Arithmetic, Multiplication, Logic, Shift, and String ops; FP Add, Multiply, FMA; Integer/FP Division and Square Root; AES Encryption; Branch2
Port 1 | Integer/Vector Arithmetic, Multiplication, Logic, Shift, and Bit Scanning; FP Add, Multiply, FMA
Port 5 | Integer/Vector Arithmetic, Logic; Vector Permute; x87 FP Add, Composite Int, CLMUL
Port 6 | Integer Arithmetic, Logic, Shift; Branch
Port 2 | Load, AGU
Port 3 | Load, AGU
Port 4 | Store, AGU
Port 7 | AGU
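
One way to work with the port assignments above is as a simple lookup table, for example to see which ports can accept a given class of operation. This is only a sketch: the `PORTS` dictionary transcribes the table, and `ports_for` is a hypothetical helper that does a plain case-insensitive substring match.

```python
from typing import Dict, List

# Port number -> execution unit descriptions, transcribed from the table.
PORTS: Dict[int, List[str]] = {
    0: ["Integer/Vector Arithmetic, Multiplication, Logic, Shift, and String ops",
        "FP Add, Multiply, FMA",
        "Integer/FP Division and Square Root",
        "AES Encryption",
        "Branch2"],
    1: ["Integer/Vector Arithmetic, Multiplication, Logic, Shift, and Bit Scanning",
        "FP Add, Multiply, FMA"],
    5: ["Integer/Vector Arithmetic, Logic",
        "Vector Permute",
        "x87 FP Add, Composite Int, CLMUL"],
    6: ["Integer Arithmetic, Logic, Shift", "Branch"],
    2: ["Load, AGU"],
    3: ["Load, AGU"],
    4: ["Store, AGU"],
    7: ["AGU"],
}


def ports_for(keyword: str) -> List[int]:
    """Return the sorted port numbers whose description mentions keyword."""
    needle = keyword.lower()
    return sorted(port for port, units in PORTS.items()
                  if any(needle in unit.lower() for unit in units))


print("FMA:   ", ports_for("FMA"))     # [0, 1]
print("Load:  ", ports_for("Load"))    # [2, 3]
print("AGU:   ", ports_for("AGU"))     # [2, 3, 4, 7]
print("Branch:", ports_for("Branch"))  # [0, 6]
```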

Configurability

Coffee Lake builds upon the Skylake platform with the addition of the first hexa-core die. Currently the Coffee Lake family consists of two dies, aimed at the high-performance market.

Graphics

Main article: Gen9.5

Support for three displays via HDMI 1.4[graphics 1], DisplayPort (DP) 1.2, and Embedded DisplayPort (eDP) 1.4 interfaces. Coffee Lake's graphics are identical to Kaby Lake and have native fixed function HEVC/VP9 decoding for 4K playback at 60fps (10-bit) as well as fixed function HEVC/VP9 encoding for 4K (8-bit).

Integrated Graphics Processor Standards
Name | Execution Units | Tier | Series | eDRAM | Vulkan (Windows/Linux) | Direct3D (Windows) | Direct3D (Linux) | HLSL | OpenGL (Windows) | OpenGL (Linux) | OpenCL (Windows) | OpenCL (Linux)
UHD Graphics 630 | 23/24 | GT2 | S | - | 1.0 | 12 | N/A | 5.1 | 4.5 | 4.5 | 2.1 | 2.0
  1. Note that while there is no native HDMI 2.0 support, Intel did provide somewhat of an awkward solution using an LSPCON (Level Shifter/Protocol Converter) to drive DP to HDMI 1.4 signal + convert HDMI 1.4 to HDMI 2.0. One such solution is the MegaChips MCDP2800.

Hardware Accelerated Video

Coffee Lake (Gen9.5) Hardware Accelerated Video Capabilities
Codec | Encode Profiles | Encode Levels | Encode Max Resolution | Decode Profiles | Decode Levels | Decode Max Resolution
MPEG-2 (H.262) | Main | High | 1080p (FHD) | Main | Main, High | 1080p (FHD)
MPEG-4 AVC (H.264) | High, Main | 5.1 | 2160p (4K) | Main, High, MVC, Stereo | 5.1 | 2160p (4K)
JPEG/MJPEG | Baseline | - | 16k x 16k | Baseline | Unified | 16k x 16k
HEVC (H.265) | Main | 5.1 | 2160p (4K) | Main | 5.1 | 2160p (4K)
VC-1 | | | | Advanced, Main, Simple | 3, High, Simple | 3840x3840
VP8 | Unified | Unified | N/A | 0 | Unified | 1080p
VP9 | 0 | | 2160p (4K) | 0, 2 | Unified | 2160p (4K)

Power delivery

Despite using the same socket (FCLGA-1151), Coffee Lake breaks compatibility with Skylake and Kaby Lake due to various enhancements to the power delivery of the processor needed to better handle the additional cores.

In order to improve the power delivery of the chip and support the higher package-level current required by the additional cores, Intel needed to increase the number of pins that go to the power rails of the die. Since there is a practical limit to how much current each pin can deliver, a large number of pins that were previously unused/reserved have been allocated for this purpose. The new hexa-core parts have a core current (Icc) rating up to 38 A higher (138 A versus 100 A).

Pin Changes
Parameter | Skylake/Kaby Lake | Coffee Lake
Socket | FCLGA-1151 | FCLGA-1151
Contacts | 1151 | 1151
Reserved Pins | 46 | 25 (-21)
VSS (Ground) | 377 | 391 (+14)
VCC (Power) | 128 | 146 (+18)
Core Icc | | 138 A (Hexa; 95 W)
 | | 133 A (Hexa; 65 W)
 | 100 A (Quad; 91 W) | 100 A (Quad; 95 W)
 | 79 A (Quad; 65 W) | 79 A (Quad; 65 W)
 | 66 A (Quad; 35 W) |
 | 58 A (Dual; 54 W) |
 | 45 A (Dual; 51 W) |
 | 40 A (Dual; 35 W) |
Pinout | skylake pin diagram.png | coffee lake pin diagram.png
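
To illustrate why extra power pins were needed, the average current per VCC pin can be estimated from the table. This is a deliberately simplified sketch: it assumes the core Icc is spread evenly across all VCC pins, which real packages do not guarantee, and `amps_per_pin` is a hypothetical helper.

```python
def amps_per_pin(icc_amps: float, vcc_pins: int) -> float:
    """Average current per VCC pin, assuming an even split (simplification)."""
    return icc_amps / vcc_pins


# Highest core Icc ratings and VCC pin counts from the table above.
skylake = amps_per_pin(100, 128)       # quad-core, 91 W
coffee_lake = amps_per_pin(138, 146)   # hexa-core, 95 W
# Hypothetical: hexa-core current on the old pin allocation.
hexa_on_old_pinout = amps_per_pin(138, 128)

print(f"Skylake/Kaby Lake quad:      {skylake:.2f} A/pin")             # 0.78
print(f"Coffee Lake hexa:            {coffee_lake:.2f} A/pin")         # 0.95
print(f"Hexa on the old 128 pins:    {hexa_on_old_pinout:.2f} A/pin")  # 1.08
```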

Die

Coffee Lake desktop and mobile parts come with 4 or 6 cores. Each variant has its own die. The major components of the die are:

  • System Agent
  • CPU Core
  • Ring bus interconnect
  • Memory Interface

System Agent

The System Agent (SA) contains the Display Engine (DE) and the I/O bus.

Hexa-Core Die

coffee lake 6c sa.png
coffee lake 6c sa (annotated).png

Integrated Graphics

The integrated graphics makes up a large portion of the die. The normal dual-core and quad-core dies come with a 24 EU Gen9.5 GPU (with 12 units disabled on the low-end models).

coffee lake gpu.png
coffee lake gpu (annotated).png

Quad-Core

  • 14 nm++ process
  • 11 metal layers
  • 126 mm² die size
  • 4 CPU cores + 23 GPU EUs

Hexa-Core

  • 14 nm++ process
  • 11 metal layers
  • 149 mm² die size
  • 6 CPU cores + 24 GPU EUs


coffee lake die (hexa core).png


coffee lake die (quad core) (annotated).png

All Coffee Lake Chips

List of Coffee Lake-based Processors
Model | Launched | Price | Family | Platform | Core | Cores | Threads | L3$ | TDP | Base | Turbo (1 Core) | Turbo (2 Cores) | Turbo (4 Cores) | Turbo (6 Cores) | Max Memory | GPU | GPU Base | GPU Burst
i3-8100 | 5 October 2017 | $117.00 | Core i3 | Coffee Lake | Coffee Lake S | 4 | 4 | 6 MiB | 65 W | 3.6 GHz | | | | | 64 GiB | UHD Graphics 630 | 350 MHz | 1,100 MHz
i3-8350K | 5 October 2017 | $168.00 | Core i3 | Coffee Lake | Coffee Lake S | 4 | 4 | 8 MiB | 91 W | 4 GHz | | | | | 64 GiB | UHD Graphics 630 | 350 MHz | 1,150 MHz
i5-8400 | 5 October 2017 | $182.00 | Core i5 | Coffee Lake | Coffee Lake S | 6 | 6 | 9 MiB | 65 W | 2.8 GHz | 4 GHz | 3.9 GHz | 3.9 GHz | 3.8 GHz | 64 GiB | UHD Graphics 630 | 350 MHz | 1,050 MHz
i5-8600K | 5 October 2017 | $257.00 | Core i5 | Coffee Lake | Coffee Lake S | 6 | 6 | 9 MiB | 95 W | 3.6 GHz | 4.3 GHz | 4.2 GHz | 4.2 GHz | 4.1 GHz | 64 GiB | UHD Graphics 630 | 350 MHz | 1,150 MHz
i7-8700 | 5 October 2017 | $303.00 | Core i7 | Coffee Lake | Coffee Lake S | 6 | 12 | 12 MiB | 65 W | 3.2 GHz | 4.6 GHz | 4.5 GHz | 4.3 GHz | 4.3 GHz | 64 GiB | UHD Graphics 630 | 350 MHz | 1,200 MHz
i7-8700K | 5 October 2017 | $359.00 | Core i7 | Coffee Lake | Coffee Lake S | 6 | 12 | 12 MiB | 95 W | 3.7 GHz | 4.7 GHz | 4.6 GHz | 4.4 GHz | 4.3 GHz | 64 GiB | UHD Graphics 630 | 350 MHz | 1,200 MHz
Count: 6

References

  • Mark Bohr, Intel. Intel Technology and Manufacturing Day. Mar 28, 2017.
  • Intel 8th Generation Core announcement, Sept 25, 2017.