From WikiChip
Difference between revisions of "intel/microarchitectures/gen9"


Revision as of 15:18, 25 January 2017

Gen9 LP µarch
General Info
Arch Type: GPU
Designer: Intel
Manufacturer: Intel
Introduction: August 5, 2015
Process: 14 nm
Succession
Predecessor: Gen8 LP

Gen9 LP (Generation 9 Low Power) is the microarchitecture of Intel's integrated graphics processing unit used in Skylake-based microprocessors. Gen9 LP is the successor to Gen8 LP, which was used by Broadwell. The Gen9 microarchitecture is designed separately by Intel and then integrated onto the same die as the rest of the Skylake SoC.

Codenames

Different models support different Graphics Tiers (GT), which provide different levels of performance. Some models also include an additional eDRAM side cache.

Code Name        Description
GT1              Contains 1 slice with 12 execution units.
GT2              Contains 1 slice with 24 execution units.
GT3              Contains 2 slices with 48 execution units.
GT3e             Contains 2 slices with 48 execution units. Has an additional eDRAM side cache.
Halo (GT4)       Contains 3 slices with 72 execution units.
Halo+e (GT4e)    Contains 3 slices with 72 execution units. Has an additional eDRAM side cache.
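The tier table can be encoded as data to make the per-slice arithmetic explicit. This is a minimal sketch; the names `GtConfig` and `TIERS` are hypothetical, and the only inputs are the slice and execution unit (EU) counts from the table. It shows that every tier except GT1 works out to 24 EUs per slice, while GT1 exposes 12 EUs from its single slice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GtConfig:
    """One Graphics Tier row from the table (hypothetical representation)."""
    name: str
    slices: int
    execution_units: int
    edram_side_cache: bool

    @property
    def eus_per_slice(self) -> int:
        # Integer division is exact for every tier listed in the table.
        return self.execution_units // self.slices


# Values copied from the Codenames table.
TIERS = [
    GtConfig("GT1", 1, 12, False),
    GtConfig("GT2", 1, 24, False),
    GtConfig("GT3", 2, 48, False),
    GtConfig("GT3e", 2, 48, True),
    GtConfig("Halo (GT4)", 3, 72, False),
    GtConfig("Halo+e (GT4e)", 3, 72, True),
]

if __name__ == "__main__":
    for t in TIERS:
        edram = "yes" if t.edram_side_cache else "no"
        print(f"{t.name:<14} slices={t.slices} EUs={t.execution_units} "
              f"EUs/slice={t.eus_per_slice} eDRAM={edram}")
```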

Brands


Process Technology

Main article: Broadwell § Process Technology

Gen9 LP is part of the Skylake SoC die, which is fabricated on the same 14 nm process used for the Broadwell microarchitecture.

Architecture

Gen9 LP represents a large departure from Gen8 LP and earlier architectures.

Key changes from Gen8 LP

  • Architecture is drastically different
    • Gen9 LP is composed of three truly independent major components: the Display block, the Unslice, and the Slice.
  • Unslice
    • Now sits on its own power gating/clock domain
      • Capable of running at higher speeds if the situation allows (irrespective of slice clock)
      • Allows the slices to be powered off when only fixed-function media is in use
    • Higher fixed-function geometry throughput
    • Tessellator AutoStrip
  • Slice
    • L3 Cache
      • Increased to 768 KiB/slice (up from 576 KiB/slice)
      • Request queue size was increased
  • Subslice
    • Texture samplers now natively support the NV12 YUV format
    • Preemption of execution is now supported at the thread level
    • Round-robin scheduling of threads within an execution unit
    • New native support for 32-bit floating-point atomic operations: min, max, and compare/exchange
    • 16-bit floating-point capability is improved with native support for denormals and gradual underflow
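  • L4$
    • The eDRAM is now a side cache instead of an L4$ as it was in Gen8 LP (see Skylake § eDRAM architectural changes for the reason)
    • The side-cache eDRAM was moved into the system agent, adjacent to the display controller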
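The per-slice L3 figures above can be turned into aggregate numbers per tier. This is a sketch under one stated assumption: that every enabled slice carries the full 768 KiB of L3, so the total scales linearly with the slice counts from the Codenames table. The page does not say whether any SKU ships with part of the L3 disabled. The helper names `total_l3_kib` and `per_slice_increase_percent` are hypothetical.

```python
# Per-slice L3 sizes from the key changes list.
GEN9_L3_KIB_PER_SLICE = 768
GEN8_L3_KIB_PER_SLICE = 576

# Slice counts per tier, from the Codenames table.
SLICES_BY_TIER = {
    "GT1": 1,
    "GT2": 1,
    "GT3/GT3e": 2,
    "GT4/GT4e": 3,
}


def total_l3_kib(slices: int, kib_per_slice: int = GEN9_L3_KIB_PER_SLICE) -> int:
    """Aggregate L3 in KiB, ASSUMING every slice carries the full per-slice L3."""
    if slices < 1:
        raise ValueError("a configuration needs at least one slice")
    return slices * kib_per_slice


def per_slice_increase_percent() -> float:
    """Relative growth of the per-slice L3 from Gen8 LP to Gen9 LP."""
    delta = GEN9_L3_KIB_PER_SLICE - GEN8_L3_KIB_PER_SLICE
    return 100.0 * delta / GEN8_L3_KIB_PER_SLICE


if __name__ == "__main__":
    for tier, slices in SLICES_BY_TIER.items():
        print(f"{tier:<9} {slices} slice(s) -> {total_l3_kib(slices)} KiB L3")
    print(f"Per-slice increase: {per_slice_increase_percent():.1f}%")
```

Under that assumption a three-slice GT4 part would have 2304 KiB of L3, and the per-slice increase over Gen8 LP is 192 KiB, or about 33%.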

Block Diagram

Entire SoC Overview

[Figure: Skylake SoC block diagram]

Gen9 LP

This diagram shows the most common configuration, GT2, with 24 execution units.

[Figure: Gen9 LP GT2 block diagram]

Individual Core

See Skylake § Individual Core.

Display


Unslice

The Unslice is one of Gen9's major components. It is responsible for the fixed-function geometry and fixed-function media capabilities, and it provides the interface to the memory fabric. One of the big changes in Gen9 is that the Unslice now sits on its own power gating/clock domain, which allows it to run at its own speed and deliver higher performance on demand. This change has other benefits as well: one or more slices can be turned off when they are not needed, for example when only fixed-function media is in use. Additionally, the Unslice can run at a higher clock while the slices run at a slower clock when the workload calls for it, such as when fixed-function geometry or memory demands are high.
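The benefit of separate power gating/clock domains can be illustrated with a toy model. This is a hypothetical sketch, not Intel's driver or firmware policy: the names `Workload`, `Domain`, and `plan_domains` are invented for illustration, and the clock levels are symbolic labels rather than real frequencies. It encodes only the two scenarios described above: power gating the slices for fixed-function media, and raising the Unslice clock independently of the slice clock.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Workload(Enum):
    MEDIA_ONLY = "fixed-function media only"
    GEOMETRY_OR_MEMORY_HEAVY = "heavy fixed-function geometry or memory demand"
    BALANCED = "general workload"


@dataclass
class Domain:
    name: str
    powered: bool = True
    clock: str = "nominal"  # symbolic level only, not a real frequency


def plan_domains(workload: Workload, num_slices: int) -> Tuple[Domain, List[Domain]]:
    """Hypothetical policy choosing a state for the Unslice and each slice."""
    # The Unslice stays powered: it hosts fixed-function media, geometry,
    # and the interface to the memory fabric.
    unslice = Domain("unslice")
    slices = [Domain(f"slice{i}") for i in range(num_slices)]
    if workload is Workload.MEDIA_ONLY:
        # Only fixed-function media is active, so every slice is gated off.
        for s in slices:
            s.powered = False
            s.clock = "off"
    elif workload is Workload.GEOMETRY_OR_MEMORY_HEAVY:
        # The separate domain lets the Unslice clock rise while slices slow down.
        unslice.clock = "high"
        for s in slices:
            s.clock = "low"
    return unslice, slices


if __name__ == "__main__":
    for w in Workload:
        u, ss = plan_domains(w, num_slices=2)
        print(f"{w.value}: {u} {ss}")
```

A design with a single shared domain could not express either branch, since gating or slowing the slices would also affect the Unslice.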

Slice
