From WikiChip
Editing intel/microarchitectures/gen9
Latest revision | Your text | ||
Line 1: | Line 1: | ||
− | {{intel title|Gen9|arch}} | + | {{intel title|Gen9 LP|arch}} |
{{microarchitecture | {{microarchitecture | ||
| atype = GPU | | atype = GPU | ||
− | | name = Gen9 | + | | name = Gen9 LP |
| designer = Intel | | designer = Intel | ||
| manufacturer = Intel | | manufacturer = Intel | ||
Line 10: | Line 10: | ||
| succession = Yes | | succession = Yes | ||
− | | predecessor = Gen8 | + | | predecessor = Gen8 LP |
− | | predecessor link = intel/microarchitectures/ | + | | predecessor link = intel/microarchitectures/gen8_lp |
− | | successor = Gen9.5 | + | | successor = Gen9.5 LP |
− | | successor link = intel/microarchitectures/gen9. | + | | successor link = intel/microarchitectures/gen9.5_lp |
}} | }} | ||
− | '''Gen9''' (''Generation 9'') is the [[microarchitecture]] for [[Intel]]'s [[graphics processing unit]] utilized by {{\\|Skylake}}-based microprocessors. Gen9 is the successor to {{\\|Gen8}} used by {{\\|Broadwell}}. The Gen9 microarchitecture is designed separately by Intel and then integrated onto the same Skylake SoC die. | + | '''Gen9 LP''' (''Generation 9 Low Power'') is the [[microarchitecture]] for [[Intel]]'s [[graphics processing unit]] utilized by {{\\|Skylake}}-based microprocessors. Gen9 LP is the successor to {{\\|Gen8 LP}} used by {{\\|Broadwell}}. The Gen9 microarchitecture is designed separately by Intel and then integrated onto the same Skylake SoC die. |
== Codenames == | == Codenames == | ||
− | |||
Various models support different Graphics Tiers (GT), which provide different levels of performance. Some models also support an additional [[eDRAM]] side cache. | Various models support different Graphics Tiers (GT), which provide different levels of performance. Some models also support an additional [[eDRAM]] side cache. | ||
{| class="wikitable" | {| class="wikitable" | ||
Line 38: | Line 37: | ||
{| class="wikitable tc2 tc3" | {| class="wikitable tc2 tc3" | ||
|- | |- | ||
− | ! colspan="5" | Gen9 [[IGP]] Models !! colspan=" | + | ! colspan="5" | Gen9 LP [[IGP]] Models !! colspan="9" | Standards |
|- | |- | ||
− | ! rowspan="2" | Name !! rowspan="2" | Execution Units !! rowspan="2" | Tier !! rowspan="2" | Series !! rowspan="2" | eDRAM !! colspan="2" | [[Vulkan]] !! colspan="3" | [[Direct3D]] !! colspan="2" | [[OpenGL]] !! colspan="2" | [[OpenCL | + | ! rowspan="2" | Name !! rowspan="2" | Execution Units !! rowspan="2" | Tier !! rowspan="2" | Series !! rowspan="2" | eDRAM !! colspan="2" | [[Vulkan]] !! colspan="3" | [[Direct3D]] !! colspan="2" | [[OpenGL]] !! colspan="2" | [[OpenCL]] |
|- | |- | ||
− | | Windows || Linux || Windows || Linux || [[High Level Shading Language|HLSL]] || Windows || Linux || Windows || Linux | + | | Windows || Linux || Windows || Linux || [[High Level Shading Language|HLSL]] || Windows || Linux || Windows || Linux |
|- | |- | ||
− | | {{intel|HD Graphics (Skylake)}} || 12 || GT1 || {{intel|Skylake Y|Y|l=core}} || - || rowspan=" | + | | {{intel|HD Graphics (Skylake)}} || 12 || GT1 || {{intel|Skylake Y|Y|l=core}} || - || rowspan="9" colspan="2" style="text-align: center;" | '''1.0''' || rowspan="9" style="text-align: center;" | '''12''' || rowspan="9" style="text-align: center;" | '''N/A''' || rowspan="9" style="text-align: center;" | '''5.1''' || rowspan="9" style="text-align: center;" | '''4.4''' || rowspan="9" style="text-align: center;" | '''4.5''' || rowspan="9" style="text-align: center;" colspan="2" | '''2.0''' |
|- | |- | ||
| {{intel|HD Graphics 510}} || 12 || GT1 || {{intel|Skylake U|U|l=core}}, {{intel|Skylake S|S|l=core}} || - | | {{intel|HD Graphics 510}} || 12 || GT1 || {{intel|Skylake U|U|l=core}}, {{intel|Skylake S|S|l=core}} || - | ||
Line 59: | Line 58: | ||
|- | |- | ||
| {{intel|Iris Graphics 550}} || 48 || GT3e || {{intel|Skylake U|U|l=core}} || 64 MiB | | {{intel|Iris Graphics 550}} || 48 || GT3e || {{intel|Skylake U|U|l=core}} || 64 MiB | ||
− | |||
− | |||
|- | |- | ||
| {{intel|Iris Pro Graphics 580}} || 72 || GT4e || {{intel|Skylake H|H|l=core}} || 128 MiB | | {{intel|Iris Pro Graphics 580}} || 72 || GT4e || {{intel|Skylake H|H|l=core}} || 128 MiB | ||
− | |||
− | |||
|} | |} | ||
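In the model table above, the tiers carrying an ''e'' suffix (GT3e, GT4e) are exactly the rows that list an eDRAM size. The following Python sketch encodes only the four rows visible in this excerpt of the table and checks that convention; `IgpModel`, `tier_has_edram` and `check_consistency` are illustrative names, not part of any Intel tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IgpModel:
    name: str
    execution_units: int
    tier: str                  # e.g. "GT1", "GT3e"
    series: tuple[str, ...]    # Skylake series letters
    edram_mib: Optional[int]   # None where the table shows "-"


# Only the rows visible in this excerpt of the table.
MODELS = [
    IgpModel("HD Graphics (Skylake)", 12, "GT1", ("Y",), None),
    IgpModel("HD Graphics 510", 12, "GT1", ("U", "S"), None),
    IgpModel("Iris Graphics 550", 48, "GT3e", ("U",), 64),
    IgpModel("Iris Pro Graphics 580", 72, "GT4e", ("H",), 128),
]


def tier_has_edram(tier: str) -> bool:
    """Assumed convention: a trailing 'e' on the tier name marks an eDRAM part."""
    return tier.endswith("e")


def check_consistency(models: list[IgpModel]) -> list[str]:
    """Return names of models whose tier suffix disagrees with the eDRAM column."""
    return [m.name for m in models
            if tier_has_edram(m.tier) != (m.edram_mib is not None)]


if __name__ == "__main__":
    print(check_consistency(MODELS))
    for m in MODELS:
        if m.edram_mib:
            print(f"{m.name}: {m.execution_units} EUs, {m.edram_mib} MiB eDRAM")
```

An empty list from `check_consistency` means every visible row follows the suffix convention.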
Line 94: | Line 89: | ||
| {{intel|HD Graphics 535}} || SKL U - ULT 2+3 || K1 || L1 || 0x1923 || 0xA | | {{intel|HD Graphics 535}} || SKL U - ULT 2+3 || K1 || L1 || 0x1923 || 0xA | ||
|- | |- | ||
− | | {{intel|Iris | + | | {{intel|Iris Graphics P555}} || SKL Media Server 4+3FE || N0 || J0 || 0x192D || 0x9 |
|- | |- | ||
− | | {{intel|Iris Pro Graphics | + | | {{intel|Iris Pro Graphics P580}} || SKL H Halo 4+4E || rowspan="2" | 72 || N0 || J0 || 0x193B || 0x9 |
|- | |- | ||
− | | {{intel|Iris Pro Graphics P580}} | + | | {{intel|Iris Pro Graphics P580}} || SKL WKS 4+4E || N0 || J0 || 0x193D || 0x9 |
|} | |} | ||
<references group=devID /> | <references group=devID /> | ||
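The device ID column is what the PCI bus reports for the GPU, for example in the `[vendor:device]` pair printed by `lspci -nn`. A minimal Python lookup, assuming the four IDs and names exactly as they appear in the revised column of the rows above (other rows of the table are not shown in this excerpt), might look like this; `GEN9_DEVICE_IDS` and `identify` are hypothetical names.

```python
import re
from typing import Optional

INTEL_VENDOR_ID = 0x8086  # PCI vendor ID assigned to Intel

# Device IDs and labels copied from the visible table rows (revised column).
GEN9_DEVICE_IDS = {
    0x1923: ("HD Graphics 535", "SKL U - ULT 2+3"),
    0x192D: ("Iris Graphics P555", "SKL Media Server 4+3FE"),
    0x193B: ("Iris Pro Graphics P580", "SKL H Halo 4+4E"),
    0x193D: ("Iris Pro Graphics P580", "SKL WKS 4+4E"),
}

# Matches a "vvvv:dddd" hex pair, bracketed or bare.
_ID_PAIR = re.compile(r"\b([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b")


def identify(text: str) -> Optional[tuple[str, str]]:
    """Map the first vendor:device pair found in `text` to (model, configuration)."""
    match = _ID_PAIR.search(text)
    if match is None:
        return None
    vendor, device = (int(group, 16) for group in match.groups())
    if vendor != INTEL_VENDOR_ID:
        return None
    return GEN9_DEVICE_IDS.get(device)
```

For example, `identify("[8086:193d]")` returns `("Iris Pro Graphics P580", "SKL WKS 4+4E")`, while a non-Intel vendor ID or an unlisted device ID returns `None`.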
− | (39 deleted lines, content not shown) |
== Hardware Accelerated Video == | == Hardware Accelerated Video == | ||
Line 147: | Line 103: | ||
== Process Technology == | == Process Technology == | ||
{{main|intel/microarchitectures/broadwell#Process_Technology|l1=Broadwell § Process Technology}} | {{main|intel/microarchitectures/broadwell#Process_Technology|l1=Broadwell § Process Technology}} | ||
− | Gen9 are part of the Skylake SoC die which uses the same [[14 nm process]] used for the Broadwell microarchitecture. | + | Gen9 LP GPUs are part of the Skylake SoC die, which uses the same [[14 nm process]] as the Broadwell microarchitecture. |
== Architecture == | == Architecture == | ||
− | Gen9 presents a large departure from the Gen8 and previous architectures. | + | Gen9 LP represents a large departure from Gen8 LP and previous architectures. |
− | === Key changes from {{\\|Gen8}} === | + | === Key changes from {{\\|Gen8 LP}} === |
* Architecture is drastically different | * Architecture is drastically different | ||
− | ** Gen9 is composed of 3 | + | ** Gen9 LP is composed of three truly independent major components: the Display block, the Unslice, and the Slice. |
** Shared Virtual Memory (SVM) improvements | ** Shared Virtual Memory (SVM) improvements | ||
*** Improved cache coherency performance | *** Improved cache coherency performance | ||
Line 167: | Line 123: | ||
** RAW imaging capabilities | ** RAW imaging capabilities | ||
* Slice | * Slice | ||
− | |||
** L3 Cache | ** L3 Cache | ||
*** Increased to 768 [[KiB]]/slice (up from 576 KiB/slice) | *** Increased to 768 [[KiB]]/slice (up from 576 KiB/slice) | ||
Line 177: | Line 132: | ||
** Multi-plane overlays | ** Multi-plane overlays | ||
** Texture samplers now natively support the NV12 YUV format | ** Texture samplers now natively support the NV12 YUV format | ||
− | |||
** Preemption of execution is now supported at the thread level | ** Preemption of execution is now supported at the thread level | ||
** Round robin scheduling of threads within an execution unit. | ** Round robin scheduling of threads within an execution unit. | ||
Line 183: | Line 137: | ||
** 16-bit floating point capability is improved with native support for denormals and gradual underflow | ** 16-bit floating point capability is improved with native support for denormals and gradual underflow | ||
* L4$ | * L4$ | ||
− | ** The [[eDRAM]] is now a side cache instead of an L4$ like it was in {{\\|Gen8}}. (See {{\\|Skylake#eDRAM architectural changes|Skylake §eDRAM architectural changes}} for the reason) | + | ** The [[eDRAM]] is now a side cache instead of an L4$ like it was in {{\\|Gen8 LP}}. (See {{\\|Skylake#eDRAM architectural changes|Skylake §eDRAM architectural changes}} for the reason) |
** Side-cache eDRAM was moved into the system agent adjacent to the display controller | ** Side-cache eDRAM was moved into the system agent adjacent to the display controller | ||
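Denormals (subnormals) are the half-precision values below the smallest normal magnitude, 2^-14; gradual underflow keeps them representable down to 2^-24 instead of flushing them to zero. Python's `struct` module supports IEEE 754 binary16 through the `"e"` format, which is enough to illustrate the difference; `flush_to_zero` is a hypothetical software model added for comparison, not a description of how the Gen9 EU implements FP16.

```python
import math
import struct

SMALLEST_NORMAL = 2.0 ** -14     # exponent field 1, mantissa 0 -> 0x0400
SMALLEST_SUBNORMAL = 2.0 ** -24  # exponent field 0, mantissa 1 -> 0x0001


def fp16_bits(x: float) -> int:
    """Round x to binary16 and return its 16-bit pattern (x must fit in half range)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def fp16_value(bits: int) -> float:
    """Decode a 16-bit pattern as a binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def gradual_underflow(x: float) -> float:
    """Round-trip through binary16, keeping subnormal results."""
    return fp16_value(fp16_bits(x))


def flush_to_zero(x: float) -> float:
    """Hypothetical FTZ model: any subnormal binary16 result becomes a signed zero."""
    v = gradual_underflow(x)
    if v != 0.0 and abs(v) < SMALLEST_NORMAL:
        return math.copysign(0.0, v)
    return v


if __name__ == "__main__":
    for x in (SMALLEST_NORMAL, 3 * SMALLEST_SUBNORMAL, SMALLEST_SUBNORMAL):
        print(f"{x:.3e} bits=0x{fp16_bits(x):04X} "
              f"gradual={gradual_underflow(x):.3e} ftz={flush_to_zero(x):.3e}")
```

With gradual underflow the three inputs stay distinct and non-zero; under the FTZ model the two subnormal ones collapse to zero.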
Line 189: | Line 143: | ||
==== Entire SoC Overview ==== | ==== Entire SoC Overview ==== | ||
[[File:skylake soc block diagram.svg|900px]] | [[File:skylake soc block diagram.svg|900px]] | ||
− | ==== Gen9 ==== | + | ==== Gen9 LP ==== |
This block is for the most common setup, which is GT2 with 24 execution units. | This block is for the most common setup, which is GT2 with 24 execution units. | ||
Line 195: | Line 149: | ||
==== Individual Core ==== | ==== Individual Core ==== | ||
See {{intel|Skylake#Individual_Core|l=arch}}. | See {{intel|Skylake#Individual_Core|l=arch}}. | ||
+ | |||
+ | === Display === | ||
+ | {{empty section}} | ||
=== Unslice === | === Unslice === | ||
Line 203: | Line 160: | ||
The '''media general-purpose pipeline''' consists of two fixed-function units: the Video Front End ('''VFE''') and the '''Thread Spawner''' ('''TS'''). The VFE unit interfaces with the Command Streamer, writes thread payload data into the Unified Return Buffer, and prepares threads to be dispatched through the TS unit. The VFE unit also contains the hardware '''Variable Length Decode''' ('''VLD''') engine for MPEG-2 video decode. The TS unit is primarily responsible for interfacing with the '''Thread Dispatcher''' ('''TD''') unit, which is responsible for spawning new root-node parent threads originating from the VFE unit and for spawning child threads (either leaf-node child threads or branch-node parent threads). | The '''media general-purpose pipeline''' consists of two fixed-function units: the Video Front End ('''VFE''') and the '''Thread Spawner''' ('''TS'''). The VFE unit interfaces with the Command Streamer, writes thread payload data into the Unified Return Buffer, and prepares threads to be dispatched through the TS unit. The VFE unit also contains the hardware '''Variable Length Decode''' ('''VLD''') engine for MPEG-2 video decode. The TS unit is primarily responsible for interfacing with the '''Thread Dispatcher''' ('''TD''') unit, which is responsible for spawning new root-node parent threads originating from the VFE unit and for spawning child threads (either leaf-node child threads or branch-node parent threads). | ||
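The thread kinds in the paragraph above form a tree: root-node parent threads originate from the VFE, branch-node parent threads may spawn further children, and leaf-node child threads spawn nothing, with requests going through the TS to the TD. The toy Python model below captures only those relationships; the class names, the `dispatched` list standing in for requests forwarded to the Thread Dispatcher, and the error raised on a leaf spawn are illustrative assumptions, not documented hardware behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    ROOT = "root-node parent"      # originates from the VFE
    BRANCH = "branch-node parent"  # child thread that spawns further children
    LEAF = "leaf-node child"       # child thread that spawns nothing


@dataclass
class HwThread:
    kind: Kind
    children: list[HwThread] = field(default_factory=list)


class ThreadSpawner:
    """Toy TS: every spawn request is recorded as forwarded to the dispatcher."""

    def __init__(self) -> None:
        self.dispatched: list[Kind] = []  # stands in for requests sent to the TD

    def _dispatch(self, thread: HwThread) -> HwThread:
        self.dispatched.append(thread.kind)
        return thread

    def spawn_root(self) -> HwThread:
        """Spawn a root-node parent thread on behalf of the VFE."""
        return self._dispatch(HwThread(Kind.ROOT))

    def spawn_child(self, parent: HwThread, will_spawn: bool) -> HwThread:
        """Spawn a branch-node parent (will_spawn=True) or a leaf-node child."""
        if parent.kind is Kind.LEAF:
            raise ValueError("leaf-node threads do not spawn children")
        child = self._dispatch(HwThread(Kind.BRANCH if will_spawn else Kind.LEAF))
        parent.children.append(child)
        return child
```

A root that spawns one branch (which in turn spawns a leaf) plus one direct leaf produces the dispatch order root, branch, leaf, leaf.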
− | (17 deleted lines, content not shown) |
=== 3D Pipeline Stages === | === 3D Pipeline Stages === | ||
Line 275: | Line 215: | ||
| Sample App2 || 17 ms || 24 ms || 240 µs || 200-430 µs | | Sample App2 || 17 ms || 24 ms || 240 µs || 200-430 µs | ||
|} | |} | ||
− | (21 deleted lines, content not shown) |
== Scalability == | == Scalability == | ||
Line 301: | Line 220: | ||
=== GT1 (ULP) === | === GT1 (ULP) === | ||
− | GT1 is the most compact configuration offering two benefits: reduced cost and reduced power. GT1 is made of 1 slice containing 2 subslices with 6 EUs/subslice for a total of 12 EUs. With the scale-down, GT1 changes the ratio to 6:1 EU:sampler ratio. Note that this does retains the same ratio of 12 texels/clock and 8 pixels/clock at the backend. This configuration is better suited for some of the low power | + | GT1 is the most compact configuration, offering two benefits: reduced cost and reduced power. GT1 consists of 1 slice containing 2 subslices with 6 EUs/subslice, for a total of 12 EUs. With the scale-down, GT1 changes the EU:sampler ratio to 6:1. Note that it still retains the same 12 texels/clock and 8 pixels/clock at the backend. This configuration is better suited for some low-power workloads (e.g. ASTC-LDR+HDR, ETC1/2 compression). Note that the software stack remains unchanged compared to the larger models. |
[[File:gen9 lp gt1 block diagram.svg|600px]] | [[File:gen9 lp gt1 block diagram.svg|600px]] | ||
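The GT1 arithmetic in the paragraph above (1 slice × 2 subslices × 6 EUs = 12 EUs, at 6:1 EU:sampler) can be checked with a short Python sketch. It assumes one texture sampler per subslice, and models GT2 with the standard Gen9 slice of 3 subslices × 8 EUs, which reproduces the 24 EUs stated for GT2 on this page and would give an 8:1 ratio; `GtConfig` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GtConfig:
    name: str
    slices: int
    subslices_per_slice: int
    eus_per_subslice: int
    samplers_per_subslice: int = 1  # assumption: one sampler per subslice

    @property
    def subslices(self) -> int:
        return self.slices * self.subslices_per_slice

    @property
    def execution_units(self) -> int:
        return self.subslices * self.eus_per_subslice

    @property
    def samplers(self) -> int:
        return self.subslices * self.samplers_per_subslice

    @property
    def eus_per_sampler(self) -> float:
        return self.execution_units / self.samplers


GT1 = GtConfig("GT1", slices=1, subslices_per_slice=2, eus_per_subslice=6)
# Assumed GT2 layout: one standard Gen9 slice of 3 subslices x 8 EUs.
GT2 = GtConfig("GT2", slices=1, subslices_per_slice=3, eus_per_subslice=8)

if __name__ == "__main__":
    for gt in (GT1, GT2):
        print(f"{gt.name}: {gt.execution_units} EUs, {gt.samplers} samplers, "
              f"{gt.eus_per_sampler:.0f}:1 EU:sampler")
```

Keeping the per-subslice counts as fields makes the scale-down visible: GT1 trims both the subslice count and the EUs per subslice, which is why its EU:sampler ratio drops.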
Line 492: | Line 411: | ||
|SFC Instances || 1 || 1 || 1 || 1 || 1 | |SFC Instances || 1 || 1 || 1 || 1 || 1 | ||
|} | |} | ||
− | (26 deleted lines, content not shown) |
Facts about "Gen9 - Microarchitectures - Intel"
codename | Gen9
designer | Intel
first launched | August 5, 2015
full page name | intel/microarchitectures/gen9
instance of | microarchitecture
manufacturer | Intel
microarchitecture type | GPU
name | Gen9
process | 14 nm (0.014 μm, 1.4e-5 mm)