Latest revision |
Your text |
Line 1: |
Line 1: |
| {{x86 title|Advanced Matrix Extension (AMX)}}{{x86 isa main}} | | {{x86 title|Advanced Matrix Extension (AMX)}}{{x86 isa main}} |
− | '''Advanced Matrix Extension''' ('''AMX''') is an [[x86]] {{x86|extension}} that introduces a matrix register file and new instructions for operating on matrices. | + | '''Advanced Matrix Extension''' ('''AMX''') is an [[x86]] extension that introduces an accelerator framework for operating on matrices. |
| | | |
| == Overview == | | == Overview == |
− | [[File:amx architecture.svg|thumb|right|AMX Architecture]]
| + | The Advanced Matrix Extension (AMX) is an [[x86]] extension that introduces a new programming framework for working with matrices. AMX introduces two new components: a 2-dimensional register file with registers called 'tiles' and a set of [[accelerators]] that are able to operate on those tiles. Each tile represents a sub-array of a larger 2-dimensional memory image. AMX instructions are synchronous in the instruction stream, and memory loads and stores by tiles are coherent with the host's memory accesses. AMX instructions may be freely interleaved with traditional x86 code and execute in parallel with other extensions (e.g., [[AVX512]]), with special tile loads and stores and accelerator commands being sent over to the accelerator for execution. |
− | The Advanced Matrix Extension (AMX) is an [[x86]] extension that introduces a new programming framework for working with matrices (rank-2 tensors). The extension introduces two new components: a 2-dimensional [[register file]] with registers called 'tiles' and a set of [[accelerators]] that are able to operate on those tiles. Each tile represents a sub-array of a larger 2-dimensional memory image. AMX instructions are synchronous in the [[instruction stream]], with memory load/store operations by tiles being coherent with the host's memory accesses. AMX instructions may be freely interleaved with traditional x86 code and execute in parallel with other extensions (e.g., [[AVX512]]), with special tile loads and stores and accelerator commands being sent over to the accelerator for execution. | |
| | | |
| === Palettes === | | === Palettes === |
Line 19: |
Line 18: |
| AMX supports a set of accelerators that can operate on tiles. Currently, just one accelerator is defined. | | AMX supports a set of accelerators that can operate on tiles. Currently, just one accelerator is defined. |
| ==== Tile matrix multiply unit (TMUL) ==== | | ==== Tile matrix multiply unit (TMUL) ==== |
− | The '''Tile Matrix Multiply''' ('''TMUL''') unit is an AMX accelerator comprising a grid of fused multiply-add units that operate on tiles. Its presence is indicated by the ''AMX-INT8'' and ''AMX-BF16'' sub-extensions. The TMUL instruction set computes Tile<sub>C</sub>[M][N] += Tile<sub>A</sub>[M][K] * Tile<sub>B</sub>[K][N].
| + | {{empty section}} |
− | | |
− | The TMUL unit exposes a number of parameters, including the maximum height (<code>tmul_maxk</code>) and the maximum SIMD dimension (<code>tmul_maxn</code>). The TMUL unit reads these parameters dynamically upon execution.
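The accumulate relation Tile<sub>C</sub>[M][N] += Tile<sub>A</sub>[M][K] * Tile<sub>B</sub>[K][N] can be illustrated with a small software model. The following is a minimal sketch in Python, not a description of the hardware: the function name <code>tmul_reference</code> is hypothetical, tiles are modeled as plain nested lists, and element types, tile size limits, and the <code>tmul_maxk</code>/<code>tmul_maxn</code> parameters are deliberately not modeled.

```python
def tmul_reference(tile_c, tile_a, tile_b):
    """Accumulate tile_a (M x K) times tile_b (K x N) into tile_c (M x N).

    Hypothetical reference model: tiles are nested Python lists and the
    accumulation is done in place, mirroring the "+=" in the relation
    Tile_C[M][N] += Tile_A[M][K] * Tile_B[K][N].
    """
    m_rows = len(tile_a)
    k_depth = len(tile_b)
    n_cols = len(tile_b[0]) if k_depth else 0

    # Reject operands whose shapes do not agree with M x K, K x N, M x N.
    if any(len(row) != k_depth for row in tile_a):
        raise ValueError("tile_a must have K columns")
    if any(len(row) != n_cols for row in tile_b):
        raise ValueError("tile_b rows must all have N columns")
    if len(tile_c) != m_rows or any(len(row) != n_cols for row in tile_c):
        raise ValueError("tile_c must be M x N")

    for m in range(m_rows):
        for n in range(n_cols):
            acc = tile_c[m][n]          # start from the existing accumulator
            for k in range(k_depth):
                acc += tile_a[m][k] * tile_b[k][n]
            tile_c[m][n] = acc
    return tile_c
```

Because the destination is accumulated into rather than overwritten, a caller that wants a plain product would first zero the destination, which is the role the source assigns to <code>TILEZERO</code>.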
| |
− | | |
− | == Instructions ==
| |
− | [[File:amx dot product of tiles.svg|thumb|right|2 x 3 Dot Product]]
| |
− | AMX introduces 12 new instructions:
| |
− | | |
− | Configuration:
| |
− | * <code>LDTILECFG</code> - Load tile configuration, loads the tile configuration from the 64-byte memory location specified.
| |
− | * <code>STTILECFG</code> - Store tile configuration, stores the tile configuration in the 64-byte memory location specified.
| |
− | | |
− | Data:
| |
− | * <code>TILELOADD</code>/<code>TILELOADDT1</code> - Load tile
| |
− | * <code>TILESTORED</code> - Store tile
| |
− | * <code>TILERELEASE</code> - Release tile, returns TILECFG and TILEDATA to the INIT state
| |
− | * <code>TILEZERO</code> - Zero tile, zeroes the destination tile
| |
− | | |
− | Operation:
| |
− | * <code>TDPBF16PS</code> - Perform a dot-product of [[BF16]] tiles and accumulate the result. Packed Single Accumulation.
| |
− | * <code>TDPB[XX]D</code> - Perform a dot-product of [[Int8]] tiles and accumulate the result. Dword Accumulation.
| |
− | ** Where ''XX'' can be: ''SU'' = Signed/Unsigned, ''US'' = Unsigned/Signed, ''SS'' = Signed/Signed, and ''UU'' = Unsigned/Unsigned pairs.
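The effect of the ''XX'' suffix can be illustrated by interpreting the same raw bytes as signed or unsigned before multiplying. The following Python sketch is an illustration only and makes several assumptions that are not stated above: the first letter of the suffix is taken to apply to the first source operand and the second letter to the second, the helper names <code>as_int8</code> and <code>dot_accumulate</code> are hypothetical, and neither the 32-bit width of the dword accumulator nor the in-tile byte layout is modeled.

```python
# Assumed mapping: (first operand signed?, second operand signed?)
SIGNEDNESS = {
    "SS": (True, True),
    "SU": (True, False),
    "US": (False, True),
    "UU": (False, False),
}


def as_int8(byte, signed):
    """Interpret a raw byte (0..255) as a signed or unsigned 8-bit value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    if signed and byte >= 0x80:
        return byte - 0x100  # two's-complement interpretation
    return byte


def dot_accumulate(acc, a_bytes, b_bytes, variant):
    """Add the dot product of two byte sequences to acc.

    variant selects how each operand's bytes are interpreted. The result is
    an unbounded Python int; accumulator width is not modeled here.
    """
    if len(a_bytes) != len(b_bytes):
        raise ValueError("operands must have the same length")
    a_signed, b_signed = SIGNEDNESS[variant]
    for a, b in zip(a_bytes, b_bytes):
        acc += as_int8(a, a_signed) * as_int8(b, b_signed)
    return acc
```

For example, the byte <code>0xFF</code> contributes as -1 when its operand is treated as signed and as 255 when it is treated as unsigned, so the four variants generally give different results for the same input bytes.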
| |
− | | |
− | === Feature set ===
| |
− | Not all hardware implementations support all operations. The AMX extension comprises three sub-extensions: '''AMX-TILE''', '''AMX-INT8''', and '''AMX-BF16'''.
| |
− | | |
− | {| class="wikitable" | |
− | |-
| |
− | ! rowspan="3" | Instruction !! colspan="3" | Feature Set
| |
− | |-
| |
− | |-
| |
− | ! Base || colspan="2" | [[#TMUL|TMUL]]
| |
− | |-
| |
− | ! AMX-TILE !! AMX-INT8 !! AMX-BF16
| |
− | |-
| |
− | | <code>LDTILECFG</code> || {{tchk|yes}} || ||
| |
− | |-
| |
− | | <code>STTILECFG</code> || {{tchk|yes}} || ||
| |
− | |-
| |
− | | <code>TILELOADD</code><br><code>TILELOADDT1</code> || {{tchk|yes}} || ||
| |
− | |-
| |
− | | <code>TILESTORED</code> || {{tchk|yes}} || ||
| |
− | |-
| |
− | | <code>TILERELEASE</code> || {{tchk|yes}} || ||
| |
− | |-
| |
− | | <code>TILEZERO</code> || {{tchk|yes}} || ||
| |
− | |-
| |
− | | <code>TDPBSSD</code><br>
| |
− | <code>TDPBSUD</code><br>
| |
− | <code>TDPBUSD</code><br>
| |
− | <code>TDPBUUD</code>
| |
− | | || {{tchk|yes}} ||
| |
− | |-
| |
− | | <code>TDPBF16PS</code> || || || {{tchk|yes}}
| |
− | |}
| |
− | | |
− | == Detection ==
| |
− | {| class="wikitable"
| |
− | ! colspan="2" | {{x86|CPUID}} !! rowspan="2" | Instruction Set
| |
− | |-
| |
− | ! Input !! Output
| |
− | |-
| |
− | | rowspan="3" | EAX=07H, ECX=0 || EDX[bit 22] || AMX-BF16
| |
− | |-
| |
− | | EDX[bit 24] || AMX-TILE
| |
− | |-
| |
− | | EDX[bit 25] || AMX-INT8
| |
− | |}
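The bit positions in the table above can be decoded as follows. This is a minimal Python sketch that assumes the EDX value returned by {{x86|CPUID}} with EAX=07H, ECX=0 has already been obtained by some other means, since Python has no standard way to execute CPUID; the names <code>AMX_FEATURE_BITS</code> and <code>decode_amx_features</code> are hypothetical.

```python
# Bit positions in EDX for CPUID with EAX=07H, ECX=0, as listed in the table.
AMX_FEATURE_BITS = {
    "AMX-BF16": 22,
    "AMX-TILE": 24,
    "AMX-INT8": 25,
}


def decode_amx_features(edx):
    """Return a dict mapping each AMX sub-extension to True/False.

    edx is the 32-bit EDX register value reported by CPUID leaf 07H,
    sub-leaf 0. Obtaining that value is outside the scope of this sketch.
    """
    if not 0 <= edx <= 0xFFFFFFFF:
        raise ValueError("edx must be a 32-bit unsigned value")
    return {name: bool((edx >> bit) & 1) for name, bit in AMX_FEATURE_BITS.items()}
```

A caller would typically check <code>AMX-TILE</code> first, since the tile configuration, load, and store instructions belong to that sub-extension, and then check <code>AMX-INT8</code> or <code>AMX-BF16</code> for the TMUL operations it intends to use.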
| |
− | | |
| | | |
| == Microarchitecture support == | | == Microarchitecture support == |
− | [[File:intel server roadmap (2020) with amx.png|thumb|right|AMX was first planned for {{intel|Sapphire Rapids|l=arch}}.]]
| |
| {| class="wikitable" | | {| class="wikitable" |
| |- | | |- |
− | ! Microarchitecture !! AMX-TILE !! AMX-INT8 !! AMX-BF16 | + | ! Instructions !! Introduction |
| |- | | |- |
− | | {{intel|Sapphire Rapids|l=arch}} || {{tchk|yes}} || {{tchk|yes}} || {{tchk|yes}} | + | | AMX || {{intel|Sapphire Rapids|l=arch}} (server) |
| |} | | |} |
| | | |