{{x86 title|Advanced Matrix Extension (AMX)}}{{x86 isa main}}
'''Advanced Matrix Extension''' ('''AMX''') is an [[x86]] {{x86|extension}} that introduces a matrix register file and new instructions for operating on matrices.

== Overview ==
[[File:amx architecture.svg|thumb|right|AMX Architecture]]
The Advanced Matrix Extension (AMX) is an [[x86]] extension that introduces a new programming framework for working with matrices (rank-2 tensors). It adds two new components: a 2-dimensional [[register file]] whose registers are called 'tiles', and a set of [[accelerators]] that operate on those tiles. Each tile represents a sub-array of a larger 2-dimensional memory image. AMX instructions are synchronous in the [[instruction stream]], and tile loads and stores are coherent with the host's memory accesses. AMX instructions may be freely interleaved with traditional x86 code and can execute in parallel with other extensions (e.g., [[AVX512]]); tile loads, tile stores and accelerator commands are sent to the accelerator for execution.

=== Palettes ===
The kinds of operations available on specific hardware are determined by enumerating its palettes.

Currently, two palettes exist:

* Palette 0: the initialized (INIT) state
* Palette 1: an 8-tile register file in which each tile is 16 rows × 64 bytes (1 KiB), for a total register file of 8 KiB
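The palette 1 figures can be cross-checked against what the processor enumerates through CPUID leaf 1DH (sub-leaf = palette number). The sketch below decodes those registers; the bit positions reflect our reading of Intel's programming reference and should be verified, and the names <code>amx_palette_info</code> and <code>amx_decode_palette</code> are hypothetical.

```c
#include <stdint.h>

/* Palette parameters from CPUID leaf 1DH, sub-leaf = palette id.
   Field positions are our reading of Intel's reference (assumption). */
typedef struct {
    uint16_t total_tile_bytes; /* EAX[15:0]  */
    uint16_t bytes_per_tile;   /* EAX[31:16] */
    uint16_t bytes_per_row;    /* EBX[15:0]  */
    uint16_t max_names;        /* EBX[31:16]: number of tile registers */
    uint16_t max_rows;         /* ECX[15:0]  */
} amx_palette_info;

/* Split the raw register values into named fields. */
amx_palette_info amx_decode_palette(uint32_t eax, uint32_t ebx, uint32_t ecx) {
    amx_palette_info p;
    p.total_tile_bytes = (uint16_t)(eax & 0xFFFFu);
    p.bytes_per_tile   = (uint16_t)(eax >> 16);
    p.bytes_per_row    = (uint16_t)(ebx & 0xFFFFu);
    p.max_names        = (uint16_t)(ebx >> 16);
    p.max_rows         = (uint16_t)(ecx & 0xFFFFu);
    return p;
}
```

With register values consistent with palette 1 as described above (EAX=04002000H, EBX=00080040H, ECX=10H), the decoded fields give 8 tiles × 1,024 bytes = 8 KiB, with each tile being 16 rows × 64 bytes.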

A programmer can reduce the effective size of the register file by configuring tiles with smaller dimensions to suit their algorithm. Each tile's shape is given by its number of rows and its bytes_per_row, which are stored as metadata for the accelerator to operate on. The selected palette and the tile shapes are held in a tile control register (TILECFG), while the parameters of each palette are enumerated through the palette_table in CPUID leaf 1DH. TILECFG is programmed using the <code>LDTILECFG</code> instruction.
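A minimal sketch of the 64-byte TILECFG memory image consumed by <code>LDTILECFG</code> and produced by <code>STTILECFG</code>. The layout (palette id, start row, 14 reserved bytes, then per-tile bytes_per_row and rows arrays) follows our reading of Intel's reference; the type and helper names are hypothetical, and the limits checked are those of palette 1.

```c
#include <stdint.h>
#include <string.h>

/* Layout of the 64-byte TILECFG memory operand (our reading of Intel's reference). */
typedef struct {
    uint8_t  palette_id;   /* byte 0: 0 = INIT state, 1 = palette 1 */
    uint8_t  start_row;    /* byte 1: restart point for interrupted tile loads/stores */
    uint8_t  reserved[14]; /* bytes 2-15: must be zero */
    uint16_t colsb[16];    /* bytes 16-47: bytes_per_row of tiles 0..15 */
    uint8_t  rows[16];     /* bytes 48-63: rows of tiles 0..15 */
} amx_tilecfg;

_Static_assert(sizeof(amx_tilecfg) == 64, "TILECFG image must be 64 bytes");

/* Palette 1 limits: 8 tiles, 16 rows, 64 bytes per row. */
enum { PAL1_TILES = 8, PAL1_MAX_ROWS = 16, PAL1_MAX_COLSB = 64 };

/* Clear everything so reserved bytes and unused tiles are zero. */
void amx_cfg_init(amx_tilecfg *cfg) {
    memset(cfg, 0, sizeof *cfg);
}

/* Select palette 1 and give tile t a rows x bytes_per_row shape.
   Returns 0 on success, -1 if the shape exceeds palette 1 limits. */
int amx_cfg_set_tile(amx_tilecfg *cfg, unsigned t, unsigned rows, unsigned bytes_per_row) {
    if (t >= PAL1_TILES || rows == 0 || rows > PAL1_MAX_ROWS ||
        bytes_per_row == 0 || bytes_per_row > PAL1_MAX_COLSB)
        return -1;
    cfg->palette_id = 1;
    cfg->rows[t] = (uint8_t)rows;
    cfg->colsb[t] = (uint16_t)bytes_per_row;
    return 0;
}
```

The sketch clears the whole structure before setting any tile so that reserved bytes and unconfigured tiles are zero, which we understand <code>LDTILECFG</code> requires.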

=== Accelerators ===
AMX supports a set of accelerators that operate on tiles. Currently, only one accelerator is defined.

==== Tile matrix multiply unit (TMUL) ====
The '''Tile Matrix Multiply''' ('''TMUL''') unit is an AMX accelerator consisting of a grid of fused multiply-add units that operate on tiles. Its presence is indicated by the ''AMX-INT8'' and ''AMX-BF16'' sub-extensions. TMUL instructions compute Tile<sub>C</sub>[M][N] += Tile<sub>A</sub>[M][K] * Tile<sub>B</sub>[K][N].
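The TMUL computation is an ordinary matrix multiply-accumulate. The scalar reference below spells it out for row-major 32-bit integer arrays; <code>tmul_reference</code> is a hypothetical model of the arithmetic only and does not reflect how the hardware packs several narrow elements into each 32-bit tile element.

```c
#include <stddef.h>
#include <stdint.h>

/* C[M][N] += A[M][K] * B[K][N], all row-major.
   Models the math only; overflow behavior is not modeled. */
void tmul_reference(size_t M, size_t N, size_t K, int32_t *C,
                    const int32_t *A, const int32_t *B) {
    for (size_t m = 0; m < M; ++m)
        for (size_t n = 0; n < N; ++n) {
            int32_t acc = C[m * N + n];          /* start from the accumulator */
            for (size_t k = 0; k < K; ++k)
                acc += A[m * K + k] * B[k * N + n];
            C[m * N + n] = acc;
        }
}
```

For example, multiplying the 2 × 3 matrix {1, 2, 3; 4, 5, 6} by the 3 × 2 matrix {7, 8; 9, 10; 11, 12} into an accumulator of ones gives {59, 65; 140, 155}.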

The TMUL unit supports several parameters, including the maximum height (<code>tmul_maxk</code>) and the maximum SIMD dimension (<code>tmul_maxn</code>). These parameters are read dynamically by the TMUL unit upon execution.
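Software can discover these parameters through CPUID leaf 1EH. The sketch below decodes the EBX register of sub-leaf 0, assuming (per our reading of Intel's reference) that bits 7:0 hold <code>tmul_maxk</code> and bits 23:8 hold <code>tmul_maxn</code>; the type and function names are hypothetical and the sample value is illustrative, not a reported measurement.

```c
#include <stdint.h>

/* TMUL parameters from CPUID leaf 1EH, sub-leaf 0, EBX (assumed layout). */
typedef struct {
    uint8_t  tmul_maxk; /* EBX[7:0]:  maximum height (rows) */
    uint16_t tmul_maxn; /* EBX[23:8]: maximum SIMD dimension (column bytes) */
} tmul_info;

/* Extract the two fields, ignoring the reserved bits 31:24. */
tmul_info tmul_decode(uint32_t ebx) {
    tmul_info t;
    t.tmul_maxk = (uint8_t)(ebx & 0xFFu);
    t.tmul_maxn = (uint16_t)((ebx >> 8) & 0xFFFFu);
    return t;
}
```

For instance, <code>tmul_decode(0x4010)</code> yields <code>tmul_maxk</code> = 16 and <code>tmul_maxn</code> = 64.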

== Instructions ==
[[File:amx dot product of tiles.svg|thumb|right|2 x 3 Dot Product]]
AMX introduces 12 new instructions:

Configuration:
* <code>LDTILECFG</code> - Load tile configuration; loads the tile configuration from the specified 64-byte memory location.
* <code>STTILECFG</code> - Store tile configuration; stores the tile configuration to the specified 64-byte memory location.

Data:
* <code>TILELOADD</code>/<code>TILELOADDT1</code> - Load tile; <code>TILELOADDT1</code> additionally carries a hint to optimize data caching
* <code>TILESTORED</code> - Store tile
* <code>TILERELEASE</code> - Release tile; returns TILECFG and TILEDATA to the INIT state
* <code>TILEZERO</code> - Zero tile; zeroes the destination tile
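Because a tile is a sub-array of a larger 2-dimensional memory image, a tile load reads each row from a base address plus a row stride. The following is a hypothetical software model of <code>TILELOADD</code>, <code>TILESTORED</code> and <code>TILEZERO</code> on a palette-1-sized tile; the type and function names are illustrative, and zeroing the bytes outside the configured shape reflects our understanding of the architectural behavior rather than a verified specification.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TILE_MAX_ROWS = 16, TILE_MAX_COLSB = 64 }; /* palette 1 */

/* Hypothetical model of one tile register with its configured shape. */
typedef struct {
    uint8_t rows, colsb; /* shape from TILECFG */
    uint8_t data[TILE_MAX_ROWS][TILE_MAX_COLSB];
} tile_model;

static int shape_ok(const tile_model *t) {
    return t->rows <= TILE_MAX_ROWS && t->colsb <= TILE_MAX_COLSB;
}

/* TILELOADD model: tile row r <- colsb bytes at base + r * stride;
   everything outside the configured rows x colsb is zeroed. */
int tile_load_model(tile_model *t, const void *base, size_t stride) {
    if (!shape_ok(t)) return -1;
    const uint8_t *p = base;
    memset(t->data, 0, sizeof t->data);
    for (size_t r = 0; r < t->rows; ++r)
        memcpy(t->data[r], p + r * stride, t->colsb);
    return 0;
}

/* TILESTORED model: writes only the configured rows x colsb bytes. */
int tile_store_model(const tile_model *t, void *base, size_t stride) {
    if (!shape_ok(t)) return -1;
    uint8_t *p = base;
    for (size_t r = 0; r < t->rows; ++r)
        memcpy(p + r * stride, t->data[r], t->colsb);
    return 0;
}

/* TILEZERO model. */
void tile_zero_model(tile_model *t) {
    memset(t->data, 0, sizeof t->data);
}
```

Loading a 2-row × 3-byte tile from (row 1, column 2) of a 4 × 8-byte image with a stride of 8 picks up image bytes 10-12 and 18-20; storing it back writes only those six bytes.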

Operation:
* <code>TDPBF16PS</code> - Perform a dot-product of [[BF16]] tiles and accumulate the result into packed single-precision elements.
* <code>TDPB[XX]D</code> - Perform a dot-product of [[Int8]] tiles and accumulate the result into doubleword elements.
** where ''XX'' selects the signedness of the two source tiles: ''SS'' = signed/signed, ''SU'' = signed/unsigned, ''US'' = unsigned/signed, and ''UU'' = unsigned/unsigned.
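The integer dot-product instructions do not multiply single bytes element by element: as we understand it, each 32-bit element of A holds four consecutive bytes along K, and B must be supplied in a packed layout in which each 32-bit element holds four bytes from consecutive rows of the logical B matrix. The scalar sketch below models <code>TDPB[XX]D</code> under that assumption, with two flags selecting the SS/SU/US/UU signedness; it is a hypothetical reference rather than Intel's pseudocode, and the helper <code>vnni_pack</code> is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Widen one source byte according to the signedness selected by XX. */
static int32_t widen(uint8_t v, int is_signed) {
    return is_signed ? (int32_t)(int8_t)v : (int32_t)v;
}

/* Pack a row-major logical B of (4*K) rows x N byte columns into K rows of
   4*N bytes: dword n of packed row k holds logical rows 4k..4k+3 of column n. */
void vnni_pack(size_t K, size_t N, const uint8_t *logical, uint8_t *packed) {
    for (size_t k = 0; k < K; ++k)
        for (size_t n = 0; n < N; ++n)
            for (size_t i = 0; i < 4; ++i)
                packed[k * 4 * N + 4 * n + i] = logical[(4 * k + i) * N + n];
}

/* Model of TDPB[XX]D: C (M x N int32) += A (M rows x 4K bytes) . B (packed,
   K rows x 4N bytes). a_signed/b_signed select SS, SU, US or UU. The sum
   wraps modulo 2^32 (our assumption: these instructions do not saturate). */
void tdpbxxd_model(size_t M, size_t N, size_t K, int32_t *C,
                   const uint8_t *A, const uint8_t *B, int a_signed, int b_signed) {
    for (size_t m = 0; m < M; ++m)
        for (size_t k = 0; k < K; ++k)
            for (size_t n = 0; n < N; ++n) {
                int32_t dot = 0; /* |dot| <= 4 * 255 * 255, cannot overflow */
                for (size_t i = 0; i < 4; ++i)
                    dot += widen(A[m * 4 * K + 4 * k + i], a_signed) *
                           widen(B[k * 4 * N + 4 * n + i], b_signed);
                C[m * N + n] = (int32_t)((uint32_t)C[m * N + n] + (uint32_t)dot);
            }
}
```

With A = {1, 2, 3, 4} and a logical B whose first column is all 1s and whose second column is all 0xFF bytes, the SS variant produces {10, -10}, while the SU variant, which treats B as unsigned, produces {10, 2550}.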

=== Feature set ===
Not all hardware implementations support all operations. The AMX extension comprises three sub-extensions: '''AMX-TILE''', '''AMX-INT8''', and '''AMX-BF16'''.

{| class="wikitable"
|-
! rowspan="3" | Instruction !! colspan="3" | Feature Set
|-
! Base !! colspan="2" | [[#Tile matrix multiply unit (TMUL)|TMUL]]
|-
! AMX-TILE !! AMX-INT8 !! AMX-BF16
|-
| <code>LDTILECFG</code> || {{tchk|yes}} || ||
|-
| <code>STTILECFG</code> || {{tchk|yes}} || ||
|-
| <code>TILELOADD</code><br><code>TILELOADDT1</code> || {{tchk|yes}} || ||
|-
| <code>TILESTORED</code> || {{tchk|yes}} || ||
|-
| <code>TILERELEASE</code> || {{tchk|yes}} || ||
|-
| <code>TILEZERO</code> || {{tchk|yes}} || ||
|-
| <code>TDPBSSD</code><br>
<code>TDPBSUD</code><br>
<code>TDPBUSD</code><br>
<code>TDPBUUD</code>
| || {{tchk|yes}} ||
|-
| <code>TDPBF16PS</code> || || || {{tchk|yes}}
|}

== Detection ==
{| class="wikitable"
! colspan="2" | {{x86|CPUID}} !! rowspan="2" | Instruction Set
|-
! Input !! Output
|-
| rowspan="3" | EAX=07H, ECX=0 || EDX[bit 22] || AMX-BF16
|-
| EDX[bit 24] || AMX-TILE
|-
| EDX[bit 25] || AMX-INT8
|}
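The detection table translates directly into a feature test. The sketch below decodes EDX from CPUID leaf 07H, sub-leaf 0, and on GCC or Clang targeting x86 reads it with <code>__get_cpuid_count</code> from <code>cpuid.h</code>; the type and function names are hypothetical. CPUID only reports what the processor implements; as we understand it, the operating system must also enable the tile state (XCR0 bits 17 and 18), and Linux additionally requires a per-process permission request, neither of which this sketch checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* EDX bit positions for CPUID.(EAX=07H, ECX=0), as listed in the table. */
enum { AMX_BF16_BIT = 22, AMX_TILE_BIT = 24, AMX_INT8_BIT = 25 };

typedef struct {
    bool tile, int8, bf16;
} amx_features;

/* Decode the AMX sub-extension bits from a raw EDX value. */
amx_features amx_features_from_edx(uint32_t edx) {
    amx_features f;
    f.bf16 = (edx >> AMX_BF16_BIT) & 1u;
    f.tile = (edx >> AMX_TILE_BIT) & 1u;
    f.int8 = (edx >> AMX_INT8_BIT) & 1u;
    return f;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
/* Query the running CPU; reports no AMX if leaf 07H is unavailable.
   Does not check OS enablement of the tile state. */
amx_features amx_detect(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        amx_features none = {false, false, false};
        return none;
    }
    return amx_features_from_edx(edx);
}
#endif
```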

== Microarchitecture support ==
[[File:intel server roadmap (2020) with amx.png|thumb|right|AMX was first planned for {{intel|Sapphire Rapids|l=arch}}.]]
{| class="wikitable"
|-
! Microarchitecture !! AMX-TILE !! AMX-INT8 !! AMX-BF16
|-
| {{intel|Sapphire Rapids|l=arch}} || {{tchk|yes}} || {{tchk|yes}} || {{tchk|yes}}
|}
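
== Intrinsic functions ==
Each AMX instruction has a corresponding compiler intrinsic. Tile operands (<code>__tile</code>) are tile register numbers and must be compile-time constants.
<source lang=c>
void _tile_loadconfig(const void *mem_addr);                        // LDTILECFG
void _tile_storeconfig(void *mem_addr);                             // STTILECFG
void _tile_loadd(__tile dst, const void *base, int stride);         // TILELOADD
void _tile_stream_loadd(__tile dst, const void *base, int stride);  // TILELOADDT1
void _tile_stored(__tile src, void *base, int stride);              // TILESTORED
void _tile_release(void);                                           // TILERELEASE
void _tile_zero(__tile dst);                                        // TILEZERO
void _tile_dpbssd(__tile dst, __tile src0, __tile src1);            // TDPBSSD
void _tile_dpbsud(__tile dst, __tile src0, __tile src1);            // TDPBSUD
void _tile_dpbusd(__tile dst, __tile src0, __tile src1);            // TDPBUSD
void _tile_dpbuud(__tile dst, __tile src0, __tile src1);            // TDPBUUD
void _tile_dpbf16ps(__tile dst, __tile src0, __tile src1);          // TDPBF16PS
</source>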
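A minimal end-to-end sketch of one INT8 tile multiply (TDPBSSD) using the intrinsics, assuming GCC or Clang with <code>-mamx-tile -mamx-int8</code> on Linux, where a process must first request permission to use tile data via <code>arch_prctl(ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA)</code>. The function name <code>amx_dpbssd_16x16</code>, the <code>USE_AMX</code> switch and the scalar fallback are hypothetical; without AMX compiler support the same function falls back to a scalar loop with the same semantics, so it can be built and checked on any machine. B is assumed to be already in the packed four-bytes-per-dword layout.

```c
#if defined(__AMX_TILE__) && defined(__AMX_INT8__) && defined(__linux__)
#define USE_AMX 1
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes syscall() in <unistd.h> */
#endif
#include <immintrin.h>
#include <sys/syscall.h>
#include <unistd.h>
#define ARCH_REQ_XCOMP_PERM 0x1023 /* Linux: request a dynamic xstate feature */
#define XFEATURE_XTILEDATA 18      /* xstate component number of tile data */
#else
#define USE_AMX 0
#endif

#include <stdint.h>
#include <string.h>

/* C (16 x 16 int32, row-major) += A * B with TDPBSSD semantics.
   A: 16 rows x 64 signed bytes. B: 16 rows x 64 signed bytes, already packed
   (4 bytes of consecutive logical rows per dword). Returns 0 on success. */
int amx_dpbssd_16x16(int32_t *C, const int8_t *A, const int8_t *B) {
#if USE_AMX
    if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA) != 0)
        return -1; /* kernel refused tile-data permission */
    struct {
        uint8_t palette_id, start_row, reserved[14];
        uint16_t colsb[16];
        uint8_t rows[16];
    } cfg;
    memset(&cfg, 0, sizeof cfg);
    cfg.palette_id = 1;
    for (int t = 0; t < 3; ++t) { /* tmm0..tmm2: 16 rows x 64 bytes each */
        cfg.rows[t] = 16;
        cfg.colsb[t] = 64;
    }
    _tile_loadconfig(&cfg);
    _tile_loadd(0, C, 64);  /* tmm0 = accumulator (16 x 16 dwords) */
    _tile_loadd(1, A, 64);  /* tmm1 = A */
    _tile_loadd(2, B, 64);  /* tmm2 = packed B */
    _tile_dpbssd(0, 1, 2);  /* tmm0 += tmm1 . tmm2 */
    _tile_stored(0, C, 64);
    _tile_release();        /* return tile state to INIT */
    return 0;
#else
    /* Scalar fallback with the same signed x signed semantics. */
    for (int m = 0; m < 16; ++m)
        for (int k = 0; k < 16; ++k)
            for (int n = 0; n < 16; ++n)
                for (int i = 0; i < 4; ++i)
                    C[m * 16 + n] += (int32_t)A[m * 64 + 4 * k + i] *
                                     (int32_t)B[k * 64 + 4 * n + i];
    return 0;
#endif
}
```

With A filled with 1s except a single -1 in its first byte, B filled with 2s and C zeroed, every element of the first result row is 124 and every other element is 128.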

== Bibliography ==
* ''Intel Architecture Instruction Set Extensions and Future Features Programming Reference'', Revision 40. (Ref #319433-040)

[[Category:x86_extensions]]