AVX-512 Integer Fused Multiply-Add (AVX512_IFMA) is an x86 extension and part of the AVX-512 SIMD instruction set.
Overview
- VPMADD52HUQ, VPMADD52LUQ: Parallel multiply-accumulate. The instructions multiply the 52 least significant bits in corresponding unsigned quadwords of the source operands, then add the lower (L) or upper (H) half of the 104-bit product, zero-extended from 52 to 64 bits, to the corresponding quadword of the destination vector.
The destination and first source operands are vector registers; the second source operand can be a vector register, a vector in memory, or a vector obtained by broadcasting one quadword from memory to all elements of the vector. Both instructions can operate on 128-, 256-, or 512-bit wide vectors. If the vector size is less than 512 bits, the instructions zero the unused upper bits of the destination register to avoid a dependency on earlier instructions that write those bits. As usual for AVX-512 instructions, write masking is supported: individual elements of the destination vector can be written unconditionally, left unchanged, or zeroed when the corresponding bit in a mask register, supplied as an additional source operand, is zero. The masking mode is encoded in the instruction's EVEX prefix.
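As an illustration of these semantics, the following scalar reference model (a sketch, not the Intel manual's pseudocode; it assumes a compiler with the unsigned __int128 extension, such as GCC or Clang) shows what each 64-bit lane computes. The helper names madd52lo_lane and madd52hi_lane are hypothetical and used only for this example.

#include <stdint.h>

// One 64-bit lane of VPMADD52LUQ: dst += low 52 bits of the 104-bit product
// of the 52 least significant bits of a and b.
uint64_t madd52lo_lane(uint64_t dst, uint64_t a, uint64_t b)
{
    const uint64_t mask52 = (1ULL << 52) - 1;
    unsigned __int128 prod = (unsigned __int128)(a & mask52) * (b & mask52);
    return dst + (uint64_t)(prod & mask52);   // lower half, zero-extended to 64 bits
}

// One 64-bit lane of VPMADD52HUQ: dst += upper 52 bits of the same 104-bit product.
uint64_t madd52hi_lane(uint64_t dst, uint64_t a, uint64_t b)
{
    const uint64_t mask52 = (1ULL << 52) - 1;
    unsigned __int128 prod = (unsigned __int128)(a & mask52) * (b & mask52);
    return dst + (uint64_t)(prod >> 52);      // upper half of the 104-bit product
}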
Detection
Support for these instructions is indicated by the AVX512_IFMA feature flag. 128- and 256-bit vectors are supported if the AVX512VL flag is set as well.
The separate AVX-IFMA extension adds VEX-encoded (AVX) versions of these instructions that operate on 128- and 256-bit vectors.
| CPUID input | CPUID output | Instruction set |
|---|---|---|
| EAX=07H, ECX=0 | EBX[bit 21] | AVX512_IFMA |
| EAX=07H, ECX=0 | EBX[bit 31] | AVX512VL |
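The flags can be read with the CPUID instruction. A minimal detection sketch in C, assuming GCC or Clang (whose <cpuid.h> header provides __get_cpuid_count), checks the bits listed in the table above:

#include <cpuid.h>
#include <stdbool.h>

// Query CPUID leaf 07H, sub-leaf 0, and test the feature bits from the table above.
static bool has_avx512_ifma(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;                  // leaf 07H not available
    bool ifma = (ebx >> 21) & 1;       // AVX512_IFMA
    bool vl   = (ebx >> 31) & 1;       // AVX512VL, needed for the 128/256-bit forms
    return ifma && vl;
}

A production check would additionally verify that the operating system has enabled the AVX-512 register state (OSXSAVE and XGETBV); this sketch only reads the CPUID feature flags.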
Microarchitecture support
| Designer | Microarchitecture | Year | F | CD | ER | PF | BW | DQ | VL | FP16 | IFMA | VBMI | VBMI2 | BITALG | VPOPCNTDQ | VP2INTERSECT | 4VNNIW | 4FMAPS | VNNI | BF16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Intel | Knights Landing | 2016 | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
| Intel | Knights Mill | 2017 | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✔ | ✘ | ✔ | ✔ | ✘ | ✘ |
| Intel | Skylake (server) | 2017 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
| Intel | Cannon Lake | 2018 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
| Intel | Cascade Lake | 2019 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✔ | ✘ |
| Intel | Cooper Lake | 2020 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✔ | ✔ |
| Intel | Tiger Lake | 2020 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✔ | ✘ |
| Intel | Rocket Lake | 2021 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✔ | ✘ |
| Intel | Alder Lake | 2021 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ |
| Intel | Ice Lake (server) | 2021 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✔ | ✘ |
| Intel | Sapphire Rapids | 2023 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✔ | ✔ |
| AMD | Zen 4 | 2022 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✔ | ✔ |
| Centaur | CHA | | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
Intrinsic functions
// VPMADD52HUQ
__m128i _mm_madd52hi_epu64( __m128i a, __m128i b, __m128i c);
__m128i _mm_mask_madd52hi_epu64(__m128i s, __mmask8 k, __m128i a, __m128i b, __m128i c);
__m128i _mm_maskz_madd52hi_epu64( __mmask8 k, __m128i a, __m128i b, __m128i c);
__m256i _mm256_madd52hi_epu64( __m256i a, __m256i b, __m256i c);
__m256i _mm256_mask_madd52hi_epu64(__m256i s, __mmask8 k, __m256i a, __m256i b, __m256i c);
__m256i _mm256_maskz_madd52hi_epu64( __mmask8 k, __m256i a, __m256i b, __m256i c);
__m512i _mm512_madd52hi_epu64( __m512i a, __m512i b, __m512i c);
__m512i _mm512_mask_madd52hi_epu64(__m512i s, __mmask8 k, __m512i a, __m512i b, __m512i c);
__m512i _mm512_maskz_madd52hi_epu64( __mmask8 k, __m512i a, __m512i b, __m512i c);
// VPMADD52LUQ
__m128i _mm_madd52lo_epu64( __m128i a, __m128i b, __m128i c);
__m128i _mm_mask_madd52lo_epu64(__m128i s, __mmask8 k, __m128i a, __m128i b, __m128i c);
__m128i _mm_maskz_madd52lo_epu64( __mmask8 k, __m128i a, __m128i b, __m128i c);
__m256i _mm256_madd52lo_epu64( __m256i a, __m256i b, __m256i c);
__m256i _mm256_mask_madd52lo_epu64(__m256i s, __mmask8 k, __m256i a, __m256i b, __m256i c);
__m256i _mm256_maskz_madd52lo_epu64( __mmask8 k, __m256i a, __m256i b, __m256i c);
__m512i _mm512_madd52lo_epu64( __m512i a, __m512i b, __m512i c);
__m512i _mm512_mask_madd52lo_epu64(__m512i s, __mmask8 k, __m512i a, __m512i b, __m512i c);
__m512i _mm512_maskz_madd52lo_epu64( __mmask8 k, __m512i a, __m512i b, __m512i c);
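A minimal usage sketch of the 512-bit low-half intrinsic listed above, assuming a compiler that accepts an AVX512IFMA target option (for example -mavx512ifma with GCC or Clang) and a CPU that reports the AVX512_IFMA flag:

#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    // Eight unsigned multiplicands, each at most 52 bits wide, plus accumulators.
    uint64_t a[8]   = {3, 5, 7, 11, 13, 17, 19, 23};
    uint64_t b[8]   = {2, 4, 6, 8, 10, 12, 14, 16};
    uint64_t acc[8] = {100, 100, 100, 100, 100, 100, 100, 100};

    __m512i va = _mm512_loadu_si512(a);
    __m512i vb = _mm512_loadu_si512(b);
    __m512i vc = _mm512_loadu_si512(acc);

    // acc[i] += low 52 bits of a[i] * b[i]   (one VPMADD52LUQ instruction)
    vc = _mm512_madd52lo_epu64(vc, va, vb);

    _mm512_storeu_si512(acc, vc);
    for (int i = 0; i < 8; i++)
        printf("%llu\n", (unsigned long long)acc[i]);   // prints 106, 120, 142, ...
    return 0;
}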