Latest revision as of 14:09, 15 March 2023
AVX-512 Doubleword and Quadword Instructions (AVX512DQ) is an x86 extension, part of the AVX-512 SIMD instruction set, and complements the AVX-512 Foundation and AVX512BW (byte and word) extensions.
Overview
The AVX512DQ extension adds new and supplementary vector instructions operating on 32-bit doublewords and 64-bit quadwords, and some floating point instructions.
Integer instructions
New AVX-512 instructions
- VPMULLQ
  - Parallel multiplication of quadwords, providing the lower 64 bits of each 128-bit product. The lower half is the same whether the operands are interpreted as signed or unsigned.
- VPMOVD2M, VPMOVQ2M
  - These instructions set the bits in a mask register, copying the most significant bit of the corresponding doubleword or quadword of the source vector register.
- VPMOVM2D, VPMOVM2Q
  - These instructions set the bits in each doubleword or quadword of the destination vector to all ones or all zeros, copying the corresponding bit in a mask register.
- VEXTRACTI32X8, VEXTRACTI64X2
  - These instructions extract eight doublewords (a 256-bit lane) or a pair of quadwords (a 128-bit lane) from a vector register, selecting the lane with a constant index, and store the data in memory or in the lowest lane of a destination vector register.
- VINSERTI32X8, VINSERTI64X2
  - These instructions insert eight doublewords or a pair of quadwords into a 256-bit or 128-bit lane, respectively, of a vector register, selecting the lane with a constant index, and load the data from memory or from the lowest lane of a source vector register.
- VBROADCASTI32X2, VBROADCASTI32X8, VBROADCASTI64X2
  - These instructions broadcast a pair of doublewords, eight doublewords, or a pair of quadwords to all lanes of the same width of the destination vector. VBROADCASTI32X2 takes its data from memory or from the lowest 64 bits of a vector register; the 32X8 and 64X2 forms load from memory only.
The 32X8 instructions above support only 512-bit vectors; the 64X2 instructions support only 256- and 512-bit vectors.
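The element-wise behaviour of VPMULLQ and of the mask/vector move instructions can be illustrated with a small reference model. This is a sketch in Python, not an implementation of the hardware: vectors are modelled as lists of unsigned 64-bit integers, masks as plain integers, and the function names simply mirror the mnemonics for readability.

```python
MASK64 = (1 << 64) - 1  # all ones in a 64-bit quadword


def vpmullq(a, b):
    """Model: keep only the low 64 bits of each quadword product."""
    return [(x * y) & MASK64 for x, y in zip(a, b)]


def vpmovq2m(vec):
    """Model: mask bit i is the most significant bit of quadword i."""
    mask = 0
    for i, x in enumerate(vec):
        mask |= ((x >> 63) & 1) << i
    return mask


def vpmovm2q(mask, n):
    """Model: quadword i is all ones if mask bit i is set, else all zeros."""
    return [MASK64 if (mask >> i) & 1 else 0 for i in range(n)]
```

Because only the low half of the product is kept, feeding the two's complement encoding of a negative number (for example `-1 & MASK64`) gives the same bits as a signed multiplication would.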
Instructions promoted from SSE and AVX to AVX-512
- VPEXTRD, VPEXTRQ
  - These instructions extract a doubleword or quadword, using a constant index to select an element from the lowest 128-bit lane of a vector register, and store it in a general purpose register or in memory.
- VPINSRD, VPINSRQ
  - These instructions insert a doubleword or quadword, taken from the lowest bits of a general purpose register or from memory, into the lowest 128-bit lane of the destination vector register, using a constant index to select the element. Bits 128 ... 511 of the destination vector register are zeroed. Write masking is not supported.
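The doubleword forms can be modelled as follows. This is an illustrative Python sketch that represents a vector register as one large integer; the helper names are chosen here to match the mnemonics and are not part of any library.

```python
LANE128 = (1 << 128) - 1  # the lowest 128-bit lane
DWORD = 0xFFFFFFFF


def vpextrd(vec, index):
    """Model: return doubleword `index` (0..3) of the lowest 128-bit lane."""
    return (vec >> (32 * (index & 3))) & DWORD


def vpinsrd(vec, value, index):
    """Model: replace doubleword `index` in the lowest lane.

    Only the lowest 128 bits of `vec` survive, which mirrors the
    zeroing of bits 128 ... 511 described above.
    """
    shift = 32 * (index & 3)
    lane = vec & LANE128 & ~(DWORD << shift)
    return lane | ((value & DWORD) << shift)
```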
Floating point instructions
New AVX-512 instructions
- VRANGE(PS/PD), VRANGE(SD/SS)
  - These instructions perform a parallel minimum or maximum operation on single or double precision values, either on their original or absolute values. They optionally change the sign of all results to positive or negative, or copy the sign of the corresponding element in the first source operand. The operation is selected by an immediate byte which is part of the instruction encoding.
  - For instance, a saturation operation like min(max(-limit, value), +limit) can be expressed as the minimum of absolute values with sign copying.
- VFPCLASS(PS/PD), VFPCLASS(SS/SD)
  - These instructions test whether the single or double precision values in the source operand belong to certain classes, and set the bit corresponding to each element in the destination mask register to 1 (true) or 0 (false). The "packed" instructions (PS/PD) operate on all elements; the "scalar" instructions (SS/SD) operate only on the lowest element and set a single mask bit. Unused higher bits of the 64-bit mask register are cleared. The instructions support write masking, which means they optionally perform a bitwise 'and' on the destination using a second mask register. The class is selected by an immediate byte which is part of the instruction encoding and can be: QNaN, +0, -0, +∞, -∞, denormal, negative, SNaN, or any combination.
- VREDUCE(PS/PD), VREDUCE(SS/SD)
  - Parallel reduce transformation on single or double precision values. The operation is
  - dest = source - round(2^M * source) * 2^(-M)
  - with the desired rounding mode and M a constant in the range 0 ... 15. These instructions can be used to accelerate transcendental functions.
The "scalar" variants (SS/SD) of the VRANGE and VREDUCE instructions yield only a single result in the lowest element of the 128-bit destination vector; the higher elements are copied from the first source operand. These variants do not support 256- and 512-bit vectors, and bits 128 ... 511 of the destination vector register are zeroed.
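Two of the operations above lend themselves to a short worked example: the saturation idiom built from VRANGE (minimum of absolute values, sign copied from the first operand) and the VREDUCE formula dest = source - round(2^M * source) * 2^(-M). The Python sketch below models a single element only; the function names and the `rnd` parameter are illustrative assumptions, and NaN or infinite inputs are not handled.

```python
import math


def vrange_abs_min_copy_sign(value, limit):
    """Model: min(|value|, |limit|) with the sign of the first operand."""
    return math.copysign(min(abs(value), abs(limit)), value)


def vreduce(source, m, rnd=round):
    """Model: source - round(2**m * source) * 2**(-m).

    `rnd` stands in for the selected rounding mode; Python's built-in
    round() rounds to nearest, ties to even. math.floor and math.trunc
    model rounding down and toward zero.
    """
    scaled = math.ldexp(source, m)          # 2**m * source
    rounded = float(rnd(scaled))            # round to an integer
    return source - math.ldexp(rounded, -m)  # subtract rounded * 2**(-m)
```

For a non-negative `limit`, `vrange_abs_min_copy_sign(value, limit)` gives the same result as `min(max(-limit, value), limit)`. With `m = 0` and rounding down, `vreduce` returns the fractional part of a positive input.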
- VCVT(PS/PD)2(QQ/UQQ)
- VCVTT(PS/PD)2(QQ/UQQ)
- VCVT(QQ/UQQ)2(PS/PD)
  - Parallel conversion, with the desired rounding, of signed (QQ) or unsigned (UQQ) quadwords to single precision (PS) or double precision (PD) values, or vice versa. The VCVTT variants always round with truncation, i.e. toward zero.
- VEXTRACTF32X8, VEXTRACTF64X2
- VINSERTF32X8, VINSERTF64X2
- VBROADCASTF32X2, VBROADCASTF32X8, VBROADCASTF64X2
  - These instructions perform the same operation as their integer counterparts.
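The classification performed by the VFPCLASS instructions described earlier in this section can be sketched for double precision values as follows. The model works on raw 64-bit patterns so that signaling NaNs are not altered on the way in. The bit positions of the class constants follow the order in which the classes are listed above (QNaN in bit 0 through SNaN in bit 7), which is an assumption of this sketch, as are all function names.

```python
import struct

# Class selector bits, assumed to follow the order listed in the text.
QNAN, POS_ZERO, NEG_ZERO, POS_INF, NEG_INF, DENORMAL, NEGATIVE, SNAN = (
    1 << i for i in range(8)
)


def double_bits(x):
    """Return the IEEE 754 binary64 bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def classify(bits):
    """Return the set of class bits that apply to one binary64 pattern."""
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent == 0x7FF:
        if fraction == 0:
            return NEG_INF if sign else POS_INF
        return QNAN if fraction >> 51 else SNAN
    if exponent == 0 and fraction == 0:
        return NEG_ZERO if sign else POS_ZERO
    classes = DENORMAL if exponent == 0 else 0
    if sign:
        classes |= NEGATIVE  # finite, non-zero, negative
    return classes


def vfpclass(elements, imm8, write_mask=None):
    """Model: mask bit i is set if element i is in any selected class."""
    mask = 0
    for i, bits in enumerate(elements):
        if classify(bits) & imm8:
            mask |= 1 << i
    if write_mask is not None:
        mask &= write_mask  # optional write masking
    return mask
```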
Instructions promoted from SSE and AVX to AVX-512
- VAND(PS/PD), VANDN(PS/PD), VOR(PS/PD), VXOR(PS/PD)
  - Parallel bitwise logical operations on single or double precision values. The ANDN operation is (not source1) and source2.
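A common use of these bitwise operations is sign manipulation. As an illustration, the sketch below models ANDN on single precision bit patterns and uses it with a sign-bit constant as the first operand to compute absolute values. The helper names are hypothetical, and the model is plain Python rather than vector code.

```python
import struct

SIGN_BIT = 0x80000000  # sign bit of an IEEE 754 binary32 value


def f32_bits(x):
    """Return the binary32 bit pattern of a float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b):
    """Return the float encoded by a binary32 bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def vandnps(source1, source2):
    """Model: (not source1) and source2, element by element."""
    return [(~a & b) & 0xFFFFFFFF for a, b in zip(source1, source2)]


def absolute_values(values):
    """Clear the sign bit of every element using ANDN."""
    bits = [f32_bits(v) for v in values]
    return [bits_f32(b) for b in vandnps([SIGN_BIT] * len(bits), bits)]
```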
Mask register instructions
Most of the instructions above support write masking. That means they can write individual elements in the destination vector unconditionally or, where the corresponding bit in a mask register supplied as an additional source operand is zero, either leave them unchanged or zero them. The masking mode is encoded in the instruction.
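The two masking modes can be summarised with a small model. This is a sketch in Python with a hypothetical helper name: `result` holds the values the instruction computed, `dest` the previous contents of the destination, and `mask` the write mask.

```python
def apply_write_mask(result, dest, mask, zeroing=False):
    """Model: select per element between the new result and the fallback.

    Where the mask bit is 1 the new result is written. Where it is 0 the
    old destination element is kept (merging) or replaced by zero (zeroing).
    """
    out = []
    for i, (new, old) in enumerate(zip(result, dest)):
        if (mask >> i) & 1:
            out.append(new)
        elif zeroing:
            out.append(0)
        else:
            out.append(old)
    return out
```

With an all-ones mask both modes behave like the unmasked instruction.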
The AVX-512 Foundation defines instructions operating on 16-bit masks which are used e.g. with 512-bit vectors containing 16 single precision elements. AVX512DQ adds support for 8-bit operations. The AVX512BW extension completes this set with support for 32- and 64-bit masks.
- KADDB, KADDW
  - Add two masks. KADDW was not defined by AVX512F.
- KANDB, KANDNB, KNOTB, KORB, KXNORB, KXORB
  - Bitwise logical operations. ANDN is (not source1) and source2, XNOR is not (source1 xor source2).
- KTESTB, KTESTW
  - Performs the bitwise operations temp1 = source1 and source2, temp2 = (not source1) and source2, and sets ZF and CF (for branch instructions) to indicate whether temp1 and temp2, respectively, are all zeros.
- KORTESTB
  - Performs the bitwise operation temp = source1 or source2, and sets ZF if the result is all zeros, CF if it is all ones.
- KSHIFTLB, KSHIFTRB
  - Bitwise logical shift left/right by a constant.
- KMOVB
  - Copies a bit mask from a mask register to another mask register, a 32-bit GPR, or memory, or from a 32-bit GPR or memory to a mask register. The mask is zero extended if the destination register is wider.
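The 8-bit mask operations listed above are simple enough to model directly. The following Python sketch treats a mask as an integer and returns the flags of the test instructions as a `(ZF, CF)` tuple of booleans; the function names mirror the mnemonics purely for readability.

```python
BYTE = 0xFF  # width of the masks handled by the B-suffixed instructions


def kaddb(a, b):
    """Model: add two masks, keeping the low 8 bits."""
    return (a + b) & BYTE


def kandnb(a, b):
    """Model: (not source1) and source2."""
    return ~a & b & BYTE


def kxnorb(a, b):
    """Model: not (source1 xor source2)."""
    return ~(a ^ b) & BYTE


def kshiftlb(a, count):
    """Model: logical shift left; bits shifted past bit 7 are lost."""
    return (a << count) & BYTE


def ktestb(a, b):
    """Model: ZF from (a and b), CF from ((not a) and b)."""
    zf = (a & b & BYTE) == 0
    cf = (~a & b & BYTE) == 0
    return zf, cf


def kortestb(a, b):
    """Model: ZF if (a or b) is all zeros, CF if it is all ones."""
    temp = (a | b) & BYTE
    return temp == 0, temp == BYTE
```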
Detection
Support for these instructions is indicated by the AVX512DQ feature flag. Except as noted they operate on 512-bit vectors. Instruction variants operating on 128- and 256-bit vectors are supported if the AVX512VL flag is set as well.
CPUID Input | CPUID Output | Instruction Set |
---|---|---|
EAX=07H, ECX=0 | EBX[bit 17] | AVX512DQ |
EAX=07H, ECX=0 | EBX[bit 31] | AVX512VL |
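Decoding the two feature flags amounts to testing bits of the EBX value returned for CPUID leaf 07H, subleaf 0. Python cannot execute CPUID directly, so the sketch below assumes the EBX value has already been obtained by other means; the function name and the shape of the returned dictionary are illustrative only.

```python
AVX512DQ_BIT = 17  # CPUID.(EAX=07H, ECX=0):EBX bit 17
AVX512VL_BIT = 31  # CPUID.(EAX=07H, ECX=0):EBX bit 31


def avx512dq_support(ebx):
    """Interpret an EBX value from CPUID leaf 07H, subleaf 0.

    Returns which flags are set and which vector widths the AVX512DQ
    instructions can use: 512 bits with AVX512DQ alone, and additionally
    128 and 256 bits when AVX512VL is set as well.
    """
    dq = bool((ebx >> AVX512DQ_BIT) & 1)
    vl = bool((ebx >> AVX512VL_BIT) & 1)
    widths = []
    if dq:
        widths = [128, 256, 512] if vl else [512]
    return {"avx512dq": dq, "avx512vl": vl, "vector_widths": widths}
```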
Microarchitecture support
Support level by AVX-512 extension:
Designer | Microarchitecture | Year | F | CD | ER | PF | BW | DQ | VL | FP16 | IFMA | VBMI | VBMI2 | BITALG | VPOPCNTDQ | VP2INTERSECT | 4VNNIW | 4FMAPS | VNNI | BF16 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Intel | Knights Landing | 2016 | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
Intel | Knights Mill | 2017 | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✔ | ✘ | ✔ | ✔ | ✘ | ✘ |
Intel | Skylake (server) | 2017 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
Intel | Cannon Lake | 2018 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
Intel | Cascade Lake | 2019 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✔ | ✘ |
Intel | Cooper Lake | 2020 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✔ | ✔ |
Intel | Tiger Lake | 2020 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✔ | ✘ |
Intel | Rocket Lake | 2021 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✔ | ✘ |
Intel | Alder Lake | 2021 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ |
Intel | Ice Lake (server) | 2021 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✔ | ✘ |
Intel | Sapphire Rapids | 2023 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✔ | ✔ |
AMD | Zen 4 | 2022 | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✔ | ✔ |
Centaur | CHA | | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
Bibliography
- "Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z", Intel Order Nr. 325383, Rev. 078US, December 2022