AVX-512 Doubleword and Quadword Instructions (AVX512DQ) is an x86 extension that forms part of the AVX-512 SIMD instruction set. It complements the AVX-512 Foundation and AVX512BW (byte and word) extensions.
Overview
The AVX512DQ extension adds new and supplementary vector instructions operating on 32-bit doublewords and 64-bit quadwords, as well as some floating-point instructions.
Integer instructions
New AVX-512 instructions

VPMULLQ
 Parallel multiplication of quadwords, storing the lower 64 bits of each 128-bit product. The lower half is the same for signed and unsigned operands.

VPMOVD2M, VPMOVQ2M
 These instructions set the bits in a mask register by copying the most significant bit of the corresponding doubleword or quadword in the source vector register.

VPMOVM2D, VPMOVM2Q
 These instructions set each doubleword or quadword of the destination vector to all ones or all zeros, copying the corresponding bit of a mask register.

VEXTRACTI32X8, VEXTRACTI64X2
 These instructions extract eight doublewords (256 bits) or a pair of quadwords (128 bits) from a lane of a vector register selected by a constant index, and store the data in memory or in the lowest lane of a destination vector register.

VINSERTI32X8, VINSERTI64X2
 These instructions insert eight doublewords (256 bits) or a pair of quadwords (128 bits) into a lane of a vector register selected by a constant index, loading the data from memory or from the lowest lane of a source vector register.

VBROADCASTI32X2, VBROADCASTI32X8, VBROADCASTI64X2
 These instructions broadcast a pair of doublewords, eight doublewords, or a pair of quadwords from memory or the lowest lane of a vector register to all lanes of the same width of the destination vector.
The 32X8 instructions above support only 512-bit vectors; the 64X2 instructions support only 256- and 512-bit vectors.
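The element-wise behaviour of VPMULLQ and of the mask conversion instructions can be sketched with a scalar Python model. This is only an illustration of the semantics described above, not how the hardware is implemented; the function names are hypothetical, and vectors are assumed to be lists of unsigned 64-bit integers with element 0 being the lowest quadword.

```python
MASK64 = (1 << 64) - 1  # all ones in a 64-bit quadword

def vpmullq(a, b):
    """Model of VPMULLQ: keep the low 64 bits of each 64x64-bit product."""
    return [(x * y) & MASK64 for x, y in zip(a, b)]

def vpmovq2m(src):
    """Model of VPMOVQ2M: mask bit i is the most significant bit of quadword i."""
    mask = 0
    for i, q in enumerate(src):
        mask |= ((q >> 63) & 1) << i
    return mask

def vpmovm2q(mask, n):
    """Model of VPMOVM2Q: quadword i is all ones if mask bit i is set, else zero."""
    return [MASK64 if (mask >> i) & 1 else 0 for i in range(n)]

# Quadwords 0, 2 and 7 have their top bit set.
v = [0x8000000000000000, 1, MASK64, 0, 0, 0x7FFFFFFFFFFFFFFF, 0, 0xFFFFFFFF00000000]
m = vpmovq2m(v)
expanded = vpmovm2q(m, 8)
```

Converting a vector to a mask and back yields a vector in which every element is either all ones or all zeros, depending only on the sign bit of the original element.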
Instructions promoted from SSE and AVX to AVX-512

VPEXTRD, VPEXTRQ
 These instructions extract a doubleword or quadword, using a constant index to select an element from the lowest 128-bit lane of a vector register, and store it in a general-purpose register or in memory.

VPINSRD, VPINSRQ
 These instructions insert a doubleword or quadword, taken from the lowest bits of a general-purpose register or from memory, into the lowest 128-bit lane of the destination vector register, using a constant index to select the element. Bits 128 ... 511 of the destination vector register are zeroed. Write masking is not supported.
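The quadword variants can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions: a 512-bit register is represented as a list of eight quadwords, the function names are hypothetical, and only the lowest bit of the index is used because a 128-bit lane holds two quadwords.

```python
MASK64 = (1 << 64) - 1

def vpextrq(src, index):
    """Model of VPEXTRQ: return quadword 0 or 1 of the lowest 128-bit lane."""
    return src[index & 1]

def vpinsrq(src, value, index, total_quadwords=8):
    """Model of VPINSRQ: copy the low lane of src, replace one quadword,
    and zero bits 128 ... 511 of the result."""
    lane = list(src[:2])
    lane[index & 1] = value & MASK64
    return lane + [0] * (total_quadwords - 2)

reg = [1, 2, 3, 4, 5, 6, 7, 8]
inserted = vpinsrq(reg, 0xAA, 1)
```

Note that the insert model discards everything above the lowest lane, which mirrors the zeroing of the upper destination bits described above.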
Floating-point instructions
New AVX-512 instructions

VRANGE(PS/PD), VRANGE(SD/SS)
 These instructions perform a parallel minimum or maximum operation on single or double precision values, either on their original or absolute values. They optionally change the sign of all results to positive or negative, or copy the sign of the corresponding element in the first source operand. The operation is selected by an immediate byte operand.
 For instance, a saturation operation like min(max(-limit, value), +limit) can be expressed as a minimum of absolute values with sign copying.

VFPCLASS(PS/PD), VFPCLASS(SS/SD)
 These instructions test whether the single or double precision values in the source operand belong to certain classes and set the bit corresponding to each element in the destination mask register to 1 (true) or 0 (false). The "packed" instructions (PS/PD) operate on all elements; the "scalar" instructions (SS/SD) operate only on the lowest element and set a single mask bit. Unused higher bits of the 64-bit mask register are cleared. The instructions support write masking, which means they optionally perform a bitwise 'and' on the destination using a second mask register. The class is selected by an immediate byte operand and can be: QNaN, +0, -0, +∞, -∞, denormal, negative, SNaN, or any combination.

VREDUCE(PS/PD), VREDUCE(SS/SD)
 Parallel reduce transformation on single or double precision values. The operation is
 dest = source - round(2^{M} * source) * 2^{-M}
 with the desired rounding mode and M a constant in the range 0 ... 15. These instructions can be used to accelerate transcendental functions.
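The reduce transformation can be illustrated with a scalar Python model. This is a sketch only: the function name and the mode names are hypothetical, special values such as NaN and infinities are not handled, and the encoding of M and the rounding mode in the immediate byte is not modelled.

```python
import math

# Rounding functions standing in for the four rounding modes.
ROUNDERS = {
    "nearest": round,      # ties to even
    "down": math.floor,
    "up": math.ceil,
    "zero": math.trunc,
}

def vreduce(x, m, mode="nearest"):
    """dest = x - round(2^M * x) * 2^-M for one element."""
    scale = 2.0 ** m
    return x - ROUNDERS[mode](x * scale) / scale

# M = 0 with rounding down leaves the fractional part.
frac = vreduce(2.75, 0, "down")
# M = 2 removes everything down to a granularity of 1/4.
fine = vreduce(5.625, 2, "down")
```

The result is the part of the input that remains after removing a multiple of 2^-M, which is the kind of argument reduction step used before evaluating a polynomial approximation.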
The "scalar" variants (SS/SD) of the instructions above yield only a single result in the lowest element of the 128-bit destination vector. Higher elements are left unchanged. These variants do not support 256- and 512-bit vectors; bits 128 ... 511 of the destination vector register are zeroed.
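The saturation idiom mentioned for VRANGE, a minimum of absolute values with the sign copied from the first source operand, can be checked with a short Python model. The function names are hypothetical, the immediate byte encoding is not modelled, and NaN inputs are not handled; the sketch only shows that the two formulations agree for ordinary values.

```python
import math

def vrange_absmin_copysign(src1, src2):
    """Per element: min(|a|, |b|), with the sign taken from the first source."""
    return [math.copysign(min(abs(a), abs(b)), a) for a, b in zip(src1, src2)]

def saturate(values, limit):
    """Clamp each value to [-limit, +limit] using the VRANGE-style operation."""
    return vrange_absmin_copysign(values, [limit] * len(values))

vals = [-3.0, -0.5, 0.25, 7.0]
clamped = saturate(vals, 1.0)
reference = [min(max(-1.0, v), 1.0) for v in vals]
```

A single VRANGE operation thus replaces the separate minimum and maximum steps of the conventional clamp.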

VCVT(PS/PD)2(QQ/UQQ), VCVTT(PS/PD)2(QQ/UQQ), VCVT(QQ/UQQ)2(PS/PD)
 Parallel conversion with desired rounding of signed (QQ) or unsigned (UQQ) quadwords to single precision (PS) or double precision (PD) values, or vice versa. The VCVTT variants always round with truncation, i.e. toward zero.

VEXTRACTF32X8, VEXTRACTF64X2
VINSERTF32X8, VINSERTF64X2
VBROADCASTF32X2, VBROADCASTF32X8, VBROADCASTF64X2
 These instructions perform the same operation as their integer counterparts.
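The difference between the rounding and truncating conversions listed above can be shown with a scalar Python model. This is a sketch under assumptions: the function names are hypothetical, the rounding conversion uses round-to-nearest-even (one possible rounding mode), and out-of-range or NaN inputs are not modelled.

```python
import math

def cvt_pd2qq(values):
    """Double to signed quadword, rounding to nearest with ties to even."""
    return [round(v) for v in values]

def cvtt_pd2qq(values):
    """Double to signed quadword, always truncating toward zero."""
    return [math.trunc(v) for v in values]

def cvt_uqq2pd(values):
    """Unsigned quadword to double; values above 2^53 may need rounding."""
    return [float(v) for v in values]

inputs = [2.5, 3.5, -1.5, 1.9]
rounded = cvt_pd2qq(inputs)
truncated = cvtt_pd2qq(inputs)
big = cvt_uqq2pd([2**64 - 1])
```

The last conversion shows why a rounding mode also matters in the integer to floating-point direction: a 64-bit integer can carry more significant bits than a double precision value can represent.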
Instructions promoted from SSE and AVX to AVX-512

VAND(PS/PD), VANDN(PS/PD), VOR(PS/PD), VXOR(PS/PD)
 Parallel bitwise logical operations on single or double precision values. The ANDN operation is (not source1) and source2.
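Bitwise operations on floating-point values are typically used to manipulate the sign bit. The following Python sketch models the double precision ANDN and XOR operations on the raw bit patterns; the function names are hypothetical, and using a vector of -0.0 as a sign-bit mask is a common idiom assumed here rather than something stated in the text above.

```python
import struct

MASK64 = (1 << 64) - 1

def bits(x):
    """Raw 64-bit pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(b):
    """Double with the given raw 64-bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def vandnpd(src1, src2):
    """Per element: (not src1) and src2."""
    return [from_bits((~bits(a) & MASK64) & bits(b)) for a, b in zip(src1, src2)]

def vxorpd(src1, src2):
    """Per element: src1 xor src2."""
    return [from_bits(bits(a) ^ bits(b)) for a, b in zip(src1, src2)]

sign_mask = [-0.0] * 3            # only the sign bit is set
values = [-1.5, 2.0, -0.0]
absolute = vandnpd(sign_mask, values)   # clears the sign bit
negated = vxorpd(sign_mask, values)     # flips the sign bit
```

With the sign mask as the first source, ANDN yields the absolute value and XOR yields the negation of each element.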
Mask register instructions
Most of the instructions above support write masking. That means they can write individual elements in the destination vector unconditionally, or, if the corresponding bit in a mask register supplied as an additional source operand is zero, either leave them unchanged or zero them. The masking mode is selected by the instruction encoding.
The AVX-512 Foundation defines instructions operating on 16-bit masks, which are used e.g. with 512-bit vectors containing 16 single precision elements. AVX512DQ adds support for 8-bit operations. The AVX512BW extension completes this set with support for 32- and 64-bit masks.
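The two masking modes can be summarised with a small Python model. It is a sketch with hypothetical names: `result` holds the values the instruction would write without masking, `old_dest` the previous contents of the destination, and bit i of `mask` controls element i.

```python
def apply_write_mask(result, old_dest, mask, zeroing=False):
    """Combine new results with the old destination under a write mask."""
    out = []
    for i, (new, old) in enumerate(zip(result, old_dest)):
        if (mask >> i) & 1:
            out.append(new)                     # mask bit set: write the element
        else:
            out.append(0 if zeroing else old)   # clear it or keep the old value
    return out

new_values = [10, 20, 30, 40]
previous = [1, 2, 3, 4]
merged = apply_write_mask(new_values, previous, 0b0101)
zeroed = apply_write_mask(new_values, previous, 0b0101, zeroing=True)
```

An all-ones mask reproduces the unmasked behaviour, since every element is then written.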

KADDB, KADDW
 Add two masks. KADDW was not defined by AVX512F.

KANDB, KANDNB, KNOTB, KORB, KXNORB, KXORB
 Bitwise logical operations. ANDN is (not source1) and source2; XNOR is not (source1 xor source2).

KTESTB, KTESTW
 Performs the bitwise operations temp1 = source1 and source2 and temp2 = (not source1) and source2, then sets ZF and CF (for use by branch instructions) to indicate whether the respective result is all zeros.

KORTESTB
 Performs bitwise operation temp = source1 or source2, sets ZF to indicate if the result is all zeros, CF if all ones.

KSHIFTLB, KSHIFTRB
 Bitwise logical shift left/right by a constant.

KMOVB
 Copies a bit mask from a mask register to another mask register, a 32- or 64-bit GPR, or memory, or from a GPR or memory to a mask register. The mask is zero-extended if the destination register is wider.
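The flag results of the test instructions and the 8-bit wrap-around of the other byte-sized mask operations can be sketched in Python. The function names are hypothetical, and the test functions return the flags as a (ZF, CF) tuple purely for illustration.

```python
BYTE = 0xFF  # the byte variants operate on the low 8 mask bits

def ktestb(src1, src2):
    """Return (ZF, CF): ZF if src1 and src2 is zero, CF if (not src1) and src2 is zero."""
    a, b = src1 & BYTE, src2 & BYTE
    zf = int((a & b) == 0)
    cf = int((~a & b & BYTE) == 0)
    return zf, cf

def kortestb(src1, src2):
    """Return (ZF, CF): ZF if src1 or src2 is all zeros, CF if it is all ones."""
    t = (src1 | src2) & BYTE
    return int(t == 0), int(t == BYTE)

def kaddb(src1, src2):
    """8-bit addition of two masks; the carry out of bit 7 is dropped."""
    return (src1 + src2) & BYTE

def kxnorb(src1, src2):
    """not (src1 xor src2), limited to 8 bits."""
    return ~(src1 ^ src2) & BYTE
```

For example, `kortestb(0xF0, 0x0F)` reports CF because the two masks together cover all eight elements, which is a common way to check that a set of conditions holds for every element of an 8-element vector.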
Detection
Support for these instructions is indicated by the AVX512DQ feature flag. Except as noted, they operate on 512-bit vectors. Instruction variants operating on 128- and 256-bit vectors are supported if the AVX512VL flag is set as well.
CPUID input     CPUID output  Instruction set
EAX=07H, ECX=0  EBX[bit 17]   AVX512DQ
EAX=07H, ECX=0  EBX[bit 31]   AVX512VL
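Decoding the two feature bits from the table above is a matter of simple bit tests. The following Python sketch assumes the EBX value returned by CPUID with EAX=07H, ECX=0 has already been obtained by some other means, since Python cannot execute CPUID directly; the function name is hypothetical.

```python
AVX512DQ_BIT = 17  # EBX bit for AVX512DQ
AVX512VL_BIT = 31  # EBX bit for AVX512VL

def decode_leaf7_ebx(ebx):
    """Report the AVX512DQ and AVX512VL flags found in a leaf 7 EBX value."""
    return {
        "AVX512DQ": bool((ebx >> AVX512DQ_BIT) & 1),
        "AVX512VL": bool((ebx >> AVX512VL_BIT) & 1),
    }

def supports_dq_vector_width(ebx, width):
    """512-bit forms need AVX512DQ; 128- and 256-bit forms also need AVX512VL."""
    flags = decode_leaf7_ebx(ebx)
    if width == 512:
        return flags["AVX512DQ"]
    return flags["AVX512DQ"] and flags["AVX512VL"]
```

The second helper reflects the rule stated above: the narrower vector variants are available only when both flags are set.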
Microarchitecture support
Support level by AVX-512 subset:
Designer  Microarchitecture  Year  F  CD  ER  PF  BW  DQ  VL  FP16  IFMA  VBMI  VBMI2  BITALG  VPOPCNTDQ  VP2INTERSECT  4VNNIW  4FMAPS  VNNI  BF16
Intel     Knights Landing    2016  ✔  ✔   ✔   ✔   ✘   ✘   ✘   ✘     ✘     ✘     ✘      ✘       ✘          ✘             ✘       ✘       ✘     ✘
Intel     Knights Mill       2017  ✔  ✔   ✔   ✔   ✘   ✘   ✘   ✘     ✘     ✘     ✘      ✘       ✔          ✘             ✔       ✔       ✘     ✘
Intel     Skylake (server)   2017  ✔  ✔   ✘   ✘   ✔   ✔   ✔   ✘     ✘     ✘     ✘      ✘       ✘          ✘             ✘       ✘       ✘     ✘
Intel     Cannon Lake        2018  ✔  ✔   ✘   ✘   ✔   ✔   ✔   ✘     ✔     ✔     ✘      ✘       ✘          ✘             ✘       ✘       ✘     ✘
Intel     Cascade Lake       2019  ✔  ✔   ✘   ✘   ✔   ✔   ✔   ✘     ✘     ✘     ✘      ✘       ✘          ✘             ✘       ✘       ✔     ✘
Intel     Cooper Lake        2020  ✔  ✔   ✘   ✘   ✔   ✔   ✔   ✔     ✘     ✘     ✘      ✘       ✘          ✘             ✔       ✔
Intel     Tiger Lake         2020  ✔  ✔   ✘   ✘   ✔   ✔   ✔   ✘     ✔     ✔     ✔      ✔       ✔          ✔             ✘       ✘       ✔
Intel     Rocket Lake        2021  ✔  ✔   ✘   ✘   ✔   ✔   ✔   ✘     ✔     ✔     ✔      ✔       ✔          ✘             ✘       ✘       ✔
Intel     Alder Lake         2021  ✔  ✔   ✘   ✘   ✔   ✔   ✔   ✔     ✔     ✔     ✔      ✔       ✔          ✔             ✘       ✘       ✔     ✔
Intel     Ice Lake (server)  2021  ✔  ✔   ✘   ✘   ✔   ✔   ✔   ✘     ✔     ✔     ✔      ✔       ✔          ✘             ✘       ✘       ✔     ✘
Intel     Sapphire Rapids    2023  ✔  ✔
AMD       Zen 4              2022  ✔  ✔   ✘   ✘   ✔   ✔   ✔   ✘     ✔     ✔     ✔      ✔       ✔          ✘             ✘       ✘       ✔     ✔
Centaur   CHA                      ✔  ✔   ✘   ✘   ✔   ✔   ✔   ✘     ✔     ✔     ✘      ✘       ✘          ✘             ✘       ✘       ✘     ✘
Bibliography
 "Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z", Intel Order Nr. 325383, Rev. 078US, December 2022