Advanced Vector Extensions 512 (AVX-512) - x86

x86
Instruction Set Architecture

History
Families

AMD's CPUIDs
Intel's CPUIDs

Advanced Vector Extensions 512 (AVX-512) is collective name for a number of 512-bit SIMD x86 instruction set extensions. The extensions were formally introduced by Intel in July 2013 with first general-purpose microprocessors implementing the extensions introduced in July 2017.

Overview

AVX-512 is a set of 512-bit SIMD extensions that allow programs to pack sixteen single-precision eight double-precision floating-point numbers, or eight 64-bit or sixteen 32-bit integers within 512-bit vectors. The extension provides double the computation capabilities of that of AVX/AV2.

AVX512F - AVX-512 Foundation is base of the 512-bit SIMD instruction extensions which is a comprehensive list of features for most HPC and enterprise applications. AVX-512 Foundation is the natural extensions to AVX/AVX2 which is extended using the EVEX prefix which builds on the existing VEX prefix. Any processor that implements any portion of the AVX-512 extensions MUST implement AVX512F.

AVX512CD - AVX-512 Conflict Detection Instructions offer additional vectorization of loops with possible address conflict.

AVX512DQ - AVX-512 Doubleword and Quadword Instructions add new 32-bit and 64-bit AVX-512 instructions (conversion, transcendental support, etc..).

AVX512PF - AVX-512 Prefetch Instructions add new prefetch instructions for gather/scatter and PREFETCHWT1 .

AVX512ER - AVX-512 Exponential and Reciprocal Instructions (ERI) offer 28-bit precision RCP, RSQRT and EXP transcendentals for various scientific applications.

AVX512VL - AVX-512 Vector Length Instructions add vector length orthogonality, allowing most AVX-512 operations to also operate on XMM (128-bit SSE) registers and YMM (256-bit AVX) registers

AVX512BW - AVX-512 Byte and Word Instructions add support for for 8-bit and 16-bit integer operations.

AVX512IFMA - AVX-512 Integer Fused Multiply-Add Instructions add support for fused multiply add of integers using 52-bit precision.

AVX512VBMI - AVX-512 Vector Bit Manipulation, Version 1 add additional vector byte permutation instructions.

AVX512VBMI2 - AVX-512 Vector Bit Manipulation, Version 2 add additional vector byte permutation instructions.

AVX512VAES - AVX-512 Vector AES Instructions add vector AES operations

AVX512BITALG - AVX-512 Bit Algorithms add bit algorithms operations

AVX5124FMAPS - AVX-512 Fused Multiply Accumulation Packed Single precision Instructions add vector instructions for deep learning on floating-point single precision

AVX512VPCLMULQDQ - AVX-512 Vector Carry-less Multiply add vector carry-less multiply operations

AVX512GFNI - AVX-512 Galois Field New Instructions add Galois Field transformations instructions

AVX512_VNNI - AVX-512 Vector Neural Network Instructions add vector instructions for deep learning

AVX5124VNNIW - AVX-512 Vector Neural Network Instructions Word Variable Precision Instructions add vector instructions for deep learning on enhanced word variable precision

AVX512VPOPCNTDQ - AVX-512 Vector Population Count Doubleword and Quadword Instructions add double and quad word population count instructions.

AVX512_BF16 - AVX-512 BFloat16 Instructions add vector instructions for operating on bfloat16

Note that,

Formerly, the term AVX3.1 referred to F + CD + ER + PF.
Formerly, the term AVX3.2 referred to F + CD + BW + DQ + VL.

Detection

CPUID		Instruction Set
Input	Output	Instruction Set
EAX=07H, ECX=0	EBX[bit 16]	AVX512F
	ECX[bit 12]	AVX512BITALG
	EBX[bit 17]	AVX512DQ
	EBX[bit 21]	AVX512IFMA
	EBX[bit 26]	AVX512PF
	EBX[bit 27]	AVX512ER
	EBX[bit 28]	AVX512CD
	EBX[bit 30]	AVX512BW
	EBX[bit 31]	AVX512VL
	ECX[bit 01]	AVX512VBMI
	ECX[bit 11]	AVX512_VNNI
	ECX[bit 14]	AVX512VPOPCNTDQ
	EDX[bit 02]	AVX5124VNNIW
	EDX[bit 03]	AVX5124FMAPS
EAX=07H, ECX=1	EAX[bit 05]	AVX512_BF16

Implementation

Designer	Microarchitecture	Support Level
Designer	Microarchitecture	F	CD	ER	PF	BW	DQ	VL	IFMA	VBMI	4FMAPS	VNNI	4VNNIW	VPOPCNTDQ	BF16
Intel	Knights Landing	✔	✔	✔	✔	✘	✘	✘	✘	✘	✘	✘	✘	✘	✘
	Knights Mill	✔	✔	✔	✔	✘	✘	✘	✘	✘	✔	✔	✔	✔	✘
	Skylake (server)	✔	✔	✘	✘	✔	✔	✔	✘	✘	✘	✘	✘	✘	✘
	Cascade Lake	✔	✔	✘	✘	✔	✔	✔	✘	✘	✘	✔	✘	✘	✘
	Cannon Lake	✔	✔	✘	✘	✔	✔	✔	✔	✔	✘		✘	✘	✘
	Ice Lake (server)	✔	✔	✘	✘	✔	✔	✔	✔	✔	✔	✔	✘	✘	✘
	Cooper Lake	✔	✔	✘	✘	✔	✔	✔	✘	✘	✘	✔	✘	✘	✔
AMD	Zen 4	✔	✔	✘	✘	✔	✔	✔	✔	✔	✘	✔	✘	✔	✔
Centaur	CHA	✔	✔	✘	✘	✔	✔	✔	✔	✔	✘		✘	✘	✘

Documents

Intel® Advanced Vector Extensions 2015/2016; GNU Tools Cauldron 2014; Presented by Kirill Yukhin of Intel, July 2014

WikiChip

The Fuse Coverage

Social Media

Companies

Microarchitectures

Technology Nodes

Intel

AMD

ARM

Cavium

Samsung

Intel

AMD

Ampere

Apple

Cavium

HiSilicon

MediaTek

NXP

Qualcomm

Renesas

Samsung

Contents

Overview

Detection

Implementation

Documents