From WikiChip
Galois Field New Instructions (GFNI) - x86

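Galois Field New Instructions (GFNI) is an x86 instruction set extension that adds three vector instructions for byte-wise arithmetic over the Galois field GF(2^8): GF2P8AFFINEQB, GF2P8AFFINEINVQB, and GF2P8MULB.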


Overview

GF2P8AFFINEQB
...
GF2P8AFFINEINVQB
...
GF2P8MULB
...

Motivation

Detection

The GFNI feature flag indicates support for the legacy SSE-encoded variants of these instructions, which operate on 128-bit vectors. The VEX-encoded AVX variants, which operate on 128- and 256-bit vectors, are supported if the AVX flag is set as well.

The EVEX-encoded AVX-512 variants operating on 512-bit vectors are supported if both the GFNI and AVX512F (Foundation) flags are set. The EVEX-encoded 128- and 256-bit variants additionally require the AVX512VL (Vector Length) flag.

CPUID Input      CPUID Output   Instruction Set
EAX=01H          ECX[bit 28]    AVX
EAX=07H, ECX=0   EBX[bit 16]    AVX512F
EAX=07H, ECX=0   EBX[bit 31]    AVX512VL
EAX=07H, ECX=0   ECX[bit 08]    GFNI
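
The flag combinations above can be sketched in C roughly as follows. This is a minimal sketch, not a definitive implementation: the `gfni_support` type and the `gfni_decode` and `gfni_query` functions are hypothetical names introduced for illustration, and the `<cpuid.h>` helpers are assumed to come from a GCC- or Clang-compatible toolchain. A complete check would also confirm that the operating system has enabled the AVX and AVX-512 register state (OSXSAVE and XGETBV), which is omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang CPUID helpers (toolchain assumption) */
#endif

/* Hypothetical result type: one flag per encoding class. */
typedef struct {
    bool sse;        /* legacy SSE encoding, 128-bit vectors   */
    bool avx;        /* VEX encoding, 128/256-bit vectors      */
    bool avx512_vl;  /* EVEX encoding, 128/256-bit vectors     */
    bool avx512;     /* EVEX encoding, 512-bit vectors         */
} gfni_support;

/* Decode raw CPUID register values into support levels.
 * leaf1_ecx:            ECX returned for EAX=01H
 * leaf7_ebx, leaf7_ecx: EBX and ECX returned for EAX=07H, ECX=0 */
static gfni_support gfni_decode(uint32_t leaf1_ecx,
                                uint32_t leaf7_ebx,
                                uint32_t leaf7_ecx)
{
    bool gfni     = ((leaf7_ecx >> 8)  & 1u) != 0;
    bool avx      = ((leaf1_ecx >> 28) & 1u) != 0;
    bool avx512f  = ((leaf7_ebx >> 16) & 1u) != 0;
    bool avx512vl = ((leaf7_ebx >> 31) & 1u) != 0;

    gfni_support s;
    s.sse       = gfni;
    s.avx       = gfni && avx;
    s.avx512    = gfni && avx512f;
    s.avx512_vl = gfni && avx512f && avx512vl;
    return s;
}

/* Query the running CPU. On non-x86 targets every level reports false. */
static gfni_support gfni_query(void)
{
    uint32_t leaf1_ecx = 0, leaf7_ebx = 0, leaf7_ecx = 0;
#if defined(__x86_64__) || defined(__i386__)
    unsigned int a = 0, b = 0, c = 0, d = 0;
    if (__get_cpuid(1, &a, &b, &c, &d))
        leaf1_ecx = c;
    if (__get_cpuid_count(7, 0, &a, &b, &c, &d)) {
        leaf7_ebx = b;
        leaf7_ecx = c;
    }
#endif
    return gfni_decode(leaf1_ecx, leaf7_ebx, leaf7_ecx);
}
```

Keeping the bit decoding separate from the CPUID query lets the flag logic be exercised with fixed register values, independent of the machine the code runs on.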

Microarchitecture support

Designer  Microarchitecture  Year  Support Level
                                   GFNI SSE  GFNI AVX  GFNI AVX-512 (128/256)  GFNI AVX-512 (512)
AMD       Zen 4              2022

Intrinsic functions

// GF2P8AFFINEQB
__m128i _mm_gf2p8affine_epi64_epi8(__m128i, __m128i, int);
__m128i _mm_mask_gf2p8affine_epi64_epi8(__m128i, __mmask16, __m128i, __m128i, int);
__m128i _mm_maskz_gf2p8affine_epi64_epi8(__mmask16, __m128i, __m128i, int);
__m256i _mm256_gf2p8affine_epi64_epi8(__m256i, __m256i, int);
__m256i _mm256_mask_gf2p8affine_epi64_epi8(__m256i, __mmask32, __m256i, __m256i, int);
__m256i _mm256_maskz_gf2p8affine_epi64_epi8(__mmask32, __m256i, __m256i, int);
__m512i _mm512_gf2p8affine_epi64_epi8(__m512i, __m512i, int);
__m512i _mm512_mask_gf2p8affine_epi64_epi8(__m512i, __mmask64, __m512i, __m512i, int);
__m512i _mm512_maskz_gf2p8affine_epi64_epi8(__mmask64, __m512i, __m512i, int);
// GF2P8AFFINEINVQB
__m128i _mm_gf2p8affineinv_epi64_epi8(__m128i, __m128i, int);
__m128i _mm_mask_gf2p8affineinv_epi64_epi8(__m128i, __mmask16, __m128i, __m128i, int);
__m128i _mm_maskz_gf2p8affineinv_epi64_epi8(__mmask16, __m128i, __m128i, int);
__m256i _mm256_gf2p8affineinv_epi64_epi8(__m256i, __m256i, int);
__m256i _mm256_mask_gf2p8affineinv_epi64_epi8(__m256i, __mmask32, __m256i, __m256i, int);
__m256i _mm256_maskz_gf2p8affineinv_epi64_epi8(__mmask32, __m256i, __m256i, int);
__m512i _mm512_gf2p8affineinv_epi64_epi8(__m512i, __m512i, int);
__m512i _mm512_mask_gf2p8affineinv_epi64_epi8(__m512i, __mmask64, __m512i, __m512i, int);
__m512i _mm512_maskz_gf2p8affineinv_epi64_epi8(__mmask64, __m512i, __m512i, int);
// GF2P8MULB
__m128i _mm_gf2p8mul_epi8(__m128i, __m128i);
__m128i _mm_mask_gf2p8mul_epi8(__m128i, __mmask16, __m128i, __m128i);
__m128i _mm_maskz_gf2p8mul_epi8(__mmask16, __m128i, __m128i);
__m256i _mm256_gf2p8mul_epi8(__m256i, __m256i);
__m256i _mm256_mask_gf2p8mul_epi8(__m256i, __mmask32, __m256i, __m256i);
__m256i _mm256_maskz_gf2p8mul_epi8(__mmask32, __m256i, __m256i);
__m512i _mm512_gf2p8mul_epi8(__m512i, __m512i);
__m512i _mm512_mask_gf2p8mul_epi8(__m512i, __mmask64, __m512i, __m512i);
__m512i _mm512_maskz_gf2p8mul_epi8(__mmask64, __m512i, __m512i);
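
To illustrate what the intrinsics above compute per byte, here is a scalar reference sketch in C. It assumes the semantics given in Intel's instruction documentation: GF2P8MULB multiplies bytes in GF(2^8) modulo the polynomial x^8 + x^4 + x^3 + x + 1, and GF2P8AFFINEQB forms result bit i as the parity of (matrix byte 7-i AND the source byte), XORed with bit i of the immediate. The function names `gf2p8mul_byte` and `gf2p8affine_byte` are hypothetical and are not part of any intrinsic header. GF2P8AFFINEINVQB is not modeled.

```c
#include <stdint.h>

/* Scalar model of one byte lane of GF2P8MULB:
 * multiply a and b in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1 (0x11B). */
static uint8_t gf2p8mul_byte(uint8_t a, uint8_t b)
{
    unsigned int acc = 0;

    /* Carry-less (XOR) multiplication yields a polynomial of degree <= 14. */
    for (int i = 0; i < 8; i++) {
        if ((b >> i) & 1u)
            acc ^= (unsigned int)a << i;
    }

    /* Reduce modulo 0x11B, clearing bits 14 down to 8. */
    for (int i = 14; i >= 8; i--) {
        if ((acc >> i) & 1u)
            acc ^= 0x11Bu << (i - 8);
    }
    return (uint8_t)acc;
}

/* Scalar model of one byte lane of GF2P8AFFINEQB:
 * matrix is the 64-bit matrix operand, x the source byte, imm8 the constant. */
static uint8_t gf2p8affine_byte(uint64_t matrix, uint8_t x, uint8_t imm8)
{
    unsigned int out = 0;

    for (int i = 0; i < 8; i++) {
        /* Row for result bit i is matrix byte 7 - i. */
        unsigned int row = (unsigned int)((matrix >> (8 * (7 - i))) & 0xFFu);
        unsigned int p = row & x;

        /* Fold down to the parity of the masked byte. */
        p ^= p >> 4;
        p ^= p >> 2;
        p ^= p >> 1;

        out |= ((p ^ ((unsigned int)imm8 >> i)) & 1u) << i;
    }
    return (uint8_t)out;
}
```

Under this model, the matrix 0x0102040810204080 acts as the identity and 0x8040201008040201 reverses the bit order within each byte. A scalar model like this can be handy for checking vectorized code against known values on machines without GFNI.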
