{{x86 title|AVX-512 BFloat16 Instructions (BF16)}}{{x86 isa main}}
'''{{x86|AVX-512}} BFloat16 Instructions''' ('''AVX512_BF16''') is an [[x86]] extension, part of {{x86|AVX-512}}, designed to accelerate neural network-based [[algorithms]] by performing dot-products on [[bfloat16]] values.

== Overview ==
The '''AVX512 BF16''' [[x86]] {{x86|extension}} extends {{x86|AVX512F|AVX-512 Foundation}} by introducing three new instructions for converting and operating on [[bfloat16]] values.

* <code>VCVTNE2PS2BF16</code> - Convert two SIMD registers with packed single-precision floating-point values to [[bfloat16]] values packed in a single register.
* <code>VCVTNEPS2BF16</code> - Convert one SIMD register with packed single-precision floating-point values to [[bfloat16]] values packed in a single register.
* <code>VDPBF16PS</code> - Perform a SIMD dot-product on [[bfloat16]] pairs and accumulate the results into packed single-precision values in a single register.
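The two convert instructions narrow fp32 to [[bfloat16]] by rounding to nearest even (the "NE" in the mnemonics). A minimal scalar sketch of that rounding step, assuming plain IEEE fp32 bit layout (the helper names are illustrative, and the hardware's forced denormal flushing is ignored here):

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the fp32 -> bfloat16 narrowing: keep the upper 16 bits
 * of the fp32 bit pattern, rounding to nearest with ties to even.
 * (The real instructions also force denormals to zero; not modeled.) */
static uint16_t fp32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u)        /* NaN: return a quiet NaN */
        return (uint16_t)((bits >> 16) | 0x0040u);
    uint32_t bias = 0x7fffu + ((bits >> 16) & 1u); /* tie rounds to even */
    return (uint16_t)((bits + bias) >> 16);
}

/* Widening bfloat16 back to fp32 is exact: just restore the low 16 bits as zero. */
static float bf16_to_fp32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because bfloat16 keeps the full fp32 exponent and simply truncates the mantissa to 7 bits, the widening direction needs no rounding at all.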

== Motivation ==
See [[bfloat16#Motivation|bfloat16 § Motivation]].

== Detection ==
Support for these instructions is indicated by the AVX512_BF16 feature flag. 128- and 256-bit vectors are supported if the AVX512VL flag is set as well.

{| class="wikitable"
! colspan="2" | {{x86|CPUID}} !! rowspan="2" | Instruction Set
|-
! Input !! Output
|-
| EAX=07H, ECX=0 || EBX[bit 31] || AVX512VL
|-
| EAX=07H, ECX=1 || EAX[bit 05] || AVX512_BF16
|}
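The table above can be queried with a compiler's CPUID helpers. A sketch using GCC/Clang's <code>cpuid.h</code>, which is compiler-specific and not part of the extension itself (MSVC would use <code>__cpuidex</code> instead):

```c
#include <cpuid.h>  /* GCC/Clang builtin wrappers around the CPUID instruction */

/* CPUID leaf 07H, sub-leaf 1: EAX bit 5 indicates AVX512_BF16. */
static int has_avx512_bf16(void) {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid_count(0x07, 1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (eax >> 5) & 1;
}

/* CPUID leaf 07H, sub-leaf 0: EBX bit 31 indicates AVX512VL,
 * required for the 128- and 256-bit forms of these instructions. */
static int has_avx512_vl(void) {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid_count(0x07, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ebx >> 31) & 1;
}
```

A production check would additionally verify OS support for the AVX-512 state via OSXSAVE/XGETBV; this sketch only reads the feature flags from the table.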

== Microarchitecture support ==
<!-- Wrong/incomplete? Visit https://en.wikichip.org/wiki/Template:avx512_support_matrix -->
{{avx512 support matrix|em=VL+BF16}}

== Intrinsic functions ==
<source lang=c>
// VCVTNE2PS2BF16
__m128bh _mm_cvtne2ps_pbh (__m128 a, __m128 b);
__m128bh _mm_mask_cvtne2ps_pbh (__m128bh src, __mmask8 k, __m128 a, __m128 b);
__m128bh _mm_maskz_cvtne2ps_pbh (__mmask8 k, __m128 a, __m128 b);
__m256bh _mm256_cvtne2ps_pbh (__m256 a, __m256 b);
__m256bh _mm256_mask_cvtne2ps_pbh (__m256bh src, __mmask16 k, __m256 a, __m256 b);
__m256bh _mm256_maskz_cvtne2ps_pbh (__mmask16 k, __m256 a, __m256 b);
__m512bh _mm512_cvtne2ps_pbh (__m512 a, __m512 b);
__m512bh _mm512_mask_cvtne2ps_pbh (__m512bh src, __mmask32 k, __m512 a, __m512 b);
__m512bh _mm512_maskz_cvtne2ps_pbh (__mmask32 k, __m512 a, __m512 b);

// VCVTNEPS2BF16
__m128bh _mm_cvtneps_pbh (__m128 a);
__m128bh _mm_mask_cvtneps_pbh (__m128bh src, __mmask8 k, __m128 a);
__m128bh _mm_maskz_cvtneps_pbh (__mmask8 k, __m128 a);
__m128bh _mm256_cvtneps_pbh (__m256 a);
__m128bh _mm256_mask_cvtneps_pbh (__m128bh src, __mmask8 k, __m256 a);
__m128bh _mm256_maskz_cvtneps_pbh (__mmask8 k, __m256 a);
__m256bh _mm512_cvtneps_pbh (__m512 a);
__m256bh _mm512_mask_cvtneps_pbh (__m256bh src, __mmask16 k, __m512 a);
__m256bh _mm512_maskz_cvtneps_pbh (__mmask16 k, __m512 a);

// VDPBF16PS
__m128 _mm_dpbf16_ps (__m128 src, __m128bh a, __m128bh b);
__m128 _mm_mask_dpbf16_ps (__m128 src, __mmask8 k, __m128bh a, __m128bh b);
__m128 _mm_maskz_dpbf16_ps (__mmask8 k, __m128 src, __m128bh a, __m128bh b);
__m256 _mm256_dpbf16_ps (__m256 src, __m256bh a, __m256bh b);
__m256 _mm256_mask_dpbf16_ps (__m256 src, __mmask8 k, __m256bh a, __m256bh b);
__m256 _mm256_maskz_dpbf16_ps (__mmask8 k, __m256 src, __m256bh a, __m256bh b);
__m512 _mm512_dpbf16_ps (__m512 src, __m512bh a, __m512bh b);
__m512 _mm512_mask_dpbf16_ps (__m512 src, __mmask16 k, __m512bh a, __m512bh b);
__m512 _mm512_maskz_dpbf16_ps (__mmask16 k, __m512 src, __m512bh a, __m512bh b);
</source>
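For reference, the arithmetic of one <code>VDPBF16PS</code> accumulator lane can be modeled in scalar C: each fp32 destination element receives <code>src + a0·b0 + a1·b1</code>, where the two bfloat16 pairs are widened to fp32 before multiplying. This is a semantic sketch only (the helper names are made up); the instruction's exact intermediate rounding and denormal handling are not reproduced here.

```c
#include <stdint.h>
#include <string.h>

/* Widen one bfloat16 value to fp32 (exact, no rounding needed). */
static float bf16_as_fp32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* One fp32 accumulator lane of the dot-product: a[0..1] and b[0..1] are
 * the bfloat16 pair feeding this lane. Helper name is illustrative. */
static float dpbf16_lane(float src, const uint16_t a[2], const uint16_t b[2]) {
    return src + bf16_as_fp32(a[0]) * bf16_as_fp32(b[0])
               + bf16_as_fp32(a[1]) * bf16_as_fp32(b[1]);
}
```

A full-width <code>_mm512_dpbf16_ps</code> applies this per-lane step to sixteen fp32 accumulator elements at once, consuming thirty-two bfloat16 pairs per instruction.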

== See also ==
* [[DL Boost]]
* [[AVX512_VNNI]]

== Bibliography ==
* ''Intel Architecture Instruction Set Extensions and Future Features Programming Reference'', Revision 36. (Ref #319433-036)

[[Category:x86_extensions]]