From WikiChip
Difference between revisions of "x86/avx512 vnni"
(AVX512 VNNI) |
|||
Line 1: | Line 1: | ||
{{x86 title|AVX512 VNNI}} | {{x86 title|AVX512 VNNI}} | ||
'''{{x86|AVX-512}} Vector Neural Network Instructions''' ('''AVX512 VNNI''') is an [[x86]] extension, part of the {{x86|AVX-512}}, designed to accelerate [[convolutional neural network]]-based [[algorithms]]. | '''{{x86|AVX-512}} Vector Neural Network Instructions''' ('''AVX512 VNNI''') is an [[x86]] extension, part of the {{x86|AVX-512}}, designed to accelerate [[convolutional neural network]]-based [[algorithms]]. | ||
+ | |||
+ | == Overview == | ||
+ | The '''AVX512 VNNI''' [[x86]] {{x86|extension}} extends {{x86|AVX512F|AVX-512 Foundation}} by introducing four new instructions for accelerating inner [[convolutional neural network]] loops. | ||
+ | |||
+ | * <code>VPDPBUSD</code> - Multiplies the individual bytes (8-bit) of the first source operand by the corresponding bytes (8-bit) of the second source operand, producing intermediate word (16-bit) results which are summed and accumulated in the double word (32-bit) of the destination operand. | ||
+ | * <code>VPDPBUSDS</code> - Same as above except on intermediate sum overflow which saturates to 0x7FFF_FFFF/0x8000_0000 for positive/negative numbers. | ||
+ | * <code>VPDPWSSD</code> - Multiplies the individual words (16-bit) of the first source operand by the corresponding word (16-bit) of the second source operand, producing intermediate word results which are summed and accumulated in the double word (32-bit) of the destination operand. | ||
+ | * <code>VPDPWSSDS</code> - Same as above except on intermediate sum overflow which saturates to 0x7FFF_FFFF/0x8000_0000 for positive/negative numbers. | ||
[[category:x86]] | [[category:x86]] |
Revision as of 23:56, 6 November 2018
AVX-512 Vector Neural Network Instructions (AVX512 VNNI) is an x86 extension, part of the AVX-512, designed to accelerate convolutional neural network-based algorithms.
Overview
The AVX512 VNNI x86 extension extends AVX-512 Foundation by introducing four new instructions for accelerating inner convolutional neural network loops.
-
VPDPBUSD
- Multiplies the individual bytes (8-bit) of the first source operand by the corresponding bytes (8-bit) of the second source operand, producing intermediate word (16-bit) results which are summed and accumulated in the double word (32-bit) of the destination operand. -
VPDPBUSDS
- Same as above except on intermediate sum overflow which saturates to 0x7FFF_FFFF/0x8000_0000 for positive/negative numbers. -
VPDPWSSD
- Multiplies the individual words (16-bit) of the first source operand by the corresponding word (16-bit) of the second source operand, producing intermediate word results which are summed and accumulated in the double word (32-bit) of the destination operand. -
VPDPWSSDS
- Same as above except on intermediate sum overflow which saturates to 0x7FFF_FFFF/0x8000_0000 for positive/negative numbers.