{{title|Neural Processor}}

A '''neural processor''', a '''neural processing unit''' ('''NPU'''), or simply an ''AI accelerator'' is a [[microprocessor]] that [[application-specific microprocessor|specializes]] in the [[hardware acceleration|acceleration]] of [[machine learning]] algorithms, typically by operating on [[predictive models]] such as [[artificial neural network]]s (ANNs) or [[random forest]]s (RFs).

NPUs sometimes go by similar names such as ''tensor processing unit'' (''TPU''), ''neural network processor'' (''NNP''), ''intelligence processing unit'' (''IPU''), ''vision processing unit'' (''VPU''), and ''graph processing unit'' (''GPU'').
== Motivation ==
Executing [[deep neural networks]] such as [[convolutional neural networks]] means performing a very large number of [[multiply-accumulate operations]], typically in the billions or trillions. The large iteration count comes from the fact that for each input (e.g., an image), a single convolution iterates over every channel and every pixel, performing a large number of MAC operations at each step. A single model contains many such convolutions, and the model itself must be executed on each new input (e.g., every captured camera frame).

Unlike traditional [[central processing units]], which excel at processing highly serialized instruction streams, machine learning workloads tend to be highly parallelizable, much like on a [[graphics processing unit]]. Moreover, unlike a GPU, NPUs can benefit from vastly simpler logic because the computational patterns of [[deep neural networks]] are highly regular. For these reasons, many custom-designed dedicated neural processors have been developed.
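The arithmetic scale described above is easy to quantify. A minimal sketch in Python (the layer dimensions below are hypothetical examples, not taken from any particular model):

```python
def conv2d_macs(out_h, out_w, out_channels, in_channels, kernel_h, kernel_w):
    """Multiply-accumulate (MAC) count for one 2D convolution layer:
    each output pixel of each output channel dots a kernel_h x kernel_w
    window against every input channel."""
    return out_h * out_w * out_channels * in_channels * kernel_h * kernel_w

# A single hypothetical 3x3 convolution on a 224x224 feature map,
# 64 input channels -> 128 output channels:
print(conv2d_macs(224, 224, 128, 64, 3, 3))  # 3699376128 (~3.7 billion MACs)
```

A full model stacks many such layers, and the total cost is incurred again on every new input, which is how the counts quickly reach the billions and trillions.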

== Overview ==
A neural processing unit (NPU) is a well-partitioned circuit that comprises all the control and arithmetic logic components necessary to execute [[machine learning]] algorithms. NPUs are designed to accelerate common machine learning tasks such as image classification, machine translation, object detection, and various other predictive models. NPUs may be part of a large [[SoC]], multiple NPUs may be instantiated on a single chip, or they may be part of a dedicated neural-network accelerator.

=== Classification ===
Generally speaking, NPUs are classified as either ''training'' or ''inference'' accelerators. Chips capable of performing both operations still generally perform the two phases independently.
* '''Training''' - NPUs designed for training accelerate the creation of new models. This is a highly compute-intensive operation that involves iterating over an existing dataset (typically tagged) while adjusting the model's weights and biases to yield an ever-more accurate model. Correcting a wrong prediction involves propagating the error back through the layers of the network and adjusting the weights accordingly; the process repeats until the model reaches the desired accuracy.
* '''Inference''' - NPUs designed for inference operate on complete, already-trained models. Inference accelerators take a new piece of data (e.g., a new camera shot), process it through the trained model, and generate a result.
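The two phases can be contrasted with a toy example. The sketch below (a single linear "neuron" with hypothetical data; real accelerators run the same pattern at vastly larger scale) trains by repeatedly propagating the prediction error back into the weights, then runs inference as a single forward pass:

```python
def forward(w, b, x):
    # Inference: one pass of the input through the (trained) model.
    return w * x + b

def train(data, lr=0.05, epochs=200):
    # Training: iterate over a tagged dataset, nudging weight and bias
    # against the prediction error until the model fits.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in data:
            error = forward(w, b, x) - target
            w -= lr * error * x   # propagate the error back into the weight
            b -= lr * error
    return w, b

# Learn y = 2x + 1 from a tiny tagged dataset, then infer on a new input.
w, b = train([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
print(forward(w, b, 3.0))  # close to 7.0
```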
− |
| |
− | === Data types ===
| |
| {{empty section}} | | {{empty section}} |

| count = 3
* [[Alibaba]]: Ali-NPU
* [[AlphaICs]]: Gluon
* [[Amazon]]: {{amazon|AWS Inferentia}}
* [[AMD]]: AI Engine
* [[Amlogic]]: Khadas VIM3 (NPU)
* [[Apple]]: Neural Engine
* [[Arm]]: {{arm|ML Processor}}
* [[Baidu]]: {{baidu|Kunlun}}
* [[Bitmain]]: {{bitmain|Sophon}}
* [[Brainchip]]: Akida (NPU & NPEs)
* [[Cambricon]]: {{cambricon|MLU}}
* [[cerebras|Cerebras]]: CS-1
* [[Flex Logix]]: InferX
* [[Google]]: {{google|TPU}}
* [[Graphcore]]: {{graphcore|IPU}}
* [[GreenWaves]]: {{greenwaves|GAP8}}
* [[Groq]]
* [[Gyrfalcon Technology]]: Lightspeeur
* [[Habana]]: {{habana|HL|HL Series}}
* [[Hailo]]: Hailo-8
* [[Huawei]]: Ascend
* [[Intel]]: {{nervana|NNP}}, {{movidius|Myriad}}, {{mobileye|EyeQ}}, {{intel|GNA}}
* [[Kendryte]]: K210
* [[Mediatek]]: NeuroPilot
* [[Mythic]]: {{mythic|IPU}}
* [[NationalChip]]: Neural Processing Unit (NPU)
* [[NEC]]: {{nec|SX-Aurora}} (VPU)
* [[Nepes]]: [[NM500]] ([[General Vision]] tech)
* [[Nvidia]]: {{nvidia|NVDLA|l=arch}}, {{nvidia|Xavier}}
* [[Qualcomm]]: Hexagon
* [[Quadric]]: Chimera General Purpose NPU (GPNPU)
* [[Rockchip]]: RK3399Pro (NPU)
* [[Samsung]]: Neural Processing Unit (NPU)
* [[SiMa.ai]]: Machine Learning System on Chip (MLSoC)
* [[Synaptics]]: SyNAP (NPU)
* [[Syntiant]]: Neural decision processors
* [[Tesla (car company)|Tesla]]: {{teslacar|FSD Chip}}
* [[Vathys]]
* [[Wave Computing]]: DPU
}}
{{expand list}}