From WikiChip
Difference between revisions of "neural processor"

(Corrected (kirin ai) to kirin ai)
(Fix spelling of Qualcomm)
 
(51 intermediate revisions by 25 users not shown)
Line 1: Line 1:
 
{{title|Neural Processor}}
 
{{title|Neural Processor}}
A '''neural processor''' or a '''neural processing unit''' ('''NPU''') is a [[microprocessor]] that [[application-specific microprocessor|specializes]] in the [[hardware acceleration|acceleration]] of [[machine learning]] algorithms, typically by operating on [[predictive models]] such as [[artificial neural network]]s (ANNs) or [[random forest]]s (RFs).
+
A '''neural processor''', a '''neural processing unit''' ('''NPU'''), or simply an AI Accelerator is a [[application-specific microprocessor|specialized]] circuit that implements all the necessary control and arithmetic logic necessary to execute [[machine learning]] algorithms, typically by operating on [[predictive models]] such as [[artificial neural network]]s (ANNs) or [[random forest]]s (RFs).
  
 
NPUs sometimes go by similar names such as a ''tensor processing unit'' (''TPU''), ''neural network processor'' (''NNP'') and ''intelligence processing unit'' (''IPU'') as well as ''vision processing unit'' (''VPU'') and ''graph processing unit'' (''GPU'').
 
NPUs sometimes go by similar names such as a ''tensor processing unit'' (''TPU''), ''neural network processor'' (''NNP'') and ''intelligence processing unit'' (''IPU'') as well as ''vision processing unit'' (''VPU'') and ''graph processing unit'' (''GPU'').
 +
 +
== Motivation ==
 +
Executing [[deep neural networks]] such as [[convolutional neural networks]] means performing a very large amount of [[multiply-accumulate operations]], typically in the billions and trillions of iterations. The large number of iterations comes from the fact that for each given input (e.g., image), a single convolution comprises of iterating over every channel, and then every pixel, and then performing a very large number of MAC operations. Many such convolutions are found in a single model and the model itself must be executed on each new input (e.g., every camera frame capture).
 +
 +
Unlike traditional [[central processing units]] which are great at processing highly serialized instruction streams, machine learning workloads tend to be highly parallelizable, much like a [[graphics processing unit]]. Moreover, unlike a GPU, NPUs can benefit from vastly simpler logic because their workloads tend to exhibit high regularity in the computational patterns of [[deep neural networks]]. For those reasons, many custom-designed dedicated neural processors have been developed.
 +
 +
== Overview ==
 +
A neural processing unit (NPU) is a well-partitioned circuit that comprises all the control and arithmetic logic components necessary to execute [[machine learning]] algorithms. NPUs are designed to accelerate the performance of common machine learning tasks such as image classification, machine translation, object detection, and various other predictive models. NPUs may be part of a large SoC, a plurality of NPUs may be instantiated on a single chip, or they may be part of a dedicated neural-network accelerator.
 +
 +
=== Classification ===
 +
Generally speaking, NPUs are classified as either ''training'' or ''inference''. For chips that are capable of performing both operations, the two phases are still generally performed independently.
 +
 +
* '''Training''' - NPUs designed to accelerate training are designed to accelerate the curating of new models. This is a highly compute-intensive operation that involves inputting an existing dataset (typically tagged) and iterating over the dataset, adjusting model weights and biases in order to ensure an ever-more accurate model. Correcting a wrong prediction involves propagating back through the layers of the network and guessing a correction. The process involves guessing again and again until a correct answer is achieved at the desired accuracy.
 +
 +
* '''Inference''' - NPUs designed to accelerate inference operate on complete models. Inference accelerators are designed to input a new piece of data (e.g., a new camera shot), process it through the already trained model and generate a result.
 +
 +
=== Data types ===
 +
{{empty section}}
  
 
== List of machine learning processors ==
 
== List of machine learning processors ==
{| class="wikitable"
+
<!-- Alphabetized -->
! Designer !! NPU
+
{{collist
|-
+
| count = 3
| [[Alibaba]] || Ali-NPU
+
|
|-
+
* [[Alibaba]]: Ali-NPU
| [[Baidu]] || {{baidu|Kunlun}}
+
* [[AlphaICs]]: Gluon
|-
+
* [[Amazon]]: {{amazon|AWS Inferentia}}
| [[Bitmain]] || {{bitmain|Sophon}}
+
* [[Apple]]: Neural Engine
|-
+
* [[AMD]]: AI Engine
| [[Cambricon]] || {{cambricon|MLU}}
+
* [[Arm]]: {{arm|ML Processor}}
|-
+
* [[Baidu]]: {{baidu|Kunlun}}
| [[Google]] || {{google|TPU}}
+
* [[Bitmain]]: {{bitmain|Sophon}}
|-
+
* [[Cambricon]]: {{cambricon|MLU}}
| [[Graphcore]] || {{graphcore|IPU}}
+
* [[cerebras|Cerebras]]: CS-1
|-
+
* [[Flex Logix]]: InferX
| [[Intel]] || {{nervana|NNP}}, {{movidius|Myriad}}, {{mobileye|EyeQ}}
+
* [[Nepes]]: [[NM500]] ([[General Vision]] tech)
|-
+
* [[GreenWaves]]: {{greenwaves|GAP8}}
| [[Nvidia]] || {{nvidia|NVDLA|l=arch}}
+
* [[Google]]: {{google|TPU}}
|-
+
* [[Gyrfalcon Technology]]: Lightspeeur
| [[Huawei]] || {{Kirin Ai}}
+
* [[Graphcore]]: {{graphcore|IPU}}
|-
+
* [[Groq]]:
| [[Apple]] || Neural Engine
+
* [[Habana]]: {{habana|HL|HL Series}}
|}
+
* [[Hailo]]: Hailo-8
 +
* [[Huawei]]: Ascend
 +
* [[Intel]]: {{nervana|NNP}}, {{movidius|Myriad}}, {{mobileye|EyeQ}}, {{intel|GNA}}
 +
* [[Kendryte]]: K210
 +
* [[Mediatek]]: NeuroPilot
 +
* [[Mythic]]: {{mythic|IPU}}
 +
* [[NationalChip]]: Neural Processing Unit (NPU)
 +
* [[NEC]]: {{nec|SX-Aurora}} (VPU)
 +
* [[Nvidia]]: {{nvidia|NVDLA|l=arch}}, {{nvidia|Xavier}}
 +
* [[Qualcomm]]: Hexagon
 +
* [[Quadric]]: Chimera General Purpose NPU (GPNPU)
 +
* [[Samsung]]: Neural Processing Unit (NPU)
 +
* [[Rockchip]]: RK3399Pro (NPU)
 +
* [[Amlogic]]: Khadas VIM3 (NPU)
 +
* [[SiMa.ai]]: Machine Learning System on chip (MLSoC)
 +
* [[Synaptics]]: SyNAP (NPU)
 +
* [[Tesla (car company)|Tesla]]: {{teslacar|FSD Chip}}
 +
* [[Vathys]]
 +
* [[Wave Computing]]: DPU
 +
* [[Brainchip]]: Akida (NPU & NPEs)
 +
* [[Syntiant]]: Neural decision processors
 +
}}
 
{{expand list}}
 
{{expand list}}
  

Latest revision as of 15:27, 26 September 2023

A neural processor, a neural processing unit (NPU), or simply an AI Accelerator is a specialized circuit that implements all the necessary control and arithmetic logic necessary to execute machine learning algorithms, typically by operating on predictive models such as artificial neural networks (ANNs) or random forests (RFs).

NPUs sometimes go by similar names such as a tensor processing unit (TPU), neural network processor (NNP) and intelligence processing unit (IPU) as well as vision processing unit (VPU) and graph processing unit (GPU).

Motivation[edit]

Executing deep neural networks such as convolutional neural networks means performing a very large amount of multiply-accumulate operations, typically in the billions and trillions of iterations. The large number of iterations comes from the fact that for each given input (e.g., image), a single convolution comprises of iterating over every channel, and then every pixel, and then performing a very large number of MAC operations. Many such convolutions are found in a single model and the model itself must be executed on each new input (e.g., every camera frame capture).

Unlike traditional central processing units which are great at processing highly serialized instruction streams, machine learning workloads tend to be highly parallelizable, much like a graphics processing unit. Moreover, unlike a GPU, NPUs can benefit from vastly simpler logic because their workloads tend to exhibit high regularity in the computational patterns of deep neural networks. For those reasons, many custom-designed dedicated neural processors have been developed.

Overview[edit]

A neural processing unit (NPU) is a well-partitioned circuit that comprises all the control and arithmetic logic components necessary to execute machine learning algorithms. NPUs are designed to accelerate the performance of common machine learning tasks such as image classification, machine translation, object detection, and various other predictive models. NPUs may be part of a large SoC, a plurality of NPUs may be instantiated on a single chip, or they may be part of a dedicated neural-network accelerator.

Classification[edit]

Generally speaking, NPUs are classified as either training or inference. For chips that are capable of performing both operations, the two phases are still generally performed independently.

  • Training - NPUs designed to accelerate training are designed to accelerate the curating of new models. This is a highly compute-intensive operation that involves inputting an existing dataset (typically tagged) and iterating over the dataset, adjusting model weights and biases in order to ensure an ever-more accurate model. Correcting a wrong prediction involves propagating back through the layers of the network and guessing a correction. The process involves guessing again and again until a correct answer is achieved at the desired accuracy.
  • Inference - NPUs designed to accelerate inference operate on complete models. Inference accelerators are designed to input a new piece of data (e.g., a new camera shot), process it through the already trained model and generate a result.

Data types[edit]

New text document.svg This section is empty; you can help add the missing info by editing this page.

List of machine learning processors[edit]

This list is incomplete; you can help by expanding it.

See also[edit]