
Latest revision as of 06:27, 15 September 2021

Cloud AI 100 µarch
General Info
Arch Type: NPU
Designer: Qualcomm
Manufacturer: TSMC
Introduction: March, 2021
Process: 7 nm
PE Configs: 16
Pipeline
Type: VLIW
Decode: 4-way
Cache
L2 Cache: 1 MiB/core
Side Cache: 8 MiB/core

Cloud AI 100 is an NPU microarchitecture designed by Qualcomm for the server and edge markets. These NPUs are sold under the Cloud AI brand.

Process Technology

The Cloud AI 100 SoC is fabricated on TSMC's 7-nanometer process.

Architecture

Key Features

Block Diagram

SoC

[Figure: Cloud AI 100 SoC block diagram (cloud ai 100 soc.svg)]

AI Core

[Figure: Cloud AI 100 AI Core block diagram (cloud ai 100 ai core.svg)]

Memory Hierarchy

  • L1D$ / L1I$
    • Private per AI Core
  • L2
    • 1 MiB / AI Core
  • Vector Tightly-Coupled Memory (VTCM)
    • 8 MiB / AI Core
  • DRAM
    • 8-32 GiB
      • LPDDR4x-4266
        • 68.25 - 136.5 GB/s
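The quoted DRAM bandwidth range follows directly from the LPDDR4x-4266 transfer rate multiplied by the memory interface width. A minimal sketch of that arithmetic, assuming the 68.25 and 136.5 GB/s endpoints correspond to 128-bit and 256-bit interfaces (an assumption; the source does not state the bus width):

```python
# Peak DRAM bandwidth = transfer rate (MT/s) * bus width (bytes per transfer).
# The 128-bit and 256-bit bus widths below are assumptions chosen to
# reproduce the quoted 68.25 - 136.5 GB/s range; Qualcomm does not
# publish the interface width in the cited material.
def peak_bandwidth_gbs(rate_mts: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s (decimal gigabytes) for a DRAM interface."""
    return rate_mts * 1e6 * (bus_width_bits / 8) / 1e9

low = peak_bandwidth_gbs(4266, 128)   # ~68.3 GB/s
high = peak_bandwidth_gbs(4266, 256)  # ~136.5 GB/s
print(low, high)
```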

Overview

AI Core

Performance claims

Performance-per-watt figures were published by Qualcomm based on an Int8 3×3 convolution operation with uniformly distributed weights and input activations comprising 50% zeros, which Qualcomm says is typical for deep CNNs with ReLU operators. On that workload, Qualcomm says the AI 100 can achieve up to ~150 TOPS at ~12 W (over 12 TOPS/W) for edge use cases and ~363 TOPS at under 70 W (5.24 TOPS/W) for data center use. All numbers are at the SoC level.

SoC Power   12.05 W   19.74 W   69.26 W
TOPS        149.01    196.94    363.02
TOPS/W      12.37     9.98      5.24
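The TOPS/W row is simply the TOPS row divided by the SoC power row. A quick check that the published efficiency figures are consistent with the other two rows:

```python
# Figures taken from Qualcomm's table above; verify that TOPS / SoC power,
# rounded to two decimal places, reproduces the published TOPS/W row.
power_w = [12.05, 19.74, 69.26]
tops = [149.01, 196.94, 363.02]

tops_per_watt = [round(t / p, 2) for t, p in zip(tops, power_w)]
print(tops_per_watt)  # [12.37, 9.98, 5.24]
```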

Bibliography

  • Linley Fall Processor Conference 2021
  • Qualcomm, IEEE Hot Chips 33 Symposium (HCS) 2021.