{{microarchitecture
|name=Cloud AI 100
|designer=Qualcomm
|manufacturer=TSMC
|introduction=March, 2021
|process=7 nm
|processing elements=16
|type=VLIW
|decode=4-way
|l2=1 MiB
|l2 per=core
|side cache=8 MiB
|side cache per=core
}}
 
'''Cloud AI 100''' is an [[NPU]] microarchitecture designed by [[Qualcomm]] for the server and edge market. These NPUs are sold under the {{qualcomm|Cloud AI}} brand.
== Process Technology ==
The Cloud AI 100 SoC is fabricated on TSMC's [[7-nanometer process]].

== Architecture ==

=== Key Features ===

== Block Diagram ==

=== SoC ===
:[[File:cloud ai 100 soc.svg|700px]]

=== AI Core ===
:[[File:cloud ai 100 ai core.svg|350px]]

== Memory Hierarchy ==
* L1D$ / L1I$
** Private per AI Core
* L2
** 1 MiB / AI Core
* Vector Tightly-Coupled Memory (VTCM)
** 8 MiB / AI Core
* DRAM
** 8 - 32 GiB
*** LPDDR4x-4266
**** 68.25 - 136.5 GB/s (see the bandwidth sketch after this list)
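
The DRAM bandwidth endpoints above follow directly from the LPDDR4x-4266 transfer rate once a bus width is assumed. A minimal sketch in Python, assuming 128-bit and 256-bit interface configurations (the bus widths are not stated in this article), together with the aggregate on-die SRAM implied by the per-core L2 and VTCM figures:

<source lang="python">
# Sketch only. Assumed (not stated above): the 68.25 GB/s and
# 136.5 GB/s endpoints correspond to 128-bit and 256-bit LPDDR4x
# interfaces respectively.

MTS = 4266  # LPDDR4x-4266 transfer rate in mega-transfers per second

def dram_bandwidth_gb_s(bus_bits):
    """Peak bandwidth in GB/s: transfers/s times bytes per transfer."""
    return MTS * 1e6 * (bus_bits // 8) / 1e9

for bus_bits in (128, 256):
    print(f"{bus_bits}-bit LPDDR4x-4266: {dram_bandwidth_gb_s(bus_bits):.2f} GB/s")
# 128-bit -> 68.26 GB/s, 256-bit -> 136.51 GB/s

# Aggregate on-die SRAM implied by the per-core numbers:
cores, l2_mib, vtcm_mib = 16, 1, 8
print(f"Total SRAM: {cores * (l2_mib + vtcm_mib)} MiB")  # 144 MiB
</source>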
== Overview ==

== AI Core ==

== Performance claims ==
Qualcomm published performance-per-watt figures based on an INT8 3×3 convolution operation with uniformly distributed weights and input activations comprising 50% zeros, which Qualcomm says is typical for deep CNNs with ReLU operators. Under those conditions, Qualcomm says the AI 100 can achieve up to ~150 TOPS at ~12 W (over 12 TOPS/W) for edge use cases and ~363 TOPS at under 70 W (5.24 TOPS/W) for data center use cases. All figures are at the SoC level.
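
To make that benchmark definition concrete, here is a sketch of such an input in Python with NumPy. The tensor shapes are hypothetical; the article specifies only the data type, the 3×3 kernel, the weight distribution, and the 50% activation sparsity.

<source lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer shape; the article does not give tensor sizes.
h = w = 56
c_in = c_out = 64

# Uniformly distributed INT8 weights for a 3x3 convolution.
weights = rng.integers(-128, 128, size=(c_out, c_in, 3, 3), dtype=np.int8)

# Input activations with ~50% zeros, as produced by a preceding ReLU.
acts = rng.integers(-128, 128, size=(c_in, h, w), dtype=np.int8)
acts[rng.random(acts.shape) < 0.5] = 0

print(f"Zero fraction of activations: {(acts == 0).mean():.2f}")  # ~0.50
</source>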
{| class="wikitable"
! SoC Power
| 12.05 W || 19.74 W || 69.26 W
|-
! TOPS
| 149.01 || 196.94 || 363.02
|-
! TOPS/W
| 12.37 || 9.98 || 5.24
|}
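
The TOPS/W row is simply the TOPS row divided by the SoC power row; a quick consistency check in Python reproduces the published efficiency figures:

<source lang="python">
# SoC power (W) -> TOPS, taken from the table above.
operating_points = {
    12.05: 149.01,
    19.74: 196.94,
    69.26: 363.02,
}

for watts, tops in operating_points.items():
    print(f"{watts:6.2f} W: {tops / watts:5.2f} TOPS/W")
# 12.37, 9.98 and 5.24 TOPS/W, matching the table.
</source>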
== Bibliography ==
* Linley Fall Processor Conference 2021
* {{bib|hc|33|Qualcomm}}
