Ethos - ARM

Ethos is a family of synthesizable neural processor IPs designed by Arm for IoT and edge applications.

Overview

Ethos was first introduced in late 2019 as the series of NPUs within Arm's Project Trillium, targeting various markets. All Ethos NPUs share the same underlying microarchitecture, the MLP, which is designed to scale across configurations by varying the SRAM size and the number of compute engines.

Members

N-Series

The Ethos-N series was introduced in October 2019. These are fully independent NPUs for mainstream mobile and IoT devices, and they can be integrated into any SoC just like the Cortex family. These NPUs are designed to scale from 1 to 4 TOPS and from 250 mW up to 1.5 W. Multiple instances of these IPs may be combined using Arm's CCN-500 or CMN-600 interconnects to scale to higher performance.

General		Configuration		SRAM		Compute
NPU	Introduction	Quads	CEs	Per bank	Total	MACs	OPs/clk	Int8	Int16
Ethos-N37	Oct 23, 2019	1	4	128 KiB	512 KiB	512 MACs	1024 OPs/clk	1.024 TOPS	256 GOPS
Ethos-N57	Oct 23, 2019	2	8	64 KiB	512 KiB	1024 MACs	2048 OPs/clk	2.048 TOPS	512 GOPS
Ethos-N77	Oct 23, 2019	4	16	64 KiB or 256 KiB	1 MiB or 4 MiB	2048 MACs	4096 OPs/clk	4.096 TOPS	1.024 TOPS
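
The figures in the table are consistent with simple arithmetic, which the following minimal Python sketch reproduces. It assumes that one MAC counts as two operations (a multiply and an add), that the total SRAM is the per-bank size multiplied by the number of CEs, and that the Int8 column is quoted at a 1 GHz clock. The clock is inferred from the numbers rather than stated in the table, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EthosNConfig:
    """Hypothetical container for one row of the N-Series table."""
    name: str
    compute_engines: int
    sram_per_bank_kib: int
    macs: int

    def ops_per_clock(self) -> int:
        # Assumption: each MAC contributes a multiply and an add (2 ops).
        return 2 * self.macs

    def total_sram_kib(self) -> int:
        # Assumption: one SRAM bank per compute engine.
        return self.compute_engines * self.sram_per_bank_kib

    def int8_tops(self, clock_hz: float = 1e9) -> float:
        # Assumption: the 1 GHz default is inferred from the table, not stated in it.
        return self.ops_per_clock() * clock_hz / 1e12


CONFIGS = [
    EthosNConfig("Ethos-N37", compute_engines=4, sram_per_bank_kib=128, macs=512),
    EthosNConfig("Ethos-N57", compute_engines=8, sram_per_bank_kib=64, macs=1024),
    EthosNConfig("Ethos-N77", compute_engines=16, sram_per_bank_kib=64, macs=2048),
]

for cfg in CONFIGS:
    print(cfg.name, cfg.ops_per_clock(), cfg.total_sram_kib(), cfg.int8_tops())
```

The Ethos-N77 entry above uses the 64 KiB bank option; with 256 KiB banks the same product gives 4096 KiB, matching the 4 MiB figure. The Int16 column in the table is one quarter of the Int8 column in every row.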

U-Series

The Ethos-U series was introduced in early 2020. This series targets deeply embedded AI applications. The 'microNPUs' in this series are not complete NPUs like the Ethos-N series. Instead, they feature a much slimmer design. The Ethos-U series is designed to work closely with a companion Cortex-M processor such as the Cortex-M55. The M55 is preferred due to its Helium extension support, but other cores such as the M7, M4, and M33 should also work. Conceptually, the U series can be thought of as a single-compute-engine (CE) design in which the PLE is removed and the extra processing is handled by the companion Cortex-M core. Due to power and area constraints, the dedicated SRAM banks are also removed; the weights and activations are instead held in shared SoC memory (or, ideally, the Cortex-M cache).

General		Configuration		Compute
NPU	Introduction	SRAM	MACs	OPs/clk	Performance (Int8)
Ethos-U55	February 10, 2020	Shared with Cortex-M	32-256	64-512 OPs/clk	25.6-204.8 GOPS (@ 100-400 MHz, typical on 50/40 nm); 64-512 GOPS (@ 1 GHz, typical on 16/7/5 nm)
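
The performance ranges in the table can be reproduced from the MAC count and the clock frequency. The sketch below is illustrative only: it assumes two operations per MAC per cycle, and the function name `peak_int8_gops` is hypothetical.

```python
def peak_int8_gops(macs: int, clock_mhz: float) -> float:
    """Peak Int8 throughput in GOPS for a given MAC count and clock.

    Assumption: each MAC performs one multiply and one add per cycle (2 ops).
    """
    ops_per_clock = 2 * macs
    return ops_per_clock * clock_mhz / 1000.0


# Smallest and largest MAC configurations listed in the table.
for macs in (32, 256):
    for clock_mhz in (100, 400, 1000):
        print(macs, clock_mhz, peak_int8_gops(macs, clock_mhz))
```

Under these assumptions, the 25.6-204.8 GOPS range corresponds to the 32 and 256 MAC configurations at 400 MHz, and the 64-512 GOPS range to the same configurations at 1 GHz. At 100 MHz the same formula gives 6.4-51.2 GOPS.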
