Exynos M4 - Microarchitectures - Samsung

	Exynos M4 (Cheetah) µarch
	General Info
Arch Type	CPU
Designer	Samsung
Manufacturer	Samsung
Introduction	2019
Process	8 nm
Core Configs	4
	Pipeline
Type	Superscalar, Superpipeline
OoOE	Yes
Speculative	Yes
Reg Renaming	Yes
Stages	16
Decode	6-way
	Instructions
ISA	ARMv8.2
	Cache
L1I Cache	64 KiB/core; 4-way set associative
L1D Cache	64 KiB/core; 8-way set associative
L2 Cache	512 KiB/core; 8-way set associative
L3 Cache	2 MiB/cluster; 16-way set associative
	Succession
Predecessor	M3 (Meerkat)
Successor	M5 (Lion)

Exynos M4 (Cheetah), also known as Mongoose 4, is an 8 nm ARM microarchitecture designed by Samsung for its consumer electronics. It is the successor to the Exynos M3 (Meerkat), also known as Mongoose 3.

Process Technology

The M4 is fabricated on Samsung's 8 nm process (8LPP).

Compiler support

Compiler	Arch-Specific	Arch-Favorable
GCC	`-mcpu=exynos-m4`	`-mtune=exynos-m4`
LLVM	`-mcpu=exynos-m4`	`-mtune=exynos-m4`

Architecture

The M4 is an incremental update to the M3 that brings a die shrink (from 10 nm to 8 nm) and minor enhancements.

Key changes from M3 (Meerkat)

8 nm process (from 10 nm)
ARMv8.2 (from ARMv8)
- Support for full FP16 scalar extension
- Support for integer dot product extension
Front end
- Larger instruction queue (48 entries, up from 40)
Back end
- LSU execution units reorganized
- Floating-point execution units reorganized
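The integer dot product extension listed above adds instructions such as SDOT, which accumulate four 8-bit by 8-bit products into each 32-bit lane of a vector. The following is a minimal Python sketch of that arithmetic for one 128-bit vector. It is a behavioral model only, not Samsung's implementation, and the function names `to_int32` and `sdot` are hypothetical.

```python
from __future__ import annotations


def to_int32(x: int) -> int:
    """Wrap an integer to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x1_0000_0000 if x & 0x8000_0000 else x


def sdot(acc: list[int], a: list[int], b: list[int]) -> list[int]:
    """Model a signed dot product on one 128-bit vector.

    acc holds 4 signed 32-bit accumulators; a and b each hold 16 signed
    8-bit values. Lane i accumulates the products of elements 4*i..4*i+3.
    """
    if len(acc) != 4 or len(a) != 16 or len(b) != 16:
        raise ValueError("expected 4 accumulators and 16 + 16 int8 inputs")
    if any(not -128 <= v <= 127 for v in a + b):
        raise ValueError("inputs must fit in a signed 8-bit value")

    out = []
    for lane in range(4):
        group = range(4 * lane, 4 * lane + 4)
        total = acc[lane] + sum(a[i] * b[i] for i in group)
        out.append(to_int32(total))  # accumulation wraps, it does not saturate
    return out
```

For example, `sdot([10, 0, 0, 0], [1, 2, 3, 4] + [0] * 12, [5, 6, 7, 8] + [0] * 12)` adds 1*5 + 2*6 + 3*7 + 4*8 = 70 to the first accumulator and leaves the other lanes unchanged.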


Block Diagram

Individual Core

Memory Hierarchy

Cache
- L1I Cache
  - 64 KiB, 4-way set associative
    - 128 B line size, per core
  - Parity-protected
- L1D Cache
  - 64 KiB, 8-way set associative
    - 64 B line size, per core
  - 4 cycles for fastest load-to-use
  - 32 B/cycle load bandwidth
  - 16 B/cycle store bandwidth
- L2 Cache
  - 512 KiB, 8-way set associative
  - Inclusive of L1
  - 12 cycles latency
  - 32 B/cycle bandwidth
- L3 Cache
  - 2 MiB, 16-way set associative
    - 1 MiB slice/core
  - Exclusive of L2
  - ~37 cycles typical latency (NUCA)
- BIU
  - 80 outstanding transactions

The M4 TLB hierarchy consists of a dedicated L1 TLB for the instruction cache (ITLB) and another for the data cache (DTLB). Additionally, there is a unified L2 TLB (STLB).

TLBs
- ITLB
  - 512-entry
- DTLB
  - 32-entry
  - 512-entry Mid-level DTLB
- STLB
  - 4,096-entry, per core
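One way to read the entry counts above is as TLB reach, the amount of memory each level can map without a miss (reach = entries x page size). The sketch below assumes every entry maps a single 4 KiB page; the page size is an assumption, and larger pages would scale the reach proportionally. The names `tlb_reach` and `TLB_ENTRIES` are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

# ASSUMPTION: each TLB entry maps one 4 KiB page.
ASSUMED_PAGE_BYTES = 4 * KIB

# Entry counts from the list above.
TLB_ENTRIES = {
    "ITLB": 512,
    "DTLB": 32,
    "Mid-level DTLB": 512,
    "STLB": 4096,
}


def tlb_reach(entries: int, page_bytes: int = ASSUMED_PAGE_BYTES) -> int:
    """Bytes of address space covered when every entry maps one page."""
    return entries * page_bytes


for name, entries in TLB_ENTRIES.items():
    print(f"{name}: {tlb_reach(entries) // KIB} KiB")
```

Under this assumption the 32-entry DTLB covers 128 KiB, the 512-entry levels cover 2 MiB, and the STLB covers 16 MiB.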

BPU
- 4K-entry main BTB
- 128-entry µBTB
- 64-entry return stack
- 16K-entry L2 BTB

Core

The core of the M4 is largely the same as that of the M3. A number of buffers have been enlarged, and some of the execution units have been reorganized.

Execution engine

Floating-point cluster

The floating-point execution units on the M4 have been reorganized. In addition, three new units were added: a second FP square root unit, a second vector multiplication unit, and a new horizontal vector arithmetic unit.

[Figure: Floating-point pipe changes.]

Memory subsystem

Samsung also enhanced the M4 memory subsystem. The M3 had three AGUs: two dedicated Load AGUs and a single dedicated Store AGU. In the M4, Samsung changed one of the dedicated Load AGUs into a generic AGU capable of handling both loads and stores. As a result, load µOPs can be scheduled on two ports (the Load AGU and the generic AGU), and store µOPs can also be scheduled on two ports (the Store AGU and the generic AGU).
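The effect of the generic AGU can be illustrated with a toy throughput model. The sketch below assumes each AGU port accepts one µOP per cycle and ignores dependencies, cache misses, and queue limits, so it only bounds address-generation throughput and is not a description of the real scheduler. The function name `min_agu_cycles` and its parameters are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def min_agu_cycles(loads: int, stores: int,
                   load_ports: int, store_ports: int,
                   generic_ports: int = 0) -> int:
    """Lower bound on cycles to issue the given loads and stores.

    ASSUMPTION: one µOP per port per cycle, no other bottlenecks.
    Loads may use load or generic ports; stores may use store or
    generic ports; a generic port serves only one µOP per cycle.
    """
    total_ports = load_ports + store_ports + generic_ports
    bounds = [ceil_div(loads + stores, total_ports)]
    if loads:
        bounds.append(ceil_div(loads, load_ports + generic_ports))
    if stores:
        bounds.append(ceil_div(stores, store_ports + generic_ports))
    return max(bounds)


# M3: two Load AGUs, one Store AGU.
m3 = min_agu_cycles(loads=4, stores=4, load_ports=2, store_ports=1)
# M4: one Load AGU, one Store AGU, one generic AGU.
m4 = min_agu_cycles(loads=4, stores=4, load_ports=1, store_ports=1,
                    generic_ports=1)
print(m3, m4)
```

In this model a burst of four loads and four stores needs at least four cycles with the M3 port layout but only three with the M4 layout, while a load-only stream is unaffected because both layouts expose two load-capable ports.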

All M4 Processors

	List of M4-based Processors
Model	Family	Launched	Arch	Cores	CPU Frequency	GPU	GPU Frequency	TDP	TDP down	TDP down Frequency	TDP up	TDP up Frequency
9820	Exynos	January 2019	Cortex-A75, Cortex-A55, Mongoose 4	8	2.73 GHz, 2.31 GHz, 1.95 GHz	Mali-G76	600 MHz
9825	Exynos	2019	Cortex-A75, Cortex-A55, Mongoose 4	8	2.73 GHz, 2.4 GHz, 1.95 GHz	Mali-G76	754 MHz	5 W	5 W	2.73 GHz	8 W	3.016 GHz
Count: 2
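The clock frequencies in this table can be combined with the cycle counts from the Memory Hierarchy section to estimate latencies in nanoseconds and bandwidths in GB/s. The sketch below uses 2.73 GHz, the highest CPU frequency listed for both parts. That this clock belongs to the M4 cluster, and that the caches run at the core clock, are both assumptions; the helper names are hypothetical.

```python
# ASSUMPTION: the M4 cores and their caches run at the highest listed clock.
ASSUMED_M4_GHZ = 2.73


def cycles_to_ns(cycles: float, freq_ghz: float = ASSUMED_M4_GHZ) -> float:
    """Convert a cycle count to nanoseconds at the given clock."""
    return cycles / freq_ghz


def bandwidth_gb_s(bytes_per_cycle: float,
                   freq_ghz: float = ASSUMED_M4_GHZ) -> float:
    """Peak bandwidth in GB/s (10^9 bytes per second)."""
    return bytes_per_cycle * freq_ghz


# Cycle counts and per-cycle widths from the Memory Hierarchy section.
print(f"L1D load-to-use: {cycles_to_ns(4):.2f} ns")
print(f"L2 latency:      {cycles_to_ns(12):.2f} ns")
print(f"L3 latency:      {cycles_to_ns(37):.2f} ns (typical)")
print(f"L1D load BW:     {bandwidth_gb_s(32):.2f} GB/s")
print(f"L1D store BW:    {bandwidth_gb_s(16):.2f} GB/s")
```

These are peak, per-core figures under the stated assumptions; sustained numbers on real hardware would depend on clock domains and workload.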

Bibliography

LLVM: lib/Target/AArch64/AArch64SchedExynosM4.td
