Revision as of 23:54, 13 January 2019
Mongoose 4 µarch | |
General Info | |
Arch Type | CPU |
Designer | Samsung |
Manufacturer | Samsung |
Introduction | 2018 |
Process | 8 nm |
Instructions | |
ISA | ARMv8 |
Succession | |
Exynos Mongoose 4 (M4) is an 8 nm ARM microarchitecture designed by Samsung for their consumer electronics. It is the successor to the Mongoose 3.
Process Technology
The M4 is fabricated on Samsung's 8 nm process (8LPP).
Compiler support
Compiler | Arch-Specific | Arch-Favorable |
---|---|---|
GCC | -mcpu=exynos-m4 | -mtune=exynos-m4 |
LLVM | -mcpu=exynos-m4 | -mtune=exynos-m4 |
Architecture
The M4 is an incremental microarchitecture that brought a die shrink and minor enhancements.
Key changes from M3
- 8 nm process (from 10 nm)
- ARMv8.2 (from ARMv8)
  - Support for full FP16 scalar extension
  - Support for integer dot product extension
- Front end
  - Larger instruction queue (48 entries, up from 40)
- Back end
  - LSU reorganized
  - Floating-point execution units reorganized
This list is incomplete; you can help by expanding it.
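To illustrate the integer dot product extension listed above, here is a minimal Python sketch of the per-lane arithmetic. It assumes the extension in question is the ARMv8.2 signed dot product operation, in which each 32-bit accumulator lane adds the sum of four signed 8-bit products. The function names `sdot_lane` and `sdot` are hypothetical, and 32-bit wraparound is not modeled.

```python
def sdot_lane(acc, a, b):
    """One 32-bit lane: acc plus the sum of four signed 8-bit products."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("each lane consumes exactly four 8-bit elements per input")
    if not all(-128 <= x <= 127 for x in (*a, *b)):
        raise ValueError("inputs must fit in a signed 8-bit integer")
    return acc + sum(x * y for x, y in zip(a, b))


def sdot(acc, a, b):
    """Four lanes: 16 int8 elements per input, four 32-bit accumulators.

    Assumption: 32-bit overflow wraparound is ignored in this sketch.
    """
    if len(acc) != 4 or len(a) != 16 or len(b) != 16:
        raise ValueError("expected 4 accumulators and 16 elements per input")
    return [sdot_lane(acc[i], a[4 * i:4 * i + 4], b[4 * i:4 * i + 4])
            for i in range(4)]


result = sdot([0, 0, 0, 0], list(range(16)), [1] * 16)
print(result)
```

With the second input set to all ones, each output lane is simply the sum of the corresponding group of four elements of the first input.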
Block Diagram
Individual Core
Memory Hierarchy
- Cache
  - L1I Cache
    - 64 KiB, 4-way set associative
      - 128 B line size
      - per core
      - Parity-protected
  - L1D Cache
    - 64 KiB, 8-way set associative
      - 64 B line size
      - per core
      - 4 cycles for fastest load-to-use
      - 32 B/cycle load bandwidth
      - 16 B/cycle store bandwidth
  - L2 Cache
    - 512 KiB, 8-way set associative
      - Inclusive of L1
      - 12 cycles latency
      - 32 B/cycle bandwidth
  - L3 Cache
    - 4 MiB, 16-way set associative
      - 1 MiB slice/core
      - Exclusive of L2
      - ~37-cycle typical (NUCA)
  - BIU
    - 80 outstanding transactions
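The L1 cache parameters above determine the number of sets in each cache, and the per-cycle figures translate into peak bandwidth once a clock frequency is chosen. The following Python sketch works through that arithmetic. The helper names are hypothetical, and the 2.0 GHz clock is an assumption made purely for illustration; this page does not state an operating frequency. The L2 and L3 are left out because their line sizes are not listed.

```python
KIB = 1024


def num_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache: size / (ways * line size)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("cache size must be a multiple of ways * line size")
    return sets


def peak_bandwidth_gb_s(bytes_per_cycle, clock_hz):
    """Peak bandwidth in GB/s (10^9 bytes per second) at a given clock."""
    return bytes_per_cycle * clock_hz / 1e9


# Values taken from the list above.
l1i_sets = num_sets(64 * KIB, ways=4, line_bytes=128)
l1d_sets = num_sets(64 * KIB, ways=8, line_bytes=64)

# Assumption: a hypothetical 2.0 GHz core clock, not a figure from this page.
ASSUMED_CLOCK_HZ = 2.0e9
l1d_load_gb_s = peak_bandwidth_gb_s(32, ASSUMED_CLOCK_HZ)
l1d_store_gb_s = peak_bandwidth_gb_s(16, ASSUMED_CLOCK_HZ)

print(f"L1I sets: {l1i_sets}, L1D sets: {l1d_sets}")
print(f"L1D load: {l1d_load_gb_s} GB/s, store: {l1d_store_gb_s} GB/s (assumed clock)")
```

Both L1 caches come out at 128 sets: the L1I has half the associativity of the L1D but twice the line size.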
The Mongoose 4 TLB consists of a dedicated L1 TLB for the instruction cache (ITLB) and another for the data cache (DTLB). Additionally, there is a unified L2 TLB (STLB).
- TLBs
  - ITLB
    - 512-entry
  - DTLB
    - 32-entry
    - 512-entry Mid-level DTLB
  - STLB
    - 4,096-entry
    - Per core
- BPU
  - 4K-entry main BTB
  - 128-entry µBTB
  - 64-entry return stack
  - 16K-entry L2 BTB
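The TLB entry counts above can be turned into a rough "reach" figure, meaning the amount of memory each level can map without a miss. The Python sketch below does so under the assumption of 4 KiB pages with one page per entry. The page size is not stated on this page, and larger pages would increase the reach proportionally. The names `TLB_ENTRIES` and `tlb_reach_bytes` are hypothetical.

```python
# Assumption: 4 KiB pages, one page mapped per TLB entry.
ASSUMED_PAGE_BYTES = 4 * 1024

# Entry counts taken from the TLB list above.
TLB_ENTRIES = {
    "ITLB": 512,
    "DTLB": 32,
    "Mid-level DTLB": 512,
    "STLB": 4096,
}


def tlb_reach_bytes(entries, page_bytes=ASSUMED_PAGE_BYTES):
    """Memory covered when every entry maps one distinct page."""
    return entries * page_bytes


reach_kib = {name: tlb_reach_bytes(n) // 1024 for name, n in TLB_ENTRIES.items()}

for name, kib in reach_kib.items():
    print(f"{name}: {kib} KiB")
```

Under this assumption the 32-entry DTLB covers 128 KiB, while the 4,096-entry STLB covers 16 MiB.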
Core
The core of the M4 is largely the same as that of the M3. A number of buffers have been enlarged and some of the execution units have been reorganized.
Execution engine
Floating-point cluster
The floating-point execution units on the M4 have been reorganized. Three new units were added: a second FP square root unit, a second vector multiplication unit, and a new horizontal vector arithmetic unit.
Figure: Floating-point pipe changes.
Memory subsystem
This section is empty.
All M4 Processors
List of M4-based Processors
Model | Family | Launched | Arch | Cores | CPU Frequency | GPU | GPU Frequency |
---|---|---|---|---|---|---|---|
Count: 0
Bibliography
- LLVM: lib/Target/AArch64/AArch64SchedExynosM4.td