Revision as of 09:17, 15 February 2020
| Cheetah µarch | |
| General Info | |
| Arch Type | CPU |
| Designer | Samsung |
| Manufacturer | Samsung |
| Introduction | 2019 |
| Process | 8 nm |
| Core Configs | 4 |
| Pipeline | |
| Type | Superscalar, Superpipeline |
| OoOE | Yes |
| Speculative | Yes |
| Reg Renaming | Yes |
| Stages | 16 |
| Decode | 6-way |
| Instructions | |
| ISA | ARMv8.2 |
| Cache | |
| L1I Cache | 64 KiB/core 4-way set associative |
| L1D Cache | 64 KiB/core 8-way set associative |
| L2 Cache | 512 KiB/core 8-way set associative |
| L3 Cache | 2 MiB/cluster 16-way set associative |
| Succession | |
Exynos Cheetah (M4) is an 8 nm ARM microarchitecture designed by Samsung for their consumer electronics. It is the successor to the M3.
Process Technology
The M4 is fabricated on Samsung's 8 nm process (8LPP).
Compiler support
| Compiler | Arch-Specific | Arch-Favorable |
|---|---|---|
| GCC | -mcpu=exynos-m4 | -mtune=exynos-m4 |
| LLVM | -mcpu=exynos-m4 | -mtune=exynos-m4 |
Architecture
The M4 is an incremental update to the M3 that moves from a 10 nm to an 8 nm process and adds minor enhancements.
Key changes from M3
- 8 nm process (from 10 nm)
- ARMv8.2 (from ARMv8)
  - Support for full FP16 scalar extension
  - Support for integer dot product extension
- Front end
  - Larger instruction queue (48 entries, up from 40)
- Back end
  - LSU execution units reorganized
  - Floating-point execution units reorganized
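
To illustrate what the integer dot product extension in the list above provides, the following is a minimal Python sketch of a signed dot-product-accumulate operation. It assumes the semantics of the Arm SDOT instruction on a 128-bit vector: each 32-bit lane accumulates the sum of four 8-bit by 8-bit products. The function names `sdot_lane` and `sdot` are hypothetical, and the sketch models arithmetic only, not M4 timing or throughput.

```python
def sdot_lane(acc, a_bytes, b_bytes):
    """Accumulate the dot product of four signed 8-bit values into one 32-bit lane."""
    assert len(a_bytes) == len(b_bytes) == 4
    total = acc + sum(x * y for x, y in zip(a_bytes, b_bytes))
    # Wrap the result to a signed 32-bit value, as a hardware lane would.
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total & 0x80000000 else total


def sdot(acc_lanes, a, b):
    """Model a 128-bit operation: 16 signed bytes per source, 4 accumulator lanes."""
    assert len(acc_lanes) == 4 and len(a) == len(b) == 16
    return [
        sdot_lane(acc_lanes[i], a[4 * i:4 * i + 4], b[4 * i:4 * i + 4])
        for i in range(4)
    ]


# Each group of four bytes from a is multiplied by the matching bytes of b
# and summed into the corresponding accumulator lane.
lanes = sdot([0, 0, 0, 0], list(range(16)), [1] * 16)
```

One such operation replaces sixteen multiplies and their additions, which is why the extension is attractive for 8-bit workloads such as quantized neural network inference.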
Block Diagram
Individual Core
Memory Hierarchy
- Cache
  - L1I Cache
    - 64 KiB, 4-way set associative
    - 128 B line size
    - per core
    - Parity-protected
  - L1D Cache
    - 64 KiB, 8-way set associative
    - 64 B line size
    - per core
    - 4 cycles for fastest load-to-use
    - 32 B/cycle load bandwidth
    - 16 B/cycle store bandwidth
  - L2 Cache
    - 512 KiB, 8-way set associative
    - Inclusive of L1
    - 12 cycles latency
    - 32 B/cycle bandwidth
  - L3 Cache
    - 2 MiB, 16-way set associative
    - 1 MiB slice/core
    - Exclusive of L2
    - ~37-cycle typical latency (NUCA)
  - BIU
    - 80 outstanding transactions
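
The L1 parameters listed above determine the number of sets in each cache (size divided by associativity times line size). The following is a minimal Python sketch of that arithmetic. The helper names `cache_sets` and `index_bits` are hypothetical, and the L2 and L3 are omitted because their line sizes are not given here.

```python
KIB = 1024


def cache_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("size is not a multiple of ways * line size")
    return sets


def index_bits(sets):
    """Address bits needed to select a set (assumes a power-of-two set count)."""
    return sets.bit_length() - 1


# L1I: 64 KiB, 4-way, 128 B lines
l1i_sets = cache_sets(64 * KIB, ways=4, line_bytes=128)
# L1D: 64 KiB, 8-way, 64 B lines
l1d_sets = cache_sets(64 * KIB, ways=8, line_bytes=64)

print(f"L1I: {l1i_sets} sets, {index_bits(l1i_sets)} index bits")
print(f"L1D: {l1d_sets} sets, {index_bits(l1d_sets)} index bits")
```

Both L1 caches work out to 128 sets: the L1I has half the associativity of the L1D but twice the line size.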
The M4 TLB consists of a dedicated L1 TLB for the instruction cache (ITLB) and another for the data cache (DTLB). Additionally, there is a unified L2 TLB (STLB).
- TLBs
  - ITLB
    - 512-entry
  - DTLB
    - 32-entry
    - 512-entry Mid-level DTLB
  - STLB
    - 4,096-entry
    - Per core
- BPU
  - 4K-entry main BTB
  - 128-entry µBTB
  - 64-entry return stack
  - 16K-entry L2 BTB
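
The TLB entry counts listed above can be turned into a rough "reach" figure, the amount of memory each TLB level can map at once. The following minimal Python sketch assumes 4 KiB pages. The page size is an assumption and is not stated on this page; larger pages would scale the results proportionally. The name `tlb_reach` is hypothetical.

```python
PAGE_BYTES = 4 * 1024  # assumption: 4 KiB pages, not stated in the source

# Entry counts taken from the TLB list above.
TLB_ENTRIES = {
    "ITLB": 512,
    "DTLB": 32,
    "Mid-level DTLB": 512,
    "STLB": 4096,
}


def tlb_reach(entries, page_bytes=PAGE_BYTES):
    """Bytes of address space covered when every entry maps one page."""
    return entries * page_bytes


reach_kib = {name: tlb_reach(n) // 1024 for name, n in TLB_ENTRIES.items()}

for name, kib in reach_kib.items():
    print(f"{name}: {kib} KiB")
```

Under this assumption the 32-entry DTLB covers 128 KiB, the mid-level DTLB and the ITLB each cover 2 MiB, and the STLB covers 16 MiB.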
Core
The core of the M4 is largely the same as that of the M3. A number of buffers have been enlarged and some of the execution units have been reorganized.
Execution engine
Floating-point cluster
The execution units on the M4 have been reorganized. Three new units were added: a second FP square root unit, a second vector multiplication unit, and a new horizontal vector arithmetic unit.
Memory subsystem
Samsung also enhanced the M4 memory subsystem. The M3 had three AGUs: two dedicated load AGUs and a single dedicated store AGU. In the M4, Samsung changed one of the dedicated load AGUs into a generic AGU capable of handling both loads and stores. In other words, load and store µOPs can each be scheduled on two ports: loads on the load AGU or the generic AGU, and stores on the store AGU or the generic AGU.
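
The effect of the AGU change described above can be shown with a toy model. The following Python sketch counts how many memory µOPs can obtain an AGU in a single cycle for a given mix of ready loads and stores. It is a simplification under stated assumptions: it models address-generation ports only and ignores dependencies, scheduler behavior, and data-path bandwidth limits. The function `max_issued` and the `M3`/`M4` dictionaries are hypothetical names.

```python
def max_issued(loads, stores, load_agus, store_agus, generic_agus):
    """µOPs that can get an AGU in one cycle (toy model, AGU ports only)."""
    # Fill the dedicated ports first.
    issued_loads = min(loads, load_agus)
    issued_stores = min(stores, store_agus)
    # Whatever is left over competes for the generic ports.
    leftover = (loads - issued_loads) + (stores - issued_stores)
    return issued_loads + issued_stores + min(leftover, generic_agus)


# AGU mix as described in the text above.
M3 = dict(load_agus=2, store_agus=1, generic_agus=0)
M4 = dict(load_agus=1, store_agus=1, generic_agus=1)

for loads, stores in [(2, 1), (1, 2), (0, 2)]:
    print(
        f"{loads} loads, {stores} stores:",
        f"M3 issues {max_issued(loads, stores, **M3)},",
        f"M4 issues {max_issued(loads, stores, **M4)}",
    )
```

In this model both designs handle two loads plus one store per cycle, but only the M4 arrangement can also place two stores in the same cycle, which is the flexibility the generic AGU adds.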
All M4 Processors
List of M4-based Processors

| Model | Family | Launched | Arch | Cores | CPU Frequency | GPU | GPU Frequency |
|---|---|---|---|---|---|---|---|

Count: 0
Bibliography
- LLVM: lib/Target/AArch64/AArch64SchedExynosM4.td