Latest revision as of 13:43, 16 March 2023
Cheetah µarch | |
---|---|
General Info | |
Arch Type | CPU |
Designer | Samsung |
Manufacturer | Samsung |
Introduction | 2019 |
Process | 8 nm |
Core Configs | 4 |
Pipeline | |
Type | Superscalar, Superpipeline |
OoOE | Yes |
Speculative | Yes |
Reg Renaming | Yes |
Stages | 16 |
Decode | 6-way |
Instructions | |
ISA | ARMv8.2 |
Cache | |
L1I Cache | 64 KiB/core, 4-way set associative |
L1D Cache | 64 KiB/core, 8-way set associative |
L2 Cache | 512 KiB/core, 8-way set associative |
L3 Cache | 2 MiB/cluster, 16-way set associative |
Succession | |
Predecessor | M3 |
Successor | M5 |
Exynos M4 (Cheetah) is an 8 nm ARM microarchitecture designed by Samsung for its consumer electronics. It is the successor to the M3.
Process Technology
The M4 is fabricated on Samsung's 8 nm process (8LPP).
Compiler support
Compiler | Arch-Specific | Arch-Favorable |
---|---|---|
GCC | -mcpu=exynos-m4 | -mtune=exynos-m4 |
LLVM | -mcpu=exynos-m4 | -mtune=exynos-m4 |
Architecture
The M4 is an incremental update to the M3. It moves from a 10 nm to an 8 nm process, adopts ARMv8.2, and makes minor enhancements to the front end and back end.
Key changes from M3
- 8 nm process (from 10 nm)
- ARMv8.2 (from ARMv8)
  - Support for full FP16 scalar extension
  - Support for integer dot product extension
- Front end
  - Larger instruction queue (48 entries, up from 40)
- Back end
  - LSU execution units reorganized
  - Floating-point execution units reorganized
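The integer dot product extension listed above adds instructions (SDOT/UDOT in AArch64). In these, each 32-bit lane accumulates the sum of four 8-bit products. The Python sketch below is a behavioural reference model of the signed vector form only. The function names are hypothetical, the model ignores the by-element (indexed) variant, and it says nothing about how the M4 actually executes the instruction.

```python
def wrap_int32(x: int) -> int:
    """Wrap an unbounded Python int to a signed 32-bit value (two's complement)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def sdot_model(acc: list[int], a: list[int], b: list[int]) -> list[int]:
    """Signed 8-bit dot product accumulated into 32-bit lanes (SDOT-style).

    acc holds one int32 accumulator per lane; a and b hold four int8
    elements per lane, so len(a) == len(b) == 4 * len(acc).
    """
    if len(a) != 4 * len(acc) or len(b) != len(a):
        raise ValueError("expected exactly four int8 elements per 32-bit lane")
    if any(not -128 <= v <= 127 for v in a + b):
        raise ValueError("a and b must contain signed 8-bit values")
    result = []
    for lane, base in enumerate(acc):
        group = range(4 * lane, 4 * lane + 4)
        # Four widening multiplies are summed and added to the lane's accumulator;
        # the total wraps modulo 2**32 rather than saturating.
        result.append(wrap_int32(base + sum(a[i] * b[i] for i in group)))
    return result


# Two lanes: 0 + (1*5 + 2*6 + 3*7 + 4*8) and 10 + 4 * (-1 * 2)
print(sdot_model([0, 10], [1, 2, 3, 4, -1, -1, -1, -1], [5, 6, 7, 8, 2, 2, 2, 2]))
```

One call replaces four separate multiply-accumulate steps per lane. This is what makes the extension attractive for int8 inference kernels.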
Block Diagram
Individual Core
Memory Hierarchy
- Cache
  - L1I Cache
    - 64 KiB, 4-way set associative
      - 128 B line size
      - per core
      - Parity-protected
  - L1D Cache
    - 64 KiB, 8-way set associative
      - 64 B line size
      - per core
      - 4 cycles for fastest load-to-use
      - 32 B/cycle load bandwidth
      - 16 B/cycle store bandwidth
  - L2 Cache
    - 512 KiB, 8-way set associative
    - Inclusive of L1
    - 12-cycle latency
    - 32 B/cycle bandwidth
  - L3 Cache
    - 2 MiB, 16-way set associative
      - 1 MiB slice/core
    - Exclusive of L2
    - ~37-cycle typical (NUCA)
  - BIU
    - 80 outstanding transactions
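Both L1 caches are 64 KiB, so the standard relation sets = size / (ways × line size) gives them the same number of sets despite their different associativity and line size. The Python sketch below works through that arithmetic and splits an address into tag, set index, and line offset. It models a generic set-associative cache. The function names are hypothetical, and the sketch does not describe whether the M4 indexes its caches with virtual or physical addresses. The L2 and L3 are left out because their line sizes are not given above.

```python
KIB = 1024


def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> tuple[int, int, int]:
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Requires power-of-two set counts and line sizes, which holds for the L1s above.
    """
    sets = size_bytes // (ways * line_bytes)
    for value in (sets, line_bytes):
        if value <= 0 or value & (value - 1):
            raise ValueError("sets and line size must be powers of two")
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    return sets, offset_bits, index_bits


def split_address(addr: int, sets: int, line_bytes: int) -> tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr % line_bytes
    index = (addr // line_bytes) % sets
    tag = addr // (line_bytes * sets)
    return tag, index, offset


l1i = cache_geometry(64 * KIB, ways=4, line_bytes=128)  # (128, 7, 7)
l1d = cache_geometry(64 * KIB, ways=8, line_bytes=64)   # (128, 6, 7)
print(l1i, l1d, split_address(0x12345, sets=l1d[0], line_bytes=64))
```

The L1I trades associativity for longer lines. Both caches therefore need 7 index bits, and only the offset width differs.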
The M3 TLB hierarchy consists of a dedicated L1 TLB for the instruction cache (ITLB) and another for the data cache (DTLB). Both are backed by a unified L2 TLB (STLB).
- TLBs
  - ITLB
    - 512-entry
  - DTLB
    - 32-entry
    - 512-entry Mid-level DTLB
  - STLB
    - 4,096-entry
  - Per core
- BPU
  - 4K-entry main BTB
  - 128-entry µBTB
  - 64-entry return stack
  - 16K-entry L2 BTB
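A TLB's reach is the amount of address space it can map without a page walk: entries × page size. The page size is not stated above, so the Python sketch below assumes a 4 KiB translation granule and one page per entry. Larger pages or entry coalescing, if used, would raise these figures. The dictionary and function names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

# Entry counts taken from the TLB list above.
TLB_ENTRIES = {
    "ITLB": 512,
    "L1 DTLB": 32,
    "Mid-level DTLB": 512,
    "STLB": 4096,
}


def tlb_reach(entries: int, page_bytes: int = 4 * KIB) -> int:
    """Bytes of address space covered when every entry maps one page of page_bytes."""
    return entries * page_bytes


for name, entries in TLB_ENTRIES.items():
    print(f"{name:15} {tlb_reach(entries) // KIB:>6} KiB with 4 KiB pages")
```

Under this assumption the 4,096-entry STLB covers 16 MiB. That is far more than the 64 KiB L1D, so most L1 and L2 misses should still hit in some level of the TLB hierarchy.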
Core
The M4 core is largely the same as the M3 core. Several buffers were enlarged (for example, the instruction queue grew from 40 to 48 entries), and some execution units were reorganized.
Execution engine
Floating-point cluster
The M4's floating-point execution units have been reorganized. Three new units were also added: a second FP square-root unit, a second vector multiplication unit, and a new horizontal vector arithmetic unit.
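Horizontal operations combine elements within a single vector. Ordinary vertical operations combine matching lanes of two vectors. The source does not say which instructions the M4's new horizontal unit executes. Assuming it covers AArch64 across-lanes and pairwise forms such as ADDV and ADDP, the Python sketch below contrasts their semantics with a lane-wise ADD on four 32-bit lanes. The function names are hypothetical.

```python
def wrap_int32(x: int) -> int:
    """Wrap to a signed 32-bit value, as a 32-bit vector lane would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def vertical_add(a: list[int], b: list[int]) -> list[int]:
    """Lane-wise add (like ADD Vd.4S): each result lane uses only its own inputs."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of lanes")
    return [wrap_int32(x + y) for x, y in zip(a, b)]


def across_lanes_add(v: list[int]) -> int:
    """Across-lanes reduction (like ADDV Sd, Vn.4S): all lanes collapse to one scalar."""
    return wrap_int32(sum(v))


def pairwise_add(n: list[int], m: list[int]) -> list[int]:
    """Pairwise add (like ADDP Vd.4S, Vn.4S, Vm.4S): adjacent lanes of n, then m, are summed."""
    if len(n) != len(m):
        raise ValueError("vectors must have the same number of lanes")
    joined = n + m
    return [wrap_int32(joined[i] + joined[i + 1]) for i in range(0, len(joined), 2)]


a, b = [1, 2, 3, 4], [10, 20, 30, 40]
print(vertical_add(a, b))   # [11, 22, 33, 44]
print(across_lanes_add(a))  # 10
print(pairwise_add(a, b))   # [3, 7, 30, 70]
```

Horizontal forms need data to move between lanes, which vertical units never do. That is one plausible reason to give them a dedicated unit.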
Memory subsystem
Samsung also reworked the M4's address generation units (AGUs). The M3 had three AGUs: two dedicated load AGUs and a single dedicated store AGU. In the M4, Samsung turned one of the dedicated load AGUs into a generic AGU that can handle both loads and stores. As a result, the M4 can schedule load µOPs on two ports and store µOPs on two ports, with one of those ports shared between them.
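The effect of converting a load-only AGU into a load/store AGU can be checked with a small model. The Python sketch below assumes each AGU accepts at most one µop per cycle. It ignores every other constraint, such as store-data ports, cache bandwidth, and queue occupancy. The port labels and function names are hypothetical.

```python
# AGU mixes described above; one entry per AGU.
M3_AGUS = ("load", "load", "store")
M4_AGUS = ("load", "load/store", "store")


def can_issue(agus: tuple[str, ...], loads: int, stores: int) -> bool:
    """True if `loads` load µops and `stores` store µops each get an AGU in one cycle.

    Assumes every AGU accepts at most one µop per cycle and ignores all other limits.
    """
    dedicated_loads = agus.count("load")
    dedicated_stores = agus.count("store")
    generic = agus.count("load/store")
    # µops that do not fit on a dedicated AGU of their type must use a generic one.
    spill = max(0, loads - dedicated_loads) + max(0, stores - dedicated_stores)
    return spill <= generic


def peak(agus: tuple[str, ...]) -> tuple[int, int]:
    """Largest number of loads alone, and of stores alone, per cycle."""
    n = len(agus)
    max_loads = max(l for l in range(n + 1) if can_issue(agus, l, 0))
    max_stores = max(s for s in range(n + 1) if can_issue(agus, 0, s))
    return max_loads, max_stores


print("M3 peak (loads, stores):", peak(M3_AGUS))  # (2, 1)
print("M4 peak (loads, stores):", peak(M4_AGUS))  # (2, 2)
print("M4 can issue 2 loads + 1 store:", can_issue(M4_AGUS, 2, 1))  # True
```

Under these assumptions, peak store address generation doubles from one to two per cycle. Peak loads stay at two, and the total remains three memory µops per cycle.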
All M4 Processors
List of M4-based Processors
Model | Family | Launched | Arch | Cores | CPU Frequency | GPU | GPU Frequency | TDP | TDP-down Power | TDP-down Frequency | TDP-up Power | TDP-up Frequency |
---|---|---|---|---|---|---|---|---|---|---|---|---|
9825 | Exynos | 2019 | Cortex-A75, Cortex-A55, M4 | 8 | 2.73 GHz, 2.4 GHz, 1.95 GHz | Mali-G76 | 754 MHz | 5 W | 5 W | 2.73 GHz | 8 W | 3.016 GHz |
Count: 1