(Adding Mongoose 4 following Samsung's patch) |
(fixed) |
||
| (32 intermediate revisions by 8 users not shown) | |||
| Line 1: | Line 1: | ||
| − | {{samsung title| | + | {{samsung title|Exynos M4|arch}} |
{{microarchitecture | {{microarchitecture | ||
|atype=CPU | |atype=CPU | ||
| − | |name= | + | |name=Exynos M4 (Cheetah) |
|designer=Samsung | |designer=Samsung | ||
|manufacturer=Samsung | |manufacturer=Samsung | ||
|introduction=2019 | |introduction=2019 | ||
| − | |process= | + | |process=8 nm |
| − | |isa=ARMv8 | + | |cores=4 |
| − | |predecessor= | + | |type=Superscalar |
| − | |predecessor link=samsung/microarchitectures/ | + | |type 2=Superpipeline |
| − | |successor= | + | |oooe=Yes |
| − | |successor link=samsung/microarchitectures/ | + | |speculative=Yes |
| + | |renaming=Yes | ||
| + | |stages=16 | ||
| + | |decode=6-way | ||
| + | |isa=ARMv8.2 | ||
| + | |l1i=64 KiB | ||
| + | |l1i per=core | ||
| + | |l1i desc=4-way set associative | ||
| + | |l1d=64 KiB | ||
| + | |l1d per=core | ||
| + | |l1d desc=8-way set associative | ||
| + | |l2=512 KiB | ||
| + | |l2 per=core | ||
| + | |l2 desc=8-way set associative | ||
| + | |l3=2 MiB | ||
| + | |l3 per=cluster | ||
| + | |l3 desc=16-way set associative | ||
| + | |predecessor=M3 (Meerkat) | ||
| + | |predecessor link=samsung/microarchitectures/m3 | ||
| + | |successor=M5 (Lion) | ||
| + | |successor link=samsung/microarchitectures/m5 | ||
}} | }} | ||
| − | ''' | + | '''Exynos M4''' ('''Cheetah'''), also known as ''{{\\|Mongoose 4}}'', is an [[8 nm]] [[ARM]] microarchitecture designed by [[Samsung]] for their consumer electronics. It is the successor to the [[Exynos]] {{\\|M3}} (Meerkat), also known as ''{{\\|Mongoose 3}}''. |
| + | == Process Technology == | ||
| + | The M4 is fabricated on Samsung's [[8 nm process]] (8LPP). | ||
| − | { | + | == Compiler support == |
| + | {| class="wikitable" | ||
| + | |- | ||
| + | ! Compiler !! Arch-Specific || Arch-Favorable | ||
| + | |- | ||
| + | | [[GCC]] || <code>-mcpu=exynos-m4</code> || <code>-mtune=exynos-m4</code> | ||
| + | |- | ||
| + | | [[LLVM]] || <code>-mcpu=exynos-m4</code> || <code>-mtune=exynos-m4</code> | ||
| + | |} | ||
== Architecture == | == Architecture == | ||
| − | + | The M4 is an incremental microarchitecture that brought a die shrink and minor enhancements. | |
| − | === Key changes from {{\\|Mongoose 3|M3}} === | + | |
| − | {{ | + | === Key changes from {{\\|Mongoose 3|M3}} (Meerkat) === |
| + | * [[8 nm process]] (from [[10 nm]]) | ||
| + | * [[ARMv8.2]] (from [[ARMv8]]) | ||
| + | ** Support for full FP16 scalar extension | ||
| + | ** Support for integer dot product extension | ||
| + | * Front end | ||
| + | ** Larger [[instruction queue]] (48 entries, up from 40) | ||
| + | * Back end | ||
| + | ** LSU execution units reorganized | ||
| + | ** Floating-point execution units reorganized | ||
| + | {{expand list}} | ||
| + | |||
| + | === Block Diagram === | ||
| + | ==== Individual Core ==== | ||
| + | |||
| + | [[File:mongoose 4 block diagram.svg|900px]] | ||
| + | |||
| + | === Memory Hierarchy === | ||
| + | {| border="0" cellpadding="5" width="100%" | ||
| + | |- | ||
| + | |width="50%" valign="top" align="left"| | ||
| + | * Cache | ||
| + | ** L1I Cache | ||
| + | *** 64 KiB, 4-way set associative | ||
| + | **** 128 B line size, per core | ||
| + | *** Parity-protected | ||
| + | ** L1D Cache | ||
| + | *** 64 KiB, 8-way set associative | ||
| + | **** 64 B line size, per core | ||
| + | *** 4 cycles for fastest load-to-use | ||
| + | *** 32 B/cycle load bandwidth | ||
| + | *** 16 B/cycle store bandwidth | ||
| + | ** L2 Cache | ||
| + | *** 512 KiB, 8-way set associative | ||
| + | *** Inclusive of L1 | ||
| + | *** 12 cycles latency | ||
| + | *** 32 B/cycle bandwidth | ||
| + | ** L3 Cache | ||
| + | *** 2 MiB, 16-way set associative | ||
| + | **** 1 MiB slice/core | ||
| + | *** Exclusive of L2 | ||
| + | *** ~37-cycle typical (NUCA) | ||
| + | ** BIU | ||
| + | *** 80 outstanding transactions | ||
| + | |width="50%" valign="top" align="left"| | ||
| + | The M4 TLB consists of a dedicated L1 TLB for the instruction <br>cache (ITLB) and another for the data cache (DTLB). <br>Additionally, there is a unified L2 TLB (STLB). | ||
| + | |||
| + | * TLBs | ||
| + | ** ITLB | ||
| + | *** 512-entry | ||
| + | ** DTLB | ||
| + | *** 32-entry | ||
| + | *** 512-entry Mid-level DTLB | ||
| + | ** STLB | ||
| + | *** 4,096-entry, per core | ||
| + | |||
| + | * BPU | ||
| + | ** 4K-entry main BTB | ||
| + | ** 128-entry µBTB | ||
| + | ** 64-entry return stack | ||
| + | ** 16K-entry L2 BTB | ||
| + | |} | ||
| + | |||
| + | == Core == | ||
| + | The core of the M4 is largely the same as that of the {{\\|M3}}. A number of buffers have been enlarged and some of the execution units have been reorganized. | ||
| + | |||
| + | === Execution engine === | ||
| + | ==== Floating-point cluster ==== | ||
| + | The floating-point execution units on the M4 have been reorganized, and three new units were added: a second FP square root unit, a second vector multiplication unit, and a new horizontal vector arithmetic unit. | ||
| + | |||
| + | :[[File:m4 fp eu pipes changes.svg|thumb|left|600px|Floating-point pipe changes.]] | ||
| + | |||
| + | {{clear}} | ||
| + | |||
| + | ==== Memory subsystem ==== | ||
| + | [[File:m4 data cache.svg|thumb|right]] | ||
| + | [[Samsung]] also enhanced the M4 memory subsystem. The {{\\|M3}} had three AGUs: two dedicated ''Load AGUs'' and a single dedicated ''Store AGU''. In the M4, Samsung changed one of the dedicated ''Load AGUs'' into a generic AGU capable of handling both loads and stores. As a result, either kind of µOP can be scheduled on two ports: loads on the dedicated ''Load AGU'' or the generic AGU, and stores on the dedicated ''Store AGU'' or the generic AGU. | ||
| + | |||
| + | {{clear}} | ||
| + | |||
| + | == All M4 Processors == | ||
| + | <!-- NOTE: | ||
| + | This table is generated automatically from the data in the actual articles. | ||
| + | If a microprocessor is missing from the list, an appropriate article for it needs to be | ||
| + | created and tagged accordingly. | ||
| + | |||
| + | Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips | ||
| + | --> | ||
| + | {{comp table start}} | ||
| + | <table class="comptable sortable tc5 tc6 tc7"> | ||
| + | {{comp table header|main|12:List of M4-based Processors}} | ||
| + | {{comp table header|main|5:Main processor|2:Integrated Graphics|{{abbr|TDP}}|2:TDP down|2:TDP up}} | ||
| + | {{comp table header|cols|Family|Launched|Arch|Cores|%Frequency|GPU|%Frequency|P|P|Frequ.|P|Frequ.}} | ||
| + | {{#ask: [[Category:microprocessor models by samsung]] [[microarchitecture::~*M4*||Mongoose 4||Exynos 4]] | ||
| + | |?full page name | ||
| + | |?model number | ||
| + | |?family | ||
| + | |?first launched | ||
| + | |?microarchitecture | ||
| + | |?core count | ||
| + | |?base frequency#GHz | ||
| + | |?integrated gpu | ||
| + | |?integrated gpu base frequency | ||
| + | |?tdp | ||
| + | |?tdp down | ||
| + | |?tdp down frequency#GHz | ||
| + | |?tdp up | ||
| + | |?tdp up frequency#GHz | ||
| + | |format=template | ||
| + | |template=proc table 3 | ||
| + | |userparam=14 | ||
| + | |mainlabel=- | ||
| + | |valuesep=, | ||
| + | }} | ||
| + | {{comp table count|ask=[[Category:microprocessor models by samsung]] [[microarchitecture::~*M4*||Mongoose 4||Exynos 4]]}} | ||
| + | </table> | ||
| + | {{comp table end}} | ||
| + | |||
| + | == Bibliography == | ||
| + | * LLVM: lib/Target/AArch64/AArch64SchedExynosM4.td | ||
Latest revision as of 13:06, 22 January 2026
| Exynos M4 (Cheetah) µarch | |
| General Info | |
| Arch Type | CPU |
| Designer | Samsung |
| Manufacturer | Samsung |
| Introduction | 2019 |
| Process | 8 nm |
| Core Configs | 4 |
| Pipeline | |
| Type | Superscalar, Superpipeline |
| OoOE | Yes |
| Speculative | Yes |
| Reg Renaming | Yes |
| Stages | 16 |
| Decode | 6-way |
| Instructions | |
| ISA | ARMv8.2 |
| Cache | |
| L1I Cache | 64 KiB/core 4-way set associative |
| L1D Cache | 64 KiB/core 8-way set associative |
| L2 Cache | 512 KiB/core 8-way set associative |
| L3 Cache | 2 MiB/cluster 16-way set associative |
| Succession | |
Exynos M4 (Cheetah), also known as Mongoose 4, is an 8 nm ARM microarchitecture designed by Samsung for their consumer electronics. It is the successor to the Exynos M3 (Meerkat), also known as Mongoose 3.
Process Technology
The M4 is fabricated on Samsung's 8 nm process (8LPP).
Compiler support
| Compiler | Arch-Specific | Arch-Favorable |
|---|---|---|
| GCC | -mcpu=exynos-m4 | -mtune=exynos-m4 |
| LLVM | -mcpu=exynos-m4 | -mtune=exynos-m4 |
Architecture
The M4 is an incremental microarchitecture that brought a die shrink and minor enhancements.
Key changes from M3 (Meerkat)
- 8 nm process (from 10 nm)
- ARMv8.2 (from ARMv8)
  - Support for full FP16 scalar extension
  - Support for integer dot product extension
- Front end
  - Larger instruction queue (48 entries, up from 40)
- Back end
  - LSU execution units reorganized
  - Floating-point execution units reorganized
This list is incomplete; you can help by expanding it.
Block Diagram
Individual Core
Memory Hierarchy
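The M4 TLB consists of a dedicated L1 TLB for the instruction cache (ITLB) and another for the data cache (DTLB). Additionally, there is a unified L2 TLB (STLB).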
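As a worked example of the L1 cache figures listed for the M4 (L1I: 64 KiB, 4-way, 128 B lines; L1D: 64 KiB, 8-way, 64 B lines), the following sketch derives the number of sets and the address bits used for the line offset and set index. The `Cache` class and its field names are illustrative only and are not taken from any Samsung or LLVM source. The bit counts assume power-of-two sizes. The L2 and L3 caches are left out because their line sizes are not listed.

```python
from dataclasses import dataclass

KIB = 1024


@dataclass(frozen=True)
class Cache:
    """Illustrative description of a set-associative cache (hypothetical helper)."""
    name: str
    size_bytes: int
    ways: int
    line_bytes: int

    @property
    def sets(self) -> int:
        # sets = capacity / (associativity * line size)
        return self.size_bytes // (self.ways * self.line_bytes)

    @property
    def offset_bits(self) -> int:
        # Address bits selecting a byte within a line (power-of-two line size assumed).
        return (self.line_bytes - 1).bit_length()

    @property
    def index_bits(self) -> int:
        # Address bits selecting a set (power-of-two set count assumed).
        return (self.sets - 1).bit_length()


# Figures as listed for the M4 L1 caches.
l1i = Cache("L1I", 64 * KIB, ways=4, line_bytes=128)
l1d = Cache("L1D", 64 * KIB, ways=8, line_bytes=64)

if __name__ == "__main__":
    for c in (l1i, l1d):
        print(f"{c.name}: {c.sets} sets, "
              f"{c.offset_bits} offset bits, {c.index_bits} index bits")
```

Both L1 caches come out at 128 sets: the L1I reaches that with fewer ways and longer lines, the L1D with more ways and shorter lines.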
Core
The core of the M4 is largely the same as that of the M3. A number of buffers have been enlarged and some of the execution units have been reorganized.
Execution engine
Floating-point cluster
The floating-point execution units on the M4 have been reorganized, and three new units were added: a second FP square root unit, a second vector multiplication unit, and a new horizontal vector arithmetic unit.
Memory subsystem
Samsung also enhanced the M4 memory subsystem. The M3 had three AGUs: two dedicated Load AGUs and a single dedicated Store AGU. In the M4, Samsung changed one of the dedicated Load AGUs into a generic AGU capable of handling both loads and stores. As a result, either kind of µOP can be scheduled on two ports: loads on the dedicated Load AGU or the generic AGU, and stores on the dedicated Store AGU or the generic AGU.
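The effect of the AGU change can be illustrated with a small model. This is a minimal sketch that looks only at which µOP kinds each AGU port accepts, as described above. It ignores queue depths, cache bandwidth, dependencies, and every other real scheduling constraint. The names `M3_AGUS`, `M4_AGUS`, and `issued_per_cycle` are hypothetical and chosen for illustration.

```python
# Each AGU port is modeled as the set of µOP kinds it accepts.
M3_AGUS = [{"load"}, {"load"}, {"store"}]           # two Load AGUs, one Store AGU
M4_AGUS = [{"load"}, {"store"}, {"load", "store"}]  # one Load, one Store, one generic


def issued_per_cycle(ports, loads, stores):
    """Upper bound on memory µOPs that can start address generation in one cycle.

    Dedicated ports are filled first; whatever is left over competes for
    the generic ports. This greedy order is optimal for this simple model.
    """
    load_only = sum(1 for p in ports if p == {"load"})
    store_only = sum(1 for p in ports if p == {"store"})
    generic = sum(1 for p in ports if p == {"load", "store"})

    on_load_ports = min(loads, load_only)
    on_store_ports = min(stores, store_only)
    leftover = (loads - on_load_ports) + (stores - on_store_ports)
    return on_load_ports + on_store_ports + min(leftover, generic)


if __name__ == "__main__":
    for loads, stores in [(2, 0), (0, 2), (1, 2), (2, 1)]:
        m3 = issued_per_cycle(M3_AGUS, loads, stores)
        m4 = issued_per_cycle(M4_AGUS, loads, stores)
        print(f"{loads} loads + {stores} stores: M3 issues {m3}, M4 issues {m4}")
```

In this model the two designs behave the same for load-heavy mixes. The difference appears when two stores are ready in the same cycle: the M3 arrangement can start only one of them, while the M4 arrangement can start both.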
All M4 Processors
List of M4-based Processors
| Model | Family | Launched | Arch | Cores | CPU Frequency | GPU | GPU Frequency | TDP | TDP down | TDP down frequency | TDP up | TDP up frequency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9820 | Exynos | January 2019 | Cortex-A75, Cortex-A55, Exynos M4 | 8 | | Mali-G76 | | | | | | |
| 9825 | Exynos | 2019 | Cortex-A75, Cortex-A55, Mongoose 4 | 8 | 2.73 GHz, 2.4 GHz, 1.95 GHz | Mali-G76 | 754 MHz | 5 W | 5 W | 2.73 GHz | 8 W | 3.016 GHz |
Count: 2
Bibliography
- LLVM: lib/Target/AArch64/AArch64SchedExynosM4.td