From WikiChip
Difference between revisions of "samsung/microarchitectures/m4"
Revision as of 22:39, 13 January 2019
Mongoose 4 µarch | |
General Info | |
Arch Type | CPU |
Designer | Samsung |
Manufacturer | Samsung |
Introduction | 2018 |
Process | 8 nm |
Instructions | |
ISA | ARMv8 |
Exynos Mongoose 4 (M4) is an 8 nm ARM microarchitecture designed by Samsung for its consumer electronics. It is the successor to the Mongoose 3.
Process Technology
The M4 is fabricated on Samsung's 8 nm process (8LPP).
Compiler support
Compiler | Arch-Specific | Arch-Favorable |
---|---|---|
GCC | -mcpu=exynos-m4 | -mtune=exynos-m4 |
LLVM | -mcpu=exynos-m4 | -mtune=exynos-m4 |
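As a minimal sketch of how the flags in the table might be used, the snippet below assembles a compiler command line for either option. In GCC's AArch64 back end, `-mcpu` selects both the target feature set and the tuning, while `-mtune` changes only the tuning. The helper name `compile_command`, the cross-compiler name `aarch64-linux-gnu-gcc`, the `-O2` level, and the file names are illustrative assumptions, not something this page specifies.

```python
# Target name taken from the compiler-support table.
M4_TARGET = "exynos-m4"


def compile_command(compiler, source, output, tune_only=False):
    """Build an argument list that targets (or only tunes for) the M4.

    Hypothetical helper: it only builds the list and does not run anything.
    """
    flag = f"-mtune={M4_TARGET}" if tune_only else f"-mcpu={M4_TARGET}"
    return [compiler, flag, "-O2", "-c", source, "-o", output]


# Assumed toolchain and file names, for illustration only.
arch_specific = compile_command("aarch64-linux-gnu-gcc", "kernel.c", "kernel.o")
arch_favorable = compile_command("clang", "kernel.c", "kernel.o", tune_only=True)

print(" ".join(arch_specific))
print(" ".join(arch_favorable))
```

The resulting list can be passed to `subprocess.run` on a machine where the chosen toolchain is installed.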
Architecture
The M4 is an incremental microarchitecture that brought a die shrink and minor enhancements.
Key changes from M3
- 8 nm process (from 10 nm)
- ARMv8.2 (from ARMv8)
  - Support for full FP16 scalar extension
  - Support for integer dot product extension
- Front end
  - Larger instruction queue (48 entries, up from 40)
- Back end
  - LSU reorganized
  - Floating-point execution units reorganized
Block Diagram
Individual Core
Memory Hierarchy
- Cache
  - L1I Cache
    - 64 KiB, 4-way set associative
      - 128 B line size
      - per core
    - Parity-protected
  - L1D Cache
    - 64 KiB, 8-way set associative
      - 64 B line size
      - per core
    - 4 cycles for fastest load-to-use
    - 32 B/cycle load bandwidth
    - 16 B/cycle store bandwidth
  - L2 Cache
    - 512 KiB, 8-way set associative
    - Inclusive of L1
    - 12 cycles latency
    - 32 B/cycle bandwidth
  - L3 Cache
    - 4 MiB, 16-way set associative
      - 1 MiB slice/core
    - Exclusive of L2
    - ~37-cycle typical (NUCA)
  - BIU
    - 80 outstanding transactions
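The L1 figures above determine the cache geometry: the number of sets is the capacity divided by the product of associativity and line size. The following is a minimal sketch of that arithmetic for the L1I and L1D caches only, since the list gives no line size for L2 or L3. The helper names `num_sets` and `log2_exact` are hypothetical, and the index/offset bit counts assume conventional power-of-two indexing, which the page does not confirm.

```python
KIB = 1024


def num_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache of the given geometry."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("capacity is not a multiple of ways * line size")
    return sets


def log2_exact(value):
    """Base-2 logarithm of a power of two (e.g. number of address bits)."""
    if value <= 0 or value & (value - 1):
        raise ValueError("value must be a positive power of two")
    return value.bit_length() - 1


# Figures taken from the list above.
l1i_sets = num_sets(64 * KIB, ways=4, line_bytes=128)
l1d_sets = num_sets(64 * KIB, ways=8, line_bytes=64)

# Assumption: simple power-of-two set indexing.
l1i_offset_bits, l1i_index_bits = log2_exact(128), log2_exact(l1i_sets)
l1d_offset_bits, l1d_index_bits = log2_exact(64), log2_exact(l1d_sets)

print("L1I:", l1i_sets, "sets,", l1i_offset_bits, "offset bits,", l1i_index_bits, "index bits")
print("L1D:", l1d_sets, "sets,", l1d_offset_bits, "offset bits,", l1d_index_bits, "index bits")
```

Both L1 caches come out at 128 sets: the L1I has half the ways but twice the line size of the L1D.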
The M4 TLB hierarchy consists of a dedicated L1 TLB for the instruction cache (ITLB) and another for the data cache (DTLB). Additionally, there is a unified L2 TLB (STLB).
- TLBs
  - ITLB
    - 512-entry
  - DTLB
    - 32-entry
    - 512-entry Mid-level DTLB
  - STLB
    - 4,096-entry
    - Per core
- BPU
  - 4K-entry main BTB
  - 128-entry µBTB
  - 64-entry return stack
  - 16K-entry L2 BTB
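One way to read the TLB entry counts above is as "reach", i.e. the amount of memory each level can map without a miss, which is the entry count multiplied by the page size. The sketch below assumes every entry maps a 4 KiB page. That page size is an assumption for illustration, as the page does not state which page sizes each TLB level supports, and the helper name `tlb_reach` is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

# Assumption: every entry maps one 4 KiB page.
PAGE_BYTES = 4 * KIB

# Entry counts taken from the TLB list above.
TLB_ENTRIES = {
    "ITLB": 512,
    "DTLB": 32,
    "Mid-level DTLB": 512,
    "STLB": 4096,
}


def tlb_reach(entries, page_bytes=PAGE_BYTES):
    """Bytes of address space covered when each entry maps one page."""
    return entries * page_bytes


reach = {name: tlb_reach(count) for name, count in TLB_ENTRIES.items()}

for name, size in reach.items():
    print(f"{name}: {size // KIB} KiB")
```

Under this assumption the first-level DTLB covers 128 KiB, the ITLB and the mid-level DTLB cover 2 MiB each, and the STLB covers 16 MiB. Larger pages would scale these figures proportionally.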
Core
The core of the M4 is largely the same as that of the M3.