From WikiChip
Difference between revisions of "samsung/microarchitectures/m4"
< samsung

(Individual Core)
(Core)
Line 94: Line 94:
  
 
== Core ==
 
== Core ==
The core of the M4 is largely the same as {{\\|M3}}.
+
The core of the M4 is largely the same as {{\\|M3}}. A number of buffers have been enlarged and some of the execution units have been reorganized.
  
 
=== Execution engine ===
 
=== Execution engine ===
 
==== Floating-point cluster ====
 
==== Floating-point cluster ====
{{empty section}}
+
The execution units on the M4 have been reorganized. In total, three new units were also added  - a second FP square root unit, a second vector multiplication unit, and a new horizontal vector arithmetic unit.
 +
 
 +
:[[File:m4 fp eu pipes changes.svg|thumb|left|600px|Floating-point pipe changes.]]
 +
 
 +
{{clear}}
 +
 
 
==== Memory subsystem ====
 
==== Memory subsystem ====
 
{{empty section}}
 
{{empty section}}

Revision as of 00:54, 14 January 2019

Edit Values
Mongoose 4 µarch
General Info
Arch TypeCPU
DesignerSamsung
ManufacturerSamsung
Introduction2018
Process8 nm
Instructions
ISAARMv8
Succession

Exynos Mongoose 4 (M4) is the successor to the Mongoose 3, an 8 nm ARM microarchitecture designed by Samsung for their consumer electronics.

Process Technology

The M4 is fabricated on Samsung's 8 nm process (8LPP).

Compiler support

Compiler Arch-Specific Arch-Favorable
GCC -mcpu=exynos-m4 -mtune=exynos-m4
LLVM -mcpu=exynos-m4 -mtune=exynos-m4


Architecture

The M4 is an incremental microarchitecture that brought a die shrink and minor enhancements.

Key changes from M3

  • 8 nm process (from 10 nm)
  • ARMv8.2 (from ARMv8)
    • Support for full FP16 scalar extension
    • Suppot for integer dot product extension
  • Front end
  • Back end
    • LSU reorganized
    • Floating-point execution units reorganized

This list is incomplete; you can help by expanding it.

Block Diagram

Individual Core

mongoose 4 block diagram.svg

Memory Hierarchy

  • Cache
    • L1I Cache
      • 64 KiB, 4-way set associative
        • 128 B line size
        • per core
      • Parity-protected
    • L1D Cache
      • 64 KiB, 8-way set associative
        • 64 B line size
        • per core
      • 4 cycles for fastest load-to-use
      • 32 B/cycle load bandwidth
      • 16 B/cycle store bandwidth
    • L2 Cache
      • 512 KiB, 8-way set associative
      • Inclusive of L1
      • 12 cycles latency
      • 32 B/cycle bandwidth
    • L3 Cache
      • 4 MiB, 16-way set associative
        • 1 MiB slice/core
      • Exlusive of L2
      • ~37-cycle typical (NUCA)
    • BIU
      • 80 outstanding transactions

Mongoose 1 TLB consists of dedicated L1 TLB for instruction cache (ITLB) and another one for data cache (DTLB). Additionally, there is a unified L2 TLB (STLB).

  • TLBs
    • ITLB
      • 512-entry
    • DTLB
      • 32-entry
      • 512-entry Mid-level DTLB
    • STLB
      • 4,096-entry
      • Per core
  • BPU
    • 4K-entry main BTB
    • 128-entry µBTB
    • 64-entry return stack
    • 16K-entry L2 BTB

Core

The core of the M4 is largely the same as M3. A number of buffers have been enlarged and some of the execution units have been reorganized.

Execution engine

Floating-point cluster

The execution units on the M4 have been reorganized. In total, three new units were also added - a second FP square root unit, a second vector multiplication unit, and a new horizontal vector arithmetic unit.

Floating-point pipe changes.

Memory subsystem

New text document.svg This section is empty; you can help add the missing info by editing this page.

All M3 Processors

 List of M4-based Processors
 Main processorIntegrated Graphics
ModelFamilyLaunchedArchCoresFrequencyGPUFrequency
Count: 0

Bibliography

  • LLVM: lib/Target/AArch64/AArch64SchedExynosM4.td
codenameMongoose 4 +
designerSamsung +
first launched2018 +
full page namesamsung/microarchitectures/m4 +
instance ofmicroarchitecture +
instruction set architectureARMv8 +
manufacturerSamsung +
microarchitecture typeCPU +
nameMongoose 4 +
process8 nm (0.008 μm, 8.0e-6 mm) +