From WikiChip
Mongoose 1 (M1) - Microarchitectures - Samsung
< samsung
Revision as of 02:55, 1 February 2018 by David (talk | contribs) (Core Cluster Floorplan)

Edit Values
Mongoose 1 µarch
General Info
Arch TypeCPU
DesignerSamsung
ManufacturerSamsung
Process14 nm
Pipeline
OoOEYes
SpeculativeYes
Reg RenamingYes
Decode4-way
Instructions
ISAARMv8
Cache
L1I Cache64 KiB/core
4-way set associative
L1D Cache32 KiB/core
8-way set associative
L2 Cache2 MiB/cluster
16-way set associative
Succession

Mongoose 1 (M1) is an ARM microarchitecture designed by Samsung for their consumer electronics. This was Samsung's first in-house developed high-performance low-power ARM microarchitecture.

History

The Mongoose 1 (M1) microarchitecture was Samsung's first in-house design which was done entirely from scratch. A design team was assembled and in roughly 3 years, they've gone from requirements to tape-out. The design was done at Samsung's Austin R&D Center (SARC) which was founded in 2010 for the sole purpose of developing high-performance, low-power, complex CPU and System IPs. A large portion of the design team consists of many ex-AMD Austin engineers as well as ex-IBMers.

Process Technology

M1 was fabricated on Samsung's 14 nm process.

Architecture

The M1 is Samsung's first in-house design from scratch.

  • ARM v8.0
  • 2.6 GHz clock frequency
    • 2.3 GHz for multi-core workloads
  • Sub 3-watt/core
  • 14 nm process (FinFET)
  • Core
    • Advanced branch predictor
    • 4-way instruction decode
      • Most instructions map to a single µOP, with a few exceptions
    • 4-way µOP dispatch and retire
    • Out-of-order execution
      • Out-of-order load and stores
    • Multistride/multistream prefetcher
    • Low-latency and low-power caches

Block Diagram

Entire SoC Overview

mongoose 1 soc block diagram.svg

Individual Core

mongoose 1 block diagram.svg

Memory Hierarchy

  • Cache
    • L1I Cache
      • 64 KiB, 4-way set associative
        • 128 B line size
        • per core
    • L1D Cache
      • 32 KiB, 8-way set associative
        • 64 B line size
        • per core
      • 4 cycles for fastest load-to-use
      • 16 B/cycle load bandwidth
      • 16 B/cycle store bandwidth
    • L2 Cache
      • 2 MiB, 16-way set associative
        • 4x banks (512 KiB each)
      • Inclusive of L1
      • 22 cycles latency
      • 16 B/cycle/CPU bandwidth

Mongoose 1 TLB consists of dedicated L1 TLB for instruction cache (ITLB) and another one for data cache (DTLB). Additionally there is a unified L2 TLB (STLB).

  • TLBs
    • ITLB
      • 256-entry
    • DTLB
      • 32-entry
    • STLB
      • 1,024-entry

Overview

New text document.svg This section is empty; you can help add the missing info by editing this page.

Core

New text document.svg This section is empty; you can help add the missing info by editing this page.

Die

Core Floorplan

mongoose 1 core floorplan.png

Core Cluster Floorplan

mongoose 1 core cluster.png


mongoose 1 core cluster (annotated).png
codenameMongoose 1 +
designerSamsung +
full page namesamsung/microarchitectures/m1 +
instance ofmicroarchitecture +
instruction set architectureARMv8 +
manufacturerSamsung +
microarchitecture typeCPU +
nameMongoose 1 +
process14 nm (0.014 μm, 1.4e-5 mm) +