Latest revision |
Your text |
Line 1: |
Line 1: |
− | {{samsung title|Exynos M4|arch}} | + | {{samsung title|Mongoose 4 (M4)|arch}} |
| {{microarchitecture | | {{microarchitecture |
| |atype=CPU | | |atype=CPU |
− | |name=Cheetah | + | |name=Mongoose 4 |
| |designer=Samsung | | |designer=Samsung |
| |manufacturer=Samsung | | |manufacturer=Samsung |
− | |introduction=2019 | + | |introduction=2018 |
| |process=8 nm | | |process=8 nm |
− | |cores=4
| + | |isa=ARMv8 |
− | |type=Superscalar
| |
− | |type 2=Superpipeline
| |
− | |oooe=Yes
| |
− | |speculative=Yes
| |
− | |renaming=Yes
| |
− | |stages=16
| |
− | |decode=6-way
| |
− | |isa=ARMv8.2 | |
− | |l1i=64 KiB
| |
− | |l1i per=core
| |
− | |l1i desc=4-way set associative
| |
− | |l1d=64 KiB
| |
− | |l1d per=core
| |
− | |l1d desc=8-way set associative
| |
− | |l2=512 KiB
| |
− | |l2 per=core
| |
− | |l2 desc=8-way set associative
| |
− | |l3=2 MiB
| |
− | |l3 per=cluster
| |
− | |l3 desc=16-way set associative
| |
| |predecessor=M3 | | |predecessor=M3 |
| |predecessor link=samsung/microarchitectures/m3 | | |predecessor link=samsung/microarchitectures/m3 |
Line 33: |
Line 13: |
| |successor link=samsung/microarchitectures/m5 | | |successor link=samsung/microarchitectures/m5 |
| }} | | }} |
− | '''Exynos M4''' ('''Cheetah''') is the successor to the {{\\|M3}}, an [[8 nm]] [[ARM]] microarchitecture designed by [[Samsung]] for their consumer electronics. | + | '''Mongoose 4''' ('''M4''') is the successor to the {{\\|Mongoose 3}}, an [[8 nm]] [[ARM]] microarchitecture designed by [[Samsung]] for their consumer electronics. |
| | | |
| == Process Technology == | | == Process Technology == |
− | The M4 is fabricated on Samsung's [[8 nm process]] (8LPP). | + | The M4 is fabricated on Samsung's [[8 nm process]]. |
| | | |
| == Compiler support == | | == Compiler support == |
Line 50: |
Line 30: |
| | | |
| == Architecture == | | == Architecture == |
− | The M4 is an incremental microarchitecture that brought a die shrink and minor enhancements.
| + | {{empty section}} |
− | | |
| === Key changes from {{\\|Mongoose 3|M3}} === | | === Key changes from {{\\|Mongoose 3|M3}} === |
− | * [[8 nm process]] (from [[10 nm]])
| + | {{empty section}} |
− | * [[ARMv8.2]] (from [[ARMv8]])
| |
− | ** Support for full FP16 scalar extension
| |
− | ** Support for integer dot product extension
| |
− | * Front end
| |
− | ** Larger [[instruction queue]] (48 entries, up from 40)
| |
− | * Back end
| |
− | ** LSU execution units reorganized
| |
− | ** Floating-point execution units reorganized
| |
− | {{expand list}} | |
− | | |
− | === Block Diagram ===
| |
− | ==== Individual Core ====
| |
− | | |
− | [[File:mongoose 4 block diagram.svg|900px]]
| |
− | | |
− | === Memory Hierarchy ===
| |
− | * Cache
| |
− | ** L1I Caches
| |
− | *** 64 KiB, 4-way set associative
| |
− | **** 128 B line size
| |
− | **** per core
| |
− | *** Parity-protected
| |
− | ** L1D Cache
| |
− | *** 64 KiB, 8-way set associative
| |
− | **** 64 B line size
| |
− | **** per core
| |
− | *** 4 cycles for fastest load-to-use
| |
− | *** 32 B/cycle load bandwidth
| |
− | *** 16 B/cycle store bandwidth
| |
− | ** L2 Cache
| |
− | *** 512 KiB, 8-way set associative
| |
− | *** Inclusive of L1
| |
− | *** 12 cycles latency
| |
− | *** 32 B/cycle bandwidth
| |
− | ** L3 Cache
| |
− | *** 2 MiB, 16-way set associative
| |
− | **** 1 MiB slice/core
| |
− | *** Exclusive of L2
| |
− | *** ~37-cycle typical (NUCA)
| |
− | ** BIU
| |
− | *** 80 outstanding transactions
| |
− | | |
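The L1 geometries listed above imply a set count for each cache. As a minimal sketch (the helper name `cache_geometry` is hypothetical, and a plain power-of-two set-index scheme is assumed rather than confirmed for the M4), the arithmetic can be checked as follows:

```python
KIB = 1024

def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes a conventional layout: sets = capacity / (ways * line size),
    with power-of-two sets and line size. This is an illustration of the
    listed figures, not a statement about the M4's actual indexing.
    """
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("capacity is not a multiple of ways * line size")
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    return sets, offset_bits, index_bits

# Figures taken from the list above.
l1i = cache_geometry(64 * KIB, ways=4, line_bytes=128)
l1d = cache_geometry(64 * KIB, ways=8, line_bytes=64)
print("L1I:", l1i)
print("L1D:", l1d)
```

Both L1 caches come out at the same number of sets: the L1I has half the ways but twice the line size of the L1D. The L2 and L3 are left out because their line sizes are not listed.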
− | The M4 TLB consists of a dedicated L1 TLB for the instruction cache (ITLB) and another for the data cache (DTLB). Additionally, there is a unified L2 TLB (STLB).
| |
− | | |
− | * TLBs
| |
− | ** ITLB
| |
− | *** 512-entry
| |
− | ** DTLB
| |
− | *** 32-entry
| |
− | *** 512-entry Mid-level DTLB
| |
− | ** STLB
| |
− | *** 4,096-entry
| |
− | *** Per core
| |
− | | |
− | * BPU
| |
− | ** 4K-entry main BTB
| |
− | ** 128-entry µBTB
| |
− | ** 64-entry return stack
| |
− | ** 16K-entry L2 BTB
| |
− | | |
− | == Core ==
| |
− | The core of the M4 is largely the same as that of the {{\\|M3}}. A number of buffers have been enlarged and some of the execution units have been reorganized.
| |
− | | |
− | === Execution engine ===
| |
− | ==== Floating-point cluster ====
| |
− | The floating-point execution units on the M4 have been reorganized, and three new units were added: a second FP square root unit, a second vector multiplication unit, and a new horizontal vector arithmetic unit.
| |
− | | |
− | :[[File:m4 fp eu pipes changes.svg|thumb|left|600px|Floating-point pipe changes.]]
| |
− | | |
− | {{clear}}
| |
− | | |
− | ==== Memory subsystem ====
| |
− | [[File:m4 data cache.svg|thumb|left]]
| |
− | Samsung also enhanced the M4 memory subsystem. The M3 had three AGUs: two dedicated load [[AGU]]s and a single dedicated store [[AGU]]. In the M4, Samsung changed one of the dedicated load [[AGU]]s into a generic AGU capable of handling both loads and stores. In other words, the M4 can now schedule load µOPs on two ports and store µOPs on two ports.
| |
− | | |
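The effect of the generic AGU can be illustrated with a toy one-cycle issue model. This is a rough sketch under stated assumptions: the port lists and the function `issue_per_cycle` are hypothetical, each AGU is assumed to accept one µOP per cycle, and cache bandwidth, dependencies, and scheduler details are ignored.

```python
# Hypothetical port descriptions derived from the paragraph above.
M3_PORTS = [{"load"}, {"load"}, {"store"}]
M4_PORTS = [{"load"}, {"load", "store"}, {"store"}]

def issue_per_cycle(ports, loads, stores):
    """Return (loads_issued, stores_issued) for a single cycle.

    Dedicated ports are filled first; a generic port then takes
    whichever kind of µOP is still waiting (loads preferred).
    """
    issued_loads = issued_stores = 0
    for caps in sorted(ports, key=len):  # single-purpose ports first
        if "load" in caps and issued_loads < loads:
            issued_loads += 1
        elif "store" in caps and issued_stores < stores:
            issued_stores += 1
    return issued_loads, issued_stores

# Two stores ready, no loads: the M3 model issues one, the M4 model two.
print(issue_per_cycle(M3_PORTS, loads=0, stores=2))
print(issue_per_cycle(M4_PORTS, loads=0, stores=2))
```

In this model the peak load rate is unchanged at two per cycle, while the peak store rate doubles from one to two when no more than one load competes for the ports.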
− | {{clear}}
| |
− | | |
− | == All M4 Processors ==
| |
− | <!-- NOTE:
| |
− | This table is generated automatically from the data in the actual articles.
| |
− | If a microprocessor is missing from the list, an appropriate article for it needs to be
| |
− | created and tagged accordingly.
| |
− | | |
− | Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips
| |
− | -->
| |
− | {{comp table start}}
| |
− | <table class="comptable sortable tc5 tc6 tc7">
| |
− | {{comp table header|main|12:List of M4-based Processors}}
| |
− | {{comp table header|main|5:Main processor|2:Integrated Graphics|{{abbr|TDP}}|2:TDP down|2:TDP up}}
| |
− | {{comp table header|cols|Family|Launched|Arch|Cores|%Frequency|GPU|%Frequency|P|P|Frequ.|P|Frequ.}}
| |
− | {{#ask: [[Category:microprocessor models by samsung]] [[microarchitecture::M4]]
| |
− | |?full page name
| |
− | |?model number
| |
− | |?family
| |
− | |?first launched
| |
− | |?microarchitecture
| |
− | |?core count
| |
− | |?base frequency#GHz
| |
− | |?integrated gpu
| |
− | |?integrated gpu base frequency
| |
− | |?tdp
| |
− | |?tdp down
| |
− | |?tdp down frequency#GHz
| |
− | |?tdp up
| |
− | |?tdp up frequency#GHz
| |
− | |format=template
| |
− | |template=proc table 3
| |
− | |userparam=14
| |
− | |mainlabel=-
| |
− | |valuesep=,
| |
− | }}
| |
− | {{comp table count|ask=[[Category:microprocessor models by samsung]] [[microarchitecture::M4]]}}
| |
− | </table>
| |
− | {{comp table end}}
| |
− | | |
− | == Bibliography ==
| |
− | * LLVM: lib/Target/AArch64/AArch64SchedExynosM4.td
| |