From WikiChip
Editing samsung/microarchitectures/m3
Latest revision | Your text | ||
Line 2: | Line 2: | ||
{{microarchitecture | {{microarchitecture | ||
|atype=CPU | |atype=CPU | ||
− | |name= | + | |name=Mongoose 3 |
|designer=Samsung | |designer=Samsung | ||
|manufacturer=Samsung | |manufacturer=Samsung | ||
Line 33: | Line 33: | ||
|successor link=samsung/microarchitectures/m4 | |successor link=samsung/microarchitectures/m4 | ||
}} | }} | ||
− | '''Exynos | + | '''Exynos Mongoose 3''' ('''M3''') is the successor to the {{\\|Mongoose 2}}, a [[10 nm]] [[ARM]] microarchitecture designed by [[Samsung]] for their consumer electronics. |
== History == | == History == | ||
Line 114: | Line 114: | ||
=== Memory Hierarchy === | === Memory Hierarchy === | ||
* Cache | * Cache | ||
− | ** L1I | + | ** L1I Cache |
*** 64 KiB, 4-way set associative | *** 64 KiB, 4-way set associative | ||
**** 128 B line size | **** 128 B line size | ||
Line 236: | Line 236: | ||
In the prior generations, the M1 was capable of a single 128-bit load each cycle and a single 128-bit store each cycle. The M3 supports two 128-bit loads each cycle and one 128-bit store per cycle. Note that both operations can be done in the same cycle. With the [[floating-point]] stores in [[ARM]], the M3 can match load and store bandwidth in many copy scenarios. Despite doubling the cache size, the level 1 data cache still maintains a 4-cycle load latency and can support 12 outstanding misses to the [[L2]] hierarchy. Additionally, the M3 LSU schedulers are larger and the store buffer was doubled in capacity. | In the prior generations, the M1 was capable of a single 128-bit load each cycle and a single 128-bit store each cycle. The M3 supports two 128-bit loads each cycle and one 128-bit store per cycle. Note that both operations can be done in the same cycle. With the [[floating-point]] stores in [[ARM]], the M3 can match load and store bandwidth in many copy scenarios. Despite doubling the cache size, the level 1 data cache still maintains a 4-cycle load latency and can support 12 outstanding misses to the [[L2]] hierarchy. Additionally, the M3 LSU schedulers are larger and the store buffer was doubled in capacity. | ||
− | The M3 has a [[multi-stride]] prefetcher which allows it to detect patterns and start the fetching request ahead of execution. There are also some stream/copy optimizations which accelerate certain observable traffic patterns. In the M3, Samsung added new stream and copy optimizations | + | The M3 has a [[multi-stride]] prefetcher which allows it to detect patterns and start the fetching request ahead of execution. There are also some stream/copy optimizations which accelerate certain observable traffic patterns. In the M3, Samsung added new stream and copy optimizations. They also added new patterns to the prefetcher in order to address additional scenarios. |
As with the M1, the [[dTLB]] still has 32 entries, which remains considerably smaller than the 512-entry [[iTLB]]. The reason remains similar to that of the M1: the front-end is designed with a lot more room in mind as far as handling a larger TLB capacity natively in its pipeline. It's also physically laid out much further on the floor plan. This allows the L2 TLB to service the dTLB more aggressively. The TLB on the M3 has been slightly remapped. In addition to the 32-entry primary dTLB, there is a new mid-level dTLB that's 512 entries deep. The second-level STLB has also quadrupled in size from 1K in the prior generation to 4K on the M3. | As with the M1, the [[dTLB]] still has 32 entries, which remains considerably smaller than the 512-entry [[iTLB]]. The reason remains similar to that of the M1: the front-end is designed with a lot more room in mind as far as handling a larger TLB capacity natively in its pipeline. It's also physically laid out much further on the floor plan. This allows the L2 TLB to service the dTLB more aggressively. The TLB on the M3 has been slightly remapped. In addition to the 32-entry primary dTLB, there is a new mid-level dTLB that's 512 entries deep. The second-level STLB has also quadrupled in size from 1K in the prior generation to 4K on the M3. | ||
Line 305: | Line 305: | ||
== Bibliography == | == Bibliography == | ||
− | * {{ | + | * {{hcbib|30}} |
* LLVM: lib/Target/AArch64/AArch64SchedExynosM3.td | * LLVM: lib/Target/AArch64/AArch64SchedExynosM3.td |
Facts about "Exynos M3 - Microarchitectures - Samsung"
codename | Meerkat
core count | 4
designer | Samsung
first launched | 2018
full page name | samsung/microarchitectures/m3
instance of | microarchitecture
instruction set architecture | ARMv8
manufacturer | Samsung
microarchitecture type | CPU
name | Meerkat
pipeline stages | 16
process | 10 nm (0.01 μm, 1.0e-5 mm)