== Architecture ==
=== Block diagram ===
{{empty section}}

== Overview ==
The MLP microarchitecture is sold as [[synthesizable]] [[RTL]] IP in various SKUs under the {{armh|Ethos|Ethos family}}. Those SKUs come preconfigured with a fixed number of compute engines (CEs) and cache sizes designed to meet certain performance levels and power envelopes.
The MLP is fully statically scheduled: the compiler takes a given [[neural network]] and maps it to a command stream. The command stream includes the necessary [[DMA]] operations, such as block fetches, along with the accompanying compute operations. At a very high level, the MLP itself comprises a [[DMA engine]], a network control unit (NCU), and a configurable number of compute engines (CEs). At runtime, the host processor loads the command stream onto the control unit, which parses the stream and executes the operations by controlling the various functional blocks. The DMA engine talks to external memory and is aware of the various supported neural network data layouts, allowing it to handle strides and other predictable NN memory-access patterns and to fetch the data for compute ahead of time. The compute engines are the workhorse component of the system. Each CE comprises two functional blocks: the MAC Compute Engine (MCE) and the Programmable Layer Engine (PLE). The MCE performs fixed-function multiply-accumulate operations efficiently on 8-bit integers, while the PLE offers a more flexible programmable processor that supports vector operations and can implement more complex or less common operations. This design relies on careful co-design with the compiler, but it yields simpler hardware and more deterministic performance characteristics.
== Compute Engine (CE) ==

=== Programmable Layer Engine (PLE) ===
{{empty section}}

== Die ==
== Bibliography ==
* {{bib|hc|30|Arm}}

== See also ==
* {{armh|Ethos}}
* {{intel|Spring Hill|l=arch}}
== Facts ==
* codename: MLP
* designer: Arm Holdings
* first launched: 2018
* instance of: microarchitecture
* manufacturer: TSMC, Samsung, UMC
* name: MLP
* process: 16 nm, 7 nm
* processing element count: 4, 8, 12, 16