Revision as of 18:35, 17 March 2019
| Mars I µarch | |
| General Info | |
| Arch Type | CPU | 
| Designer | Phytium | 
| Manufacturer | TSMC | 
| Introduction | 2017 | 
| Process | 28 nm | 
| Core Configs | 64 | 
| Pipeline | |
| Type | Superscalar, Pipelined | 
| OoOE | Yes | 
| Speculative | Yes | 
| Reg Renaming | Yes | 
| Instructions | |
| ISA | ARMv8 | 
| Succession | |
Mars I is the first many-core ARM SoC microarchitecture designed by Phytium Technology for the Chinese server market.
Process technology
Mars I is designed for TSMC's 28 nm process.
Architecture
- 64 ARM cores
- 28 nm process
- 2 GHz
- FTC-661/0 Xiaomi core
-  System memory
- DDR3
- 1600 MT/s
 
Block diagram
Entire SoC
Panel
Core
- Main article: Xiaomi Core
See Xiaomi Core.
Panel Architecture
Phytium organizes its processors as a grid of units called panels, an approach it calls a panel-based data affinity architecture. Each panel consists of 8 independent ARMv8-compatible cores. The Phytium "Mars" processor consists of 8 such panels for a total of 64 cores. Panels are interconnected with a 2-dimensional mesh network-on-chip. Each panel has 4 MiB of level 2 cache, for a total of 32 MiB.
In addition to the main die, Mars uses auxiliary Cache & Memory Chips (CMCs). "Mars" uses 8 such chips connected to the main die, each providing 16 MiB of level 3 cache for a total of 128 MiB, as well as 8 dual-channel DDR3-1600 memory controllers for a total peak bandwidth of 204.8 GB/s. Mars also provides two 16-lane PCIe 3.0 interfaces. The chips incorporate ECC and parity protection on all caches, tags, and TLBs.
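The totals quoted above follow from the per-panel and per-CMC figures. The sketch below recomputes them. The constant names are illustrative, and the 8-byte transfer width is an assumption (a standard 64-bit DDR3 channel) that the text does not state. Under that assumption the peak bandwidth works out to 204.8 GB/s in decimal gigabytes, which corresponds to the roughly 204 figure quoted for the memory system.

```python
# Figures taken from the text above.
PANELS = 8
CORES_PER_PANEL = 8
L2_MIB_PER_PANEL = 4
CMC_CHIPS = 8
L3_MIB_PER_CMC = 16
DDR_CHANNELS_PER_CMC = 2            # one dual-channel controller per CMC
TRANSFERS_PER_SECOND = 1_600_000_000  # DDR3-1600 = 1600 MT/s

# Assumption (not stated in the text): each channel is 64 bits wide.
BYTES_PER_TRANSFER = 8

total_cores = PANELS * CORES_PER_PANEL
total_l2_mib = PANELS * L2_MIB_PER_PANEL
total_l3_mib = CMC_CHIPS * L3_MIB_PER_CMC
total_channels = CMC_CHIPS * DDR_CHANNELS_PER_CMC

# Peak bandwidth in bytes per second, then in decimal GB/s and binary GiB/s.
peak_bytes_per_second = total_channels * TRANSFERS_PER_SECOND * BYTES_PER_TRANSFER
peak_gb_per_second = peak_bytes_per_second / 1e9
peak_gib_per_second = peak_bytes_per_second / 2**30

print(f"cores: {total_cores}, L2: {total_l2_mib} MiB, L3: {total_l3_mib} MiB")
print(f"DDR3 channels: {total_channels}")
print(f"peak bandwidth: {peak_gb_per_second:.1f} GB/s ({peak_gib_per_second:.1f} GiB/s)")
```

The same byte count is smaller when expressed in binary GiB/s, so the choice of unit matters when comparing against other parts.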
Panel
Each Panel consists of 8 cores - each ARMv8-compatible, supporting AArch32 and AArch64 modes, Exception Levels EL0-EL3, as well as ASIMD-128 operations. Each core has its own inclusive L1 cache and a shared L2 cache (4 MiB per panel). Each panel contains two Directory Control Units (DCU) which are in charge of maintaining directory-based cache coherency and one routing cell for managing the inter-panel communication.
On TSMC's 28 nm process, a panel is 6,000 µm x 10,600 µm (63.6 mm²).
Cache & Memory Chip (CMC)
To solve the complexity involved in having more than eight memory controllers on a chip, Xiaomi uses a coupled auxiliary Cache & Memory Chip (CMC) to scale the bandwidth with computing power. In the case of the Phytium "Mars" chip, which contains 64 cores on 8 panels, eight CMC chips are used. Together they provide 16 DDR3 controllers (8x2), and each chip carries 16 MiB of L3 data cache and 2 MiB of data ECC. A proprietary Phytium interface is used between the processor and the CMC chips.
- Panel & NoC operate @ 2 GHz
- CMC operates @ 1.5 GHz
Interconnects & Hawk
Hawk is Phytium's cache coherence protocol, which implements distributed directory-based global cache coherency across all the panels. Hawk is a MOESI-like package-based protocol. The network has a node on each panel called a Directory Control Unit (DCU), which is responsible for interfacing between the L2 caches in each panel and the CMCs (see § Panel Architecture). Phytium noted that the protocol is optimized for exclusive atomic accesses.
Xiaomi implements an on-die 2D concentrated mesh architecture connecting the panels. The Phytium "Mars" chip contains 8 panels organized in two rows of four panels each. Switching is relatively low latency at 3 cycles per hop. On average, a packet sees around 9 cycles of latency to any other panel. The network provides a bandwidth of 384 GiB/s per cell.
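Figure: Xiaomi 2D network.
| Destination | Latency (cycles) |
| 0 | 3 |
| 1 | 6 |
| 2 | 9 |
| 3 | 12 |
| 4 | 15 |
| 5 | 12 |
| 6 | 9 |
| 7 | 6 |
| Average | 9 |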
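One way to read the 3-cycles-per-hop and 9-cycle-average figures is sketched below. It rests on two assumptions that are not confirmed by the source: panels are numbered in a snake order around the 2x4 grid (0 to 3 along the top row, 4 to 7 back along the bottom row), and a packet always crosses one extra hop at the local routing cell. Under those assumptions, latencies measured from panel 0 come out as 3, 6, 9, 12, 15, 12, 9, 6 cycles for destinations 0 through 7, with an average of 9. The function names are hypothetical.

```python
CYCLES_PER_HOP = 3   # from the text
ROWS, COLS = 2, 4    # two rows of four panels, from the text


def panel_position(panel):
    """Return (row, col) for a panel id, assuming snake numbering (hypothetical)."""
    row, offset = divmod(panel, COLS)
    # Top row runs left to right, bottom row runs right to left.
    col = offset if row == 0 else COLS - 1 - offset
    return row, col


def hop_latency(src, dst):
    """Latency in cycles: Manhattan distance plus one assumed local hop."""
    (r1, c1), (r2, c2) = panel_position(src), panel_position(dst)
    hops = abs(r1 - r2) + abs(c1 - c2) + 1
    return hops * CYCLES_PER_HOP


latencies = [hop_latency(0, dst) for dst in range(ROWS * COLS)]
average = sum(latencies) / len(latencies)

for dst, cycles in enumerate(latencies):
    print(f"panel 0 -> panel {dst}: {cycles} cycles")
print(f"average: {average} cycles")
```

Other numberings or routing rules could produce the same average, so this is only a plausible model rather than a description of the actual router.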
Die
SoC
- Mars is fabricated on TSMC's 28 nm process
- 10 metal layers
- ~180 million instances
- 639.576 mm² die size
-  FCBGA Package
- ~3000 pins
 
- 0.9 V (VCORE), 1.8 V (VIO)
- 2 GHz, 120 W
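The die figures can be combined with the panel dimensions given in the Panel section (6,000 µm x 10,600 µm) for a rough sense of scale. The sketch below is only arithmetic on the published numbers. The derived values (panel share of the die, power per unit area) are not stated by the source, and the power density treats the 120 W figure as spread evenly over the die, which is a simplification.

```python
# Figures taken from the text.
PANEL_WIDTH_UM = 6_000
PANEL_HEIGHT_UM = 10_600
PANELS = 8
DIE_AREA_MM2 = 639.576
POWER_W = 120

# 1 mm^2 = 1,000,000 um^2
panel_area_mm2 = PANEL_WIDTH_UM * PANEL_HEIGHT_UM / 1e6
all_panels_mm2 = PANELS * panel_area_mm2

# Derived values, not given by the source.
panel_share_of_die = all_panels_mm2 / DIE_AREA_MM2
remaining_area_mm2 = DIE_AREA_MM2 - all_panels_mm2
power_density_w_per_mm2 = POWER_W / DIE_AREA_MM2

print(f"panel area: {panel_area_mm2:.1f} mm^2, all panels: {all_panels_mm2:.1f} mm^2")
print(f"panels cover {panel_share_of_die:.1%} of the die")
print(f"remaining die area: {remaining_area_mm2:.3f} mm^2")
print(f"average power density: {power_density_w_per_mm2:.3f} W/mm^2")
```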




