From WikiChip
Mars I - Microarchitectures - Phytium
< phytium
Revision as of 20:00, 16 March 2019 by David (talk | contribs)

Edit Values
Mars I µarch
General Info
Arch TypeCPU
DesignerPhytium
ManufacturerTSMC
Introduction2017
Process28 nm
Core Configs64
Pipeline
TypeSuperscalar, Pipelined
OoOEYes
SpeculativeYes
Reg RenamingYes
Instructions
ISAARMv8
Succession

Mars I is the first many-core ARM SoC microarchitecture designed by Phytium Technology for the Chinese server market.

Process technology

Mars I is designed for TSMC's 28 nm process.

Architecture

This list is incomplete; you can help by expanding it.

Block diagram

Entire SoC

mars ii soc block diagram.svg

Panel

mars ii panel block diagram.svg

Core

Main article: Xiaomi Core

See Xiaomi Core.

Panel Architecture

xiaomi panel-based data affinity architecture.png

Phytium organizes their processors using a grid-layout they call Panels they call Panel-based data affinity architecture. Each panel consists of 8 independent ARMv8-compatible cores. Phytium "Mars" processor consists of 8 such panels for a total of 64 cores. Panels are interconnected with a 2-dimensional mesh network-on-a-chip level 2 cache with 4 MiB per panel for a total of 32 MiB.

In addition to the main die, Mars uses an additional Cache & Memory chips (CMC) auxiliary chips. "Mars" uses 8 such chips connected to the main die providing 16 MiB of level 3 cache for a total of 128 MiB as well as 8 dual-channel DDR3-1600 memory controllers for a total maximum bandwidth of 204 GiB/s. Mars also provides two 16-lane PCIe 3.0 interfaces. The chips incorporates ECC and parity protection on all caches, tags, and TLBs.

Panel

Each Panel consists of 8 cores - each ARMv8-compatible, supporting AArch32 and AArch64 modes, Exception Levels EL0-EL3, as well as ASIMD-128 operations. Each core has its own inclusive L1 cache and a shared L2 cache (4 MiB per panel). Each panel contains two Directory Control Units (DCU) which are in charge of maintaining directory-based cache coherency and one routing cell for managing the inter-panel communication.

On TSMC's 28 nm process, a panel is 6,000 µm x 10,600 µm (63.6 mm²).

xiaomi panel.png   xiaomi panel die (28nm).png

Cache & Memory Chip (CMC)

xiaomi cmc.png

The solve the complexity involved in having more than eight memory controllers on a chip, Xiaomi uses a coupled auxiliary Cache & Memory Chip (CMC) to scale the bandwidth with computing power. In the case of Phytium "Mars" chip which contains 64 cores on 8 panels, eight CMC chips are used which provides 16 DDR3 controllers (8x2) along with 16 MiB of data L3 cache and 2 MiB of data ECC. Phytium proprietary interface is used between the processor and the CMC chip.

xiaomi latency.png
Memory access Latency(ns)
Local L1 cache hit ~2
Local L2 cache hit ~8
Affinitive L2 cache hit ~20
Affinitive L3 cache hit ~36
Affinitive DDR access ~70
  • Panel & NoC operates @ 2 GHz
  • CMC operates @ 1.5 GHz

Die

SoC

  • Mars is fabricated on TSMC's 28 nm process
  • 10 metal layers
  • ~180 million instances
  • 639.576 mm² die size
  • FCBGA Package
    • ~3000 pins
  • 0.9 VCORE, 1.8 VIO
  • 2 GHz, 120 W
xiaomi floor plan.png
codenameMars I +
core count64 +
designerPhytium +
first launched2017 +
full page namephytium/microarchitectures/mars i +
instance ofmicroarchitecture +
instruction set architectureARMv8 +
manufacturerTSMC +
microarchitecture typeCPU +
nameMars I +
process28 nm (0.028 μm, 2.8e-5 mm) +