From WikiChip
Difference between revisions of "phytium/microarchitectures/mars i"

(mars i)
 
 
(7 intermediate revisions by the same user not shown)
Line 42: Line 42:
 
{{main|phytium/microarchitectures/xiaomi|l1=Xiaomi Core}}
 
{{main|phytium/microarchitectures/xiaomi|l1=Xiaomi Core}}
 
See {{\\|Xiaomi|Xiaomi Core}}.
 
See {{\\|Xiaomi|Xiaomi Core}}.
 +
 +
== SoC ==
 +
 +
=== Panel Architecture ===
 +
[[File:xiaomi panel-based data affinity architecture.png|right|450px]]
 +
Phytium organizes its processors in a grid layout of '''Panels''', which it calls the '''Panel-based data affinity architecture'''. Each panel consists of 8 independent [[ARMv8]]-compatible cores. The Phytium "Mars" processor consists of 8 such panels for a total of [[64 cores]]. Panels are interconnected by a 2-dimensional mesh network-on-a-chip. Each panel has 4 MiB of [[level 2 cache]] for a total of 32 MiB.
 +
 +
In addition to the main die, Mars uses auxiliary '''Cache & Memory Chips''' ('''CMC'''). "Mars" uses 8 such chips connected to the main die, each providing 16 MiB of [[level 3 cache]] for a total of 128 MiB, as well as 8 dual-channel DDR3-1600 [[memory controller]]s (16 channels) for a total peak bandwidth of about 204.8 GB/s. Mars also provides two 16-lane PCIe 3.0 interfaces. The chip incorporates ECC and parity protection on all caches, tags, and TLBs.
 +
 +
==== Panel ====
 +
Each panel consists of 8 cores, each [[ARMv8]]-compatible and supporting AArch32 and AArch64 modes, Exception Levels EL0-EL3, and ASIMD-128 operations. Each core has its own inclusive [[L1 cache]], and the cores of a panel share an [[L2 cache]] (4 MiB per panel). Each panel contains two '''Directory Control Units''' ('''DCU'''), which maintain directory-based [[cache coherency]], and one routing cell, which manages inter-panel communication.
 +
 +
On TSMC's [[28 nm process]], a panel is 6,000 µm x 10,600 µm (63.6 mm²).
 +
 +
{| style="border-spacing: 15px;"
 +
| [[File:xiaomi panel.png|400px]] || &nbsp; || [[File:xiaomi panel die (28nm).png|300px]]
 +
|}
 +
 +
==== Cache & Memory Chip (CMC) ====
 +
[[File:xiaomi cmc.png|right|300px]]
 +
To solve the complexity involved in having more than eight memory controllers on a chip, Xiaomi uses a coupled auxiliary '''Cache & Memory Chip''' ('''CMC''') to scale the bandwidth with computing power. The Phytium "Mars" chip, which contains 64 cores on 8 panels, uses eight CMC chips. Together they provide 16 DDR3 controllers (8x2), and each CMC holds 16 MiB of L3 data cache and 2 MiB of data ECC. A Phytium proprietary interface is used between the processor and the CMC chips.
 +
 +
{|
 +
| [[File:xiaomi latency.png|600px]] ||
 +
{| class="wikitable"
 +
! Memory access !! Latency (ns)
 +
|-
 +
|Local L1 cache hit || ~2
 +
|-
 +
|Local L2 cache hit || ~8
 +
|-
 +
|Affinitive L2 cache hit || ~20
 +
|-
 +
|Affinitive L3 cache hit || ~36
 +
|-
 +
|Affinitive DDR access || ~70
 +
|}
 +
|}
 +
* Panel & NoC operate @ 2 GHz
 +
* CMC operates @ 1.5 GHz
 +
 +
=== Interconnects & Hawk ===
 +
'''Hawk''' is Phytium's cache coherence protocol, which implements distributed directory-based global cache coherency across all the panels. Hawk is a [[MOESI]]-like package-based protocol. The network has a node on each panel called a '''Directory Control Unit''' ('''DCU'''), which is responsible for interfacing between the L2 caches in each panel and the CMCs (see [[#Panel_Architecture|§ Panel Architecture]]). Phytium noted that the protocol is optimized for exclusive atomic accesses.
 +
 +
Xiaomi implements a 2D concentrated mesh architecture on-die connecting the panels. The Phytium "Mars" chip contains 8 panels organized in two rows of four panels each. Switching is relatively low latency at 3 cycles per hop. On average, a packet sees a latency of around 9 cycles between panels. Each cell has a bandwidth of 384 GiB/s.
 +
 +
{|
 +
| [[File:xiaomi 2d network.png|600px]] ||
 +
{| class="wikitable" style="text-align: center;"
 +
! Destination !! Latency (cycles)
 +
|-
 +
| 0 || 3
 +
|-
 +
| 1 || 6
 +
|-
 +
| 2 || 9
 +
|-
 +
| 3 || 12
 +
|-
 +
| 4 || 15
 +
|-
 +
| 5 || 12
 +
|-
 +
| 6 || 9
 +
|-
 +
| 7 || 6
 +
|-
 +
| Average || 9
 +
|}
 +
|}
  
 
== Die ==
 
== Die ==
Line 47: Line 117:
 
* Mars is fabricated on [[TSMC]]'s [[28 nm process]]
 
* Mars is fabricated on [[TSMC]]'s [[28 nm process]]
 
* 10 metal layers
 
* 10 metal layers
* ~180 million instances
+
* 4,800,000,000 transistors
 +
** ~180 million instances
 
* 639.576 mm² die size
 
* 639.576 mm² die size
 +
** 25.38 mm x 25.2 mm
 
* FCBGA Package
 
* FCBGA Package
 
** ~3000 pins
 
** ~3000 pins
Line 55: Line 127:
  
 
:[[File:xiaomi floor plan.png|class=wikichip_ogimage|700px]]
 
:[[File:xiaomi floor plan.png|class=wikichip_ogimage|700px]]
 +
 +
== All Mars I Processors ==
 +
<!-- NOTE:
 +
          This table is generated automatically from the data in the actual articles.
 +
          If a microprocessor is missing from the list, an appropriate article for it needs to be
 +
          created and tagged accordingly.
 +
 +
          Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips
 +
-->
 +
{{comp table start}}
 +
<table class="comptable sortable tc4">
 +
{{comp table header|main|6:List of Mars I-based Processors}}
 +
{{comp table header|cols|Launched|Cores|L2|%Frequency|%TDP}}
 +
{{#ask: [[Category:microprocessor models by phytium]] [[microarchitecture::Mars I]]
 +
|?full page name
 +
|?model number
 +
|?first launched
 +
|?core count
 +
|?l2$ size
 +
|?base frequency#GHz
 +
|?tdp#W
 +
|format=template
 +
|template=proc table 3
 +
|userparam=7
 +
|mainlabel=-
 +
}}
 +
{{comp table count|ask=[[Category:microprocessor models by phytium]] [[microarchitecture::Mars I]]}}
 +
</table>
 +
{{comp table end}}
 +
 +
== Bibliography ==
 +
* {{bib|hc|27|Phytium}}

Latest revision as of 15:49, 15 October 2019

Mars I µarch
General Info
Arch Type: CPU
Designer: Phytium
Manufacturer: TSMC
Introduction: 2017
Process: 28 nm
Core Configs: 64
Pipeline
Type: Superscalar, Pipelined
OoOE: Yes
Speculative: Yes
Reg Renaming: Yes
Instructions
ISA: ARMv8
Succession

Mars I is the first many-core ARM SoC microarchitecture designed by Phytium Technology for the Chinese server market.

Process technology

Mars I is designed for TSMC's 28 nm process.

Architecture

Block diagram

Entire SoC

mars ii soc block diagram.svg

Panel

mars ii panel block diagram.svg

Core

Main article: Xiaomi Core

See Xiaomi Core.

SoC

Panel Architecture

xiaomi panel-based data affinity architecture.png

Phytium organizes its processors in a grid layout of Panels, which it calls the Panel-based data affinity architecture. Each panel consists of 8 independent ARMv8-compatible cores. The Phytium "Mars" processor consists of 8 such panels for a total of 64 cores. Panels are interconnected by a 2-dimensional mesh network-on-a-chip. Each panel has 4 MiB of level 2 cache for a total of 32 MiB.

In addition to the main die, Mars uses auxiliary Cache & Memory Chips (CMC). "Mars" uses 8 such chips connected to the main die, each providing 16 MiB of level 3 cache for a total of 128 MiB, as well as 8 dual-channel DDR3-1600 memory controllers (16 channels) for a total peak bandwidth of about 204.8 GB/s. Mars also provides two 16-lane PCIe 3.0 interfaces. The chip incorporates ECC and parity protection on all caches, tags, and TLBs.
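
The totals quoted above follow from simple multiplication. The short sketch below recomputes them; the 8-byte transfer width is an assumption (a standard 64-bit DDR3 channel) that the article does not state, and all variable names are illustrative.

```python
# Recompute the headline totals from the per-panel and per-CMC figures.
PANELS = 8
CORES_PER_PANEL = 8
L2_MIB_PER_PANEL = 4
CMC_CHIPS = 8
L3_MIB_PER_CMC = 16
DDR_CHANNELS_PER_CMC = 2      # "dual-channel" controller per CMC
DDR3_1600_MT_PER_S = 1600     # mega-transfers per second
BYTES_PER_TRANSFER = 8        # assumption: 64-bit wide DDR3 channel

total_cores = PANELS * CORES_PER_PANEL
total_l2_mib = PANELS * L2_MIB_PER_PANEL
total_l3_mib = CMC_CHIPS * L3_MIB_PER_CMC
ddr_channels = CMC_CHIPS * DDR_CHANNELS_PER_CMC

# Peak bandwidth in decimal gigabytes per second (10^9 bytes).
peak_gb_per_s = ddr_channels * DDR3_1600_MT_PER_S * BYTES_PER_TRANSFER / 1000

print(total_cores, total_l2_mib, total_l3_mib, ddr_channels, peak_gb_per_s)
```

Under that assumption each channel peaks at 12.8 GB/s, so 16 channels give the roughly 204 figure quoted for the memory subsystem.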

Panel

Each panel consists of 8 cores, each ARMv8-compatible and supporting AArch32 and AArch64 modes, Exception Levels EL0-EL3, and ASIMD-128 operations. Each core has its own inclusive L1 cache, and the cores of a panel share an L2 cache (4 MiB per panel). Each panel contains two Directory Control Units (DCU), which maintain directory-based cache coherency, and one routing cell, which manages inter-panel communication.

On TSMC's 28 nm process, a panel is 6,000 µm x 10,600 µm (63.6 mm²).

xiaomi panel.png   xiaomi panel die (28nm).png

Cache & Memory Chip (CMC)

xiaomi cmc.png

To solve the complexity involved in having more than eight memory controllers on a chip, Xiaomi uses a coupled auxiliary Cache & Memory Chip (CMC) to scale the bandwidth with computing power. The Phytium "Mars" chip, which contains 64 cores on 8 panels, uses eight CMC chips. Together they provide 16 DDR3 controllers (8x2), and each CMC holds 16 MiB of L3 data cache and 2 MiB of data ECC. A Phytium proprietary interface is used between the processor and the CMC chips.

xiaomi latency.png
Memory access | Latency (ns)
Local L1 cache hit | ~2
Local L2 cache hit | ~8
Affinitive L2 cache hit | ~20
Affinitive L3 cache hit | ~36
Affinitive DDR access | ~70
  • Panel & NoC operate @ 2 GHz
  • CMC operates @ 1.5 GHz
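
As a rough aid for reading the latency table, the sketch below converts the quoted nanosecond figures into cycles of the 2 GHz panel clock. This is only a convenience conversion under the assumption that the panel clock is the reference: the table values are approximate, and L3 and DDR accesses also pass through the CMC, which runs at 1.5 GHz.

```python
# Convert the approximate latencies (ns) into 2 GHz panel-clock cycles.
PANEL_CLOCK_GHZ = 2.0  # panel and NoC clock from the list above

latency_ns = {
    "Local L1 cache hit": 2,
    "Local L2 cache hit": 8,
    "Affinitive L2 cache hit": 20,
    "Affinitive L3 cache hit": 36,
    "Affinitive DDR access": 70,
}

# cycles = time (ns) * frequency (GHz)
latency_cycles = {
    access: round(ns * PANEL_CLOCK_GHZ) for access, ns in latency_ns.items()
}

for access, cycles in latency_cycles.items():
    print(f"{access}: ~{cycles} cycles")
```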

Interconnects & Hawk

Hawk is Phytium's cache coherence protocol, which implements distributed directory-based global cache coherency across all the panels. Hawk is a MOESI-like package-based protocol. The network has a node on each panel called a Directory Control Unit (DCU), which is responsible for interfacing between the L2 caches in each panel and the CMCs (see § Panel Architecture). Phytium noted that the protocol is optimized for exclusive atomic accesses.
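
For readers unfamiliar with the term, the sketch below shows the generic textbook MOESI state transitions for a single cache line. It is not Hawk: the article does not describe Hawk's actual states, messages, or directory format, so every name here is illustrative and the behaviour is only what "MOESI-like" conventionally implies.

```python
from enum import Enum


class State(Enum):
    M = "Modified"   # dirty, only copy
    O = "Owned"      # dirty, other caches may hold Shared copies
    E = "Exclusive"  # clean, only copy
    S = "Shared"     # other caches may also hold it
    I = "Invalid"


def on_local_read(state, other_sharers):
    """A local read only changes state when the line was Invalid."""
    if state is State.I:
        return State.S if other_sharers else State.E
    return state


def on_local_write(state):
    """After other copies are invalidated, the writer holds the line Modified."""
    return State.M


def on_remote_read(state):
    """Another cache reads the line: dirty data stays owned, clean data is shared."""
    if state is State.M:
        return State.O
    if state is State.E:
        return State.S
    return state


def on_remote_write(state):
    """Another cache takes ownership for writing: our copy becomes Invalid."""
    return State.I


print(on_remote_read(State.M).value)
```

The Owned state is what distinguishes MOESI from MESI: a modified line can be shared with other readers without first being written back to memory.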

Xiaomi implements a 2D concentrated mesh architecture on-die connecting the panels. The Phytium "Mars" chip contains 8 panels organized in two rows of four panels each. Switching is relatively low latency at 3 cycles per hop. On average, a packet sees a latency of around 9 cycles between panels. Each cell has a bandwidth of 384 GiB/s.

xiaomi 2d network.png
Destination | Latency (cycles)
0 | 3
1 | 6
2 | 9
3 | 12
4 | 15
5 | 12
6 | 9
7 | 6
Average | 9
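
The latency table can be reproduced with a small model of the 2 x 4 mesh. The sketch below rests on two assumptions that are not stated in the article and were chosen because they match the table: panels are numbered in a serpentine order (0 to 3 along the first row, 4 to 7 back along the second), and a packet pays 3 cycles for the local switch plus 3 cycles per mesh hop. Function names are illustrative.

```python
# Model of hop latency on the 2 x 4 panel mesh, measured from panel 0.
ROWS, COLS = 2, 4
CYCLES_PER_HOP = 3


def panel_position(panel_id):
    """Map a panel number to (row, col), assuming serpentine numbering."""
    row, offset = divmod(panel_id, COLS)
    col = offset if row % 2 == 0 else COLS - 1 - offset
    return row, col


def latency_cycles(src, dst):
    """Assumed cost: one local switch traversal plus one per mesh hop."""
    (r1, c1), (r2, c2) = panel_position(src), panel_position(dst)
    hops = abs(r1 - r2) + abs(c1 - c2)  # Manhattan distance on the mesh
    return CYCLES_PER_HOP * (hops + 1)


latencies = [latency_cycles(0, dst) for dst in range(ROWS * COLS)]
average = sum(latencies) / len(latencies)

print(latencies, average)
```

Under these assumptions the model yields the same per-destination values and the same 9-cycle average as the table. The average applies to a corner panel as the source; other source panels would see a different mix of hop counts.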

Die

SoC

  • Mars is fabricated on TSMC's 28 nm process
  • 10 metal layers
  • 4,800,000,000 transistors
    • ~180 million instances
  • 639.576 mm² die size
    • 25.38 mm x 25.2 mm
  • FCBGA Package
    • ~3000 pins
  • 0.9 V (core), 1.8 V (I/O)
  • 2 GHz, 120 W
xiaomi floor plan.png
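
The die figures above can be cross-checked against the panel dimensions given earlier. The sketch below recomputes the die area, the area taken by the eight panels, and the average transistor density, using only numbers quoted in this article; variable names are illustrative.

```python
# Cross-check die area, panel share of the die, and transistor density.
DIE_WIDTH_MM, DIE_HEIGHT_MM = 25.38, 25.2
PANEL_WIDTH_MM, PANEL_HEIGHT_MM = 6.0, 10.6   # 6,000 um x 10,600 um
PANELS = 8
TRANSISTORS = 4_800_000_000

die_area_mm2 = DIE_WIDTH_MM * DIE_HEIGHT_MM
panel_area_mm2 = PANEL_WIDTH_MM * PANEL_HEIGHT_MM
panels_share = PANELS * panel_area_mm2 / die_area_mm2

# Average density over the whole die, in million transistors per mm^2.
density_mtr_per_mm2 = TRANSISTORS / die_area_mm2 / 1e6

print(f"die area: {die_area_mm2:.3f} mm^2")
print(f"panels: {panels_share:.1%} of the die")
print(f"density: {density_mtr_per_mm2:.2f} MTr/mm^2")
```

The eight panels account for a little under four fifths of the die, leaving the rest for the interconnect, I/O, and other logic.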

All Mars I Processors

 List of Mars I-based Processors
Model | Launched | Cores | L2 | Frequency | TDP
FT-2000/64 | 2017 | 64 | 32 MiB | 2 GHz | 120 W
Count: 1

Bibliography

  • Phytium, IEEE Hot Chips 27 Symposium (HCS) 2015.