From WikiChip
Editing cea-leti/microarchitectures/tsarlet

Latest revision Your text
Line 21: Line 21:
 
|l1d per=core
 
|l1d per=core
 
|l2=256 KiB
 
|l2=256 KiB
|l2 per=cluster
+
|l2 per=core
 
|l3=1 MiB
 
|l3=1 MiB
|l3 per=cluster
+
|l3 per=core
 
}}
 
}}
 
'''TSARLET''' was a research microarchitecture designed by [[CEA-Leti]] to demonstrate the capabilities of a large-scale, high-performance, 3D-stacked, [[chiplets]]-based SoC technology. The project comprised 96 [[MIPS]] cores built from 6 [[chiplets]] [[3D stack]]ed on an active interposer in order to demonstrate in-package silicon [[scale-out]] capabilities with superior inter-chip capabilities while reducing overall power and production cost.
  
 
== Architecture ==
 
== Architecture ==
* Multi-chip architecture
 
** 6 compute [[chiplets]]
 
*** [[28 nm]] FDSOI
 
*** 4 quad-core clusters
 
**** 5-stage scalar MIPS32v1 cores
 
** Active base die
 
*** [[65 nm]] CMOS
 
*** Per-chiplet voltage regulator and power management
 
** NoCs
 
*** 4 NoCs
 
**** 2D and 3D mesh interconnects
 
* Packaging
 
** Face-to-face 3D stacked packaging technology
 
*** 20 μm pitch μbumps
 
*** 40 μm pitch TSVs
 
{{expand list}}
 
 
== Block Diagram ==
 
=== Compute chiplet ===
 
[[File:tsarlet chiplet block.svg|600px]]
 
 
 
=== Memory Hierarchy ===
 
=== Memory Hierarchy ===
* L1 Cache
+
{{empty section}}
** L1 Instruction cache
 
*** 16 KiB/core
 
** L1 Data cache
 
*** 16 KiB/core
 
* L2 Cache
 
** 256 KiB/cluster
 
* L3 Cache
 
** 1 MiB/cluster
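
The per-core and per-cluster figures above can be rolled up into package-level totals. The following is a minimal sketch, assuming the L2 and L3 sizes are counted per cluster as in the list above; the constant and function names are illustrative and do not come from Leti.

```python
# Package-level totals implied by the figures listed on this page.
# Assumption: L2 and L3 capacities are per cluster, as in the list above.
CHIPLETS = 6
CLUSTERS_PER_CHIPLET = 4
CORES_PER_CLUSTER = 4  # "quad-core clusters"

KIB = 1024
MIB = 1024 * KIB

L1I_PER_CORE = 16 * KIB
L1D_PER_CORE = 16 * KIB
L2_PER_CLUSTER = 256 * KIB
L3_PER_CLUSTER = 1 * MIB


def totals():
    """Return cluster/core counts and total cache capacity in bytes."""
    clusters = CHIPLETS * CLUSTERS_PER_CHIPLET
    cores = clusters * CORES_PER_CLUSTER
    return {
        "clusters": clusters,
        "cores": cores,
        "l1_bytes": cores * (L1I_PER_CORE + L1D_PER_CORE),
        "l2_bytes": clusters * L2_PER_CLUSTER,
        "l3_bytes": clusters * L3_PER_CLUSTER,
    }


if __name__ == "__main__":
    t = totals()
    print(f"{t['cores']} cores in {t['clusters']} clusters")
    for level in ("l1", "l2", "l3"):
        print(f"{level.upper()}: {t[level + '_bytes'] / MIB:g} MiB")
```

The core count reproduces the 96 cores quoted in the introduction; the cache totals hold only under the per-cluster assumption stated above.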
 
  
 
== Overview ==
 
== Overview ==
Line 70: Line 41:
  
 
There are three individual 2D [[mesh interconnect|mesh]] [[NoCs]]. A dedicated 2D mesh connects the L1 caches to the L2 caches, another 2D mesh connects the L2 caches to the L3 caches, and a third 2D mesh connects the L3 caches to the external memory. All three NoCs are extended from the chiplet through the interposer to the other chiplets.
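
To give a feel for how traffic crosses a 2D mesh such as the ones described above, here is a minimal sketch of dimension-order (XY) routing. The routing algorithm is an assumption chosen for illustration: the source only states that the NoCs are 2D meshes, and the coordinates and function names below are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh


def hop_count(src: Coord, dst: Coord) -> int:
    """Minimal number of hops between two routers in a 2D mesh."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Routers visited when travelling along X first, then along Y.

    Assumption: dimension-order routing; the real NoCs may route differently.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    route = xy_route((0, 0), (2, 1))
    print(route, "hops:", len(route) - 1)
```

Any minimal route in a mesh has the same hop count, so `hop_count` holds regardless of the routing assumption.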
 
=== Cache Coherency ===
 
Each core implements a 32-bit virtual address space that is mapped onto a 40-bit physical address space, which is physically distributed among the L2 caches. TSARLET is a [[NUMA]] architecture in which the 8 most significant bits of the physical address select the cluster. The L3 cache is shared by all the cores and clusters, with more demanding workloads allocating larger portions of it.

Cache coherency for the L1 caches and I/O is maintained by the L2 caches using a directory-based coherency protocol with a list-based directory. Each cache line is tracked in either list mode or counter mode. In list mode, up to four sharers may share the line; the sharer IDs are stored in a linked list, with subsequent sharer IDs stored in the heap. On a modification, a multicast update/invalidate message is issued to all the sharers. A line is in counter mode when the heap is full or all four sharer slots are occupied. In this mode only the sharer count is stored and broadcast invalidates are issued. Hardware support for broadcast allows only the actual sharers to answer.
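
The address split and the two directory modes described above can be sketched as follows. This is an illustrative model, not Leti's implementation: the class and function names are hypothetical, the heap is reduced to a `heap_full` flag, and the sketch assumes the switch to counter mode happens when a new sharer arrives while four IDs are already listed.

```python
PADDR_BITS = 40    # physical address width
CLUSTER_BITS = 8   # most significant bits identify the cluster
OFFSET_BITS = PADDR_BITS - CLUSTER_BITS
MAX_LISTED_SHARERS = 4


def cluster_of(paddr: int) -> int:
    """Cluster ID encoded in the 8 most significant physical address bits."""
    if not 0 <= paddr < (1 << PADDR_BITS):
        raise ValueError("address does not fit in 40 bits")
    return paddr >> OFFSET_BITS


class DirectoryEntry:
    """Per-line sharer tracking: explicit ID list, or a bare count."""

    def __init__(self):
        self.sharers = []          # used in list mode
        self.count = 0             # used in counter mode
        self.counter_mode = False

    def add_sharer(self, l1_id, heap_full=False):
        if not self.counter_mode:
            if l1_id in self.sharers:
                return  # already tracked
            if heap_full or len(self.sharers) >= MAX_LISTED_SHARERS:
                # Assumed policy: drop the IDs and keep only the count.
                self.count = len(self.sharers)
                self.sharers = []
                self.counter_mode = True
        if self.counter_mode:
            self.count += 1  # IDs are no longer known, only how many
        else:
            self.sharers.append(l1_id)

    def on_write(self):
        """Coherence message the L2 would issue on a modification."""
        if self.counter_mode:
            return ("broadcast_invalidate", self.count)
        return ("multicast_update_or_invalidate", tuple(self.sharers))
```

For example, `cluster_of(0xAB00000000)` yields `0xAB`, and an entry that receives a fifth distinct sharer switches from the multicast path to the broadcast path.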
 
  
 
== Base die ==
 
== Base die ==
[[File:tsarlet package front.png|right|thumb]]
 
 
All the compute chiplets rest on the base die. The base die is designed to interlink the compute chiplets and provide the necessary interfaces to the outside world. It measures roughly 200 mm² and is fabricated on a legacy [[65 nm process]] in order to reduce cost and improve yield. The major role of the base die is to seamlessly extend the cache NoCs between the various chiplets. 3D-Plug communication IPs implement the logical and physical interfaces between the chiplets and the base die. There are two versions of the plugs: synchronous and asynchronous.
  
Line 101: Line 68:
 
</table>
 
</table>
  
[[File:tsarlet scvr unit cell circuit.png|right|thumb|SCVR Unit Cell]]
 
 
TSARLET uses [[switch cap voltage regulators]] (SCVRs) for power management. With 6 chiplets landing on the base die, there are 6 SCVRs, one for each chiplet. Leti reported that the SCVRs make up around 30% of the die area. Each unit is managed by a central clock-frequency and feedback controller with a sub-10 ns step response, enabling the SCVR to provide very rapid transitions and local IR-drop mitigation. A relatively high voltage (~2.5 V) is brought into the SoC via the interposer back-face through the 40 μm pitch TSV array in order to reduce the number of pins required. The SCVRs are fully integrated using thick-oxide transistors with no external passive components. The on-chip capacitors are built from MOM+MOM+MIM stacks for a total capacitance density of 8.9 nF/mm².
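
As a rough back-of-the-envelope check of the figures above, the sketch below estimates the area available to each SCVR. It assumes that the "30% of the die area" refers to the roughly 200 mm² base die and that the area is split evenly between the six regulators; neither assumption is confirmed by the source. The capacitance figure is only an upper bound that would hold if an SCVR's entire area were capacitor.

```python
BASE_DIE_AREA_MM2 = 200.0      # "roughly 200 mm²" (assumed to be the reference area)
SCVR_AREA_FRACTION = 0.30      # "around 30% of the die area"
NUM_SCVRS = 6                  # one per chiplet
CAP_DENSITY_NF_PER_MM2 = 8.9   # MOM+MOM+MIM stack


def scvr_budget():
    """Estimated SCVR area and an upper bound on capacitance per regulator."""
    total_area = BASE_DIE_AREA_MM2 * SCVR_AREA_FRACTION
    area_each = total_area / NUM_SCVRS  # assumes an even split
    return {
        "total_area_mm2": total_area,
        "area_per_scvr_mm2": area_each,
        # Upper bound only: real SCVRs also contain switches and control logic.
        "max_cap_per_scvr_nf": area_each * CAP_DENSITY_NF_PER_MM2,
    }


if __name__ == "__main__":
    for name, value in scvr_budget().items():
        print(f"{name}: {value:.1f}")
```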
  
Line 107: Line 73:
  
 
=== 3D-Plug ===
 
=== 3D-Plug ===
[[File:tsarlet 3d plug matrix.png|thumb|right|3D-Plug μbumps matrix]][[File:tsarlet 3d plug ubumps.png|thumb|right|μbumps]]
+
Although this particular SoC uses a single type of chiplet, a generic chiplet-interposer interface called 3D-Plug was designed so that, in theory, different types of chiplets can be integrated on the same base die. Every compute chiplet incorporates four 3D-Plugs, one for each core cluster, physically located at the corners of the die. Each interface is a matrix of 12 × 28 μ-bumps with a 20 μm pitch. The interface consists of the logic interface, μ-buffers, and various [[design for testability|DFT]] support (e.g., [[boundry scan|boundary scan]]). The μ-buffer standard cell integrates a bidirectional driver, ESD protection, a pull-up, and a level-shifter to bridge the different voltage domains of the bottom and upper dies.
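
The μ-bump counts implied by the paragraph above can be worked out directly. The following sketch uses only the numbers quoted there; the footprint is an approximation that treats each side of the matrix as bump count times pitch, and the names are illustrative.

```python
ROWS, COLS = 12, 28        # μ-bump matrix per 3D-Plug
PITCH_UM = 20              # μ-bump pitch in micrometres
PLUGS_PER_CHIPLET = 4      # one per core cluster
CHIPLETS = 6


def plug_summary():
    """Bump counts and approximate matrix footprint of the 3D-Plug interfaces."""
    bumps_per_plug = ROWS * COLS
    width_um = ROWS * PITCH_UM
    height_um = COLS * PITCH_UM
    return {
        "bumps_per_plug": bumps_per_plug,
        "bumps_per_chiplet": bumps_per_plug * PLUGS_PER_CHIPLET,
        "bumps_total": bumps_per_plug * PLUGS_PER_CHIPLET * CHIPLETS,
        # Approximation: side length taken as count x pitch.
        "footprint_um": (width_um, height_um),
        "footprint_mm2": width_um * height_um / 1e6,
    }


if __name__ == "__main__":
    for name, value in plug_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions each plug carries 336 μ-bumps in roughly 240 μm × 560 μm, for 1,344 μ-bumps per chiplet across its four plugs.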
 
  
 
<table class="wikitable">
 
<table class="wikitable">
Line 117: Line 82:
 
<tr><th>Bandwidth Density</th><td>3.0 Tb/s/mm²</td></tr>
 
<tr><th>Bandwidth Density</th><td>3.0 Tb/s/mm²</td></tr>
 
</table>
 
</table>
 
:[[File:tsarlet scvr unit cell.png|500px]]
 
  
 
== 3D Stacking ==
 
== 3D Stacking ==
[[File:tsarlet interposer with chiplet.png|thumb|right|early packaging test]]
 
 
The compute chiplets are 3D-stacked on the base interposer die in a face-to-face configuration. The chiplets connect to the base die through 20 μm pitch μ-bumps. Direct connections to the package are made with 40 μm pitch [[TSVs]].
  
 
[[File:tsarlet xsection.png|400px]]
 
  
 
== Package ==
 
== Package ==
Line 171: Line 131:
  
 
:[[File:tsarlet base interposer.png|400px]]
 
:[[File:tsarlet base interposer.png|400px]]
 
== Bibliography ==
 
* {{bib|isscc|2020|CEA-Leti}}
 
* {{bib|ectc|2019|CEA-Leti}}
 

* codename: TSARLET
* core count: 96
* designer: CEA-Leti
* full page name: cea-leti/microarchitectures/tsarlet
* instance of: microarchitecture
* instruction set architecture: MIPS32v1
* manufacturer: STMicroelectronics
* microarchitecture type: CPU
* name: TSARLET
* pipeline stages: 5
* process: 28 nm (0.028 μm, 2.8e-5 mm) and 65 nm (0.065 μm, 6.5e-5 mm)