NVLink is designed to replace inter-GPU communication that would otherwise go over PCIe lanes. It's worth noting that NVLink was also designed for CPU-GPU communication with higher bandwidth than PCIe. Although it's unlikely that NVLink will be implemented on an x86 system by either [[AMD]] or [[Intel]], [[IBM]] has collaborated with Nvidia to support NVLink on their [[POWER]] microprocessors. For supported microprocessors, NVLink can eliminate PCIe entirely for all links.
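
From software, this GPU-to-GPU path is exercised through the CUDA peer-to-peer API: once peer access is enabled, a device-to-device copy travels over NVLink when the two GPUs are joined by it, and falls back to PCIe otherwise. The following is only a minimal sketch (device indices 0 and 1 are assumed to exist, and error checking is omitted), not an Nvidia reference implementation:

<syntaxhighlight lang="cpp">
// Minimal sketch: direct GPU<->GPU copy via CUDA peer-to-peer.
// On NVLink-connected GPUs the copy uses the NVLink fabric; otherwise the
// runtime falls back to PCIe. Devices 0 and 1 are assumed to exist.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);   // can GPU0 map GPU1's memory?
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (!can01 || !can10) {
        printf("Peer access not available between GPU0 and GPU1\n");
        return 0;
    }

    // Enable peer access in both directions.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    // Allocate a buffer on each GPU and copy directly device-to-device.
    const size_t bytes = 64u << 20;          // 64 MiB
    void *buf0 = nullptr, *buf1 = nullptr;
    cudaSetDevice(0);  cudaMalloc(&buf0, bytes);
    cudaSetDevice(1);  cudaMalloc(&buf1, bytes);

    cudaMemcpyPeer(buf1, 1, buf0, 0, bytes); // GPU0 -> GPU1, no host staging
    cudaDeviceSynchronize();

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
    printf("Peer copy of %zu bytes completed\n", bytes);
    return 0;
}
</syntaxhighlight>
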
=== Links ===
An NVLink channel is called a '''Brick''' (or an ''NVLink Brick''). A single NVLink is a bidirectional interface which comprises 8 differential pairs in each direction for a total of 32 wires. The pairs are DC coupled and use an 85Ω differential termination with an embedded clock. To ease routing, NVLink supports [[lane reversal]] and [[lane polarity]], meaning the physical lane ordering and their polarity between the two devices may be reversed.
  
  
=== Data Rates ===
<table class="wikitable">
<tr><th>&nbsp;</th><th>[[#NVLink 1.0|NVLink 1.0]]</th><th>[[#NVLink 2.0|NVLink 2.0]]</th><th>[[#NVLink 3.0|NVLink 3.0]]</th><th>[[#NVLink 4.0|NVLink 4.0]]</th></tr>
<tr><th>Signaling Rate</th><td>20 GT/s</td><td>25 GT/s</td><td>50 GT/s</td><td>100 GT/s</td></tr>
<tr><th>Lanes/Link</th><td>8</td><td>8</td><td>4</td><td>2</td></tr>
<tr><th>Rate/Link</th><td>20 GB/s</td><td>25 GB/s</td><td>25 GB/s</td><td>25 GB/s</td></tr>
<tr><th>BiDir BW/Link</th><td>40 GB/s</td><td>50 GB/s</td><td> 50 GB/s</td><td> 50 GB/s</td></tr>
<tr><th>Links/Chip</th><td>4 (P100)</td><td>6 (V100)</td><td> 12 (A100)</td><td> 18 (H100)</td></tr>
<tr><th>BiDir BW/Chip</th><td>160 GB/s (P100)</td><td>300 GB/s (V100)</td><td>600 GB/s (A100)</td><td>900 GB/s (H100)</td></tr>
</table>
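
The per-link and per-chip figures follow directly from the signaling rate and lane count: each lane carries one bit per transfer, so a link delivers (signaling rate × lanes) ÷ 8 bytes per second per direction, twice that bidirectionally, and the per-chip total multiplies the bidirectional link bandwidth by the number of links. The short stand-alone program below is just this arithmetic written out (an illustration, not an Nvidia tool) and reproduces the table:

<syntaxhighlight lang="cpp">
// Reproduce the data-rate table from first principles:
//   per-direction GB/s per link = signaling rate (GT/s) * lanes / 8 bits
//   bidirectional GB/s per link = 2 * per-direction
//   per-chip GB/s               = bidirectional per link * links per chip
#include <cstdio>

struct NvlinkGen {
    const char *name;
    double gts;   // signaling rate per lane, GT/s
    int lanes;    // differential pairs per direction
    int links;    // links per chip
};

int main() {
    const NvlinkGen gens[] = {
        {"NVLink 1.0 (P100)",  20.0, 8,  4},
        {"NVLink 2.0 (V100)",  25.0, 8,  6},
        {"NVLink 3.0 (A100)",  50.0, 4, 12},
        {"NVLink 4.0 (H100)", 100.0, 2, 18},
    };
    for (const NvlinkGen &g : gens) {
        double per_dir = g.gts * g.lanes / 8.0; // GB/s, one direction
        double bidir   = 2.0 * per_dir;         // GB/s, both directions
        double chip    = bidir * g.links;       // GB/s, whole chip
        printf("%-20s %5.0f GB/s/link  %5.0f GB/s bidir  %5.0f GB/s/chip\n",
               g.name, per_dir, bidir, chip);
    }
    return 0;
}
</syntaxhighlight>
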
  
 
== NVLink 1.0 ==
NVLink 1.0 was first introduced with the {{nvidia|P100}} [[GPGPU]] based on the {{nvidia|Pascal|l=arch}} microarchitecture. The {{nvidia|P100}} comes with its own [[HBM]] memory in addition to being able to access system memory from the CPU side. The P100 has four NVLinks, each supporting up to 20 GB/s per direction for a bidirectional bandwidth of 40 GB/s per link and a total aggregate bandwidth of 160 GB/s. In the most basic configuration, all four links are connected between the two GPUs for 160 GB/s of GPU-GPU bandwidth in addition to the PCIe lanes connected to the CPU for accessing system [[DRAM]].
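
On a running system, the number of links a GPU actually has active can be read back through NVML, which exposes per-link NVLink queries. The following sketch (error handling mostly omitted; the program links against the NVML library) counts the active links on GPU 0 and, on a P100, should report up to four:

<syntaxhighlight lang="cpp">
// Count the active NVLink links on GPU 0 using NVML (link with -lnvidia-ml).
// Sketch only: most error handling is omitted.
#include <nvml.h>
#include <cstdio>

int main() {
    if (nvmlInit() != NVML_SUCCESS) return 1;

    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS) {
        unsigned int active = 0;
        for (unsigned int link = 0; link < NVML_NVLINK_MAX_LINKS; ++link) {
            nvmlEnableState_t state;
            if (nvmlDeviceGetNvLinkState(dev, link, &state) == NVML_SUCCESS &&
                state == NVML_FEATURE_ENABLED) {
                ++active;
            }
        }
        printf("GPU 0: %u active NVLink link(s)\n", active);
    }

    nvmlShutdown();
    return 0;
}
</syntaxhighlight>
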
  
  
=== DGX-1 Configuration ===
 
[[File:nvidia dgx-1 nvlink hybrid cube-mesh.svg|right|300px]]
In 2017 Nvidia introduced the DGX-1 system, which takes full advantage of NVLink. The DGX-1 consists of eight Tesla P100 GPUs connected in a hybrid cube-mesh NVLink network topology along with dual-socket {{intel|Xeon}} CPUs. The two Xeons communicate with each other over [[Intel]]'s {{intel|QPI}} while the GPUs communicate via NVLink.
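
One way to see such a topology from software is to ask the CUDA runtime which GPU pairs can address each other directly. The sketch below prints that peer-accessibility matrix for all visible GPUs; it reports direct peer capability rather than whether a given pair is wired with NVLink or PCIe, so it is only a rough, illustrative view of the machine:

<syntaxhighlight lang="cpp">
// Print the GPU peer-accessibility matrix, a rough software view of the
// topology. On a DGX-1-class machine this shows which of the eight GPUs
// can address each other directly.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    printf("      ");
    for (int j = 0; j < n; ++j) printf("GPU%-2d ", j);
    printf("\n");
    for (int i = 0; i < n; ++i) {
        printf("GPU%-2d ", i);
        for (int j = 0; j < n; ++j) {
            int can = 0;
            if (i != j) cudaDeviceCanAccessPeer(&can, i, j);
            printf("  %c   ", i == j ? '-' : (can ? 'Y' : '.'));
        }
        printf("\n");
    }
    return 0;
}
</syntaxhighlight>
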
  
 
:[[File:nvidia dgx-1 nvlink-gpu-xeon config.svg|600px]]

== NVLink 2.0 ==
 
NVLink 2.0 was introduced with the second-generation {{nvidia|DGX-1}}, but the full topology change took place with the {{nvidia|DGX-2}}. With the DGX-2, Nvidia also introduced the {{nvidia|NVSwitch}}, an 18-port NVLink switch. The 2-billion transistor switch can route traffic from nine ports to any of the other nine ports. With 50 GB/s per port, the switch is capable of a total of 900 GB/s of bandwidth.
  
For the DGX-2, Nvidia uses six {{nvidia|NVSwitches}} to fully connect every one of the eight GPUs to all the other seven GPUs on the same baseboard.
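
A back-of-the-envelope port count makes the baseboard fabric concrete, assuming (as the link and switch counts imply) that each GPU runs one of its six links to one port on each of the six NVSwitches:

<syntaxhighlight lang="cpp">
// Port accounting for one DGX-2 baseboard (illustrative arithmetic only).
// Assumption: each GPU dedicates one of its six NVLinks to each NVSwitch.
#include <cstdio>

int main() {
    const int gpus             = 8;   // GPUs per baseboard
    const int links_per_gpu    = 6;   // NVLink 2.0 links per V100
    const int switches         = 6;   // NVSwitches per baseboard
    const int ports_per_switch = 18;  // NVLink ports per NVSwitch
    const int gb_per_port      = 50;  // bidirectional GB/s per port

    int gpu_links       = gpus * links_per_gpu;             // 48 links leave the GPUs
    int gpu_ports       = gpu_links / switches;             // 8 ports per switch face GPUs
    int remaining_ports = ports_per_switch - gpu_ports;     // available toward the 2nd baseboard
    int switch_bw       = ports_per_switch * gb_per_port;   // 900 GB/s per switch

    printf("GPU links per baseboard:      %d\n", gpu_links);
    printf("GPU-facing ports per switch:  %d\n", gpu_ports);
    printf("Remaining ports per switch:   %d\n", remaining_ports);
    printf("Aggregate BW per NVSwitch:    %d GB/s\n", switch_bw);
    return 0;
}
</syntaxhighlight>
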
:[[File:dgx2 nvswitch baseboard diagram.svg|800px]]

The two baseboards are then connected to each other, fully connecting all 16 GPUs.

:[[File:dgx2 nvswitch baseboard diagram with two boards connected.svg|800px]]

== NVLink 3.0 ==
 
NVLink 3.0 was first introduced with the {{nvidia|A100}} [[GPGPU]] based on the {{nvidia|Ampere|l=arch}} microarchitecture. NVLink 3.0 uses a 50 GT/s signaling rate with four differential pairs per link, keeping the data rate at 25 GB/s per link (50 GB/s bidirectional) while doubling the number of links per chip to twelve for a total of 600 GB/s of bidirectional bandwidth.

== References ==
*  IEEE HotChips 34 (HC34), 2022
*  IEEE HotChips 30 (HC30), 2018
*  IEEE HotChips 29 (HC29), 2017
*  IEEE HotChips 28 (HC28), 2016

[[category:nvidia]]
 
[[Category:interconnect_architectures]]