NVLink is designed to replace inter-GPU communication that would otherwise go over PCIe lanes. It's worth noting that NVLink was also designed for CPU-GPU communication with higher bandwidth than PCIe. Although it's unlikely that NVLink will be implemented on an x86 system by either [[AMD]] or [[Intel]], [[IBM]] has collaborated with Nvidia to support NVLink on their [[POWER]] microprocessors. For supported microprocessors, NVLink can eliminate PCIe entirely for all links.
=== Links ===
A single NVLink is a bidirectional interface comprising 8 differential pairs in each direction, for a total of 32 wires. The pairs are DC-coupled and use 85 Ω differential termination with an embedded clock. To ease routing, NVLink supports [[lane reversal]] and [[lane polarity]], meaning the physical lane ordering and polarity between the two devices may be reversed.
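As a rough illustration of what lane reversal and polarity correction mean at the receiver, the toy sketch below undoes both on an 8-lane link. The function name and data layout are assumptions for illustration, not part of the NVLink specification:

```python
# Illustrative sketch (assumed, not from the NVLink spec): how a receiver
# might undo lane reversal and per-lane polarity inversion on an 8-lane link.

def normalize_lanes(lanes, reversed_order=False, polarity_inverted=None):
    """lanes: list of 8 per-lane bit values (0/1) captured in one unit interval.
    First undo physical lane reversal, then flip any lanes whose differential
    pair was wired with inverted polarity (polarity_inverted: set of indices)."""
    if reversed_order:
        lanes = lanes[::-1]  # lane 7 was physically wired as lane 0, etc.
    inverted = polarity_inverted or set()
    return [b ^ 1 if i in inverted else b for i, b in enumerate(lanes)]

# Example: link wired with full lane reversal and lane 0's pair swapped.
rx = [1, 0, 1, 1, 0, 0, 1, 0]
print(normalize_lanes(rx, reversed_order=True, polarity_inverted={0}))
# → [1, 1, 0, 0, 1, 1, 0, 1]
```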
=== Data Rates ===
<table class="wikitable">
<tr><th> </th><th>[[#NVLink 1.0|NVLink 1.0]]</th><th>[[#NVLink 2.0|NVLink 2.0]]</th></tr>
<tr><th>Signaling Rate</th><td>20 GT/s</td><td>25 GT/s</td></tr>
<tr><th>Lanes/Link</th><td>8</td><td>8</td></tr>
<tr><th>Rate/Link</th><td>20 GB/s</td><td>25 GB/s</td></tr>
<tr><th>BiDir BW/Link</th><td>40 GB/s</td><td>50 GB/s</td></tr>
<tr><th>Links/Chip</th><td>4 (P100)</td><td>6 (V100)</td></tr>
<tr><th>BiDir BW/Chip</th><td>160 GB/s (P100)</td><td>300 GB/s (V100)</td></tr>
</table>
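The table's figures follow directly from the signaling rate and lane count: each lane carries one bit per transfer, so 8 lanes at 20 GT/s yield 20 GB/s per direction. A short sketch of that arithmetic (the function name is illustrative):

```python
# Derive the NVLink bandwidth figures in the table above from the
# signaling rate and lane count. Names here are illustrative only.

def link_bandwidth_gbytes(signaling_rate_gt, lanes=8):
    """Per-direction link bandwidth in GB/s: each lane carries 1 bit
    per transfer, so divide the aggregate bit rate by 8."""
    return signaling_rate_gt * lanes / 8

for name, rate, links in [("NVLink 1.0 (P100)", 20, 4),
                          ("NVLink 2.0 (V100)", 25, 6)]:
    per_dir = link_bandwidth_gbytes(rate)
    bidir = 2 * per_dir          # both directions of one link
    chip = bidir * links         # all links on one chip
    print(f"{name}: {per_dir:.0f} GB/s per direction, "
          f"{bidir:.0f} GB/s bidir/link, {chip:.0f} GB/s bidir/chip")
# → 20/40/160 GB/s for NVLink 1.0, 25/50/300 GB/s for NVLink 2.0
```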
== NVLink 1.0 ==
NVLink 1.0 was first introduced with the {{nvidia|P100}} [[GPGPU]] based on the {{nvidia|Pascal|l=arch}} microarchitecture. The {{nvidia|P100}} comes with its own [[HBM]] memory in addition to being able to access system memory from the CPU side. The P100 has four NVLinks, each supporting up to 20 GB/s per direction for a bidirectional bandwidth of 40 GB/s per link and a total aggregate bandwidth of 160 GB/s. In the most basic configuration, all four links are connected between two GPUs for 160 GB/s of GPU-GPU bandwidth, in addition to the PCIe lanes connected to the CPU for accessing system [[DRAM]].
=== DGX-1 Configuration ===
[[File:nvidia dgx-1 nvlink hybrid cube-mesh.svg|right|300px]]
In 2017 Nvidia introduced the DGX-1 system which takes full advantage of NVLink. The DGX-1 consists of eight Tesla P100 GPUs connected in a hybrid cube-mesh NVLink network topology along with dual-socket {{intel|Xeon}} CPUs. The two Xeons communicate with each other over [[Intel]]'s {{intel|QPI}} while the GPUs communicate via NVLink.
:[[File:nvidia dgx-1 nvlink-gpu-xeon config.svg|600px]]
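The hybrid cube-mesh can be sketched as an adjacency set: two fully connected quads (GPUs 0-3 and 4-7) plus one "cube edge" per GPU joining matching GPUs across the quads, which consumes exactly the four NVLinks each P100 provides. The exact link assignment below is reconstructed from public DGX-1 diagrams and should be treated as illustrative:

```python
# Sketch (assumed link assignment) of the DGX-1 hybrid cube-mesh:
# two fully connected quads plus matching inter-quad links.

links = {(a, b) for quad in [(0, 1, 2, 3), (4, 5, 6, 7)]
         for a in quad for b in quad if a < b}   # 2 x 6 intra-quad links
links |= {(g, g + 4) for g in range(4)}          # 4 inter-quad "cube" links

# Each of the 8 GPUs must use exactly its 4 available NVLinks.
degree = {g: sum(g in link for link in links) for g in range(8)}
assert all(d == 4 for d in degree.values())

print(len(links), "links,",
      len(links) * 40, "GB/s aggregate bidirectional (NVLink 1.0)")
# → 16 links, 640 GB/s aggregate bidirectional (NVLink 1.0)
```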
== NVLink 2.0 ==
NVLink 2.0 was first introduced with the {{nvidia|V100}} [[GPGPU]] based on the {{nvidia|Volta|l=arch}} microarchitecture along with [[IBM]]'s {{ibm|POWER9|l=arch}}.
== References ==
* IEEE HotChips 29 (HC29), 2017
* IEEE HotChips 28 (HC28), 2016
[[category:nvidia]]
[[Category:interconnect_architectures]]