From WikiChip
Slingshot - Interconnects - Cray
Revision as of 04:45, 8 September 2020 by 31.10.144.190 (talk) (Added link to SC20 paper with architecture description and experimental evaluation)

Slingshot is Cray's eighth-generation high-performance interconnect and the successor to Aries.

Architecture

Key changes from Aries

  • Builds on the Ethernet specification
  • Improved QoS
  • New congestion control

Overview

Slingshot is Cray's 8th major generation of high-performance network interconnects and the underlying interconnect of the Cray Shasta system. It diverges from prior Cray interconnects in a number of ways, including the switch to Ethernet. Cray introduced a customized version of Ethernet specifically optimized for high-performance computing, named 'HPC Ethernet'. HPC Ethernet adds special protocol optimizations while allowing standard Ethernet traffic to be intermixed on the same network. Slingshot therefore remains Ethernet-compatible and can connect directly to third-party Ethernet-based devices such as accelerators and storage devices.

HPC Ethernet

Cray implements a custom variant of Ethernet in Slingshot called HPC Ethernet. This is a proprietary protocol that addresses some of the shortcomings of the standard Ethernet protocol, namely its large headers, its large packet sizes, and its inability to scale to large HPC workloads. To do so, Slingshot brings some of the characteristics of a typical HPC network into Ethernet.
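The cost of large headers and large packets is easiest to see for small messages. The sketch below uses only standard IEEE 802.3 framing constants to estimate what fraction of the bytes on the wire is useful payload. It says nothing about the HPC Ethernet frame format, which this article does not specify; the function names and the overridable parameters are illustrative assumptions.

```python
# Standard IEEE 802.3 framing constants, in bytes.
PREAMBLE_SFD = 8       # preamble plus start-of-frame delimiter
HEADER = 14            # destination MAC + source MAC + EtherType
FCS = 4                # frame check sequence
MIN_FRAME = 64         # minimum frame size (header + payload + FCS)
INTER_PACKET_GAP = 12  # mandatory idle time between frames, in byte times


def wire_bytes(payload, min_frame=MIN_FRAME, gap=INTER_PACKET_GAP):
    """Bytes of link time consumed to send one frame carrying `payload` bytes."""
    # Short frames are padded up to the minimum frame size.
    frame = max(HEADER + payload + FCS, min_frame)
    return PREAMBLE_SFD + frame + gap


def efficiency(payload, min_frame=MIN_FRAME, gap=INTER_PACKET_GAP):
    """Fraction of the consumed link time that carries payload."""
    return payload / wire_bytes(payload, min_frame, gap)


if __name__ == "__main__":
    for size in (8, 64, 1500):
        print(f"{size:5d} B payload: {wire_bytes(size):5d} B on the wire, "
              f"{efficiency(size):.1%} efficient")
```

Under these constants an 8-byte message occupies 84 byte times on the link, so less than a tenth of the bandwidth carries data, while a full 1500-byte payload is over 97% efficient. The `min_frame` and `gap` parameters can be overridden to explore how a smaller minimum frame or a shorter gap would change the picture.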

Slingshot makes use of Rosetta, Cray's custom ASIC switch. Rosetta uses the standard Ethernet physical layer in order to support standard Ethernet protocols as well as to serve as the baseline for HPC Ethernet. For each connected device, Rosetta negotiates the enhanced features found in HPC Ethernet. The intention is to allow standard Ethernet-attached devices to work easily over the Slingshot network while enabling HPC-optimized traffic to work reliably internally.
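The per-device negotiation can be pictured as a per-port decision: use HPC Ethernet when the attached device supports it, and fall back to standard Ethernet otherwise. The following is a minimal conceptual sketch only. The `Peer` type, the `supports_hpc_ethernet` flag, and the `negotiate` function are hypothetical names invented for illustration; the actual negotiation mechanism in Rosetta is not described here.

```python
from dataclasses import dataclass
from enum import Enum


class LinkMode(Enum):
    STANDARD_ETHERNET = "standard-ethernet"
    HPC_ETHERNET = "hpc-ethernet"


@dataclass(frozen=True)
class Peer:
    """A device attached to one switch port (hypothetical model)."""
    name: str
    supports_hpc_ethernet: bool  # hypothetical capability flag


def negotiate(peer):
    """Choose the link mode for a single port based on what the peer supports."""
    if peer.supports_hpc_ethernet:
        return LinkMode.HPC_ETHERNET
    # Fallback keeps ordinary Ethernet devices working unchanged.
    return LinkMode.STANDARD_ETHERNET


def configure_ports(peers):
    """Negotiate every port independently; `peers` maps port number to Peer."""
    return {port: negotiate(peer) for port, peer in peers.items()}


if __name__ == "__main__":
    peers = {
        0: Peer("compute-node", supports_hpc_ethernet=True),
        1: Peer("third-party-storage", supports_hpc_ethernet=False),
    }
    for port, mode in configure_ports(peers).items():
        print(port, peers[port].name, mode.value)
```

Because each port is decided independently, a single switch in this model can carry both kinds of traffic at once, which matches the intermixing described in the overview.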

Switch

See also: Rosetta

The Slingshot switch is a 64-port switch. Each of the 64 ports is a 4-lane 56G PAM4 port (200 Gbps/port). A network built from these switches can scale up to around 250,000 endpoints with a diameter of just three switch-to-switch hops. The diameter is fixed, so it does not grow with the number of nodes. The switch is Ethernet-compliant but uses the enhanced HPC Ethernet protocol with compatible devices when possible.
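The port and scale figures can be sanity-checked with some simple arithmetic. The sketch below makes two assumptions that this article does not state: that each "56G" lane delivers roughly 50 Gbps of usable data after encoding overhead, and that the network is a dragonfly-style topology, where a worst-case path is one local hop, one global hop, and one local hop. The split of the 64 ports into endpoint, local, and global links is likewise an illustrative choice, not a documented configuration.

```python
PORTS_PER_SWITCH = 64
LANES_PER_PORT = 4
USABLE_GBPS_PER_LANE = 50  # assumption: usable rate of a nominal 56G PAM4 lane


def port_gbps():
    """Usable bandwidth of one port, in Gbps."""
    return LANES_PER_PORT * USABLE_GBPS_PER_LANE


def switch_tbps():
    """Aggregate one-direction bandwidth across all ports, in Tbps."""
    return PORTS_PER_SWITCH * port_gbps() / 1000


def dragonfly_endpoints(endpoint_ports, local_ports, global_ports):
    """Maximum endpoints of a fully populated dragonfly (assumed topology).

    Switches in a group are fully connected, so a group holds
    local_ports + 1 switches. Every pair of groups shares one global link,
    so there can be (switches_per_group * global_ports) + 1 groups.
    """
    switches_per_group = local_ports + 1
    groups = switches_per_group * global_ports + 1
    return endpoint_ports * switches_per_group * groups


if __name__ == "__main__":
    print(port_gbps(), "Gbps per port")
    print(switch_tbps(), "Tbps per switch")
    # Illustrative split of the 64 ports: 16 endpoint + 31 local + 17 global.
    print(dragonfly_endpoints(16, 31, 17), "endpoints")
```

With the illustrative 16/31/17 split, the formula gives 32 switches per group, 545 groups, and 279,040 endpoints, which is the same order of magnitude as the figure of around 250,000 quoted above. The three-hop diameter in this model follows from the topology itself rather than from the system size.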


Figure: Cray Slingshot switch.

Bibliography

  • Cray, 2019 IEEE Symposium on High-Performance Interconnects (HOTI).
  • Daniele De Sensi, Salvatore Di Girolamo, Kim H. McMahon, Duncan Roweth, Torsten Hoefler, "An In-Depth Analysis of the Slingshot Interconnect", in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC20). https://arxiv.org/abs/2008.08886