From WikiChip
Editing nec/microarchitectures/sx-aurora
Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.
The edit can be undone.
Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.
This page supports semantic in-text annotations (e.g. "[[Is specified as::World Heritage Site]]") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help pages.
Latest revision | Your text | ||
Line 80: | Line 80: | ||
== Overview == | == Overview == | ||
− | [[File:sx-aurora overview.svg|thumb|right|350px|Overview of the SX-Aurora | + | [[File:sx-aurora overview.svg|thumb|right|350px|Overview of the SX-Aurora]] |
The SX-Aurora is [[NEC]]'s successor to the {{\\|SX-ACE}}, a [[vector processor]] designed for [[high-performance]] scientific/research applications and supercomputers. The SX-Aurora deviates from all prior chips in the kind of markets it's designed to address. Therefore, NEC made slightly different design choice compared to prior generations of vector processors. In an attempt to broaden their market, NEC extended beyond supercomputers to the conventional server and workstation market. This is done through the use of [[PCIe]]-based [[accelerator cards]]. | The SX-Aurora is [[NEC]]'s successor to the {{\\|SX-ACE}}, a [[vector processor]] designed for [[high-performance]] scientific/research applications and supercomputers. The SX-Aurora deviates from all prior chips in the kind of markets it's designed to address. Therefore, NEC made slightly different design choice compared to prior generations of vector processors. In an attempt to broaden their market, NEC extended beyond supercomputers to the conventional server and workstation market. This is done through the use of [[PCIe]]-based [[accelerator cards]]. | ||
Line 118: | Line 118: | ||
Maintaining a high [[bytes Per FLOP]] is important for vector operations that rely on large data sets. With over five times the FLOPS per core, the SX-Aurora had to significantly improve the memory subsystem to prevent workloads from memory bottlenecking, thereby preventing them from reaching the peak compute power of the chip. The {{\\|SX-Ace}} reached 256 GB/s of [[memory bandwidth]] using a whopping 16 channels of DDR3 memory. It becomes impossible to increase this further to bring a sufficiently large improvement in bandwidth. For this reason, NEC opted to use [[HBM2]] memory instead. The SX-Aurora has six HBM2 modules delivering 1.22 TB/s of bandwidth, nearly 5-fold improvement over the {{\\|SX-Ace}}. However, despite the large memory bandwidth improvement, the SX-Aurora achieves 0.5 [[bytes/FLOPs]] which is half of the {{\\|SX-Ace}}. | Maintaining a high [[bytes Per FLOP]] is important for vector operations that rely on large data sets. With over five times the FLOPS per core, the SX-Aurora had to significantly improve the memory subsystem to prevent workloads from memory bottlenecking, thereby preventing them from reaching the peak compute power of the chip. The {{\\|SX-Ace}} reached 256 GB/s of [[memory bandwidth]] using a whopping 16 channels of DDR3 memory. It becomes impossible to increase this further to bring a sufficiently large improvement in bandwidth. For this reason, NEC opted to use [[HBM2]] memory instead. The SX-Aurora has six HBM2 modules delivering 1.22 TB/s of bandwidth, nearly 5-fold improvement over the {{\\|SX-Ace}}. However, despite the large memory bandwidth improvement, the SX-Aurora achieves 0.5 [[bytes/FLOPs]] which is half of the {{\\|SX-Ace}}. | ||
− | + | The caches are sliced into eight 2 MiB chunks which consist of 16 [[memory banks]] each. The [[last level cache|LLC]] interfaces with the IMC at 200 GB/s per chunk (1600 TB/s in total) and those provide a memory bandwidth of 1.22 TB/s through it's 6 HBM2 modules. | |
− | + | The SX-Aurora features a 16-layer 2D mesh network. With the {{\\|SX-Ace}}, replies to the cores were slightly unbalanced. In the SX-Aurora, the requests crossbars and the reply crossbars, each, have the same bandwidth. With each request being 16B, this works out to around 410 GB/s for each crossbar or 820 GB/s per core. Coming from the cache size, there is a total of 3.05 GB/s of available bandwidth distributed across all eight LLC chunks. | |
− | + | :[[File:sx-aurora memory subsystem.svg|thumb|700px|center|SX-Aurora Memory Subsystem.]] | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
== Mesh interconnect == | == Mesh interconnect == | ||
Line 137: | Line 129: | ||
− | [[File:sx-aurora 2d 16-layer mesh.svg|900px|center]] | + | :[[File:sx-aurora 2d 16-layer mesh.svg|thumb|900px|center|SX-Aurora utilizes a 16-layer 2D mesh.]] |
== NUMA Mode == | == NUMA Mode == |
Facts about "SX-Aurora - Microarchitectures - NEC"
codename | SX-Aurora + |
core count | 8 + |
designer | NEC + |
first launched | 2018 + |
full page name | nec/microarchitectures/sx-aurora + |
instance of | microarchitecture + |
manufacturer | TSMC + |
name | SX-Aurora + |
pipeline stages | 8 + |