From WikiChip
Editing nudt/matrix-2000
Latest revision | Your text | ||
Line 1: | Line 1: | ||
{{nudt title|Matrix-2000}} | {{nudt title|Matrix-2000}} | ||
− | {{ | + | {{mpu |
|name=Matrix-2000 | |name=Matrix-2000 | ||
|image=matrix-2000 (front).png | |image=matrix-2000 (front).png | ||
Line 6: | Line 6: | ||
|designer=NUDT | |designer=NUDT | ||
|model number=Matrix-2000 | |model number=Matrix-2000 | ||
− | |first announced= | + | |first announced=2016 |
|first launched=2017 | |first launched=2017 | ||
|frequency=1,200 MHz | |frequency=1,200 MHz | ||
|technology=CMOS | |technology=CMOS | ||
− | |||
|core count=128 | |core count=128 | ||
|thread count=128 | |thread count=128 | ||
Line 17: | Line 16: | ||
}} | }} | ||
[[File:matrix-2000 (back).png|300px|thumb|right|Matrix-2000 Ceramic LGA package back side.]] | [[File:matrix-2000 (back).png|300px|thumb|right|Matrix-2000 Ceramic LGA package back side.]] | ||
− | '''Matrix-2000''' | + | '''Matrix-2000''' is a [[128-core]] [[many-core processor]] designed by [[NUDT]] and introduced in [[2017]]. The chip was designed exclusively as an accelerator for [[China]]'s [[Tianhe-2]] supercomputer, to upgrade and replace its aging [[Intel]] {{intel|Xeon Phi|Knights Corner}} accelerators after the Obama administration banned the sale of high-performance accelerators to China. The Matrix-2000 features 128 [[RISC]] cores operating at 1.2 GHz, for a peak double-precision performance of 2.46 [[TFLOPS]] at a peak power dissipation of 240 W. |
− | |||
− | |||
== Overview == | == Overview == | ||
Line 28: | Line 25: | ||
<blockquote>Intel was informed in August by the U.S Department of Commerce that an export license was required for the shipment of Xeon and Xeon Phi parts for use in specific previously disclosed supercomputer projects with Chinese customer INSPUR. Intel complied with the notification and applied for the license which was denied. We are in compliance with the U.S. law.</blockquote> | <blockquote>Intel was informed in August by the U.S Department of Commerce that an export license was required for the shipment of Xeon and Xeon Phi parts for use in specific previously disclosed supercomputer projects with Chinese customer INSPUR. Intel complied with the notification and applied for the license which was denied. We are in compliance with the U.S. law.</blockquote> | ||
− | Due to the ban | + | Due to the ban, NUDT was unable to obtain the Xeon Phis it had hoped to use to upgrade the system. To achieve the desired upgrade without the embargoed parts, NUDT developed the Matrix-2000 accelerator. While not nearly as powerful as {{intel|Knights Landing|l=arch}}, the chip is more powerful than the first-generation {{intel|Knights Corner|l=arch}} parts it replaced. Whereas the original Knights Landing-based upgrade was planned to exceed 110 [[PFLOPS]], the Matrix-2000-based system achieved 94.97 PFLOPS. |
== Architecture == | == Architecture == | ||
The Matrix-2000 consists of 128 [[physical cores|cores]], eight [[DDR4]] memory channels, and 16 PCIe lanes. The chip is organized into four SuperNodes (SN) of 32 cores each; the cores operate at 1.2 GHz, and the chip has a peak power dissipation of 240 W. | The Matrix-2000 consists of 128 [[physical cores|cores]], eight [[DDR4]] memory channels, and 16 PCIe lanes. The chip is organized into four SuperNodes (SN) of 32 cores each; the cores operate at 1.2 GHz, and the chip has a peak power dissipation of 240 W. | ||
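The chip-level figures above allow a quick power-efficiency estimate. The Python sketch below divides the stated 2.458 TFLOPS double-precision peak by the stated 240 W peak power. It assumes, purely for illustration, that power is spread evenly across cores and SuperNodes and ignores memory, I/O and uncore power, so the results are rough upper-bound ratios rather than measured efficiency.

```python
# Chip-level figures as stated in the article (double-precision peak, peak power).
PEAK_GFLOPS = 2457.6   # 2.458 TFLOPS
PEAK_POWER_W = 240.0
CORES = 128
SUPERNODES = 4


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Peak compute divided by peak power (an upper bound, not measured efficiency)."""
    return gflops / watts


efficiency = gflops_per_watt(PEAK_GFLOPS, PEAK_POWER_W)
watts_per_core = PEAK_POWER_W / CORES            # assumption: power spread evenly
watts_per_supernode = PEAK_POWER_W / SUPERNODES  # same even-split assumption

print(f"{efficiency:.2f} GFLOPS/W, {watts_per_core:.3f} W/core, {watts_per_supernode:.0f} W/SN")
```

Under these assumptions the chip peaks at roughly 10.24 GFLOPS/W, or about 1.9 W per core.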
− | |||
− | |||
=== NoC === | === NoC === | ||
− | Four SuperNodes make up the chip. Each SN features three Fast Interconnect Transport (FIT) links. FITs are a point-to-point interconnect with a bidirectional bandwidth of 25.6 GB/s per link and a reported round-trip delay of roughly 20 ns. Each FIT includes a cyclic redundancy check (CRC) and | + | Four SuperNodes make up the chip. Each SN features three Fast Interconnect Transport (FIT) links. FITs are point-to-point interconnects with a bidirectional bandwidth of 25.6 GB/s per link and a reported round-trip delay of roughly 20 ns. Each FIT link includes a cyclic redundancy check (CRC) and a retry mechanism to ensure correct transmission. Each of an SN's three links connects it to one of the other three SNs. The Matrix-2000 supports a DMA mode to improve FIT link bandwidth utilization, with a reported utilization of 93.8% in that mode. |
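Because each of the four SuperNodes has three FIT links, one to each of the other SNs, the links form a fully connected graph. The Python sketch below enumerates that topology and applies the reported 93.8% DMA-mode utilization to the 25.6 GB/s per-link figure. The function name `fit_links` is hypothetical and used only for illustration.

```python
from itertools import combinations

NUM_SN = 4
FIT_BW_GBPS = 25.6       # bidirectional bandwidth per FIT link (GB/s)
DMA_UTILIZATION = 0.938  # reported link utilization in DMA mode


def fit_links(num_sn: int) -> list[tuple[int, int]]:
    """Fully connected topology: one point-to-point link per pair of SNs."""
    return list(combinations(range(num_sn), 2))


links = fit_links(NUM_SN)
# Count how many links terminate at each SN (should equal its FIT port count).
ports_per_sn = {sn: sum(sn in link for link in links) for sn in range(NUM_SN)}
effective_bw = FIT_BW_GBPS * DMA_UTILIZATION  # usable GB/s per link in DMA mode

print(len(links), ports_per_sn, round(effective_bw, 2))
```

This yields six links in total, three ports per SN, and about 24.01 GB/s of usable bandwidth per link in DMA mode. Every SN is a single FIT hop from every other SN.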
− | |||
− | |||
− | |||
=== SuperNode (SN) === | === SuperNode (SN) === | ||
− | Each SuperNode [[network on a chip]] (NoC) is implemented as a 4 by 2 mesh topology for a total of 8 CPU Clusters. Each cluster | + | Each SuperNode [[network on a chip]] (NoC) is implemented as a 4-by-2 mesh of 8 CPU clusters. Each cluster consists of a router, a directory control unit (DCU), 4 CPU [[physical core|cores]], and a shared cache. Two DDR4 memory controllers are attached at opposite ends of each SuperNode. With 4 cores per cluster and 8 clusters per SuperNode, each SN has a total of 32 cores. Cache coherence compliance is enforced by the cores. |
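The SuperNode hierarchy can be checked with a small Python model of the 4-by-2 cluster mesh. The sketch assumes minimal mesh routing, so the router-to-router hop count between two clusters is their Manhattan distance. The article does not state the routing algorithm, so the `hops` helper is hypothetical.

```python
from itertools import product

MESH_COLS, MESH_ROWS = 4, 2   # 4-by-2 mesh of CPU clusters per SuperNode
CORES_PER_CLUSTER = 4
SN_PER_CHIP = 4

# Each cluster (router + DCU + 4 cores + shared cache) sits at an (x, y) mesh position.
clusters = list(product(range(MESH_COLS), range(MESH_ROWS)))


def hops(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Router-to-router hops on a minimal mesh path (assumed Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


cores_per_sn = len(clusters) * CORES_PER_CLUSTER
cores_per_chip = cores_per_sn * SN_PER_CHIP
diameter = max(hops(a, b) for a in clusters for b in clusters)

print(len(clusters), cores_per_sn, cores_per_chip, diameter)
```

The model reproduces 8 clusters and 32 cores per SN, 128 cores per chip, and a worst case of 4 hops between the two farthest clusters under the minimal-routing assumption.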
Routing is done by the router in each cluster. The router has four communication channels: Response, Request, Snoop, and Acknowledgement. Each channel is 128 bits wide. | Routing is done by the router in each cluster. The router has four communication channels: Response, Request, Snoop, and Acknowledgement. Each channel is 128 bits wide. | ||
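The router's four message classes and their 128-bit width can be modeled in Python as follows. The 64-byte payload in the example is a hypothetical size chosen for illustration, since the article does not give a cache line size. Header and control bits are also ignored, so the transfer count is a lower bound.

```python
import math
from enum import Enum

CHANNEL_WIDTH_BITS = 128  # each router channel is 128 bits wide


class Channel(Enum):
    """The four communication channels of each cluster router."""
    REQUEST = "request"
    RESPONSE = "response"
    SNOOP = "snoop"
    ACKNOWLEDGEMENT = "acknowledgement"


def flits_for_payload(payload_bytes: int, width_bits: int = CHANNEL_WIDTH_BITS) -> int:
    """Channel-width transfers needed for a payload, ignoring any header bits."""
    return math.ceil(payload_bytes * 8 / width_bits)


total_width_bits = len(Channel) * CHANNEL_WIDTH_BITS  # combined width of all four channels
print(total_width_bits, flits_for_payload(64))        # 64 B is a hypothetical payload
```

Splitting traffic into separate request, response, snoop and acknowledgement channels is a common way to keep coherence messages from blocking each other. Here the four channels together present 512 bits of width per router.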
− | |||
− | |||
− | |||
− | |||
==== Core ==== | ==== Core ==== | ||
Each core is a reduced instruction set computer (RISC) featuring an [[in-order]] [[pipeline]] with 8 to 12 stages. The core incorporates an extended 256-bit vector instruction set architecture along with two 256-bit [[vector processing units]] (VPU). Each core is capable of performing 16 double-precision floating point operations each cycle. | Each core is a reduced instruction set computer (RISC) featuring an [[in-order]] [[pipeline]] with 8 to 12 stages. The core incorporates an extended 256-bit vector instruction set architecture along with two 256-bit [[vector processing units]] (VPU). Each core is capable of performing 16 double-precision floating point operations each cycle. | ||
− | Operating at 1.2 GHz, each core has a peak performance of 19.2 GFLOPS (1.2 GHz * 16 FLOP/cycle). With 32 such cores in each SuperNode, the peak performance of each SN is 614.4 GFLOPS. Likewise, with four SN per chip, the peak chip performance is 2.458 TFLOPS | + | Operating at 1.2 GHz, each core has a peak performance of 19.2 GFLOPS (1.2 GHz × 16 FLOP/cycle). With 32 such cores in each SuperNode, the peak performance of each SN is 614.4 GFLOPS. Likewise, with four SNs per chip, the peak chip performance is 2.458 TFLOPS. |
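The peak-performance arithmetic above is reproduced exactly in the Python sketch below. To explain where 16 FLOP/cycle could come from, it assumes each of the two 256-bit VPUs holds four 64-bit lanes and that each lane completes one fused multiply-add (2 FLOPs) per cycle. The FMA breakdown is an assumption; the article only gives the 16 FLOP/cycle total. `Fraction` is used so that 1.2 GHz is represented exactly.

```python
from fractions import Fraction

FREQ_GHZ = Fraction(12, 10)   # 1.2 GHz, kept exact
VPUS_PER_CORE = 2
VECTOR_BITS = 256
FP64_BITS = 64
FLOPS_PER_LANE = 2            # assumption: one fused multiply-add per lane per cycle
CORES_PER_SN = 32
SN_PER_CHIP = 4

lanes = VPUS_PER_CORE * VECTOR_BITS // FP64_BITS  # 8 double-precision lanes per core
flop_per_cycle = lanes * FLOPS_PER_LANE           # 16 FLOP/cycle per core
core_gflops = FREQ_GHZ * flop_per_cycle           # peak GFLOPS per core
sn_gflops = core_gflops * CORES_PER_SN            # peak GFLOPS per SuperNode
chip_gflops = sn_gflops * SN_PER_CHIP             # peak GFLOPS per chip

print(flop_per_cycle, float(core_gflops), float(sn_gflops), float(chip_gflops))
```

The results match the text: 19.2 GFLOPS per core, 614.4 GFLOPS per SN, and 2,457.6 GFLOPS (about 2.458 TFLOPS) per chip.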
− | |||
− | |||
== Memory controller == | == Memory controller == | ||
− | |||
{{memory controller | {{memory controller | ||
|type=DDR4-2400 | |type=DDR4-2400 | ||
Line 74: | Line 59: | ||
== References == | == References == | ||
* Third International High Performance Computing Forum 2017 (IHPCF2017) | * Third International High Performance Computing Forum 2017 (IHPCF2017) | ||
− | * [http://www.icl.utk.edu/files/publications/2017/icl-utk-970-2017.pdf REPORT ON THE TIANHE-2A SYSTEM], Tech Report No. ICL-UT-17-04, Jack Dongarra, University of Tennessee, | + | * [http://www.icl.utk.edu/files/publications/2017/icl-utk-970-2017.pdf REPORT ON THE TIANHE-2A SYSTEM], Tech Report No. ICL-UT-17-04, Jack Dongarra, University of Tennessee, Knoxville / Oak Ridge National Laboratory, September 24, 2017 |
− | |||
− |
Facts about "Matrix-2000 - NUDT"
Has subobject | Matrix-2000 - NUDT#package
base frequency | 1,200 MHz (1.2 GHz, 1,200,000 kHz)
core count | 128
designer | NUDT
first announced | 2015
first launched | 2017
full page name | nudt/matrix-2000
has ecc memory support | true
instance of | microprocessor
ldate | 2017
main image |
main image caption | Matrix-2000, package front
max memory bandwidth | 143.1 GiB/s (146,534.4 MiB/s, 153.652 GB/s, 153,652.455 MB/s, 0.14 TiB/s, 0.154 TB/s)
max memory channels | 8
model number | Matrix-2000
name | Matrix-2000
package | FCCLGA-4201
power dissipation | 240 W (240,000 mW, 0.322 hp, 0.24 kW)
supported memory type | DDR4-2400
technology | CMOS
thread count | 128
word size | 64 bit (8 octets, 16 nibbles)
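The memory figures in the factbox (8 channels of DDR4-2400) can be cross-checked with the usual DDR peak-bandwidth formula: channels × transfer rate × bus width. The Python sketch below assumes the standard 64-bit (8-byte) DDR4 data bus per channel, excluding ECC bits. It also assumes an even split of bandwidth across the four SuperNodes, each of which has two memory controllers.

```python
CHANNELS = 8
TRANSFER_RATE_MTS = 2400   # DDR4-2400: 2400 mega-transfers per second
BUS_WIDTH_BYTES = 8        # assumption: 64-bit DDR4 data bus per channel (ECC excluded)
SN_PER_CHIP = 4            # two memory controllers attached to each SuperNode


def peak_bandwidth_bytes_per_s(channels: int, mts: int, bus_bytes: int) -> int:
    """Theoretical peak DRAM bandwidth in bytes per second."""
    return channels * mts * 1_000_000 * bus_bytes


bw = peak_bandwidth_bytes_per_s(CHANNELS, TRANSFER_RATE_MTS, BUS_WIDTH_BYTES)
per_sn = bw // SN_PER_CHIP  # assumes bandwidth splits evenly across SuperNodes

print(f"{bw / 1e9:.1f} GB/s, {bw / 2**30:.2f} GiB/s, {per_sn / 1e9:.1f} GB/s per SN")
```

This gives about 153.6 GB/s (143.05 GiB/s), which is in line with the 143.1 GiB/s listed above, and about 38.4 GB/s per SuperNode.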