

# Ushering in a New Era

Argonne National Laboratory's Aurora System

April 2015



#### ANL Selects Intel for World's Biggest Supercomputer

2-system CORAL award extends Intel Architecture (IA) leadership in extreme-scale HPC

Figure: Intel-based supercomputer awards: Cori‡ (>30PF), Trinity at NNSA† (>40PF), and two Argonne National Laboratory systems, including Theta (>8.5PF), under a >\$200M award. Timeline labels: April '14, July '14.

‡ Cray\* XC\* Series at National Energy Research Scientific Computing Center (NERSC). † Cray XC Series at National Nuclear Security Administration (NNSA).


## The Most Advanced Supercomputer Ever Built

An Intel-led collaboration with ANL and Cray to accelerate discovery and innovation

## >180 PFLOPS

(option to increase up to 450 PF)

- Nodes: >50,000
- Power: 13MW
- Delivery: 2018
- Performance: 18X higher<sup>†</sup>
- Energy efficiency: >6X higher<sup>†</sup>

- Argonne National Laboratory
- Prime contractor: Intel
- Subcontractor: Cray

Source: Argonne National Laboratory and Intel. <sup>†</sup>Comparison of theoretical peak double-precision FLOPS and power consumption to ANL's largest current system, MIRA (10 PF and 4.8 MW).
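The footnoted comparison comes down to two ratios: peak performance, and peak FLOPS per watt. The minimal Python sketch below reproduces them. It treats Aurora's ">180 PF" lower bound as an exact value, which is an assumption. The constant and function names are illustrative only.

```python
# Sketch: reproduce the slide's "18X" and ">6X" ratios from the footnoted
# figures, treating Aurora's ">180 PF" lower bound as exact (an assumption).
AURORA_PEAK_PF = 180.0
AURORA_POWER_MW = 13.0
MIRA_PEAK_PF = 10.0
MIRA_POWER_MW = 4.8


def gflops_per_watt(peak_pf: float, power_mw: float) -> float:
    """Convert peak PetaFLOP/s and power in megawatts to GigaFLOP/s per watt."""
    # 1 PF = 1e6 GF and 1 MW = 1e6 W, so the unit factors cancel out.
    return (peak_pf * 1e6) / (power_mw * 1e6)


speedup = AURORA_PEAK_PF / MIRA_PEAK_PF
aurora_eff = gflops_per_watt(AURORA_PEAK_PF, AURORA_POWER_MW)
mira_eff = gflops_per_watt(MIRA_PEAK_PF, MIRA_POWER_MW)
efficiency_gain = aurora_eff / mira_eff

print(f"Peak performance ratio: {speedup:.1f}X")
print(f"Aurora: {aurora_eff:.2f} GFLOP/s per W; MIRA: {mira_eff:.2f} GFLOP/s per W")
print(f"Energy-efficiency ratio: {efficiency_gain:.2f}X")
```

With these inputs, the script reports a peak ratio of 18.0X and an efficiency ratio of about 6.6X. Both are consistent with the ">13" and ">2" GigaFLOP/s per watt entries in the Aurora Fact Sheet.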


## Aurora | Science From Day One!

Extreme performance for a broad range of compute and data-centric workloads



Figure: Focus Areas

### Aurora | Built on a Powerful Foundation

Breakthrough technologies that deliver massive benefits

#### Compute

3<sup>rd</sup> Generation Intel<sup>®</sup> Xeon Phi™ (processor code name: Knights Hill)

- FLOPS per node: >17X performance<sup>†</sup>
- In-package memory bandwidth: >12X<sup>†</sup>, >30PB/s aggregate
- Integrated Intel<sup>®</sup> Omni-Path Architecture

#### Interconnect

2<sup>nd</sup> Generation Intel<sup>®</sup> Omni-Path Architecture

- Speed: >20X faster<sup>†</sup>
- Bisection bandwidth: >500 TB/s
- Aggregate node link bandwidth: >2.5 PB/s

#### File System

Intel<sup>®</sup> Lustre\* Software

- Throughput: >1 TB/s, >3X faster<sup>†</sup>
- Capacity: >150PB, >5X capacity<sup>†</sup>

Source: Argonne National Laboratory and Intel. <sup>†</sup> Comparison to ANL's largest current system, MIRA. See the Aurora Fact Sheet for further details.
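Each "&gt;NX" claim above can be checked against the Aurora Fact Sheet values. The sketch below does that arithmetic. It assumes Aurora's lower bounds are exact and that the node count is exactly 50,000. It also assumes the interconnect's ">20X faster" refers to bisection bandwidth, because the node link bandwidth ratio (2.5 vs. 2 PB/s) would be only 1.25X. The dictionary keys and helper names are hypothetical.

```python
# Hypothetical check of the per-subsystem ">NX" claims against the Aurora
# Fact Sheet. Aurora's values are lower bounds (">"); using them as exact
# numbers, and exactly 50,000 nodes, is an assumption of this sketch.
aurora = {
    "peak_pf": 180.0,
    "nodes": 50_000,
    "mem_bw_pb_s": 30.0,           # aggregate on-package memory bandwidth
    "bisection_tb_s": 500.0,       # interconnect bisection bandwidth
    "fs_throughput_gb_s": 1000.0,  # ">1 TB/s"
    "fs_capacity_pb": 150.0,
}
mira = {
    "peak_pf": 10.0,
    "nodes": 49_152,
    "mem_bw_pb_s": 2.5,
    "bisection_tb_s": 24.0,
    "fs_throughput_gb_s": 300.0,
    "fs_capacity_pb": 26.0,
}


def per_node_tflops(system: dict) -> float:
    """Peak TeraFLOP/s per node (1 PF = 1,000 TF)."""
    return system["peak_pf"] * 1000.0 / system["nodes"]


def ratio(key: str) -> float:
    """Aurora-to-MIRA ratio for a single fact-sheet entry."""
    return aurora[key] / mira[key]


ratios = {
    "FLOPS per node": per_node_tflops(aurora) / per_node_tflops(mira),
    "memory bandwidth": ratio("mem_bw_pb_s"),
    "bisection bandwidth": ratio("bisection_tb_s"),
    "file system throughput": ratio("fs_throughput_gb_s"),
    "file system capacity": ratio("fs_capacity_pb"),
}

for name, value in ratios.items():
    print(f"{name:>24}: {value:5.1f}X")
```

The ratios come out at roughly 17.7X, 12X, 20.8X, 3.3X, and 5.8X. Each one is consistent with the corresponding claim on the slide.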


#### Aurora | Uses New Intel Memory-Storage Hierarchy

Keeping data closer to compute for better data-intensive application performance and energy efficiency



#### Cray: A Strategic Subcontracting Role

Working with Intel to create a state-of-the-art system

#### Cray assists Intel by providing:

- Next-generation "Shasta" supercomputer using new technologies from Intel and Cray
- Scalable software stack with new capabilities from Intel and Cray
- Proven system manufacturing capability
- On-site system support

"Cray is honored and proud to be a part of this partnership with Argonne and Intel to build and deliver one of the world's most innovative supercomputers." – Peter Ungaro, President and Chief Executive Officer, Cray

### Implications Beyond Aurora: *HPC Is Entering a New Era*

Current and future Intel innovations aim to overcome architectural challenges



#### Breaking Down "The Walls"

Memory | I/O | Storage | Energy-Efficient Performance | Space | Resiliency | Unoptimized Software



#### Fast and Efficient Data Mobility

Rapidly Growing Big Data Analytics



#### Extending HPC's Reach

Democratization at Every Scale | Cloud Access | Exploration of New Parallel Programming Models

#### Intel-led Collaboration: Unprecedented Breakthroughs

Brings innovations, holistic designs, and the means to deliver the full benefits to users

Users | System Builders | Software Community

Expanding portfolio of game changing technologies in a scalable system design framework

Co-design approach that optimizes for overall workload performance, efficiency and reliability

Thriving, open, enabled, and innovating ecosystem

## Intel's HPC Scalable System Framework

A design foundation enabling a wide range of highly workload-optimized solutions



Spans small clusters through supercomputers (including Aurora), compute and data-centric computing, and standards-based programmability.

- Intel<sup>®</sup> Xeon Phi<sup>™</sup> Processors
- Intel<sup>®</sup> Ethernet
- Intel<sup>®</sup> SSDs
- Intel<sup>®</sup> Lustre-based Solutions
- Intel<sup>®</sup> Silicon Photonics Technology
- Intel<sup>®</sup> Software Tools
- Intel<sup>®</sup> Cluster Ready Program



# Aurora... It's one more landmark. It's the next one we have to reach. But the journey does not stop there.





## Legal Disclaimers

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.

Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at <a href="https://www-ssl.intel.com/content/www/us/en/high-performance-computing/path-to-aurora.html">https://www-ssl.intel.com/content/www/us/en/high-performance-computing/path-to-aurora.html</a>.

Intel, the Intel logo, Xeon, Intel Xeon Phi and Intel Inside are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries.

\*Other names and brands may be claimed as the property of others.

© 2015 Intel Corporation



#### Aurora Fact Sheet

| System Feature | Aurora | MIRA (Argonne) |
|---|---|---|
| Peak System Performance | 180-450 PetaFLOP/s | 10 PetaFLOP/s |
| Processor | Future Generation Intel® Xeon Phi™ Processor (code name: Knights Hill) | IBM\* PowerPC\* A2 1600 MHz processor |
| Number of Nodes | >50,000 | 49,152 |
| Compute Platform | Intel system based on Cray\* Shasta next-generation supercomputing platform | IBM Blue Gene/Q\* |
| Aggregate High Bandwidth On-Package Memory, Local Memory, and Persistent Memory | >7,000 Terabytes | 768 Terabytes |
| Aggregate High Bandwidth On-Package Memory Bandwidth | >30 Petabytes/s | 2.5 Petabytes/s |
| System Interconnect | 2nd Generation Intel® Omni-Path Architecture with silicon photonics | IBM 5D torus interconnect with VCSEL photonics |
| Interconnect Aggregate Node Link Bandwidth | >2.5 Petabytes/s | 2 Petabytes/s |
| Interconnect Bisection Bandwidth | >500 Terabytes/s | 24 Terabytes/s |
| Interconnect Interface | Integrated | Integrated |
| Burst Buffer Storage | Intel® SSDs, using both 1st and 2nd Generation Intel® Omni-Path Architecture | None |
| File System | Intel® Lustre\* File System | IBM GPFS\* File System |
| File System Capacity | >150 Petabytes | 26 Petabytes |
| File System Throughput | >1 Terabyte/s | 300 Gigabytes/s |
| Intel Architecture (Intel® 64) Compatibility | Yes | No |
| Peak Power Consumption | 13 Megawatts | 4.8 Megawatts |
| FLOP/s per Watt | >13 GigaFLOP/s per watt | >2 GigaFLOP/s per watt |
| Delivery Timeline | 2018 | 2012 |
| Facility Area for Compute Clusters | ~3,000 sq. ft. | ~1,536 sq. ft. |

#### Aurora's High Performance Software Stack

#### System and Infrastructure: focused on scalability and reliability

- Low-jitter, highly scalable Linux environment
- Integrated RAS and system management, with centralized system database
- Lustre\* distributed file system with efficient user-space I/O offload
- Resource management: Cobalt

#### Communication: optimized for high performance and scalability

- Multiple MPI options: MPICH3, Intel<sup>®</sup> MPI, Cray MPI

#### Standards-based Development Environment:

- Compilers: Intel, Cray, and GNU
- Languages: C, C++, Fortran, Coarray Fortran, UPC, Chapel
- Programming Models: MPI, OpenMP\*, SHMEM

#### Performance libraries:

- Intel<sup>®</sup> Math Kernel Library
- Cray Scientific & Math Libraries
- BLAS, ScaLAPACK, FFTW, PETSc, Trilinos

#### Application analysis tools:

- Intel<sup>®</sup> Parallel Studio XE
- Cray Performance Analysis Suite
- GDB, Open|SpeedShop, TAU, HPCToolkit, VampirTrace, and Darshan


### Theta System Fact Sheet

| System Feature | Theta Details |
|---|---|
| Peak System Performance | >8.5 PetaFLOP/s |
| Compute Node CPU | Next Generation Intel® Xeon Phi™ processors (code name: Knights Landing); see https://software.intel.com/en-us/articles/what-disclosures-has-intel-made-about-knights-landing |
| Compute Node Count | >2,500 |
| Compute Platform | Intel system based on Cray\* XC\* supercomputing platform |
| Compute Node Peak Performance | >3 TeraFLOP/s per compute node |
| Cores per Node | >60 cores with four hardware threads per core |
| High Bandwidth On-Package Memory | Up to 16 Gigabytes per compute node |
| High Bandwidth On-Package Memory Bandwidth | >400 Gigabytes/s, projected to be 5X the bandwidth of DDR4 DRAM memory |
| DDR4 Memory | 192 Gigabytes per compute node, using 6 channels |
| Lustre\* File System | 10 Petabytes |
| Lustre\* File System Throughput | 210 Gigabytes/s |
| System Interconnect | Cray Aries\* high-speed Dragonfly\* topology interconnect |
| Peak Power Consumption | 1.7 Megawatts |
| Delivery Timeline | Mid-2016 |
| Programming Environments | Intel, Cray, and GNU |
| Programming Models | MPI + OpenMP |
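Theta pairs many-core nodes with an MPI + OpenMP programming model. The sketch below works through some per-node arithmetic under a few assumptions:

- The fact sheet's ">60 cores" and ">400 GB/s" lower bounds are treated as exact.
- It derives the hardware-thread count and the DDR4 bandwidth implied by the "5X" projection.
- It lists rank/thread splits that fill one node.

`hybrid_layouts` is a hypothetical helper, not part of any Argonne or Intel tool, and production job layouts can differ.

```python
from __future__ import annotations

# Illustrative per-node arithmetic for Theta, treating the fact sheet's
# lower bounds (">60 cores", ">400 GB/s") as exact values.
CORES_PER_NODE = 60
THREADS_PER_CORE = 4
ON_PACKAGE_BW_GB_S = 400.0
BW_RATIO_VS_DDR4 = 5.0  # "projected to be 5X the bandwidth of DDR4"

hw_threads = CORES_PER_NODE * THREADS_PER_CORE
implied_ddr4_bw_gb_s = ON_PACKAGE_BW_GB_S / BW_RATIO_VS_DDR4


def hybrid_layouts(cores: int, threads_per_core: int) -> list[tuple[int, int]]:
    """Hypothetical helper: (MPI ranks, OpenMP threads per rank) pairs that
    give every rank the same whole number of cores and use each hardware
    thread on the node exactly once."""
    layouts = []
    for ranks in range(1, cores + 1):
        if cores % ranks == 0:
            threads_per_rank = (cores // ranks) * threads_per_core
            layouts.append((ranks, threads_per_rank))
    return layouts


layouts = hybrid_layouts(CORES_PER_NODE, THREADS_PER_CORE)
print(f"Hardware threads per node: {hw_threads}")
print(f"Implied DDR4 bandwidth: ~{implied_ddr4_bw_gb_s:.0f} GB/s")
for ranks, threads in layouts:
    print(f"{ranks:3d} MPI ranks x {threads:3d} OpenMP threads")
```

For example, both 4 ranks x 60 threads and 60 ranks x 4 threads occupy all 240 hardware threads. Which split performs best depends on the application.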
