**Floating-point operations per second** (**FLOPS**) is a measure of compute performance that quantifies the number of floating-point operations a core, machine, or system can perform in one second.

## Overview

FLOPS is used to compare the peak theoretical floating-point performance of a core, microprocessor, or system. The unit is common in high-performance computing (e.g., supercomputers), where it rates the peak theoretical performance a machine can offer to scientific workloads. Traditionally, the FLOPS of a microprocessor could be calculated using the following equation:

$$\text{FLOPS}_{\text{core}} = \frac{\text{floating-point instructions}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}}$$

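With the advent of multi-socket and multi-core architectures, additional levels of explicit parallelism have been introduced, resulting in the following modified equations:

$$\text{FLOPS}_{\text{socket}} = \frac{\text{cores}}{\text{socket}} \times \text{FLOPS}_{\text{core}}$$

and,

$$\text{FLOPS}_{\text{system}} = \text{sockets} \times \text{FLOPS}_{\text{socket}}$$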

Modern microprocessors exploit data parallelism further through vector extensions such as x86's AVX and ARM's SVE. With those extensions, a single instruction can perform multiple floating-point operations. For example, a typical fused multiply-accumulate (FMAC) operation performs two floating-point operations at once, a multiplication and an addition, on every vector element. For a single core, this can be expressed as

$$\text{FLOPS}_{\text{core}} = \frac{\text{floating-point instructions}}{\text{cycle}} \times \frac{\text{FLOPs}}{\text{instruction}} \times \frac{\text{cycles}}{\text{second}}$$

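And for a full system, this can be extended to:

$$\text{FLOPS}_{\text{system}} = \text{sockets} \times \frac{\text{cores}}{\text{socket}} \times \frac{\text{floating-point instructions}}{\text{cycle}} \times \frac{\text{FLOPs}}{\text{instruction}} \times \frac{\text{cycles}}{\text{second}}$$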

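As a rough illustration of the peak-performance calculation described above, the following sketch multiplies the parallelism factors together. The function and parameter names, and the example machine (2 sockets, 16 cores per socket, two 256-bit FMA units per core, 2.0 GHz), are hypothetical and chosen only for the arithmetic; they do not describe any specific product.

```python
def fma_flops_per_instruction(vector_bits, element_bits):
    """FLOPs performed by one vector FMA instruction (assumed: 2 FLOPs per lane)."""
    lanes = vector_bits // element_bits  # e.g. 256-bit vector / 64-bit double = 4 lanes
    return 2 * lanes                     # one multiply and one add per lane


def peak_flops(sockets, cores_per_socket, instructions_per_cycle,
               flops_per_instruction, clock_hz):
    """Theoretical peak FLOPS of a system as the product of its parallelism factors."""
    return (sockets * cores_per_socket * instructions_per_cycle
            * flops_per_instruction * clock_hz)


# Hypothetical machine, for illustration only.
dp_flops_per_instruction = fma_flops_per_instruction(256, 64)  # 8
system_peak = peak_flops(
    sockets=2,
    cores_per_socket=16,
    instructions_per_cycle=2,  # two FMA units, one instruction each per cycle
    flops_per_instruction=dp_flops_per_instruction,
    clock_hz=2.0e9,
)
print(f"{system_peak / 1e12:.3f} TFLOPS")  # prints "1.024 TFLOPS"
```

This is a theoretical peak only: it assumes every FMA unit issues a full-width instruction on every cycle at the stated clock frequency.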
### Nomenclature

- KiloFLOPS / KFLOPS: 10³ FLOPS
- MegaFLOPS / MFLOPS: 10⁶ FLOPS
- GigaFLOPS / GFLOPS: 10⁹ FLOPS
- TeraFLOPS / TFLOPS: 10¹² FLOPS
- PetaFLOPS / PFLOPS: 10¹⁵ FLOPS
- ExaFLOPS / EFLOPS: 10¹⁸ FLOPS
- ZettaFLOPS / ZFLOPS: 10²¹ FLOPS
- YottaFLOPS / YFLOPS: 10²⁴ FLOPS

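The prefixes above can be applied mechanically. The sketch below is one possible helper (the name `format_flops` and the two-decimal formatting are assumptions made for this example) that picks the largest prefix not exceeding a raw FLOPS value.

```python
# (scale, unit) pairs from the nomenclature list, largest first.
PREFIXES = [
    (10**24, "YFLOPS"),
    (10**21, "ZFLOPS"),
    (10**18, "EFLOPS"),
    (10**15, "PFLOPS"),
    (10**12, "TFLOPS"),
    (10**9, "GFLOPS"),
    (10**6, "MFLOPS"),
    (10**3, "KFLOPS"),
]


def format_flops(value):
    """Render a raw FLOPS value using the largest prefix that does not exceed it."""
    for scale, unit in PREFIXES:
        if value >= scale:
            return f"{value / scale:.2f} {unit}"
    return f"{value:.2f} FLOPS"  # below 10^3: no prefix


print(format_flops(1.024e12))  # prints "1.02 TFLOPS"
print(format_flops(2.5e9))     # prints "2.50 GFLOPS"
```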
## FLOPs by microarchitecture

### x86

| Microarchitecture | Execution units (EUs) | ISA | Double precision (DP) FLOPs/cycle | Single precision (SP) FLOPs/cycle |
|---|---|---|---|---|
| **Intel microarchitectures** | | | | |
| Core, Penryn, Nehalem | 1 × 128-bit multiplication + 1 × 128-bit addition | SSE (128-bit) | 4 (2 + 2) | 8 (4 + 4) |
| Sandy Bridge, Ivy Bridge | 1 × 256-bit multiplication + 1 × 256-bit addition | AVX (256-bit) | 8 (4 + 4) | 16 (8 + 8) |
| Haswell, Broadwell, Skylake, Kaby Lake, Amber Lake, Coffee Lake, Whiskey Lake | 2 × 256-bit FMA | AVX2 & FMA (256-bit) | 16 (2 × 8) | 32 (2 × 16) |
| Skylake (server) | 2 × 512-bit FMA (varies by SKU) | AVX-512 & FMA (512-bit) | 32 (2 × 16) | 64 (2 × 32) |
| **Intel MIC microarchitectures** | | | | |
| Knights Landing | 2 × 512-bit FMA (varies by SKU) | AVX-512 & FMA (512-bit) | 32 (2 × 16) | 64 (2 × 32) |
| **AMD microarchitectures** | | | | |
| K10 | 1 × 128-bit multiplication + 1 × 128-bit addition | SSE (128-bit) | 4 (2 + 2) | 8 (4 + 4) |
| Bulldozer, Piledriver, Steamroller, Excavator | 2 × 128-bit FMA (per two cores) | AVX & FMA (128-bit) | 8 (2 × 4) | 16 (2 × 8) |
| Zen, Zen+ | 2 × 128-bit FMA | AVX2 & FMA (256-bit) | 8 (2 × 4) | 16 (2 × 8) |
| Zen 2 | 2 × 256-bit FMA | AVX2 & FMA (256-bit) | 16 (2 × 8) | 32 (2 × 16) |
| **Centaur microarchitectures** | | | | |
| CHA | 2 × 256-bit FMA | AVX-512 & FMA (512-bit) | 16 (2 × 8) | 32 (2 × 16) |

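The per-cycle figures in the table follow from the execution units: each unit contributes its number of vector lanes times the operations it performs per lane (two for an FMA, one for a plain multiply or add). The sketch below reproduces a few rows under that assumption; the tuple layout and the names `flops_per_cycle`, `FMA`, `MUL`, and `ADD` are illustrative choices made for this example.

```python
# Floating-point operations per lane for each kind of execution unit.
FMA, MUL, ADD = 2, 1, 1


def flops_per_cycle(units, element_bits):
    """Sum the FLOPs/cycle over (count, vector_bits, ops_per_lane) execution units."""
    total = 0
    for count, vector_bits, ops_per_lane in units:
        lanes = vector_bits // element_bits  # elements processed per instruction
        total += count * lanes * ops_per_lane
    return total


# Execution-unit descriptions taken from the table rows.
core = [(1, 128, MUL), (1, 128, ADD)]  # Core, Penryn, Nehalem
haswell = [(2, 256, FMA)]              # Haswell and its client successors
skylake_server = [(2, 512, FMA)]       # Skylake (server), two-FMA SKUs

for name, units in [("Core", core), ("Haswell", haswell),
                    ("Skylake (server)", skylake_server)]:
    dp = flops_per_cycle(units, 64)  # double precision: 64-bit elements
    sp = flops_per_cycle(units, 32)  # single precision: 32-bit elements
    print(f"{name}: DP {dp}, SP {sp} FLOPs/cycle")
```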
### ARM

| Microarchitecture | Execution units (EUs) | ISA | Double precision (DP) FLOPs/cycle | Single precision (SP) FLOPs/cycle |
|---|---|---|---|---|
| **ARM microarchitectures** | | | | |
| Cortex-A57 | 1 × 128-bit FMA | ARMv8 (128-bit) | 4 (1 × 4) | 8 (1 × 8) |
| Cortex-A76 | 2 × 128-bit FMA | ARMv8 (128-bit) | 8 (2 × 4) | 16 (2 × 8) |
| **AppliedMicro/Ampere Computing microarchitectures** | | | | |
| Storm, Shadowcat, Skylark | 1 × 64-bit FMA | ARMv8 (128-bit) | 2 (1 × 2) | 4 (1 × 4) |
| **Cavium microarchitectures** | | | | |
| Vulcan | 2 × 128-bit FMA | ARMv8 (128-bit) | 8 (2 × 4) | 16 (2 × 8) |
| **Samsung microarchitectures** | | | | |
| M1, M2 | 1 × 128-bit FMA + 1 × 128-bit addition | ARMv8 (128-bit) | 6 (1 × 4 + 1 × 2) | 12 (1 × 8 + 1 × 4) |
| M3 | 3 × 128-bit FMA | ARMv8 (128-bit) | 12 (3 × 4) | 24 (3 × 8) |
| **Phytium microarchitectures** | | | | |
| Xiaomi | 1 × 128-bit FMA | ARMv8 (128-bit) | 4 (1 × 4) | 8 (1 × 8) |
| **HiSilicon microarchitectures** | | | | |
| TaiShan v110 | 1 × 128-bit FMA | ARMv8 (128-bit) | 4 (1 × 4) | 8 (1 × 8) |