Floating-Point Operations Per Second (FLOPS)
(Redirected from FLOPs)

Floating-point operations per second (FLOPS) is a measure of compute performance used to quantify the number of floating-point operations a core, machine, or system is capable of in a one second.

## Overview

FLOPS are a measure of performance used for comparing the peak theoretical performance of a core, microprocessor, or system using floating point operations. This unit is often used in the field of high-performance computing (e.g., supercomputers) in order to evaluate the peak theoretical performance of various scientific workloads. Traditionally, the FLOPS of a microprocessor could be calculated using the following equation:

With the advent of multi-socket and multi-core architectures, additional levels of explicit parallelism have been introduced resulting in the following modified equation:

and,

Modern microprocessors exploit data parallelism further through the introduction of various vector extensions such as x86's AVX and ARM's SVE. With those extensions, it's possible to perform multiple floating-point operations within a single instruction. For example, a typical fused multiply-accumulate (FMAC) operation can perform two floating-point operations at once. For a single core, this can be expressed as

And for a full system, this can be extended to:

### Nomenclature

• KiloFLOPS / KFLOPS: 103 FLOPS
• MegaFLOPS / MFLOPS: 106 FLOPS
• GigaFLOPS / GFLOPS: 109 FLOPS
• TeraFLOPS / TFLOPS: 1012 FLOPS
• PetaFLOPS / PFLOPS: 1015 FLOPS
• ExaFLOPS / EFLOPS: 1018 FLOPS
• ZettaFLOPS / ZFLOPS: 1021 FLOPS
• YottaFLOPS / YFLOPS: 1024 FLOPS

## FLOPs by microarchitecture

### x86

Microarchitecture FLOPs ISA
Intel Microarchitectures
Core
Penryn
Nehalem
EUs 1 × 128-bit Multiplication + 1 × 128-bit Addition SSE (128-bit)
DP 4 FLOPs/cycle 2 FLOPs + 2 FLOPs
SP 8 FLOPs/cycle 4 FLOPs + 4 FLOPs
Sandy Bridge
Ivy Bridge
EUs 1 × 256-bit Multiplication + 1 × 256-bit Addition AVX (256-bit)
DP 8 FLOPs/cycle 4 FLOPs + 4 FLOPs
SP 16 FLOPs/cycle 8 FLOPs + 8 FLOPs
Haswell
Skylake
Kaby Lake
Amber Lake
Coffee Lake
Whiskey Lake
EUs 2 × 256-bit FMA AVX2 & FMA (256-bit)
DP 16 FLOPs/cycle 2 × 8 FLOPs
SP 32 FLOPs/cycle 2 × 16 FLOPs
Skylake (server) EUs 2 × 512-bit FMA (varies by SKU) AVX-512 & FMA (512-bit)
DP 32 FLOPs/cycle 2 × 16 FLOPs
SP 64 FLOPs/cycle 2 × 32 FLOPs
Intel MIC Microarchitectures
Knights Landing EUs 2 × 512-bit FMA (varies by SKU) AVX-512 & FMA (512-bit)
DP 32 FLOPs/cycle 2 × 16 FLOPs
SP 64 FLOPs/cycle 2 × 32 FLOPs
AMD Microarchitectures
K10 EUs 1 × 128-bit Multiplication + 1 × 128-bit Addition SSE (128-bit)
DP 4 FLOPs/cycle 2 FLOPs + 2 FLOPs
SP 8 FLOPs/cycle 4 FLOPs + 4 FLOPs
Bulldozer
Piledriver
Steamroller
Excavator
EUs 2 × 128-bit FMA (per two cores) AVX & FMA (128-bit)
DP 8 FLOPs/cycle 2 x 4 FLOPs
SP 16 FLOPs/cycle 2 x 8 FLOPs
Zen
Zen+
EUs 2 × 128-bit FMA AVX2 & FMA (256-bit)
DP 8 FLOPs/cycle 2 x 4 FLOPs
SP 16 FLOPs/cycle 2 x 8 FLOPs
Zen 2 EUs 2 × 256-bit FMA AVX2 & FMA (256-bit)
DP 16 FLOPs/cycle 2 x 8 FLOPs
SP 32 FLOPs/cycle 2 x 16 FLOPs
Centaur Microarchitectures
CHA EUs 2 × 256-bit FMA AVX-512 & FMA (512-bit)
DP 16 FLOPs/cycle 2 x 8 FLOPs
SP 32 FLOPs/cycle 2 x 16 FLOPs

### ARM

Microarchitecture FLOPs ISA
ARM Microarchitectures
Cortex-A57 EUs 1 × 128-bit FMA ARMv8 (128-bit)
DP 4 FLOPs/cycle 4 FLOPs
SP 8 FLOPs/cycle 8 FLOPs
Cortex-A76 EUs 2 × 128-bit FMA ARMv8 (128-bit)
DP 8 FLOPs/cycle 2 x 4 FLOPs
SP 16 FLOPs/cycle 2 x 8 FLOPs
AppliedMicro/Ampere Computing Microarchitectures
Storm
Skylark
EUs 1 × 64-bit FMA ARMv8 (128-bit)
DP 2 FLOPs/cycle 2 FLOPs
SP 4 FLOPs/cycle 4 FLOPs
Cavium Microarchitectures
Vulcan EUs 2 × 128-bit FMA ARMv8 (128-bit)
DP 8 FLOPs/cycle 2 x 4 FLOPs
SP 16 FLOPs/cycle 2 x 8 FLOPs
Samsung Microarchitectures
M1
M2
EUs 1 × 128-bit FMA + 1 × 128-bit Addition ARMv8 (128-bit)
DP 6 FLOPs/cycle 1 x 4 FLOPs + 1 x 2 FLOPs
SP 12 FLOPs/cycle 1 x 8 FLOPs + 1 x 4 FLOPs
M3 EUs 3 × 128-bit FMA ARMv8 (128-bit)
DP 12 FLOPs/cycle 3 x 4 FLOPs
SP 24 FLOPs/cycle 3 x 8 FLOPs
Phytium Microarchitectures
Xiaomi EUs 1 × 128-bit FMA ARMv8 (128-bit)
DP 4 FLOPs/cycle 1 x 4 FLOPs
SP 8 FLOPs/cycle 1 x 8 FLOPs
HiSilicon Microarchitectures
TaiShan v110 EUs 1 × 128-bit FMA ARMv8 (128-bit)
DP 4 FLOPs/cycle 1 x 4 FLOPs
SP 8 FLOPs/cycle 1 x 8 FLOPs