Latest revision as of 15:03, 25 January 2023

Floating-point operations per second (FLOPS) is a measure of compute performance that quantifies the number of floating-point operations a core, machine, or system can perform in one second.

Overview

FLOPS is a measure used to compare the peak theoretical floating-point performance of a core, microprocessor, or system. The unit is common in high-performance computing (e.g., supercomputers), where it is used to rate the machines that run floating-point-heavy scientific workloads. Traditionally, the FLOPS of a microprocessor could be calculated using the following equation:
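A common way to write this relation is as the product of the floating-point operations completed per cycle and the clock frequency. The symbol names here are illustrative assumptions:

$$\text{FLOPS}_{\text{core}} = \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}}$$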

With the advent of multi-socket and multi-core architectures, additional levels of explicit parallelism have been introduced, resulting in the following modified equation:
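One plausible form, assuming all cores of a processor are identical and run at the same clock (symbol names are illustrative), scales the single-core rate by the core count:

$$\text{FLOPS}_{\text{processor}} = \text{cores} \times \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}}$$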

and,
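Assuming every socket holds an identical processor (symbol names are illustrative), a multi-socket system scales by the socket count as well:

$$\text{FLOPS}_{\text{system}} = \text{sockets} \times \frac{\text{cores}}{\text{socket}} \times \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}}$$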

Modern microprocessors exploit data parallelism further through vector extensions such as x86's AVX and ARM's SVE. With those extensions, a single instruction can perform multiple floating-point operations. For example, a fused multiply-accumulate (FMAC) instruction performs two floating-point operations (a multiplication and an addition) on each vector element. For a single core, this can be expressed as
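Written out under the usual assumptions (symbol names are illustrative), the per-cycle term factors into the operations carried by one instruction and the number of such instructions issued per cycle:

$$\text{FLOPS}_{\text{core}} = \frac{\text{FLOPs}}{\text{instruction}} \times \frac{\text{instructions}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}}$$

For instance, a 256-bit FMA on double-precision data covers 4 elements with 2 operations each, so 8 FLOPs per instruction. With two such units issuing every cycle, that gives 16 FLOPs per cycle, which matches the Haswell row in the x86 table below.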

And for a full system, this can be extended to:
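Assuming a homogeneous system in which every socket, core, and vector unit is identical (symbol names are illustrative), the full-system form multiplies all of the factors together:

$$\text{FLOPS}_{\text{system}} = \text{sockets} \times \frac{\text{cores}}{\text{socket}} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{FLOPs}}{\text{instruction}} \times \frac{\text{instructions}}{\text{cycle}}$$

The same arithmetic can be sketched in Python. The function name `peak_flops` and the machine in the example are hypothetical and chosen only to show the calculation; the result is a theoretical peak, not a measured rate.

```python
def peak_flops(sockets: int, cores_per_socket: int, clock_hz: float,
               flops_per_instruction: int, instructions_per_cycle: int) -> float:
    """Peak theoretical FLOPS of a homogeneous system (illustrative helper)."""
    flops_per_cycle = flops_per_instruction * instructions_per_cycle
    return sockets * cores_per_socket * clock_hz * flops_per_cycle


# Hypothetical machine (all values are assumptions for illustration):
# 2 sockets, 16 cores per socket, 2.5 GHz, two 256-bit FMA units per core.
# A 256-bit FMA on doubles is 4 lanes x 2 operations = 8 FLOPs per instruction.
example = peak_flops(sockets=2, cores_per_socket=16, clock_hz=2.5e9,
                     flops_per_instruction=8, instructions_per_cycle=2)
print(f"{example / 1e12:.2f} TFLOPS")  # prints "1.28 TFLOPS"
```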

Nomenclature

KiloFLOPS / KFLOPS: 10³ FLOPS
MegaFLOPS / MFLOPS: 10⁶ FLOPS
GigaFLOPS / GFLOPS: 10⁹ FLOPS
TeraFLOPS / TFLOPS: 10¹² FLOPS
PetaFLOPS / PFLOPS: 10¹⁵ FLOPS
ExaFLOPS / EFLOPS: 10¹⁸ FLOPS
ZettaFLOPS / ZFLOPS: 10²¹ FLOPS
YottaFLOPS / YFLOPS: 10²⁴ FLOPS
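As a small illustration of how these prefixes are applied, the sketch below picks the largest prefix that does not exceed a raw FLOPS value. The function name `format_flops` and the `PREFIXES` list are hypothetical helpers, not part of any library.

```python
# Decimal prefixes from the list above, largest first.
PREFIXES = [
    (1e24, "YFLOPS"), (1e21, "ZFLOPS"), (1e18, "EFLOPS"), (1e15, "PFLOPS"),
    (1e12, "TFLOPS"), (1e9, "GFLOPS"), (1e6, "MFLOPS"), (1e3, "KFLOPS"),
]


def format_flops(value: float, digits: int = 2) -> str:
    """Render a raw FLOPS figure using the largest prefix that fits."""
    for scale, unit in PREFIXES:
        if value >= scale:
            return f"{value / scale:.{digits}f} {unit}"
    # Values below 10^3 are reported without a prefix.
    return f"{value:.{digits}f} FLOPS"


print(format_flops(1.28e12))  # prints "1.28 TFLOPS"
print(format_flops(2.5e9))    # prints "2.50 GFLOPS"
```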

FLOPs by microarchitecture

x86

EUs = execution units, DP = double precision, SP = single precision.

Microarchitecture	Row	FLOPs	Breakdown	ISA
Intel Microarchitectures
Core, Penryn, Nehalem	EUs	1 × 128-bit Multiplication + 1 × 128-bit Addition		SSE (128-bit)
	DP	4 FLOPs/cycle	2 FLOPs + 2 FLOPs
	SP	8 FLOPs/cycle	4 FLOPs + 4 FLOPs
Sandy Bridge, Ivy Bridge	EUs	1 × 256-bit Multiplication + 1 × 256-bit Addition		AVX (256-bit)
	DP	8 FLOPs/cycle	4 FLOPs + 4 FLOPs
	SP	16 FLOPs/cycle	8 FLOPs + 8 FLOPs
Haswell, Broadwell, Skylake, Kaby Lake, Amber Lake, Coffee Lake, Whiskey Lake	EUs	2 × 256-bit FMA		AVX2 & FMA (256-bit)
	DP	16 FLOPs/cycle	2 × 8 FLOPs
	SP	32 FLOPs/cycle	2 × 16 FLOPs
Skylake (server)	EUs	2 × 512-bit FMA (varies by SKU)		AVX-512 & FMA (512-bit)
	DP	32 FLOPs/cycle	2 × 16 FLOPs
	SP	64 FLOPs/cycle	2 × 32 FLOPs
Rocket Lake, Ice Lake, Tiger Lake	EUs	2 × 512-bit FMA		AVX-512 & FMA (512-bit)
	DP	32 FLOPs/cycle	2 × 16 FLOPs
	SP	64 FLOPs/cycle	2 × 32 FLOPs
Intel MIC Microarchitectures
Knights Landing	EUs	2 × 512-bit FMA		AVX-512 & FMA (512-bit)
	DP	32 FLOPs/cycle	2 × 16 FLOPs
	SP	64 FLOPs/cycle	2 × 32 FLOPs
AMD Microarchitectures
K10	EUs	1 × 128-bit Multiplication + 1 × 128-bit Addition		SSE (128-bit)
	DP	4 FLOPs/cycle	2 FLOPs + 2 FLOPs
	SP	8 FLOPs/cycle	4 FLOPs + 4 FLOPs
Bulldozer, Piledriver, Steamroller, Excavator	EUs	2 × 128-bit FMA (per two cores)		AVX & FMA (128-bit)
	DP	8 FLOPs/cycle	2 × 4 FLOPs
	SP	16 FLOPs/cycle	2 × 8 FLOPs
Zen, Zen+	EUs	2 × 128-bit FMA		AVX2 & FMA (256-bit)
	DP	8 FLOPs/cycle	2 × 4 FLOPs
	SP	16 FLOPs/cycle	2 × 8 FLOPs
Zen 2, Zen 3	EUs	2 × 256-bit FMA		AVX2 & FMA (256-bit)
	DP	16 FLOPs/cycle	2 × 8 FLOPs
	SP	32 FLOPs/cycle	2 × 16 FLOPs
Centaur Microarchitectures
CHA	EUs	2 × 256-bit FMA		AVX-512 & FMA (512-bit)
	DP	16 FLOPs/cycle	2 × 8 FLOPs
	SP	32 FLOPs/cycle	2 × 16 FLOPs
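The FMA rows above follow one pattern: the number of units, times the number of elements (lanes) that fit in the vector, times two operations per element. The sketch below reproduces that arithmetic; `fma_flops_per_cycle` is a hypothetical helper, and it assumes every unit can issue one full-width FMA each cycle.

```python
def fma_flops_per_cycle(units: int, vector_bits: int, element_bits: int) -> int:
    """FLOPs per cycle for `units` FMA pipes of width `vector_bits`."""
    lanes = vector_bits // element_bits  # elements processed per instruction
    return units * lanes * 2             # an FMA counts as multiply + add


DP, SP = 64, 32  # element widths in bits

print(fma_flops_per_cycle(2, 256, DP))  # Haswell row, DP: prints 16
print(fma_flops_per_cycle(2, 512, SP))  # Skylake (server) row, SP: prints 64
print(fma_flops_per_cycle(2, 128, DP))  # Zen row, DP: prints 8
```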

ARM

Microarchitecture	Row	FLOPs	Breakdown	ISA
ARM Microarchitectures
Cortex-A57	EUs	1 × 128-bit FMA		ARMv8 NEON (128-bit)
	DP	4 FLOPs/cycle	4 FLOPs
	SP	8 FLOPs/cycle	8 FLOPs
Cortex-A76, Cortex-A77, Cortex-A78, Neoverse N1	EUs	2 × 128-bit FMA		ARMv8 NEON (128-bit)
	DP	8 FLOPs/cycle	2 × 4 FLOPs
	SP	16 FLOPs/cycle	2 × 8 FLOPs
Neoverse N2	EUs	2 × 128-bit FMA		ARMv9 SVE2 (128-bit)
	DP	8 FLOPs/cycle	2 × 4 FLOPs
	SP	16 FLOPs/cycle	2 × 8 FLOPs
Neoverse V1	EUs	2 × 256-bit FMA		ARMv8 SVE (256-bit)
	DP	16 FLOPs/cycle	2 × 8 FLOPs
	SP	32 FLOPs/cycle	2 × 16 FLOPs
Cortex-A510	EUs	1-2 × 128-bit FMA		ARMv9 SVE2 (128-bit)
	DP	2-4 FLOPs/cycle	2-4 FLOPs
	SP	4-8 FLOPs/cycle	4-8 FLOPs
AppliedMicro/Ampere Computing Microarchitectures
Storm, Shadowcat, Skylark	EUs	1 × 64-bit FMA		ARMv8 NEON (128-bit)
	DP	2 FLOPs/cycle	2 FLOPs
	SP	4 FLOPs/cycle	4 FLOPs
Cavium Microarchitectures
Vulcan	EUs	2 × 128-bit FMA		ARMv8 NEON (128-bit)
	DP	8 FLOPs/cycle	2 × 4 FLOPs
	SP	16 FLOPs/cycle	2 × 8 FLOPs
Samsung Microarchitectures
M1, M2	EUs	1 × 128-bit FMA + 1 × 128-bit Addition		ARMv8 NEON (128-bit)
	DP	6 FLOPs/cycle	1 × 4 FLOPs + 1 × 2 FLOPs
	SP	12 FLOPs/cycle	1 × 8 FLOPs + 1 × 4 FLOPs
M3	EUs	3 × 128-bit FMA		ARMv8 NEON (128-bit)
	DP	12 FLOPs/cycle	3 × 4 FLOPs
	SP	24 FLOPs/cycle	3 × 8 FLOPs
Phytium Microarchitectures
Xiaomi	EUs	1 × 128-bit FMA		ARMv8 NEON (128-bit)
	DP	4 FLOPs/cycle	1 × 4 FLOPs
	SP	8 FLOPs/cycle	1 × 8 FLOPs
HiSilicon Microarchitectures
TaiShan v110	EUs	1 × 128-bit FMA		ARMv8 NEON (128-bit)
	DP	4 FLOPs/cycle	1 × 4 FLOPs
	SP	8 FLOPs/cycle	1 × 8 FLOPs
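Some rows mix unit types, such as the Samsung M1 and M2 entry with one FMA pipe and one addition pipe. A minimal way to model such cores is to sum the contribution of each unit separately, as sketched below. The `Unit` class and `flops_per_cycle` function are hypothetical names introduced for illustration, and the model assumes every unit issues one full-width instruction per cycle.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Unit:
    vector_bits: int   # width of the execution unit in bits
    ops_per_lane: int  # 2 for an FMA pipe, 1 for an add-only or multiply-only pipe


def flops_per_cycle(units: Iterable[Unit], element_bits: int) -> int:
    """Sum lanes x operations over all floating-point units of a core."""
    return sum((u.vector_bits // element_bits) * u.ops_per_lane for u in units)


# Samsung M1/M2 row: 1 x 128-bit FMA + 1 x 128-bit addition.
samsung_m1 = [Unit(128, 2), Unit(128, 1)]
print(flops_per_cycle(samsung_m1, 64))  # DP: prints 6
print(flops_per_cycle(samsung_m1, 32))  # SP: prints 12

# Storm/Shadowcat/Skylark row: 1 x 64-bit FMA.
storm = [Unit(64, 2)]
print(flops_per_cycle(storm, 64))  # DP: prints 2
```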
