Difference between revisions of "flops"

Revision as of 21:47, 26 April 2021

Floating-point operations per second (FLOPS) is a measure of compute performance that quantifies the number of floating-point operations a core, machine, or system can perform in one second.

Overview

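FLOPS is used to compare the peak theoretical performance of a core, microprocessor, or system in terms of floating-point operations. The unit is common in high-performance computing (e.g., supercomputers), where it is used to rate the peak theoretical performance of machines that run scientific workloads. Traditionally, the FLOPS of a microprocessor could be calculated using the following equation:

$$\text{FLOPS}_{\text{core}} = \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}}$$

With the advent of multi-socket and multi-core architectures, additional levels of explicit parallelism have been introduced, resulting in the following modified equations:

$$\text{FLOPS}_{\text{processor}} = \text{cores} \times \text{FLOPS}_{\text{core}}$$

and,

$$\text{FLOPS}_{\text{system}} = \text{sockets} \times \text{FLOPS}_{\text{processor}}$$

Modern microprocessors exploit data parallelism further through vector extensions such as x86's AVX and ARM's SVE. With those extensions, a single instruction can perform multiple floating-point operations. For example, a typical fused multiply-accumulate (FMAC) operation performs two floating-point operations at once (a multiplication and an addition), and a vector FMAC does so for every element of the vector. For a single core, this can be expressed as:

$$\text{FLOPS}_{\text{core}} = \frac{\text{instructions}}{\text{cycle}} \times \frac{\text{operations}}{\text{instruction}} \times \frac{\text{FLOPs}}{\text{operation}} \times \frac{\text{cycles}}{\text{second}}$$

And for a full system, this can be extended to:

$$\text{FLOPS}_{\text{system}} = \text{sockets} \times \frac{\text{cores}}{\text{socket}} \times \frac{\text{instructions}}{\text{cycle}} \times \frac{\text{operations}}{\text{instruction}} \times \frac{\text{FLOPs}}{\text{operation}} \times \frac{\text{cycles}}{\text{second}}$$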
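The peak-performance calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `peak_flops`, its parameter names, and the example system (2 sockets, 28 cores per socket, 2.5 GHz, two 512-bit FMA instructions issued per cycle) are assumptions chosen for the example and do not describe any specific product.

```python
def peak_flops(sockets, cores_per_socket, clock_hz,
               instr_per_cycle, ops_per_instr, flops_per_op):
    """Peak theoretical FLOPS of a system (hypothetical helper).

    instr_per_cycle: floating-point instructions issued per cycle per core
    ops_per_instr:   vector elements processed by one instruction
    flops_per_op:    FLOPs per element operation (2 for a fused multiply-add)
    """
    flops_per_cycle = instr_per_cycle * ops_per_instr * flops_per_op
    flops_per_core = flops_per_cycle * clock_hz
    return sockets * cores_per_socket * flops_per_core


# Hypothetical system: 2 sockets x 28 cores at 2.5 GHz, each core issuing
# two FMA instructions per cycle on 8 double-precision elements (512-bit).
example = peak_flops(sockets=2, cores_per_socket=28, clock_hz=2.5e9,
                     instr_per_cycle=2, ops_per_instr=8, flops_per_op=2)
print(f"{example / 1e12:.2f} TFLOPS")
```

The result is a theoretical ceiling: it assumes every core issues its maximum number of floating-point instructions on every cycle at the stated clock, which sustained workloads rarely achieve.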

Nomenclature

KiloFLOPS / KFLOPS: 10³ FLOPS
MegaFLOPS / MFLOPS: 10⁶ FLOPS
GigaFLOPS / GFLOPS: 10⁹ FLOPS
TeraFLOPS / TFLOPS: 10¹² FLOPS
PetaFLOPS / PFLOPS: 10¹⁵ FLOPS
ExaFLOPS / EFLOPS: 10¹⁸ FLOPS
ZettaFLOPS / ZFLOPS: 10²¹ FLOPS
YottaFLOPS / YFLOPS: 10²⁴ FLOPS
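As a small illustration of the prefixes above, the following sketch converts a raw FLOPS figure into the largest fitting unit. The helper name `format_flops` and the two-decimal output format are assumptions made for this example.

```python
# Decimal prefixes from the nomenclature list, largest first.
PREFIXES = [
    (1e24, "YFLOPS"),
    (1e21, "ZFLOPS"),
    (1e18, "EFLOPS"),
    (1e15, "PFLOPS"),
    (1e12, "TFLOPS"),
    (1e9, "GFLOPS"),
    (1e6, "MFLOPS"),
    (1e3, "KFLOPS"),
]


def format_flops(value):
    """Return value (in FLOPS) scaled to the largest prefix that fits."""
    for scale, unit in PREFIXES:
        if value >= scale:
            return f"{value / scale:.2f} {unit}"
    return f"{value:.2f} FLOPS"


print(format_flops(4.48e12))
print(format_flops(1.5e9))
```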

FLOPs by microarchitecture

x86

Microarchitecture	FLOPs			ISA
Intel Microarchitectures
Core, Penryn, Nehalem	EUs	1 × 128-bit Multiplication + 1 × 128-bit Addition		SSE (128-bit)
	DP	4 FLOPs/cycle	2 FLOPs + 2 FLOPs
	SP	8 FLOPs/cycle	4 FLOPs + 4 FLOPs
Sandy Bridge, Ivy Bridge	EUs	1 × 256-bit Multiplication + 1 × 256-bit Addition		AVX (256-bit)
	DP	8 FLOPs/cycle	4 FLOPs + 4 FLOPs
	SP	16 FLOPs/cycle	8 FLOPs + 8 FLOPs
Haswell, Broadwell, Skylake, Kaby Lake, Amber Lake, Coffee Lake, Whiskey Lake	EUs	2 × 256-bit FMA		AVX2 & FMA (256-bit)
	DP	16 FLOPs/cycle	2 × 8 FLOPs
	SP	32 FLOPs/cycle	2 × 16 FLOPs
Skylake (server)	EUs	2 × 512-bit FMA (varies by SKU)		AVX-512 & FMA (512-bit)
	DP	32 FLOPs/cycle	2 × 16 FLOPs
	SP	64 FLOPs/cycle	2 × 32 FLOPs
Intel MIC Microarchitectures
Knights Landing	EUs	2 × 512-bit FMA (varies by SKU)		AVX-512 & FMA (512-bit)
	DP	32 FLOPs/cycle	2 × 16 FLOPs
	SP	64 FLOPs/cycle	2 × 32 FLOPs
AMD Microarchitectures
K10	EUs	1 × 128-bit Multiplication + 1 × 128-bit Addition		SSE (128-bit)
	DP	4 FLOPs/cycle	2 FLOPs + 2 FLOPs
	SP	8 FLOPs/cycle	4 FLOPs + 4 FLOPs
Bulldozer, Piledriver, Steamroller, Excavator	EUs	2 × 128-bit FMA (per two cores)		AVX & FMA (128-bit)
	DP	8 FLOPs/cycle	2 × 4 FLOPs
	SP	16 FLOPs/cycle	2 × 8 FLOPs
Zen, Zen+	EUs	2 × 128-bit FMA		AVX2 & FMA (256-bit)
	DP	8 FLOPs/cycle	2 × 4 FLOPs
	SP	16 FLOPs/cycle	2 × 8 FLOPs
Zen 2, Zen 3	EUs	2 × 256-bit FMA		AVX2 & FMA (256-bit)
	DP	16 FLOPs/cycle	2 × 8 FLOPs
	SP	32 FLOPs/cycle	2 × 16 FLOPs
Centaur Microarchitectures
CHA	EUs	2 × 256-bit FMA		AVX-512 & FMA (512-bit)
	DP	16 FLOPs/cycle	2 × 8 FLOPs
	SP	32 FLOPs/cycle	2 × 16 FLOPs
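The per-cycle figures in the table follow from the number of execution units, their width, and the element size. A minimal sketch of that arithmetic is shown below; the function name `flops_per_cycle` and its parameters are assumptions for illustration, and the model deliberately ignores SKU-level differences such as the number of enabled FMA units.

```python
def flops_per_cycle(units, vector_bits, element_bits, fma=True):
    """FLOPs per cycle for identical execution units (hypothetical helper).

    units:        number of vector execution units
    vector_bits:  width of each unit in bits (e.g. 128, 256, 512)
    element_bits: 64 for double precision (DP), 32 for single precision (SP)
    fma:          True if each unit performs a fused multiply-add
                  (2 FLOPs per element), False for a plain add or multiply
    """
    lanes = vector_bits // element_bits
    return units * lanes * (2 if fma else 1)


# 2 x 256-bit FMA units (the Haswell and Zen 2 rows)
print(flops_per_cycle(2, 256, 64))  # DP
print(flops_per_cycle(2, 256, 32))  # SP

# 1 x 128-bit multiplier + 1 x 128-bit adder (the Core and K10 rows),
# modeled as two non-FMA units of equal width
print(flops_per_cycle(2, 128, 64, fma=False))  # DP
```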

ARM

Microarchitecture	FLOPs			ISA
ARM Microarchitectures
Cortex-A57	EUs	1 × 128-bit FMA		ARMv8 NEON (128-bit)
	DP	4 FLOPs/cycle	4 FLOPs
	SP	8 FLOPs/cycle	8 FLOPs
Cortex-A76, Cortex-A77, Cortex-A78, Neoverse N1	EUs	2 × 128-bit FMA		ARMv8 NEON (128-bit)
	DP	8 FLOPs/cycle	2 × 4 FLOPs
	SP	16 FLOPs/cycle	2 × 8 FLOPs
Neoverse N2	EUs	2 × 128-bit FMA		ARMv9 SVE2 (128-bit)
	DP	8 FLOPs/cycle	2 × 4 FLOPs
	SP	16 FLOPs/cycle	2 × 8 FLOPs
Neoverse V1	EUs	2 × 256-bit FMA		ARMv8 SVE (256-bit)
	DP	16 FLOPs/cycle	2 × 8 FLOPs
	SP	32 FLOPs/cycle	2 × 16 FLOPs
AppliedMicro/Ampere Computing Microarchitectures
Storm, Shadowcat, Skylark	EUs	1 × 64-bit FMA		ARMv8 NEON (128-bit)
	DP	2 FLOPs/cycle	2 FLOPs
	SP	4 FLOPs/cycle	4 FLOPs
Cavium Microarchitectures
Vulcan	EUs	2 × 128-bit FMA		ARMv8 NEON (128-bit)
	DP	8 FLOPs/cycle	2 × 4 FLOPs
	SP	16 FLOPs/cycle	2 × 8 FLOPs
Samsung Microarchitectures
M1, M2	EUs	1 × 128-bit FMA + 1 × 128-bit Addition		ARMv8 NEON (128-bit)
	DP	6 FLOPs/cycle	1 × 4 FLOPs + 1 × 2 FLOPs
	SP	12 FLOPs/cycle	1 × 8 FLOPs + 1 × 4 FLOPs
M3	EUs	3 × 128-bit FMA		ARMv8 NEON (128-bit)
	DP	12 FLOPs/cycle	3 × 4 FLOPs
	SP	24 FLOPs/cycle	3 × 8 FLOPs
Phytium Microarchitectures
Xiaomi	EUs	1 × 128-bit FMA		ARMv8 NEON (128-bit)
	DP	4 FLOPs/cycle	1 × 4 FLOPs
	SP	8 FLOPs/cycle	1 × 8 FLOPs
HiSilicon Microarchitectures
TaiShan v110	EUs	1 × 128-bit FMA		ARMv8 NEON (128-bit)
	DP	4 FLOPs/cycle	1 × 4 FLOPs
	SP	8 FLOPs/cycle	1 × 8 FLOPs
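Some rows above combine execution units of different kinds, such as one FMA unit plus one adder. The sketch below sums the contribution of each unit separately. The function name `flops_per_cycle_mixed` and the `(width_bits, is_fma)` tuple layout are assumptions made for this example.

```python
def flops_per_cycle_mixed(units, element_bits):
    """Sum FLOPs per cycle over a list of (width_bits, is_fma) units.

    An FMA unit counts 2 FLOPs per element; any other unit counts 1.
    """
    total = 0
    for width_bits, is_fma in units:
        lanes = width_bits // element_bits
        total += lanes * (2 if is_fma else 1)
    return total


# 1 x 128-bit FMA + 1 x 128-bit adder (the Samsung M1/M2 row)
fma_plus_add = [(128, True), (128, False)]
print(flops_per_cycle_mixed(fma_plus_add, 64))  # DP
print(flops_per_cycle_mixed(fma_plus_add, 32))  # SP

# 1 x 64-bit FMA (the Storm/Shadowcat/Skylark row)
print(flops_per_cycle_mixed([(64, True)], 64))  # DP
```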

@@ Line 93: / Line 93: @@
 | '''SP''' || 16 FLOPs/cycle || 2 x 8 FLOPs
 |-
-| rowspan="3" | {{amd|Zen 2|l=arch}} || '''EUs''' || colspan="2" | 2 × 256-bit FMA || rowspan="3" | {{x86|AVX2}} & FMA (256-bit)
+| rowspan="3" | {{amd|Zen 2|l=arch}}<br>{{amd|Zen 3|l=arch}} || '''EUs''' || colspan="2" | 2 × 256-bit FMA || rowspan="3" | {{x86|AVX2}} & FMA (256-bit)
 |-
 | '''DP''' || 16 FLOPs/cycle || 2 x 8 FLOPs

