Difference between revisions of "flops"

Revision as of 06:50, 22 September 2018

Floating-point operations per second (FLOPS) is a performance unit that quantifies the number of floating-point operations a core, machine, or system can perform in one second.

Overview

FLOPS is a measure used to compare the peak theoretical floating-point performance of a core, microprocessor, or system. The unit is often used in high-performance computing (e.g., supercomputers) to evaluate the peak theoretical performance of systems running scientific workloads. Traditionally, the FLOPS of a microprocessor could be calculated using the following equation:

With the advent of multi-socket and multi-core architectures, additional levels of explicit parallelism have been introduced resulting in the following modified equation:

and,

Modern microprocessors exploit data parallelism further through vector extensions such as x86's AVX and ARM's SVE. With those extensions, it is possible to perform multiple floating-point operations within a single instruction. For example, a typical fused multiply-accumulate (FMAC) operation performs two floating-point operations at once. For a single core, this can be expressed as:

And for a full system, this can be extended to:
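The relations described in this overview can be written out as below. This is a sketch in notation assumed here for illustration (the subscripts and the per-unit ratios are not taken from the article's own formulas), chosen so that each line agrees with the per-cycle figures tabulated in the next section:

```latex
\begin{align*}
% Traditional single-core microprocessor
\text{FLOPS}_{\text{core}} &= \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}} \\[4pt]
% Multi-core chip and multi-socket system
\text{FLOPS}_{\text{chip}} &= \frac{\text{cores}}{\text{socket}} \times \text{FLOPS}_{\text{core}} \\[4pt]
\text{FLOPS}_{\text{system}} &= \text{sockets} \times \text{FLOPS}_{\text{chip}} \\[4pt]
% Single core with vector and FMA instructions
\text{FLOPS}_{\text{core}} &= \frac{\text{FLOPs}}{\text{instruction}} \times \frac{\text{instructions}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}} \\[4pt]
% Full system with vector and FMA instructions
\text{FLOPS}_{\text{system}} &= \text{sockets} \times \frac{\text{cores}}{\text{socket}} \times \frac{\text{FLOPs}}{\text{instruction}} \times \frac{\text{instructions}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}}
\end{align*}
```

As a check against the table, a core with two 256-bit FMA units working on double precision has 8 FLOPs per instruction (4 lanes × 2 operations) and 2 instructions per cycle, which gives the 16 FLOPS/cycle listed for Haswell.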

FLOPS by microarchitecture

x86

Microarchitecture	FLOPS			ISA
Intel Microarchitectures
Core, Penryn, Nehalem	EUs	1 × 128-bit Multiplication + 1 × 128-bit Addition		SSE (128-bit)
	DP	4 FLOPS/cycle	2 FLOPS + 2 FLOPS
	SP	8 FLOPS/cycle	4 FLOPS + 4 FLOPS
Sandy Bridge, Ivy Bridge	EUs	1 × 256-bit Multiplication + 1 × 256-bit Addition		AVX (256-bit)
	DP	8 FLOPS/cycle	4 FLOPS + 4 FLOPS
	SP	16 FLOPS/cycle	8 FLOPS + 8 FLOPS
Haswell, Broadwell, Skylake, Kaby Lake, Coffee Lake, Whiskey Lake, Amber Lake	EUs	2 × 256-bit FMA		AVX2 & FMA (256-bit)
	DP	16 FLOPS/cycle	2 × 8 FLOPS
	SP	32 FLOPS/cycle	2 × 16 FLOPS
Skylake (server)	EUs	2 × 512-bit FMA (varies by SKU)		AVX-512 & FMA (512-bit)
	DP	32 FLOPS/cycle	2 × 16 FLOPS
	SP	64 FLOPS/cycle	2 × 32 FLOPS
Intel MIC Microarchitectures
Knights Landing	EUs	2 × 512-bit FMA (varies by SKU)		AVX-512 & FMA (512-bit)
	DP	32 FLOPS/cycle	2 × 16 FLOPS
	SP	64 FLOPS/cycle	2 × 32 FLOPS
AMD Microarchitectures
K10	EUs	1 × 128-bit Multiplication + 1 × 128-bit Addition		SSE (128-bit)
	DP	4 FLOPS/cycle	2 FLOPS + 2 FLOPS
	SP	8 FLOPS/cycle	4 FLOPS + 4 FLOPS
Bulldozer, Piledriver, Steamroller, Excavator	EUs	2 × 128-bit FMA (per two cores)		AVX & FMA (128-bit)
	DP	8 FLOPS/cycle	2 × 4 FLOPS
	SP	16 FLOPS/cycle	2 × 8 FLOPS
Zen, Zen+	EUs	2 × 128-bit FMA		AVX2 & FMA (128-bit)
	DP	8 FLOPS/cycle	2 × 4 FLOPS
	SP	16 FLOPS/cycle	2 × 8 FLOPS
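The per-cycle entries in the table follow from the execution-unit description: each unit contributes one operation per vector lane per cycle, or two per lane for an FMA. A minimal sketch of that arithmetic, assuming every listed unit can issue a full-width operation each cycle; the function name and the tuple layout are hypothetical, chosen for illustration:

```python
DP_BITS = 64  # double-precision element width
SP_BITS = 32  # single-precision element width


def flops_per_cycle(units, element_bits):
    """Sum peak FLOPS/cycle over execution units.

    units: iterable of (count, width_bits, is_fma) tuples (illustrative layout).
    """
    total = 0
    for count, width_bits, is_fma in units:
        lanes = width_bits // element_bits   # elements per vector register
        ops_per_lane = 2 if is_fma else 1    # an FMA counts as multiply + add
        total += count * lanes * ops_per_lane
    return total


# Execution-unit descriptions copied from the table rows above.
sandy_bridge = [(1, 256, False), (1, 256, False)]  # 1 multiply + 1 add unit
haswell = [(2, 256, True)]                         # 2 FMA units
skylake_server = [(2, 512, True)]                  # 2 FMA units (varies by SKU)
zen = [(2, 128, True)]                             # 2 FMA units

for name, units in [("Sandy Bridge", sandy_bridge), ("Haswell", haswell),
                    ("Skylake (server)", skylake_server), ("Zen", zen)]:
    print(name, flops_per_cycle(units, DP_BITS), flops_per_cycle(units, SP_BITS))
```

The same function reproduces rows that mix unit types, since each unit is summed separately.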

ARM

Microarchitecture	FLOPS			ISA
ARM Microarchitectures
Cortex-A57	EUs	1 × 128-bit FMA		ARMv8 (128-bit)
	DP	4 FLOPS/cycle	4 FLOPS
	SP	8 FLOPS/cycle	8 FLOPS
AppliedMicro/Ampere Computing Microarchitectures
Storm, Shadowcat, Skylark	EUs	1 × 64-bit FMA		ARMv8 (128-bit)
	DP	2 FLOPS/cycle	2 FLOPS
	SP	4 FLOPS/cycle	4 FLOPS
Cavium Microarchitectures
Vulcan	EUs	2 × 128-bit FMA		ARMv8 (128-bit)
	DP	8 FLOPS/cycle	2 × 4 FLOPS
	SP	16 FLOPS/cycle	2 × 8 FLOPS
Samsung Microarchitectures
M1, M2	EUs	1 × 128-bit FMA + 1 × 128-bit Addition		ARMv8 (128-bit)
	DP	6 FLOPS/cycle	1 × 4 FLOPS + 1 × 2 FLOPS
	SP	12 FLOPS/cycle	1 × 8 FLOPS + 1 × 4 FLOPS
M3	EUs	3 × 128-bit FMA		ARMv8 (128-bit)
	DP	12 FLOPS/cycle	3 × 4 FLOPS
	SP	24 FLOPS/cycle	3 × 8 FLOPS
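A per-cycle figure from the tables above becomes a peak rate once a socket count, core count, and clock frequency are fixed. The sketch below shows that multiplication. The function name and both machine configurations are hypothetical, picked only to make the arithmetic concrete; only the FLOPS/cycle values come from the tables.

```python
def peak_flops(sockets, cores_per_socket, clock_hz, flops_per_cycle):
    """Peak theoretical FLOPS of a system (an upper bound, not a measurement)."""
    return sockets * cores_per_socket * clock_hz * flops_per_cycle


# Hypothetical 2-socket machine, 16 cores per socket, 2.0 GHz,
# using the 16 DP FLOPS/cycle listed for Haswell-class cores.
x86_peak = peak_flops(sockets=2, cores_per_socket=16, clock_hz=2.0e9,
                      flops_per_cycle=16)

# Hypothetical single-socket, 4-core part at 2.0 GHz,
# using the 4 DP FLOPS/cycle listed for Cortex-A57.
arm_peak = peak_flops(sockets=1, cores_per_socket=4, clock_hz=2.0e9,
                      flops_per_cycle=4)

print(f"{x86_peak / 1e9:.1f} GFLOPS")  # 1024.0 GFLOPS
print(f"{arm_peak / 1e9:.1f} GFLOPS")  # 32.0 GFLOPS
```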

@@ Line 33: / Line 33: @@
 | rowspan="3" | {{intel|Core|l=arch}}<br>{{intel|Penryn|l=arch}}<br>{{intel|Nehalem|l=arch}} || '''EUs''' || colspan="2" | 1 × 128-bit Multiplication + 1 × 128-bit Addition || rowspan="3" | {{x86|SSE}} (128-bit)
 |-
-| '''DP''' || 4 FLOPS/cycle || 2 x FLOPS + 2 × FLOPS
+| '''DP''' || 4 FLOPS/cycle || 2 FLOPS + 2 FLOPS
 |-
-| '''SP''' || 8 FLOPS/cycle || 4 x FLOPS + 4 × FLOPS
+| '''SP''' || 8 FLOPS/cycle || 4 FLOPS + 4 FLOPS
 |-
 | rowspan="3" | {{intel|Sandy Bridge|l=arch}}<br>{{intel|Ivy Bridge|l=arch}} || '''EUs''' || colspan="2" | 1 × 256-bit Multiplication + 1 × 256-bit Addition || rowspan="3" | {{x86|AVX}} (256-bit)
 |-
-| '''DP''' || 8 FLOPS/cycle || 4 × FLOPS + 4 × FLOPS
+| '''DP''' || 8 FLOPS/cycle || 4 FLOPS + 4 FLOPS
 |-
-| '''SP''' || 16 FLOPS/cycle || 8 × FLOPS + 8 × FLOPS
+| '''SP''' || 16 FLOPS/cycle || 8 FLOPS + 8 FLOPS
 |-
 | rowspan="3" | {{intel|Haswell|l=arch}}<br>{{intel|Broadwell|l=arch}}<br>{{intel|Skylake|l=arch}}<br>{{intel|Kaby Lake|l=arch}}<br>{{intel|Coffee Lake|l=arch}}<br>{{intel|Whiskey Lake|l=arch}}<br>{{intel|Amber Lake|l=arch}} || '''EUs''' || colspan="2" | 2 × 256-bit FMA || rowspan="3" | {{x86|AVX2}} & FMA (256-bit)
 |-
-| '''DP''' || 16 FLOPS/cycle || 2 × 8 × FLOPS
+| '''DP''' || 16 FLOPS/cycle || 2 × 8 FLOPS
 |-
-| '''SP''' || 32 FLOPS/cycle || 2 × 16 × FLOPS
+| '''SP''' || 32 FLOPS/cycle || 2 × 16 FLOPS
 |-
 | rowspan="3" | {{intel|Skylake (server)|l=arch}} || '''EUs''' || colspan="2" | 2 × 512-bit FMA (varies by SKU) || rowspan="3" | {{x86|AVX-512}} & FMA (512-bit)
 |-
-| '''DP''' || 32 FLOPS/cycle || 2 × 16 × FLOPS
+| '''DP''' || 32 FLOPS/cycle || 2 × 16 FLOPS
 |-
-| '''SP''' || 64 FLOPS/cycle || 2 × 32 × FLOPS
+| '''SP''' || 64 FLOPS/cycle || 2 × 32 FLOPS
 |-
 ! colspan="5" | [[Intel]] {{intel|MIC}} Microarchitectures
@@ Line 59: / Line 59: @@
 | rowspan="3" | {{intel|Knights Landing|l=arch}} || '''EUs''' || colspan="2" | 2 × 512-bit FMA (varies by SKU) || rowspan="3" | {{x86|AVX-512}} & FMA (512-bit)
 |-
-| '''DP''' || 32 FLOPS/cycle || 2 × 16 × FLOPS
+| '''DP''' || 32 FLOPS/cycle || 2 × 16 FLOPS
 |-
-| '''SP''' || 64 FLOPS/cycle || 2 × 32 × FLOPS
+| '''SP''' || 64 FLOPS/cycle || 2 × 32 FLOPS
 |-
 ! colspan="5" | [[AMD]] Microarchitectures
@@ Line 67: / Line 67: @@
 | rowspan="3" | {{amd|K10|l=arch}} || '''EUs''' || colspan="2" | 1 × 128-bit Multiplication + 1 × 128-bit Addition || rowspan="3" | {{x86|SSE}} (128-bit)
 |-
-| '''DP''' || 4 FLOPS/cycle || 2 x FLOPS + 2 × FLOPS
+| '''DP''' || 4 FLOPS/cycle || 2 FLOPS + 2 FLOPS
 |-
-| '''SP''' || 8 FLOPS/cycle || 4 x FLOPS + 4 × FLOPS
+| '''SP''' || 8 FLOPS/cycle || 4 FLOPS + 4 FLOPS
 |-
 | rowspan="3" | {{amd|Bulldozer|l=arch}}<br>{{amd|Piledriver|l=arch}}<br>{{amd|Steamroller|l=arch}}<br>{{amd|Excavator|l=arch}} || '''EUs''' || colspan="2" | 2 × 128-bit FMA (per two cores) || rowspan="3" | {{x86|AVX}} & {{x86|FMA4|FMA}} (128-bit)
 |-
-| '''DP''' || 8 FLOPS/cycle || 2 x 4 × FLOPS
+| '''DP''' || 8 FLOPS/cycle || 2 x 4 FLOPS
 |-
-| '''SP''' || 16 FLOPS/cycle || 2 x 8 × FLOPS
+| '''SP''' || 16 FLOPS/cycle || 2 x 8 FLOPS
 |-
 | rowspan="3" | {{amd|Zen|l=arch}}<br>{{amd|Zen+|l=arch}} || '''EUs''' || colspan="2" | 2 × 128-bit FMA || rowspan="3" | {{x86|AVX2}} & FMA (128-bit)
 |-
-| '''DP''' || 8 FLOPS/cycle || 2 x 4 × FLOPS
+| '''DP''' || 8 FLOPS/cycle || 2 x 4 FLOPS
 |-
-| '''SP''' || 16 FLOPS/cycle || 2 x 8 × FLOPS
+| '''SP''' || 16 FLOPS/cycle || 2 x 8 FLOPS
 |}
 === ARM ===
-{{empty section}}
+{| class="wikitable"
+|-
+! Microarchitecture !! colspan="3" | FLOPS !! ISA
+|-
+! colspan="5" | [[ARM]] Microarchitectures
+|-
+| rowspan="3" | {{armh|Cortex-A57|l=arch}} || '''EUs''' || colspan="2" | 1 × 128-bit FMA || rowspan="3" | {{arm|ARMv8}} (128-bit)
+|-
+| '''DP''' || 4 FLOPS/cycle || 4 FLOPS
+|-
+| '''SP''' || 8 FLOPS/cycle || 8 FLOPS
+|-
+! colspan="5" | [[AppliedMicro]]/[[Ampere Computing]] Microarchitectures
+|-
+| rowspan="3" | {{apm|Storm|l=arch}}<br>{{arm|Shadowcat|l=arch}}<br>{{arm|Skylark|l=arch}} || '''EUs''' || colspan="2" | 1 × 64-bit FMA || rowspan="3" | {{arm|ARMv8}} (128-bit)
+|-
+| '''DP''' || 2 FLOPS/cycle || 2 FLOPS
+|-
+| '''SP''' || 4 FLOPS/cycle || 4 FLOPS
+|-
+! colspan="5" | [[Cavium]] Microarchitectures
+|-
+| rowspan="3" | {{cavium|Vulcan|l=arch}} || '''EUs''' || colspan="2" | 2 × 128-bit FMA || rowspan="3" | {{arm|ARMv8}} (128-bit)
+|-
+| '''DP''' || 8 FLOPS/cycle || 2 x 4 FLOPS
+|-
+| '''SP''' || 16 FLOPS/cycle || 2 x 8 FLOPS
+|-
+! colspan="5" | [[Samsung]] Microarchitectures
+|-
+| rowspan="3" | {{samsung|M1|l=arch}}<br>{{samsung|M2|l=arch}} || '''EUs''' || colspan="2" | 1 × 128-bit FMA + 1 × 128-bit Addition || rowspan="3" | {{arm|ARMv8}} (128-bit)
+|-
+| '''DP''' || 6 FLOPS/cycle || 1 x 4 FLOPS + 1 x 2 FLOPS
+|-
+| '''SP''' || 12 FLOPS/cycle || 1 x 8 FLOPS + 1 x 4 FLOPS
+|-
+| rowspan="3" | {{samsung|M3|l=arch}} || '''EUs''' || colspan="2" | 3 × 128-bit FMA || rowspan="3" | {{arm|ARMv8}} (128-bit)
+|-
+| '''DP''' || 12 FLOPS/cycle || 3 x 4 FLOPS
+|-
+| '''SP''' || 24 FLOPS/cycle || 3 x 8 FLOPS
+|}
 == See also ==
