Difference between revisions of "flops"

Revision as of 05:27, 22 September 2018

Floating-point operations per second (FLOPS) is a microprocessor performance unit used to quantify the number of floating-point operations a core, machine, or system is capable of performing in one second.

Overview

FLOPS are a measure of performance used for comparing the peak theoretical performance of a core, microprocessor, or system using floating-point operations. This unit is often used in the field of high-performance computing (e.g., supercomputers) to evaluate the peak theoretical performance available to various scientific workloads. Traditionally, the FLOPS of a microprocessor could be calculated as the number of floating-point operations completed per clock cycle multiplied by the clock frequency in cycles per second.
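Written as a formula, and assuming the conventional definition (the notation here is illustrative and not necessarily the form the page originally rendered):

```latex
\text{FLOPS} = \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}}
```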

With the advent of multi-socket and multi-core architectures, additional levels of explicit parallelism have been introduced, so the calculation gains one factor for the number of cores per socket and another for the number of sockets in the system.
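One way to write the multi-core and multi-socket forms, assuming identical cores that all run at the same clock (the subscripts are labels chosen here for illustration):

```latex
\text{FLOPS}_{\text{processor}} = \text{cores} \times \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}}

\text{FLOPS}_{\text{system}} = \text{sockets} \times \text{FLOPS}_{\text{processor}}
```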

Modern microprocessors exploit data parallelism further through the introduction of various vector extensions such as x86's AVX and ARM's SVE. With those extensions, it is possible to perform multiple floating-point operations within a single instruction. For example, a typical fused multiply-accumulate (FMAC) operation can perform two floating-point operations at once. For a single core, the per-cycle figure therefore becomes the number of floating-point instructions issued per cycle multiplied by the number of operations each instruction performs.

For a full system, this is again multiplied by the number of cores per socket and the number of sockets.
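The system-level calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming every core sustains the same per-cycle throughput at the same clock; the function name `peak_flops`, its parameters, and the example machine are hypothetical and not taken from the article.

```python
def peak_flops(sockets: int, cores_per_socket: int,
               clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak FLOPS for a system of identical cores.

    Assumes every core retires `flops_per_cycle` floating-point
    operations on every cycle at `clock_hz`, which real workloads
    rarely achieve.
    """
    return sockets * cores_per_socket * clock_hz * flops_per_cycle


# Hypothetical machine (not from the article): 2 sockets, 16 cores each,
# 2.0 GHz, 32 double-precision FLOPs per cycle per core.
system_peak = peak_flops(sockets=2, cores_per_socket=16,
                         clock_hz=2.0e9, flops_per_cycle=32)
print(f"{system_peak / 1e12:.3f} TFLOPS")
```

The result is an upper bound only: it ignores clock throttling under vector load, memory bandwidth, and instruction mix.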

FLOPS by microarchitecture

x86

Microarchitecture	FLOPS			ISA
Intel Microarchitectures
Core, Penryn, Nehalem	EUs	1 × 128-bit Multiplication + 1 × 128-bit Addition		SSE (128-bit)
	DP	4 FLOPS/cycle	2 × FLOPS + 2 × FLOPS
	SP	8 FLOPS/cycle	4 × FLOPS + 4 × FLOPS
Sandy Bridge, Ivy Bridge	EUs	1 × 256-bit Multiplication + 1 × 256-bit Addition		AVX (256-bit)
	DP	8 FLOPS/cycle	4 × FLOPS + 4 × FLOPS
	SP	16 FLOPS/cycle	8 × FLOPS + 8 × FLOPS
Haswell, Broadwell, Skylake, Kaby Lake, Coffee Lake, Whiskey Lake, Amber Lake	EUs	2 × 256-bit FMA		AVX2 & FMA (256-bit)
	DP	16 FLOPS/cycle	2 × 8 × FLOPS
	SP	32 FLOPS/cycle	2 × 16 × FLOPS
Skylake (server)	EUs	2 × 512-bit FMA (varies by SKU)		AVX-512 & FMA (512-bit)
	DP	32 FLOPS/cycle	2 × 16 × FLOPS
	SP	64 FLOPS/cycle	2 × 32 × FLOPS
Intel MIC Microarchitectures
Knights Landing	EUs	2 × 512-bit FMA (varies by SKU)		AVX-512 & FMA (512-bit)
	DP	32 FLOPS/cycle	2 × 16 × FLOPS
	SP	64 FLOPS/cycle	2 × 32 × FLOPS
AMD Microarchitectures
K10	EUs	1 × 128-bit Multiplication + 1 × 128-bit Addition		SSE (128-bit)
	DP	4 FLOPS/cycle	2 × FLOPS + 2 × FLOPS
	SP	8 FLOPS/cycle	4 × FLOPS + 4 × FLOPS
Bulldozer, Piledriver, Steamroller, Excavator	EUs	2 × 128-bit FMA (per two cores)		AVX & FMA (128-bit)
	DP	8 FLOPS/cycle	2 × 4 × FLOPS
	SP	16 FLOPS/cycle	2 × 8 × FLOPS
Zen, Zen+	EUs	2 × 128-bit FMA		AVX2 & FMA (128-bit)
	DP	8 FLOPS/cycle	2 × 4 × FLOPS
	SP	16 FLOPS/cycle	2 × 8 × FLOPS
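The per-cycle figures in the table follow from the execution-unit (EUs) rows: the number of units, times the number of vector lanes, times two if the unit performs a fused multiply-add. The sketch below reproduces a few rows under that assumption; the helper `flops_per_cycle` and the `units` dictionary are illustrative names, and a separate multiplier and adder are counted here as two non-FMA units.

```python
def flops_per_cycle(units: int, vector_bits: int,
                    element_bits: int, fma: bool) -> int:
    """Peak floating-point operations per cycle for one core.

    units        -- number of vector execution units
    vector_bits  -- width of each unit in bits
    element_bits -- 64 for double precision (DP), 32 for single (SP)
    fma          -- True if each unit does a fused multiply-add (2 ops per lane)
    """
    lanes = vector_bits // element_bits
    ops_per_lane = 2 if fma else 1
    return units * lanes * ops_per_lane


# (units, vector width in bits, FMA?) read off the EUs rows of the table.
units = {
    "Nehalem":          (2, 128, False),  # 1 multiply + 1 add unit
    "Sandy Bridge":     (2, 256, False),  # 1 multiply + 1 add unit
    "Haswell":          (2, 256, True),
    "Skylake (server)": (2, 512, True),
    "Zen":              (2, 128, True),
}

for name, (n, width, fma) in units.items():
    dp = flops_per_cycle(n, width, 64, fma)
    sp = flops_per_cycle(n, width, 32, fma)
    print(f"{name}: DP {dp} FLOPS/cycle, SP {sp} FLOPS/cycle")
```

For rows marked "varies by SKU", the unit count passed in would need to match the specific part.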

ARM

This section is currently empty.

@@ Line 50: / Line 50: @@
 |-
 | rowspan="3" | {{intel|Skylake (server)|l=arch}} || '''EUs''' || colspan="2" | 2 × 512-bit FMA (varies by SKU) || rowspan="3" | {{x86|AVX-512}} & FMA (512-bit)
+|-
+| '''DP''' || 32 FLOPS/cycle || 2 × 16 × FLOPS
+|-
+| '''SP''' || 64 FLOPS/cycle || 2 × 32 × FLOPS
+|-
+! colspan="5" | [[Intel]] {{intel|MIC}} Microarchitectures
+|-
+| rowspan="3" | {{intel|Knights Landing|l=arch}} || '''EUs''' || colspan="2" | 2 × 512-bit FMA (varies by SKU) || rowspan="3" | {{x86|AVX-512}} & FMA (512-bit)
 |-
 | '''DP''' || 32 FLOPS/cycle || 2 × 16 × FLOPS

