Difference between revisions of "flops"

Revision as of 03:45, 22 September 2018

Floating-point operations per second (FLOPS) is a microprocessor performance unit used to quantify the number of floating-point operations a core, machine, or system is capable of in a one second.

Overview

FLOPS are a measure of performance used for comparing the peak theoretical performance of a core, microprocessor, or system using floating point operations. This unit is often used in the field of high-performance computing (e.g., supercomputers) in order to evaluate the peak theoretical performance of various scientific workloads. Traditionally, the FLOPS of a microprocessor could be calculated using the following equation:

With the advent of multi-socket and multi-core architectures, additional levels of explicit parallelism have been introduced resulting in the following modified equation:

and,

Modern microprocessors exploit data parallelism further through the introduction of various vector extensions such as x86's AVX and ARM's SVE. With those extensions, it's possible to performance multiple floating-point operations within a single instruction. For example, a typical fused multiply-accumulate (FMAC) operation can perform two floating-point operations at once. For a single core, this can be expressed as

And for a full system, this can be extended to:

@@ Line 8: / Line 8: @@
 With the advent of [[multi-socket]] and [[multi-core]] architectures, additional levels of explicit parallelism have been introduced resulting in the following modified equation:
+:<math>\text{FLOPS}_\text{node} = \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{cores}}{\text{node}}</math>
+and,
 :<math>\text{FLOPS}_\text{system} = \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{cores}}{\text{node}} \times \frac{\text{nodes}}{\text{system}}</math>
+Modern microprocessors exploit [[data parallelism]] further through the introduction of various vector extensions such as [[x86]]'s {{x86|AVX}} and [[ARM]]'s {{arm|SVE}}. With those extensions, it's possible to performance multiple floating-point operations within a single instruction. For example, a typical [[fused multiply-accumulate]] (FMAC) operation can perform two floating-point operations at once. For a single core, this can be expressed as
+:<math>\text{FLOPS}_\text{core} = \frac{\text{instructions}}{\text{cycle}} \times \frac{\text{operations}}{\text{instruction}} \times \frac{\text{FLOPs}}{\text{operation}} \times \frac{\text{cycles}}{\text{second}}</math>
+And for a full system, this can be extended to:
+:<math>\text{FLOPS}_\text{system} = \frac{\text{instructions}}{\text{cycle}} \times \frac{\text{operations}}{\text{instruction}} \times \frac{\text{FLOPs}}{\text{operation}} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{cores}}{\text{node}} \times \frac{\text{nodes}}{\text{system}}</math>

WikiChip

The Fuse Coverage

Social Media

Companies

Microarchitectures

Technology Nodes

Intel

AMD

ARM

Cavium

Samsung

Intel

AMD

Ampere

Apple

Cavium

HiSilicon

MediaTek

NXP

Qualcomm

Renesas

Samsung

Revision as of 03:45, 22 September 2018

Overview