From WikiChip
Difference between revisions of "flops"

Revision as of 03:45, 22 September 2018

Floating-point operations per second (FLOPS) is a microprocessor performance unit used to quantify the number of floating-point operations a core, machine, or system is capable of performing in one second.

Overview

FLOPS is a measure of performance used for comparing the peak theoretical performance of a core, microprocessor, or system in terms of floating-point operations. This unit is often used in the field of high-performance computing (e.g., supercomputers) to evaluate the peak theoretical performance available to various scientific workloads. Traditionally, the FLOPS of a microprocessor core could be calculated using the following equation:

:<math>\text{FLOPS}_\text{core} = \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}}</math>

With the advent of multi-socket and multi-core architectures, additional levels of explicit parallelism have been introduced, resulting in the following modified equation:

:<math>\text{FLOPS}_\text{node} = \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{cores}}{\text{node}}</math>

and,

:<math>\text{FLOPS}_\text{system} = \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{cores}}{\text{node}} \times \frac{\text{nodes}}{\text{system}}</math>
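As a rough illustration, the core, node, and system equations can be evaluated directly. The sketch below is only a worked example: the function names are made up for this page, and the machine (16 FLOPs per cycle, 2.5 GHz, 8 cores per node, 4 nodes) is hypothetical rather than a real product.

```python
def peak_flops_core(flops_per_cycle: int, clock_hz: int) -> int:
    """FLOPS_core = FLOPs/cycle * cycles/second."""
    return flops_per_cycle * clock_hz


def peak_flops_node(flops_per_cycle: int, clock_hz: int, cores_per_node: int) -> int:
    """FLOPS_node = FLOPS_core * cores/node."""
    return peak_flops_core(flops_per_cycle, clock_hz) * cores_per_node


def peak_flops_system(flops_per_cycle: int, clock_hz: int,
                      cores_per_node: int, nodes: int) -> int:
    """FLOPS_system = FLOPS_node * nodes/system."""
    return peak_flops_node(flops_per_cycle, clock_hz, cores_per_node) * nodes


# Hypothetical machine (assumed values, not taken from any datasheet).
FLOPS_PER_CYCLE = 16
CLOCK_HZ = 2_500_000_000  # 2.5 GHz
CORES_PER_NODE = 8
NODES = 4

core = peak_flops_core(FLOPS_PER_CYCLE, CLOCK_HZ)
node = peak_flops_node(FLOPS_PER_CYCLE, CLOCK_HZ, CORES_PER_NODE)
system = peak_flops_system(FLOPS_PER_CYCLE, CLOCK_HZ, CORES_PER_NODE, NODES)

print(f"core:   {core / 1e9:.1f} GFLOPS")
print(f"node:   {node / 1e9:.1f} GFLOPS")
print(f"system: {system / 1e12:.2f} TFLOPS")
```

These are peak theoretical figures; sustained performance on a real workload is normally lower.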

Modern microprocessors exploit data parallelism further through the introduction of various vector extensions such as x86's AVX and ARM's SVE. With those extensions, it's possible to perform multiple floating-point operations within a single instruction. For example, a typical fused multiply-accumulate (FMAC) operation can perform two floating-point operations at once (a multiply and an add). For a single core, this can be expressed as:

:<math>\text{FLOPS}_\text{core} = \frac{\text{instructions}}{\text{cycle}} \times \frac{\text{operations}}{\text{instruction}} \times \frac{\text{FLOPs}}{\text{operation}} \times \frac{\text{cycles}}{\text{second}}</math>

And for a full system, this can be extended to:

:<math>\text{FLOPS}_\text{system} = \frac{\text{instructions}}{\text{cycle}} \times \frac{\text{operations}}{\text{instruction}} \times \frac{\text{FLOPs}}{\text{operation}} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{cores}}{\text{node}} \times \frac{\text{nodes}}{\text{system}}</math>
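The vector form of the equation can be sketched the same way. Everything in the example below is an assumption chosen for illustration: a hypothetical core that issues 2 vector FMAC instructions per cycle, each operating on 8 double-precision elements (one 512-bit vector), counting 2 FLOPs per FMAC, clocked at 2.0 GHz, with 28 cores per node and 100 nodes. The function names are likewise invented for this example.

```python
def vector_flops_core(instructions_per_cycle: int,
                      operations_per_instruction: int,
                      flops_per_operation: int,
                      clock_hz: int) -> int:
    """FLOPS_core = instr/cycle * ops/instr * FLOPs/op * cycles/second."""
    return (instructions_per_cycle
            * operations_per_instruction
            * flops_per_operation
            * clock_hz)


def vector_flops_system(instructions_per_cycle: int,
                        operations_per_instruction: int,
                        flops_per_operation: int,
                        clock_hz: int,
                        cores_per_node: int,
                        nodes: int) -> int:
    """FLOPS_system = FLOPS_core * cores/node * nodes/system."""
    per_core = vector_flops_core(instructions_per_cycle,
                                 operations_per_instruction,
                                 flops_per_operation,
                                 clock_hz)
    return per_core * cores_per_node * nodes


# Hypothetical configuration (assumed values, not a real processor).
INSTRUCTIONS_PER_CYCLE = 2      # vector FMAC instructions issued per cycle
OPERATIONS_PER_INSTRUCTION = 8  # 512-bit vector / 64-bit doubles
FLOPS_PER_OPERATION = 2         # an FMAC counts as a multiply plus an add
CLOCK_HZ = 2_000_000_000        # 2.0 GHz
CORES_PER_NODE = 28
NODES = 100

per_core = vector_flops_core(INSTRUCTIONS_PER_CYCLE, OPERATIONS_PER_INSTRUCTION,
                             FLOPS_PER_OPERATION, CLOCK_HZ)
per_system = vector_flops_system(INSTRUCTIONS_PER_CYCLE, OPERATIONS_PER_INSTRUCTION,
                                 FLOPS_PER_OPERATION, CLOCK_HZ,
                                 CORES_PER_NODE, NODES)

print(f"per core:   {per_core / 1e9:.1f} GFLOPS")
print(f"per system: {per_system / 1e12:.1f} TFLOPS")
```

Halving the element width (single instead of double precision) would double the operations per instruction for the same vector length, which is why peak FLOPS figures are usually quoted for a specific precision.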