{{title|Floating-Point Operations Per Second (FLOPS)}}
'''Floating-point operations per second''' ('''FLOPS''') is a unit of microprocessor performance that quantifies the number of [[floating-point]] [[floating-point operations|operations]] a [[physical core|core]], machine, or system can perform in one second.

== Overview ==
FLOPS is a measure of performance used to compare the peak theoretical performance of a [[physical core|core]], [[microprocessor]], or system in terms of [[floating point|floating-point]] [[floating-point operations|operations]]. The unit is often used in [[high-performance computing]] (e.g., [[supercomputers]]) to evaluate the peak theoretical performance available to scientific workloads. Traditionally, the FLOPS of a microprocessor could be calculated using the following equation:

:<math>\text{FLOPS}_\text{core} = \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}}</math>

With the advent of [[multi-socket]] and [[multi-core]] architectures, additional levels of explicit parallelism have been introduced, resulting in the following modified equations:

:<math>\text{FLOPS}_\text{node} = \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{cores}}{\text{node}}</math>

and

:<math>\text{FLOPS}_\text{system} = \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{cores}}{\text{node}} \times \frac{\text{nodes}}{\text{system}}</math>
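As a rough illustration of the three equations above, the following Python sketch simply multiplies the terms together. The function names and the example figures (4 FLOPs per cycle, 2.5 GHz, 16 cores per node, 100 nodes) are hypothetical and chosen only to keep the arithmetic easy to follow; they do not describe any particular processor or system.

```python
def flops_core(flops_per_cycle, cycles_per_second):
    """Peak FLOPS of one core: FLOPs/cycle x cycles/second."""
    return flops_per_cycle * cycles_per_second


def flops_node(flops_per_cycle, cycles_per_second, cores_per_node):
    """Peak FLOPS of one node: per-core FLOPS x cores/node."""
    return flops_core(flops_per_cycle, cycles_per_second) * cores_per_node


def flops_system(flops_per_cycle, cycles_per_second, cores_per_node, nodes_per_system):
    """Peak FLOPS of a whole system: per-node FLOPS x nodes/system."""
    return flops_node(flops_per_cycle, cycles_per_second, cores_per_node) * nodes_per_system


# Hypothetical example values (assumptions, not measurements).
FLOPS_PER_CYCLE = 4
CLOCK_HZ = 2_500_000_000  # 2.5 GHz
CORES_PER_NODE = 16
NODES_PER_SYSTEM = 100

core = flops_core(FLOPS_PER_CYCLE, CLOCK_HZ)
node = flops_node(FLOPS_PER_CYCLE, CLOCK_HZ, CORES_PER_NODE)
system = flops_system(FLOPS_PER_CYCLE, CLOCK_HZ, CORES_PER_NODE, NODES_PER_SYSTEM)

print(f"core:   {core / 1e9:.1f} GFLOPS")     # core:   10.0 GFLOPS
print(f"node:   {node / 1e9:.1f} GFLOPS")     # node:   160.0 GFLOPS
print(f"system: {system / 1e12:.1f} TFLOPS")  # system: 16.0 TFLOPS
```

These are peak theoretical values; the sketch assumes every core runs at the same clock rate and sustains the same number of FLOPs per cycle.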

Modern microprocessors exploit [[data parallelism]] further through vector extensions such as [[x86]]'s {{x86|AVX}} and [[ARM]]'s {{arm|SVE}}. With those extensions, it is possible to perform multiple floating-point operations within a single instruction. For example, a typical [[fused multiply-accumulate]] (FMAC) operation performs two floating-point operations at once. For a single core, this can be expressed as:

:<math>\text{FLOPS}_\text{core} = \frac{\text{instructions}}{\text{cycle}} \times \frac{\text{operations}}{\text{instruction}} \times \frac{\text{FLOPs}}{\text{operation}} \times \frac{\text{cycles}}{\text{second}}</math>

For a full system, this can be extended to:

:<math>\text{FLOPS}_\text{system} = \frac{\text{instructions}}{\text{cycle}} \times \frac{\text{operations}}{\text{instruction}} \times \frac{\text{FLOPs}}{\text{operation}} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{cores}}{\text{node}} \times \frac{\text{nodes}}{\text{system}}</math>
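The vector-aware equations above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: "operations per instruction" is interpreted here as the number of vector elements one instruction processes (vector width divided by element width), and "FLOPs per operation" is taken as 2 for a fused multiply-accumulate, as described above. The helper names, the 3 GHz clock, and the core and node counts are hypothetical.

```python
def vector_lanes(vector_bits, element_bits):
    """Elements processed per vector instruction (assumed meaning of operations/instruction)."""
    return vector_bits // element_bits


def peak_flops(instructions_per_cycle, operations_per_instruction,
               flops_per_operation, cycles_per_second,
               cores_per_node=1, nodes_per_system=1):
    """Peak FLOPS as the product of the terms in the system equation.

    With the default cores_per_node=1 and nodes_per_system=1 this reduces
    to the single-core equation.
    """
    flops_per_cycle = (instructions_per_cycle
                       * operations_per_instruction
                       * flops_per_operation)
    return flops_per_cycle * cycles_per_second * cores_per_node * nodes_per_system


CLOCK_HZ = 3_000_000_000  # hypothetical 3 GHz clock
FMA_FLOPS = 2             # one multiply plus one add per element

# Hypothetical core issuing two 256-bit FMA instructions per cycle.
dp = peak_flops(2, vector_lanes(256, 64), FMA_FLOPS, CLOCK_HZ)  # double precision
sp = peak_flops(2, vector_lanes(256, 32), FMA_FLOPS, CLOCK_HZ)  # single precision

# The same core replicated over a hypothetical 10-node system with 32 cores per node.
system_dp = peak_flops(2, vector_lanes(256, 64), FMA_FLOPS, CLOCK_HZ,
                       cores_per_node=32, nodes_per_system=10)

print(f"DP per core: {dp / 1e9:.0f} GFLOPS")           # DP per core: 48 GFLOPS
print(f"SP per core: {sp / 1e9:.0f} GFLOPS")           # SP per core: 96 GFLOPS
print(f"DP system:   {system_dp / 1e12:.2f} TFLOPS")   # DP system:   15.36 TFLOPS
```

Halving the element width doubles the number of lanes, which is why the single-precision figure is twice the double-precision one for the same hardware.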
@@ Line 1: / Line 1: @@
 {{title|Floating-Point Operations Per Second (FLOPS)}}
-'''Floating-point operations per second''' ('''FLOPS''') is a measure of [[compute performance]] used to quantify the number of [[floating-point]] [[floating-point operations|operations]] a [[physical core|core]], machine, or system is capable of in a one second.
+'''Floating-point operations per second''' ('''FLOPS''') is a microprocessor performance unit used to quantify the number of [[floating-point]] [[floating-point operations|operations]] a [[physical core|core]], machine, or system is capable of in a one second.
 == Overview ==
@@ Line 15: / Line 15: @@
 :<math>\text{FLOPS}_\text{system} = \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{cores}}{\text{node}} \times \frac{\text{nodes}}{\text{system}}</math>
-Modern microprocessors exploit [[data parallelism]] further through the introduction of various vector extensions such as [[x86]]'s {{x86|AVX}} and [[ARM]]'s {{arm|SVE}}. With those extensions, it's possible to perform multiple floating-point operations within a single instruction. For example, a typical [[fused multiply-accumulate]] (FMAC) operation can perform two floating-point operations at once. For a single core, this can be expressed as
+Modern microprocessors exploit [[data parallelism]] further through the introduction of various vector extensions such as [[x86]]'s {{x86|AVX}} and [[ARM]]'s {{arm|SVE}}. With those extensions, it's possible to performance multiple floating-point operations within a single instruction. For example, a typical [[fused multiply-accumulate]] (FMAC) operation can perform two floating-point operations at once. For a single core, this can be expressed as
 :<math>\text{FLOPS}_\text{core} = \frac{\text{instructions}}{\text{cycle}} \times \frac{\text{operations}}{\text{instruction}} \times \frac{\text{FLOPs}}{\text{operation}} \times \frac{\text{cycles}}{\text{second}}</math>
@@ Line 22: / Line 22: @@
 :<math>\text{FLOPS}_\text{system} = \frac{\text{instructions}}{\text{cycle}} \times \frac{\text{operations}}{\text{instruction}} \times \frac{\text{FLOPs}}{\text{operation}} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{cores}}{\text{node}} \times \frac{\text{nodes}}{\text{system}}</math>
-=== Nomenclature ===
-* KiloFLOPS / KFLOPS: 10<sup>3</sup> FLOPS
-* MegaFLOPS / MFLOPS: 10<sup>6</sup> FLOPS
-* GigaFLOPS / GFLOPS: 10<sup>9</sup> FLOPS
-* TeraFLOPS / TFLOPS: 10<sup>12</sup> FLOPS
-* PetaFLOPS / PFLOPS: 10<sup>15</sup> FLOPS
-* ExaFLOPS / EFLOPS: 10<sup>18</sup> FLOPS
-* ZettaFLOPS / ZFLOPS: 10<sup>21</sup> FLOPS
-* YottaFLOPS / YFLOPS: 10<sup>24</sup> FLOPS
-== FLOPs by microarchitecture ==
-=== x86 ===
-{| class="wikitable"
-|-
-! Microarchitecture !! colspan="3" | FLOPs !! ISA
-|-
-! colspan="5" | [[Intel]] Microarchitectures
-|-
-| rowspan="3" | {{intel|Core|l=arch}}<br>{{intel|Penryn|l=arch}}<br>{{intel|Nehalem|l=arch}} || '''EUs''' || colspan="2" | 1 × 128-bit Multiplication + 1 × 128-bit Addition || rowspan="3" | {{x86|SSE}} (128-bit)
-|-
-| '''DP''' || 4 FLOPs/cycle || 2 FLOPs + 2 FLOPs
-|-
-| '''SP''' || 8 FLOPs/cycle || 4 FLOPs + 4 FLOPs
-|-
-| rowspan="3" | {{intel|Sandy Bridge|l=arch}}<br>{{intel|Ivy Bridge|l=arch}} || '''EUs''' || colspan="2" | 1 × 256-bit Multiplication + 1 × 256-bit Addition || rowspan="3" | {{x86|AVX}} (256-bit)
-|-
-| '''DP''' || 8 FLOPs/cycle || 4 FLOPs + 4 FLOPs
-|-
-| '''SP''' || 16 FLOPs/cycle || 8 FLOPs + 8 FLOPs
-|-
-| rowspan="3" | {{intel|Haswell|l=arch}}<br>{{intel|Broadwell|l=arch}}<br>{{intel|Skylake|l=arch}}<br>{{intel|Kaby Lake|l=arch}}<br>{{intel|Amber Lake|l=arch}}<br>{{intel|Coffee Lake|l=arch}}<br>{{intel|Whiskey Lake|l=arch}} || '''EUs''' || colspan="2" | 2 × 256-bit FMA || rowspan="3" | {{x86|AVX2}} & FMA (256-bit)
-|-
-| '''DP''' || 16 FLOPs/cycle || 2 × 8 FLOPs
-|-
-| '''SP''' || 32 FLOPs/cycle || 2 × 16 FLOPs
-|-
-| rowspan="3" | {{intel|Skylake (server)|l=arch}} || '''EUs''' || colspan="2" | 2 × 512-bit FMA (varies by SKU) || rowspan="3" | {{x86|AVX-512}} & FMA (512-bit)
-|-
-| '''DP''' || 32 FLOPs/cycle || 2 × 16 FLOPs
-|-
-| '''SP''' || 64 FLOPs/cycle || 2 × 32 FLOPs
-|-
-| rowspan="3" | {{intel|Rocket Lake|l=arch}}<br>{{intel|Ice Lake|l=arch}}<br>{{intel|Tiger Lake|l=arch}} || '''EUs''' || colspan="2" | 2 × 512-bit FMA || rowspan="3" | {{x86|AVX-512}} & FMA (512-bit)
-|-
-| '''DP''' || 32 FLOPs/cycle || 2 × 16 FLOPs
-|-
-| '''SP''' || 64 FLOPs/cycle || 2 × 32 FLOPs
-|-
-! colspan="5" | [[Intel]] {{intel|MIC}} Microarchitectures
-|-
-| rowspan="3" | {{intel|Knights Landing|l=arch}} || '''EUs''' || colspan="2" | 2 × 512-bit FMA || rowspan="3" | {{x86|AVX-512}} & FMA (512-bit)
-|-
-| '''DP''' || 32 FLOPs/cycle || 2 × 16 FLOPs
-|-
-| '''SP''' || 64 FLOPs/cycle || 2 × 32 FLOPs
-|-
-! colspan="5" | [[AMD]] Microarchitectures
-|-
-| rowspan="3" | {{amd|K10|l=arch}} || '''EUs''' || colspan="2" | 1 × 128-bit Multiplication + 1 × 128-bit Addition || rowspan="3" | {{x86|SSE}} (128-bit)
-|-
-| '''DP''' || 4 FLOPs/cycle || 2 FLOPs + 2 FLOPs
-|-
-| '''SP''' || 8 FLOPs/cycle || 4 FLOPs + 4 FLOPs
-|-
-| rowspan="3" | {{amd|Bulldozer|l=arch}}<br>{{amd|Piledriver|l=arch}}<br>{{amd|Steamroller|l=arch}}<br>{{amd|Excavator|l=arch}} || '''EUs''' || colspan="2" | 2 × 128-bit FMA (per two cores) || rowspan="3" | {{x86|AVX}} & {{x86|FMA4|FMA}} (128-bit)
-|-
-| '''DP''' || 8 FLOPs/cycle || 2 x 4 FLOPs
-|-
-| '''SP''' || 16 FLOPs/cycle || 2 x 8 FLOPs
-|-
-| rowspan="3" | {{amd|Zen|l=arch}}<br>{{amd|Zen+|l=arch}} || '''EUs''' || colspan="2" | 2 × 128-bit FMA || rowspan="3" | {{x86|AVX2}} & FMA (256-bit)
-|-
-| '''DP''' || 8 FLOPs/cycle || 2 x 4 FLOPs
-|-
-| '''SP''' || 16 FLOPs/cycle || 2 x 8 FLOPs
-|-
-| rowspan="3" | {{amd|Zen 2|l=arch}}<br>{{amd|Zen 3|l=arch}} || '''EUs''' || colspan="2" | 2 × 256-bit FMA || rowspan="3" | {{x86|AVX2}} & FMA (256-bit)
-|-
-| '''DP''' || 16 FLOPs/cycle || 2 x 8 FLOPs
-|-
-| '''SP''' || 32 FLOPs/cycle || 2 x 16 FLOPs
-|-
-! colspan="5" | [[Centaur]] Microarchitectures
-|-
-| rowspan="3" | {{centtech|CHA|l=arch}} || '''EUs''' || colspan="2" | 2 × 256-bit FMA || rowspan="3" | {{x86|AVX-512}} & FMA (512-bit)
-|-
-| '''DP''' || 16 FLOPs/cycle || 2 x 8 FLOPs
-|-
-| '''SP''' || 32 FLOPs/cycle || 2 x 16 FLOPs
-|}
-=== ARM ===
-{| class="wikitable"
-|-
-! Microarchitecture !! colspan="3" | FLOPs !! ISA
-|-
-! colspan="5" | [[ARM]] Microarchitectures
-|-
-| rowspan="3" | {{armh|Cortex-A57|l=arch}} || '''EUs''' || colspan="2" | 1 × 128-bit FMA || rowspan="3" | {{arm|ARMv8}} {{arm|NEON}} (128-bit)
-|-
-| '''DP''' || 4 FLOPs/cycle || 4 FLOPs
-|-
-| '''SP''' || 8 FLOPs/cycle || 8 FLOPs
-|-
-| rowspan="3" | {{armh|Cortex-A76|l=arch}}<br>{{armh|Cortex-A77|l=arch}}<br>{{armh|Cortex-A78|l=arch}}<br>{{armh|Neoverse N1|l=arch}} || '''EUs''' || colspan="2" | 2 × 128-bit FMA || rowspan="3" | {{arm|ARMv8}} {{arm|NEON}} (128-bit)
-|-
-| '''DP''' || 8 FLOPs/cycle || 2 x 4 FLOPs
-|-
-| '''SP''' || 16 FLOPs/cycle || 2 x 8 FLOPs
-|-
-| rowspan="3" | {{armh|Neoverse N2|l=arch}} || '''EUs''' || colspan="2" | 2 × 128-bit FMA || rowspan="3" | {{arm|ARMv9}} {{arm|SVE2}} (128-bit)
-|-
-| '''DP''' || 8 FLOPs/cycle || 2 x 4 FLOPs
-|-
-| '''SP''' || 16 FLOPs/cycle || 2 x 8 FLOPs
-|-
-| rowspan="3" | {{armh|Neoverse V1|l=arch}} || '''EUs''' || colspan="2" | 2 × 256-bit FMA || rowspan="3" | {{arm|ARMv8}} {{arm|SVE}} (256-bit)
-|-
-| '''DP''' || 16 FLOPs/cycle || 2 x 8 FLOPs
-|-
-| '''SP''' || 32 FLOPs/cycle || 2 x 16 FLOPs
-|-
-| rowspan="3" | {{armh|Cortex-A510|l=arch}} || '''EUs''' || colspan="2" | 1-2 × 128-bit FMA || rowspan="3" | {{arm|ARMv9}} {{arm|SVE2}} (128-bit)
-|-
-| '''DP''' || 2-4 FLOPs/cycle || 2-4 FLOPs
-|-
-| '''SP''' || 4-8 FLOPs/cycle || 4-8 FLOPs
-|-
-! colspan="5" | [[AppliedMicro]]/[[Ampere Computing]] Microarchitectures
-|-
-| rowspan="3" | {{apm|Storm|l=arch}}<br>{{apm|Shadowcat|l=arch}}<br>{{apm|Skylark|l=arch}} || '''EUs''' || colspan="2" | 1 × 64-bit FMA || rowspan="3" | {{arm|ARMv8}} {{arm|NEON}} (128-bit)
-|-
-| '''DP''' || 2 FLOPs/cycle || 2 FLOPs
-|-
-| '''SP''' || 4 FLOPs/cycle || 4 FLOPs
-|-
-! colspan="5" | [[Cavium]] Microarchitectures
-|-
-| rowspan="3" | {{cavium|Vulcan|l=arch}} || '''EUs''' || colspan="2" | 2 × 128-bit FMA || rowspan="3" | {{arm|ARMv8}} {{arm|NEON}} (128-bit)
-|-
-| '''DP''' || 8 FLOPs/cycle || 2 x 4 FLOPs
-|-
-| '''SP''' || 16 FLOPs/cycle || 2 x 8 FLOPs
-|-
-! colspan="5" | [[Samsung]] Microarchitectures
-|-
-| rowspan="3" | {{samsung|M1|l=arch}}<br>{{samsung|M2|l=arch}} || '''EUs''' || colspan="2" | 1 × 128-bit FMA + 1 × 128-bit Addition || rowspan="3" | {{arm|ARMv8}} {{arm|NEON}} (128-bit)
-|-
-| '''DP''' || 6 FLOPs/cycle || 1 x 4 FLOPs + 1 x 2 FLOPs
-|-
-| '''SP''' || 12 FLOPs/cycle || 1 x 8 FLOPs + 1 x 4 FLOPs
-|-
-| rowspan="3" | {{samsung|M3|l=arch}} || '''EUs''' || colspan="2" | 3 × 128-bit FMA || rowspan="3" | {{arm|ARMv8}} {{arm|NEON}} (128-bit)
-|-
-| '''DP''' || 12 FLOPs/cycle || 3 x 4 FLOPs
-|-
-| '''SP''' || 24 FLOPs/cycle || 3 x 8 FLOPs
-|-
-! colspan="5" | [[Phytium]] Microarchitectures
-|-
-| rowspan="3" | {{phytium|Xiaomi|l=arch}} || '''EUs''' || colspan="2" | 1 × 128-bit FMA || rowspan="3" | {{arm|ARMv8}} {{arm|NEON}} (128-bit)
-|-
-| '''DP''' || 4 FLOPs/cycle || 1 x 4 FLOPs
-|-
-| '''SP''' || 8 FLOPs/cycle || 1 x 8 FLOPs
-|-
-! colspan="5" | [[HiSilicon]] Microarchitectures
-|-
-| rowspan="3" | {{hisilicon|TaiShan v110|l=arch}} || '''EUs''' || colspan="2" | 1 × 128-bit FMA || rowspan="3" | {{arm|ARMv8}} {{arm|NEON}} (128-bit)
-|-
-| '''DP''' || 4 FLOPs/cycle || 1 x 4 FLOPs
-|-
-| '''SP''' || 8 FLOPs/cycle || 1 x 8 FLOPs
-|}
-== See also ==
-* [[bytes per FLOP]]
-* [[floating point]]
-* [[floating point operation]]
-* [[operations per second]] (OPS)
-[[category:floating point]]
-[[Category:computer performance]]