{{title|Floating-Point Operations Per Second (FLOPS)}}
'''Floating-point operations per second''' ('''FLOPS''') is a measure of [[compute performance]] that quantifies the number of [[floating-point]] [[floating-point operations|operations]] a [[physical core|core]], machine, or system is capable of performing in one second.
  
 
== Overview ==
 
:<math>\text{FLOPS}_\text{system} = \frac{\text{FLOPs}}{\text{cycle}} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{cores}}{\text{node}} \times \frac{\text{nodes}}{\text{system}}</math>
  
Modern microprocessors exploit [[data parallelism]] further through vector extensions such as [[x86]]'s {{x86|AVX}} and [[ARM]]'s {{arm|SVE}}. With those extensions, a single instruction can perform multiple floating-point operations. For example, a typical [[fused multiply-accumulate]] (FMAC) operation counts as two floating-point operations (one multiplication and one addition). For a single core, this can be expressed as
  
 
:<math>\text{FLOPS}_\text{core} = \frac{\text{instructions}}{\text{cycle}} \times \frac{\text{operations}}{\text{instruction}} \times \frac{\text{FLOPs}}{\text{operation}} \times \frac{\text{cycles}}{\text{second}}</math>
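
As a rough worked example, consider a single core with two 256-bit FMA units (the Haswell-class row in the x86 table) running double-precision code, so that each instruction covers 256 / 64 = 4 operations. The 3.0 GHz clock is an assumed figure chosen only for illustration, not a specification of any particular part:

:<math>\text{FLOPS}_\text{core} = 2\,\frac{\text{instructions}}{\text{cycle}} \times 4\,\frac{\text{operations}}{\text{instruction}} \times 2\,\frac{\text{FLOPs}}{\text{operation}} \times 3.0 \times 10^{9}\,\frac{\text{cycles}}{\text{second}} = 48 \times 10^{9}\ \text{FLOPS} = 48\ \text{GFLOPS}</math>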
  
 
:<math>\text{FLOPS}_\text{system} = \frac{\text{instructions}}{\text{cycle}} \times \frac{\text{operations}}{\text{instruction}} \times \frac{\text{FLOPs}}{\text{operation}} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{cores}}{\text{node}} \times \frac{\text{nodes}}{\text{system}}</math>
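
The two expressions above can be sketched as a pair of small functions. This is only an illustration of the arithmetic: the function names and the machine parameters (2.0 GHz, 32 cores per node, 100 nodes) are hypothetical and do not describe any real system.

```python
def peak_flops_core(instructions_per_cycle, operations_per_instruction,
                    flops_per_operation, cycles_per_second):
    """Peak FLOPS of one core: the product of the four per-core factors."""
    return (instructions_per_cycle * operations_per_instruction
            * flops_per_operation * cycles_per_second)


def peak_flops_system(core_flops, cores_per_node, nodes_per_system):
    """Scale the per-core peak to a whole system."""
    return core_flops * cores_per_node * nodes_per_system


# Hypothetical machine (assumed values): 2 FMA instructions/cycle,
# 8 double-precision lanes per 512-bit instruction, 2 FLOPs per FMA,
# a 2.0 GHz clock, 32 cores per node and 100 nodes.
core = peak_flops_core(2, 8, 2, 2.0e9)
system = peak_flops_system(core, cores_per_node=32, nodes_per_system=100)
print(f"core: {core / 1e9:.0f} GFLOPS, system: {system / 1e12:.1f} TFLOPS")
```

The result is a theoretical peak; sustained throughput on real workloads is typically lower.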
 
=== Nomenclature ===
 
* KiloFLOPS / KFLOPS: 10<sup>3</sup> FLOPS
 
* MegaFLOPS / MFLOPS: 10<sup>6</sup> FLOPS
 
* GigaFLOPS / GFLOPS: 10<sup>9</sup> FLOPS
 
* TeraFLOPS / TFLOPS: 10<sup>12</sup> FLOPS
 
* PetaFLOPS / PFLOPS: 10<sup>15</sup> FLOPS
 
* ExaFLOPS / EFLOPS: 10<sup>18</sup> FLOPS
 
* ZettaFLOPS / ZFLOPS: 10<sup>21</sup> FLOPS
 
* YottaFLOPS / YFLOPS: 10<sup>24</sup> FLOPS
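
A minimal sketch of how a raw FLOPS figure might be mapped onto the prefixes listed above. The helper name `format_flops` and the two-decimal formatting are choices made for this example only.

```python
# Decimal prefixes from the nomenclature list, largest first.
PREFIXES = [
    (1e24, "YFLOPS"), (1e21, "ZFLOPS"), (1e18, "EFLOPS"), (1e15, "PFLOPS"),
    (1e12, "TFLOPS"), (1e9, "GFLOPS"), (1e6, "MFLOPS"), (1e3, "KFLOPS"),
]


def format_flops(value):
    """Scale a value in FLOPS to the largest prefix that does not exceed it."""
    for scale, unit in PREFIXES:
        if value >= scale:
            return f"{value / scale:.2f} {unit}"
    return f"{value:.2f} FLOPS"


print(format_flops(48e9))     # 48.00 GFLOPS
print(format_flops(1.5e15))   # 1.50 PFLOPS
```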
 
 
== FLOPs by microarchitecture ==
 
=== x86 ===
 
{| class="wikitable"
 
|-
 
! Microarchitecture !! colspan="3" | FLOPs !! ISA
 
|-
 
! colspan="5" | [[Intel]] Microarchitectures
 
|-
 
| rowspan="3" | {{intel|Core|l=arch}}<br>{{intel|Penryn|l=arch}}<br>{{intel|Nehalem|l=arch}} || '''EUs''' || colspan="2" | 1 × 128-bit Multiplication + 1 × 128-bit Addition || rowspan="3" | {{x86|SSE}} (128-bit)
 
|-
 
| '''DP''' || 4 FLOPs/cycle || 2 FLOPs + 2 FLOPs
 
|-
 
| '''SP''' || 8 FLOPs/cycle || 4 FLOPs + 4 FLOPs
 
|-
 
| rowspan="3" | {{intel|Sandy Bridge|l=arch}}<br>{{intel|Ivy Bridge|l=arch}} || '''EUs''' || colspan="2" | 1 × 256-bit Multiplication + 1 × 256-bit Addition || rowspan="3" | {{x86|AVX}} (256-bit)
 
|-
 
| '''DP''' || 8 FLOPs/cycle || 4 FLOPs + 4 FLOPs
 
|-
 
| '''SP''' || 16 FLOPs/cycle || 8 FLOPs + 8 FLOPs
 
|-
 
| rowspan="3" | {{intel|Haswell|l=arch}}<br>{{intel|Broadwell|l=arch}}<br>{{intel|Skylake|l=arch}}<br>{{intel|Kaby Lake|l=arch}}<br>{{intel|Amber Lake|l=arch}}<br>{{intel|Coffee Lake|l=arch}}<br>{{intel|Whiskey Lake|l=arch}} || '''EUs''' || colspan="2" | 2 × 256-bit FMA || rowspan="3" | {{x86|AVX2}} & FMA (256-bit)
 
|-
 
| '''DP''' || 16 FLOPs/cycle || 2 × 8 FLOPs
 
|-
 
| '''SP''' || 32 FLOPs/cycle || 2 × 16 FLOPs
 
|-
 
| rowspan="3" | {{intel|Skylake (server)|l=arch}} || '''EUs''' || colspan="2" | 2 × 512-bit FMA (varies by SKU) || rowspan="3" | {{x86|AVX-512}} & FMA (512-bit)
 
|-
 
| '''DP''' || 32 FLOPs/cycle || 2 × 16 FLOPs
 
|-
 
| '''SP''' || 64 FLOPs/cycle || 2 × 32 FLOPs
 
|-
 
| rowspan="3" | {{intel|Rocket Lake|l=arch}}<br>{{intel|Ice Lake|l=arch}}<br>{{intel|Tiger Lake|l=arch}} || '''EUs''' || colspan="2" | 2 × 512-bit FMA || rowspan="3" | {{x86|AVX-512}} & FMA (512-bit)
 
|-
 
| '''DP''' || 32 FLOPs/cycle || 2 × 16 FLOPs
 
|-
 
| '''SP''' || 64 FLOPs/cycle || 2 × 32 FLOPs
 
|-
 
! colspan="5" | [[Intel]] {{intel|MIC}} Microarchitectures
 
|-
 
| rowspan="3" | {{intel|Knights Landing|l=arch}} || '''EUs''' || colspan="2" | 2 × 512-bit FMA || rowspan="3" | {{x86|AVX-512}} & FMA (512-bit)
 
|-
 
| '''DP''' || 32 FLOPs/cycle || 2 × 16 FLOPs
 
|-
 
| '''SP''' || 64 FLOPs/cycle || 2 × 32 FLOPs
 
|-
 
! colspan="5" | [[AMD]] Microarchitectures
 
|-
 
| rowspan="3" | {{amd|K10|l=arch}} || '''EUs''' || colspan="2" | 1 × 128-bit Multiplication + 1 × 128-bit Addition || rowspan="3" | {{x86|SSE}} (128-bit)
 
|-
 
| '''DP''' || 4 FLOPs/cycle || 2 FLOPs + 2 FLOPs
 
|-
 
| '''SP''' || 8 FLOPs/cycle || 4 FLOPs + 4 FLOPs
 
|-
 
| rowspan="3" | {{amd|Bulldozer|l=arch}}<br>{{amd|Piledriver|l=arch}}<br>{{amd|Steamroller|l=arch}}<br>{{amd|Excavator|l=arch}} || '''EUs''' || colspan="2" | 2 × 128-bit FMA (per two cores) || rowspan="3" | {{x86|AVX}} & {{x86|FMA4|FMA}} (128-bit)
 
|-
 
| '''DP''' || 8 FLOPs/cycle || 2 × 4 FLOPs
 
|-
 
| '''SP''' || 16 FLOPs/cycle || 2 × 8 FLOPs
 
|-
 
| rowspan="3" | {{amd|Zen|l=arch}}<br>{{amd|Zen+|l=arch}} || '''EUs''' || colspan="2" | 2 × 128-bit FMA || rowspan="3" | {{x86|AVX2}} & FMA (256-bit)
 
|-
 
| '''DP''' || 8 FLOPs/cycle || 2 × 4 FLOPs
 
|-
 
| '''SP''' || 16 FLOPs/cycle || 2 × 8 FLOPs
 
|-
 
| rowspan="3" | {{amd|Zen 2|l=arch}}<br>{{amd|Zen 3|l=arch}} || '''EUs''' || colspan="2" | 2 × 256-bit FMA || rowspan="3" | {{x86|AVX2}} & FMA (256-bit)
 
|-
 
| '''DP''' || 16 FLOPs/cycle || 2 × 8 FLOPs
 
|-
 
| '''SP''' || 32 FLOPs/cycle || 2 × 16 FLOPs
 
|-
 
! colspan="5" | [[Centaur]] Microarchitectures
 
|-
 
| rowspan="3" | {{centtech|CHA|l=arch}} || '''EUs''' || colspan="2" | 2 × 256-bit FMA || rowspan="3" | {{x86|AVX-512}} & FMA (512-bit)
 
|-
 
| '''DP''' || 16 FLOPs/cycle || 2 × 8 FLOPs
 
|-
 
| '''SP''' || 32 FLOPs/cycle || 2 × 16 FLOPs
 
|}
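
The per-cycle figures in the table follow from the execution-unit mix: each unit contributes its vector width divided by the element size, times two for an FMA or one for a plain multiply or add. The sketch below reproduces three of the rows under that assumption; the function name and the tuple layout are illustrative only.

```python
def flops_per_cycle(units, element_bits):
    """Sum peak FLOPs/cycle over a core's floating-point execution units.

    units: iterable of (count, width_bits, flops_per_lane) tuples, where
           flops_per_lane is 2 for an FMA unit and 1 for a plain add or
           multiply unit.
    element_bits: 64 for double precision (DP), 32 for single precision (SP).
    """
    return sum(count * (width_bits // element_bits) * flops_per_lane
               for count, width_bits, flops_per_lane in units)


# Unit mixes transcribed from the table rows above.
sandy_bridge = [(1, 256, 1), (1, 256, 1)]   # 1 x 256-bit mul + 1 x 256-bit add
haswell = [(2, 256, 2)]                     # 2 x 256-bit FMA
skylake_server = [(2, 512, 2)]              # 2 x 512-bit FMA

for name, units in [("Sandy Bridge", sandy_bridge), ("Haswell", haswell),
                    ("Skylake (server)", skylake_server)]:
    print(name, "DP:", flops_per_cycle(units, 64),
          "SP:", flops_per_cycle(units, 32))
```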
 
 
=== ARM ===
 
{| class="wikitable"
 
|-
 
! Microarchitecture !! colspan="3" | FLOPs !! ISA
 
|-
 
! colspan="5" | [[ARM]] Microarchitectures
 
|-
 
| rowspan="3" | {{armh|Cortex-A57|l=arch}} || '''EUs''' || colspan="2" | 1 × 128-bit FMA || rowspan="3" | {{arm|ARMv8}} {{arm|NEON}} (128-bit)
 
|-
 
| '''DP''' || 4 FLOPs/cycle || 4 FLOPs
 
|-
 
| '''SP''' || 8 FLOPs/cycle || 8 FLOPs
 
|-
 
| rowspan="3" | {{armh|Cortex-A76|l=arch}}<br>{{armh|Cortex-A77|l=arch}}<br>{{armh|Cortex-A78|l=arch}}<br>{{armh|Neoverse N1|l=arch}} || '''EUs''' || colspan="2" | 2 × 128-bit FMA || rowspan="3" | {{arm|ARMv8}} {{arm|NEON}} (128-bit)
 
|-
 
| '''DP''' || 8 FLOPs/cycle || 2 × 4 FLOPs
 
|-
 
| '''SP''' || 16 FLOPs/cycle || 2 × 8 FLOPs
 
|-
 
| rowspan="3" | {{armh|Neoverse N2|l=arch}} || '''EUs''' || colspan="2" | 2 × 128-bit FMA || rowspan="3" | {{arm|ARMv9}} {{arm|SVE2}} (128-bit)
 
|-
 
| '''DP''' || 8 FLOPs/cycle || 2 × 4 FLOPs
 
|-
 
| '''SP''' || 16 FLOPs/cycle || 2 × 8 FLOPs
 
|-
 
| rowspan="3" | {{armh|Neoverse V1|l=arch}} || '''EUs''' || colspan="2" | 2 × 256-bit FMA || rowspan="3" | {{arm|ARMv8}} {{arm|SVE}} (256-bit)
 
|-
 
| '''DP''' || 16 FLOPs/cycle || 2 × 8 FLOPs
 
|-
 
| '''SP''' || 32 FLOPs/cycle || 2 × 16 FLOPs
 
|-
 
| rowspan="3" | {{armh|Cortex-A510|l=arch}} || '''EUs''' || colspan="2" | 1-2 × 128-bit FMA || rowspan="3" | {{arm|ARMv9}} {{arm|SVE2}} (128-bit)
 
|-
 
| '''DP''' || 2-4 FLOPs/cycle || 2-4 FLOPs
 
|-
 
| '''SP''' || 4-8 FLOPs/cycle || 4-8 FLOPs
 
|-
 
! colspan="5" | [[AppliedMicro]]/[[Ampere Computing]] Microarchitectures
 
|-
 
| rowspan="3" | {{apm|Storm|l=arch}}<br>{{apm|Shadowcat|l=arch}}<br>{{apm|Skylark|l=arch}} || '''EUs''' || colspan="2" | 1 × 64-bit FMA || rowspan="3" | {{arm|ARMv8}} {{arm|NEON}} (128-bit)
 
|-
 
| '''DP''' || 2 FLOPs/cycle || 2 FLOPs
 
|-
 
| '''SP''' || 4 FLOPs/cycle || 4 FLOPs
 
|-
 
! colspan="5" | [[Cavium]] Microarchitectures
 
|-
 
| rowspan="3" | {{cavium|Vulcan|l=arch}} || '''EUs''' || colspan="2" | 2 × 128-bit FMA || rowspan="3" | {{arm|ARMv8}} {{arm|NEON}} (128-bit)
 
|-
 
| '''DP''' || 8 FLOPs/cycle || 2 × 4 FLOPs
 
|-
 
| '''SP''' || 16 FLOPs/cycle || 2 × 8 FLOPs
 
|-
 
! colspan="5" | [[Samsung]] Microarchitectures
 
|-
 
| rowspan="3" | {{samsung|M1|l=arch}}<br>{{samsung|M2|l=arch}} || '''EUs''' || colspan="2" | 1 × 128-bit FMA + 1 × 128-bit Addition || rowspan="3" | {{arm|ARMv8}} {{arm|NEON}} (128-bit)
 
|-
 
| '''DP''' || 6 FLOPs/cycle || 1 × 4 FLOPs + 1 × 2 FLOPs
 
|-
 
| '''SP''' || 12 FLOPs/cycle || 1 × 8 FLOPs + 1 × 4 FLOPs
 
|-
 
| rowspan="3" | {{samsung|M3|l=arch}} || '''EUs''' || colspan="2" | 3 × 128-bit FMA || rowspan="3" | {{arm|ARMv8}} {{arm|NEON}} (128-bit)
 
|-
 
| '''DP''' || 12 FLOPs/cycle || 3 × 4 FLOPs
 
|-
 
| '''SP''' || 24 FLOPs/cycle || 3 × 8 FLOPs
 
|-
 
! colspan="5" | [[Phytium]] Microarchitectures
 
|-
 
| rowspan="3" | {{phytium|Xiaomi|l=arch}} || '''EUs''' || colspan="2" | 1 × 128-bit FMA || rowspan="3" | {{arm|ARMv8}} {{arm|NEON}} (128-bit)
 
|-
 
| '''DP''' || 4 FLOPs/cycle || 1 × 4 FLOPs
 
|-
 
| '''SP''' || 8 FLOPs/cycle || 1 × 8 FLOPs
 
|-
 
! colspan="5" | [[HiSilicon]] Microarchitectures
 
|-
 
| rowspan="3" | {{hisilicon|TaiShan v110|l=arch}} || '''EUs''' || colspan="2" | 1 × 128-bit FMA || rowspan="3" | {{arm|ARMv8}} {{arm|NEON}} (128-bit)
 
|-
 
| '''DP''' || 4 FLOPs/cycle || 1 × 4 FLOPs
 
|-
 
| '''SP''' || 8 FLOPs/cycle || 1 × 8 FLOPs
 
|}
 
 
== See also ==
 
* [[bytes per FLOP]]
 
* [[floating point]]
 
* [[floating point operation]]
 
* [[operations per second]] (OPS)
 
 
[[Category:floating point]]
 
[[Category:computer performance]]
 
