Editing macro-operation fusion
Latest revision | Your text | ||
Line 1: | Line 1: | ||
{{title|Macro-Operation Fusion (MOP Fusion)}}{{confuse|micro-operation fusion}} | {{title|Macro-Operation Fusion (MOP Fusion)}}{{confuse|micro-operation fusion}} | ||
− | '''Macro-Operation Fusion''' (also '''Macro-Op Fusion''', '''MOP Fusion''', or '''Macrofusion''') is a hardware optimization technique found in many modern [[microarchitectures]] whereby a | + | '''Macro-Operation Fusion''' (also '''Macro-Op Fusion''', '''MOP Fusion''', or '''Macrofusion''') is a hardware optimization technique found in many modern [[microarchitectures]] whereby a pair of adjacent [[macro-operations]] are merged into a single macro-operation prior to decoding. Those instructions are later decoded into fused-µOPs. |
== Overview & Motivation == | == Overview & Motivation == | ||
− | One of the three [[microprocessor performance|performance knobs of a microprocessor]] is the [[instruction count]]. By reducing the number of instructions that must be executed, more work can be done with | + | One of the three [[microprocessor performance|performance knobs of a microprocessor]] is the [[instruction count]]. By reducing the number of instructions that must be executed, more work can be done with lower resource usage. The idea behind macro-operation fusion is to combine multiple adjacent instructions into a single instruction. A fused instruction typically remains fused throughout its lifetime. Therefore, fused instructions can represent more work with fewer bits, free up execution units and tracking resources (e.g. in the [[register renaming|rename unit]]), save pipeline bandwidth in all stages from decode to retire, and consequently save power. |
A unique aspect of macro-op fusion is that it also helps workloads that are not compiled such as in the case of many [[interpreted programming languages]] (e.g. [[PHP]], the software running WikiChip). | A unique aspect of macro-op fusion is that it also helps workloads that are not compiled such as in the case of many [[interpreted programming languages]] (e.g. [[PHP]], the software running WikiChip). | ||
== x86 == | == x86 == | ||
Intel has used macro-op fusion in all of its {{intel|microarchitectures}} since {{intel|Core|l=arch}}. | Intel has used macro-op fusion in all of its {{intel|microarchitectures}} since {{intel|Core|l=arch}}. | ||
− | + | === History === | |
The technique for fusing instructions is owned by [[Intel]] and is protected by [https://www.google.com/patents/US6675376 Patent US6675376] ("System and method for fusing instructions") originally filed in December [[2000]]. MOP Fusion was first introduced in [[2006]] in the {{intel|Core|l=arch}} microarchitecture and has been featured in every Intel microarch since. | The technique for fusing instructions is owned by [[Intel]] and is protected by [https://www.google.com/patents/US6675376 Patent US6675376] ("System and method for fusing instructions") originally filed in December [[2000]]. MOP Fusion was first introduced in [[2006]] in the {{intel|Core|l=arch}} microarchitecture and has been featured in every Intel microarch since. | ||
− | + | === Mechanism === | |
<div style="float: right; text-align: center; margin: 10px;"> | <div style="float: right; text-align: center; margin: 10px;"> | ||
− | [[File:core mopf off.png| | + | [[File:core mopf off.png|450px]] |
− | [[File:core mopf on.png| | + | [[File:core mopf on.png|450px]] |
<small>Slides from Intel's {{intel|Core|l=arch}} microarchitecture presentation.</small> | <small>Slides from Intel's {{intel|Core|l=arch}} microarchitecture presentation.</small> | ||
Line 96: | Line 51: | ||
|} | |} | ||
− | + | ==== Prior limitations ==== | |
− | + | ===== Nehalem µarch limitations ===== | |
In {{intel|Nehalem|l=arch}}, Intel introduced a number of enhancements: | In {{intel|Nehalem|l=arch}}, Intel introduced a number of enhancements: | ||
Line 104: | Line 59: | ||
* Supported on {{x86|x86-64}} mode | * Supported on {{x86|x86-64}} mode | ||
− | + | ===== Core µarch limitations ===== | |
The original implementation in the {{intel|Core|l=arch}} microarchitecture was much more limited than in recent processors. | The original implementation in the {{intel|Core|l=arch}} microarchitecture was much more limited than in recent processors. | ||
Line 117: | Line 72: | ||
* <code>{{x86|TEST}}</code> can be fused with all conditional jumps | * <code>{{x86|TEST}}</code> can be fused with all conditional jumps | ||
* <code>{{x86|CMP}}</code> can only be fused with {{x86|Carry Flag}} ({{x86|CF}}) / {{x86|Zero Flag}} ({{x86|ZF}}) conditional jumps: <code>{{x86|JA}}</code>, <code>{{x86|JNBE}}</code>, <code>{{x86|JAE}}</code>, <code>{{x86|JNB}}</code>, <code>{{x86|JNC}}</code>, <code>{{x86|JE}}</code>, <code>{{x86|JZ}}</code>, <code>{{x86|JNA}}</code>, <code>{{x86|JBE}}</code>, <code>{{x86|JNAE}}</code>, <code>{{x86|JC}}</code>, <code>{{x86|JB}}</code>, <code>{{x86|JNE}}</code>, <code>{{x86|JNZ}}</code> | * <code>{{x86|CMP}}</code> can only be fused with {{x86|Carry Flag}} ({{x86|CF}}) / {{x86|Zero Flag}} ({{x86|ZF}}) conditional jumps: <code>{{x86|JA}}</code>, <code>{{x86|JNBE}}</code>, <code>{{x86|JAE}}</code>, <code>{{x86|JNB}}</code>, <code>{{x86|JNC}}</code>, <code>{{x86|JE}}</code>, <code>{{x86|JZ}}</code>, <code>{{x86|JNA}}</code>, <code>{{x86|JBE}}</code>, <code>{{x86|JNAE}}</code>, <code>{{x86|JC}}</code>, <code>{{x86|JB}}</code>, <code>{{x86|JNE}}</code>, <code>{{x86|JNZ}}</code> | ||
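The two pairing rules listed above for the {{intel|Core|l=arch}} microarchitecture can be sketched as a small lookup. This is only an illustrative model, not Intel's decoder logic: the names <code>CF_ZF_JUMPS</code>, <code>OTHER_FLAG_JUMPS</code>, <code>can_fuse_core</code> and <code>count_fused_pairs</code> are hypothetical, "all conditional jumps" is assumed here to mean the flag-based Jcc mnemonics only, and any further restrictions (operand forms, operating mode, decoder throughput) are deliberately ignored.

```python
# Illustrative model of the Core-era pairing rules; all names are hypothetical.

# Conditional jumps that test only CF and/or ZF (the list given for CMP).
CF_ZF_JUMPS = {
    "JA", "JNBE", "JAE", "JNB", "JNC", "JE", "JZ",
    "JNA", "JBE", "JNAE", "JC", "JB", "JNE", "JNZ",
}

# Remaining flag-based Jcc mnemonics (signed, overflow, sign, parity).
# Assumption: JCXZ/JECXZ/JRCXZ are left out because they test a register.
OTHER_FLAG_JUMPS = {
    "JL", "JNGE", "JGE", "JNL", "JLE", "JNG", "JG", "JNLE",
    "JO", "JNO", "JS", "JNS", "JP", "JPE", "JNP", "JPO",
}

ALL_FLAG_JUMPS = CF_ZF_JUMPS | OTHER_FLAG_JUMPS


def can_fuse_core(first: str, second: str) -> bool:
    """Return True if the adjacent pair matches the rules listed above."""
    first, second = first.upper(), second.upper()
    if first == "TEST":
        return second in ALL_FLAG_JUMPS
    if first == "CMP":
        return second in CF_ZF_JUMPS
    return False


def count_fused_pairs(mnemonics: list[str]) -> int:
    """Greedily scan a mnemonic stream and count adjacent fusible pairs."""
    fused = 0
    i = 0
    while i < len(mnemonics) - 1:
        if can_fuse_core(mnemonics[i], mnemonics[i + 1]):
            fused += 1
            i += 2  # both instructions are consumed by the fused pair
        else:
            i += 1
    return fused
```

Under this model <code>CMP</code> followed by <code>JE</code> is reported as fusible, while <code>CMP</code> followed by a signed jump such as <code>JL</code> is not; <code>TEST</code> followed by <code>JL</code> is.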
== See also == | == See also == | ||
* [[micro-operation fusion]] | * [[micro-operation fusion]] | ||
* [[zeroing idioms]] | * [[zeroing idioms]] |