Revision as of 14:34, 30 April 2017
- Not to be confused with micro-operation fusion.
Macro-Operation Fusion (Macro-Op Fusion or MOP Fusion) is a hardware optimization technique found in Intel's x86 microarchitectures whereby a pair of adjacent macro-operations is merged into a single macro-operation.
History
The technique for fusing instructions is owned by Intel under Patent US6675376 ("System and method for fusing instructions"), originally filed in December 2000. MOP Fusion was first introduced in the Core microarchitecture and has been featured in every Intel microarchitecture since.
Motivation
A fused instruction remains fused throughout its lifetime. Fused instructions can therefore represent more work with fewer bits, free up execution units and tracking resources (e.g., in the rename unit), save pipeline bandwidth in all stages from decode to retire, and consequently save power. Note that fusion is done before decoding, so even decoding bandwidth is saved.
Conditional branching is a very common operation in almost all workloads. Macro-op fusion also helps workloads that are not compiled, as is the case with many interpreted programming languages (e.g., PHP, the software running WikiChip). In those programs, conditional branching is seldom fused the way it would be by a static compiler.
Mechanism
After the boundaries of macro-ops are found and marked, they are delivered to the instruction queue before being fed to the decoders. At that stage of the pipeline, macro-operation fusion opportunities can be identified and exploited.
A pair of dependent instructions is first compared against a set of criteria. For example, if the second instruction is commutative (i.e., the order of operands does not affect the result) or if the destination operand of the first instruction is used as the source operand of the second instruction, then the pair may qualify for fusion. Additionally, either the first source or destination operand must be a register, and the second source operand (if one exists) must be an immediate value or a non-RIP-relative memory operand. Fusion replaces the two instructions with a single instruction representing both operations behaviorally.
Fusion is performed on a flag-modifying instruction (e.g., CMP or ADD) followed by a conditional jump instruction. The output is a single compare-and-branch instruction. The fused instruction remains fused for the rest of its lifetime; that is, it stays fused throughout the pipeline and executes on a single port in the back-end that can handle both operations.
- The two instructions must be right next to each other, with no other instruction in between
- The first instruction must be one of the following: CMP, TEST, ADD, SUB, INC, DEC, or AND. (Note: prior to Sandy Bridge, this was limited to only CMP/TEST)
- The second instruction must be a conditional jump (e.g., JA, JAE, JE, JNE)
Note that prior to Nehalem, macro-fusion was restricted to 32-bit mode only. It has since been extended to 64-bit mode as well.
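The pairing rules listed above can be illustrated with a small software model. The following is only a sketch under simplifying assumptions, not a description of the actual hardware logic: the `Instr` record, `can_fuse`, and `fuse_stream` are hypothetical names introduced here, instructions are reduced to their mnemonics, the operand criteria and operating-mode restrictions are ignored, and the set of conditional jumps contains only the four examples named in the text.

```python
from dataclasses import dataclass

# Mnemonics taken from the list above. The jump set holds only the four
# examples the text names; it is not an exhaustive list of conditional jumps.
FUSIBLE_FIRST = {"CMP", "TEST", "ADD", "SUB", "INC", "DEC", "AND"}
EXAMPLE_COND_JUMPS = {"JA", "JAE", "JE", "JNE"}


@dataclass(frozen=True)
class Instr:
    """Hypothetical, simplified instruction record (mnemonic only)."""
    mnemonic: str


def can_fuse(first: Instr, second: Instr) -> bool:
    """Check the mnemonic rules for two instructions that are adjacent."""
    return (first.mnemonic in FUSIBLE_FIRST
            and second.mnemonic in EXAMPLE_COND_JUMPS)


def fuse_stream(instrs):
    """Walk the stream once, merging each adjacent eligible pair."""
    out = []
    i = 0
    while i < len(instrs):
        if i + 1 < len(instrs) and can_fuse(instrs[i], instrs[i + 1]):
            # The pair is replaced by a single entry representing both.
            fused_name = f"{instrs[i].mnemonic}+{instrs[i + 1].mnemonic}"
            out.append(Instr(fused_name))
            i += 2
        else:
            out.append(instrs[i])
            i += 1
    return out


stream = [Instr("MOV"), Instr("CMP"), Instr("JNE"),
          Instr("ADD"), Instr("MOV"), Instr("JE")]
print([ins.mnemonic for ins in fuse_stream(stream)])
# prints ['MOV', 'CMP+JNE', 'ADD', 'MOV', 'JE']
```

In this model the CMP/JNE pair is merged because the two instructions are adjacent, while the ADD and the final JE are left alone because a MOV sits between them.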