From WikiChip
Editing macro-operation fusion

Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.

This page supports semantic in-text annotations (e.g. "[[Is specified as::World Heritage Site]]") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help pages.

Latest revision Your text
Line 1: Line 1:
 
{{title|Macro-Operation Fusion (MOP Fusion)}}{{confuse|micro-operation fusion}}
 
{{title|Macro-Operation Fusion (MOP Fusion)}}{{confuse|micro-operation fusion}}
'''Macro-Operation Fusion''' (also '''Macro-Op Fusion''', '''MOP Fusion''', or '''Macrofusion''') is a hardware optimization technique found in many modern [[microarchitectures]] whereby a series of adjacent [[macro-operations]] are merged into a single macro-operation prior or during decoding. Those instructions are later decoded into fused-µOPs.
+
'''Macro-Operation Fusion''' (also '''Macro-Op Fusion''', '''MOP Fusion''', or '''Macrofusion''') is a hardware optimization technique found in many modern [[microarchitectures]] whereby a pair of adjacent [[macro-operations]] are merged into a single macro-operation prior to decoding. Those instructions are later decoded into fused-µOPs.
  
 
== Overview & Motivation ==
 
== Overview & Motivation ==
One of the three [[microprocessor performance|performance knobs of a microprocessor]] is the [[instruction count]]. By reducing the number of instructions that must be executed, more work can be done with fewer resources. The idea behind macro-operation fusion is to combine multiple adjacent instructions into a single instruction. A fused instruction typically remains fused throughout its lifetime. Therefore fused instructions can represent more work with fewer bits, free up execution units, tracking information (e.g. in the [[register renaming|rename unit]]), save pipeline bandwidth in all stages from decode to retire, and consequently save power.
+
One of the three [[microprocessor performance|performance knobs of a microprocessor]] is the [[instruction count]]. By reducing the number of instructions that must be executed, more work can be done with lower resource usage. The idea behind macro-operation fusion is to combine multiple adjacent instructions into a single instruction. A fused instruction typically remains fused throughout its lifetime. Therefore fused instructions can represent more work with fewer bits, free up execution units, tracking information (e.g. in the [[register renaming|rename unit]]), save pipeline bandwidth in all stages from decode to retire, and consequently save power.
  
 
A unique aspect of macro-op fusion is that it also helps workloads that are not compiled such as in the case of many [[interpreted programming languages]] (e.g. [[PHP]], the software running WikiChip).
 
A unique aspect of macro-op fusion is that it also helps workloads that are not compiled such as in the case of many [[interpreted programming languages]] (e.g. [[PHP]], the software running WikiChip).
 
== Arm ==
 
Arm supports a number of macro-op fusion operations in their recent microarchitectures.
 
 
* movw + movt
 
* aese + aesmc
 
* aesd + aesimc
 
 
== RISC-V ==
 
The use of macro-op fusion in RISC-V was proposed in a 2016 Berkeley paper<ref>Celio et al</ref> where a renewed case was made for the use of macro-operation fusion over bloating the ISA with more complex instructions. The paper compared the RISC-V isa performance in terms of instruction count on the popular [[SPEC CPU2006]] benchmark where it is found to be slightly behind contemporary ISAs. In their paper<ref>Celio et al</ref>, it's claimed that the RV64G and RV64GC effective instruction count can be reduced by 5.4% on average by leveraging macro-op fusion, thereby closing much of the deficiency gap. The used of macro-op fusion has gained larger support in the RISC-V community in favor of the microarchitecture taking care of this aspect rather than bloating the ISA with more complex instructions.
 
 
=== Proposed fusion operations ===
 
Some of the common operations are:
 
 
{| class="wikitable"
 
|-
 
! Pattern !! Result
 
|-
 
| // rd = array[offset]<br>add rd, rs1, rs2<br>ld rd, 0(rd) || Fused into an indexed load
 
|-
 
| // &(array[offset])<br>slli rd, rs1, {1,2,3}<br>add rd, rd, rs2 || Fused into a load effective address
 
|-
 
| // rd = array[offset]<br>slli rd, rs1, {1,2,3}<br>add rd, rd, rs2<br>ld rd, 0(rd) || Three-instruction fused into a load effective address
 
|-
 
| // rd = rs1 & 0xffffffff<br>slli rd, rs1, 32<br>srli rd, rd, 32 || Clear upper word
 
|-
 
|// rd = imm[31:0]<br>lui rd, imm[31:12]<br>addi rd, rd, imm[11:0] || Load upper immediate
 
|-
 
| // rd = *(imm[31:0])<br>lui rd, imm[31:12]<br>ld rd, imm[11:0](rd) || Load upper immediate
 
|-
 
|// l[dw] rd, symbol[31:0]<br>auipc rd, symbol[31:12]<br>l[dw] rd, symbol[11:0](rd) || Load global immediate
 
|-
 
| // far jump (1 MB) (AUIPC+JALR)<br>auipc t, imm20<br>jalr ra, imm12(t) || Fused far jump and link with calculated target address
 
|-
 
| addiw rd, rs1, imm12<br>slli rd, rs1, 32<br>SRLI rd, rs1, 32 || Fused into a single 32-bit zero extending add operation
 
|-
 
|mulh[[S]U] rdh, rs1, rs2<br>mul rdl, rs1, rs2 || Fused into a wide multiply
 
|-
 
| div[U] rdq, rs1, rs2<br>rem[U] rdr, rs1, rs2 || Fused into a wide divide
 
|-
 
|// ldpair rd1,rd2, [imm(rs1)]<br>ld rd1, imm(rs1)<br>ld rd2, imm+8(rs1) || Fused into a load-pair
 
|-
 
|// ldia rd, imm(rs1)<br>ld rd, imm(rs1)<br>add rs1, rs1, 8 || Fused into a post-indexed load
 
|}
 
  
 
== x86 ==
 
== x86 ==
=== Intel ===
 
 
Intel uses macro-op fusion in all their modern {{intel|microarchitectures}} since {{intel|Core|l=arch}}.
 
Intel uses macro-op fusion in all their modern {{intel|microarchitectures}} since {{intel|Core|l=arch}}.
==== History ====
+
=== History ===
 
The technique for fusing instructions is owned by [[Intel]] and is protected by [https://www.google.com/patents/US6675376 Patent US6675376] ("System and method for fusing instructions") originally filed in December [[2000]]. MOP Fusion was first introduced in [[2006]] in the {{intel|Core|l=arch}} microarchitecture and has been featured in every Intel microarch since.
 
The technique for fusing instructions is owned by [[Intel]] and is protected by [https://www.google.com/patents/US6675376 Patent US6675376] ("System and method for fusing instructions") originally filed in December [[2000]]. MOP Fusion was first introduced in [[2006]] in the {{intel|Core|l=arch}} microarchitecture and has been featured in every Intel microarch since.
  
==== Mechanism ====
+
=== Mechanism ===
 
<div style="float: right; text-align: center; margin: 10px;">
 
<div style="float: right; text-align: center; margin: 10px;">
[[File:core mopf off.png|350px]]
+
[[File:core mopf off.png|450px]]
  
[[File:core mopf on.png|350px]]
+
[[File:core mopf on.png|450px]]
  
 
<small>Slides from Intel's {{intel|Core|l=arch}} microarchitecture presentation.</small>
 
<small>Slides from Intel's {{intel|Core|l=arch}} microarchitecture presentation.</small>
Line 96: Line 51:
 
|}
 
|}
  
===== Prior limitations =====
+
==== Prior limitations ====
  
====== Nehalem µarch limitations ======
+
===== Nehalem µarch limitations =====
 
In {{intel|Nehalem|l=arch}}, Intel introduced a number of enhancements:
 
In {{intel|Nehalem|l=arch}}, Intel introduced a number of enhancements:
  
Line 104: Line 59:
 
* Supported on {{x86|x86-64}} mode
 
* Supported on {{x86|x86-64}} mode
  
====== Core µarch limitations ======
+
===== Core µarch limitations =====
 
The original implementation in the {{intel|Core|l=arch}} microarchitecture was much more limited than in recent processors.
 
The original implementation in the {{intel|Core|l=arch}} microarchitecture was much more limited than in recent processors.
  
Line 117: Line 72:
 
* <code>{{x86|TEST}}</code> can fused with all conditional jumps
 
* <code>{{x86|TEST}}</code> can fused with all conditional jumps
 
* <code>{{x86|CMP}}</code> can only be fused with {{x86|Carry Flag}} ({{x86|CF}}) / {{x86|Zero Flag}} ({{x86|ZF}}) conditional jumps: <code>{{x86|JA}}</code>, <code>{{x86|JNBE}}</code>, <code>{{x86|JAE}}</code>, <code>{{x86|JNB}}</code>, <code>{{x86|JNC}}</code>, <code>{{x86|JE}}</code>, <code>{{x86|JZ}}</code>, <code>{{x86|JNA}}</code>, <code>{{x86|JBE}}</code>, <code>{{x86|JNAE}}</code>, <code>{{x86|JC}}</code>, <code>{{x86|JB}}</code>, <code>{{x86|JNE}}</code>, <code>{{x86|JNZ}}</code>
 
* <code>{{x86|CMP}}</code> can only be fused with {{x86|Carry Flag}} ({{x86|CF}}) / {{x86|Zero Flag}} ({{x86|ZF}}) conditional jumps: <code>{{x86|JA}}</code>, <code>{{x86|JNBE}}</code>, <code>{{x86|JAE}}</code>, <code>{{x86|JNB}}</code>, <code>{{x86|JNC}}</code>, <code>{{x86|JE}}</code>, <code>{{x86|JZ}}</code>, <code>{{x86|JNA}}</code>, <code>{{x86|JBE}}</code>, <code>{{x86|JNAE}}</code>, <code>{{x86|JC}}</code>, <code>{{x86|JB}}</code>, <code>{{x86|JNE}}</code>, <code>{{x86|JNZ}}</code>
 
=== Centaur ===
 
[[Centaur Technology]] also implements macro-op fusion in its architectures, including in its most recent server SoC, {{centtech|CHA|l=arch}}.
 
 
== Bibliography ==
 
* Celio, Christopher, et al. "The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V." arXiv preprint arXiv:1607.02318 (2016).
 
 
{{reflist}}
 
  
 
== See also ==
 
== See also ==
 
* [[micro-operation fusion]]
 
* [[micro-operation fusion]]
 
* [[zeroing idioms]]
 
* [[zeroing idioms]]

Please note that all contributions to WikiChip may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see WikiChip:Copyrights for details). Do not submit copyrighted work without permission!

Cancel | Editing help (opens in new window)