From WikiChip
Editing macro-operation fusion

Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.

This page supports semantic in-text annotations (e.g. "[[Is specified as::World Heritage Site]]") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help pages.

Latest revision Your text
Line 3: Line 3:
  
 
== Overview & Motivation ==
 
== Overview & Motivation ==
One of the three [[microprocessor performance|performance knobs of a microprocessor]] is the [[instruction count]]. By reducing the number of instructions that must be executed, more work can be done with fewer resources. The idea behind macro-operation fusion is to combine multiple adjacent instructions into a single instruction. A fused instruction typically remains fused throughout its lifetime. Therefore fused instructions can represent more work with fewer bits, free up execution units, tracking information (e.g. in the [[register renaming|rename unit]]), save pipeline bandwidth in all stages from decode to retire, and consequently save power.
+
One of the three [[microprocessor performance|performance knobs of a microprocessor]] is the [[instruction count]]. By reducing the number of instructions that must be executed, more work can be done with lower resource usage. The idea behind macro-operation fusion is to combine multiple adjacent instructions into a single instruction. A fused instruction typically remains fused throughout its lifetime. Therefore fused instructions can represent more work with fewer bits, free up execution units, tracking information (e.g. in the [[register renaming|rename unit]]), save pipeline bandwidth in all stages from decode to retire, and consequently save power.
  
 
A unique aspect of macro-op fusion is that it also helps workloads that are not compiled such as in the case of many [[interpreted programming languages]] (e.g. [[PHP]], the software running WikiChip).
 
A unique aspect of macro-op fusion is that it also helps workloads that are not compiled such as in the case of many [[interpreted programming languages]] (e.g. [[PHP]], the software running WikiChip).
 
== Arm ==
 
Arm supports a number of macro-op fusion operations in their recent microarchitectures.
 
 
* movw + movt
 
* aese + aesmc
 
* aesd + aesimc
 
  
 
== RISC-V ==
 
== RISC-V ==
The use of macro-op fusion in RISC-V was proposed in a 2016 Berkeley paper<ref>Celio et al</ref> where a renewed case was made for the use of macro-operation fusion over bloating the ISA with more complex instructions. The paper compared the RISC-V isa performance in terms of instruction count on the popular [[SPEC CPU2006]] benchmark where it is found to be slightly behind contemporary ISAs. In their paper<ref>Celio et al</ref>, it's claimed that the RV64G and RV64GC effective instruction count can be reduced by 5.4% on average by leveraging macro-op fusion, thereby closing much of the deficiency gap. The used of macro-op fusion has gained larger support in the RISC-V community in favor of the microarchitecture taking care of this aspect rather than bloating the ISA with more complex instructions.
+
The used of RISC-V was proposed in a 2016 Berkeley paper<ref>Celio et al</ref> where a renewed case was made for the use of macro-operation fusion over bloating the ISA with more complex instructions. The paper compared the RISC-V isa performance in terms of instruction count on the popular [[SPEC CPU2006]] benchmark where it is found to be slightly behind contemporary ISAs. In their paper<ref>Celio et al</ref>, it's claimed that the RV64G and RV64GC effective instruction count can be reduced by 5.4% on average by leveraging macro-op fusion, thereby closing much of the deficiency gap. The used of macro-op fusion has gained larger support in the RISC-V community in favor of the microarchitecture taking care of this aspect rather than bloating the ISA with more complex instructions.
  
 
=== Proposed fusion operations ===
 
=== Proposed fusion operations ===
Line 52: Line 45:
  
 
== x86 ==
 
== x86 ==
=== Intel ===
 
 
Intel uses macro-op fusion in all their modern {{intel|microarchitectures}} since {{intel|Core|l=arch}}.
 
Intel uses macro-op fusion in all their modern {{intel|microarchitectures}} since {{intel|Core|l=arch}}.
==== History ====
+
=== History ===
 
The technique for fusing instructions is owned by [[Intel]] and is protected by [https://www.google.com/patents/US6675376 Patent US6675376] ("System and method for fusing instructions") originally filed in December [[2000]]. MOP Fusion was first introduced in [[2006]] in the {{intel|Core|l=arch}} microarchitecture and has been featured in every Intel microarch since.
 
The technique for fusing instructions is owned by [[Intel]] and is protected by [https://www.google.com/patents/US6675376 Patent US6675376] ("System and method for fusing instructions") originally filed in December [[2000]]. MOP Fusion was first introduced in [[2006]] in the {{intel|Core|l=arch}} microarchitecture and has been featured in every Intel microarch since.
  
==== Mechanism ====
+
=== Mechanism ===
 
<div style="float: right; text-align: center; margin: 10px;">
 
<div style="float: right; text-align: center; margin: 10px;">
[[File:core mopf off.png|350px]]
+
[[File:core mopf off.png|450px]]
  
[[File:core mopf on.png|350px]]
+
[[File:core mopf on.png|450px]]
  
 
<small>Slides from Intel's {{intel|Core|l=arch}} microarchitecture presentation.</small>
 
<small>Slides from Intel's {{intel|Core|l=arch}} microarchitecture presentation.</small>
Line 96: Line 88:
 
|}
 
|}
  
===== Prior limitations =====
+
==== Prior limitations ====
  
====== Nehalem µarch limitations ======
+
===== Nehalem µarch limitations =====
 
In {{intel|Nehalem|l=arch}}, Intel introduced a number of enhancements:
 
In {{intel|Nehalem|l=arch}}, Intel introduced a number of enhancements:
  
Line 104: Line 96:
 
* Supported on {{x86|x86-64}} mode
 
* Supported on {{x86|x86-64}} mode
  
====== Core µarch limitations ======
+
===== Core µarch limitations =====
 
The original implementation in the {{intel|Core|l=arch}} microarchitecture was much more limited than in recent processors.
 
The original implementation in the {{intel|Core|l=arch}} microarchitecture was much more limited than in recent processors.
  
Line 117: Line 109:
 
* <code>{{x86|TEST}}</code> can fused with all conditional jumps
 
* <code>{{x86|TEST}}</code> can fused with all conditional jumps
 
* <code>{{x86|CMP}}</code> can only be fused with {{x86|Carry Flag}} ({{x86|CF}}) / {{x86|Zero Flag}} ({{x86|ZF}}) conditional jumps: <code>{{x86|JA}}</code>, <code>{{x86|JNBE}}</code>, <code>{{x86|JAE}}</code>, <code>{{x86|JNB}}</code>, <code>{{x86|JNC}}</code>, <code>{{x86|JE}}</code>, <code>{{x86|JZ}}</code>, <code>{{x86|JNA}}</code>, <code>{{x86|JBE}}</code>, <code>{{x86|JNAE}}</code>, <code>{{x86|JC}}</code>, <code>{{x86|JB}}</code>, <code>{{x86|JNE}}</code>, <code>{{x86|JNZ}}</code>
 
* <code>{{x86|CMP}}</code> can only be fused with {{x86|Carry Flag}} ({{x86|CF}}) / {{x86|Zero Flag}} ({{x86|ZF}}) conditional jumps: <code>{{x86|JA}}</code>, <code>{{x86|JNBE}}</code>, <code>{{x86|JAE}}</code>, <code>{{x86|JNB}}</code>, <code>{{x86|JNC}}</code>, <code>{{x86|JE}}</code>, <code>{{x86|JZ}}</code>, <code>{{x86|JNA}}</code>, <code>{{x86|JBE}}</code>, <code>{{x86|JNAE}}</code>, <code>{{x86|JC}}</code>, <code>{{x86|JB}}</code>, <code>{{x86|JNE}}</code>, <code>{{x86|JNZ}}</code>
 
=== Centaur ===
 
[[Centaur Technology]] also implements macro-op fusion in its architectures, including in its most recent server SoC, {{centtech|CHA|l=arch}}.
 
  
 
== Bibliography ==
 
== Bibliography ==

Please note that all contributions to WikiChip may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see WikiChip:Copyrights for details). Do not submit copyrighted work without permission!

Cancel | Editing help (opens in new window)