From WikiChip
Editing acorn/microarchitectures/arm2

 
<div style="float: left; margin: 10px;">'''Register-Immediate:'''<br>[[File:arm1 reg imm.svg|300px]]</div></div>
 
  
{{clear}}
 
 
===== Multiplication =====
 
[[File:arm2 mul cycle.svg|right|250px]]
The ARM1's major performance issue was multiplication. The ARM1 lacked hardware multiplication, which meant that programs had to fall back on a software routine (e.g., classic [[Shift-and-Add Multiplication]]). For example, to perform <code>var = x * 5;</code> one could rewrite it as <code>var = x + (x << 2);</code> to achieve the same result without a multiplication operation. While this was originally not thought to be a big problem, software multiplication proved to be a rather serious bottleneck.
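The rewrite above only works for a constant multiplier. For a general multiplier, a shift-and-add routine loops over the multiplier bits. The following is a minimal C sketch of that idea; the function name <code>shift_add_mul</code> is hypothetical and the code is illustrative only, not a routine taken from any ARM1 software.

```c
#include <stdint.h>

/* Multiply two 32-bit values using only shifts and adds.
 * The result is truncated to 32 bits, as it would be in a 32-bit register.
 * Illustrative sketch; the name and structure are assumptions. */
uint32_t shift_add_mul(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    while (b != 0) {
        if (b & 1u) {
            product += a;   /* lowest multiplier bit is set: add the shifted multiplicand */
        }
        a <<= 1;            /* the next multiplier bit carries twice the weight */
        b >>= 1;            /* move on to the next multiplier bit */
    }
    return product;
}
```

The loop handles one multiplier bit per iteration, so a full 32-bit multiplier can take up to 32 iterations, each with a test, an add and two shifts. For instance, <code>shift_add_mul(7, 5)</code> adds 7 and 28.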
 
  
 
This was addressed with the ARM2, which introduced a [[Booth's Multiplier]]. Conceptually, the multiplier sits on the "B" operand of the ALU in a similar way to how the barrel shifter sits on the "A" operand of the ALU; however, there are some major differences in how the two are implemented and operate.
  
With a number of hard constraints (i.e., [[die size]] and power dissipation), the ARM2 solution is a very conservative 2-bit multiplier. Unlike the shifter, which can shift the first ALU operand by some number of bits in almost every instruction, the multiplier is typically inoperative. Only the <code>MUL</code> and <code>MLA</code> instructions invoke it. Each cycle, one 2-bit step of Booth's algorithm is performed and the result is fed through the ALU to the destination register. The destination register also holds the intermediate value during the operation, which can last up to 16 cycles for all 32 bits. For this reason, using the same register as both the destination and the source is prohibited, as it would invoke [[undefined behavior]]. Since multiplication is commutative, swapping the two operands around should resolve this problem.
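To make the 2-bits-per-cycle behavior concrete, the following is a small C software model of radix-4 (2-bit) Booth multiplication. It is a sketch under stated assumptions: the function name <code>booth2_mul</code>, the step counter, and the early exit once the remaining multiplier bits are exhausted are all illustrative choices, and the code does not claim to reproduce the actual ARM2 datapath or its exact cycle counts.

```c
#include <stdint.h>

/* Software model of a 2-bit (radix-4) Booth multiplier.
 * Returns the low 32 bits of rm * rs. If steps is non-null, it receives
 * the number of 2-bit steps taken (at most 16 for a 32-bit multiplier).
 * The early exit is an assumption of this sketch, not a documented
 * property of the hardware. */
uint32_t booth2_mul(uint32_t rm, uint32_t rs, unsigned *steps)
{
    uint32_t acc = 0;    /* running partial product */
    uint32_t prev = 0;   /* multiplier bit just below the current pair */
    unsigned n = 0;

    while (n < 16 && (rs != 0 || prev != 0)) {
        /* Recode (bit1, bit0, prev) into a digit in {-2, -1, 0, +1, +2}. */
        switch (((rs & 3u) << 1) | prev) {
        case 1: case 2: acc += rm;      break;  /* +1 */
        case 3:         acc += rm << 1; break;  /* +2 */
        case 4:         acc -= rm << 1; break;  /* -2 */
        case 5: case 6: acc -= rm;      break;  /* -1 */
        default:                        break;  /* 0 (patterns 000 and 111) */
        }
        prev = (rs >> 1) & 1u;  /* top bit of this pair feeds the next step */
        rs >>= 2;               /* consume two multiplier bits */
        rm <<= 2;               /* next digit carries four times the weight */
        n++;
    }
    if (steps) {
        *steps = n;
    }
    return acc;
}
```

In this model a small multiplier such as 5 finishes in two steps, while a multiplier with its top bits set needs all 16. Because the arithmetic is modulo 2^32, the low 32 bits of the result are the same whether the operands are read as signed or unsigned.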
  
 
The second instruction implemented, the <code>MLA</code>, supports multiply and accumulate. This instruction takes advantage of the fact that the ALU is situated after the multiplier, allowing a final addition operation to be performed on the result of the multiplication prior to saving the value back into the destination register.
 
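The difference between the two instructions can be summarized with a short C sketch of their results. The operand names <code>rm</code>, <code>rs</code> and <code>rn</code> follow the usual ARM assembler convention and are an assumption here, as are the helper names <code>mul32</code>, <code>mla32</code> and <code>dot</code>; the code models only the 32-bit result, not timing.

```c
#include <stddef.h>
#include <stdint.h>

/* MUL: destination = low 32 bits of rm * rs. */
uint32_t mul32(uint32_t rm, uint32_t rs)
{
    return rm * rs;
}

/* MLA: destination = low 32 bits of rm * rs + rn.
 * The final addition corresponds to the extra pass through the ALU. */
uint32_t mla32(uint32_t rm, uint32_t rs, uint32_t rn)
{
    return rm * rs + rn;
}

/* Hypothetical usage: a dot product where each element costs one
 * multiply-accumulate instead of a separate multiply and add. */
uint32_t dot(const uint32_t *x, const uint32_t *y, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc = mla32(x[i], y[i], acc);
    }
    return acc;
}
```

Loops of this shape, such as dot products and polynomial evaluation, are where folding the addition into the multiply instruction saves an instruction per iteration.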


* codename: ARM2
* core count: 1
* designer: Acorn Computers
* first launched: 1986
* full page name: acorn/microarchitectures/arm2
* instance of: microarchitecture
* instruction set architecture: ARMv2
* manufacturer: VLSI Technology and Sanyo
* microarchitecture type: CPU
* name: ARM2
* pipeline stages: 3
* process: 2,000 nm (2 μm, 0.002 mm)