Latest revision as of 11:56, 14 January 2021
ARM1 µarch | |
General Info | |
Arch Type | CPU |
Designer | Acorn Computers |
Manufacturer | VLSI Technology |
Introduction | 1985 |
Process | 3 µm |
Core Configs | 1 |
Pipeline | |
Type | Scalar, Pipelined |
Stages | 3 |
Decode | 1-way |
Instructions | |
ISA | ARMv1 |
Cache | |
L1I Cache | 0 KiB/Core |
L1D Cache | 0 KiB/Core |
Succession | |
ARM1 was the first ARM microarchitecture implemented by Acorn Computers as a research and development project for the BBC Computer Literacy Project. ARM1 was introduced in 1985 and was extended to be used as a coprocessor in Acorn's BBC Micro microcomputers. ARM1 was distributed as an evaluation system and was never commercialized.
History
- Main article: ARM's History
The ARM1 (Acorn RISC Machine 1) is Acorn Computers' first microprocessor design. The ARM1 was the initial result of the Advanced Research and Development division that Acorn Computers formed in order to develop its own RISC processor. The ARM instruction set design started in 1983. A reference model was written in BBC BASIC by Sophie Wilson and Steve Furber in just 808 lines of code. On April 26, 1985, after 6 man-years of design effort, the first ARM processor prototype was delivered. The first batch of prototypes was functional and was shipped to customers in the form of evaluation systems. At that time the ARM1 was the simplest RISC processor produced.
The first prototype worked on the first try, even though the ammeter read no power. The prototype test board was faulty, with a short, so the chip was running entirely off the leakage from the I/Os. Designed to run at 1 W, the chip averaged under 100 mW typical power.
Originally intended to deliver roughly 1.5 times the performance of the VAX 11/780, the prototypes ended up achieving between 2x and 4x the performance of the DEC VAX 11/780; this is roughly equivalent to 10 times that of the original IBM PC AT, or to that of the Motorola 68020 operating at 16.67 MHz.
Process Technology
- See also: 3 µm process
ARM1 chips were manufactured by VLSI Technology on a 3 µm double-level metal CMOS process.
Architecture
The ARM1 is based on the ARMv1 ISA which is an entirely clean-sheet 32-bit RISC design.
Overview
- 3 µm process
- Implements ARMv1
- Goal 1.5x performance of the VAX 11/780
- 26-bit address space
- Pipeline
  - Very simple
  - 3-stage
- No hardware multiplication
- 25 32-bit registers
  - 16 for user
  - 9 for supervisor
- 4 modes
  - User, Supervisor, IRQ, FIQ
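As a quick check on the figures in this list, a 26-bit address space covers 2^26 bytes, which works out to 64 MiB. The short Python sketch below does the arithmetic; the constant names are illustrative only, and it assumes byte addressing with 4-byte instructions.

```python
# Illustrative constants (names are not from any ARM documentation).
ADDRESS_BITS = 26        # width of the ARM1 address space
INSTRUCTION_BYTES = 4    # every ARM1 instruction is exactly 4 bytes

# Total number of addressable bytes, assuming byte addressing.
address_space_bytes = 1 << ADDRESS_BITS

# The same size expressed in MiB.
address_space_mib = address_space_bytes // (1024 * 1024)

# Upper bound on the number of instructions that fit in that space.
max_instructions = address_space_bytes // INSTRUCTION_BYTES

print(address_space_bytes, address_space_mib, max_instructions)
```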
Block Diagram
Core
Core
The ARM1 is an extremely simple 32-bit single-chip RISC microprocessor implementation with a number of CISC features.
Pipeline
The ARM1 uses pipelining to improve performance and efficiency. At peak the ARM1 can reach 8 million instructions per second (MIPS), with an average of 3 MIPS when using DRAM with a 150 ns row access time. The ARM1's pipeline consists of 3 stages, fetch, decode, and execute, although some instructions may take as many as 5 cycles.
The ARM1 operates on a guaranteed non-overlapping two-phase clock, which allows for level-triggered transfer instead of edge-triggering. The two clock phases are not generated on-die but come from an external oscillator. A complete cycle on the ARM1 is therefore Φ1 + Φ2. To simplify system design, these clocks may be stretched to work in sync with memory access times.
Fetch
The Instruction Pipe is a functional block that holds fetched instructions until they execute. It therefore holds enough instructions to ensure that every stage is busy on every cycle.
The program counter on the ARM1 always points to the instruction being fetched. Because every instruction is exactly 4 bytes and the executing instruction is two stages ahead of the fetch, the currently executing instruction is always at PC - 8. During the fetch stage, the address held in the address register is driven onto the address pins and the instruction is fetched from memory.
In conjunction with the address register is a dedicated incrementer which calculates the next address. The address of the next instruction usually comes from that incrementer. Occasionally, however, it comes from the ALU instead, and on very rare occasions it is forced to an exception address. When the address comes from the incrementer, the ARM1 asserts the SEQ pin, telling the external memory controller that the next access will be at the current address + 4. This lets the controller determine ahead of time whether an address translation is necessary. It also improves performance because the memory system can make use of Page-Mode DRAM, allowing for more efficient consecutive memory reads.
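The relationship between the PC, the executing instruction, and the SEQ signal can be sketched as a small behavioral model in Python. This is only an illustration under stated assumptions: the function names and the `alu_address` parameter are hypothetical, exception forcing is left out, and the model says nothing about actual pin timing.

```python
INSTRUCTION_BYTES = 4
ADDRESS_MASK = (1 << 26) - 1  # addresses wrap within the 26-bit address space


def executing_address(pc):
    """Address of the instruction in execute while `pc` is being fetched.

    The executing instruction is two 4-byte instructions behind the fetch.
    """
    return (pc - 2 * INSTRUCTION_BYTES) & ADDRESS_MASK


def next_fetch(current, alu_address=None):
    """Return (next_address, seq) for the following fetch.

    Hypothetical model: if the ALU supplies an address, the fetch is
    non-sequential and SEQ is False; otherwise the incrementer supplies
    current + 4 and SEQ is True.
    """
    if alu_address is not None:
        return alu_address & ADDRESS_MASK, False
    return (current + INSTRUCTION_BYTES) & ADDRESS_MASK, True


print(hex(executing_address(0x8008)))
print(next_fetch(0x8000))
print(next_fetch(0x8000, alu_address=0x100))
```

A memory controller modelled this way could stay in page mode for as long as the second element of the returned pair remains true.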
Decode
On the second cycle of each instruction, the decode occurs. At this stage the instruction is decoded and the appropriate control signals are generated. The ARM1 implements the decoding in a number of separate units:
- Instruction Decode, performs the top-level decoding
- Register Decode, decodes the register selection field
- ALU Decode, decodes the ALU operation
- Shift Decode, decodes the barrel shifter controls
The Register Decode handles the register selection for both read ports and the write port.
The decode is split across separate units because the ARM1 makes use of microcode ROMs (PLA). Each instruction is decoded into up to four µOPs. In other words, an ARM instruction is broken down into up to four sets of internal µOP signals indicating things such as which registers to select or what value to shift by. For some complex operations such as block-transfers, the microsequencer also performs a looping operation for each register.
Execute
The ARM1 has a physical register file of 25 32-bit registers (same as the architectural register file). Register 15 (R15) is the Program Counter. 16 of the registers are visible to the user, with the remaining 9 accessible only in supervisor mode. The register file has two read ports for the operands heading to the ALU and a single write port for the ALU write-back value. Additionally, there is a dedicated R15 read and write port.
Each cycle two values are operated on. During clock phase 1 (Φ1) the values are fetched from the appropriate sources into the ALU for execution and during clock phase 2 (Φ2), the 32-bit ALU output is stored onto the Register File write port.
For a typical register-register operation, the first operand is fetched from the register file on Port 0 directly to the ALU while the second operand is fetched from the register file on Port 1 and through the barrel shifter to the ALU. For a register-immediate operation, the first operand is fetched from the register file on Port 0 directly to the ALU while the second operand is fetched from the instruction.
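The two operand paths described above can be sketched in Python. This is a simplified behavioral model rather than the ARM1 datapath itself: the function name `execute_add`, the register list, and the restriction to an ADD with a logical shift left are all assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # results are truncated to the 32-bit register width


def lsl(value, amount):
    """Logical shift left, the only barrel shifter operation modelled here."""
    return (value << amount) & MASK32


def execute_add(regs, rn, rm=None, shift=0, imm=None):
    """Model one ADD through the execute stage.

    The first operand always comes straight from the register file (Port 0).
    The second operand is either an immediate taken from the instruction or
    a register read on Port 1 and passed through the barrel shifter.
    """
    op1 = regs[rn]
    if imm is not None:
        op2 = imm & MASK32
    else:
        op2 = lsl(regs[rm], shift)
    return (op1 + op2) & MASK32


regs = [0] * 16
regs[1] = 7
regs[2] = 3
print(execute_add(regs, 1, rm=2, shift=2))  # register-register with shift
print(execute_add(regs, 1, imm=5))          # register-immediate
```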
The ARM1 lacks hardware multiplication, so software has to multiply with shifts and adds (e.g., classic Shift-and-Add Multiplication). For example, var = x * 5; can be rewritten as var = x + (x << 2); to achieve the same result without a multiplication operation. The downside is that, except for very simple cases such as this one, software multiplication is very slow.
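A general shift-and-add multiply can be sketched as follows. The Python below is a minimal illustration of the technique on unsigned 32-bit values, not code from any Acorn software; the function names are hypothetical. The loop runs once per bit of the multiplier, which is why the general case is so much slower than the single add-with-shift used for a constant such as 5.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits, as a 32-bit register would


def shift_and_add_multiply(x, y):
    """Multiply two unsigned 32-bit values using only shifts and adds."""
    result = 0
    x &= MASK32
    y &= MASK32
    while y:
        if y & 1:                      # this bit of the multiplier is set
            result = (result + x) & MASK32
        x = (x << 1) & MASK32          # next power-of-two multiple of x
        y >>= 1                        # move on to the next multiplier bit
    return result


def times_five(x):
    """The constant case from the text: x * 5 as x + (x << 2)."""
    return (x + (x << 2)) & MASK32


print(shift_and_add_multiply(7, 5), times_five(7))
```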
Interrupt
The ARM1 has fast interrupt capabilities for real-time responses. Exceptions can occur internally or externally to the chip. The average interrupt latency is under 2 µs, with a worst case of under 6 µs. An interrupt must wait for the currently executing instruction to complete before it is serviced; the current instruction completes only when a new instruction starts fetching. When an exception takes place, the processor sets the PC to a specific memory address within the interrupt vector table.
For example, when the RESET pin is asserted, the pipeline is flushed and the PC is forced to address 0x0, where execution resumes.
Multi-Cycle Instruction
A number of ARM instructions cannot be implemented in a single cycle given the limited resources of the ARM1 (i.e., a single ALU and a single shifter). An instruction such as STR (store register) must calculate the effective address before it can store the data. To solve this problem, the ARM1 effectively runs the same instruction through the execute stage two to three times: the first execute cycle computes the address while the second performs the data store.
For example, for STR R2, [R3, #4], in the first execute cycle the processor performs Address = R3 + 4. The ARM1 then runs another execute stage for this instruction. In the second cycle, the ARM1 drives DOUT = R2 to the address generated previously and asserts the appropriate pins to indicate that this is a write.
Instruction | Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4 | Cycle 5 | Cycle 6 | Cycle 7 | Cycle 8 |
---|---|---|---|---|---|---|---|---|
ADD | Fetch | Decode | Execute | | | | | |
STR | | Fetch | Decode | Address Calculate | Data Store | | | |
ADD | | | Fetch | | Decode | Execute | | |
ADD | | | | Fetch | | Decode | Execute | |
ADD | | | | | | Fetch | Decode | Execute |
Time → |
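The execute column of the timing table can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions: the pipeline is kept full, memory never stalls a fetch, cycles are counted from 1, and only the two opcodes shown in the table are modelled. The names `EXECUTE_CYCLES`, `PIPELINE_FILL`, `execute_schedule`, and `total_cycles` are hypothetical.

```python
# Execute cycles per opcode, as shown in the table (STR needs two).
EXECUTE_CYCLES = {"ADD": 1, "STR": 2}

# Fetch and decode cycles that pass before the first instruction executes.
PIPELINE_FILL = 2


def execute_schedule(program):
    """Return (opcode, first_execute_cycle, last_execute_cycle) per instruction."""
    schedule = []
    cycle = PIPELINE_FILL + 1  # the first execute happens in cycle 3
    for opcode in program:
        n = EXECUTE_CYCLES[opcode]
        schedule.append((opcode, cycle, cycle + n - 1))
        cycle += n             # the single ALU is busy until this instruction is done
    return schedule


def total_cycles(program):
    """Total cycles to run the program with a full pipeline."""
    return PIPELINE_FILL + sum(EXECUTE_CYCLES[op] for op in program)


program = ["ADD", "STR", "ADD", "ADD", "ADD"]
for row in execute_schedule(program):
    print(row)
print(total_cycles(program))
```

Because there is a single ALU, each extra execute cycle of an instruction delays every instruction behind it by one cycle, which is the one-cycle gap visible in the rows following the STR.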
Block-transfer
- See also: Block-Transfer Instructions
Die Shot
- 3 µm process
- 24,800 transistors
- ~6,000 gates
- ~7 mm x 7 mm
- 50 mm² die size
- PLCC-82 (Plastic leaded chip carrier)
- 74 signal pins
- 8 power/ground pins
All ARM1 Chips
List of ARM1-based Processors
Model | Microarchitecture | Launched | Frequency | Power Dissipation | Max Memory |
---|---|---|---|---|---|
VC2588 | ARM1 | 1985 | 6 MHz | 1 W | 64 MiB |
Count: 1
References
- ARM hardware reference manual, ARM Evaluation System, Acorn OEM Products, August 1986
- Furber, S. B., and A. R. P. Thomas. "ARM3—a study in design for compatibility." Microprocessors and Microsystems 14.6 (1990): 407-415.
Documents
- ARM hardware Reference Manual, ARM Evaluation System, Acorn OEM Products, August 4, 1986.