From WikiChip
ARM1 - Microarchitectures - ARM
Revision as of 01:57, 25 June 2017 by Inject

ARM1 µarch
General Info
  • Arch Type: CPU
  • Designer: ARM Holdings
  • Manufacturer: VLSI Technology
  • Introduction: 1985
  • Process: 3 µm
  • Core Configs: 1
Pipeline
  • Type: Scalar, Pipelined
  • Stages: 3
  • Decode: 1-way
Instructions
  • ISA: ARMv1
Cache
  • L1I Cache: 0 KiB/core
  • L1D Cache: 0 KiB/core

ARM1 was the first ARM microarchitecture. It was developed by Acorn Computers (whose processor design team was later spun out as ARM) as a research and development project for the BBC Computer Literacy Project. ARM1 was introduced in 1985 and was also adapted for use as a coprocessor (second processor) for Acorn's BBC Micro microcomputers. ARM1 was distributed only as part of an evaluation system and was never released as a standalone commercial product.

History

Main article: ARM's History

The ARM1 (Acorn RISC Machine 1) was Acorn Computers' first microprocessor design. It was the first result of the Advanced Research and Development division that Acorn Computers formed to develop its own RISC processor. Design of the ARM instruction set started in 1983. Sophie Wilson and Steve Furber wrote a reference model of the processor in BBC BASIC in just 808 lines of code. On 26 April 1985, after 6 man-years of design effort, the first ARM prototype was delivered. The first batch of prototypes was functional and was shipped to customers in the form of evaluation systems. At the time, the ARM1 was the simplest RISC processor yet produced.

The first prototype worked the first time it was tested, even though the ammeter showed it drawing no power. The test board had a fault (a short), and the chip was in fact running entirely off leakage current from its I/O signals. Although designed to run at 1 W, the chip typically consumed under 100 mW.

Process Technology

See also: 3 µm process

ARM1 chips were manufactured by VLSI Technology on a 3 µm double-level metal CMOS process.

Architecture

The ARM1 is based on the ARMv1 ISA which is an entirely clean-sheet 32-bit RISC design.

Overview

  • 3 µm process
  • Implements ARMv1
  • Goal: 1.5× the performance of the VAX-11/780
  • 26-bit address space
  • Pipeline
    • Very simple
    • 3-stage
  • No hardware multiplication
  • 25 32-bit registers
    • 16 visible in user mode
    • 9 banked, accessible only in the privileged modes
  • 4 modes
    • User, Supervisor, IRQ, FIQ
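As a rough illustration of the 26-bit address space listed above, the following Python sketch computes its size and masks an address down to 26 bits. The names (`ADDRESS_BITS`, `bus_address`) are hypothetical, and treating the upper address bits as simply not reaching the external bus is an assumption made for illustration.

```python
# Sketch of the ARM1's 26-bit address space.
# Assumption: only the low 26 bits of an address reach the external bus.
ADDRESS_BITS = 26
ADDRESS_SPACE = 1 << ADDRESS_BITS      # 67,108,864 bytes
ADDRESS_MASK = ADDRESS_SPACE - 1       # 0x03FF_FFFF


def bus_address(addr: int) -> int:
    """Return the low 26 bits of `addr`, i.e. the part that fits on the bus."""
    return addr & ADDRESS_MASK


if __name__ == "__main__":
    print(ADDRESS_SPACE // (1024 * 1024), "MiB")   # 64 MiB
    print(hex(bus_address(0x0400_1000)))           # 0x1000
```

In other words, 2^26 bytes gives a 64 MiB address space.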

Block Diagram

Core

(Figure: ARM1 core block diagram)

Core

The ARM1 is an extremely simple 32-bit single-chip RISC microprocessor implementation with a number of CISC features.

Pipeline

The ARM1 uses pipelining to improve performance and efficiency. Its pipeline consists of 3 stages, although some instructions may take as many as 5 cycles:


(Figure: ARM1 pipeline)


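The ARM1 is driven by a guaranteed non-overlapping two-phase clock. Because the two phases are never high at the same time, latches clocked on opposite phases are never transparent simultaneously, which allows level-triggered transfers instead of edge-triggered ones. A complete cycle on the ARM1 therefore consists of Φ1 followed by Φ2. The two clock phases are not generated on-die; they are supplied from off-chip, driven by an external oscillator.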
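The non-overlap property can be sketched as a sampled waveform. This Python model is a hypothetical illustration: the tick counts (`width`, `gap`) are arbitrary, and the real phase widths on the ARM1 depend on the external clock circuitry, which this page does not specify.

```python
from typing import List, Tuple

Sample = Tuple[int, int]  # (phi1, phi2) logic levels at one tick


def two_phase_clock(cycles: int, width: int = 3, gap: int = 1) -> List[Sample]:
    """Sample a two-phase non-overlapping clock for `cycles` machine cycles.

    One machine cycle = Φ1 high for `width` ticks, both phases low for `gap`
    ticks, Φ2 high for `width` ticks, then both low for another `gap` ticks.
    """
    if width < 1 or gap < 1:
        raise ValueError("width and gap must be at least one tick")
    one_cycle = ([(1, 0)] * width + [(0, 0)] * gap
                 + [(0, 1)] * width + [(0, 0)] * gap)
    return one_cycle * cycles


def non_overlapping(samples: List[Sample]) -> bool:
    """True if Φ1 and Φ2 are never high at the same tick."""
    return all(not (phi1 and phi2) for phi1, phi2 in samples)


if __name__ == "__main__":
    wave = two_phase_clock(2)
    print("phi1 ", "".join("#" if p1 else "_" for p1, _ in wave))
    print("phi2 ", "".join("#" if p2 else "_" for _, p2 in wave))
    print("non-overlapping:", non_overlapping(wave))
```

The `gap` ticks are what make the clock non-overlapping; setting them to zero would let a Φ1 latch and a Φ2 latch risk being open at the same instant during a transition.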

Fetch

(Figure: ARM1 program counter)

The Instruction Pipe is a functional block that holds fetched instructions until they are executed. It holds enough instructions to keep every stage busy on every cycle.

The program counter on the ARM1 always points to the instruction being fetched. Because every instruction is exactly 4 bytes and the pipeline holds three instructions at once, the instruction being decoded is at PC - 4 and the instruction currently executing is at PC - 8. During the fetch stage, the address held in the address register is driven onto the address pins and the instruction at that address is read from memory.
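The PC - 8 relationship follows from the three-stage overlap. This sketch, assuming straight-line code with no branches, stalls or multi-cycle instructions, lists which instruction address sits in each stage on each cycle; `stages_at` is a hypothetical helper, not part of any ARM tool.

```python
from typing import Dict, Optional

INSTR_SIZE = 4  # every ARMv1 instruction is exactly 4 bytes


def stages_at(cycle: int, start: int) -> Dict[str, Optional[int]]:
    """Address held in each stage at `cycle`, for straight-line code at `start`.

    Assumptions: one instruction enters the pipeline per cycle, with no
    branches, stalls or multi-cycle instructions. None means the stage is
    still empty while the pipeline fills.
    """
    def addr(k: int) -> Optional[int]:
        return start + k * INSTR_SIZE if k >= 0 else None

    return {
        "fetch": addr(cycle),        # this is what the PC points at
        "decode": addr(cycle - 1),   # fetched one cycle earlier
        "execute": addr(cycle - 2),  # fetched two cycles earlier
    }


if __name__ == "__main__":
    for cycle in range(4):
        s = stages_at(cycle, 0x8000)
        print(cycle, {k: (hex(v) if v is not None else "-") for k, v in s.items()})
```

Once the pipeline is full (from the third cycle on), the executing address is always the fetch address minus 8.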

Alongside the address register is a dedicated incrementer that calculates the next address. The address of the next instruction usually comes from that incrementer, but it can also come from the ALU instead (for example, when a branch target has been computed). On rare occasions, the next address is forced to an exception vector. When the next address comes from the incrementer, the ARM1 asserts the SEQ pin. This tells the external memory controller that the next access will be to the current address plus 4, allowing it to determine ahead of time whether an address translation is necessary and to prepare for the access. This improves performance because sequential accesses can use page-mode DRAM, in which consecutive reads within the same DRAM page are faster than independent random accesses.
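The next-address choice and the SEQ hint can be modelled as below. This is a minimal sketch: `next_fetch`, `access_kind`, `NextSource` and `PAGE_SIZE` are hypothetical names, the 1 KiB page size is an arbitrary illustrative value, and a real memory controller's policy may differ.

```python
from enum import Enum
from typing import Optional, Tuple


class NextSource(Enum):
    INCREMENTER = "incrementer"  # sequential: current address + 4
    ALU = "alu"                  # e.g. a branch target computed by the ALU
    EXCEPTION = "exception"      # forced to an exception vector


def next_fetch(current: int, source: NextSource,
               target: Optional[int] = None) -> Tuple[int, bool]:
    """Return (next_address, seq), where seq models the SEQ pin."""
    if source is NextSource.INCREMENTER:
        return current + 4, True
    if target is None:
        raise ValueError("ALU and exception sources need an explicit target")
    return target, False


# Hypothetical memory controller. PAGE_SIZE is an illustrative DRAM page
# size, not a figure taken from the ARM1 documentation.
PAGE_SIZE = 1024


def access_kind(prev_address: int, address: int, seq: bool) -> str:
    """Decide whether the next read can use a fast page-mode cycle."""
    same_page = prev_address // PAGE_SIZE == address // PAGE_SIZE
    return "page-mode" if seq and same_page else "full"


if __name__ == "__main__":
    addr, seq = next_fetch(0x8000, NextSource.INCREMENTER)
    print(hex(addr), seq, access_kind(0x8000, addr, seq))   # 0x8004 True page-mode
    addr, seq = next_fetch(0x8004, NextSource.ALU, target=0x2000)
    print(hex(addr), seq, access_kind(0x8004, addr, seq))   # 0x2000 False full
```

Note that even a sequential access can need a full cycle when it crosses a page boundary, which is why the controller still checks the address after seeing SEQ.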

Decode

Decoding takes place in the second cycle of each instruction. In this stage the instruction is decoded and the appropriate control signals are generated. The ARM1 splits decoding across several separate units:

  • Instruction Decode, performs the top-level decoding
  • Register Decode, decodes the register selection field
  • ALU Decode, decodes the ALU operation
  • Shift Decode, decodes the barrel shifter controls

The Register Decode handles the register selection for both read ports and the write port.
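To make the split concrete, here is a sketch that pulls apart the fields of an ARM data-processing instruction and notes which decode unit each field would plausibly feed. The bit layout is the standard ARM data-processing encoding; how the ARM1 actually routes these fields to its decode units is not described here, so the unit annotations in the comments are an assumption. The helper names are hypothetical.

```python
from typing import Dict

# Data-processing opcodes, indexed by instruction bits [24:21].
OPCODES = ["AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
           "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"]
SHIFTS = ["LSL", "LSR", "ASR", "ROR"]


def bits(word: int, hi: int, lo: int) -> int:
    """Extract the inclusive bit field word[hi:lo]."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_data_processing(word: int) -> Dict[str, object]:
    """Split a data-processing instruction into its fields."""
    if bits(word, 27, 26) != 0b00:
        raise ValueError("not a data-processing instruction")
    fields: Dict[str, object] = {
        "cond": bits(word, 31, 28),              # condition code
        "alu_op": OPCODES[bits(word, 24, 21)],   # -> ALU decode
        "set_flags": bool(bits(word, 20, 20)),
        "rn": bits(word, 19, 16),                # -> register decode
        "rd": bits(word, 15, 12),                # -> register decode
        "immediate": bool(bits(word, 25, 25)),
    }
    if fields["immediate"]:
        fields["rotate"] = bits(word, 11, 8)     # -> shift decode
        fields["imm8"] = bits(word, 7, 0)
    else:
        fields["rm"] = bits(word, 3, 0)              # -> register decode
        fields["shift"] = SHIFTS[bits(word, 6, 5)]   # -> shift decode
        if bits(word, 4, 4):
            fields["rs"] = bits(word, 11, 8)         # amount held in a register
        else:
            fields["shift_amount"] = bits(word, 11, 7)
    return fields


if __name__ == "__main__":
    print(decode_data_processing(0xE0810002))  # ADD R0, R1, R2
    print(decode_data_processing(0xE3A00001))  # MOV R0, #1
```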

Execute

(Figure: ARM1 register file)

The ARM1 has a physical register file of 25 32-bit registers, the same as its architectural register file. Register 15 (R15) is the Program Counter. Sixteen of the registers are visible to user code; the remaining 9 are accessible only in the privileged (non-user) modes. The register file has two read ports that supply operands to the ALU and a single write port for the ALU write-back value. In addition, there is a dedicated read/write port for R15.
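The following sketch models how 25 physical registers can back the 16 architectural registers across the four modes. The split of the 9 non-user registers used here (R10-R14 banked for FIQ, R13-R14 for IRQ, and R13-R14 for Supervisor) is an assumption based on common descriptions of the ARM1, and the physical slot numbering is purely illustrative.

```python
from typing import Dict

USR, FIQ, IRQ, SVC = "usr", "fiq", "irq", "svc"

# Assumed banking: the architectural registers each privileged mode replaces.
BANKED: Dict[str, range] = {
    FIQ: range(10, 15),  # R10-R14 (5 registers)
    IRQ: range(13, 15),  # R13-R14 (2 registers)
    SVC: range(13, 15),  # R13-R14 (2 registers)
}

# Physical slots 0-15 hold the user registers; banked copies follow them.
BANK_BASE: Dict[str, int] = {}
_next_slot = 16
for _mode in (FIQ, IRQ, SVC):
    BANK_BASE[_mode] = _next_slot
    _next_slot += len(BANKED[_mode])
PHYSICAL_REGISTERS = _next_slot  # 16 + 5 + 2 + 2 = 25


def physical_index(mode: str, reg: int) -> int:
    """Map architectural register R<reg> in `mode` to a physical slot."""
    if not 0 <= reg <= 15:
        raise ValueError("architectural registers are R0-R15")
    if mode not in (USR, FIQ, IRQ, SVC):
        raise ValueError(f"unknown mode {mode!r}")
    bank = BANKED.get(mode)
    if bank is not None and reg in bank:
        return BANK_BASE[mode] + (reg - bank.start)
    return reg  # shared with user mode, including R15 (the PC)


if __name__ == "__main__":
    print("physical registers:", PHYSICAL_REGISTERS)
    for mode in (USR, FIQ, IRQ, SVC):
        print(mode, [physical_index(mode, r) for r in (10, 13, 14, 15)])
```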

Each cycle, the ALU operates on two values. During clock phase 1 (Φ1), the operands are read from their sources into the ALU, and during clock phase 2 (Φ2), the 32-bit ALU result is written back through the register file's write port.

For a typical register-register operation, the first operand is fetched from the register file on Port 0 directly to the ALU while the second operand is fetched from the register file on Port 1 and through the barrel shifter to the ALU. For a register-immediate operation, the first operand is fetched from the register file on Port 0 directly to the ALU while the second operand is fetched from the instruction.
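The second-operand path can be sketched in Python. It assumes the standard ARM encoding, in which an immediate operand is an 8-bit value rotated right by twice a 4-bit field, and a register operand may be shifted by LSL, LSR, ASR or ROR. To keep the sketch short, it leaves out register-specified shift amounts, the special meaning of a zero shift amount for LSR/ASR/ROR, and the shifter's carry-out. The function names are hypothetical.

```python
from typing import List

MASK32 = 0xFFFF_FFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def barrel_shift(value: int, kind: int, amount: int) -> int:
    """Apply shift type `kind` (0=LSL, 1=LSR, 2=ASR, 3=ROR) by `amount`.

    Only the plain cases are modelled: LSL #0-#31 and LSR/ASR/ROR #1-#31.
    """
    if kind == 0 and 0 <= amount <= 31:
        return (value << amount) & MASK32
    if not 1 <= amount <= 31:
        raise NotImplementedError("special shift encodings are not modelled")
    if kind == 1:
        return value >> amount
    if kind == 2:
        signed = value - (1 << 32) if value & 0x8000_0000 else value
        return (signed >> amount) & MASK32
    return ror32(value, amount)


def operand2(word: int, regs: List[int]) -> int:
    """Second ALU operand of a data-processing instruction."""
    if word & (1 << 25):                   # immediate form
        imm8 = word & 0xFF
        rotate = (word >> 8) & 0xF
        return ror32(imm8, 2 * rotate)
    if word & (1 << 4):
        raise NotImplementedError("register-specified shifts are not modelled")
    rm = word & 0xF                        # register read on the second port
    kind = (word >> 5) & 0x3
    amount = (word >> 7) & 0x1F
    return barrel_shift(regs[rm] & MASK32, kind, amount)


if __name__ == "__main__":
    regs = [0] * 16
    regs[2] = 3
    print(operand2(0xE0810102, regs))       # ADD R0, R1, R2, LSL #2
    print(hex(operand2(0xE3A004FF, regs)))  # MOV R0, #0xFF000000
```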

Multi-Cycle Instruction

A number of ARM instructions cannot be executed in a single cycle given the limited resources of the ARM1 (i.e., a single ALU and a single shifter). For example, STR (store register) must calculate the effective address before it can store the data. To solve this problem, the ARM1 effectively runs the same instruction through the execute stage two or three times: the first execute cycle computes the address and the second stores the data.
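The two-step execution of a store can be illustrated with a small model. This is a hypothetical simplification, not the ARM1's actual control sequence: it assumes a pre-indexed `STR Rd, [Rn, #offset]` with no base write-back, models memory as a dictionary of 32-bit words, and only shows the two execute cycles described above (a possible third cycle is not modelled). `Core` and `execute_str` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MASK32 = 0xFFFF_FFFF


@dataclass
class Core:
    regs: List[int] = field(default_factory=lambda: [0] * 16)
    memory: Dict[int, int] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)


def execute_str(core: Core, rd: int, rn: int, offset: int) -> int:
    """Model STR Rd, [Rn, #offset] as two execute cycles; return cycles used."""
    # Execute cycle 1: the single ALU is busy computing the effective address.
    address = (core.regs[rn] + offset) & MASK32
    core.trace.append(f"exec 1: address = R{rn} + {offset} = {address:#x}")
    # Execute cycle 2: the computed address goes to memory and Rd is stored.
    core.memory[address] = core.regs[rd] & MASK32
    core.trace.append(f"exec 2: mem[{address:#x}] <- R{rd}")
    return 2


if __name__ == "__main__":
    core = Core()
    core.regs[1] = 0x1000
    core.regs[0] = 0xDEADBEEF
    cycles = execute_str(core, rd=0, rn=1, offset=8)
    print("\n".join(core.trace))
    print(cycles, "execute cycles")
```

Because the ALU is occupied by the address calculation in the first cycle, the store itself cannot start until the second, which is the resource conflict the paragraph above describes.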

Die Shot

  • 3 µm process
  • 24,800 transistors
  • ~6,000 gates
  • ~7 mm × 7 mm
  • 50 mm² die size
  • PLCC-82 (Plastic leaded chip carrier)
    • 74 signal pins
    • 8 power/ground pins


(Figure: ARM1 die shot)

(Figure: ARM1 die shot, annotated)
