From WikiChip
Difference between revisions of "intel/microarchitectures/merced"
Chlamchowder (Filling out info, work in progress)
Revision as of 01:39, 20 January 2021
Merced µarch | |
General Info | |
Arch Type | CPU |
Designer | Intel |
Manufacturer | Intel |
Introduction | June, 2001 |
Process | 180 nm |
Core Configs | 1 |
Instructions | |
ISA | IA-64 |
Succession | |
Merced was the first Itanium microarchitecture designed by Intel.
Architecture
- 10-stage pipeline
  - IPG: Get next instruction pointer
  - FET: Fetch from instruction cache
  - ROT: Instruction rotation, decoupling buffer
  - EXP: Instruction dispersal
  - REN: Register remapping
  - WLD: Word line decode
  - REG: Register file read
  - EXE: Execute
  - DET: Exception detection
  - WRB: Writeback
- Branch Predictor
  - Early zero-bubble predictor using compiler-controlled Target Address Registers
  - Two-level predictor with 4 bits of local history and a 512-entry prediction table
  - Indirect branches handled with a 64-entry Multiway Branch Prediction Table
  - 64-entry Target Address Cache
  - 8-entry Return Address Stack
  - Branch predictor can resteer at the ROT stage using the loop exit predictor, or using compiler-provided prediction hints for the third slot in a bundle
  - Branch predictor can resteer at the EXP stage for any branch
- 16 KB 4-way L1 Instruction Cache
- Fetch and Decode
  - Two bundles, each containing three instructions, fetched from the instruction cache every cycle
  - No decoder necessary
- Execution Engine
  - Instructions from bundles dispersed to issue ports
  - Scoreboarding resolves dependencies with compiler hints
  - FPU can be accessed from the integer side by the floating-point get and set instructions (getf and setf)
    - Transfer from FPU to integer side takes 2 clocks
    - Transfer from integer side to FPU takes 9 clocks
- Memory Subsystem
  - L1D has a two-cycle latency
  - Software can issue "advanced loads", which are recorded in an Advanced Load Address Table (ALAT) that checks for conflicting stores. Software must check the ALAT before using the load result. If there is a conflict, a software handler has to reissue the conflicting load.
  - The FPU is fed directly by the dual-ported L2 cache, with a 9-cycle load latency
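
The two-level predictor listed under Branch Predictor (4 bits of local history, 512-entry table) can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not Merced's actual design: the class name `LocalTwoLevelPredictor`, the use of 2-bit saturating counters, one counter per history pattern per entry, and indexing by bundle address are all assumptions made for illustration.

```python
HISTORY_BITS = 4      # from the list above: 4 bits of local history
TABLE_ENTRIES = 512   # from the list above: 512-entry prediction table


class LocalTwoLevelPredictor:
    """Toy local two-level predictor (behavioral sketch, not cycle accurate)."""

    def __init__(self):
        # First level: one 4-bit local history per table entry.
        self.history = [0] * TABLE_ENTRIES
        # Second level (assumed): a 2-bit saturating counter per history
        # pattern, initialized to "weakly not taken" (1).
        self.counters = [[1] * (1 << HISTORY_BITS) for _ in range(TABLE_ENTRIES)]

    def _index(self, pc):
        # Assumed indexing: IA-64 bundles are 16 bytes, so drop the low
        # 4 address bits and wrap around the table size.
        return (pc >> 4) % TABLE_ENTRIES

    def predict(self, pc):
        i = self._index(pc)
        return self.counters[i][self.history[i]] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        h = self.history[i]
        c = self.counters[i][h]
        # Saturating increment on taken, decrement on not taken.
        self.counters[i][h] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the outcome into the local history, keeping 4 bits.
        self.history[i] = ((h << 1) | int(taken)) & ((1 << HISTORY_BITS) - 1)


# A loop branch that is taken three times and then falls through, repeated.
pattern = [True, True, True, False] * 50
predictor = LocalTwoLevelPredictor()
branch_pc = 0x4000  # hypothetical branch address
mispredictions = []
for taken in pattern:
    mispredictions.append(predictor.predict(branch_pc) != taken)
    predictor.update(branch_pc, taken)

late_mispredictions = sum(mispredictions[100:])
print("mispredictions after warm-up:", late_mispredictions)
```

Because every 4-bit history value in this repeating pattern is followed by a single fixed outcome, the model stops mispredicting once each counter has been trained, which is the benefit local history gives over a single counter per branch.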
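
The advanced-load flow described under Memory Subsystem (load early, let stores invalidate conflicting entries, check before use, reissue on conflict) can be sketched as follows. This is a minimal behavioral model, not a description of the hardware: the `ALAT` class, the `load_with_recovery` helper, tracking entries by destination register name, and the dictionary used as memory are all hypothetical, and table capacity and partial address matching are ignored.

```python
class ALAT:
    """Toy model of an Advanced Load Address Table (not cycle accurate)."""

    def __init__(self):
        # Assumed layout: destination register name -> address that was loaded.
        self.entries = {}

    def advanced_load(self, memory, reg, addr):
        """Load early and record the address so later stores can be checked."""
        self.entries[reg] = addr
        return memory[addr]

    def store(self, memory, addr, value):
        """Perform a store and drop every entry whose address conflicts."""
        memory[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check(self, reg):
        """Return True if the advanced load into reg is still valid."""
        return reg in self.entries


def load_with_recovery(alat, memory, reg, addr, speculative_value):
    """Software check: keep the early value, or reissue the load on a conflict."""
    if alat.check(reg):
        return speculative_value
    return memory[addr]  # recovery path: reissue the conflicting load


memory = {0x100: 1, 0x200: 2}
alat = ALAT()

# Two loads are hoisted above a store that may alias one of them.
early_a = alat.advanced_load(memory, "r4", 0x100)
early_b = alat.advanced_load(memory, "r5", 0x200)
alat.store(memory, 0x100, 99)  # conflicts with the load into r4

result_a = load_with_recovery(alat, memory, "r4", 0x100, early_a)
result_b = load_with_recovery(alat, memory, "r5", 0x200, early_b)
print(result_a, result_b)
```

In this run the store removes the entry for `r4`, so its load is reissued and sees the new value, while the entry for `r5` survives and its early value is used directly.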