From WikiChip
Difference between revisions of "intel/microarchitectures/merced"
Chlamchowder (talk | contribs) (Filling out info, work in progress) |
Chlamchowder (talk | contribs) m (Filling out info, work in progress) |
Revision as of 01:39, 20 January 2021
| Merced µarch | |
| General Info | |
| Arch Type | CPU |
| Designer | Intel |
| Manufacturer | Intel |
| Introduction | June 2001 |
| Process | 180 nm |
| Core Configs | 1 |
| Instructions | |
| ISA | IA-64 |
| Succession | |
Merced was the first Itanium microarchitecture designed by Intel.
Architecture
- 10-stage pipeline
  - IPG: Get next instruction pointer
  - FET: Fetch from instruction cache
  - ROT: Instruction rotation, decoupling buffer
  - EXP: Instruction dispersal
  - REN: Register remapping
  - WLD: Word line decode
  - REG: Register file read
  - EXE: Execute
  - DET: Exception detection
  - WRB: Writeback
- Branch Predictor
  - Early zero-bubble predictor using Target Address Registers controlled by the compiler
  - Two-level predictor with 4 bits of local history and a 512-entry prediction table
  - Indirect branches handled with a 64-entry Multiway Branch Prediction Table
  - 64-entry Target Address Cache
  - 8-entry Return Address Stack
  - The branch predictor can resteer at the ROT stage using the loop exit predictor or compiler-provided prediction hints for the third slot in a bundle
  - The branch predictor can resteer at the EXP stage for any branch
- 16 KB 4-way L1 Instruction Cache
- Fetch and Decode
  - Two bundles, each containing three instructions, are fetched from the instruction cache every cycle
  - No decoder necessary
- Execution Engine
  - Instructions from bundles are dispersed to issue ports
  - Scoreboarding resolves dependencies with compiler hints
  - The FPU can be accessed from the integer side with floating-point get and set instructions
    - Transfer from the FPU to the integer side takes 2 clocks
    - Transfer from the integer side to the FPU takes 9 clocks
- Memory Subsystem
  - L1D has two-cycle latency
  - Software can issue "advanced loads", which are recorded in an Advanced Load Address Table (ALAT) that checks for conflicting stores. Software needs to check the ALAT before using the load result. If there is a conflict, a software handler has to reissue the conflicting load.
  - The FPU is fed directly by the dual-ported L2 cache, with 9-cycle load latency
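
The two-level predictor listed under Branch Predictor (4 bits of local history, 512-entry prediction table) can be illustrated with a small model. This is only a sketch under assumptions that are not in the text above: the table is indexed by the bundle address, each entry holds one 2-bit saturating counter per history pattern, and counters start at "weakly not taken". The class and function names are hypothetical, and Merced's real table organization may differ.

```python
class TwoLevelPredictor:
    """Toy two-level local-history predictor (not Merced's actual layout)."""

    def __init__(self, entries=512, history_bits=4):
        self.entries = entries
        self.history_mask = (1 << history_bits) - 1
        # First level: one local history register per table entry.
        self.history = [0] * entries
        # Second level (assumed): a 2-bit counter per history pattern,
        # initialised to 1 ("weakly not taken").
        self.counters = [[1] * (1 << history_bits) for _ in range(entries)]

    def _index(self, address):
        # Assumed indexing: bundles are 16 bytes, so drop the low 4 bits.
        return (address >> 4) % self.entries

    def predict(self, address):
        i = self._index(address)
        return self.counters[i][self.history[i]] >= 2

    def update(self, address, taken):
        i = self._index(address)
        h = self.history[i]
        c = self.counters[i][h]
        # Saturating increment on taken, decrement on not taken.
        self.counters[i][h] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the outcome into the local history register.
        self.history[i] = ((h << 1) | int(taken)) & self.history_mask


def accuracy(predictor, address, outcomes):
    """Fraction of outcomes predicted correctly, training as it goes."""
    hits = 0
    for taken in outcomes:
        hits += predictor.predict(address) == taken
        predictor.update(address, taken)
    return hits / len(outcomes)
```

With four bits of history, a branch that repeats "taken, taken, taken, not taken" maps each history value to a single outcome, so this model predicts it without error once the counters have warmed up.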
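
The Fetch and Decode entries above mention bundles of three instructions. In the IA-64 encoding a bundle is 128 bits: a 5-bit template in the low bits followed by three 41-bit instruction slots. The following sketch splits a bundle into those fields and derives the six-instruction fetch width implied by two bundles per cycle. The function and constant names are hypothetical and chosen only for illustration.

```python
TEMPLATE_BITS = 5        # template field, bits 4:0
SLOT_BITS = 41           # each instruction slot
SLOTS_PER_BUNDLE = 3
BUNDLES_PER_CYCLE = 2    # fetch rate stated in the list above
INSTRUCTIONS_PER_CYCLE = BUNDLES_PER_CYCLE * SLOTS_PER_BUNDLE

TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Build a 128-bit bundle from a template and three slot values."""
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three slots")
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    bundle = template
    for n, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + n * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle into (template, (slot0, slot1, slot2))."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & TEMPLATE_MASK
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + n * SLOT_BITS)) & SLOT_MASK
        for n in range(SLOTS_PER_BUNDLE)
    )
    return template, slots
```

Because the template already states which kind of execution unit each slot needs, splitting a bundle is plain bit slicing, which is one way to read the "no decoder necessary" note.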
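
The advanced-load flow described under Memory Subsystem (load early, let stores invalidate matching ALAT entries, check before use, reissue on conflict) can be modelled in a few lines. This is a simplified sketch, not the hardware behaviour: it assumes exact address matching, a hypothetical capacity of 32 entries with oldest-first eviction, and a Python dict standing in for memory. All names are illustrative.

```python
class ALAT:
    """Toy Advanced Load Address Table keyed by destination register."""

    def __init__(self, capacity=32):  # capacity is an assumption
        self.capacity = capacity
        self.entries = {}  # register name -> load address

    def advanced_load(self, memory, reg, address):
        # Record the load so later stores can be checked against it.
        if reg in self.entries:
            del self.entries[reg]
        elif len(self.entries) >= self.capacity:
            # Evict the oldest entry (dicts keep insertion order).
            del self.entries[next(iter(self.entries))]
        self.entries[reg] = address
        return memory.get(address, 0)

    def store(self, memory, address, value):
        memory[address] = value
        # A store to a tracked address invalidates the matching entries.
        for reg in [r for r, a in self.entries.items() if a == address]:
            del self.entries[reg]

    def check(self, reg):
        # True means the early load result is still valid.
        return reg in self.entries


def speculative_read(alat, memory, reg, address, intervening_stores):
    """Load early, run the stores, then check and recover if needed."""
    value = alat.advanced_load(memory, reg, address)
    for st_address, st_value in intervening_stores:
        alat.store(memory, st_address, st_value)
    if not alat.check(reg):
        # Recovery path: reissue the load now that the stores are done.
        value = memory.get(address, 0)
    return value
```

A store to an unrelated address leaves the entry in place and the early value is used as is. A store to the same address removes the entry, so the check fails and the load is reissued.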
