Merced - Microarchitectures - Intel
Revision as of 01:39, 20 January 2021 by Chlamchowder (talk | contribs) (Filling out info, work in progress)

Merced µarch
General Info
  Arch Type: CPU
  Designer: Intel
  Manufacturer: Intel
  Introduction: June 2001
  Process: 180 nm
  Core Configs: 1
Instructions
  ISA: IA-64

Merced was Intel's first Itanium microarchitecture.

Architecture

  • 10-stage pipeline
    • IPG: Get next instruction pointer
    • FET: Fetch from instruction cache
    • ROT: Instruction rotation, decoupling buffer
    • EXP: Instruction dispersal
    • REN: Register remapping
    • WLD: Word line decode
    • REG: Register file read
    • EXE: Execute
    • DET: Exception detection
    • WRB: Writeback
  • Branch Predictor
    • Early zero-bubble prediction using compiler-controlled Target Address Registers
    • Two-level predictor with 4 bits of local history and a 512-entry prediction table
    • Indirect branches handled by a 64-entry Multiway Branch Prediction Table
    • 64-entry Target Address Cache
    • 8-entry Return Address Stack
    • Can resteer at the ROT stage using the loop exit predictor or compiler-provided prediction hints for the third slot in a bundle
    • Can resteer at the EXP stage for any branch
  • 16 KB 4-way L1 Instruction Cache
  • Fetch and Decode
    • Two bundles, each containing three instructions, fetched from the instruction cache every cycle
    • No decoder necessary
  • Execution Engine
    • Instructions from bundles dispersed to issue ports
    • Scoreboarding resolves dependencies with compiler hints
    • The integer side accesses the FPU through the floating-point get and set instructions (getf and setf)
      • Transfer from the FPU to the integer side takes 2 clocks
      • Transfer from the integer side to the FPU takes 9 clocks
  • Memory Subsystem
    • L1D has a two-cycle latency
    • Software can issue "advanced loads", which are recorded in an Advanced Load Address Table (ALAT) that watches for conflicting stores. Software must check the ALAT before using the load result. If there was a conflict, a software handler has to reissue the load.
    • The FPU is fed directly by the dual-ported L2 cache, with a 9-cycle load latency
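
The two-level predictor listed under Branch Predictor can be illustrated with a generic local-history model. This is only a sketch: the 512 entries and 4 history bits come from the list above, while the indexing by bundle address, the 2-bit saturating counters, the per-entry pattern tables, and every class and function name are assumptions made for illustration, not Merced's documented organization.

```python
class TwoLevelLocalPredictor:
    """Generic two-level local-history predictor (illustrative only)."""

    def __init__(self, entries=512, history_bits=4):
        self.entries = entries
        self.history_mask = (1 << history_bits) - 1
        # First level: one local history register per entry.
        self.history = [0] * entries
        # Second level (assumed): one 2-bit saturating counter per history
        # pattern, initialised to "weakly not taken" (1).
        self.counters = [[1] * (1 << history_bits) for _ in range(entries)]

    def _index(self, address):
        # IA-64 bundles are 16 bytes, so the low 4 address bits are dropped.
        # Indexing by the remaining low bits is an assumption.
        return (address >> 4) % self.entries

    def predict(self, address):
        i = self._index(address)
        return self.counters[i][self.history[i]] >= 2

    def update(self, address, taken):
        i = self._index(address)
        h = self.history[i]
        c = self.counters[i][h]
        # Saturate the counter between 0 and 3.
        self.counters[i][h] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the new outcome into the local history.
        self.history[i] = ((h << 1) | int(taken)) & self.history_mask


def count_mispredictions(predictor, address, outcomes):
    """Replay a list of branch outcomes and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(address) != taken:
            misses += 1
        predictor.update(address, taken)
    return misses
```

With a repeating taken, taken, taken, not-taken pattern, each of the four positions in the loop produces a distinct 4-bit history, so this model mispredicts only during a short warm-up and then follows the pattern exactly.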

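The Fetch and Decode entry notes that two three-instruction bundles are fetched per cycle. As a rough sketch of what a bundle looks like, the IA-64 encoding packs a 5-bit template and three 41-bit instruction slots into 128 bits, stored little-endian. The function names and the byte-string memory model below are hypothetical, and the sketch says nothing about how Merced disperses the slots to issue ports.

```python
TEMPLATE_BITS = 5   # template field in bits 4:0
SLOT_BITS = 41      # three instruction slots follow the template
BUNDLE_BYTES = 16   # 5 + 3 * 41 = 128 bits


def split_bundle(bundle):
    """Split a 128-bit bundle value into (template, [slot0, slot1, slot2])."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
        for i in range(3)
    ]
    return template, slots


def fetch_two_bundles(memory, address):
    """Model one fetch cycle: read two consecutive bundles from a bytes object."""
    fetched = []
    for offset in (0, BUNDLE_BYTES):
        start = address + offset
        raw = memory[start:start + BUNDLE_BYTES]
        if len(raw) != BUNDLE_BYTES:
            raise ValueError("fetch runs past the end of memory")
        fetched.append(split_bundle(int.from_bytes(raw, "little")))
    return fetched
```

Because the template already identifies which unit type each slot targets, a model like this needs only shifts and masks to separate the instructions, which is consistent with the "no decoder necessary" note above.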
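
The advanced-load flow described under Memory Subsystem can be sketched as a small software model. The table capacity, the oldest-first eviction, the dictionary-based memory, and all names here are assumptions for illustration. On real hardware the check and recovery are done by IA-64 instructions and a software handler rather than a Python call.

```python
class AdvancedLoadAddressTable:
    """Toy ALAT: tracks advanced loads by target register."""

    def __init__(self, capacity=32):  # capacity is an assumed value
        self.capacity = capacity
        self.entries = {}  # register name -> (address, size)

    def advanced_load(self, register, address, size, memory):
        # A new advanced load to the same register replaces the old entry.
        self.entries.pop(register, None)
        if len(self.entries) >= self.capacity:
            # Assumed policy: evict the oldest entry (dicts keep insert order).
            del self.entries[next(iter(self.entries))]
        self.entries[register] = (address, size)
        return memory.get(address, 0)

    def store(self, address, size, value, memory):
        memory[address] = value
        # Any entry whose byte range overlaps the store is invalidated.
        for reg, (a, s) in list(self.entries.items()):
            if a < address + size and address < a + s:
                del self.entries[reg]

    def check(self, register):
        """True if the advanced load for this register is still valid."""
        return register in self.entries


def speculative_read(alat, memory, register, load_addr, store_addr, store_value):
    """Hoist a load above a store, then check the ALAT and recover if needed."""
    value = alat.advanced_load(register, load_addr, 8, memory)
    alat.store(store_addr, 8, store_value, memory)
    if not alat.check(register):
        # Conflict: the store overlapped the load, so reissue the load.
        value = memory.get(load_addr, 0)
    return value
```

When the store targets a different address the early value is used as-is. When it overlaps the load address, the entry is gone by the time of the check and the load is repeated, which mirrors the reissue step in the description above.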
