From WikiChip
Difference between revisions of "intel/microarchitectures/merced"


Revision as of 02:39, 20 January 2021

Merced µarch
General Info
Arch Type: CPU
Designer: Intel
Manufacturer: Intel
Introduction: June 2001
Process: 180 nm
Core Configs: 1
Instructions
ISA: IA-64
Succession

Merced was the first Itanium microarchitecture designed by Intel.

Architecture

  • 10-stage pipeline
    • IPG: Get next instruction pointer
    • FET: Fetch from instruction cache
    • ROT: Instruction rotation, decoupling buffer
    • EXP: Instruction dispersal
    • REN: Register remapping
    • WLD: Word line decode
    • REG: Register file read
    • EXE: Execute
    • DET: Exception detection
    • WRB: Writeback
  • Branch Predictor
    • Early zero-bubble predictor using Target Address Registers controlled by the compiler
    • Two-level predictor with 4 bits of local history and a 512-entry prediction table
    • Indirect branches handled with a 64-entry Multiway Branch Prediction Table
    • 64-entry Target Address Cache
    • 8-entry Return Address Stack
    • Branch predictor can resteer at the ROT stage using the loop exit predictor or compiler-provided prediction hints for the third slot in a bundle
    • Branch predictor can resteer at the EXP stage for any branch
  • 16 KB 4-way L1 Instruction Cache
  • Fetch and Decode
    • Two bundles, each containing three instructions, fetched from the instruction cache every cycle
    • No decoder necessary
  • Execution Engine
    • Instructions from bundles dispersed to issue ports
    • Scoreboarding resolves dependencies with compiler hints
    • FPU registers can be accessed from the integer side with floating-point get and set instructions (getf and setf)
      • Transfer from the FPU to the integer side takes 2 clocks
      • Transfer from the integer side to the FPU takes 9 clocks
  • Memory Subsystem
    • L1D has a two-cycle latency
    • Software can issue "advanced loads", which are recorded in an Advanced Load Address Table (ALAT) that checks for conflicting stores. Software must check the ALAT before using the load result. If there is a conflict, a software handler has to reissue the conflicting load.
    • The FPU is fed directly by the dual-ported L2 cache, with a 9-cycle load latency
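
The advanced-load flow in the Memory Subsystem list can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not a description of Merced's hardware: the class name `ToyALAT`, the dictionary-based storage, the unlimited capacity, and the exact-address conflict match are all hypothetical simplifications chosen for readability.

```python
class ToyALAT:
    """Toy model of an Advanced Load Address Table.

    Assumptions (not from the source): entries are keyed by target
    register, capacity is unlimited, and a store conflicts only when
    its address matches a tracked address exactly.
    """

    def __init__(self):
        self.entries = {}  # target register -> address of the advanced load

    def advanced_load(self, reg, addr, memory):
        # Record the load so later stores can be checked against it.
        self.entries[reg] = addr
        return memory.get(addr, 0)

    def store(self, addr, value, memory):
        memory[addr] = value
        # A store to a tracked address invalidates the matching entries.
        for reg in [r for r, a in self.entries.items() if a == addr]:
            del self.entries[reg]

    def check(self, reg):
        # True means no conflicting store was seen since the advanced load.
        return reg in self.entries


def use_advanced_load(alat, reg, addr, memory, speculative_value):
    """Check the ALAT before using the result; reissue the load on a conflict."""
    if alat.check(reg):
        return speculative_value
    return memory.get(addr, 0)  # recovery path: redo the load


memory = {0x100: 1}
alat = ToyALAT()

early = alat.advanced_load("r8", 0x100, memory)   # load hoisted above the store
alat.store(0x100, 42, memory)                     # conflicting store
result = use_advanced_load(alat, "r8", 0x100, memory, early)

other = alat.advanced_load("r9", 0x200, memory)   # no conflicting store follows
alat.store(0x300, 7, memory)
safe = use_advanced_load(alat, "r9", 0x200, memory, other)
```

In the first case the store invalidates the entry, so the check fails and the load is reissued; in the second case the entry survives and the early value is used directly.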

[Figure: merced.png, block diagram of Merced]
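
The two-level predictor listed under Branch Predictor (4 bits of local history, 512-entry prediction table) can be sketched generically as follows. The organization shown here is an assumption for illustration only: the source does not say how the table is indexed or how counters are arranged, so the direct-mapped indexing by bundle address, the 16 two-bit counters per entry, and the class name `TwoLevelPredictor` are all hypothetical.

```python
HISTORY_BITS = 4       # local history length, from the list above
TABLE_ENTRIES = 512    # prediction table size, from the list above


class TwoLevelPredictor:
    """Generic two-level local-history predictor (organization is assumed)."""

    def __init__(self):
        # First level: one 4-bit history per table entry.
        self.history = [0] * TABLE_ENTRIES
        # Second level: 2-bit saturating counters selected by the history,
        # initialized to 1 (weakly not taken).
        self.counters = [[1] * (1 << HISTORY_BITS) for _ in range(TABLE_ENTRIES)]

    def _index(self, pc):
        # Assumption: direct-mapped on the 16-byte bundle address.
        return (pc >> 4) % TABLE_ENTRIES

    def predict(self, pc):
        i = self._index(pc)
        return self.counters[i][self.history[i]] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        h = self.history[i]
        c = self.counters[i][h]
        self.counters[i][h] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the outcome into the local history, keeping 4 bits.
        self.history[i] = ((h << 1) | int(taken)) & ((1 << HISTORY_BITS) - 1)


predictor = TwoLevelPredictor()
pc = 0x4000

# Train on a strictly alternating taken / not-taken branch.
for i in range(40):
    predictor.update(pc, i % 2 == 0)

# Measure accuracy on the next 20 outcomes of the same pattern.
correct = 0
for i in range(40, 60):
    taken = i % 2 == 0
    if predictor.predict(pc) == taken:
        correct += 1
    predictor.update(pc, taken)
```

A single two-bit counter would mispredict an alternating branch about half the time; selecting the counter by recent history is what lets the sketch learn the pattern.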
