Difference between revisions of "intel/microarchitectures/merced"

	Merced µarch
	General Info
Arch Type	CPU
Designer	Intel
Manufacturer	Intel
Introduction	June, 2001
Process	180 nm
Core Configs	1
	Instructions
ISA	IA-64
	Succession
	McKinley

Revision as of 01:39, 20 January 2021

Merced was the first Itanium microarchitecture designed by Intel.

Architecture

10-stage pipeline
- IPG: Get next instruction pointer
- FET: Fetch from instruction cache
- ROT: Instruction rotation, decoupling buffer
- EXP: Instruction dispersal
- REN: Register remapping
- WLD: Word line decode
- REG: Register file read
- EXE: Execute
- DET: Exception detection
- WRB: Writeback
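As a rough illustration of the stage ordering listed above, the following Python sketch models an idealized pipeline in which a fetch group advances exactly one stage per cycle and never stalls. The function names `stage_at` and `redirect_bubbles` are hypothetical, and the bubble model (a front-end redirect raised in a given stage discards one fetch group per earlier stage) is a simplifying assumption, not a statement about the actual hardware.

```python
# Stage names in pipeline order, as listed in the outline above.
STAGES = ["IPG", "FET", "ROT", "EXP", "REN", "WLD", "REG", "EXE", "DET", "WRB"]


def stage_at(entry_cycle, cycle):
    """Return the stage occupied at `cycle` by a fetch group that entered
    IPG at `entry_cycle`, or None if it is not in the pipeline.

    Assumption: one stage per cycle, no stalls."""
    idx = cycle - entry_cycle
    return STAGES[idx] if 0 <= idx < len(STAGES) else None


def redirect_bubbles(stage):
    """Assumed model: a redirect raised in `stage` discards the younger
    fetch groups sitting in every earlier stage, one bubble each."""
    return STAGES.index(stage)


if __name__ == "__main__":
    for name in ("IPG", "ROT", "EXP"):
        print(name, redirect_bubbles(name))
```

Under this model a redirect made in IPG costs no bubbles, while redirects from later stages cost progressively more.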
Branch Predictor
- Early zero-bubble predictor using Target Address Registers controlled by the compiler
- Two-level predictor with 4 bits of local history and a 512-entry prediction table
- Indirect branches handled with a 64-entry Multiway Branch Prediction Table
- 64-entry Target Address Cache
- 8-entry Return Address Stack
- Branch predictor can resteer at the ROT stage using the loop exit predictor or compiler-provided prediction hints for the third slot in a bundle.
- Branch predictor can resteer at the EXP stage for any branch
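The two-level predictor described above can be sketched as follows. This is a minimal model, not the real table organization: it assumes each of the 512 entries holds its own 4-bit history register and sixteen 2-bit saturating counters, that the table is direct-mapped, and that it is indexed by bundle address (IA-64 bundles are 16 bytes). The class name `TwoLevelPredictor` and all of these organizational details are assumptions made for illustration.

```python
class TwoLevelPredictor:
    """Toy two-level local-history predictor (organization is assumed)."""

    def __init__(self, entries=512, history_bits=4):
        self.entries = entries
        self.mask = (1 << history_bits) - 1
        # One local history register per table entry.
        self.history = [0] * entries
        # One 2-bit counter per history pattern; 1 = weakly not-taken.
        self.counters = [[1] * (1 << history_bits) for _ in range(entries)]

    def _index(self, addr):
        # Assumption: direct-mapped on the 16-byte bundle address.
        return (addr >> 4) % self.entries

    def predict(self, addr):
        i = self._index(addr)
        return self.counters[i][self.history[i]] >= 2

    def update(self, addr, taken):
        i = self._index(addr)
        h = self.history[i]
        c = self.counters[i][h]
        self.counters[i][h] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the outcome into the local history.
        self.history[i] = ((h << 1) | int(taken)) & self.mask


def count_mispredictions(predictor, addr, outcomes):
    """Run `outcomes` through the predictor and count wrong guesses."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(addr) != taken:
            wrong += 1
        predictor.update(addr, taken)
    return wrong
```

With four bits of history, a short repeating pattern such as taken, taken, taken, not-taken maps each history value to a single outcome, so in this model the predictor stops mispredicting once its counters have warmed up.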
16 KB, 4-way set-associative L1 Instruction Cache
Fetch and Decode
- Two bundles, each containing three instructions, fetched from the instruction cache every cycle
- No decoder necessary
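To make the fetch width concrete, the sketch below splits a bundle into its fields. It relies on the IA-64 bundle format, which this page does not spell out: a 128-bit bundle holds a 5-bit template in the low bits followed by three 41-bit instruction slots. The helper names `split_bundle` and `pack_bundle` are hypothetical.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS_PER_BUNDLE = 3
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1

BUNDLE_BYTES = (TEMPLATE_BITS + SLOT_BITS * SLOTS_PER_BUNDLE) // 8  # 16
BUNDLES_PER_FETCH = 2  # two bundles per cycle, as stated above
INSTRUCTIONS_PER_FETCH = BUNDLES_PER_FETCH * SLOTS_PER_BUNDLE
FETCH_BYTES = BUNDLES_PER_FETCH * BUNDLE_BYTES


def split_bundle(bundle):
    """Split a 128-bit bundle (as an int) into (template, (slot0, slot1, slot2))."""
    template = bundle & TEMPLATE_MASK
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + SLOT_BITS * i)) & SLOT_MASK
        for i in range(SLOTS_PER_BUNDLE)
    )
    return template, slots


def pack_bundle(template, slots):
    """Inverse of split_bundle: assemble the fields into a 128-bit int."""
    bundle = template & TEMPLATE_MASK
    for i, slot in enumerate(slots):
        bundle |= (slot & SLOT_MASK) << (TEMPLATE_BITS + SLOT_BITS * i)
    return bundle
```

Because the template field names the execution unit type for each slot, the front end only needs simple field extraction of this kind to route instructions.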
Execution Engine
- Instructions from bundles dispersed to issue ports
- Scoreboarding resolves dependencies with compiler hints
- FPU can be accessed from the integer side by floating point get and set instructions.
  - Transfer from FPU to integer side takes two clocks
  - Transfer from integer side to FPU takes 9 clocks
Memory Subsystem
- L1D has a two-cycle latency
- Software can issue "advanced loads", which are recorded in an Advanced Load Address Table (ALAT) that checks for conflicting stores. Software must check the ALAT before using the load result. If there is a conflict, a software handler has to reissue the conflicting load.
- The FPU is directly fed by the dual-ported L2 cache, with a 9-cycle load latency.
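The advanced-load mechanism described above can be modeled roughly as follows. This is a behavioral sketch only: the class name `ALAT`, the default capacity of 32 entries, the oldest-first replacement, and tagging entries by destination register name are all assumptions made for illustration, not details taken from this page.

```python
class ALAT:
    """Toy Advanced Load Address Table (capacity and policy are assumed)."""

    def __init__(self, capacity=32):
        self.capacity = capacity
        # Maps destination register -> (address, size) of the advanced load.
        self.entries = {}

    def advanced_load(self, reg, addr, size):
        """Record an advanced load into `reg`, evicting the oldest entry if full."""
        self.entries.pop(reg, None)
        if len(self.entries) >= self.capacity:
            oldest = next(iter(self.entries))  # dicts keep insertion order
            del self.entries[oldest]
        self.entries[reg] = (addr, size)

    def store(self, addr, size):
        """A store removes every entry whose byte range overlaps it."""
        for reg, (a, s) in list(self.entries.items()):
            if a < addr + size and addr < a + s:
                del self.entries[reg]

    def check(self, reg):
        """True if the advanced load into `reg` is still valid.
        False means software must reissue the load."""
        return reg in self.entries


if __name__ == "__main__":
    alat = ALAT()
    alat.advanced_load("r8", 0x1000, 8)
    alat.store(0x1004, 4)      # overlaps the advanced load
    print(alat.check("r8"))    # the check fails, so recovery code would reload
```

A missing entry is treated conservatively here: whether the entry was removed by a conflicting store or evicted for capacity, the check fails and the load is reissued.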

@@ Line 79: / Line 79: @@
 ** Two bundles, each containing three instructions, fetched from the instruction cache every cycle
 ** No decoder necessary
+* Execution Engine
+** Instructions from bundles dispersed to issue ports
+** Scoreboarding resolves dependencies with compiler hints
+** FPU can be accessed from the integer side by floating point get and set instructions.
+*** Transfer from FPU to integer side takes two clocks
+*** Transfer from integer side to FPU takes 9 clocks
+* Memory Subsystem
+** L1D has two cycle latency
+** Software can issue "advanced loads", which go into a Advanced Load Address Table that checks for conflicting stores. Software needs to check the ALAT before using the load result. If there's a conflict, a software handler has to reissue the conflicting load.
+** The FPU is directly fed by the dual-ported L2 cache, with 9 cycle load latency.
-[[File:merced.png|thumb|Block diagram of Merced]]
+[[File:merced.png]]

