From WikiChip
					
														Chlamchowder (talk | contribs)  (Filling out info, work in progress)  | 
Revision as of 01:39, 20 January 2021
| Merced µarch | |
| General Info | |
| Arch Type | CPU | 
| Designer | Intel | 
| Manufacturer | Intel | 
| Introduction | June, 2001 | 
| Process | 180 nm | 
| Core Configs | 1 | 
| Instructions | |
| ISA | IA-64 | 
| Succession | |
Merced was the first Itanium microarchitecture designed by Intel.
Architecture
- 10-stage pipeline
  - IPG: Get next instruction pointer
  - FET: Fetch from instruction cache
  - ROT: Instruction rotation, decoupling buffer
  - EXP: Instruction dispersal
  - REN: Register remapping
  - WLD: Word line decode
  - REG: Register file read
  - EXE: Execute
  - DET: Exception detection
  - WRB: Writeback
 
- Branch Predictor
  - Early zero-bubble predictor using Target Address Registers controlled by the compiler
  - Two-level predictor with 4 bits of local history and a 512-entry prediction table
  - Indirect branches handled with a 64-entry Multiway Branch Prediction Table
  - 64-entry Target Address Cache
  - 8-entry Return Address Stack
  - The branch predictor can resteer at the ROT stage using the loop exit predictor or compiler-provided prediction hints for the third slot in a bundle
  - The branch predictor can resteer at the EXP stage for any branch
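
To illustrate how a two-level predictor with 4 bits of local history works, here is a minimal Python sketch. Only the history width and the 512-entry table size come from the list above. The table organization (direct-mapped by bundle address, sixteen 2-bit saturating counters per entry, counters starting at weakly not-taken) and all names such as `TwoLevelLocalPredictor` are assumptions made for illustration, not a description of Merced's actual hardware.

```python
HISTORY_BITS = 4      # from the text: 4 bits of local history
TABLE_ENTRIES = 512   # from the text: 512-entry prediction table


class TwoLevelLocalPredictor:
    """Illustrative two-level local predictor (organization is assumed)."""

    def __init__(self):
        # First level: one local history register per table entry.
        self.history = [0] * TABLE_ENTRIES
        # Second level (assumption): one 2-bit saturating counter per
        # possible history value, initialized to weakly not-taken (1).
        self.counters = [[1] * (1 << HISTORY_BITS) for _ in range(TABLE_ENTRIES)]

    def _index(self, address):
        # Assumption: direct-mapped, indexed by 16-byte bundle address.
        return (address >> 4) % TABLE_ENTRIES

    def predict(self, address):
        i = self._index(address)
        return self.counters[i][self.history[i]] >= 2  # True means taken

    def update(self, address, taken):
        i = self._index(address)
        h = self.history[i]
        c = self.counters[i][h]
        self.counters[i][h] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the outcome into the local history, keeping 4 bits.
        self.history[i] = ((h << 1) | int(taken)) & ((1 << HISTORY_BITS) - 1)


def count_mispredictions(predictor, address, outcomes):
    """Predict each outcome, then train on it; return the number of misses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(address) != taken:
            misses += 1
        predictor.update(address, taken)
    return misses
```

With this organization, a branch that repeats a short pattern such as taken, taken, taken, not-taken is predicted without misses once every history value in the pattern has been seen and trained, because each 4-bit history maps to a unique next outcome.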
 
- 16 KB 4-way L1 instruction cache
- Fetch and Decode
  - Two bundles, each containing three instructions, fetched from the instruction cache every cycle
  - No decoder necessary
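
The bundle structure mentioned above can be sketched in Python. The layout used here comes from the IA-64 architecture definition rather than from this page: a bundle is 128 bits, with a 5-bit template in the low bits followed by three 41-bit instruction slots. The function names `split_bundle` and `make_bundle` are hypothetical helpers for illustration.

```python
TEMPLATE_BITS = 5   # IA-64: template field occupies bits 0-4
SLOT_BITS = 41      # IA-64: three 41-bit instruction slots follow
SLOT_MASK = (1 << SLOT_BITS) - 1


def split_bundle(bundle):
    """Split a 128-bit bundle into (template, (slot0, slot1, slot2))."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    )
    return template, slots


def make_bundle(template, slots):
    """Pack a template and three slot values into a 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3 or any(not 0 <= s <= SLOT_MASK for s in slots):
        raise ValueError("expected three 41-bit slot values")
    bundle = template
    for i, slot in enumerate(slots):
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle
```

Because the template already states which kind of execution unit each slot needs, the front end can route slots to issue ports without a conventional decode stage, which is consistent with the "no decoder necessary" note above.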
 
- Execution Engine
  - Instructions from bundles dispersed to issue ports
  - Scoreboarding resolves dependencies with compiler hints
  - FPU can be accessed from the integer side by the floating-point get and set instructions (getf and setf)
    - Transfer from FPU to integer side takes 2 clocks
    - Transfer from integer side to FPU takes 9 clocks
 
 
- Memory Subsystem
  - L1D has a two-cycle latency
  - Software can issue "advanced loads", which are recorded in an Advanced Load Address Table (ALAT) that checks for conflicting stores. Software needs to check the ALAT before using the load result. If there is a conflict, a software handler has to reissue the conflicting load.
  - The FPU is directly fed by the dual-ported L2 cache, with a 9-cycle load latency
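
The advanced-load flow described above can be modeled with a small Python sketch. It is a simplified illustration under stated assumptions: entries are keyed by destination register, a store conflicts only on an exact address match, and there is no capacity limit. The class `ALAT` and the helper `speculative_read` are hypothetical names, and the real table's size and address-matching rules are not modeled.

```python
class ALAT:
    """Toy model of an Advanced Load Address Table (simplified)."""

    def __init__(self, memory):
        self.memory = memory   # dict: address -> value
        self.entries = {}      # dict: register name -> tracked address

    def advanced_load(self, reg, address):
        # Record the load so later stores can be checked against it.
        self.entries[reg] = address
        return self.memory.get(address, 0)

    def store(self, address, value):
        self.memory[address] = value
        # Assumption: only an exact address match counts as a conflict.
        self.entries = {r: a for r, a in self.entries.items() if a != address}

    def check(self, reg):
        # True if no conflicting store has removed the entry.
        return reg in self.entries


def speculative_read(alat, reg, address, intervening_stores):
    """Load early, run the stores, then check and recover if needed."""
    value = alat.advanced_load(reg, address)
    for store_address, store_value in intervening_stores:
        alat.store(store_address, store_value)
    if not alat.check(reg):
        # Recovery path: the load result is stale, so reissue the load.
        value = alat.memory.get(address, 0)
    return value
```

For example, hoisting a load of address `0x100` above a store to `0x200` leaves the entry intact and the early value is used, while an intervening store to `0x100` removes the entry and forces the reload.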
 
 
