From WikiChip
Latest revision as of 17:57, 22 May 2019

K6 µarch
General Info
Arch Type: CPU
Designer: AMD, NexGen
Manufacturer: AMD
Introduction: April 2, 1997
Phase-out: 2000
Process: 350 nm, 250 nm
Instructions
ISA: x86-32

K6 was the microarchitecture for AMD's K6 line of microprocessors and the successor to the K5. Despite its name, the K6 design came entirely from NexGen and was not based on the K5. Launched in early 1997, K6 delivered the performance AMD needed to become a viable competitor to Intel, something the K5 had failed to do. K6 was superseded by K6-2 in 1998.

Architecture

  • 6-7 stage integer pipeline
    • Fetch
    • Decode
    • Micro-op issue
    • Operand Fetch
    • Execute 1
    • Optional Execute 2
    • Commit
  • Branch Predictor
    • 2-level predictor with 8192 entry branch history table
      • Does not store target addresses. Target addresses are calculated during instruction decode
    • 16-entry branch target cache
      • Caches 16 bytes of instructions at the branch target, and supplies that directly to the decoders to avoid a 1-cycle penalty
    • 16-entry return address stack
  • 32K 2-way L1 Instruction Cache
    • Also holds 20K of predecode data. Instructions are predecoded as L1 instruction cache is filled
    • 64 entry TLB
  • Fetch and Decode
    • 16 instruction bytes fetched per cycle, either from L1 instruction cache or branch target cache
    • Decoders can handle the following combinations per clock:
      • Short decode: Two x86 instructions that generate up to two micro-ops each
      • Vector decode: One x86 instruction that generates up to four micro-ops
      • Complex instructions handled by microcode
  • Out of order execution resources
    • Scheduler holds 6 groups of 4 micro-ops (24 micro-ops in total), or up to 12 x86 instructions
      • Receives a group of 4 micro-ops from decoders every cycle. If fewer than 4 micro-ops are generated from the decoders, empty slots are padded with NOPs
      • Retires one 4-micro-op group every cycle
    • 48 integer registers: 8 architectural, 16 scratch, 24 rename
    • 21 MMX registers: 8 architectural, 1 scratch, 12 rename
    • Up to 7 outstanding branches
    • 7 entry store queue
  • Issues up to 6 micro-ops per cycle. Ports:
    • Integer X: All ALU ops, including multiplies and divides
    • Integer Y: Basic ALU ops
      • Both integer pipelines can handle MMX/3DNow instructions, but share MMX/3DNow multiplier and MMX shifter functional units. If the two pipelines try to issue operations to a shared functional unit, one operation will stall.
    • Floating point
    • Load
    • Store
    • Branch
  • 32K 2-way L1 Data Cache
    • Dual ported, write back
    • 128 entry TLB
    • Load unit has 2-cycle latency
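The decode and scheduler rules listed above (short decode versus vector decode, groups of 4 micro-ops padded with NOPs) can be illustrated with a small model. This is a simplified sketch, not a description of the real hardware: the `(name, uop_count)` instruction format and the `decode_cycle` helper are hypothetical, and real pairing restrictions (instruction length, operand types, microcode sequencing) are ignored.

```python
from collections import deque

GROUP_SIZE = 4        # micro-ops per scheduler group (from the list above)
SCHEDULER_GROUPS = 6  # groups held by the scheduler (from the list above)
NOP = "nop"

# Capacity implied by the figures above: 24 micro-ops, and at most
# two x86 instructions per group, so up to 12 x86 instructions.
SCHEDULER_UOPS = SCHEDULER_GROUPS * GROUP_SIZE
SCHEDULER_MAX_X86 = SCHEDULER_GROUPS * 2


def decode_cycle(pending):
    """Consume instructions for one decode cycle and return a padded group.

    `pending` is a deque of (name, uop_count) pairs. This format is an
    assumption made for illustration only.
    """
    name, count = pending[0]
    if count > GROUP_SIZE:
        raise ValueError(f"{name}: needs microcode, which is not modelled here")

    uops = []
    if count <= 2:
        # Short decode: up to two instructions of at most two micro-ops each.
        for _ in range(2):
            if pending and pending[0][1] <= 2:
                n, c = pending.popleft()
                uops += [f"{n}.{i}" for i in range(c)]
            else:
                break
    else:
        # Vector decode: one instruction of up to four micro-ops.
        n, c = pending.popleft()
        uops += [f"{n}.{i}" for i in range(c)]

    # Empty slots are padded with NOPs so every group has four entries.
    return uops + [NOP] * (GROUP_SIZE - len(uops))


program = deque([("add", 1), ("load_add", 2), ("vec", 4), ("mov", 1)])
groups = []
while program:
    groups.append(decode_cycle(program))
```

In this model the four instructions take three decode cycles: the first two pair up through short decode, the four-micro-op instruction uses vector decode alone, and the trailing instruction fills only one slot of its group, leaving three NOPs.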

K6 block diagram.png
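To give a sense of what a 2-level predictor with an 8192-entry branch history table does, here is a minimal generic sketch. Only the table size comes from the list above. The 9-bit global history, the XOR indexing, and the 2-bit saturating counters are assumptions chosen for illustration, and the class name `TwoLevelPredictor` is hypothetical; none of this is claimed to match the actual K6 implementation.

```python
BHT_ENTRIES = 8192   # table size from the list above (2**13 entries)
HISTORY_BITS = 9     # assumed global history length, for illustration only


class TwoLevelPredictor:
    """Generic two-level predictor: global history selects a 2-bit counter."""

    def __init__(self):
        self.history = 0                    # first level: recent outcomes
        self.counters = [1] * BHT_ENTRIES   # second level: weakly not-taken

    def _index(self, pc):
        # Assumed indexing: XOR the branch address with the history bits.
        return (pc ^ self.history) & (BHT_ENTRIES - 1)

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the outcome into the history register.
        mask = (1 << HISTORY_BITS) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


predictor = TwoLevelPredictor()
branch_pc = 0x4010  # arbitrary example address

# Train on a strictly alternating taken / not-taken pattern.
for k in range(100):
    predictor.update(branch_pc, k % 2 == 0)
```

After training, the two history patterns map to different counters, so a predictor of this kind follows the alternating branch correctly, which a single counter per branch could not do.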

Die Shot

This section is empty.

All K6 Chips

K6 Chips
Model               Core         Launched       Power Dissipation  Freq        Max Mem
AMD-K6-166ALR       6k86         May 1997       17.2 W             166.66 MHz  4 GiB
AMD-K6-166ALYD      6k86         May 1997       17.2 W             166.66 MHz  4 GiB
AMD-K6-200AFR       6k86         May 1997       20 W               199.99 MHz  4 GiB
AMD-K6-200ALR       6k86         May 1997       20 W               199.99 MHz  4 GiB
AMD-K6-200ALYD      6k86         May 1997       20 W               199.99 MHz  4 GiB
AMD-K6-233AFR       6k86         May 1997       28.3 W             233.33 MHz  4 GiB
AMD-K6-233ANR       6k86         May 1997       28.3 W             233.33 MHz  4 GiB
AMD-K6-233APR       6k86         May 1997       28.3 W             233.33 MHz  4 GiB
AMD-K6/233ACZ       Little Foot  5 March 1998   9 W                233.33 MHz  4 GiB
AMD-K6/233ADZ       Little Foot  5 March 1998   9 W                233.33 MHz  4 GiB
AMD-K6/233BCZ       Little Foot  5 March 1998   9 W                233.33 MHz  4 GiB
AMD-K6/266ACZ       Little Foot  5 March 1998   9.8 W              266.66 MHz  4 GiB
AMD-K6/266ADZ       Little Foot  5 March 1998   9.8 W              266.66 MHz  4 GiB
AMD-K6/266AFR       Little Foot  5 March 1998   14.5 W             266.66 MHz  4 GiB
AMD-K6/266BCZ       Little Foot  5 March 1998   9.8 W              266.66 MHz  4 GiB
AMD-K6/300ADZ       Little Foot  5 March 1998   11 W               299.99 MHz  4 GiB
AMD-K6/300AFR       Little Foot  5 March 1998   15.4 W             299.99 MHz  4 GiB
AMD-K6/300BDZ       Little Foot  5 March 1998   11 W               299.99 MHz  4 GiB
AMD-K6/PR2-166ALR   6k86         2 April 1997   17.2 W             166.66 MHz  4 GiB
AMD-K6/PR2-200ALR   6k86         2 April 1997   20 W               199.99 MHz  4 GiB
Count: 20

See also

  • K6 (AMD)
  • P6 (Intel)

References

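  • AMD K6-2 Data Sheet: http://www.amd-k6.com/wp-content/uploads/2012/07/AMD_K6-2_Desktop_Datasheet.pdf
  • AMD K6 Code Optimization: http://www.ii.uib.no/~osvik/amd_opt/21924d.pdf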