From WikiChip
Saltwell - Microarchitectures - Intel

Saltwell µarch

General Info
Arch Type: CPU
Designer: Intel
Manufacturer: Intel
Introduction: 2011
Phase-out: 2013
Process: 32 nm
Core Configs: 1, 2

Pipeline
Type: Superscalar, Superpipeline
Speculative: No
Reg Renaming: No
Stages: 16

Instructions
ISA: x86-64
Extensions: MOVBE, MMX, SSE, SSE2, SSE3, SSSE3

Cache
L1I Cache: 32 KiB/Core, 8-way set associative
L1D Cache: 24 KiB/Core, 6-way set associative
L2 Cache: 512 KiB/Core, 8-way set associative

Cores
Core Names: Penwell, Cedarview, Cloverview, Centerton, Briarwood, Berryville

Saltwell was Intel's microarchitecture for its 32 nm ultra-low-power systems on chip (SoCs), first introduced in late 2011 for the Atom family. Saltwell is a shrink of Bonnell that also integrates all support chips on-die. Unlike its predecessor, Saltwell was aimed directly at smartphones (as opposed to mobile Internet devices, or MIDs).

Codenames

Platform        Core          Target
Medfield        Penwell       Smartphones
Cedar Trail     Cedarview     Netbooks
Clover Trail+   Cloverview    Tablets
Medfield        Medfield      Tablet / Smartphone
Bordenville     Centerton     Microservers
Bordenville     Briarwood     Microservers
-               Berryville    CE (set-tops)

Architecture

Saltwell's primary goals were:

  1. Improve on Bonnell by getting rid of older support chips
  2. Add enhancements using 32 nm process while transitioning to 22 nm
    1. Improve GPU, power
    2. Burst frequencies

Key changes from Bonnell

  • L2$ increase rate
  • L2$ now on a separate voltage rail
  • New low-power SRAM for machine state
  • Larger instruction fetch
  • Double the size of the branch prediction history table

Memory Hierarchy

  • Cache
    • Hardware prefetchers
    • L1 Cache:
      • 32 KiB 8-way set associative instruction
        • 1 read and 1 write port
      • 24 KiB 6-way set associative data
        • 1 read and 1 write port
      • 8-transistor SRAM cells (instead of 6-transistor) to allow a lower operating voltage
      • Per core
    • L2 Cache:
      • 512 KiB 8-way set associative
      • ECC
      • Shrinkable from 512 KiB to 128 KiB (2-way)
      • 32B/cycle and 32 outstanding cache requests
      • separate voltage rail, fixed @ 1.05V
      • Per core
    • L3 Cache:
      • No level 3 cache
    • Non-Cache Shared State Memory
      • 256 KiB low-power SRAM
      • separate voltage plane
      • always-on block that stores architectural states while in various power saving modes
    • RAM
      • Maximum of 1 GiB, 2 GiB, or 4 GiB (depending on the chip)
      • dual 32-bit channels, 1 or 2 ranks per channel

Functional Units

The number of functional units was kept to a minimum to cut power consumption.

  • 2 Integer ALUs (1 for jumps, 1 for shifts)
  • 2 FP ALUs (1 adder, 1 for others)
  • No dedicated integer multiplier or divider (integer multiply and divide share the FP ALU instead)

Pipeline

Saltwell's pipeline is almost identical to Bonnell's: 16 stages with a 13-stage branch miss penalty. It is still a dual-issue superscalar design with in-order execution. Reordering logic was again omitted due to power and area restrictions.

[Figure: Bonnell pipeline diagram (bonnell pipeline.svg)]

The longer pipeline spreads heat more evenly across the chip, since the work is divided among more units. It also allows a higher clock rate.

  • Instruction Fetch
    • 3 stages
    • 48 Bytes/Cycle (lower if SMT)
  • Instruction Decode
    • 3 stages
    • Instructions with up to 3 prefixes/Cycle
  • Instruction Dispatch
    • 2 stages
  • Source Operand Read
  • Data Cache Access
    • 3 stages
      • 1 stage for address calculation
      • 2 stages for reading cache
  • Execution
    • 2 clusters
      • integers
        • quick cache access due to direct connection
      • floating point & SIMD
  • Exception & MT Handling
    • 2 stages
  • Commit
    • 1 stage
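
As a quick consistency check, the stage counts in the list above can be added up and compared with the stated 16-stage total. The sketch below does that. Two groups (Source Operand Read and Execution) have no count in the list, so they are marked `None`. Treating the leftover stages as one each for those two groups is an assumption, not something the article states.

```python
# Stage counts from the list above; None marks groups with no count given.
STAGES = [
    ("Instruction Fetch", 3),
    ("Instruction Decode", 3),
    ("Instruction Dispatch", 2),
    ("Source Operand Read", None),
    ("Data Cache Access", 3),
    ("Execution", None),
    ("Exception & MT Handling", 2),
    ("Commit", 1),
]
TOTAL_STAGES = 16  # stated pipeline depth

listed = sum(n for _, n in STAGES if n is not None)
unlisted = [name for name, n in STAGES if n is None]
remaining = TOTAL_STAGES - listed

print(f"stages with a stated count: {listed}")
print(f"stages left for {unlisted}: {remaining}")
```

The leftover count equals the number of groups without a stated depth, which fits the one-stage-each reading.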

Multithreading

Saltwell supports multithreading, with up to two threads per core. However, the threads compete for the same resources, which inherently means each runs slower than it would if it ran alone.

Branch Prediction

  • Two-level adaptive predictor
  • 12-bit branch history register
  • Pattern history table has 8192 entries (shared between threads), twice that of Bonnell
  • Branch target buffer (BTB) has 128 entries (4-way, 32 sets)
  • Unconditional jumps are ignored
  • Always-taken and never-taken are marked in the table
  • Penalties:
    • 13 stages for a misprediction
    • 7 stages for a correct prediction that misses the branch target buffer (BTB)
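
To illustrate how a two-level adaptive predictor with these sizes behaves, here is a minimal sketch using a 12-bit global history register and an 8192-entry table of 2-bit saturating counters. The article does not say how the history and branch address are combined into a table index, so the gshare-style XOR hash below is an assumption. The class name, function name, and example branch address are likewise hypothetical, and threads, unconditional jumps, and the BTB itself are not modelled.

```python
HISTORY_BITS = 12     # branch history register width (from the list above)
PHT_ENTRIES = 8192    # pattern history table entries (from the list above)
BTB_ENTRIES = 128
BTB_WAYS = 4
BTB_SETS = BTB_ENTRIES // BTB_WAYS  # number of BTB sets implied by the figures above


class TwoLevelPredictor:
    """Global-history predictor with 2-bit saturating counters."""

    def __init__(self):
        self.history = 0
        self.pht = [1] * PHT_ENTRIES  # 1 = weakly not-taken

    def _index(self, pc):
        # ASSUMPTION: gshare-style XOR hash; the real index function is not given.
        return (pc ^ self.history) % PHT_ENTRIES

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        mask = (1 << HISTORY_BITS) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def count_late_misses(pattern, rounds=100, warmup=50, pc=0x4010):
    """Replay a repeating outcome pattern; count misses after the warm-up rounds."""
    predictor = TwoLevelPredictor()
    misses = 0
    for r in range(rounds):
        for taken in pattern:
            if r >= warmup and predictor.predict(pc) != taken:
                misses += 1
            predictor.update(pc, taken)
    return misses


# A loop-like branch: taken three times, then not taken, repeated.
late_misses = count_late_misses([True, True, True, False])
print(f"BTB sets: {BTB_SETS}, mispredictions after warm-up: {late_misses}")
```

Because the repeating pattern is much shorter than the 12-bit history, each history value always precedes the same outcome, and the sketch stops mispredicting once the counters have warmed up.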

Cores

  • Penwell - SoCs specifically for smartphones
  • Cedarview - SoCs for netbooks
  • Cloverview - SoCs for tablets
  • Centerton - SoCs for Microservers; added support for Intel VT and ECC memory
  • Briarwood - SoCs for Microservers
  • Berryville - SoCs for consumer electronics (e.g. set-tops)

All Saltwell Chips

Saltwell Chips
CPU columns: Model, µarch, Platform, Core, Launched, SDP, Freq, Max Mem
IGP columns: Name, Freq, Max Freq