Difference between revisions of "intel/microarchitectures/saltwell"

Revision as of 16:30, 8 April 2016

Saltwell was a microarchitecture for Intel's 32 nm ultra-low-power systems on chip, first introduced in late 2011 for the Atom family. Saltwell is a shrink of Bonnell that also incorporated all older support chips on-die. Unlike its predecessor, Saltwell was aimed directly at smartphones (as opposed to MIDs).

Codenames

Platform	Core	Target
Medfield	Penwell	Smartphones
Cedar Trail	Cedarview	Netbooks
Clover Trail+	Cloverview	Tablets
Bordenville	Centerton	Microservers
Bordenville	Briarwood	Microservers
	Berryville	CE (set-tops)

Architecture

Saltwell's primary goals were:

1. Improve on Bonnell by getting rid of older support chips
2. Add enhancements using 32 nm process while transitioning to 22 nm
   1. Improve GPU, power
   2. Burst frequencies

Memory Hierarchy

Cache
- Hardware prefetchers
- L1 Cache:
  - 32 KB 8-way set associative instruction
    - 1 read and 1 write port
  - 24 KB 6-way set associative data
    - 1 read and 1 write port
  - 8 transistors (instead of 6) to reduce voltage
  - Per core
- L2 Cache:
  - 512 KB 8-way set associative
  - ECC
  - Shrinkable from 512 KB to 128 KB (2-way)
  - 32B/cycle and 32 outstanding cache requests
  - separate voltage rail, fixed at 1.05 V
  - Per core
- L3 Cache:
  - No level 3 cache
- Non-Cache Shared State Memory
  - 256 KB low-power SRAM
  - separate voltage plane
  - always-on block that stores architectural states while in various power saving modes
- RAM
  - Maximum of 1 GB, 2 GB, or 4 GB
  - dual 32-bit channels, 1 or 2 ranks per channel

Note that the L1 data and instruction caches were originally both 32 KB (8-way); however, due to power restrictions, the L1d$ was later reduced to 24 KB.
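
The cache figures above can be sanity-checked with a little arithmetic. The sketch below computes the number of sets for each listed configuration. The 64-byte line size is an assumption (it is not stated on this page), and the helper name `num_sets` is purely illustrative.

```python
# Cache geometry check for the figures listed above.
# ASSUMPTION: 64-byte cache lines; the line size is not given in the text.
LINE_BYTES = 64


def num_sets(size_kb: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets = capacity / (associativity * line size)."""
    total_bytes = size_kb * 1024
    bytes_per_set = ways * line_bytes
    if total_bytes % bytes_per_set:
        raise ValueError("capacity is not a whole number of sets")
    return total_bytes // bytes_per_set


# (size in KB, associativity) as listed in the Memory Hierarchy section.
CACHES = {
    "L1I": (32, 8),
    "L1D": (24, 6),
    "L2 (full)": (512, 8),
    "L2 (shrunk)": (128, 2),
}

if __name__ == "__main__":
    for name, (size_kb, ways) in CACHES.items():
        print(f"{name}: {num_sets(size_kb, ways)} sets")
```

Under that assumption, the 24 KB 6-way data cache has the same number of sets as a 32 KB 8-way cache, which would be consistent with the reduction having been made by dropping two ways. Likewise, the L2 keeps the same set count when shrunk from 512 KB (8-way) to 128 KB (2-way).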

Functional Units

The number of functional units was kept to a minimum to cut power consumption.

2 Integer ALUs (1 for jumps, 1 for shifts)
2 FP ALUs (1 adder, 1 for others)
No Integer multiplier & divider

Pipeline

Saltwell's pipeline is almost identical to Bonnell's: 16 stages with a 13-stage miss penalty. It is still a dual-issue superscalar design with in-order execution; reordering logic was again omitted due to power and area restrictions.

The longer pipeline spreads heat more evenly across the chip, over more units. It also allows a higher clock rate.

Instruction Fetch
- 3 stages
- 48 Bytes/Cycle (lower if SMT)
Instruction Decode
- 3 stages
- Instructions with up to 3 prefixes/Cycle
Instruction Dispatch
- 2 stages
Source Operand Read
- 1 stage
  - reading register operand
Data Cache Access
- 3 stages
  - 1 stage for calculating
  - 2 stages for reading cache
Execution
- 2 clusters
  - integers
    - quick cache access due to direct connection
  - floating point & SIMD
Exception & MT Handling
- 2 stages
Commit
- 1 stage
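
The stage counts in the list above can be added up against the stated 16-stage total. The sketch below does only that arithmetic. The list gives no stage count for Execution, so the single execute stage it derives is an inference from the total, not a figure stated on this page, and the function names are illustrative.

```python
# Stage counts exactly as listed above. Execution has no count in the list.
LISTED_STAGES = {
    "Instruction Fetch": 3,
    "Instruction Decode": 3,
    "Instruction Dispatch": 2,
    "Source Operand Read": 1,
    "Data Cache Access": 3,
    "Exception & MT Handling": 2,
    "Commit": 1,
}
TOTAL_STAGES = 16   # stated pipeline depth
MISS_PENALTY = 13   # stated miss penalty, in stages


def implied_execute_stages() -> int:
    """Stages left over for Execution once the listed stages are subtracted."""
    return TOTAL_STAGES - sum(LISTED_STAGES.values())


def stages_through_execute() -> int:
    """Depth from fetch up to and including the (inferred) execute stage."""
    front = [
        "Instruction Fetch",
        "Instruction Decode",
        "Instruction Dispatch",
        "Source Operand Read",
        "Data Cache Access",
    ]
    return sum(LISTED_STAGES[name] for name in front) + implied_execute_stages()


if __name__ == "__main__":
    print("implied execute stages:", implied_execute_stages())
    print("stages through execute:", stages_through_execute())
```

With that inferred execute stage, the depth from fetch through execute comes to 13, which matches the stated 13-stage miss penalty.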

Multithreading

Saltwell supports multithreading, with up to two threads per core. However, the threads compete for the same resources, which inherently means they run slower than they would if run alone.

Branch Prediction

Two-level adaptive predictor
12-bit branch history register
Pattern history table has 8192 entries (shared between threads), twice that of Bonnell
Branch target buffer has 128 entries (4-way, 32 sets)
Unconditional jumps are ignored
Always-taken and never-taken are marked in the table
Penalties:
- 13 stages for misprediction
- 7 stages for correct prediction but missing branch target buffer (BTB)
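
To illustrate how a two-level adaptive predictor of this size behaves, here is a minimal sketch using the 12-bit history register and 8192-entry pattern history table listed above. Everything else is an assumption made for illustration: the 2-bit saturating counters, the gshare-style XOR of address and history used as the table index, and the class name `TwoLevelPredictor`. It is not a description of Saltwell's actual indexing scheme.

```python
HISTORY_BITS = 12     # branch history register width, from the list above
PHT_ENTRIES = 8192    # pattern history table size, from the list above


class TwoLevelPredictor:
    """Illustrative two-level predictor; indexing and counters are assumed."""

    def __init__(self) -> None:
        self.history = 0
        # ASSUMPTION: 2-bit saturating counters, initialised weakly not-taken.
        self.pht = [1] * PHT_ENTRIES

    def _index(self, pc: int) -> int:
        # ASSUMPTION: gshare-style index (address XOR history), table-sized.
        return (pc ^ self.history) % PHT_ENTRIES

    def predict(self, pc: int) -> bool:
        """Predict taken when the selected counter is in its upper half."""
        return self.pht[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Train the selected counter, then shift the outcome into history."""
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        mask = (1 << HISTORY_BITS) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


if __name__ == "__main__":
    predictor = TwoLevelPredictor()
    pc = 0x400  # hypothetical branch address
    # Train on a strictly alternating taken / not-taken pattern.
    for i in range(200):
        predictor.update(pc, i % 2 == 0)
    print("next prediction (expect taken):", predictor.predict(pc))
```

Because the history register selects a different counter for each recent outcome pattern, a branch that alternates between taken and not-taken becomes predictable once the counters are trained, which a single per-branch counter could not achieve.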

Cores

Penwell - SoCs specifically for smartphones
Cedarview - SoCs for netbooks
Cloverview - SoCs for tablets
Centerton - SoCs for Microservers; added support for Intel VT and ECC memory
Briarwood - SoCs for Microservers
Berryville - SoCs for consumer electronics (e.g. set-tops)

@@ Line 1: / Line 1: @@
+{{intel title|Saltwell}}
 '''Saltwell''' was a [[microarchitecture]] for [[Intel]]'s [[32 nm]] ultra-low-power [[system on chip|systems on chip]], first introduced in late 2011 for the {{intel|Atom}} family. Saltwell is a shrink of {{intel|Bonnell}} that also incorporated all older support chips on-die. Unlike its predecessor, Saltwell was aimed directly at smartphones (as opposed to MIDs).
@@ Line 19: / Line 20: @@
 |-
 |}
+== Architecture ==
+Saltwell's primary goals were:
+# Improve on Bonnell by getting rid of older support chips
+# Add enhancements using [[32 nm]] process while transitioning to [[22 nm]]
+## Improve GPU, power
+## Burst frequencies
+=== Memory Hierarchy ===
+* Cache
+** Hardware prefetchers
+** L1 Cache:
+*** 32 KB 8-way [[set associative]] instruction
+**** 1 read and 1 write port
+*** 24 KB 6-way set associative data
+**** 1 read and 1 write port
+*** 8 transistors (instead of 6) to reduce voltage
+*** Per core
+** L2 Cache:
+*** 512 KB 8-way set associative
+*** ECC
+*** Shrinkable from 512 KB to 128 KB (2-way)
+*** 32B/cycle and 32 outstanding cache requests
+*** separate voltage rail, fixed at 1.05 V
+*** Per core
+** L3 Cache:
+*** No level 3 cache
+** Non-Cache Shared State Memory
+*** 256 KB low-power SRAM
+*** separate voltage plane
+*** always-on block that stores architectural states while in various power saving modes
+** RAM
+*** Maximum of 1 GB, 2 GB, or 4 GB
+*** dual 32-bit channels, 1 or 2 ranks per channel
+Note that the L1 data and instruction caches were originally both 32 KB (8-way); however, due to power restrictions, the L1d$ was later reduced to 24 KB.
+=== Functional Units ===
+The number of functional units was kept to a minimum to cut power consumption.
+* 2 Integer [[ALU]]s (1 for jumps, 1 for shifts)
+* 2 FP ALUs (1 adder, 1 for others)
+* No Integer multiplier & divider
+=== Pipeline ===
+Saltwell's pipeline is almost identical to {{intel|Bonnell|Bonnell's}}: 16 stages with a 13-stage miss penalty. It is still a dual-issue [[superscalar]] design with in-order execution; reordering logic was again omitted due to power and area restrictions.
+:[[File:bonnell pipeline.svg]]
+The longer pipeline spreads heat more evenly across the chip, over more units. It also allows a higher clock rate.
+* '''Instruction Fetch'''
+** 3 stages
+** 48 Bytes/Cycle (lower if SMT)
+* '''Instruction Decode'''
+** 3 stages
+** Instructions with up to 3 prefixes/Cycle
+* '''Instruction Dispatch'''
+** 2 stages
+* '''Source Operand Read'''
+** 1 stage
+*** reading [[register]] operand
+* '''Data Cache Access'''
+** 3 stages
+*** 1 stage for calculating
+*** 2 stages for reading cache
+* '''Execution'''
+** 2 clusters
+*** integers
+**** quick cache access due to direct connection
+*** floating point & SIMD
+* '''Exception & MT Handling'''
+** 2 stages
+* '''Commit'''
+** 1 stage
+=== Multithreading ===
+Saltwell supports multithreading, with up to two threads per core. However, the threads compete for the same resources, which inherently means they run slower than they would if run alone.
+=== Branch Prediction ===
+* [[Two-level adaptive predictor]]
+* 12-bit branch history register
+* Pattern history table has 8192 entries (shared between threads), twice that of {{intel|Bonnell}}
+* Branch target buffer has 128 entries (4-way, 32 sets)
+* Unconditional jumps are ignored
+* Always-taken and never-taken are marked in the table
+* Penalties:
+** 13 stages for misprediction
+** 7 stages for correct prediction but missing [[branch target buffer]] (BTB)
+== Cores ==
+* '''{{intel|Penwell}}''' - SoCs specifically for smartphones
+* '''{{intel|Cedarview}}''' - SoCs for netbooks
+* '''{{intel|Cloverview}}''' - SoCs for tablets
+* '''{{intel|Centerton}}''' - SoCs for Microservers; added support for Intel VT and ECC memory
+* '''{{intel|Briarwood}}''' - SoCs for Microservers
+* '''{{intel|Berryville}}''' - SoCs for consumer electronics (e.g. set-tops)
