Revision as of 12:53, 6 November 2016
Saltwell µarch
Saltwell was Intel's microarchitecture for 32 nm ultra-low-power systems on chip (SoCs), first introduced in late 2011 for the Atom family. Saltwell is a shrink of Bonnell that also integrates all of the support chips on-die. Unlike its predecessor, Saltwell was aimed directly at smartphones (as opposed to mobile Internet devices, or MIDs).
Codenames
Platform | Core | Target |
---|---|---|
Medfield | Penwell | Smartphones |
Cedar Trail | Cedarview | Netbooks |
Clover Trail+ | Cloverview | Tablets |
Medfield | Medfield | Tablet / Smartphone |
Bordenville | Centerton | Microservers |
Bordenville | Briarwood | Microservers |
Berryville | CE (set-tops) |
Architecture
Saltwell's primary goals were:
- Improve on Bonnell by eliminating the separate, older support chips
- Add enhancements using the 32 nm process while transitioning to 22 nm
- Improve GPU performance and power efficiency
- Add burst frequencies
Key changes from Bonnell
- L2$ increase rate
- L2$ now on a separate voltage rail
- New low-power SRAM for machine state
- Larger instruction fetch
- Doubled the size of the branch pattern history table
Memory Hierarchy
- Cache
  - Hardware prefetchers
  - L1 Cache:
    - 32 KiB 8-way set associative instruction
      - 1 read and 1 write port
    - 24 KiB 6-way set associative data
      - 1 read and 1 write port
    - 8-transistor SRAM cells (instead of 6) to allow a lower operating voltage
    - Per core
  - L2 Cache:
    - 512 KiB 8-way set associative
    - ECC
    - Shrinkable from 512 KiB (8-way) to 128 KiB (2-way)
    - 32 B/cycle and 32 outstanding cache requests
    - Separate voltage rail, fixed at 1.05 V
    - Per core
  - L3 Cache:
    - No level 3 cache
- Non-Cache Shared State Memory
  - 256 KiB low-power SRAM
  - Separate voltage plane
  - Always-on block that stores architectural state while the core is in various power-saving modes
- RAM
  - Maximum of 1 GiB, 2 GiB, or 4 GiB, depending on the chip
  - Dual 32-bit channels, 1 or 2 ranks per channel
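The sizes and associativities above determine how many sets each cache has (sets = capacity / (ways × line size)). The sketch below works this out; it assumes a 64-byte cache line, which this article does not state. Under that assumption both L1 caches come out to 64 sets, and the shrunk 128 KiB 2-way L2 keeps the same 1024 sets as the full 512 KiB 8-way configuration, which is consistent with shrinking by powering down ways rather than sets.

```python
from dataclasses import dataclass

LINE_BYTES = 64  # ASSUMPTION: line size is not given in the article
KIB = 1024


@dataclass(frozen=True)
class CacheGeometry:
    name: str
    size_bytes: int
    ways: int
    line_bytes: int = LINE_BYTES

    @property
    def sets(self) -> int:
        # capacity = sets * ways * line size
        return self.size_bytes // (self.ways * self.line_bytes)

    @property
    def offset_bits(self) -> int:
        # log2(line size) for a power-of-two line
        return (self.line_bytes - 1).bit_length()

    @property
    def index_bits(self) -> int:
        # log2(number of sets) for a power-of-two set count
        return (self.sets - 1).bit_length()


CACHES = [
    CacheGeometry("L1I", 32 * KIB, 8),
    CacheGeometry("L1D", 24 * KIB, 6),
    CacheGeometry("L2", 512 * KIB, 8),
    CacheGeometry("L2 (shrunk)", 128 * KIB, 2),
]

for c in CACHES:
    print(f"{c.name:12} {c.sets:5} sets, {c.offset_bits} offset bits, {c.index_bits} index bits")
```

For the L1D, 6 index bits plus 6 offset bits total 12, which fits within a 4 KiB page offset; whether that motivated the unusual 24 KiB / 6-way shape is not stated here.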
Functional Units
The number of functional units was kept to a minimum to reduce power consumption.
- 2 integer ALUs (1 for jumps, 1 for shifts)
- 2 FP ALUs (1 adder, 1 for other operations)
- No dedicated integer multiplier or divider (these operations are handled by the FP ALU instead)
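To make these limits concrete, here is a toy check of whether two operations could issue in the same cycle when the only constraint is finding a different functional unit for each. The unit names and op classes are hypothetical, and mapping integer multiply onto the non-adder FP unit is an assumption; real pairing rules include restrictions not modeled here.

```python
# Hypothetical unit names; each op class lists the units that could execute it.
UNITS_FOR_OP = {
    "int": {"alu_jump", "alu_shift"},  # generic integer op: either integer ALU (assumed)
    "jump": {"alu_jump"},              # only the jump ALU handles jumps
    "shift": {"alu_shift"},            # only the shift ALU handles shifts
    "fadd": {"fp_add"},                # FP adds go to the FP adder
    "fp_other": {"fp_other"},          # all other FP ops
    "imul": {"fp_other"},              # ASSUMPTION: integer multiply uses the non-adder FP unit
}


def can_pair(op_a: str, op_b: str) -> bool:
    """True if the two ops can be placed on two different functional units."""
    return any(ua != ub for ua in UNITS_FOR_OP[op_a] for ub in UNITS_FOR_OP[op_b])


print(can_pair("int", "int"), can_pair("shift", "shift"), can_pair("imul", "fadd"))  # True False True
```

Under this model two shifts (or two jumps) can never share a cycle, while two generic integer ops can.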
Pipeline
Saltwell's pipeline is nearly identical to Bonnell's: 16 stages, with a 13-stage branch misprediction penalty. It remains a dual-issue superscalar design with in-order execution; reordering logic was again omitted due to power and area constraints.
The longer pipeline spreads heat more evenly across the chip by distributing the work over more units, and it also allows a higher clock rate.
- Instruction Fetch
  - 3 stages
  - 48 bytes/cycle (lower with SMT)
- Instruction Decode
  - 3 stages
  - Instructions with up to 3 prefixes per cycle
- Instruction Dispatch
  - 2 stages
- Source Operand Read
  - 1 stage
    - reading register operands
- Data Cache Access
  - 3 stages
    - 1 stage for address calculation
    - 2 stages for reading the cache
- Execution
  - 2 clusters
    - integer
      - quick cache access due to direct connection
    - floating point & SIMD
- Exception & MT Handling
  - 2 stages
- Commit
  - 1 stage
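The cost of executing in order can be illustrated with a toy dual-issue model: up to two instructions issue per cycle, strictly in program order, and an instruction waits until its source registers are ready. The `Instr` type, latencies and register names are hypothetical, and the model ignores functional-unit conflicts, the front end and caches.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: destination register, source registers, result latency."""
    dst: Optional[str]
    srcs: tuple[str, ...]
    latency: int = 1


def issue_cycles(program: list[Instr], width: int = 2) -> int:
    """Cycles needed to issue `program` in order, up to `width` instructions per cycle."""
    ready_at: dict[str, int] = {}  # register -> first cycle its value can be read
    cycle = 0
    i = 0
    while i < len(program):
        issued = 0
        while i < len(program) and issued < width:
            ins = program[i]
            if any(ready_at.get(r, 0) > cycle for r in ins.srcs):
                break  # in order: a stalled instruction blocks everything behind it
            if ins.dst is not None:
                ready_at[ins.dst] = cycle + ins.latency
            issued += 1
            i += 1
        cycle += 1
    return cycle


independent = [Instr(f"r{i}", ()) for i in range(1, 5)]
chain = [Instr("r1", ())] + [Instr(f"r{i}", (f"r{i - 1}",)) for i in range(2, 5)]
blocked = [Instr("r1", (), latency=3), Instr("r2", ("r1",)), Instr("r3", ()), Instr("r4", ())]
print(issue_cycles(independent), issue_cycles(chain), issue_cycles(blocked))  # 2 4 5
```

In the `blocked` case, the instructions writing `r3` and `r4` do not depend on the 3-cycle producer of `r1`, yet they wait behind the stalled `r2` instruction; a core with reordering logic could have issued them during the stall.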
Multithreading
Saltwell supports simultaneous multithreading (Hyper-Threading), with up to two threads per core. However, the two threads compete for the same resources, so each thread inherently runs slower than it would if it were running alone.
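This resource sharing can be illustrated by extending a toy in-order, dual-issue model to two threads that share the two issue slots each cycle. The rotating-priority policy, the `Instr` type and the default single-cycle latency are assumptions for illustration, not Saltwell's documented scheduling policy.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: destination register, source registers, result latency."""
    dst: Optional[str]
    srcs: tuple[str, ...]
    latency: int = 1


def smt_finish_cycles(threads: list[list[Instr]], width: int = 2) -> list[int]:
    """Per thread, the number of cycles until its last instruction issues."""
    pcs = [0] * len(threads)
    ready: list[dict[str, int]] = [{} for _ in threads]  # per-thread register scoreboards
    finish = [0] * len(threads)
    cycle = 0
    while any(pc < len(prog) for pc, prog in zip(pcs, threads)):
        slots = width  # issue slots shared by all threads this cycle
        # ASSUMPTION: priority rotates between threads every cycle
        order = [(cycle + k) % len(threads) for k in range(len(threads))]
        for tid in order:
            prog = threads[tid]
            while slots > 0 and pcs[tid] < len(prog):
                ins = prog[pcs[tid]]
                if any(ready[tid].get(r, 0) > cycle for r in ins.srcs):
                    break  # this thread stalls in order; another thread may use the slot
                if ins.dst is not None:
                    ready[tid][ins.dst] = cycle + ins.latency
                pcs[tid] += 1
                slots -= 1
                finish[tid] = cycle + 1
        cycle += 1
    return finish


independent = [Instr(f"r{i}", ()) for i in range(4)]
chain = [Instr("r0", ())] + [Instr(f"r{i}", (f"r{i - 1}",)) for i in range(1, 4)]
print(smt_finish_cycles([independent]), smt_finish_cycles([independent, independent]))
print(smt_finish_cycles([chain]), smt_finish_cycles([chain, chain]))
```

With two threads of independent instructions, each thread finishes later than it would alone (3 and 4 cycles instead of 2), because either thread could fill both slots by itself. With two dependency chains, each thread can use only one slot per cycle anyway, so sharing costs neither thread anything and combined throughput doubles.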
Branch Prediction
- Two-level adaptive predictor
  - 12-bit branch history register
  - Pattern history table with 8192 entries (shared between threads), twice that of Bonnell
- Branch target buffer (BTB) with 128 entries (4-way, 32 sets)
- Unconditional jumps are ignored
- Always-taken and never-taken branches are marked in the table
- Penalties:
  - 13 stages for a misprediction
  - 7 stages for a correct prediction that misses in the branch target buffer (BTB)
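Below is a minimal sketch of a two-level adaptive predictor sized as listed above (a 12-bit global history register and an 8192-entry pattern history table), plus the penalty arithmetic. The article does not say how 12 history bits select among 8192 entries, so this sketch substitutes gshare-style indexing (history XOR branch address); the 2-bit saturating counters, the single history register and the branch address `0x400` are likewise assumptions. It also counts one stage of penalty as one cycle.

```python
HISTORY_BITS = 12         # from the list above
PHT_ENTRIES = 8192        # from the list above
MISPREDICT_PENALTY = 13   # stages, from the list above
BTB_MISS_PENALTY = 7      # correct prediction but BTB miss


class TwoLevelPredictor:
    """Global-history predictor; gshare-style indexing is an ASSUMPTION."""

    def __init__(self) -> None:
        self.history = 0
        # ASSUMPTION: 2-bit counters; 0-1 predict not-taken, 2-3 predict taken
        self.pht = [1] * PHT_ENTRIES

    def _index(self, pc: int) -> int:
        return (pc ^ self.history) % PHT_ENTRIES

    def predict(self, pc: int) -> bool:
        return self.pht[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        self.pht[i] = min(self.pht[i] + 1, 3) if taken else max(self.pht[i] - 1, 0)
        # shift the outcome into the 12-bit history register
        self.history = ((self.history << 1) | int(taken)) & ((1 << HISTORY_BITS) - 1)


def penalty_cycles(mispredicts: int, btb_misses: int) -> int:
    """Stall cycles implied by the listed penalties."""
    return mispredicts * MISPREDICT_PENALTY + btb_misses * BTB_MISS_PENALTY


# A loop branch taken three times, then not taken, repeated.
predictor = TwoLevelPredictor()
hits = []
for taken in [True, True, True, False] * 100:
    hits.append(predictor.predict(0x400) == taken)
    predictor.update(0x400, taken)
print(sum(hits[-100:]))  # 100: the history fully captures the period-4 pattern
print(penalty_cycles(mispredicts=2, btb_misses=3))  # 47
```

Because 12 history bits cover the 4-outcome loop pattern exactly, each steady-state history always precedes the same outcome, so the predictor becomes perfect after a short warm-up.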
Cores
- Penwell - SoCs specifically for smartphones
- Cedarview - SoCs for netbooks
- Cloverview - SoCs for tablets
- Centerton - SoCs for Microservers; added support for Intel VT and ECC memory
- Briarwood - SoCs for Microservers
- Berryville - SoCs for consumer electronics (e.g. set-tops)
All Saltwell Chips
Model | µarch | Platform | Core | Launched | SDP | CPU Freq | Max Mem | IGP Name | IGP Freq | IGP Max Freq |
---|---|---|---|---|---|---|---|---|---|---|