{{intel title|Goldmont|arch}}
{{microarchitecture
|atype=CPU
|name=Goldmont
|designer=Intel
|manufacturer=Intel
|introduction=August 30, 2016
|process=14 nm
|cores=2
|cores 2=4
|cores 3=8
|cores 4=12
|cores 5=16
|type=Superscalar
|speculative=Yes
|renaming=Yes
|stages min=12
|stages max=14
|isa=x86-64
|extension=MOVBE
|extension 2=MMX
|extension 3=SSE
|extension 4=SSE2
|extension 5=SSE3
|extension 6=SSSE3
|extension 7=SSE4.1
|extension 8=SSE4.2
|extension 9=POPCNT
|extension 10=AES
|extension 11=PCLMUL
|extension 12=RDRND
|extension 13=XSAVE
|extension 14=XSAVEOPT
|extension 15=FSGSBASE
|extension 16=SHA
|l1i=32 KiB
|l1i per=Core
|l1i desc=8-way set associative
|l1d=24 KiB
|l1d per=Core
|l1d desc=6-way set associative
|l2=1-2 MiB
|l2 per=2 Cores
|l2 desc=16-way set associative
|core name=Apollo Lake
|core name 2=Denverton
|predecessor=Airmont
|predecessor link=intel/microarchitectures/airmont
|successor=Goldmont Plus
|successor link=intel/microarchitectures/goldmont plus
}}
'''Goldmont''' ('''GLM''') is [[Intel]]'s [[14 nm]] [[microarchitecture]] for [[system on chip]]s aimed at ultra-low power (ULP) devices. Goldmont-based processors and SoCs are part of the {{intel|Atom}}, {{intel|Pentium (2009)|Pentium}}, and {{intel|Celeron}} families. Goldmont superseded {{intel|Airmont}} in August 2016. With Goldmont, Intel stopped targeting smartphones altogether, cancelling the related cores and SKUs.
[[File:Atom E3900 SoC Front.png|right|thumb|250px|Intel Atom E3900 SoC series]]

== Codenames ==
{| class="wikitable"
|-
! Platform !! Core !! Target
|-
| &nbsp; || {{intel|Apollo Lake|l=core}} || Entry-level PCs, Tablets
|- 
| &nbsp; || {{intel|Denverton|l=core}} || Ultra-low power servers, networking, storage, and IoT
|-
| &nbsp; || style="text-decoration: line-through;" | {{intel|Willow Trail|l=core}} || style="text-decoration: line-through;" | Lightweight tablets and high-end smartphones
|- style="text-decoration: line-through;"
| {{intel|Morganfield|l=platform}} || {{intel|Broxton|l=core}} || Smartphone
|}

== Process Technology ==
{{main|intel/microarchitectures/broadwell#Process_Technology|l1=Broadwell § Process Technology}}
Goldmont-based chips are manufactured on Intel's [[14 nm process]].

== Architecture ==
[[File:atom c3000 on a wafer.png|right|350px]]
=== Key changes from {{intel|Airmont}} ===
* Pipeline
** Goldmont is a 3-issue core, up from the 2-issue design of Airmont.
** Throughput
*** NOPs, MOVs and many ALU operations have 3 op/cycle throughput for 16, 32 and 64-bit registers. (8-bit ALU ops throughput is 2, 1.5 or 1 op per cycle).
*** ADC, SBB have 0.5 op/cycle throughput, unchanged from Airmont.
*** INC, DEC, BTx, and shift ops are no faster than on Airmont; 8-bit shifts are slightly slower (0.66 op/cycle instead of 1).
*** Rotate-through-carry (RCL, RCR) remains slow and is slightly slower than on Airmont (~12 cycles per op, up from ~10).
*** 16- and 64-bit shift-double (SHLD, SHRD) remains slow and is slightly slower than on Airmont (~14 cycles per op, up from ~10). 32-bit SHLD and SHRD are fast (2-4 cycles).
*** Variable shifts and rotates (SHL r32, CL etc) latency increased from 1 cycle to 2 cycles.
*** Bit scan (BSF, BSR) throughput improved from 10 to 8 cycles per op.
*** MUL throughput is better by 1 cycle (used to be 5/7 cycles for 32/64-bit mul, now 4/6).
*** DIV is more than twice as fast as on Airmont: 13 cycles for most divides and ~42 cycles for 128-bit by 64-bit divides.
*** PUSH to POP forwarding is improved.
*** REP MOVS streaming copy is twice as fast: now ~26 bytes/cycle.
*** REP STOS fill is not improved: ~9 bytes/cycle.
*** Some vector instructions are faster but, as on {{\\|Airmont}}, none has a throughput above 2 op/cycle. The faster instructions include frequently used operations such as adds and multiplies:
**** MULPS and MULPD have 4 cycle latency and 1 op/cycle throughput (previously 5 cycle latency and 0.5 op/cycle throughput).
**** ADDPD has 3 cycle latency and 1 op/cycle throughput (previously 4 cycle latency and 0.5 op/cycle throughput).
*** CRC32 instruction throughput improved from 6 cycles/op to 1 cycle/op, latency is halved from 6 to 3.
* Gen 9 GPUs
** {{intel|HD Graphics 400}} '''→''' {{intel|HD Graphics 500}} (12 Execution Units, no change)
** {{intel|HD Graphics 405}} '''→''' {{intel|HD Graphics 505}} (18 Execution Units, up from 16)
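
The cycle counts in the list above can be turned into relative speedups with simple arithmetic. The following is a minimal Python sketch, not a benchmark: the table only restates figures quoted above, and the helper names <code>speedup</code> and <code>streaming_cycles</code> are hypothetical. The streaming estimate assumes the quoted bytes/cycle rate is sustained and ignores any startup cost.

```python
# Reciprocal throughput in cycles per op as (Airmont, Goldmont),
# restated from the list above. Lower is better.
CYCLES_PER_OP = {
    "RCL/RCR": (10, 12),
    "BSF/BSR": (10, 8),
    "MUL r32": (5, 4),
    "MUL r64": (7, 6),
    "CRC32": (6, 1),
}


def speedup(old_cycles, new_cycles):
    """Return the throughput ratio; values above 1.0 mean Goldmont is faster."""
    return old_cycles / new_cycles


def streaming_cycles(num_bytes, bytes_per_cycle):
    """Idealized cycle count for a streaming operation (no startup cost)."""
    return num_bytes / bytes_per_cycle


for name, (airmont, goldmont) in CYCLES_PER_OP.items():
    print(f"{name}: {speedup(airmont, goldmont):.2f}x")

# REP MOVS at ~26 bytes/cycle and REP STOS at ~9 bytes/cycle, for a 4 KiB buffer.
print(f"4 KiB copy: ~{streaming_cycles(4096, 26):.0f} cycles")
print(f"4 KiB fill: ~{streaming_cycles(4096, 9):.0f} cycles")
```

A ratio below 1.0, as for RCL/RCR, marks an instruction that became slower on Goldmont.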

==== New instructions ====
Goldmont introduced a number of {{x86|extensions|new instructions}}:

* {{x86|RDSEED|<code>RDSEED</code>}} - Generates 16, 32 or 64 bit random number seeds ([[NIST SP 800-90B]] & [[NIST SP 800-90C]])
* {{x86|SMAP|<code>SMAP</code>}} - Supervisor Mode Access Prevention
* {{x86|MPX|<code>MPX</code>}} - Memory Protection Extensions
* {{x86|XSAVEC|<code>XSAVEC</code>}} - Save processor extended states with compaction to memory
* {{x86|XSAVES|<code>XSAVES</code>}} - Save processor supervisor-mode extended states to memory
* {{x86|CLFLUSHOPT|<code>CLFLUSHOPT</code>}} - Flushes and invalidates the cache line containing the memory operand from all levels of the cache hierarchy
* {{x86|SHA|<code>SHA</code>}} - [[Hardware acceleration]] for SHA hashing operations
* FS/GS base access
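
On a Linux system, support for the extensions listed above can be checked by inspecting the <code>flags</code> line of <code>/proc/cpuinfo</code>. The Python sketch below is illustrative only: the flag names in <code>GOLDMONT_NEW_FLAGS</code> are an assumption about how the kernel reports these features and may vary between kernel versions, the helper names are hypothetical, and the sample string is made up rather than captured from real hardware.

```python
# Assumed mapping from the extensions listed above to Linux /proc/cpuinfo
# flag names. Reporting may differ between kernel versions.
GOLDMONT_NEW_FLAGS = {
    "RDSEED": "rdseed",
    "SMAP": "smap",
    "MPX": "mpx",
    "XSAVEC": "xsavec",
    "XSAVES": "xsaves",
    "CLFLUSHOPT": "clflushopt",
    "SHA": "sha_ni",
    "FSGSBASE": "fsgsbase",
}


def missing_extensions(flags_line):
    """Return the sorted extension names whose flag is absent from flags_line."""
    present = set(flags_line.split())
    return sorted(
        name for name, flag in GOLDMONT_NEW_FLAGS.items() if flag not in present
    )


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the first 'flags' value found in a cpuinfo-style file, or ''."""
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return line.split(":", 1)[1]
    return ""


# Hypothetical, shortened flags string used purely for illustration.
sample = "fpu sse2 rdseed smap mpx xsavec xsaves clflushopt sha_ni fsgsbase"
print(missing_extensions(sample))
```

On a real machine, <code>missing_extensions(read_cpu_flags())</code> would list any of these extensions that the kernel does not report.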

=== Block Diagram ===
{{empty section}}

=== Memory Hierarchy ===
* Cache
** Hardware prefetchers
** L1 Cache:
*** 32 [[KiB]] 8-way [[set associative]] instruction, 64 B line size
*** 24 KiB 6-way set associative data, 64 B line size
*** Per core
** L2 Cache:
*** 1 MiB 16-way set associative, 64 B line size
*** Per 2 cores
*** 32B/cycle, 17 cycle latency
** L3 Cache:
*** No level 3 cache
** RAM
*** Maximum of 1 [[GiB]], 2 GiB, 4 GiB, or 8 GiB, depending on the model
*** Dual 32-bit channels, 1 or 2 ranks per channel
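
The cache geometries listed above determine the number of sets in each cache, since sets = size / (ways × line size). The following Python sketch works through that arithmetic using only the figures above; the helper name <code>num_sets</code> is hypothetical, and the L2 entry uses the 1 MiB configuration.

```python
KIB = 1024
MIB = 1024 * KIB
LINE_BYTES = 64  # line size shared by all caches listed above


def num_sets(size_bytes, ways, line_bytes=LINE_BYTES):
    """Number of sets in a set-associative cache: size / (ways * line size)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("cache size is not a multiple of ways * line size")
    return sets


# (size in bytes, associativity) for each cache level listed above.
CACHES = {
    "L1I": (32 * KIB, 8),
    "L1D": (24 * KIB, 6),
    "L2": (1 * MIB, 16),
}

for name, (size, ways) in CACHES.items():
    print(f"{name}: {num_sets(size, ways)} sets")

# At 32 B/cycle, one 64 B line takes 2 cycles to transfer from the L2.
print(f"L2 line transfer: {LINE_BYTES // 32} cycles")
```

Both L1 caches come out at 64 sets despite their different sizes, because the 24 KiB data cache has 6 ways rather than 8.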

=== Multithreading ===
Goldmont, like {{\\|Airmont}}, has no support for Intel Hyper-Threading Technology.

== Die Shot ==
Intel {{intel|Atom}} E3900 SoC series:
:[[File:atom e3900 die shot.jpg|650px]]

== All Goldmont Chips ==
<!-- NOTE: 
           This table is generated automatically from the data in the actual articles.
           If a microprocessor is missing from the list, an appropriate article for it needs to be
           created and tagged accordingly.

           Missing a chip? please dump its name here: http://en.wikichip.org/wiki/WikiChip:wanted_chips
-->
<table class="wikitable sortable">
<tr><th colspan="12" style="background:#D6D6FF;">Goldmont Chips</th></tr>
<tr><th colspan="9">Main processor</th><th colspan="3">IGP</th></tr>
<tr><th>Model</th><th>Family</th><th>Platform</th><th>Core</th><th>Launched</th><th>SDP</th><th>TDP</th><th>Freq</th><th>Max Mem</th><th>Name</th><th>Freq</th><th>Max Freq</th></tr>
{{#ask: [[Category:microprocessor models by intel]] [[microarchitecture::Goldmont]]
 |?full page name
 |?model number
 |?microprocessor family
 |?platform
 |?core name
 |?first launched
 |?sdp
 |?tdp
 |?base frequency
 |?max memory
 |?integrated gpu
 |?integrated gpu base frequency
 |?integrated gpu max frequency
 |format=template
 |template=proc table 2
 |userparam=13
 |mainlabel=-
}}
{{table count|col=12|ask=[[Category:microprocessor models by intel]] [[microarchitecture::Goldmont]]}}
</table>