From WikiChip
Editing ibm/microarchitectures/power9

Latest revision Your text
Line 1: Line 1:
 
{{ibm title|POWER9|arch}}
 
{{ibm title|POWER9|arch}}
 
{{microarchitecture
 
{{microarchitecture
|atype=CPU
+
| atype         = CPU
|name=POWER9
+
| name         = POWER9
|designer=IBM
+
| designer     = IBM
|manufacturer=GlobalFoundries
+
| manufacturer = GlobalFoundries
|introduction=August, 2017
+
| introduction = August, 2017
|phase-out=2020
+
| phase-out     = August, 2018
|process=14 nm
+
| process       = 14 nm
|cores=4
+
| cores         = 24
|cores 2=8
+
| cores 2       =  
|cores 3=12
+
 
|cores 4=16
+
| pipeline      = Yes
|cores 5=20
+
| type          = Superscalar
|cores 6=24
+
| type 2        =  
|type=Superscalar
+
| type N        =  
|oooe=Yes
+
| OoOE          = Yes
|speculative=Yes
+
| speculative   = Yes
|renaming=Yes
+
| renaming     = Yes
|stages min=12
+
| stages        =
|stages max=16
+
| stages min   = 12
|isa=Power ISA v3.0B
+
| stages max   = 16
|l1i=32 KiB
+
| issues        =
|l1i per=core
+
 
|l1i desc=8-way set associative
+
| inst          = Yes
|l1d=32 KiB
+
| isa           = Power ISA v3.0
|l1d per=core
+
| isa 2        =
|l1d desc=8-way set associative
+
| isa N        =
|l2=512 KiB
+
| feature      =
|l2 per=core duplex
+
| extension    =
|l2 desc=8-way set associative
+
| extension 2  =
|l3=10 MiB
+
| extension N  =
|l3 per=core duplex
+
 
|l3 desc=20-way set associative
+
| cache        = Yes
|core name=Sforza
+
| l1i           = 32 KiB
|core name 2=Monza
+
| l1i per       = core
|core name 3=LaGrange
+
| l1i desc     =
|predecessor=POWER8+
+
| l1d           = 32 KiB
|predecessor link=ibm/microarchitectures/power8+
+
| l1d per       = core
|successor=POWER10
+
| l1d desc     =
|successor link=ibm/microarchitectures/power10
+
| l2           = 512 KiB
}}
+
| l2 per       = core
'''POWER9''' is [[IBM]]'s successor to {{\\|POWER8}}, a [[14 nm]] microarchitecture for [[Power]]-based server microprocessors first introduced in the 2nd half of [[2017]]. POWER9-based processors are branded under the {{ibm|POWER}} family.
+
| l2 desc       =  
 +
| l3           = 120 MiB
 +
| l3 per       = chip
 +
| l3 desc       =  
  
== Code names ==
+
| core names      = <!-- Yes if specify -->
IBM introduced three flavors of POWER9.
+
| core name        =
 +
| core name 2      =
 +
| core name N      =
  
{| class="wikitable tc1 tc2 tc3 tc4 tc5 tc6 tc7"
+
| succession      = Yes
|-
+
| predecessor      = POWER8
! SoC Codename || SoC Description || Module || Memory Channels || PCIe || {{ibm|XBUS}} || [[OpenCAPI]]
+
| predecessor link = ibm/microarchitectures/power8
|-
+
| successor        = POWER10
| rowspan="3" | Nimbus || rowspan="3" | Scale Out
+
| successor link  = ibm/microarchitectures/power10
| {{ibm|Sforza|l=core}} || 4 || 48 || 1 || {{tchk|no}}
+
}}
|-
+
'''POWER9''' is [[IBM]]'s successor to {{\\|POWER8}}, a [[14 nm]] microarchitecture for [[Power]]-based server microprocessors that is set to be introduced in the 2nd half of [[2017]]. POWER9-based processors are branded under the {{ibm|POWER9}} family.
| {{ibm|Monza|l=core}} || 8 || 34 || 1 || 48
 
|-
 
| {{ibm|LaGrange|l=core}} || 8 || 42 || 2 || 16
 
|-
 
| Cumulus || Scale Up || ? || {{ibm|Centaur}} || ? || ? || ?
 
|-
 
| Axone || Advanced I/O || ? || OMI || 48 || 3 || 48
 
|}
 
  
 
== Process Technology ==
 
== Process Technology ==
POWER9-based microprocessors are fabricated on [[GlobalFoundries]]'s High-Performance [[14 nm process|14 nm]] (14HP) [[FinFET]] [[Silicon-On-Insulator]] (SOI) process. The process was designed by IBM at what used to be its East Fishkill, New York fab, which has since been sold to GlobalFoundries.
+
POWER9 is set to be fabricated on [[GlobalFoundries]]' [[14 nm process|14 nm FinFET process]], the same process that's used by [[AMD]] for their {{amd|Zen|l=arch}} microarchitecture.
 
 
== Introduction ==
 
IBM introduced the POWER9 scale out variant of POWER in December 2017. Scale up POWER9 processors were introduced in August 2018. The third variant for high I/O will be introduced in 2019.
 
  
 
== Compatibility ==
 
== Compatibility ==
Line 88: Line 82:
 
! Compiler !! CPU !! Arch-Favorable
 
! Compiler !! CPU !! Arch-Favorable
 
|-
 
|-
| [[GCC]] || style="background-color: #ffdad6;" | <code>-mcpu=power9</code> || style="background-color: #ffdad6;" | <code>-mtune=power9</code>
+
| [[GCC]] || style="background-color: #ffdad6;" | <code>-mcpu=pwr9</code> || style="background-color: #ffdad6;" | <code>-mtune=pwr9</code>
 
|-
 
|-
| [[LLVM]] || <code>-mcpu=power9</code> || style="background-color: #ffdad6;" | <code>-mtune=power9</code>
+
| [[LLVM]] || <code>-mcpu=pwr9</code> || style="background-color: #ffdad6;" | <code>-mtune=pwr9</code>
 
|-
 
|-
 
| {{ibm|XL C/C++}} || <code>-mcpu=pwr9</code> || <code>-mtune=pwr9</code>
 
| {{ibm|XL C/C++}} || <code>-mcpu=pwr9</code> || <code>-mtune=pwr9</code>
Line 96: Line 90:
  
 
== Architecture ==
 
== Architecture ==
=== Key changes from {{\\|POWER8}}/{{\\|POWER8+|+}} ===
+
=== Key changes from {{\\|POWER8}} ===
 
* [[14 nm process]] (from [[22 nm]])
 
* [[14 nm process]] (from [[22 nm]])
 
** 17-layer metal stack
 
** 17-layer metal stack
Line 103: Line 97:
 
* Higher single-thread performance
 
* Higher single-thread performance
 
* New highly modular architecture
 
* New highly modular architecture
* Pipeline
+
* Shorter pipeline
** Shorter pipeline
+
** 5 stages eliminated from fetch to compute vs {{\\|POWER8}}
*** 5 stages eliminated from fetch to compute vs {{\\|POWER8}}
 
*** Roughly 5 stages were also eliminated for fixed-point operations
 
*** Up to 8 cycles were eliminated for floating-point operations
 
** Instruction grouping at dispatch has been removed
 
** Improved hazard avoidance / reduced hazard disruption
 
* Improved branch prediction
 
 
* Cache
 
* Cache
 
** 120 MiB NUCA L3
 
** 120 MiB NUCA L3
Line 116: Line 104:
 
*** 7 TB/s on-chip bandwidth
 
*** 7 TB/s on-chip bandwidth
 
* Hardware Acceleration
 
* Hardware Acceleration
** {{ibm|PowerAXON}}
+
** Enhanced on-chip acceleration
*** Enhanced on-chip acceleration
+
** [[Nvidia]] [[NVLINK]] 2.0
*** [[Nvidia]] [[NVLink]] 2.0
+
** CAPI 2.0
*** CAPI 2.0
 
 
* I/O Subsystem
 
* I/O Subsystem
 
** [[PCIe]] Gen4
 
** [[PCIe]] Gen4
 
** Local [[SMP]] - 16 GT/s per lane interface
 
** Local [[SMP]] - 16 GT/s per lane interface
 
** Remote SMP  - 25 GT/s per lane interface
 
** Remote SMP  - 25 GT/s per lane interface
*** 48 PCIe lanes
+
*** 48-96 lanes capability
 
*** IBM's SMP connect for their scale-up systems
 
*** IBM's SMP connect for their scale-up systems
 
*** Also available for the accelerators
 
*** Also available for the accelerators
Line 133: Line 120:
 
** Hardware enforced trusted execution
 
** Hardware enforced trusted execution
  
=== Block Diagram ===
+
== Scalability ==
{{empty section}}
+
IBM offers POWER9 in two flavors: '''Scale-Out''' and '''Scale-Up'''.
 
 
=== Memory Hierarchy ===
 
* Cache
 
** L1I Cache
 
*** 32 [[KiB]], 8-way set associative
 
*** 128-byte lines (broken into four 32-byte sectors)
 
*** Per SMT4 Core
 
*** Critical-sector-first reload policy
 
** L1D Cache
 
*** 32 KiB, 8-way set associative
 
*** 128-byte cache line with support for 64-byte sectors
 
*** Per SMT4 Core
 
*** Pseudo-LRU replacement policy
 
** L2 Cache
 
*** 512 KiB 8-way set associative
 
*** 128-byte line
 
*** Per core pair
 
*** Inclusive of L1I/L1D
 
** L3 Cache
 
*** 120 MiB [[eDRAM]]
 
**** 10 MiB/core pair
 
*** 12 chunks (regions) of 10 MiB 20-way set associative
 
*** 7 TB/s on-chip bandwidth
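
The cache figures above can be sanity-checked with the standard set-associative relation (sets = capacity / (ways × line size)). The following is a minimal sketch, not IBM's own description of the arrays: the helper name <code>num_sets</code> is hypothetical, and because no L3 line size is listed here, only the L3 region total is checked.

```python
KIB = 1024
MIB = 1024 * KIB


def num_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache: capacity / (ways * line size)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("capacity is not a whole number of sets")
    return sets


# Figures taken from the list above (32 KiB 8-way L1s, 512 KiB 8-way L2, 128-byte lines).
l1i_sets = num_sets(32 * KIB, ways=8, line_bytes=128)
l1d_sets = num_sets(32 * KIB, ways=8, line_bytes=128)
l2_sets = num_sets(512 * KIB, ways=8, line_bytes=128)

# L3: 12 regions of 10 MiB each should add up to the quoted 120 MiB.
l3_total_mib = 12 * 10

print(l1i_sets, l1d_sets, l2_sets, l3_total_mib)
```

Under these figures each L1 cache works out to 32 sets and the L2 to 512 sets.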
 
 
 
== Overview ==
 
POWER9 succeeds {{\\|POWER8}}, introducing many core enhancements as well as large architectural changes. POWER9 has taken a highly modular design approach, with the same design supporting up to 12 [[physical cores|cores]] with 96 [[logical cores|threads]] (SMT8) or up to 24 cores with 96 threads (SMT4). IBM offers POWER9 as both [[scale up]] and [[scale out]] solutions. In total, there are four targeted chip implementations (24C/SO, 24C/SU, 12C/SO, and 12C/SU).
 
 
 
POWER9 comes in two flavors - [[scale out]] (SO) and [[scale up]] (SU). The scale out variations are designed for traditional datacenter clusters utilizing [[uniprocessor|single-socket]] and [[multiprocessor|dual-socket]] setups. The scale up variations are designed for [[NUMA]] servers with four or more sockets, supporting large memory capacity and throughput.
 
 
 
=== Scale out ===
 
[[File:power9 so overview.svg|right|thumb|Scale-out overview]]
 
For the scale out there are two variations, a [[12-core]] SMT8 model and a [[24-core]] SMT4 model. The SMT4 is optimized for the Linux ecosystem whereas the SMT8 model is said to be optimized for the [[PowerVM]] ecosystem ({{ibm|AIX}} / {{ibm|IBM i}} customers). Those models support up to 8 channels of [[DDR4]] memory for up to 4 [[TiB]] of DDR4-2667 memory (per socket). Those models offer up to 120 GiB/s of sustained bandwidth.
 
 
 
Scale out processors have 48 {{ibm|PowerAXON}} lines (x48) and come with two [[SMP links]].
 
 
 
=== Scale up ===
 
[[File:power9 su overview.svg|right|thumb|Scale-up overview]]
 
The POWER9 [[scale up]] is designed for IBM's enterprise servers and comes in two variations, a [[12-core]] SMT8 model and a [[24-core]] SMT4 model. The SMT4 model is optimized for the Linux ecosystem whereas the SMT8 model is said to be optimized for the [[PowerVM]] ecosystem ({{ibm|AIX}} / {{ibm|IBM i}} customers). POWER9 inherits the same buffered memory architecture first introduced with {{\\|POWER8}}. POWER9 has two memory controllers, each capable of driving four differential memory interface (DMI) channels. Each channel has a maximum signaling rate of 9.6 GT/s for a sustained bandwidth of up to 28.8 GB/s, and connects to one dedicated {{ibm|Centaur}} memory buffer chip which, in turn, provides four DDR4 memory channels running at up to 3200 MT/s as well as 16 MiB of L4 cache. All in all, POWER9 scale-up can use eight buffered memory channels to access up to 32 channels of DDR4 memory and provides an additional 128 MiB of level 4 cache.
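
The totals in the paragraph above follow from simple multiplication. The sketch below reproduces them, reading the text as two controllers with four DMI channels each (which is what the stated total of eight buffered channels implies). The aggregate bandwidth line is an assumption that the per-channel sustained figure simply adds up across channels; it is not a number quoted above.

```python
MEMORY_CONTROLLERS = 2
DMI_PER_CONTROLLER = 4        # assumption: four DMI channels per controller
DDR4_PER_CENTAUR = 4          # DDR4 channels behind each Centaur buffer
L4_MIB_PER_CENTAUR = 16       # L4 cache per Centaur buffer
DMI_SUSTAINED_GB_S = 28.8     # sustained GB/s per DMI channel

dmi_channels = MEMORY_CONTROLLERS * DMI_PER_CONTROLLER
centaur_chips = dmi_channels  # one dedicated Centaur per DMI channel
ddr4_channels = centaur_chips * DDR4_PER_CENTAUR
l4_total_mib = centaur_chips * L4_MIB_PER_CENTAUR

# Assumption: per-channel sustained bandwidth scales linearly across channels.
aggregate_gb_s = dmi_channels * DMI_SUSTAINED_GB_S

print(dmi_channels, ddr4_channels, l4_total_mib, round(aggregate_gb_s, 1))
```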
 
 
 
:[[File:power9 memory buff.svg|700px]]
 
 
 
Scale up processors have a different set of I/O interfaces. The two memory controllers drive eight memory-agnostic interfaces, and the processors come with twice as many {{ibm|PowerAXON}} lines (x96) and three [[SMP]] links.
 
 
 
=== Slice Design  ===
 
'''Execution Slice Microarchitecture''' is POWER9's entirely new, refactored modular core design. The same modules are used to build both the SMT4 and SMT8 cores (and could in theory scale to higher thread counts, although that is not offered in this iteration). These modules allow IBM to address the various processor models with support for different configurations such as bandwidth/lines (from 128 to 64 byte sectors).
 
 
 
A '''Slice''' is the basic 64-bit computing block incorporating a single '''[[Vector and Scalar Unit]]''' ('''VSU''') coupled with a '''Load/Store Unit''' ('''LSU'''). The VSU has a heterogeneous mix of computing capabilities including [[integer]] and [[floating point]], supporting both [[scalar]] and [[vector]] operations. IBM claims this setup allows for higher utilization of resources while providing efficient exchanges of data between the individual slices. Two slices coupled together make up the '''Super-Slice''', a 128-bit POWER9 physical design building block. Two super-slices together with an '''Instruction Fetch Unit''' ('''IFU''') and an '''Instruction Sequencing Unit''' ('''ISU''') form a single POWER9 SMT4 core. The SMT8 variant is effectively two SMT4 units.
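
The building-block hierarchy described above can be sketched as a small model. This is only an illustration of the counting: the class and field names are hypothetical, and <code>total_slice_bits</code> is just the sum of the 64-bit slice widths, not a datapath figure published by IBM.

```python
from dataclasses import dataclass

SLICE_BITS = 64                # one slice is the basic 64-bit block
SLICES_PER_SUPER_SLICE = 2     # two slices form a 128-bit super-slice
SUPER_SLICES_PER_SMT4 = 2      # two super-slices (plus IFU and ISU) form an SMT4 core
SMT4_PER_SMT8 = 2              # an SMT8 core is effectively two SMT4 units


@dataclass(frozen=True)
class CoreShape:
    """Hypothetical helper that counts the slices making up a core."""
    name: str
    super_slices: int

    @property
    def slices(self) -> int:
        return self.super_slices * SLICES_PER_SUPER_SLICE

    @property
    def total_slice_bits(self) -> int:
        # Illustrative sum of slice widths only.
        return self.slices * SLICE_BITS


smt4 = CoreShape("SMT4", SUPER_SLICES_PER_SMT4)
smt8 = CoreShape("SMT8", SUPER_SLICES_PER_SMT4 * SMT4_PER_SMT8)

for core in (smt4, smt8):
    print(core.name, core.super_slices, core.slices, core.total_slice_bits)
```

The counts match the column labels in the table below: two super-slices for SMT4 and four for SMT8.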
 
 
 
{| style="border-spacing: 10px; border-collapse: separate; text-align: center;"
 
| {{\\|POWER8}}
 
| P9 SMT8 (4x Super-Slice)
 
| P9 SMT4 (2x Super-Slice)
 
| Super-Slice
 
| Slice
 
|-
 
| [[File:p8smt8comp.png|200px]]
 
| [[File:p94xsuper-slice.png|250px]]
 
| [[File:p92xsuper-slice.png|130px]]
 
| [[File:p9super-slice.png|100px]]
 
| [[File:p9slice.png|50px]]
 
|}
 
 
 
=== Acceleration Platform (POWERAccel) ===
 
[[File:p9links.png|250px|right]]
 
'''POWERAccel''' is the collective name for all the interfaces and acceleration protocols provided by the POWER microarchitecture. POWER9 offers two sets of acceleration attachments: [[PCIe]] Gen4, which offers 48 lanes at 192 GiB/s duplex bandwidth, and a new 25G link, which offers an additional 48 lanes delivering up to 300 GiB/s of duplex bandwidth. On top of the two physical interfaces is a set of open standard protocols that are integrated onto those signaling interfaces. The four prominent standards are:
 
 
 
* [[CAPI]] 2.0 - POWER9 introduces CAPI 2.0 over [[PCIe]], which quadruples the bandwidth of the original CAPI protocol introduced with {{\\|POWER8}}.
 
* New CAPI - A new interface that runs on top of the POWER9 25G link (300 GiB/s) interface, designed for CPU-accelerator applications
 
* [[NVLink]] 2.0 - High bandwidth and integration between the [[GPU]] and CPU.
 
* On-Chip Acceleration - An array of accelerators offered by the POWER9 architecture itself
 
** 1x [[GZip]]
 
** 2x [[842 Compression]]
 
** 2x [[AES]]/[[SHA]]
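
The two duplex bandwidth figures quoted for the physical interfaces can be reproduced from lane count and per-lane rate. This is a rough sketch under stated assumptions: PCIe Gen4 signals at 16 GT/s per lane, the 25G link at 25 Gb/s per lane, each transfer carries one bit per lane, and encoding overhead is ignored. The function name is hypothetical, and the arithmetic yields decimal GB/s, whereas the text above prints the same numbers as GiB/s.

```python
def duplex_gbytes_per_s(lanes: int, gbit_per_lane: float) -> float:
    """Raw duplex bandwidth in GB/s, ignoring encoding overhead (assumption)."""
    per_direction = lanes * gbit_per_lane / 8  # bits to bytes
    return 2 * per_direction                   # both directions


pcie_gen4 = duplex_gbytes_per_s(lanes=48, gbit_per_lane=16)
link_25g = duplex_gbytes_per_s(lanes=48, gbit_per_lane=25)

print(pcie_gen4, link_25g)
```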
 
  
 
=== Pipeline ===
 
=== Pipeline ===
POWER9's modular design allowed IBM to reduce fetch-to-compute latency by 5 cycles. A similar number of cycles was also cut from fixed-point operations from [[fetch]] to [[retire]]. An additional 8 cycles were cut from fetch-to-retire for floating point instructions. POWER9 further increased fusion and reduced the number of instructions cracked (POWER handles complex instructions by 'cracking' them into two or three simple µOPs). The instruction grouping at dispatch that was done in {{\\|POWER8}} has also been entirely removed from POWER9.
+
{{empty section}}
 
+
== Die Shot ==
{| style="overflow-x: scroll; white-space: nowrap; font-size: 1.2em; border-spacing: 10px; border-collapse: separate; "
+
=== [[Tetracosa-Core]] ===
| colspan="9" | || B0 || B1 || RES
+
* GlobalFoundries [[14 nm process|14 nm FinFET Process]]
|-
 
| IF || IC  || D1 || D2 || Crack/Fuse || PD0 || PD1 || XFER || MAP || VS0 || VS1 || F2 || F3 || F4 || F5
 
|-
 
| colspan="9" | || LS0 || LS1 || AGEN || BRD || CA || FMT || CA
 
|}
 
 
 
==== SMT4 core ====
 
[[File:p9smt4core.png|700px]]
 
 
 
 
 
{| class="wikitable"
 
! Fetch/Branch || Slices issue VSU & AGEN || VSU Pipe || LSU Slices
 
|-
 
|
 
* 32 KiB L1I$
 
* 8 fetch, 6 decode
 
* 1x branch execution
 
||
 
* 4x scalar-64b / 2x vector-128b
 
* 4x load/store AGEN
 
||
 
* 4x [[ALU]]
 
* 4x [[FP]] + FX-MUL + Complex (64b)
 
* 2x Permute (128b)
 
* 2x Quad Fixed (128b)
 
* 2x Fixed Divide (64b)
 
* 1x Quad FP & Decimal FP
 
* 1x Cryptography
 
||
 
* 32 KiB L1D$
 
* Up to 4 DW Load or Store
 
|}
 
 
 
== Performance Claims ==
 
IBM claims a range of performance improvements for a wide array of workloads. The graph below (provided by IBM) compares POWER9 performance using POWER8 as a baseline. The graph represents a scale-out model of similar specs at a constant frequency.
 
 
 
[[File:p9performance.png|700px]]
 
 
 
== Die ==
 
=== Scale out ===
 
* GlobalFoundries [[14 nm process|14 nm FinFET on SOI Process]]
 
* 17-layer metal stack
 
* 8,000,000,000 transistors
 
** 15 miles of wire
 
* 693.37 mm² die size
 
* 25.228 mm x 27.48416 mm
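
As a quick consistency check, the die dimensions listed above multiply out to the quoted die size. The sketch below also derives an average transistor density; that density is a derived value based on the rounded 8 billion transistor count, not a figure stated by IBM.

```python
width_mm = 25.228
height_mm = 27.48416
transistors = 8_000_000_000

area_mm2 = width_mm * height_mm

# Derived value (assumption: uses the rounded transistor count quoted above).
density_mtr_per_mm2 = transistors / area_mm2 / 1e6

print(round(area_mm2, 2), round(density_mtr_per_mm2, 2))
```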
 
 
 
[[File:power9 so die.png|class=wikichip_ogimage|600px]]
 
 
 
 
 
[[File:power9 so die (annotated).png|600px]]
 
 
 
=== Scale up ===
 
* GlobalFoundries [[14 nm process|14 nm FinFET on SOI Process]]
 
 
* 17-layer metal stack
 
* 17-layer metal stack
 
* 8,000,000,000 transistors
 
* 8,000,000,000 transistors
** 15 miles of wire
 
* 693.37 mm² die size
 
* 25.228 mm x 27.48416 mm
 
 
[[File:power9 su die.png|600px]]
 
  
 +
[[File:power9 die shot.jpg|800px]]
  
[[File:power9 su die (annotated).png|600px]]
+
[[File:power9 die shot (annotated).png|800px]]
 
 
== All POWER9 Processors ==
 
<!-- NOTE:
 
          This table is generated automatically from the data in the actual articles.
 
          If a microprocessor is missing from the list, an appropriate article for it needs to be
 
          created and tagged accordingly.
 
 
 
          Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips
 
-->
 
{{comp table start}}
 
<table class="comptable sortable tc4 tc5">
 
{{comp table header|main|9:List of POWER9-based Processors}}
 
{{comp table header 1|cols=Launched, Codename, Cores, Threads, %L2$, %L3$, %TDP, %Frequency, Turbo}}
 
{{#ask: [[Category:microprocessor models by ibm]] [[instance of::microprocessor]] [[microarchitecture::POWER9]]
 
|?full page name
 
|?model number
 
|?first launched
 
|?core name
 
|?core count
 
|?thread count
 
|?l2$ size
 
|?l3$ size
 
|?tdp
 
|?base frequency#GHz
 
|?turbo frequency#GHz
 
|format=template
 
|template=proc table 3
 
|searchlabel=
 
|sort=core count
 
|order=desc
 
|userparam=11
 
|mainlabel=-
 
|limit=100
 
|valuesep=,
 
}}
 
{{comp table count|ask=[[Category:microprocessor models by ibm]] [[instance of::microprocessor]] [[microarchitecture::POWER9]]}}
 
</table>
 
{{comp table end}}
 
 
 
== Bibliography ==
 
* {{bib|hc|28|IBM}}
 
* {{bib|hc|30|IBM}}
 
  
 
== See also ==
 
== See also ==
* [[Intel]]'s {{intel|Skylake|l=arch}} & {{intel|Cascade Lake|l=arch}}
+
* [[Intel]]'s {{intel|Skylake|l=arch}} & {{intel|Kaby Lake|l=arch}}
 
* [[AMD]]'s {{amd|Zen|l=arch}}
 
* [[AMD]]'s {{amd|Zen|l=arch}}
 
* [[Qualcomm]]'s {{qualcomm|Falkor|l=arch}}
 
* [[Qualcomm]]'s {{qualcomm|Falkor|l=arch}}
