{{ibm title|POWER9|arch}}
{{microarchitecture
|atype=CPU
|name=POWER9
|designer=IBM
|manufacturer=GlobalFoundries
|introduction=August, 2017
|phase-out=2020
|process=14 nm
|cores=4
|cores 2=8
|cores 3=12
|cores 4=16
|cores 5=20
|cores 6=24
|type=Superscalar
|oooe=Yes
|speculative=Yes
|renaming=Yes
|stages min=12
|stages max=16
|isa=Power ISA v3.0B
|l1i=32 KiB
|l1i per=core
|l1i desc=8-way set associative
|l1d=32 KiB
|l1d per=core
|l1d desc=8-way set associative
|l2=512 KiB
|l2 per=core duplex
|l2 desc=8-way set associative
|l3=10 MiB
|l3 per=core duplex
|l3 desc=20-way set associative
|core name=Sforza
|core name 2=Monza
|core name 3=LaGrange
|predecessor=POWER8+
|predecessor link=ibm/microarchitectures/power8+
|successor=POWER10
|successor link=ibm/microarchitectures/power10
}}
'''POWER9''' is [[IBM]]'s successor to {{\\|POWER8}}, a [[14 nm]] microarchitecture for [[Power]]-based server microprocessors first introduced in the 2nd half of [[2017]]. POWER9-based processors are branded under the {{ibm|POWER}} family.
  
== Code names ==
IBM introduced three flavors of POWER9.

{| class="wikitable tc1 tc2 tc3 tc4 tc5 tc6 tc7"
|-
! SoC Codename || SoC Description || Module || Memory Channels || PCIe || {{ibm|XBUS}} || [[OpenCAPI]]
|-
| rowspan="3" | Nimbus || rowspan="3" | Scale Out
| {{ibm|Sforza|l=core}} || 4 || 48 || 1 || {{tchk|no}}
|-
| {{ibm|Monza|l=core}} || 8 || 34 || 1 || 48
|-
| {{ibm|LaGrange|l=core}} || 8 || 42 || 2 || 16
|-
| Cumulus || Scale Up || ? || {{ibm|Centaur}} || ? || ? || ?
|-
| Axone || Advanced I/O || ? || OMI || 48 || 3 || 48
|}
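If the PCIe column above counts PCIe Gen4 lanes (the architecture summary lists PCIe Gen4 as the I/O standard), each module's peak per-direction link bandwidth can be estimated from the PCIe 4.0 rate of 16 GT/s per lane and its 128b/130b line encoding. The Python sketch below rests on that assumption and is illustrative only. It ignores packet and protocol overhead, so real throughput will be lower.

```python
# Rough peak PCIe Gen4 bandwidth per POWER9 module (illustrative sketch).
# Assumption: the "PCIe" column of the code-name table counts Gen4 lanes.

GEN4_GT_PER_S = 16.0   # PCIe 4.0 raw signaling rate per lane, GT/s
ENCODING = 128 / 130   # 128b/130b line encoding used by PCIe 3.0 and 4.0


def pcie_gen4_gbps(lanes: int) -> float:
    """Peak GB/s in one direction, before packet and protocol overhead."""
    return lanes * GEN4_GT_PER_S * ENCODING / 8  # 8 bits per byte


MODULE_PCIE_LANES = {"Sforza": 48, "Monza": 34, "LaGrange": 42}

for module, lanes in MODULE_PCIE_LANES.items():
    print(f"{module}: x{lanes} -> {pcie_gen4_gbps(lanes):.1f} GB/s per direction")
```

Under these assumptions, the x48 Sforza module works out to roughly 94.5 GB/s per direction.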
 
== Process Technology ==
POWER9-based microprocessors are fabricated on [[GlobalFoundries]]'s High-Performance [[14 nm process|14 nm]] (14HP) [[FinFET]] [[Silicon-On-Insulator]] (SOI) process. The process was developed by IBM at its former East Fishkill, New York fab, which has since been sold to GlobalFoundries.
 
== Introduction ==
IBM introduced the scale-out variant of POWER9 in December 2017, followed by the scale-up processors in August 2018. A third variant, targeting high I/O, was slated for 2019.

== Compatibility ==
! Compiler !! CPU !! Arch-Favorable
|-
| [[GCC]] || style="background-color: #ffdad6;" | <code>-mcpu=power9</code> || style="background-color: #ffdad6;" | <code>-mtune=power9</code>
|-
| [[LLVM]] || <code>-mcpu=power9</code> || style="background-color: #ffdad6;" | <code>-mtune=power9</code>
|-
| {{ibm|XL C/C++}} || <code>-mcpu=pwr9</code> || <code>-mtune=pwr9</code>
|}
  
== Architecture ==
 
*** 7 TB/s on-chip bandwidth
 
*** 7 TB/s on-chip bandwidth
 
* Hardware Acceleration
 
* Hardware Acceleration
** {{ibm|PowerAXON}}
+
** Enhanced on-chip acceleration
*** Enhanced on-chip acceleration
+
** [[Nvidia]] [[NVLINK]] 2.0
*** [[Nvidia]] [[NVLink]] 2.0
+
** CAPI 2.0
*** CAPI 2.0
 
 
* I/O Subsystem
** [[PCIe]] Gen4
*** 48 PCIe lanes
** Local [[SMP]] - 16 GT/s per lane interface
** Remote SMP - 25 GT/s per lane interface
*** IBM's SMP connect for their scale-up systems
*** Also available for the accelerators
** L1I Cache
*** 32 [[KiB]], 8-way set associative
*** 128-byte lines (broken into four 32-byte sectors)
*** Per SMT4 Core
*** Critical-sector-first reload policy
** L1D Cache
*** 32 KiB, 8-way set associative
*** 128-byte cache line with support for 64-byte sectors
*** Per SMT4 Core
*** Pseudo-LRU replacement policy
** L2 Cache
*** 512 KiB, 8-way set associative
*** 128-byte line
*** Per core pair
*** Inclusive of L1I/L1D
** L3 Cache
*** 120 MiB [[eDRAM]]
**** 10 MiB/core pair
*** 12 chunks (regions) of 10 MiB, each 20-way set associative
*** 7 TB/s on-chip bandwidth
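In a conventional set-associative cache, the number of sets equals the capacity divided by (associativity × line size). The Python sketch below applies that textbook formula to the capacities and associativities listed above. It assumes 128-byte lines throughout (the list does not state the L3 line size) and plain power-of-two set indexing. It illustrates the arithmetic only and does not describe how POWER9 actually indexes, hashes, or sectors its caches.

```python
# Textbook set-associative geometry for the POWER9 caches listed above.
# Assumptions: 128-byte lines everywhere (L3 line size is not stated) and
# simple power-of-two set indexing; real hardware indexing may differ.

KiB = 1024
MiB = 1024 * KiB


def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> tuple[int, int, int]:
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("capacity is not a whole number of sets")
    for value in (sets, line_bytes):
        if value & (value - 1):
            raise ValueError("expected a power of two")
    # For a power of two, log2(n) == n.bit_length() - 1
    return sets, line_bytes.bit_length() - 1, sets.bit_length() - 1


CACHES = {
    "L1I (per SMT4 core)": (32 * KiB, 8, 128),
    "L1D (per SMT4 core)": (32 * KiB, 8, 128),
    "L2 (per core pair)": (512 * KiB, 8, 128),
    "L3 region (per core pair)": (10 * MiB, 20, 128),
}

for name, params in CACHES.items():
    sets, offset_bits, index_bits = cache_geometry(*params)
    print(f"{name}: {sets} sets, {offset_bits} offset bits, {index_bits} index bits")

# Twelve 10 MiB L3 regions make up the 120 MiB eDRAM L3.
print(f"Total L3: {12 * 10 * MiB // MiB} MiB")
```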
  
== Overview ==
POWER9 succeeds {{\\|POWER8}}, introducing many core enhancements as well as large architectural changes. POWER9 takes a highly modular design approach: the same design supports up to 12 [[physical cores|cores]] with 96 [[logical cores|threads]] (SMT8) or up to 24 cores with 96 threads (SMT4). IBM offers POWER9 as both [[scale up]] and [[scale out]] solutions, for a total of four targeted chip implementations (24C/SO, 24C/SU, 12C/SO, and 12C/SU).
 
POWER9 comes in two flavors: [[scale out]] (SO) and [[scale up]] (SU). The scale-out variants are designed for traditional datacenter clusters built from [[uniprocessor|single-socket]] and [[multiprocessor|dual-socket]] setups. The scale-up variants are designed for [[NUMA]] servers with four or more sockets, supporting large memory capacity and high throughput.
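The overview states that both core configurations reach 96 hardware threads, and that pairing them with the two system flavors yields four targeted implementations. A few lines of Python confirm the count. The labels follow the 24C/SO-style naming used above, and the data layout is purely illustrative.

```python
# Enumerate the four targeted POWER9 chip implementations described above.
from itertools import product

# core-count label -> (cores per chip, hardware threads per core)
CORE_CONFIGS = {"24C": (24, 4), "12C": (12, 8)}  # SMT4 and SMT8 designs
FLAVORS = ("SO", "SU")                            # scale out, scale up


def implementations():
    """Yield (label, cores, smt, threads) for every core config / flavor pair."""
    for (label, (cores, smt)), flavor in product(CORE_CONFIGS.items(), FLAVORS):
        yield f"{label}/{flavor}", cores, smt, cores * smt


for label, cores, smt, threads in implementations():
    print(f"{label}: {cores} cores x SMT{smt} = {threads} threads")
```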
 
=== Scale out ===
[[File:power9 so overview.svg|right|thumb|Scale-out overview]]
The scale-out variant comes in two models: a [[12-core]] SMT8 model and a [[24-core]] SMT4 model. The SMT4 model is optimized for the Linux ecosystem, whereas the SMT8 model is said to be optimized for the [[PowerVM]] ecosystem ({{ibm|AIX}} / {{ibm|IBM i}} customers). These models support up to 8 channels of [[DDR4]] memory for up to 4 [[TiB]] of DDR4-2667 memory per socket, with up to 120 GB/s of sustained bandwidth.

Scale-out processors have 48 {{ibm|PowerAXON}} lanes (x48) and two [[SMP links]].
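For comparison with the sustained-bandwidth figure above, the theoretical peak of eight DDR4-2667 channels is the transfer rate multiplied by the 8-byte (64-bit) data width of a DDR4 channel, with ECC bits excluded. The Python sketch below computes only that peak. It says nothing about how IBM measured its sustained number.

```python
# Theoretical peak DRAM bandwidth for the scale-out memory configuration.
# Assumption: each DDR4 channel carries 64 data bits (8 bytes) per transfer;
# ECC bits are excluded and 2667 MT/s is used as the nominal DDR4-2667 rate.


def ddr_peak_gbps(channels: int, mega_transfers: float, bus_bytes: int = 8) -> float:
    """Peak bandwidth in GB/s (10**9 bytes per second)."""
    return channels * mega_transfers * bus_bytes / 1000


peak = ddr_peak_gbps(channels=8, mega_transfers=2667)
print(f"8 x DDR4-2667 peak: {peak:.1f} GB/s")
```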
 
=== Scale up ===
[[File:power9 su overview.svg|right|thumb|Scale-up overview]]
The POWER9 [[scale up]] variant is designed for IBM's enterprise servers and comes in two models: a [[12-core]] SMT8 model and a [[24-core]] SMT4 model. The SMT4 model is optimized for the Linux ecosystem, whereas the SMT8 model is said to be optimized for the [[PowerVM]] ecosystem ({{ibm|AIX}} / {{ibm|IBM i}} customers). POWER9 inherits the buffered memory architecture first introduced with {{\\|POWER8}}. It has two memory controllers, each driving four differential memory interface (DMI) channels. Each DMI channel has a maximum signaling rate of 9.6 GT/s, for a sustained bandwidth of up to 28.8 GB/s, and connects to one dedicated {{ibm|Centaur}} memory buffer chip. Each Centaur in turn provides four DDR4 memory channels running at up to 3200 MT/s, plus 16 MiB of L4 cache. In total, POWER9 scale-up uses eight buffered memory channels to access up to 32 channels of DDR memory and gains an additional 128 MiB of L4 cache.

:[[File:power9 memory buff.svg|700px]]

Scale-up processors have a different set of I/O interfaces. The two memory controllers drive eight memory-agnostic interfaces. The chip also provides twice as many {{ibm|PowerAXON}} lanes as the scale-out parts (x96), along with three [[SMP]] links.
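The scale-up totals quoted above follow directly from the per-channel figures. The Python sketch below multiplies them out. It assumes eight DMI channels (two controllers driving four each), with each channel feeding one Centaur buffer that provides four DDR4 channels and 16 MiB of L4. The bandwidth it reports is a plain sum of per-channel maxima, not a measured system figure.

```python
# Aggregate the scale-up (buffered memory) figures from per-channel values.
# Assumption: 2 controllers x 4 DMI channels = 8 channels, one Centaur per channel.


def buffered_memory_totals(controllers: int = 2, dmi_per_controller: int = 4,
                           dmi_gbps: float = 28.8, ddr_per_buffer: int = 4,
                           l4_mib_per_buffer: int = 16) -> dict:
    """Return channel counts, summed DMI bandwidth (GB/s) and total L4 (MiB)."""
    dmi_channels = controllers * dmi_per_controller
    return {
        "dmi_channels": dmi_channels,
        "dmi_bandwidth_gbps": dmi_channels * dmi_gbps,
        "ddr_channels": dmi_channels * ddr_per_buffer,
        "l4_mib": dmi_channels * l4_mib_per_buffer,
    }


for key, value in buffered_memory_totals().items():
    print(f"{key}: {value:g}")
```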
 
=== Slice Design ===
'''Execution Slice Microarchitecture''' is POWER9's new, refactored modular core design. The same modules were used to build both the SMT4 and SMT8 cores, and could in theory scale to higher thread counts, although that is not offered in this iteration. These modules allow IBM to address the various processor models, supporting different configurations such as bandwidth and line size (from 128-byte lines down to 64-byte sectors).

A '''Slice''' is the basic 64-bit computing block, incorporating a single '''[[Vector and Scalar Unit]]''' ('''VSU''') coupled with a '''Load/Store Unit''' ('''LSU'''). The VSU has a heterogeneous mix of computing capabilities, including [[integer]] and [[floating point]] supporting both [[scalar]] and [[vector]] operations. IBM claims this setup allows for higher utilization of resources while providing efficient exchange of data between the individual slices. Two slices coupled together make up the '''Super-Slice''', a 128-bit POWER9 physical design building block. Two super-slices, together with an '''Instruction Fetch Unit''' ('''IFU''') and an '''Instruction Sequencing Unit''' ('''ISU'''), form a single POWER9 SMT4 core. The SMT8 variant is effectively two SMT4 units.

* Up to 4 DW Load or Store
|}
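The slice hierarchy described under Slice Design composes simply: two 64-bit slices per super-slice, two super-slices per SMT4 core, and two SMT4 units per SMT8 core. The Python sketch below is a counting model only, and its class and constant names are hypothetical. It shows that a 24-core SMT4 chip and a 12-core SMT8 chip both end up with 96 slices, the same count as their hardware threads.

```python
# Counting model of the POWER9 execution-slice hierarchy (illustrative only).
from dataclasses import dataclass

SLICE_BITS = 64              # each slice: one VSU coupled with one LSU
SLICES_PER_SUPER_SLICE = 2
SUPER_SLICE_BITS = SLICES_PER_SUPER_SLICE * SLICE_BITS  # 128-bit building block


@dataclass(frozen=True)
class CoreType:
    name: str
    super_slices: int        # SMT4 = 2 super-slices; SMT8 = two SMT4 units

    @property
    def slices(self) -> int:
        return self.super_slices * SLICES_PER_SUPER_SLICE


SMT4 = CoreType("SMT4", super_slices=2)
SMT8 = CoreType("SMT8", super_slices=2 * SMT4.super_slices)

for core, count in ((SMT4, 24), (SMT8, 12)):
    print(f"{count} x {core.name}: {core.slices} slices/core, "
          f"{count * core.slices} slices/chip")
```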
 
== Performance Claims ==
 
IBM claims a range of performance improvements across a wide array of workloads. The graph below (provided by IBM) compares POWER9 performance against a POWER8 baseline, using scale-out models with similar specifications at a constant frequency.
 
 
[[File:p9performance.png|700px]]
 
== Die ==
=== Scale out ===
* GlobalFoundries [[14 nm process|14 nm FinFET on SOI Process]]
* 17-layer metal stack
* 8,000,000,000 transistors
** 15 miles of wire
* 693.37 mm² die size
* 25.228 mm x 27.48416 mm
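The die figures listed above are internally consistent. Multiplying the two edge lengths reproduces the quoted area, and dividing the transistor count by that area gives an average density across the whole die, covering logic, SRAM, and eDRAM together. A short Python cross-check:

```python
# Cross-check the die figures listed above.
WIDTH_MM = 25.228
HEIGHT_MM = 27.48416
TRANSISTORS = 8_000_000_000

area_mm2 = WIDTH_MM * HEIGHT_MM                      # about 693.37 mm^2
density_mtr_per_mm2 = TRANSISTORS / area_mm2 / 1e6   # millions of transistors per mm^2

print(f"area: {area_mm2:.2f} mm^2")
print(f"average density: {density_mtr_per_mm2:.2f} MTr/mm^2")
```

This yields about 11.54 million transistors per mm², averaged over the entire die.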
 
[[File:power9 so die.png|class=wikichip_ogimage|600px]]

[[File:power9 so die (annotated).png|600px]]
 
=== Scale up ===
* GlobalFoundries [[14 nm process|14 nm FinFET on SOI Process]]
* 17-layer metal stack
* 8,000,000,000 transistors
** 15 miles of wire
* 693.37 mm² die size
* 25.228 mm x 27.48416 mm

[[File:power9 su die.png|600px]]

[[File:power9 su die (annotated).png|600px]]
== All POWER9 Processors ==
<!-- NOTE:
          This table is generated automatically from the data in the actual articles.
          If a microprocessor is missing from the list, an appropriate article for it needs to be
          created and tagged accordingly.

          Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips
-->
{{comp table start}}
<table class="comptable sortable tc4 tc5">
{{comp table header|main|9:List of POWER9-based Processors}}
{{comp table header 1|cols=Launched, Codename, Cores, Threads, %L2$, %L3$, %TDP, %Frequency, Turbo}}
{{#ask: [[Category:microprocessor models by ibm]] [[instance of::microprocessor]] [[microarchitecture::POWER9]]
|?full page name
|?model number
|?first launched
|?core name
|?core count
|?thread count
|?l2$ size
|?l3$ size
|?tdp
|?base frequency#GHz
|?turbo frequency#GHz
|format=template
|template=proc table 3
|searchlabel=
|sort=core count
|order=desc
|userparam=11
|mainlabel=-
|limit=100
|valuesep=,
}}
{{comp table count|ask=[[Category:microprocessor models by ibm]] [[instance of::microprocessor]] [[microarchitecture::POWER9]]}}
</table>
{{comp table end}}
 
== Bibliography ==
* {{bib|hc|28|IBM}}
* {{bib|hc|30|IBM}}

== See also ==
* [[Intel]]'s {{intel|Skylake|l=arch}} & {{intel|Cascade Lake|l=arch}}
* [[AMD]]'s {{amd|Zen|l=arch}}
* [[Qualcomm]]'s {{qualcomm|Falkor|l=arch}}
