From WikiChip
Editing ibm/microarchitectures/power9
Latest revision | Your text | ||
Line 1: | Line 1: | ||
{{ibm title|POWER9|arch}} | {{ibm title|POWER9|arch}} | ||
{{microarchitecture | {{microarchitecture | ||
− | |atype=CPU | + | | atype = CPU |
− | |name=POWER9 | + | | name = POWER9 |
− | |designer=IBM | + | | designer = IBM |
− | |manufacturer=GlobalFoundries | + | | manufacturer = GlobalFoundries |
− | |introduction=August, 2017 | + | | introduction = August, 2017 |
− | |phase-out= | + | | phase-out = August, 2018 |
− | |process=14 nm | + | | process = 14 nm |
− | |cores= | + | | cores = 24 |
− | |cores 2= | + | | cores 2 = |
− | + | ||
− | | | + | | pipeline = Yes |
− | | | + | | type = Superscalar |
− | | | + | | type 2 = |
− | |type= | + | | type N = |
− | | | + | | OoOE = Yes |
− | |speculative=Yes | + | | speculative = Yes |
− | |renaming=Yes | + | | renaming = Yes |
− | |stages min=12 | + | | stages = |
− | |stages max=16 | + | | stages min = 12 |
− | |isa=Power ISA v3. | + | | stages max = 16 |
− | |l1i=32 KiB | + | | issues = |
− | |l1i per=core | + | |
− | |l1i desc= | + | | inst = Yes |
− | |l1d=32 KiB | + | | isa = Power ISA v3.0 |
− | |l1d per=core | + | | isa 2 = |
− | |l1d desc= | + | | isa N = |
− | |l2=512 KiB | + | | feature = |
− | |l2 per=core | + | | extension = |
− | |l2 desc= | + | | extension 2 = |
− | |l3= | + | | extension N = |
− | |l3 per= | + | |
− | |l3 desc= | + | | cache = Yes |
− | + | | l1i = 32 KiB | |
− | + | | l1i per = core | |
− | + | | l1i desc = | |
− | + | | l1d = 32 KiB | |
− | + | | l1d per = core | |
− | + | | l1d desc = | |
− | + | | l2 = 512 KiB | |
− | + | | l2 per = core | |
− | + | | l2 desc = | |
+ | | l3 = 120 MiB | ||
+ | | l3 per = chip | ||
+ | | l3 desc = | ||
− | == | + | | core names = <!-- Yes if specify --> |
− | + | | core name = | |
+ | | core name 2 = | ||
+ | | core name N = | ||
− | + | | succession = Yes | |
− | | | + | | predecessor = POWER8+ |
− | + | | predecessor link = ibm/microarchitectures/power8+ | |
− | | | + | | successor = POWER10 |
− | | | + | | successor link = ibm/microarchitectures/power10 |
− | + | }} | |
− | + | '''POWER9''' is [[IBM]]'s successor to {{\\|POWER8}}, a [[14 nm]] microarchitecture for [[Power]]-based server microprocessors that is set to be introduced in the 2nd half of [[2017]]. POWER9-based processors are branded under the {{ibm|POWER9}} family. | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
== Process Technology == | == Process Technology == | ||
POWER9-based microprocessors are fabricated on [[GlobalFoundries]]'s High-Performance [[14 nm process|14 nm]] (14HP) [[FinFET]] [[Silicon-On-Insulator]] (SOI) process. The process was designed by IBM at what used to be their East Fishkill, New York fab which has since been sold to GlobalFoundries. | POWER9-based microprocessors are fabricated on [[GlobalFoundries]]'s High-Performance [[14 nm process|14 nm]] (14HP) [[FinFET]] [[Silicon-On-Insulator]] (SOI) process. The process was designed by IBM at what used to be their East Fishkill, New York fab which has since been sold to GlobalFoundries. | ||
− | |||
− | |||
− | |||
== Compatibility == | == Compatibility == | ||
Line 88: | Line 82: | ||
! Compiler !! CPU !! Arch-Favorable | ! Compiler !! CPU !! Arch-Favorable | ||
|- | |- | ||
− | | [[GCC]] || style="background-color: #ffdad6;" | <code>-mcpu= | + | | [[GCC]] || style="background-color: #ffdad6;" | <code>-mcpu=pwr9</code> || style="background-color: #ffdad6;" | <code>-mtune=pwr9</code> |
|- | |- | ||
− | | [[LLVM]] || <code>-mcpu= | + | | [[LLVM]] || <code>-mcpu=pwr9</code> || style="background-color: #ffdad6;" | <code>-mtune=pwr9</code> |
|- | |- | ||
| {{ibm|XL C/C++}} || <code>-mcpu=pwr9</code> || <code>-mtune=pwr9</code> | | {{ibm|XL C/C++}} || <code>-mcpu=pwr9</code> || <code>-mtune=pwr9</code> | ||
|} | |} | ||
+ | |||
+ | == Variations == | ||
+ | IBM offers POWER9 in two flavors: '''Scale-Out''' ('''SO''') and '''Scale-Up''' ('''SU'''). The Scale-Out variations are designed for traditional datacenter clusters using [[uniprocessor|single-]] and [[multiprocessor|dual-socket]] setups. The Scale-Up variations are designed for [[NUMA]] servers with four or more sockets, supporting larger memory capacity and higher throughput. | ||
+ | |||
+ | For the Scale-Out there are two variations: a [[12-core]] SMT8 model and a [[24-core]] SMT4 model. The SMT4 model is optimized for the Linux ecosystem, whereas the SMT8 model is said to be optimized for the [[PowerVM]] ecosystem ({{ibm|AIX}} / {{ibm|IBM i}} customers). Both models support up to 8 channels of [[DDR4]]-2667 memory, for up to 4 [[TiB]] of memory per socket and up to 120 GiB/s of sustained bandwidth. | ||
+ | |||
+ | {| class="wikitable" style="text-align: center;" | ||
+ | |- | ||
+ | ! !! Linux Ecosystem !! PowerVM Ecosystem | ||
+ | |- | ||
+ | | || [[24-core]] / 96 Threads || [[12-core]] / 96 Threads | ||
+ | |- | ||
+ | ! rowspan="2" | Scale-Out (SO) | ||
+ | | [[File:p9sosmt4.png|300px]] || [[File:p9sosmt8.png|300px]] | ||
+ | |- | ||
+ | | colspan="2" | [[File:p9somem.png|300px]] | ||
+ | |} | ||
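The core and memory figures quoted for the Scale-Out models can be checked with a little arithmetic. The following is a minimal sketch, not an IBM tool: the function names are hypothetical, and the peak-bandwidth calculation assumes a conventional 64-bit (8-byte) data path per DDR4 channel, which the text above does not state.

```python
# Figures quoted in the text above for the two Scale-Out variations.
SCALE_OUT_MODELS = {
    "SMT4": {"cores": 24, "threads_per_core": 4},
    "SMT8": {"cores": 12, "threads_per_core": 8},
}


def hardware_threads(cores, threads_per_core):
    """Total hardware threads exposed by one chip."""
    return cores * threads_per_core


def peak_ddr4_bandwidth_gb_s(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second).

    Assumption: each channel moves 8 bytes per transfer (64-bit data path).
    """
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


threads = {name: hardware_threads(**cfg) for name, cfg in SCALE_OUT_MODELS.items()}
peak = peak_ddr4_bandwidth_gb_s(channels=8, mega_transfers_per_s=2667)

print(threads)
print(f"theoretical peak: {peak:.1f} GB/s")
```

Both variations expose the same number of hardware threads per chip, which matches the table. Under the stated assumption, the theoretical peak is an upper bound that sits above the sustained bandwidth figure IBM quotes.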
+ | |||
+ | For the Scale-Up there are two variations: a [[12-core]] SMT8 model and a [[24-core]] SMT4 model. The SMT4 model is optimized for the Linux ecosystem, whereas the SMT8 model is said to be optimized for the [[PowerVM]] ecosystem ({{ibm|AIX}} / {{ibm|IBM i}} customers). These models continue to support IBM's agnostic memory interface, powered by IBM's POWER memory buffer products, enabling up to 8 TiB of memory per socket and up to 230 GiB/s of sustained bandwidth. | ||
+ | |||
+ | {| class="wikitable" style="text-align: center;" | ||
+ | |- | ||
+ | ! !! Linux Ecosystem !! PowerVM Ecosystem | ||
+ | |- | ||
+ | | || [[24-core]] / 96 Threads || [[12-core]] / 96 Threads | ||
+ | |- | ||
+ | ! rowspan="2" | Scale-Up (SU) | ||
+ | | [[File:p9susmt4.png|300px]] || [[File:p9susmt8.png|300px]] | ||
+ | |- | ||
+ | | colspan="2" | [[File:p9sumem.png|300px]] | ||
+ | |} | ||
+ | |||
+ | == Performance Claims == | ||
+ | IBM claims a range of performance improvements for a wide array of workloads. The graph below (provided by IBM) compares POWER9 performance using POWER8 as a baseline. The graph represents a scale-out model of similar specs at a constant frequency. | ||
+ | |||
+ | [[File:p9performance.png|700px]] | ||
== Architecture == | == Architecture == | ||
Line 116: | Line 146: | ||
*** 7 TB/s on-chip bandwidth | *** 7 TB/s on-chip bandwidth | ||
* Hardware Acceleration | * Hardware Acceleration | ||
− | + | ** Enhanced on-chip acceleration | |
− | + | ** [[Nvidia]] [[NVLINK]] 2.0 | |
− | + | ** CAPI 2.0 | |
− | |||
* I/O Subsystem | * I/O Subsystem | ||
** [[PCIe]] Gen4 | ** [[PCIe]] Gen4 | ||
** Local [[SMP]] - 16 GT/s per lane interface | ** Local [[SMP]] - 16 GT/s per lane interface | ||
** Remote SMP - 25 GT/s per lane interface | ** Remote SMP - 25 GT/s per lane interface | ||
− | *** 48 | + | *** 48-96 lanes capability |
*** IBM's SMP connect for their scale-up systems | *** IBM's SMP connect for their scale-up systems | ||
*** Also available for the accelerators | *** Also available for the accelerators | ||
Line 140: | Line 169: | ||
** L1I Cache | ** L1I Cache | ||
*** 32 [[KiB]], 8-way set associative | *** 32 [[KiB]], 8-way set associative | ||
− | |||
*** Per SMT4 Core | *** Per SMT4 Core | ||
− | ** | + | ** L1D Cache |
− | |||
*** 32 KiB, 8-way set associative | *** 32 KiB, 8-way set associative | ||
− | |||
*** Per SMT4 Core | *** Per SMT4 Core | ||
− | |||
** L2 Cache | ** L2 Cache | ||
− | *** | + | *** 256 KiB per SMT4 core |
− | |||
− | |||
− | |||
** L3 Cache | ** L3 Cache | ||
*** 120 MiB [[eDRAM]] | *** 120 MiB [[eDRAM]] | ||
− | |||
*** 12 chunks (regions) of 10 MiB 20-way set associative | *** 12 chunks (regions) of 10 MiB 20-way set associative | ||
*** 7 TB/s on-chip bandwidth | *** 7 TB/s on-chip bandwidth | ||
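The cache figures listed above determine the number of sets in each cache once a line size is fixed. The sketch below is illustrative only: it assumes a 128-byte cache line, which is not stated in the list itself, and `num_sets` is a hypothetical helper name.

```python
KIB = 1024
MIB = 1024 * KIB

# Assumption: 128-byte cache lines (not stated in the list above).
LINE_BYTES = 128


def num_sets(capacity_bytes, ways, line_bytes=LINE_BYTES):
    """Number of sets in a set-associative cache: capacity / (ways * line size)."""
    sets, remainder = divmod(capacity_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("capacity is not a whole number of sets")
    return sets


# L1I and L1D: 32 KiB, 8-way set associative.
l1_sets = num_sets(32 * KIB, ways=8)

# L3: 12 regions of 10 MiB each, 20-way set associative.
l3_region_sets = num_sets(10 * MIB, ways=20)
l3_total_mib = 12 * 10

print(l1_sets, l3_region_sets, l3_total_mib)
```

The twelve 10 MiB regions add up to the 120 MiB total listed for the L3. The set counts change if the real line size differs from the assumed 128 bytes.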
− | + | === Execution Slice Microarchitecture === | |
− | + | '''Execution Slice Microarchitecture''' is POWER9's new, refactored modular core design. The same modules are used to build both the SMT4 and SMT8 cores (in theory they could scale to higher thread counts, although not in this iteration). These modules allow IBM to address the various processor models with support for different configurations, such as bandwidth and cache-line sectoring (from 128-byte to 64-byte sectors). |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | === Slice | ||
− | '''Execution Slice Microarchitecture''' is POWER9's entirely new refactored core modular design. The same modules were used to build both the SMT4 and SMT8 cores (and in theory scale further to higher thread count although that's not | ||
A '''Slice''' is the basic 64-bit computing block incorporating a single '''[[Vector and Scalar Unit]]''' ('''VSU''') coupled with '''Load/Store Unit''' ('''LSU'''). VSU has a heterogeneous mix of computing capabilities including [[integer]] and [[floating point]] supporting [[scalar]] and [[vector]] operations. IBM claims this setup allows for higher utilization of resources while providing efficient exchanges of data between the individual slices. Two slices coupled together make up the '''Super-Slice''', a 128-bit POWER9 physical design building block. Two super-slices together along with an '''Instruction Fetch Unit''' ('''IFU''') and an '''Instruction Sequencing Unit''' ('''ISU''') form a single POWER9 SMT4 core. The SMT8 variant is effectively two SMT4 units. | A '''Slice''' is the basic 64-bit computing block incorporating a single '''[[Vector and Scalar Unit]]''' ('''VSU''') coupled with '''Load/Store Unit''' ('''LSU'''). VSU has a heterogeneous mix of computing capabilities including [[integer]] and [[floating point]] supporting [[scalar]] and [[vector]] operations. IBM claims this setup allows for higher utilization of resources while providing efficient exchanges of data between the individual slices. Two slices coupled together make up the '''Super-Slice''', a 128-bit POWER9 physical design building block. Two super-slices together along with an '''Instruction Fetch Unit''' ('''IFU''') and an '''Instruction Sequencing Unit''' ('''ISU''') form a single POWER9 SMT4 core. The SMT8 variant is effectively two SMT4 units. | ||
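The composition described above (slice, super-slice, SMT4 core, SMT8 core) can be written out as a small model. This is a sketch under the figures in the paragraph only; the names are hypothetical, and the aggregate width is a derived number (slices times 64 bits), not a figure quoted by IBM.

```python
SLICE_BITS = 64              # a slice is the basic 64-bit computing block
SLICES_PER_SUPER_SLICE = 2   # two slices form a 128-bit super-slice
SUPER_SLICES_PER_SMT4 = 2    # two super-slices (plus IFU and ISU) form an SMT4 core
SMT4_UNITS_PER_SMT8 = 2      # the SMT8 core is effectively two SMT4 units


def core_layout(kind):
    """Return slice counts for an 'SMT4' or 'SMT8' core."""
    if kind == "SMT4":
        smt4_units = 1
    elif kind == "SMT8":
        smt4_units = SMT4_UNITS_PER_SMT8
    else:
        raise ValueError(f"unknown core kind: {kind!r}")
    super_slices = smt4_units * SUPER_SLICES_PER_SMT4
    slices = super_slices * SLICES_PER_SUPER_SLICE
    return {
        "super_slices": super_slices,
        "slices": slices,
        # Derived: total width if all slices are counted together.
        "aggregate_bits": slices * SLICE_BITS,
    }


smt4 = core_layout("SMT4")
smt8 = core_layout("SMT8")
print(smt4)
print(smt8)
```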
Line 246: | Line 248: | ||
* Up to 4 DW Load or Store | * Up to 4 DW Load or Store | ||
|} | |} | ||
− | |||
− | |||
− | |||
− | |||
− | |||
== Die == | == Die == | ||
− | === | + | === Tetracosa-Core === |
− | * GlobalFoundries [[14 nm process|14 nm FinFET | + | * [[Tetracosa-Core]] |
+ | * GlobalFoundries [[14 nm process|14 nm FinFET Process]] | ||
* 17-layer metal stack | * 17-layer metal stack | ||
* 8,000,000,000 transistors | * 8,000,000,000 transistors | ||
− | * | + | * 695 mm² die size |
− | |||
− | |||
− | [[File:power9 | + | [[File:power9 die shot.jpg|800px]] |
− | [[File:power9 | + | [[File:power9 die shot (annotated).png|800px]] |
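The transistor count and die size listed above give an average transistor density for the die. A minimal sketch of that calculation, with hypothetical variable names; it is an average over the whole die and says nothing about local density in logic or eDRAM regions.

```python
TRANSISTORS = 8_000_000_000  # transistor count listed above
DIE_AREA_MM2 = 695           # die size listed above, in mm²

# Average density in millions of transistors per mm².
density_mtr_per_mm2 = TRANSISTORS / DIE_AREA_MM2 / 1e6

print(f"{density_mtr_per_mm2:.2f} million transistors per mm²")
```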
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | == | + | == References == |
− | * | + | * Brian Thompto, IBM, Senior Technical Staff Member for IBM POWER Systems, Hot Chips 28 |
− | |||
== See also == | == See also == | ||
− | * [[Intel]]'s {{intel|Skylake|l=arch}} & {{intel| | + | * [[Intel]]'s {{intel|Skylake|l=arch}} & {{intel|Kaby Lake|l=arch}} |
* [[AMD]]'s {{amd|Zen|l=arch}} | * [[AMD]]'s {{amd|Zen|l=arch}} | ||
* [[Qualcomm]]'s {{qualcomm|Falkor|l=arch}} | * [[Qualcomm]]'s {{qualcomm|Falkor|l=arch}} |
Facts about "POWER9 - Microarchitectures - IBM"
codename | POWER9 + |
core count | 24 +, 4 +, 8 +, 12 +, 16 + and 20 + |
designer | IBM + |
first launched | August 2017 + |
full page name | ibm/microarchitectures/power9 + |
instance of | microarchitecture + |
instruction set architecture | Power ISA v3.0B + |
manufacturer | GlobalFoundries + |
microarchitecture type | CPU + |
name | POWER9 + |
phase-out | 2020 + |
pipeline stages (max) | 16 + |
pipeline stages (min) | 12 + |
process | 14 nm (0.014 μm, 1.4e-5 mm) + |