From WikiChip
Editing amd/microarchitectures/zen
Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.
The edit can be undone.
Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.
This page supports semantic in-text annotations (e.g. "[[Is specified as::World Heritage Site]]") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help pages.
Latest revision | Your text | ||
Line 472: | Line 472: | ||
The FP deals with all vector operations. The simple integer vector operations (e.g. shift, add) can all be done in one cycle, half the latency of AMD's previous architecture. Basic [[floating point]] math has a latency of three cycles including [[multiplication]] (one additional cycle for double precision). [[Fused multiply-add]] are five cycles. | The FP deals with all vector operations. The simple integer vector operations (e.g. shift, add) can all be done in one cycle, half the latency of AMD's previous architecture. Basic [[floating point]] math has a latency of three cycles including [[multiplication]] (one additional cycle for double precision). [[Fused multiply-add]] are five cycles. | ||
− | The FP has a single pipe for 128-bit load operations. In fact, the entire FP side is optimized for 128-bit operations. Zen supports all the latest instructions such as SSE and {{x86|AVX1}}/{{x86|AVX2|2}}. The way 256-bit AVX was designed was so that they can be carried out as two independent 128-bit operations. Zen takes advantage of that by operating on those instructions as two operations; i.e., Zen splits up 256-bit operations into two µOPs so they are effectively half the throughput of their 128-bit operations counterparts. Likewise, stores are also done on 128-bit chunks, making 256-bit loads have an effective throughput of one store every two cycles. The pipes are fairly well balanced, therefore most operations will have at least two pipes to be scheduled on retaining the throughput of at least one such instruction each cycle. As implies, 256-bit operations will use up twice the resources to complete (i.e., 2x register, scheduler, and ports). This is a compromise [[AMD]] has taken which helps conserve die space and power. By contrast, [[Intel]]'s competing design, {{intel|Skylake}}, does have dedicated 256-bit circuitry. It's also worth noting that Intel's contemporary {{intel|Skylake SP|server class models|l=core}} have extended this further to incorporate dedicated 512-bit circuitry supporting {{x86|AVX-512}} with the highest performance models {{intel|Skylake | + | The FP has a single pipe for 128-bit load operations. In fact, the entire FP side is optimized for 128-bit operations. Zen supports all the latest instructions such as SSE and {{x86|AVX1}}/{{x86|AVX2|2}}. The way 256-bit AVX was designed was so that they can be carried out as two independent 128-bit operations. Zen takes advantage of that by operating on those instructions as two operations; i.e., Zen splits up 256-bit operations into two µOPs so they are effectively half the throughput of their 128-bit operations counterparts. Likewise, stores are also done on 128-bit chunks, making 256-bit loads have an effective throughput of one store every two cycles. The pipes are fairly well balanced, therefore most operations will have at least two pipes to be scheduled on retaining the throughput of at least one such instruction each cycle. As implies, 256-bit operations will use up twice the resources to complete (i.e., 2x register, scheduler, and ports). This is a compromise [[AMD]] has taken which helps conserve die space and power. By contrast, [[Intel]]'s competing design, {{intel|Skylake}}, does have dedicated 256-bit circuitry. It's also worth noting that Intel's contemporary {{intel|Skylake SP|server class models|l=core}} have extended this further to incorporate dedicated 512-bit circuitry supporting {{x86|AVX-512}} with the highest performance models {{intel|Skylake#Execution_engine_2|l=arch|having a whole second}} dedicated AVX-512 unit. |
Additionally Zen also supports {{x86|SHA}} and {{x86|AES}} with 2 AES units implemented in an attempt to improve encryption performance. Those units can be found on pipes 0 and 1 of the floating point scheduler. | Additionally Zen also supports {{x86|SHA}} and {{x86|AES}} with 2 AES units implemented in an attempt to improve encryption performance. Those units can be found on pipes 0 and 1 of the floating point scheduler. |
Facts about "Zen - Microarchitectures - AMD"
codename | Zen + |
core count | 4 +, 6 +, 8 +, 16 +, 24 +, 32 + and 12 + |
designer | AMD + |
first launched | March 2, 2017 + |
full page name | amd/microarchitectures/zen + |
instance of | microarchitecture + |
instruction set architecture | x86-64 + |
manufacturer | GlobalFoundries + |
microarchitecture type | CPU + |
name | Zen + |
pipeline stages | 19 + |
process | 14 nm (0.014 μm, 1.4e-5 mm) + |