From WikiChip
Editing amd/microarchitectures/zen

Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.

This page supports semantic in-text annotations (e.g. "[[Is specified as::World Heritage Site]]") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help pages.

Latest revision Your text
Line 472: Line 472:
 
The FP deals with all vector operations. The simple integer vector operations (e.g. shift, add) can all be done in one cycle, half the latency of AMD's previous architecture. Basic [[floating point]] math has a latency of three cycles including [[multiplication]] (one additional cycle for double precision). [[Fused multiply-add]] are five cycles.
 
The FP deals with all vector operations. The simple integer vector operations (e.g. shift, add) can all be done in one cycle, half the latency of AMD's previous architecture. Basic [[floating point]] math has a latency of three cycles including [[multiplication]] (one additional cycle for double precision). [[Fused multiply-add]] are five cycles.
  
The FP has a single pipe for 128-bit load operations. In fact, the entire FP side is optimized for 128-bit operations. Zen supports all the latest instructions such as SSE and {{x86|AVX1}}/{{x86|AVX2|2}}. The way 256-bit AVX was designed was so that they can be carried out as two independent 128-bit operations. Zen takes advantage of that by operating on those instructions as two operations; i.e., Zen splits up 256-bit operations into two µOPs so they are effectively half the throughput of their 128-bit operations counterparts. Likewise, stores are also done on 128-bit chunks, making 256-bit loads have an effective throughput of one store every two cycles. The pipes are fairly well balanced, therefore most operations will have at least two pipes to be scheduled on retaining the throughput of at least one such instruction each cycle. As implies, 256-bit operations will use up twice the resources to complete (i.e., 2x register, scheduler, and ports). This is a compromise [[AMD]] has taken which helps conserve die space and power. By contrast, [[Intel]]'s competing design, {{intel|Skylake}}, does have dedicated 256-bit circuitry. It's also worth noting that Intel's contemporary {{intel|Skylake SP|server class models|l=core}} have extended this further to incorporate dedicated 512-bit circuitry supporting {{x86|AVX-512}} with the highest performance models {{intel|Skylake (server)#Execution_engine|l=arch|having a whole second}} dedicated AVX-512 unit.
+
The FP has a single pipe for 128-bit load operations. In fact, the entire FP side is optimized for 128-bit operations. Zen supports all the latest instructions such as SSE and {{x86|AVX1}}/{{x86|AVX2|2}}. The way 256-bit AVX was designed was so that they can be carried out as two independent 128-bit operations. Zen takes advantage of that by operating on those instructions as two operations; i.e., Zen splits up 256-bit operations into two µOPs so they are effectively half the throughput of their 128-bit operations counterparts. Likewise, stores are also done on 128-bit chunks, making 256-bit loads have an effective throughput of one store every two cycles. The pipes are fairly well balanced, therefore most operations will have at least two pipes to be scheduled on retaining the throughput of at least one such instruction each cycle. As implies, 256-bit operations will use up twice the resources to complete (i.e., 2x register, scheduler, and ports). This is a compromise [[AMD]] has taken which helps conserve die space and power. By contrast, [[Intel]]'s competing design, {{intel|Skylake}}, does have dedicated 256-bit circuitry. It's also worth noting that Intel's contemporary {{intel|Skylake SP|server class models|l=core}} have extended this further to incorporate dedicated 512-bit circuitry supporting {{x86|AVX-512}} with the highest performance models {{intel|Skylake#Execution_engine_2|l=arch|having a whole second}} dedicated AVX-512 unit.
  
 
Additionally Zen also supports {{x86|SHA}} and {{x86|AES}} with 2 AES units implemented in an attempt to improve encryption performance. Those units can be found on pipes 0 and 1 of the floating point scheduler.
 
Additionally Zen also supports {{x86|SHA}} and {{x86|AES}} with 2 AES units implemented in an attempt to improve encryption performance. Those units can be found on pipes 0 and 1 of the floating point scheduler.

Please note that all contributions to WikiChip may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see WikiChip:Copyrights for details). Do not submit copyrighted work without permission!

Cancel | Editing help (opens in new window)

This page is a member of 1 hidden category:

codenameZen +
core count4 +, 6 +, 8 +, 16 +, 24 +, 32 + and 12 +
designerAMD +
first launchedMarch 2, 2017 +
full page nameamd/microarchitectures/zen +
instance ofmicroarchitecture +
instruction set architecturex86-64 +
manufacturerGlobalFoundries +
microarchitecture typeCPU +
nameZen +
pipeline stages19 +
process14 nm (0.014 μm, 1.4e-5 mm) +