From WikiChip
Editing amd/microarchitectures/zen 2

Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.

This page supports semantic in-text annotations (e.g. "[[Is specified as::World Heritage Site]]") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help pages.

Latest revision Your text
Line 420: Line 420:
 
A two-level translation lookaside buffer (TLB) assists load and store address translation. The fully-associative L1 data TLB contains 64 entries and holds 4-Kbyte, 2-Mbyte, and 1-Gbyte page table entries. The L2 data TLB is a unified 12-way set-associative cache with 2048 entries, up from 1536 entries in Zen, holding 4-Kbyte and 2-Mbyte page table entries, as well as page directory entries (PDEs) to speed up DTLB and ITLB table walks. 1-Gbyte pages are <i>smashed</i> into 2-Mbyte entries but installed as 1-Gbyte entries when reloaded into the L1 TLB.
 
A two-level translation lookaside buffer (TLB) assists load and store address translation. The fully-associative L1 data TLB contains 64 entries and holds 4-Kbyte, 2-Mbyte, and 1-Gbyte page table entries. The L2 data TLB is a unified 12-way set-associative cache with 2048 entries, up from 1536 entries in Zen, holding 4-Kbyte and 2-Mbyte page table entries, as well as page directory entries (PDEs) to speed up DTLB and ITLB table walks. 1-Gbyte pages are <i>smashed</i> into 2-Mbyte entries but installed as 1-Gbyte entries when reloaded into the L1 TLB.
  
Two hardware page table walkers handle L2 TLB misses, presumably one serving the DTLB, another the ITLB. In addition to the PDE entries, the table walkers include a 64-entry page directory cache which holds page-map-level-4 and page-directory-pointer entries speeding up DTLB and ITLB walks. Like the Zen/Zen+ microarchitecture, Zen 2 supports page table entry (PTE) coalescing. When the table walker loads a PTE, which occupies 8 bytes in the x86-64 architecture, from memory it also examines the other PTEs in the same 64-byte cache line. If a 16-Kbyte aligned block of four consecutive 4-Kbyte pages are also consecutive and 16-Kbyte aligned in physical address space and have identical page attributes, they are stored into a single TLB entry greatly improving the efficiency of this cache. This is only done when the processor is operating in long mode.
+
Two hardware page table walkers handle L2 TLB misses, presumably one serving the DTLB, another the ITLB. In addition to the PDE entries, the table walkers include a 64-entry page directory cache which holds page-map-level-4 and page-directory-pointer entries speeding up DTLB and ITLB walks. The Zen/Zen+ microarchitecture supports page table entry (PTE) coalescing; this feature is probably also present in Zen 2. When the table walker loads a PTE, which occupies 8 bytes in the x86-64 architecture, from memory it also examines the seven other PTEs in the same 64-byte cache line. If they assign a contiguous 32 KiB block of physical pages with the same attributes to the 8 virtual pages, they are coalesced into a single TLB entry greatly improving the efficiency of this cache.
  
 
The LS unit relies on a 32 KiB, 8-way set associative, write-back, ECC-protected L1 data cache (DC). It supports two loads, if they access different DC banks, and one store per cycle, each up to 256 bits wide. The line width is 64 bytes, however cache stores are aligned to a 32-byte boundary. Loads spanning a 64-byte boundary and stores spanning a 32-byte boundary incur a penalty of one cycle. In the Zen/Zen+ microarchitecture 256-bit vectors are loaded and stored as two 128-bit halves; the load and store boundaries are 32 and 16 bytes respectively. Zen 2 can load and store 256-bit vectors in a single operation, but stores must be 32-byte aligned now to avoid the penalty. As in Zen aligned and unaligned load and store instructions (for example MOVUPS/MOVAPS) provide identical performance.
 
The LS unit relies on a 32 KiB, 8-way set associative, write-back, ECC-protected L1 data cache (DC). It supports two loads, if they access different DC banks, and one store per cycle, each up to 256 bits wide. The line width is 64 bytes, however cache stores are aligned to a 32-byte boundary. Loads spanning a 64-byte boundary and stores spanning a 32-byte boundary incur a penalty of one cycle. In the Zen/Zen+ microarchitecture 256-bit vectors are loaded and stored as two 128-bit halves; the load and store boundaries are 32 and 16 bytes respectively. Zen 2 can load and store 256-bit vectors in a single operation, but stores must be 32-byte aligned now to avoid the penalty. As in Zen aligned and unaligned load and store instructions (for example MOVUPS/MOVAPS) provide identical performance.

Please note that all contributions to WikiChip may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see WikiChip:Copyrights for details). Do not submit copyrighted work without permission!

Cancel | Editing help (opens in new window)
codenameZen 2 +
core count4 +, 6 +, 8 +, 12 +, 16 +, 24 +, 32 + and 64 +
designerAMD +
first launchedJuly 2019 +
full page nameamd/microarchitectures/zen 2 +
instance ofmicroarchitecture +
instruction set architecturex86-64 +
manufacturerTSMC + and GlobalFoundries +
microarchitecture typeCPU +
nameZen 2 +
pipeline stages19 +