From WikiChip
x86/persistent memory extensions

Latest revision as of 19:08, 13 May 2021

Persistent memory extensions (PMEM) are a set of x86 instructions designed to make storage-class memory easier and more efficient to work with.

Overview

Intel adopted the SNIA NVM Programming Model for working with persistent memory. This model allows direct access (DAX) using byte-addressable operations (i.e., loads and stores). However, data still held in the CPU cache is not guaranteed to be persistent until it has entered the persistence domain, so software must explicitly flush the affected cache lines. x86 provides instructions for doing this efficiently. In addition to existing x86 instructions such as non-temporal stores, CLFLUSH, and WBINVD (kernel only), two new instructions were added:

{|
! Instruction !! Description
|-
| <code>CLFLUSHOPT</code> || Optimized CLFLUSH; behaves similarly to <code>CLFLUSH</code> but without the serialization, which allows some concurrency when multiple CLFLUSHOPT instructions are executed back-to-back.
|-
| <code>CLWB</code> || Cache line write back; behaves similarly to <code>CLFLUSHOPT</code> but keeps the cache line valid (i.e., the line is written back and then marked as no longer dirty). Keeping the line in the cache increases the chance of a later cache hit.
|}

Both of the new instructions must be followed by an SFENCE to ensure that all flushes have completed before execution continues.
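The flush-then-fence pattern described above can be sketched in C as follows. This is a minimal illustration, not a reference implementation: it assumes GCC or Clang on x86-64 and a 64-byte cache line, and the names `persist_range`, `flush_line_fn`, and the `flush_*` wrappers are hypothetical. The caller is assumed to have confirmed CPUID support before passing `flush_clflushopt` or `flush_clwb`; `flush_clflush` relies only on SSE2 and should be usable on any x86-64 CPU.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u /* assumed line size; not queried from the CPU here */

/* Hypothetical callback type: flushes the cache line containing `line`. */
typedef void (*flush_line_fn)(const void *line);

/* Baseline CLFLUSH (SSE2). */
static inline void flush_clflush(const void *line)
{
    _mm_clflush(line);
}

/* Only call on CPUs that report CLFLUSHOPT via CPUID.
   The cast is there because some compilers declare the parameter as void *. */
__attribute__((target("clflushopt")))
static inline void flush_clflushopt(const void *line)
{
    _mm_clflushopt((void *)line);
}

/* Only call on CPUs that report CLWB via CPUID. */
__attribute__((target("clwb")))
static inline void flush_clwb(const void *line)
{
    _mm_clwb((void *)line);
}

/* Flush every cache line overlapping [addr, addr + len), then issue one
   SFENCE so the flushes are ordered before later stores.
   Returns the number of cache lines flushed. */
static size_t persist_range(const void *addr, size_t len, flush_line_fn flush)
{
    if (len == 0)
        return 0;

    uintptr_t p   = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines  = 0;

    for (; p < end; p += CACHE_LINE) {
        flush((const void *)p);
        lines++;
    }
    _mm_sfence();
    return lines;
}
```

A caller would typically store to the mapped region first and then call, for example, `persist_range(&record, sizeof record, flush_clwb)`. A single fence after the loop, rather than one per line, is what lets the weakly ordered flushes overlap.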

Detection

{|
! colspan="2" | CPUID !! rowspan="2" | Instruction Set
|-
! Input !! Output
|-
| rowspan="2" | EAX=07H, ECX=0 || EBX[bit 23] || CLFLUSHOPT
|-
| EBX[bit 24] || CLWB
|}
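The CPUID check above can be sketched in C as follows. This assumes GCC or Clang, whose `<cpuid.h>` header provides `__get_cpuid_count`; other compilers expose CPUID differently. The `pmem_features` struct and `detect_pmem_features` function are hypothetical names used only for illustration.

```c
#include <cpuid.h>   /* GCC/Clang helper header (assumption: not MSVC) */
#include <stdbool.h>

/* Hypothetical result type for this sketch. */
struct pmem_features {
    bool clflushopt; /* CPUID.(EAX=07H, ECX=0):EBX[bit 23] */
    bool clwb;       /* CPUID.(EAX=07H, ECX=0):EBX[bit 24] */
};

static struct pmem_features detect_pmem_features(void)
{
    struct pmem_features f = { false, false };
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid_count returns 0 when the requested leaf is unsupported,
       in which case both features are reported as absent. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        f.clflushopt = ((ebx >> 23) & 1u) != 0;
        f.clwb       = ((ebx >> 24) & 1u) != 0;
    }
    return f;
}
```

Code that wants the fastest available flush could call this once at startup and prefer CLWB, then CLFLUSHOPT, then plain CLFLUSH.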

Microarchitecture support

{|
! rowspan="2" | Instruction !! colspan="2" | Introduction
|-
! Intel !! AMD
|-
| <code>CLFLUSHOPT</code> || Skylake (server), Skylake (client), Goldmont || Zen
|-
| <code>CLWB</code> || Skylake (server), Ice Lake (client) || Zen 2
|}

Intrinsic functions

#include <immintrin.h>

/* CLFLUSHOPT: flush the cache line containing p */
void _mm_clflushopt(void const *p);

/* CLWB: write back the cache line containing p */
void _mm_clwb(void const *p);

See also