(table format fix) |
(Corrected a typo) |
||
Line 11: | Line 11: | ||
| <code>CLFLUSHOPT</code> || Optimized CLFLUSH; Behaves similarly to <code>CLFLUSH</code> but without the serialization, thereby optimized for performance by allowing for some concurrency when executing multiple CLFLUSHOPT instructions back-to-back. | | <code>CLFLUSHOPT</code> || Optimized CLFLUSH; Behaves similarly to <code>CLFLUSH</code> but without the serialization, thereby optimized for performance by allowing for some concurrency when executing multiple CLFLUSHOPT instructions back-to-back. | ||
|- | |- | ||
− | | <code>CLWB</code> || Cache line write back; behaves similarly to <code>CLFLUSHOPT</code> but keeps the cache line valid (i.e., the cache line is flushed and then marked as no longer dirty) thereby optimized for performance by keeping the line in the cache, increasing the | + | | <code>CLWB</code> || Cache line write back; behaves similarly to <code>CLFLUSHOPT</code> but keeps the cache line valid (i.e., the cache line is flushed and then marked as no longer dirty) thereby optimized for performance by keeping the line in the cache, increasing the chance of a [[cache hit]]. |
|} | |} | ||
Latest revision as of 19:08, 13 May 2021
Instruction Set Architecture
- Instructions
- Addressing Modes
- Registers
- Model-Specific Register
- Assembly
- Interrupts
- Micro-Ops
- Timer
- Calling Convention
- Microarchitectures
- CPUID
Persistent memory extensions (PMEM) are a set of x86 instructions designed to improve the usability of working with storage-class memory.
Overview[edit]
Intel adopted the SNIA NVM Programming Model for working with persistent memory. This model allows for direct access (DAX) using byte-addressable operations (i.e., load/store), however, the persistence of the data in the cache is not guaranteed until it has entered the persistence domain. x86 provides a set of instructions for flushing cache lines in a more optimized way. In addition to existing x86 instructions such as non-temporal stores, CLFLUSH
, and WBINVD
(kernel only), two new instructions were added:
Instruction | Description |
---|---|
CLFLUSHOPT |
Optimized CLFLUSH; Behaves similarly to CLFLUSH but without the serialization, thereby optimized for performance by allowing for some concurrency when executing multiple CLFLUSHOPT instructions back-to-back.
|
CLWB |
Cache line write back; behaves similarly to CLFLUSHOPT but keeps the cache line valid (i.e., the cache line is flushed and then marked as no longer dirty) thereby optimized for performance by keeping the line in the cache, increasing the chance of a cache hit.
|
Both of the new instructions must follow by a SFENCE
to ensure all flushes are completed before continuing.
Detection[edit]
CPUID | Instruction Set | |
---|---|---|
Input | Output | |
EAX=07H, ECX=0 | EBX[bit 23] | CLFLUSHOPT |
EBX[bit 24] | CLWB |
Microarchitecture support[edit]
Instruction | Introduction | |
---|---|---|
Intel | AMD | |
CLFLUSHOPT |
Skylake (server) Skylake (client) Goldmont |
Zen |
CLWB |
Skylake (server) Ice Lake (client) |
Zen 2 |
Intrinsic functions[edit]
#include <immintrin.h>
# clflushopt
void _mm_clflushopt (void const * p)
# clwb
void _mm_clwb (void const * p)