From WikiChip
Memory Scrubbing

Memory scrubbing is the process of detecting and correcting ('scrubbing') bits in memory that have been erroneously flipped by transient faults, such as those caused by physical phenomena. Scrubbing is considered a RAS (reliability, availability, and serviceability) feature.

Motivation

Protecting data integrity in memory is a critical aspect of modern computer systems. Some hardware failures, such as soft errors, are difficult or impossible to predict. This is due to their random nature: they are caused by physical phenomena such as neutrons and alpha particles striking the chip. Additionally, in some of the emerging memories being researched, various kinds of drift (e.g., in cell resistance) worsen over time and can flip a cell's state.

To protect the memory, mechanisms such as error-correcting codes (ECC) are often employed. These typically provide single-bit error correction and double-bit error detection (SECDED). Although individual bit errors are independent events, two separate hits can still flip two different bits in the same word. In other words, non-adjacent multi-bit errors (e.g., separate particle strikes landing in the same word) are not correctable: SECDED can detect a double-bit error but cannot repair it.
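The SECDED behavior described above can be illustrated with a minimal sketch. It assumes an extended Hamming (8,4) code, chosen only because it is small enough to read; real memory controllers commonly use wider codes, and the function names `encode` and `decode` are hypothetical, not taken from any particular product.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword.

    Layout (an illustrative assumption): index 0 holds the overall parity
    bit, indices 1..7 hold a Hamming(7,4) codeword with parity bits at
    positions 1, 2 and 4 and data bits at positions 3, 5, 6 and 7.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes total parity even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of the positions of set bits
    overall_parity = sum(code) % 2

    if syndrome == 0 and overall_parity == 0:
        status = "ok"
    elif overall_parity == 1:
        # Odd parity: treated as a single flip. The syndrome is its position
        # (0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)

one_hit = list(word)
one_hit[5] ^= 1                             # first strike
print(decode(one_hit))                      # ('corrected', 11)

two_hits = list(one_hit)
two_hits[2] ^= 1                            # second strike, same word
print(decode(two_hits))                     # ('uncorrectable', None)
```

The second strike is what turns a repairable word into a detected but unrepairable one, which is the case scrubbing tries to prevent.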

Overview

Scrubbing is designed to address the multi-bit error problem, which SECDED-style ECC cannot correct. It takes advantage of the low probability of two strikes landing in the same word within a short window of time: by periodically cycling the memory through the ECC logic, single-bit errors are corrected before they can accumulate into uncorrectable multi-bit errors.
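Why shorter scrub intervals help can be sketched with a rough estimate. The model below is an assumption, not something stated in this article: upsets hit a given word as a Poisson process with a constant rate, and the (rare) case of a second hit landing on the same bit and flipping it back is ignored. The function name and the rate used in the example are hypothetical.

```python
import math


def p_two_or_more_hits(rate_per_word_per_hour, scrub_interval_hours):
    """Probability that one word collects >= 2 upsets between two scrubs.

    Assumes Poisson arrivals: P(k >= 2) = 1 - e^(-x) - x * e^(-x),
    where x = rate * interval is the expected number of upsets.
    """
    x = rate_per_word_per_hour * scrub_interval_hours
    return -math.expm1(-x) - x * math.exp(-x)


# Hypothetical per-word upset rate, picked only to make the numbers readable.
rate = 1e-3
for interval in (24.0, 12.0, 6.0):
    p = p_two_or_more_hits(rate, interval)
    print(f"scrub every {interval:4.0f} h -> P(>=2 hits in a word) = {p:.3e}")
```

For small x the expression is approximately x^2 / 2, so under these assumptions halving the scrub interval cuts the chance of an uncorrectable double hit in a word by roughly a factor of four.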

Modes

  • patrol scrubbing - The system periodically walks through every entry in memory, checks each one for a correctable error, corrects it if needed, and writes the corrected value back to memory.
  • demand scrubbing - The system checks for a correctable error when data is requested and, if one is found, corrects it as part of servicing that request, typically writing the corrected value back to memory.
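The two modes can be contrasted with a toy simulation. This is a sketch under loud assumptions: the class `ToyMemory` and its methods are hypothetical, the memory directly tracks which bits of each word are flipped (real hardware infers this from the ECC syndrome), and the ECC is abstracted as "one flipped bit is correctable, two or more are not".

```python
class ToyMemory:
    """Abstract ECC-protected memory used only to compare scrub modes."""

    def __init__(self, n_words):
        # For each word, the set of bit positions currently flipped.
        # Simulation shortcut (assumption): hardware does not store this.
        self.flipped = [set() for _ in range(n_words)]

    def inject_fault(self, word, bit):
        """Model a particle strike; a second hit on the same bit flips it back."""
        self.flipped[word] ^= {bit}

    def _check_and_fix(self, word):
        errors = len(self.flipped[word])
        if errors == 0:
            return "ok"
        if errors == 1:
            self.flipped[word].clear()      # correct and write back
            return "corrected"
        return "uncorrectable"              # beyond single-bit correction

    def demand_read(self, word):
        """Demand scrubbing: check (and fix) only the word being read."""
        return self._check_and_fix(word)

    def patrol_scrub(self):
        """Patrol scrubbing: walk every word, fixing single-bit errors."""
        return [self._check_and_fix(w) for w in range(len(self.flipped))]


def read_after_two_hits(scrub_between_hits):
    mem = ToyMemory(n_words=4)
    mem.inject_fault(word=2, bit=7)         # first strike
    if scrub_between_hits:
        mem.patrol_scrub()                  # repairs word 2 before the next hit
    mem.inject_fault(word=2, bit=41)        # second strike, same word
    return mem.demand_read(2)


print(read_after_two_hits(scrub_between_hits=True))    # corrected
print(read_after_two_hits(scrub_between_hits=False))   # uncorrectable
```

In this model demand scrubbing alone only helps words that happen to be read between strikes, while a patrol pass also cleans words that are rarely accessed.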