Modern RAID Must Protect Against Multiple Temporally Correlated Errors

Modern data protection needs to adapt to protecting modern media. RAID is no exception. In this article I will explain why modern storage consumers need to be asking for certain kinds of protection and not settling for less.

To summarize: don’t bother with storage that can’t provide at least dual-parity protection for any given piece of data, whether that’s an array, HCI or the cloud.

Why? Two big reasons:

  1. Media these days is both larger than in the past and fails differently. This means Temporally Correlated Errors (multiple errors that occur close together in time) are far more likely to happen, so you need protection against them. It’s not doom-mongering. It’s based on data.
  2. In the olden days, arrays had small RAID groups that each held a handful of volumes. If something was damaged in a RAID group, at most you’d just lose that handful of volumes. Modern arrays use pools of space, typically made up of multiple RAID groups. This means that you can potentially damage all volumes in an array merely by losing data integrity in a single RAID group in the pool. I’m sure you aren’t exactly looking forward to experiencing that.

I will take you step by step through this, as is my idiom. It is rather sad, though, that I have to write this kind of thing in 2020…
