Modern RAID Must Protect Against Multiple Temporally Correlated Errors

Data protection needs to adapt to modern media, and RAID is no exception. In this article I will explain why today’s storage consumers should be asking for certain kinds of protection, and should not settle for less.

To summarize: don’t bother with storage that can’t provide at least dual parity protection for any given piece of data, whether that’s an array, HCI or the cloud.

Why? Two big reasons:

  1. Media these days is both larger and fails differently than in the past. That makes Temporally Correlated Errors (multiple errors hitting at around the same time) far more likely, so you need protection against them. It’s not doom-mongering. It’s based on data.
  2. In the olden days, arrays had small RAID groups that each held a handful of volumes. If something was damaged in a RAID group, at most you’d lose that handful of volumes. Modern arrays use pools of space, typically made up of multiple RAID groups, with volumes spread across all of them. This means that losing data integrity in just a single RAID group of the pool can potentially damage every volume in the array. I’m sure you aren’t exactly looking forward to experiencing that.
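
To make the first point concrete, here is a minimal Python sketch of single XOR parity, the kind RAID 5 uses. It is a toy model, not how any particular array computes parity: the function names and block sizes are hypothetical. It shows that one parity block can rebuild any one missing block in a stripe, but if a second error lands in the same stripe before that rebuild completes (a temporally correlated error), there is no longer enough information to solve for both. Dual parity schemes add a second, independent parity block (for example the Reed-Solomon "Q" syndrome in RAID 6) so that two simultaneous losses remain recoverable.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Toy single-parity stripe: the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe):
    """Rebuild a stripe in which missing blocks are marked as None.

    XOR parity provides one equation per byte position, so it can solve
    for exactly one unknown block. With two or more blocks missing it
    cannot, and this returns None (data loss).
    """
    lost = [i for i, block in enumerate(stripe) if block is None]
    if not lost:
        return list(stripe)
    if len(lost) > 1:
        return None  # single parity cannot recover two unknowns
    survivors = [block for block in stripe if block is not None]
    repaired = list(stripe)
    repaired[lost[0]] = xor_blocks(survivors)
    return repaired
```

For example, with three data blocks plus parity, losing any one block (data or parity) is recoverable, while losing block 0 and the parity block together is not:

```python
data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
stripe = make_stripe(data)

one_lost = list(stripe)
one_lost[1] = None
rebuild(one_lost) == stripe   # True: single failure survived

two_lost = list(stripe)
two_lost[0] = None
two_lost[3] = None
rebuild(two_lost) is None     # True: second, correlated failure means data loss
```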

I will take you through this step by step, as is my wont. It is rather sad, though, that I have to write this kind of thing in 2020…

When Terrified Vendors Attack: The Dell PowerStore Edition

Dell is at it again. This time, they paid Principled Technologies to run tests and produce a ridiculous report comparing the high-end HPE Primera to the midrange Dell EMC PowerStore.

I’ll expose some of the more egregious errors in their methodology and overall thinking, but first I want to point readers to an easy way to compare the two impartially for themselves, without having to read a FUD document sponsored by anyone at all.

Executive Summary: A Primera 670 is multiple times faster than a PowerStore 9000T, has stronger data protection, and much higher uptime.

The Harsh Realities of PCIe Lane Shortage in Storage Systems

There is a lot of myth and misinformation, plus more than a modicum of misunderstanding, about how storage systems can use available bandwidth, especially with certain newer kinds of media.

I wanted to explain some of the harsh facts of storage system design in the real world, and why one shouldn’t just add up drive speeds to estimate performance.
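
A back-of-the-envelope Python sketch of why simply adding up drive speeds misleads. Every system number in it (48 CPU lanes, how they are split between host ports, the inter-controller link and drives, 3 GB/s per drive) is a hypothetical assumption, not the spec of any real array; only the per-lane PCIe rates are standard figures (raw transfer rate times 128b/130b encoding efficiency). The idea it illustrates: drives sit behind a finite number of PCIe lanes, and while PCIe switches can fan out to more drives, they cannot add upstream bandwidth.

```python
# Approximate usable bandwidth per PCIe lane, per direction, in GB/s:
# transfer rate (GT/s) x 128b/130b encoding efficiency / 8 bits per byte.
GBPS_PER_LANE = {
    3: 8.0 * 128 / 130 / 8,   # PCIe Gen3, ~0.985 GB/s
    4: 16.0 * 128 / 130 / 8,  # PCIe Gen4, ~1.969 GB/s
}


def lanes_left_for_drives(total_lanes, frontend_lanes, interconnect_lanes):
    """Lanes remaining for drives once host ports and controller links take their share."""
    return total_lanes - frontend_lanes - interconnect_lanes


def naive_estimate(drive_count, drive_gbps):
    """The tempting but wrong estimate: just add up the drive speeds."""
    return drive_count * drive_gbps


def lane_limited_estimate(drive_count, drive_gbps, drive_lanes, gen):
    """Cap aggregate drive bandwidth at what the upstream drive lanes can carry.

    Still optimistic: it ignores CPU, memory bandwidth and data services overheads.
    """
    link_ceiling = drive_lanes * GBPS_PER_LANE[gen]
    return min(naive_estimate(drive_count, drive_gbps), link_ceiling)


# Hypothetical controller: 48 lanes, 16 to host ports, 8 to the inter-controller link.
lanes = lanes_left_for_drives(48, 16, 8)
print(naive_estimate(24, 3.0))                          # 72.0
print(round(lane_limited_estimate(24, 3.0, lanes, 3), 1))  # 23.6
```

With these made-up numbers, 24 drives "add up" to 72 GB/s, yet the 24 Gen3 lanes left for them top out at roughly 23.6 GB/s, before any other system bottleneck is considered.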

HCI Failure Modes and Maintenance

I got the idea for this blog after speaking with multiple customers who were contemplating switching to certain kinds of grid computing/storage (like HCI) without fully understanding the ramifications of doing so.

You see, they were (rightly) enamored of concepts such as automation, ease of consumption and scaling. But they forgot to ask some very important questions. See here for the dangers of getting too carried away with something new and taking things for granted.

This isn’t a post claiming HCI and grid-type storage constructs are bad. Like any tool, they can be used in various ways, some of them aggressively ill-advised. The point of this post is to help customers ask for the right configuration so they don’t get stuck with a sub-optimal and risky design.

I tried to make this post as short as possible, but as someone once said, “Everything should be made as simple as possible, but not simpler.” Which, ironically, is a simplification of what Einstein actually said 🙂

The Loss of Important Knowledge and Acumen Through Perceived Commoditization

I posit that we now have a whole new class of consumer that is completely oblivious to certain hitherto fundamental concepts – and this can lead to poor business decisions and overall sub-optimal execution and results.

I got the idea after a discussion with a former colleague (now working for a cloud vendor) who proudly proclaimed that infrastructure is unimportant and uninteresting.

I’ll start generically and shift to IT. The generic aspect of this problem is very interesting, since it’s lowering quality in all sorts of fields.

And never forget: Just because something is widely and easily available doesn’t mean it’s better. It simply means that more people have access to it.

