In modern storage devices (especially All Flash Arrays), extensive data reduction techniques are commonplace and expected by customers.
This has, unavoidably, led to various marketing schemes that aim to make certain systems seem more appealing than the rest. Or at least not less appealing…
I will attempt to explain what customers should be looking for when trying to decipher capacity claims from a manufacturer.
In a nutshell – and for the ADD-afflicted – the most important number you should be looking for is the Effective Capacity Ratio, which is simply: (Effective Capacity)/(Raw Capacity). Ignore the more common but far less useful Data Reduction Ratio, which is: (Effective Capacity)/(Usable Capacity).
Read on for more detail…
Most modern (and some not so modern) storage vendors claim a similar Data Reduction Ratio for their devices. Numbers like 5:1 seem commonplace these days. To anyone with a modicum of common sense, that simply means “I can store five times more stuff in the same space”. Unfortunately, the same 5:1 ratio does not mean the same end result for all vendors…
Why this is a problem
Customers simply want to get a good deal for their money. The reality is that the exact same 5:1 Data Reduction Ratio may mean very different things between storage devices.
For instance, not all vendors count space savings using the same math. Is the virtual size of snapshots counted in the final savings ratio? (For example, if I take 5 snaps of a 1TB volume one after another, does that show up as 6:1 savings?)
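To make the snapshot question concrete, here is a minimal Python sketch. It assumes the snapshots are taken back to back, so they share every block with the source volume and consume essentially no extra space, and that no real data reduction happens at all. The function `reported_ratio` and its parameters are hypothetical names for illustration, not any vendor's actual formula.

```python
def reported_ratio(volume_tb: float, snapshots: int, physical_tb: float,
                   count_snapshots: bool) -> float:
    """Return the savings ratio a vendor might report (hypothetical model).

    volume_tb       -- logical size of the source volume
    snapshots       -- number of snapshots taken of that volume
    physical_tb     -- space actually consumed on the array
    count_snapshots -- whether each snapshot is credited with the volume's
                       full virtual size as "data stored"
    """
    logical_tb = volume_tb
    if count_snapshots:
        # Each snapshot is credited with the full size of the source volume,
        # even though it consumes (almost) nothing extra.
        logical_tb += snapshots * volume_tb
    return logical_tb / physical_tb


# 1 TB volume, 5 back-to-back snapshots, no compression or dedupe:
# the array physically holds 1 TB either way.
honest = reported_ratio(1.0, 5, 1.0, count_snapshots=False)
inflated = reported_ratio(1.0, 5, 1.0, count_snapshots=True)
print(f"{honest:.0f}:1 vs {inflated:.0f}:1")
```

Nothing about the stored data changes between the two calls; only the accounting does, yet the reported figure goes from 1:1 to 6:1.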
How about Thin Provisioning? Is that included? Such math can wildly alter the overall efficiency numbers.
Here’s a clear example of why counting thin provisioning in the overall ratio is probably misleading, and best used only for educational purposes… Remember that overall savings ratios are multiplicative. For example, 2:1 compression and 2:1 deduplication mean 4:1 overall savings:
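To see how quickly this compounds, here is a minimal Python sketch with hypothetical numbers: genuine 2:1 compression and 2:1 deduplication, plus a thin-provisioned environment where 1000 TB of volumes have been provisioned to hosts but only 4 TB of data has actually been written. The helper `overall_ratio` is an illustrative name, not a standard API.

```python
from math import prod


def overall_ratio(*ratios: float) -> float:
    # Independent savings ratios compound multiplicatively.
    return prod(ratios)


compression, dedupe = 2.0, 2.0
real = overall_ratio(compression, dedupe)  # genuine data reduction

# Hypothetical thin provisioning: 1000 TB provisioned, 4 TB written,
# which a vendor could present as "250:1 savings".
provisioned_tb, written_tb = 1000.0, 4.0
thin = provisioned_tb / written_tb

headline = overall_ratio(compression, dedupe, thin)
print(f"Real savings {real:.0f}:1, headline with thin provisioning {headline:.0f}:1")
```

In this hypothetical case the array physically holds only 1 TB (4 TB written, reduced 4:1), and the thin provisioning factor contributes nothing to how much data actually fits on the media.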
Does anyone really believe that such a system is actually providing 1000:1 savings? Or even 10:1? Yet that’s what many vendors are counting towards savings ratios, without separating the thin provisioning savings from the overall savings number.
However, there is another dimension, and it has to do with how efficiently the raw capacity is actually utilized. It will be one of my many Captain Obvious moments for some of you, but I’ve seen enough people get confused, so it’s worth explaining.
The Effective Capacity Ratio
Different storage systems have different ways of utilizing their raw capacity. For example, a mirrored system can never have better than a 50% Usable:Raw ratio. By definition, anything better is mathematically impossible, since 2 copies of the data are needed. That’s not even counting spares and other possible overheads.
Systems that do triple mirroring can’t do better than a theoretical 33.3% Usable:Raw, and so on.
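The mirroring ceiling is simple arithmetic: with n full copies, at most 1/n of the raw capacity can ever be usable, and any spares come off the top first. A minimal Python sketch; the function name and the 5% spare fraction in the example are hypothetical.

```python
def max_usable_fraction(copies: int, spare_fraction: float = 0.0) -> float:
    """Upper bound on Usable:Raw for an n-way mirrored system.

    copies         -- number of full copies kept (2 = mirroring, 3 = triple)
    spare_fraction -- hypothetical share of raw capacity held back as spares
    """
    if copies < 1:
        raise ValueError("need at least one copy")
    # Whatever is not reserved as spare is split among identical copies.
    return (1.0 - spare_fraction) / copies


print(f"{max_usable_fraction(2):.1%}")        # mirroring
print(f"{max_usable_fraction(3):.1%}")        # triple mirroring
print(f"{max_usable_fraction(2, 0.05):.1%}")  # mirroring with 5% spares (hypothetical)
```

Real systems add further overheads (metadata, reserves, best-practice free space), so these figures are ceilings, not expectations.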
Some definitions are in order:
- Usable Capacity: How much data I can store in a system after overheads such as RAID and sparing, but before data reduction techniques.
- Raw Capacity: The sum of the capacities of all the storage media in the system. Usually a Base 10 number (TB/GB, not Base 2 TiB/GiB).
- Effective Capacity: How much data I can store in a system after data reduction techniques like Deduplication and Compression, but not Thin Provisioning.
- Data Reduction Ratio: (Effective Capacity)/(Usable Capacity)
- Effective Capacity Ratio: (Effective Capacity)/(Raw Capacity)
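The two ratios can be written directly as code. Multiplying and dividing by Usable Capacity shows they are linked: Effective Capacity Ratio = Data Reduction Ratio × (Usable/Raw). A minimal Python sketch; the function names and the 100/60/300 figures are hypothetical.

```python
def data_reduction_ratio(effective: float, usable: float) -> float:
    # (Effective Capacity) / (Usable Capacity)
    return effective / usable


def effective_capacity_ratio(effective: float, raw: float) -> float:
    # (Effective Capacity) / (Raw Capacity)
    return effective / raw


def usable_to_raw(usable: float, raw: float) -> float:
    # How efficiently raw media is turned into usable space.
    return usable / raw


# Hypothetical system; all figures must use the same unit (e.g. TiB).
raw, usable, effective = 100.0, 60.0, 300.0
drr = data_reduction_ratio(effective, usable)   # 5.0
ecr = effective_capacity_ratio(effective, raw)  # 3.0

# ECR is simply DRR scaled by the Usable:Raw ratio.
assert abs(ecr - drr * usable_to_raw(usable, raw)) < 1e-12
print(f"DRR {drr:.1f}:1, ECR {ecr:.1f}:1")
```

The ratios themselves are unitless, but only if every input is in the same unit; mixing TB and TiB skews them by about 9%.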
Who is More Efficient?
If every vendor is claiming 5:1 average savings, who is truly more efficient? A Reductio ad Absurdum example makes it pretty clear:
- A vendor that can do a Data Reduction Ratio of 5:1 but has a Usable Capacity of only 10% of Raw, or…
- A vendor that can do a Data Reduction Ratio of 5:1 and has a Usable Capacity of 70% of Raw?
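Since Effective/Raw factors into (Effective/Usable) × (Usable/Raw), the two hypothetical vendors above work out as follows:

```latex
\begin{aligned}
\text{ECR} &= \frac{\text{Effective}}{\text{Raw}}
            = \frac{\text{Effective}}{\text{Usable}} \times \frac{\text{Usable}}{\text{Raw}}
            = \text{DRR} \times \frac{\text{Usable}}{\text{Raw}} \\
\text{ECR}_{10\%} &= 5 \times 0.10 = 0.5 \\
\text{ECR}_{70\%} &= 5 \times 0.70 = 3.5
\end{aligned}
```

Both quote 5:1, yet the second vendor stores seven times as much data per unit of raw capacity, and the first stores less data than its raw capacity even after 5:1 reduction.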
Let’s put some numbers on a table. They roughly correspond to some existing storage vendors today (there may be some variation depending on whether the numbers for each line are TB vs TiB – everyone shows numbers differently – but the overall point remains the same):
As you can see, the same Data Reduction Ratio, on the same amount of Raw Capacity, can have wildly different Effective Capacity results, depending on the system.
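A comparison of this kind is easy to build yourself. The following minimal Python sketch uses hypothetical figures: 100 TB of raw capacity, the same 5:1 Data Reduction Ratio for every system, and Usable:Raw fractions taken from the examples earlier in this post (10%, the triple-mirroring and mirroring ceilings, and 70%) rather than from any specific product.

```python
RAW_TB = 100.0  # hypothetical raw capacity, identical for every system
DRR = 5.0       # same claimed Data Reduction Ratio for every system

# Hypothetical Usable:Raw fractions drawn from the examples in the text,
# not measurements of any particular product.
systems = {
    "10% usable": 0.10,
    "triple mirror (ceiling)": 1 / 3,
    "mirror (ceiling)": 0.50,
    "70% usable": 0.70,
}


def effective_tb(raw_tb: float, usable_fraction: float, drr: float) -> float:
    # Effective = Raw x (Usable/Raw) x DRR
    return raw_tb * usable_fraction * drr


print(f"{'System':<26}{'Usable TB':>10}{'Effective TB':>14}{'ECR':>6}")
for name, frac in systems.items():
    eff = effective_tb(RAW_TB, frac, DRR)
    print(f"{name:<26}{RAW_TB * frac:>10.1f}{eff:>14.1f}{eff / RAW_TB:>6.2f}")
```

With identical raw capacity and an identical 5:1 claim, Effective Capacity in this sketch ranges from 50 TB to 350 TB.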
Clearly, a truly efficient system is one that can both:
- Provide a high Usable:Raw ratio and
- Provide a high Data Reduction Ratio (that does not include fluff like Thin Provisioning)
The Business Benefits of a High Effective Capacity Ratio
There are multiple business reasons why chasing a high Effective Capacity Ratio is important:
- High rack density – some vendors are eight times denser than others (ironically, the ones claiming “white box economics” are the least dense and most expensive). This can lead to significant power/cooling and $/rack savings.
- Lower CapEx – if a vendor needs 2x the Raw Capacity vs another, guess who’s paying for the difference?
- Less complexity – denser devices mean fewer devices, fewer cables and less switching.
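The CapEx point can be quantified: the raw capacity you must buy for a given effective requirement is that requirement divided by the Effective Capacity Ratio. A minimal Python sketch, assuming a hypothetical 500 TB effective requirement and two hypothetical systems that both claim 5:1 but differ in Usable:Raw. If $/raw TB, rack space and power per raw TB are similar, cost scales with the raw figure.

```python
def raw_needed_tb(target_effective_tb: float, ecr: float) -> float:
    # Raw capacity that must be purchased to reach a target effective capacity.
    return target_effective_tb / ecr


target = 500.0                     # hypothetical effective requirement, TB
a = raw_needed_tb(target, 3.5)     # e.g. 5:1 DRR at 70% Usable:Raw
b = raw_needed_tb(target, 1.75)    # e.g. 5:1 DRR at 35% Usable:Raw
print(f"{a:.1f} TB vs {b:.1f} TB raw ({b / a:.1f}x)")
```

Halving the Usable:Raw ratio at the same Data Reduction Ratio doubles the raw capacity, and therefore the media, rack space and cabling, needed for the same job.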
What you can do as a Customer
Level the playing field. It’s actually pretty easy:
- Insist on seeing all capacity numbers expressed as TiB from every vendor – and if they don’t know what that means, run… (or at least subtract 9% from any capacity number you see, since 1 TB is only about 0.909 TiB, and you’ll be safer)
- Insist on seeing the Data Reduction Ratio without including Thin Provisioning or Snapshots. If they can’t break out the savings by category, run…
- Insist on seeing the Usable:Raw percentage for various configurations. Is it better in larger configs vs smaller? Is it following all best practices? What are the best practices for space consumption? If they tell you to ignore the man behind the curtain, run… (there is a theme developing)
- Do your own Effective Capacity Ratio calculation and assign a $/Effective TiB to each vendor
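The last step can be scripted. Below is a minimal Python sketch of the checklist above, assuming each quote gives a price, raw and usable capacity in decimal TB, and a Data Reduction Ratio that already excludes Thin Provisioning and Snapshots. The `Quote` class, vendor names and prices are hypothetical.

```python
from dataclasses import dataclass

TB_PER_TIB = 2**40 / 10**12  # 1 TiB is about 1.0995 decimal TB


@dataclass
class Quote:
    vendor: str        # hypothetical vendor label
    price_usd: float   # total quoted price (hypothetical)
    raw_tb: float      # raw capacity as quoted, decimal TB
    usable_tb: float   # usable capacity as quoted, decimal TB
    drr: float         # Data Reduction Ratio, excluding thin provisioning and snapshots

    def effective_tib(self) -> float:
        # Effective = Usable x DRR, then convert decimal TB to binary TiB.
        return self.usable_tb * self.drr / TB_PER_TIB

    def ecr(self) -> float:
        # Effective Capacity Ratio (unitless; both inputs in decimal TB).
        return self.usable_tb * self.drr / self.raw_tb

    def usd_per_effective_tib(self) -> float:
        return self.price_usd / self.effective_tib()


quotes = [
    Quote("Vendor A", 250_000, raw_tb=100, usable_tb=70, drr=5.0),
    Quote("Vendor B", 200_000, raw_tb=100, usable_tb=35, drr=5.0),
]

# Cheapest per effective TiB first.
for q in sorted(quotes, key=Quote.usd_per_effective_tib):
    print(f"{q.vendor}: ECR {q.ecr():.2f}, {q.effective_tib():.1f} TiB effective, "
          f"${q.usd_per_effective_tib():,.0f}/effective TiB")
```

In this hypothetical comparison Vendor B has the lower sticker price but, with half the Usable:Raw ratio, ends up costing noticeably more per effective TiB.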