In modern storage devices (especially All Flash Arrays), extensive data reduction techniques are commonplace and expected by customers.

This has, unavoidably, led to various marketing schemes that aim to make certain systems seem more appealing than the rest. Or at least not *less* appealing…

I will attempt to explain what customers should be looking for when trying to decipher capacity claims from a manufacturer.

In a nutshell – and for the ADD-afflicted – **the most important number you should be looking for is the Effective Capacity Ratio, which is simply: (Effective Capacity)/(Raw Capacity)**. *Ignore* the more common but far less useful Data Reduction Ratio, which is: (Effective Capacity)/(Usable Capacity).

Read on for more detail…

## The problem

Most modern (and some not so modern) storage vendors claim a similar Data Reduction Ratio for their devices. Numbers like 5:1 seem commonplace these days. Which, to anyone with a modicum of common sense, simply means “I can store five times more stuff in the same space”. Unfortunately, the same 5:1 ratio does *not* mean the same end result for all vendors…

### *Why* this is a problem

Customers simply want to get a good deal for their money. The reality is that the exact same 5:1 Data Reduction Ratio may mean *very* different things between storage devices.

For instance: Not all vendors count space savings using the same math. Is the virtual size of snapshots calculated in the final savings ratio? (for example, if I take 5 snaps of a 1TB volume one after another, is that showing up as 6:1 savings?)
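To make the snapshot question concrete, here is a minimal Python sketch of the 5-snapshot example. It assumes the snapshots are taken back to back with no changes in between, so they share every block with the parent volume and consume no extra physical space. The function name is purely illustrative and is not any vendor's reporting API.

```python
def snapshot_inflated_ratio(volume_tb: float, snapshots: int) -> float:
    """Savings ratio if each snapshot's virtual size is counted as stored data.

    Assumption: the snapshots share all blocks with the volume (nothing
    changed between them), so physical consumption stays at volume_tb.
    """
    logical_tb = volume_tb * (1 + snapshots)  # volume plus each snapshot's virtual size
    physical_tb = volume_tb                   # nothing new was actually written
    return logical_tb / physical_tb


ratio = snapshot_inflated_ratio(volume_tb=1.0, snapshots=5)
print(f"Reported savings: {ratio:.0f}:1")
```

Nothing was compressed or deduplicated here, yet the reported figure comes out at 6:1.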

How about Thin Provisioning? Is that included? Such math can *wildly* alter the overall efficiency numbers.

Here’s a clear example of why counting thin provisioning in the overall ratio is probably misleading, and best used only for educational purposes… *Remember that overall savings ratios are multiplicative*. For example, 2:1 compression and 2:1 deduplication mean 4:1 overall savings:

Does anyone *really* believe that such a system is actually providing *1000:1* savings? Or even 10:1? **Yet that’s what many vendors are counting towards savings ratios, without separating the thin provisioning savings from the overall savings number**.
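A rough Python sketch of how the multiplication plays out. The 2:1 compression and 2:1 deduplication figures come from the text; the provisioned and written sizes are hypothetical numbers chosen only to show how quickly thin provisioning inflates the headline ratio.

```python
from math import prod


def overall_ratio(*factors: float) -> float:
    """Multiply independent savings ratios (each given as N for N:1)."""
    return prod(factors)


# From the text: 2:1 compression and 2:1 deduplication give 4:1 overall.
data_reduction = overall_ratio(2.0, 2.0)

# Hypothetical: 250 TB provisioned to hosts, only 1 TB actually written.
provisioned_tb = 250.0
written_tb = 1.0
thin_ratio = provisioned_tb / written_tb

with_thin = overall_ratio(data_reduction, thin_ratio)

print(f"Data reduction only:         {data_reduction:.0f}:1")
print(f"Including thin provisioning: {with_thin:.0f}:1")
```

The only number that says anything about the data actually stored is the first one.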

However, there is *another* dimension, and it has to do with **how efficiently the raw capacity is actually utilized**. It will be one of my many *Captain Obvious* moments for some of you, but I’ve seen enough people get confused, so it’s worth explaining.

## The Effective Capacity Ratio

Different storage systems have different ways of utilizing their raw capacity. For example, a mirrored system can never have better than a 50% Usable:Raw ratio. By definition, it’s mathematically *impossible* since 2 copies are needed. That’s not even counting spares and other possible overheads.

Systems that do triple mirroring can’t do better than a theoretical 33.3% Usable:Raw etc.
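The mirroring ceiling is simple arithmetic, sketched here in Python. The helper name is made up for illustration, and it deliberately ignores spares and other overheads, which only push the real number lower.

```python
def max_usable_to_raw(copies: int) -> float:
    """Upper bound on Usable:Raw when every block is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 / copies


for copies in (2, 3):
    print(f"{copies} copies: at most {max_usable_to_raw(copies):.1%} Usable:Raw")
```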

### Some definitions are in order:

- *Usable Capacity*: How much data I can store in a system *after* overheads such as RAID, sparing etc. but *before* data reduction techniques.
- *Raw Capacity*: Add up the capacity of all the storage media in the system. Usually a Base 10 number (TB/GB, not Base 2, TiB/GiB)
- *Effective Capacity*: How much data I can store in a system *after* data reduction techniques like Deduplication and Compression but *not* Thin Provisioning
- *Data Reduction Ratio*: (Effective Capacity)/(Usable Capacity)
- **Effective Capacity Ratio**: (Effective Capacity)/(Raw Capacity)
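These definitions can be expressed as a small Python sketch. The class and the sample figures are hypothetical, and all three capacities must be in the same unit. It also shows that the Effective Capacity Ratio is just the Usable:Raw fraction multiplied by the Data Reduction Ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capacity:
    raw: float        # sum of all storage media in the system
    usable: float     # after RAID, sparing and other overheads
    effective: float  # after dedupe/compression, excluding thin provisioning

    @property
    def usable_to_raw(self) -> float:
        return self.usable / self.raw

    @property
    def data_reduction_ratio(self) -> float:
        return self.effective / self.usable

    @property
    def effective_capacity_ratio(self) -> float:
        return self.effective / self.raw


# Hypothetical system: 100 TiB raw, 60 TiB usable, 300 TiB effective.
system = Capacity(raw=100.0, usable=60.0, effective=300.0)

print(f"Usable:Raw               {system.usable_to_raw:.0%}")
print(f"Data Reduction Ratio     {system.data_reduction_ratio:.1f}:1")
print(f"Effective Capacity Ratio {system.effective_capacity_ratio:.1f}:1")
```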

### Who is *More* Efficient?

If every vendor is claiming 5:1 average savings, who is *truly* more efficient? A *Reductio ad Absurdum* example makes it pretty clear:

- A vendor that can do a Data Reduction Ratio of 5:1 but has a Usable Capacity of **10%** vs Raw or…
- A vendor that can do a Data Reduction Ratio of 5:1 but has a Usable Capacity of **70%** vs Raw?
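Working the two cases through in Python makes the gap obvious. The 5:1 ratio and the 10% and 70% Usable:Raw figures are the ones above; the 100 TB of raw capacity is an arbitrary assumption so there is something to multiply.

```python
def effective_capacity(raw_tb: float, usable_to_raw: float, data_reduction_ratio: float) -> float:
    """Effective capacity = raw x (Usable:Raw fraction) x Data Reduction Ratio."""
    return raw_tb * usable_to_raw * data_reduction_ratio


RAW_TB = 100.0  # hypothetical, identical for both vendors
DRR = 5.0       # the same 5:1 claim from both vendors

results = {frac: effective_capacity(RAW_TB, frac, DRR) for frac in (0.10, 0.70)}

for frac, eff_tb in results.items():
    print(f"Usable {frac:.0%} of raw: {eff_tb:.0f} TB effective, "
          f"Effective Capacity Ratio {eff_tb / RAW_TB:.1f}:1")
```

Same raw capacity, same 5:1 claim, yet one system stores about 50 TB and the other about 350 TB.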

Let’s put some numbers on a table. They roughly correspond to some existing storage vendors today (there may be some variation depending on whether the numbers for each line are TB vs TiB – everyone shows numbers differently – but the overall point remains the same):

As you can see, the *same* Data Reduction Ratio, on the *same* amount of Raw Capacity, can have *wildly* different Effective Capacity results, depending on the system.

Clearly, a truly efficient system is one that can both:

- Provide a high Usable:Raw ratio *and*
- Provide a high Data Reduction Ratio (that does *not* include fluff like Thin Provisioning)

## The Business Benefits of a High Effective Capacity Ratio

There are multiple business reasons why chasing a high Effective Capacity Ratio is important:

- High rack density. Some vendors are *eight times* more dense than others (ironically, the ones claiming “white box economics” are the *least* dense and most expensive). This could lead to *significant* power/cooling and $/rack savings
- Lower CapEx – if a vendor needs 2x the Raw Capacity vs another, guess who’s paying for the difference?
- Less complexity – denser devices mean *fewer* devices, fewer cables, less switching

## What *you* can do as a Customer

*Level the playing field*. It’s actually pretty easy:

- Insist on seeing all capacity numbers expressed as *TiB* from every vendor – and if they don’t know what *that* means, *run*… (or at least subtract 9% from any capacity number you see and you’ll be safer)
- Insist on seeing the Data Reduction Ratio *without* including Thin Provisioning or Snapshots. If they can’t break out the savings by category, *run*…
- Insist on seeing the Usable:Raw percentage for various configurations. Is it better in larger configs vs smaller? Is it following all best practices? What *are* the best practices for space consumption? If they tell you to ignore the man behind the curtain, *run*… (there is a theme developing)
- **Do your own Effective Capacity Ratio calculation** and assign a $/Effective TiB to each vendor
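If you want to run the comparison yourself, something like this Python sketch is enough. The vendor names, prices and percentages are entirely made up for illustration; plug in the figures from your own quotes. The TB to TiB conversion is also where the "subtract 9%" rule of thumb comes from.

```python
TB_BYTES = 10**12   # base 10 terabyte
TIB_BYTES = 2**40   # base 2 tebibyte


def tb_to_tib(tb: float) -> float:
    """Convert base 10 TB to base 2 TiB (1 TB is roughly 0.91 TiB)."""
    return tb * TB_BYTES / TIB_BYTES


def cost_per_effective_tib(price: float, raw_tb: float,
                           usable_to_raw: float, data_reduction_ratio: float) -> float:
    """Price divided by effective capacity, with capacity expressed in TiB."""
    effective_tib = tb_to_tib(raw_tb) * usable_to_raw * data_reduction_ratio
    return price / effective_tib


# Hypothetical quotes: (price in $, raw TB, Usable:Raw, Data Reduction Ratio
# excluding thin provisioning and snapshots).
quotes = {
    "Vendor A": (500_000, 100.0, 0.70, 5.0),
    "Vendor B": (400_000, 100.0, 0.40, 5.0),
}

costs = {name: cost_per_effective_tib(*q) for name, q in quotes.items()}

for name, cost in costs.items():
    print(f"{name}: ${cost:,.0f} per effective TiB")
```

In this made-up case the vendor with the lower sticker price ends up costing more per effective TiB, which is exactly the kind of thing the headline numbers hide.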

D
