I want to arm you with the knowledge needed to properly navigate storage efficiency guarantee contracts and arrive at a safe system sizing, with reasonable assumptions.
This is another of my generic, vendor-neutral posts aimed at helping the audience be aware of certain important things that I often see overlooked.
How does one navigate the small print around storage data reduction guarantees? What are you entitled to if the vendor misses the mark? And how do you minimize your risk when faced with certain sales teams that are determined to win even if it means huge customer risk?
Do you know your data makeup? And how it may affect a capacity guarantee? But, more importantly, how it will affect your overall efficiency?
Let’s start with a nice reductio ad absurdum example to illustrate what I mean.
Imagine you have 500TB of stuff to store.
A certain vendor tells you “we will guarantee you’ll get 5:1” and, without asking any more questions, sizes your solution at 100TB usable. It’s the best price of all the bids, by a rather large margin. Impressed, you buy the system and have a celebratory dinner, since you saved so much money.
You proceed to store your stuff but discover you can’t: the array quickly fills up. As it eventually turns out, all you’re storing is videos.
You ask the vendor to honor their 5:1 guarantee.
Do you honestly think the vendor will give you an extra 400TB usable for free? When you initially bought 100TB?
Let’s go over how all this works.
Data Reduction is Science
Compressing and deduping data isn’t voodoo, it’s just math. Some data can be reduced in size more than others, and some reduction approaches are better than others, but, in general, if a piece of data can truly be compressed at 5:1, it’s not like one solution will do 5:1, another 10:1 and a third 1.5:1.
However, there are lots of things that can’t generally be non-destructively reduced in size, like:
- MP3, AAC, or other similar perceptually encoded audio
- Videos
- Compressed files of any type (including compressed DBs or backups)
- Encrypted data (for example, encrypted DBs)
- In general, any data the host server has already compressed or encrypted. For example, turning on filesystem compression means EVERY file you store will be compressed by the OS itself and sent to the storage system as already compressed data.
The amount of this non-reducible data will vastly affect the overall data reduction ratio the storage system can achieve.
If you’re unsure what any of this means, try zipping a text file on your PC. Compare the size before and after compression. Then try to zip the zipped file. Compare the size of the two zipped files… 🙂
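The zip experiment can also be scripted. Below is a minimal Python sketch using the standard-library `zlib` module (DEFLATE, the same algorithm most zip tools use by default). The sample text is synthetic and hypothetical, built from a small repetitive vocabulary to stand in for a highly reducible file:

```python
import random
import zlib


def ratio(original: bytes, compressed: bytes) -> float:
    """Data reduction ratio: original size divided by stored size."""
    return len(original) / len(compressed)


# Hypothetical sample data: words drawn from a small vocabulary, standing in
# for a highly reducible file such as a log or a document.
rng = random.Random(42)
vocab = ["storage", "array", "capacity", "guarantee",
         "vendor", "ratio", "data", "sizing"]
text = " ".join(rng.choice(vocab) for _ in range(20_000)).encode("ascii")

once = zlib.compress(text)    # first pass: like zipping the text file
twice = zlib.compress(once)   # second pass: like zipping the zip file

first_pass = ratio(text, once)
second_pass = ratio(once, twice)

print(f"original:     {len(text):>7} bytes")
print(f"zipped once:  {len(once):>7} bytes ({first_pass:.2f}:1)")
print(f"zipped twice: {len(twice):>7} bytes ({second_pass:.2f}:1)")
```

Exact byte counts will vary, but the second pass should land at roughly 1:1, possibly a few bytes larger than its input. That is what a storage array sees when the host sends it data that is already compressed.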
Non-Reducible Data vs Efficiency Ratio
Check out this table showing how non-reducible data affects overall data reduction ratios.
If your reducible data can be reduced 5:1, and half of your overall data is non-reducible, your final reduction ratio will be 1.67:1.
Even a relatively small percentage of non-reducible data like 20% is enough to massively erode data reduction ratios:
| Reducible Data DRR | 10% Non-Reducible | 20% Non-Reducible | 30% Non-Reducible | 40% Non-Reducible | 50% Non-Reducible |
|---|---|---|---|---|---|
| Reducible at 3:1 | 2.5:1 | 2.14:1 | 1.88:1 | 1.67:1 | 1.50:1 |
| Reducible at 5:1 | 3.57:1 | 2.78:1 | 2.27:1 | 1.92:1 | 1.67:1 |
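The table follows from a simple weighted calculation: each TB of non-reducible data is stored at 1:1, while each TB of reducible data takes 1/r TB, where r is the reducible ratio. Here is a minimal Python sketch that regenerates the rows above. The function name `overall_drr` is just an illustrative choice:

```python
def overall_drr(reducible_ratio: float, non_reducible_fraction: float) -> float:
    """Overall data reduction ratio for a mix of reducible and non-reducible data.

    Per unit of logical data, the non-reducible share is stored at 1:1 and the
    reducible share shrinks by `reducible_ratio`, so the stored size is
    non_reducible_fraction + (1 - non_reducible_fraction) / reducible_ratio.
    """
    stored = non_reducible_fraction + (1 - non_reducible_fraction) / reducible_ratio
    return 1 / stored


fractions = [0.1, 0.2, 0.3, 0.4, 0.5]
for r in (3, 5):
    row = " | ".join(f"{overall_drr(r, f):.2f}:1" for f in fractions)
    print(f"Reducible at {r}:1 | {row}")
```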
Why is this important? Because the overall ratio is what determines your sizing, not just the reducible ratio. Underestimating the amount of non-reducible data can hugely affect storage sizing.
What if Your Reducible Data Reduces by a Lot?
Back to the original scenario – slightly modified. You have 500TB of stuff, you were sized for 100TB usable. This time, 400TB of it is videos. 100TB is highly reducible and your array manages to shrink the reducible stuff at a bit over 5:1. Your total capacity requirement is 420TB of usable space.
Your overall reduction is still abysmal – at 80% non-reducible, you are at about 1.2:1 data reduction!
How much capacity do you think the vendor will give you for free?
If you guessed zero, you’re right, since the contract covers the reducible data ratio, and the vendor achieved that.
Non-reducible data is not a part of such guarantees unless explicitly factored into the initial sizing and shown in writing.
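Plugging the second scenario into the arithmetic shows why the guarantee is met on paper while the sizing is still badly wrong. This is a small sketch, assuming the guarantee is measured on reducible data only, as in the contract described above, and treating “a bit over 5:1” as exactly 5:1:

```python
# Scenario figures from the example above (TB of logical data).
video_tb = 400.0            # non-reducible: stored at 1:1
reducible_tb = 100.0        # highly reducible data
achieved_ratio = 5.0        # what the array achieves on the reducible data
guaranteed_ratio = 5.0      # contract ratio, applied to reducible data only
purchased_usable_tb = 100.0

required_usable_tb = video_tb + reducible_tb / achieved_ratio
overall_ratio = (video_tb + reducible_tb) / required_usable_tb
guarantee_met = achieved_ratio >= guaranteed_ratio

print(f"Required usable capacity: {required_usable_tb:.0f} TB")    # 420 TB
print(f"Overall reduction: {overall_ratio:.2f}:1")                 # 1.19:1
print(f"Guarantee met on reducible data: {guarantee_met}")         # True
print(f"Shortfall not covered by the guarantee: "
      f"{required_usable_tb - purchased_usable_tb:.0f} TB")        # 320 TB
```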
What’s the Maximum Free Capacity You’re Entitled to?
Depending on the vendor contract, there may be a limit to how much capacity you may get as part of a remediation.
For example, one contract may state you will get up to the same amount as you originally purchased, another may state you will get 50%.
Let’s go with a different example this time, one that’s the worst-case scenario for the vendor, not for you:
Again you have 500TB of stuff (to stick to the same theme). Once more, you are sized for 100TB usable.
This time you get 2:1 overall reduction, but array telemetry analysis shows you aren’t storing ANY incompressible data. Your data just doesn’t reduce more than 2:1. So clearly you’re not in breach of anything here: the vendor promised 5:1, you are getting 2:1, and all your data shows as reducible, so it’s all on the vendor to remediate.
Well, that’s where the small print comes into play.
This particular vendor contract has the stipulation that, at most, you are entitled to 50% of the originally sized capacity as remediation.
So let’s do some math:
- You have 100TB usable but
- You need 250TB usable since your 500TB of data only reduces by 2:1 but
- Because of the 50% maximum remediation stipulation, at most you can get another 50TB for free (50% of 100TB initial capacity is 50TB).
- You’re still short 100TB usable space!
Rest assured, you will promptly get an invoice for 100TB usable space, at whatever cost that vendor deems appropriate, which may or may not bear any resemblance to what you originally paid for the first 100TB.
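The remediation math can be captured in a few lines. Below is a minimal sketch, assuming, as in this example, that all of the data counts toward the guarantee and that the cap is expressed as a share of the originally purchased usable capacity. The function name is hypothetical:

```python
def remediation_shortfall(logical_tb: float, actual_ratio: float,
                          purchased_usable_tb: float, cap_fraction: float) -> dict:
    """Split the capacity gap into free remediation and what is left to buy.

    cap_fraction is the contract's maximum remediation as a share of the
    originally purchased usable capacity (e.g. 0.5 for "up to 50%").
    """
    needed_tb = logical_tb / actual_ratio
    gap_tb = max(0.0, needed_tb - purchased_usable_tb)
    free_tb = min(gap_tb, cap_fraction * purchased_usable_tb)
    return {
        "needed_tb": needed_tb,
        "gap_tb": gap_tb,
        "free_tb": free_tb,
        "still_short_tb": gap_tb - free_tb,
    }


print(remediation_shortfall(500, 2.0, 100, 0.5))
# {'needed_tb': 250.0, 'gap_tb': 150.0, 'free_tb': 50.0, 'still_short_tb': 100.0}
```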
Other Sizing Considerations
How much free space will you have after migration? I’ve seen sizings that would result in a completely full system even if the array really could overall reduce the data as promised and nothing was out of the ordinary.
Are you sure you want zero headroom?
Does the vendor sizer add any safety buffer to the solution?
Overall, though, the biggest culprit for a risky initial sizing is a poor estimate of how much non-reducible data you have (or, worse, assuming there is none). By now, I trust you’ve been sufficiently warned about how important this is.
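Putting the non-reducible estimate and a free-space buffer together, a conservative sizing check might look like the sketch below. The function name, the 20% headroom target and the sample inputs are hypothetical values chosen for illustration, not a recommendation from any vendor sizer:

```python
def usable_capacity_needed(logical_tb: float, non_reducible_fraction: float,
                           reducible_ratio: float, headroom_fraction: float) -> float:
    """Usable TB to buy so the migrated data leaves `headroom_fraction` free."""
    non_reducible_tb = logical_tb * non_reducible_fraction
    reducible_tb = logical_tb - non_reducible_tb
    stored_tb = non_reducible_tb + reducible_tb / reducible_ratio
    # The stored data may only occupy (1 - headroom) of the usable capacity.
    return stored_tb / (1 - headroom_fraction)


# Hypothetical inputs: 500 TB logical, reducible data at 5:1, 20% headroom.
for nr in (0.0, 0.2, 0.4):
    needed = usable_capacity_needed(500, nr, 5.0, 0.2)
    print(f"{nr:.0%} non-reducible -> {needed:.0f} TB usable")
```

Note that the 100TB from the opening example only holds up with zero non-reducible data and zero headroom.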
Yesterday’s Technology Today! (RAID 5)
Is the vendor trying to cut even more corners by using single parity protection? It’s depressing we still need to have this discussion in 2024, but in modern systems with huge amounts of global metadata, large single capacity pools and increasing drive sizes, single parity RAID is… contra-indicated 🙂
A data loss problem won’t affect just a LUN or two like in the days of old; rather, you’ll lose your entire system.
Don’t risk it just to gain a bit of space back. Stick to a minimum of dual parity.
Read this for more info.
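To put “a bit of space” in perspective, here is the raw-to-usable arithmetic for a single RAID group, ignoring spares, metadata and other overheads. The 24-drive group width is a hypothetical example; real layouts vary by vendor:

```python
def parity_efficiency(drives_in_group: int, parity_drives: int) -> float:
    """Fraction of a RAID group's raw capacity left for data."""
    return (drives_in_group - parity_drives) / drives_in_group


width = 24                               # hypothetical RAID group width
single = parity_efficiency(width, 1)     # single parity (RAID 5 style)
dual = parity_efficiency(width, 2)       # dual parity (RAID 6 style)
print(f"single parity: {single:.1%}, dual parity: {dual:.1%}, "
      f"difference: {single - dual:.1%} of raw capacity")
# single parity: 95.8%, dual parity: 91.7%, difference: 4.2% of raw capacity
```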
How to Protect Yourself
All this isn’t fear-mongering. I’m just trying to illustrate that it’s important to know the rules. The rules aren’t there to be broken, but to protect both the customer and the storage vendor.
To summarize:
- Understand Data Types: Know which types of your data are non-reducible. Know if your DBAs are compressing and/or encrypting DBs. Have some idea of your data classification. Know if your admins are using filesystem compression or host-level encryption. This is never a bad idea.
- Know What the Ratio Covers: If you’re being given a ratio guarantee, is that ratio on overall savings, or just on the reducible data?
- Read the Fine Print: Carefully review the vendor’s capacity guarantee terms.
- Get It in Writing: Ensure all agreements are documented, including specific data types, non-reducible assumptions and remediation limits.
- Don’t Cut Corners on Mission-Critical Systems: Ensure you have some free space after migration, and that you’re using strong RAID protection.
And always remember that if there is a huge difference in sizing between vendors, you may want to double check why.
“The bitterness of poor quality remains long after the sweetness of low price is forgotten.” – Benjamin Franklin
D



Very well explained and reasoned, as always, Dimitris.
One category of non-compressible or nearly non-compressible data that often gets overlooked is compiled code. All systems have executables, and the actual amount of compiled code varies hugely across OS types, VMs, containers, and vendor libraries (their installation folders). And these sneak into the darndest places, as ISVs and OS providers drop files with very little consistency in nooks and crannies of your OS.
Make sure your customers and admins who say “we don’t have any of those audio or video or encrypted files, no worries” think about all their compiled code and files.