Before all the variable-block aficionados go up in arms, I freely admit variable-block deduplication may overall squeeze more dedupe out of your data.
I won’t go into a laborious explanation of variable vs fixed, but, in a nutshell, fixed-block deduplication means that data is split into equal chunks, each chunk given a signature, compared to a DB and the common chunks are not stored.
Variable-block basically means the chunk size is variable, with more intelligent algorithms also having a sliding window, so that even if the content in a file is shifted, the commonality will still be discovered.
With that out of the way, let’s get to the FUD part of the post.
I recently had a TLA vendor tell my customer: “NetApp deduplication is fixed-block vs our variable-block, therefore far less efficient, therefore you must be sick in the head to consider buying that stuff for primary storage!”
This is a very good example of FUD that is based on accurate facts which, in addition, focuses the customer’s mind on the tech nitty-gritty and away from the big picture (that being “primary storage” in this case).
Using the argument for a pure backup solution is actually valid. But what if the customer is not just shopping for a backup solution? Or, what if, for the same money, they could have it all?
My question is: Why do we use deduplication?
At the most basic level, deduplication will reduce the amount of data stored on a medium, enabling you to buy less of said medium yet still store quite a bit of data.
So, backups were the most obvious place to deploy deduplication. Backup-to-Disk is all the rage, what if you can store more backups on target disk with less gear? That’s pretty compelling. In that space you have of course Data Domain and the Quantum DXi as the two of the more usual backup target suspects.
Another reason to deduplicate is to not only achieve more storage efficiency but also improve backup times by not even transferring over the network data that’s already been transferred. In that space there’s Avamar, PureDisk, Asigra, Evault and others.
NetApp simply came up with a few more reasons to deduplicate, not mutually exclusive with the other 2 use cases above:
- What if you could deduplicate your primary storage â€“ typically the most expensive part of any storage investment â€“ and as a result buy less?
- What if deduplication could actually dramatically improve your performance in some cases, while not hindering it in most cases? (the cache is deduplicated as well, more info later).
- What if deduplication was not limited to infrequently-accessed data but, instead, could be used for high-performance access?
For the uninitiated, NetApp is the only vendor, to date, that can offer block-level deduplication for all primary storage protocols for production data – block and file, FC, iSCSI, CIFS, NFS.
Which is a pretty big deal, as is anything useful AND exclusive.
What the FUD carefully fails to mention is that:
- Deduplication is free to all NetApp customers (whoever didn’t have it before can get it via a firmware upgrade for free)
- NetApp customers that use this free technology see primary storage savings that I’ve seen range anywhere from 10% to 95%, despite all the limitations the FUD-slingers keep mentioning
- It works amazingly well with virtualization and actually greatly speeds things up especially for VDI
- Things that would defeat NetApp dedupe will also defeat the other vendors’ dedupe (movies, compressed images, large DBs with a lot of block shuffling). There is no magic.
So, if a customer is considering a new primary storage system, like it or not, NetApp is the only game in town with deduplication across all storage protocols.
Which brings us back to whether fixed-block is less efficient than variable-block:
WHO CARES? If, even with whatever limitations it may have, NetApp dedupe can reduce your primary storage footprint by any decent percentage, you’re already ahead! Heck, even 20% savings can mean a lot of money in a large primary storage system!
Not bad for a technology given away with every NetApp systemâ€¦