Should your backups to disk consume more disk than you use for production? Seriously?

So, let’s talk about this not-so-hypothetical customer… They have:

  • A few sites
  • A lot of data per site
  • Much of the data is DBs and Multimedia
  • No replication currently
  • Can’t back up everything currently
  • No proper DR
  • Fairly significant rate of change
  • Not the fastest pipes between sites

They asked me to propose a solution that will back everything up and cross-replicate the backups between the sites. They want to move as far away from tape as possible.

After much deliberation and examination of the data and requirements, we concluded that even with various kinds of dedupe (I sized the solution using best practices for the usual suspects), backing everything up within their requirements would take about 3x their total production space. The culprits are the rate of change and the large amount of data with poor undedupability (that can’t possibly be a word). And yes, that 3x figure already includes dedupe!
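To make that kind of sizing concrete, here is a rough back-of-the-envelope sketch in Python. To be clear: the function name and every number in it (change rate, retention, dedupe ratio) are made-up assumptions for illustration, not this customer’s actual figures or any vendor’s sizing method. It only shows how a high change rate combined with a poor dedupe ratio can push the backup footprint well past the production footprint.

```python
def backup_disk_needed(production_tb, daily_change_rate, retention_days,
                       full_copies, dedupe_ratio, replicated_copies=2):
    """Rough estimate (in TB) of the disk needed for backups to disk.

    Every input is an illustrative assumption:
      production_tb     -- size of the production data
      daily_change_rate -- fraction of data changing per day (0.05 = 5%)
      retention_days    -- number of daily incrementals kept
      full_copies       -- number of full backups kept
      dedupe_ratio      -- 2.0 means the data shrinks to half its size
      replicated_copies -- 2 = backups are also cross-replicated to another site
    """
    # Logical (pre-dedupe) backup data: fulls plus retained daily changes.
    logical_tb = production_tb * (full_copies + daily_change_rate * retention_days)
    # What actually lands on disk after dedupe.
    stored_tb = logical_tb / dedupe_ratio
    # Cross-replication means every stored backup exists more than once.
    return stored_tb * replicated_copies


# Hypothetical inputs: 100 TB production, 5% daily change, 40 days retained,
# one full, a modest 2:1 dedupe ratio, backups cross-replicated once.
needed_tb = backup_disk_needed(100, 0.05, 40, 1, 2.0)
ratio_to_production = needed_tb / 100
```

With these invented inputs the backup disk comes out at roughly three times the production space. Nudge the dedupe ratio or change rate and the multiplier moves a lot, which is exactly why poorly deduping data hurts so much.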

So, we declined to propose a solution. I want to sell something as much as the next guy but primarily I want repeat customers and the only way to get a happy repeat customer is to not screw him the first time… And selling them 3x the space only for backups doesn’t make too much sense to me when they could be spending their money much more wisely.

I explained how it doesn’t make sense to spend that kind of money on disk that’s just for backups! After all, backups are a last resort. My list of preferred methods for recovery (from best to worst):

  1. Local and remote replication + application-aware snapshots
  2. Backups to disk
  3. Backups to tape
  4. Snot, a claw hammer, duct tape and baling wire (sometimes actually works better than tape but anyway…)

Wouldn’t it be a slightly better idea to use maybe 2x the disk, possibly even spend less money compared to the backup-only solution, and instead:

  • Cross-replicate the production data for rapid recovery
  • Achieve full local and remote DR
  • Be able to go back in time with snapshots both locally and remotely
  • Replicate the snapshots themselves automatically
  • Still get dedupe but this time on primary storage (make the current storage last longer)
  • Not need a forklift upgrade (investment protection)
  • Reduce or eliminate tape and reliance on the backup software
  • Get even longer retention than with backups to disk
  • Avoid pipe upgrades
  • Drastically simplify administration
  • Potentially save millions over the next few years!

We’ll see what they decide to do. There was tremendous resistance to what I and a horde of seasoned engineers believe is the proper solution, with all kinds of very reasonable excuses being voiced (“we have no time, no resources, the stakeholders don’t care” etc). However, my position on this is clear. Yes, there’s more short-term pain in transforming the infrastructure into the utopian vision of the bullets above, but the long-term gains are staggering!

I’ll let everyone know what happened the moment I hear. This one is really interesting…

[update: they opted for a deduping backup appliance that didn’t scale nearly enough, so they had to scrap millions of dollars’ worth of gear]



One Reply to “Should your backups to disk consume more disk than you use for production? Seriously?”

  1. D-

    Sounds like this customer is trying to lump their DR requirements together with their backup solution. It is probably difficult for them to do proper replication on their front-line disk arrays due to cost or lack of features, so the task has fallen to the backup team. Once the problem has reached this point, it is a slippery slope: costs, speeds & feeds, disk space, dedupe, replication — all have to be beefed up to accommodate the sheer amount of data.

    This problem should be tackled earlier up the chain when the data is created in production. Use a disk solution that dedupes the production data and replicates only what’s needed for business recovery to the DR site. Even better if it replicates only the changed blocks, and the data stays deduped across the WAN and when it lands at the DR site.

    For backups, the only thing better than dedupe technology on your backup device is to not duplicate the data in the first place! “Non-duplicating backup technology.” Some vendors are able to back up their primary systems to a disk-based target using block-level infinite-incremental technology. After the initial baseline transfer, all subsequent backups only transfer the blocks that have changed since the last backup. In other words, it never sends duplicate data. And the data is readable on the disk target by any Unix or Windows browser – it’s not in some proprietary format that needs to be retrieved via the backup agent. This is much more efficient than doing daily “thick” backups to a giant disk-based sausage grinder that must boil out the common data, then manage it downstream.
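The “baseline once, then changed blocks only” idea can be sketched in a few lines of Python. This is a toy illustration, not how any particular vendor does it: the block size, function names and the use of SHA-256 hashes to spot changed blocks are all assumptions made to keep the example self-contained. Real products may well track changed blocks through snapshot metadata instead of rehashing the data.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, purely illustrative


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash every fixed-size block; stands in for the state kept after a backup."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(previous_hashes, data, block_size=BLOCK_SIZE):
    """Return {block_index: block_bytes} for blocks that differ from the last backup."""
    changed = {}
    for index, offset in enumerate(range(0, len(data), block_size)):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        # New blocks (past the end of the old data) and modified blocks are sent.
        if index >= len(previous_hashes) or previous_hashes[index] != digest:
            changed[index] = block
    return changed


# Baseline: three blocks are transferred once.
baseline = b"A" * BLOCK_SIZE * 3
baseline_hashes = block_hashes(baseline)

# Later, only the middle block is modified in production.
modified = baseline[:BLOCK_SIZE] + b"B" * BLOCK_SIZE + baseline[2 * BLOCK_SIZE:]
delta = changed_blocks(baseline_hashes, modified)
```

Here `delta` holds just the one block that changed, so that is all that would cross the wire on the next backup; an unchanged dataset yields an empty delta.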

    Alright, enough. Just my .02. Good luck with this customer.
