I wonder when dedup will make it to the arrays

Anyone else feel that deduplication won’t find its final resting place in backups and WAN accelerators?

It’s only a matter of time before the algorithms are offered as an option running directly on the array processors.

Of course, that means fewer disk sales, but also bigger/faster/more expensive processors.

Replication will also become more efficient. Look at EMC’s recent acquisition of Kashya (now RecoverPoint): one of its functions is dedup during replication from array to array. How long do you think it will take them to move that functionality onto the array processors?
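
To make the replication point concrete, here is a minimal sketch of why dedup-aware replication ships less data. The fingerprint-exchange scheme below is an assumption for illustration only, not how RecoverPoint actually works:

```python
import hashlib

def replicate(source_chunks, target_fingerprints):
    """Ship only the chunks the target array doesn't already hold (hypothetical scheme).

    source_chunks:       iterable of bytes objects (blocks queued for replication)
    target_fingerprints: set of fingerprints the target side already has
    Returns (chunks_to_send, bytes_saved).
    """
    to_send, saved = [], 0
    for chunk in source_chunks:
        fp = hashlib.sha256(chunk).hexdigest()   # content fingerprint
        if fp in target_fingerprints:
            saved += len(chunk)                  # duplicate: only the fingerprint crosses the wire
        else:
            to_send.append(chunk)
            target_fingerprints.add(fp)          # target will hold it after this pass
    return to_send, saved

# Toy usage: the third block repeats the first, so it never gets resent.
blocks = [b"alpha" * 1000, b"beta" * 1000, b"alpha" * 1000]
payload, saved = replicate(blocks, set())
print(len(payload), "chunks shipped,", saved, "bytes not resent")
```

Duplicate chunks cost only a fingerprint comparison instead of a full block transfer, which is where the bandwidth savings come from.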

Just some random thoughts…

D

3 Replies to “I wonder when dedup will make it to the arrays”

  1. You are asking two questions:

    1: Are backups/WAN accelerators the ultimate endpoint for de-dupe?

    2: When will de-dupe be put into the array processors?

    For #1, backups and WAN accelerators are vastly better targets than other applications because of their data profile. De-duplication is useful when you are seeing the same data over and over again, and when you do your weekly full backup you’re going to get high duplication levels (see the sketch after this reply). Other applications, structured data or not, do not have such high levels of repetition, so they only get limited benefit from de-dupe; the pain of bringing in a new system isn’t worth it, and the ROI just isn’t there.

    For #2, there are already disk systems with integrated de-dupe. Take a look at the NAS approach of Data Domain or NEC’s HydraStor: both have de-dupe built in, and the Hifn Express DR 250/255 card will do the de-dupe hashing in hardware to speed it up. File-system approaches make a lot more sense than block-based storage for the reasons in #1: the data profile for block-based applications just doesn’t fit, and with the typical backup now starting on disk-to-disk anyway, there’s no killer need for block-level de-dupe.
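
To put a rough number on the data-profile argument in the reply above, here is a minimal chunk-store sketch. Fixed-size chunks and SHA-256 fingerprints are assumptions for illustration, not any vendor’s implementation: a second weekly full that differs by one chunk collapses to almost nothing.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # assumed fixed-size chunking, purely for illustration

class DedupStore:
    """Toy content-addressed store: each unique chunk is kept exactly once."""

    def __init__(self):
        self.chunks = {}          # fingerprint -> chunk data
        self.logical_bytes = 0    # bytes the client wrote
        self.physical_bytes = 0   # bytes actually stored

    def write(self, data):
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            self.logical_bytes += len(chunk)
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.chunks:        # only never-seen content consumes space
                self.chunks[fp] = chunk
                self.physical_bytes += len(chunk)

    def ratio(self):
        return self.logical_bytes / max(self.physical_bytes, 1)

# Week 1's full backup, then week 2's full with a single changed chunk:
# the second full collapses to almost nothing, roughly 2:1 overall.
store = DedupStore()
week1 = os.urandom(CHUNK_SIZE * 2500)
week2 = week1[:-CHUNK_SIZE] + os.urandom(CHUNK_SIZE)
store.write(week1)
store.write(week2)
print(f"dedupe ratio: {store.ratio():.1f}x")
```

Feed the same store nothing but unique data and the ratio stays stuck near 1:1, which is exactly why the ROI outside of backup-style workloads is so weak.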
