Beware of benchmarking storage that does inline compression

In this post I will examine the effects of benchmarking highly compressible data and why that’s potentially a bad idea.

Compression is not a new storage feature. Of the large storage vendors, at a minimum HPE, NetApp, EMC and IBM can do it (depending on the array). <EDIT (thanks to Matt Davis for reminding me): Some arrays also do zero detection and will not write zeroes to disk – think of it as a specialized form of compression that ONLY works on zeroes>

A lot of storage vendors also have real-time compression for all data (sometimes used instead of true deduplication – it’s just easier to implement compression).

Nothing wrong with real-time compression. However, here's where I have a problem with the sales approach some vendors follow:

Real-time compression can provide grossly unrealistic benchmark results if the benchmarks used are highly compressible!

Compression can indeed provide a performance benefit for various data types (simply because less data has to be read from and written to disk), with the tradeoff being CPU. However, most normal data isn't composed of all zeroes. Typically, compressing data will provide a decent benefit on average, but usually not a several-fold one.

So, what will typically happen is this: a vendor will drop off one of their storage appliances and provide the prospect with instructions on how to benchmark it with garden-variety benchmark apps. Nothing crazy.

Here’s the benchmark problem

A lot of the popular benchmarks just write zeroes, which are extremely easy for compression and zero-detect algorithms to deal with, resulting in amazing efficiency numbers and extremely high benchmark performance.
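As a quick illustration (not from the original post), you can see on any Linux box how trivially all-zero data compresses compared to random data. The file paths and sizes below are just examples:

```shell
# Write 10 MiB of zeroes and 10 MiB of random data (paths are illustrative)
dd if=/dev/zero    of=/tmp/zeros.bin  bs=1M count=10 2>/dev/null
dd if=/dev/urandom of=/tmp/random.bin bs=1M count=10 2>/dev/null

# Compress both, keeping the originals
gzip -kf /tmp/zeros.bin /tmp/random.bin

# The all-zero file shrinks to a few KB; the random file barely shrinks at all
ls -l /tmp/zeros.bin.gz /tmp/random.bin.gz
```

Any storage array (or filesystem) with inline compression gets the same kind of free lunch when a benchmark feeds it the all-zero file.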

I wanted to prove this out in an easy way that anyone can replicate with free tools. So I installed Fedora 18 with the btrfs filesystem and ran the bonnie++ benchmark with and without compression. The raw data, with mount options etc., is here. An explanation of the various fields is here. Not everything is accelerated by btrfs compression in the bonnie++ benchmark, but a few things really are (sequential writes, rewrites and reads):


Notice the gigantic improvement (in write throughput especially) btrfs compression affords with all-zero data.

Now, does anyone think that, in general, the write throughput will be 300MB/s for a decrepit 5400 RPM SATA disk? That will be impossible unless the user is constantly writing all-zero data, at which point the bottlenecks lie elsewhere.
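For anyone who wants to repeat the experiment, a minimal sketch follows. The device name, mount point and bonnie++ user are examples, and this assumes root access and a scratch partition you can safely wipe; it is not the exact procedure from the original test run:

```shell
# Sketch: compare bonnie++ results on btrfs with and without compression.
# /dev/sdb1 and /mnt/test are placeholders -- adjust for your system.
mkfs.btrfs /dev/sdb1

# Run once without compression...
mount /dev/sdb1 /mnt/test
bonnie++ -d /mnt/test -u nobody

# ...and once with transparent zlib compression enabled
umount /mnt/test
mount -o compress=zlib /dev/sdb1 /mnt/test
bonnie++ -d /mnt/test -u nobody
```

Since bonnie++'s sequential tests write highly compressible data by default, the compressed mount should show the same inflated throughput numbers discussed above.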

Some easy ways for dealing with the compressible benchmark issue

So what can you do in order to ensure you get a more realistic test for your data? Here are some ideas:

  • Best of all: use your own applications, not benchmarks. This is of course more time-consuming and a bigger commitment. If you can't do that, then…
  • Create your own test data using, for example, dd and /dev/random as a source in some sort of Unix/Linux variant. Some instructions here. You can even move that data to use with Windows and IOmeter – just generate the random test data in UNIX-land and move the file(s) to Windows.
  • Another, far more realistic way: Use your own data. In IOmeter, you just copy one of your large DB files to iobw.tst and IOmeter will use your own data to test… Just make sure it’s large enough and doesn’t all fit in array cache. If not large enough, you could probably make it large enough by concatenating multiple data files and random data together.
  • vdbench seems to be a very solid benchmark tool – with tunable compression and dedupe settings.
  • And don’t forget the obvious but often forgotten rule: never test with a data set that fits entirely in RAM!
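The dd approach from the list above can be sketched as follows. The file name iobw.tst is what IOmeter picks up; the 64 MiB size is purely illustrative (in a real test, make the file far larger than the array cache), and /dev/urandom is used instead of /dev/random because it is much faster and incompressible all the same:

```shell
# Generate incompressible test data for IOmeter (size is illustrative;
# in real tests make it much larger than the array cache)
dd if=/dev/urandom of=/tmp/iobw.tst bs=1M count=64 2>/dev/null

# Sanity check: gzip should barely shrink it, confirming it is incompressible
gzip -c /tmp/iobw.tst | wc -c
```

Copy the resulting file to the Windows host running IOmeter and it will benchmark with data no compression engine can cheat on.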

In all cases though, be aware of how you are testing. There is no magic 🙂




8 Replies to “Beware of benchmarking storage that does inline compression”

  1. If the data is slightly compressible, does that typically lead to faster or slower performance in an array that does real-time compression?

    (Disclosure–I work for a large storage vendor.)

    1. Mr. Man of few words,

      Comment etiquette suggests you should actually state which vendor you work for 🙂 Anyway…

      What does “slightly” compressible mean in this case? 5%?

      I’ll theorize here that if the data is very slightly compressible, then a clever array will figure out it’s not worth spending the CPU cycles to compress it and just bypass compression.

      A not-so-clever array may rely on beefy CPUs to try to compress anyway, with the end result depending on the CPU and the efficiency of the compression algorithm. It could increase latencies a bit. Maybe not. Too many variables. Something customers can test I guess when doing their benchmarks (pick something that zip will only compress by 5-10% and use that as the test data, for example).

      Ultimately, the article was written for the cases where vendors do testing with all zeros in order to show crazy high benchmark results.


  2. A lot of the newer storage vendors are now touting real-time compression for all data (often used instead of true deduplication – it’s just easier to implement compression).

    If it’s so easy to implement why nettap can’t do it? I mean today you have compression algorithm too but for example it doesn’t work with your pam card.

    Plus if you put 2 xeon E5 on an array you can spent cpu cycle for compression even if you use gzip and not lzjb

    And why are you saying true deduplication until now you still doesn’t make inline deduplication (the process is just made when the array have time so it can be often like never). Plus at the begining of deduplication you use a so little hash than your deduplication ratio were bad i admit with version 8+ it’s not true anymore.

    1. Please focus on the point of the article, which is based on scientific fact. It’s not a critique of different architectures. It’s about how to benchmark accurately.

      Since I don’t want to ignore your comment… Everything has tradeoffs. Inline compression is easy and we do have it. That doesn’t mean compression in general has no tradeoffs.

      As an example… The basic design principle of NetApp FAS storage is that all boxes work the same, from smallest to largest.

      The largest boxes have enough CPU and memory to do as you suggest.

      The smallest ones don’t. If we put the same CPUs and RAM in the smallest and largest boxes, then we just end up with the largest boxes ๐Ÿ™‚

      Same goes for deduplication. Doing it inline for large datasets without affecting latencies long-term is untenable – as has been proven to be the case with certain vendors that do inline dedupe and claim it as a superior feature.

      Typically, NetApp customers will use deduplication for most workloads, compression + dedupe for others, and no space efficiency techniques for certain data like videos, where trying to reduce size further is impossible without re-coding. Then those same customers will combine snapshots and clones with data reduction techniques to get even more efficiency.

      Could you gain better compression than NetApp by going to certain competitors? Certainly. Can you get an overall better storage system from those same vendors for all the use cases NetApp is good for? Not really.

      Think of a dragster. It is really efficient at going very fast in a straight line. Faster than pretty much any normal car. Can you drive it in all the conditions where you would drive an AWD Audi? Without killing yourself or others? And actually arriving somewhere on time? 🙂


      1. I’m ok with the point that you can cheat when you make a benchmark and that’s why is really difficult to make a configuration for the client because finally everything will depend on the workload type.

        But now we are in 2013 so deduplication inline or not or compression are part of the game.

        Yeah it’s really to maintain good performance with inline deduplication but some real solution start to appear by using algorithm lighter than sha256 and with the new intel platform who can support a lot of ram.

        About the thing all nettap can do the same … just check the limitation of flash pool on a fas 2240 for example plus i don’t see the point between the cpu performance of the array and the fact than your pam card doesn’t work with deduplication.

        After that i don’t say netapp are bad or something like that, for many things particularly integration with your snapxxx software you are really good

  3. I agree that you need to take care when benchmarking with compression, and certainly testing with all zeros is unrealistic. Equally, testing with random data is perhaps unrealistic, as random data will likely not compress at all. Most real data is a mix of compressible and non-compressible. If you’re hosting a wiki, or a database that consists largely of textual data (no blobs) then it would probably compress OK. So I’m good with your suggestion to use real data from your real site, less good with your suggestion to use random data.

    I’d also note that many of the products that people use on top of the storage layer also offer compression, ranging from btrfs doing compression in the file system through to Oracle doing compression in the database. If you already have compression in your application layer somewhere, then clearly a storage subsystem with built-in compression is going to add little incremental value – so make sure you know whether your application is already compressing before you waste your money.

  4. I’ve recently benchmarked an all-SSD array that does inline dedupe and compression.

    4k writes, 100% random, 16 QD.

    Iometer with “Repeating Bytes” data : 93K IOPs
    Iometer with “Full Random” data : 45K IOPs

    This is the difference between highly compressible and incompressible data.

    Perhaps it’s as simple as taking an average of the two as a “real world” estimate…

  5. I have been a consumer of several SAN vendors now ranging from NetApp to Xio to Equallogic and several others.
    I have first-hand experience with the company I believe this article is directed toward. I have personally tested and seen contradictory performance results, not listening to any one storage vendor, but based on the workloads I feel are most important in our environment.

    We have been using this competitor to NetApp for a couple years now and I will say bang for the buck and ease of usability as well as their support structure trumps the NetApp offerings hands down.

    Since this is a NetApp oriented website I will not disclose the vendor who I believe the article is directed toward out of respect for Dimitris.

    However, I suggest you run your own tests of the workloads that are important to you on demo units if available. Ultimately my suggestion is don’t fall into the trap of believing there’s only one vendor out there for everyone, if that were the case that vendor would have a monopoly.

    I just want to make sure everyone gets what they need.
