8 responses

  1. Man of Few Words
    February 26, 2013

    If the data is slightly compressible, does that typically lead to faster or slower performance in an array that does real-time compression?

    (Disclosure: I work for a large storage vendor.)

    • Dimitris
      February 26, 2013

      Mr. Man of Few Words,

      Comment etiquette suggests you should actually state which vendor you work for :) Anyway…

      What does “slightly” compressible mean in this case? 5%?

      I’ll theorize here that if the data is very slightly compressible, then a clever array will figure out it’s not worth spending the CPU cycles to compress it and just bypass compression.

      A not-so-clever array may rely on beefy CPUs to try to compress anyway, with the end result depending on the CPU and the efficiency of the compression algorithm. It could increase latencies a bit, or it might not; there are too many variables. It’s something customers can test when doing their benchmarks (pick something that zip will only compress by 5-10% and use that as the test data, for example).
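
      To make that concrete, here is a minimal sketch, assuming zlib as a stand-in for whatever compressor a given array actually uses and an arbitrary 5% savings threshold: it builds a test block that only compresses by roughly 5-10%, and shows the kind of ratio check a “clever” array might use to decide whether compression is worth the CPU cycles.

      ```python
      import os
      import zlib

      def make_slightly_compressible(size, compressible_fraction=0.08):
          """Mostly random bytes (incompressible) plus a small run of zeros,
          so a zip-style compressor only saves roughly 5-10% overall."""
          zeros = int(size * compressible_fraction)
          return os.urandom(size - zeros) + b"\x00" * zeros

      def worth_compressing(block, min_savings=0.05):
          """The 'clever array' heuristic: compress a sample and only keep
          compression on if it saves at least min_savings of the original size."""
          savings = 1 - len(zlib.compress(block, 1)) / len(block)
          return savings >= min_savings, savings

      block = make_slightly_compressible(1 << 20)          # 1 MiB test block
      keep, savings = worth_compressing(block)
      print(f"zlib savings: {savings:.1%}, worth compressing: {keep}")
      ```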

      Ultimately, the article was written for the cases where vendors do testing with all zeros in order to show crazy high benchmark results.

      D

  2. compression easy?
    February 27, 2013

    A lot of the newer storage vendors are now touting real-time compression for all data (often used instead of true deduplication, since compression is easier to implement).

    If it’s so easy to implement, why can’t NetApp do it? I mean, you do have a compression algorithm today, but for example it doesn’t work with your PAM card.

    Plus, if you put two Xeon E5 CPUs in an array, you can spend CPU cycles on compression even if you use gzip and not LZJB.

    And why do you talk about “true deduplication” when you still don’t do inline deduplication (the process only runs when the array has time, which can often mean never)? Plus, in the early days of deduplication you used such a small hash that your deduplication ratios were bad; I admit that with version 8+ this is no longer true.

    • Dimitris
      February 27, 2013

      Please focus on the point of the article, which is based on scientific fact. It’s not a critique of different architectures. It’s about how to benchmark accurately.

      Since I don’t want to ignore your comment… Everything has tradeoffs. Inline compression is easy and we do have it. That doesn’t mean compression in general has no tradeoffs.

      As an example… The basic design principle of NetApp FAS storage is that all boxes work the same, from smallest to largest.

      The largest boxes have enough CPU and memory to do as you suggest.

      The smallest ones don’t. If we put the same CPUs and RAM in the smallest and largest boxes, then we just end up with the largest boxes :)

      Same goes for deduplication. Doing it inline for large datasets without affecting latencies long-term is untenable – as has been proven to be the case with certain vendors that do inline dedupe and claim it as a superior feature.

      Typically, NetApp customers will use deduplication for most workloads, compression + dedupe for others, and no space efficiency techniques for certain data like videos, where trying to reduce size further is impossible without re-encoding. Then those same customers will combine snapshots and clones with data reduction techniques to get even more efficiency.

      Could you gain better compression than NetApp by going to certain competitors? Certainly. Can you get an overall better storage system from those same vendors for all the use cases NetApp is good for? Not really.

      Think of a dragster. It is really efficient at going very fast in a straight line. Faster than pretty much any normal car. Can you drive it in all the conditions where you would drive an AWD Audi? Without killing yourself or others? And actually arriving somewhere on time? :)

      D

      • Compression easy?
        February 27, 2013

        I’m OK with the point that you can cheat when you make a benchmark, and that’s why it’s really difficult to size a configuration for a client: in the end, everything depends on the workload type.

        But it’s 2013 now, so deduplication (inline or not) and compression are part of the game.

        Yeah, it’s really hard to maintain good performance with inline deduplication, but some real solutions are starting to appear, using algorithms lighter than SHA-256 and the new Intel platforms that can support a lot of RAM.

        About the claim that all NetApp boxes can do the same things… just check the Flash Pool limitations on a FAS2240, for example. Plus, I don’t see the connection between the CPU performance of the array and the fact that your PAM card doesn’t work with deduplication.

        That said, I’m not saying NetApp is bad or anything like that; for many things, particularly integration with your SnapXXX software, you are really good.

  3. PaulL
    March 2, 2013

    I agree that you need to take care when benchmarking with compression, and certainly testing with all zeros is unrealistic. Equally, testing with random data is perhaps unrealistic, as random data will likely not compress at all. Most real data is a mix of compressible and non-compressible. If you’re hosting a wiki, or a database that consists largely of textual data (no blobs) then it would probably compress OK. So I’m good with your suggestion to use real data from your real site, less good with your suggestion to use random data.
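
    If you want to know where your real data actually falls between “all zeros” and “full random”, a quick sketch like the one below can give a ballpark figure. It assumes zlib as a rough stand-in for whatever the array uses internally, and the path in the usage comment is purely hypothetical; point it at a copy of your own data set.

    ```python
    import zlib
    from pathlib import Path

    def average_savings(root, block_size=64 * 1024, max_blocks=1000):
        """Walk a directory of real data and return the average fraction that
        zlib shaves off each block, as a rough proxy for what an
        inline-compressing array would see on that workload."""
        savings = []
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            with open(path, "rb") as f:
                while (block := f.read(block_size)):
                    savings.append(1 - len(zlib.compress(block)) / len(block))
                    if len(savings) >= max_blocks:
                        return sum(savings) / len(savings)
        return sum(savings) / len(savings) if savings else 0.0

    # Hypothetical path -- use a sample of your own data:
    # print(f"{average_savings('/srv/wiki-data'):.0%} average savings per block")
    ```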

    I’d also note that many of the products that people use on top of the storage layer also offer compression, ranging from btrfs doing compression in the file system through to Oracle doing compression in the database. If you already have compression in your application layer somewhere, then clearly a storage subsystem with built-in compression is going to add little incremental value – so make sure you know whether your application is already compressing before you waste your money.

  4. LeeC
    July 24, 2013

    I’ve recently benchmarked an all-SSD array that does inline dedupe and compression.

    4K writes, 100% random, queue depth 16.

    Iometer with “Repeating Bytes” data: 93K IOPS
    Iometer with “Full Random” data: 45K IOPS

    This is the difference between very compressible and incompressible data.

    Perhaps it’s as simple as taking an average of the two for “real world” performance…
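
    For what it’s worth, a plain average of those two numbers is 69K IOPS. One alternative blending (a sketch, not anything the vendor specifies) is to assume some fraction of the I/Os hit compressible data and combine the per-I/O service times instead; that weighted harmonic mean keeps the fast case from dominating and comes out lower than the simple average.

    ```python
    def blended_iops(iops_compressible, iops_random, compressible_fraction=0.5):
        """Blend the two Iometer results by assuming compressible_fraction of
        the I/Os see compressible data; combining per-I/O service times (a
        weighted harmonic mean) rather than averaging IOPS directly."""
        time_per_io = (compressible_fraction / iops_compressible
                       + (1 - compressible_fraction) / iops_random)
        return 1 / time_per_io

    print(blended_iops(93_000, 45_000))        # ~60.7K, vs. 69K for a plain average
    print(blended_iops(93_000, 45_000, 0.7))   # ~70.5K if 70% of the I/Os compress well
    ```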

  5. Jeff R
    November 8, 2013

    I have been a consumer of several SAN vendors now, ranging from NetApp to Xio to EqualLogic and several others.
    I have first-hand experience with the company I believe this article is directed toward. I have seen contradictory performance results, which I have tested personally, not listening to any one storage vendor but basing my conclusions on the workloads I feel are most important in our environment.

    We have been using this competitor to NetApp for a couple of years now, and I will say that bang for the buck, ease of use, and their support structure trump the NetApp offerings hands down.

    Since this is a NetApp-oriented website, I will not disclose the vendor I believe the article is directed toward, out of respect for Dimitris.

    However, I suggest you run your own tests of the workloads that are important to you on demo units, if available. Ultimately, my suggestion is this: don’t fall into the trap of believing there’s only one vendor out there for everyone; if that were the case, that vendor would have a monopoly.

    I just want to make sure everyone gets what they need.
