Due to the craziness in the previous blog, I decided to post an actual graph showing a NetApp system's I/O latency while under load and during a disk rebuild. It was a bakeoff vs another large storage vendor (which NetApp won).
The test was done at a large media company with over 70,000 Exchange seats. It used no more than 84 drives, so we're not talking about some gigantic lab queen system (I love Marc Farley's term). The box was set up per best practices, with 28-disk aggregates in this case.
(Edited at the request of EMC’s CTO to include the performance tidbit): Over 4K IOPS were hitting each aggregate (much more than the customer needed) and the system had quite a lot of steam left in it.
There were several Exchange clusters hitting the box in parallel.
All of the testing for both vendors was conducted by Microsoft personnel for the customer. The volume names have been removed from the graph to protect the identity of the customer:
Under a workload of 8K-size IOPS at a 53:47 read/write ratio, a single disk was pulled. Pretty realistic failure scenario: a disk breaks while the system is under production-level load. Plenty of writes, too, almost 50%.
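For context, here's some back-of-the-envelope arithmetic using the figures above (4,000+ IOPS per 28-disk aggregate, 8 KB per I/O, 53:47 read/write split). These are rough illustrative numbers, not the actual test data:

```python
# Rough per-aggregate numbers from the figures quoted in the post.
# Assumptions: a flat 4,000 IOPS, 8 KB per I/O, 28 disks per aggregate.
iops = 4000
io_size_kb = 8
disks = 28
read_ratio = 0.53

throughput_mb_s = iops * io_size_kb / 1024  # MB/s hitting the aggregate
iops_per_disk = iops / disks                # average load per spindle
read_iops = iops * read_ratio               # reads per second
write_iops = iops - read_iops               # writes per second

print(f"~{throughput_mb_s:.1f} MB/s per aggregate, "
      f"~{iops_per_disk:.0f} IOPS per disk")
```

That works out to roughly 31 MB/s and ~143 IOPS per spindle per aggregate — a healthy but far from saturating load for 15K drives, which is consistent with the system having plenty of steam left.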
Ok… The fuzzy line around 6 ms is the read latency. At point 1 a disk was pulled, and at point 2 the rebuild completed. Read latency increased to 8 ms during the rebuild, but dropped back down to 5 ms after the rebuild completed. The line at less than 1 ms response time straight across the bottom is the write latency. Yes, it's that good.
So – there was a bit of performance degradation for the reads, but I wouldn't say that it "killed" performance as a competitor alleged.
The rebuild time was also a tad faster than 30 hours (look at the graph 🙂 ), but then again the box used faster, 15K drives (and smaller ones, 300GB vs 500GB), so before anyone complains: it's not apples-to-apples with the Demartek report.
I just wanted to illustrate a real example from a real test at a real customer using a real application, and show the real effects of drive failures in a properly-implemented RAID-DP system.
The FUD-busting will continue, stay tuned…