When Terrified Vendors Attack: The Dell Edition

It recently came to my attention that Dell is now advertising a benchmark that purports to show one of their platforms outperforming Nimble in a very specific test of their own concoction.

While I don’t doubt that’s possible (indeed, we could do it the other way around), it may be worthwhile investigating what’s prompting the attack.

I also want to point out the various technically fishy aspects of the benchmark.

Unclear Whether Best Practices Were Followed

There’s a fairly detailed writeup (which I won’t link here; I’ve given them enough publicity as it is). What is not clear is whether Nimble best practices were followed. We can safely assume EMC best practices were followed…

For instance, was the latest Nimble Connection Manager (NCM) installed? Nimble recommends that customers install NCM on each host to automatically take care of host-side best practices (existing Nimble customers should check out the new NCM for Linux, which includes the awesome nimbletune utility).

Unclear How Much Headroom Was Left For Dell

This is extremely important: Nimble systems always operate with 100% headroom. This means that even if a system is 100% busy, performance will remain the same as before if a controller and a drive are lost! This significantly reduces customer business risk. See more here.

With Dell, getting maximum performance requires both controllers to be busy. This means that if one controller is lost, not only would performance plummet, but failover itself could be difficult. Not safe at all.
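To make the failover arithmetic concrete, here is a deliberately simplistic sketch. It assumes a linear model in which a failed controller's entire load moves to the survivor and throughput scales down in proportion once the survivor saturates; it ignores failover overhead, cache mirroring and everything else a real array does. The function name `post_failover` is purely illustrative, not any vendor's tool.

```python
def post_failover(util_a: float, util_b: float) -> tuple[float, float]:
    """Hypothetical linear model of a two-controller array losing controller B.

    util_a, util_b: pre-failure utilisation of each controller (1.0 = 100% busy).
    Returns (utilisation demanded of the survivor,
             fraction of pre-failure throughput the survivor can still deliver).
    """
    # The survivor must now absorb both controllers' work.
    demand = util_a + util_b
    # If demand exceeds 100%, throughput is capped at 1/demand of its former level.
    deliverable = min(1.0, 1.0 / demand) if demand > 0 else 1.0
    return demand, deliverable


# Both controllers busy (maximum-performance configuration):
print(post_failover(0.9, 0.9))
# One controller carrying all the work, the other held in reserve:
print(post_failover(0.9, 0.0))
```

In this model, two controllers at 90% each would demand 180% of the survivor, so roughly 45% of the pre-failure throughput is lost; with a controller held in reserve, the survivor sees no more load than before.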

It was unclear how much headroom the Dell system had left.

Unclear Whether Compression Was Used for Dell

Nimble systems typically get a performance boost with compression and compressible data (in addition to very nice space savings). Even with incompressible data, the performance hit is negligible (the entire architecture was designed from the ground up with data reduction in mind).

However, a lot of competitor systems suffer a rather dramatic performance degradation when using compression (this has already been published for EMC’s Unity – compression destroys its performance).

This means that several vendors may choose to test without compression in order to show their best performance. But in the age of modern AFAs, compression is something taken for granted, especially for DB workloads! The savings are too important to ignore (often better than 3:1 just with compression alone).

So the question is: Did Dell use compression? It was unclear.

Unclear What RAID Type Was Used for Dell

All Nimble AFA systems use Triple+ RAID (ANY 3 drives can be pulled/failed simultaneously, AND even if there are consecutive sector read errors from the remaining drives while rebuilding from such a scenario, there will still be no data corruption). We are talking extreme data protection.

This is hundreds of thousands of times more resilient than most other RAID schemes. It’s easy to do this math on your own: Find an MTTDL (Mean Time To Data Loss) calculator (one that also shows RAID Z3) and compare. Then multiply the RAID Z3 result by five (the “+” in Triple+ makes it 5x more resilient than normal triple parity RAID). Then divide that figure by the result for RAID 5, 10 or 6. For reference, a Nimble RAID group contains 24 drives, so please make sure you enter that in the calculator.
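For readers who would rather script the comparison than use an online calculator, the sketch below uses the classic textbook MTTDL approximations: independent, exponentially distributed drive failures, repair time much shorter than MTTF, and no unrecoverable read errors (many calculators do model those, so their numbers will differ). The drive MTTF and rebuild time are hypothetical placeholders, not vendor figures; the results are very sensitive to them. The 24-drive group and the 5x Triple+ factor come straight from the post and are taken as given.

```python
from math import prod


def mttdl_parity(n_drives: int, parity: int, mttf_h: float, mttr_h: float) -> float:
    """Textbook MTTDL approximation (hours) for one group with `parity` parity drives.

    Data is lost when parity + 1 drives fail within overlapping repair windows:
    MTTDL ~= MTTF^(p+1) / (n (n-1) ... (n-p) * MTTR^p).
    """
    if not 0 <= parity < n_drives:
        raise ValueError("parity must be in [0, n_drives)")
    ways = prod(n_drives - k for k in range(parity + 1))  # n(n-1)...(n-p)
    return mttf_h ** (parity + 1) / (ways * mttr_h ** parity)


def mttdl_raid10(n_drives: int, mttf_h: float, mttr_h: float) -> float:
    """Mirrored pairs: loss occurs only if a failed drive's partner also fails
    before the rebuild completes, giving MTTDL ~= MTTF^2 / (n * MTTR)."""
    return mttf_h ** 2 / (n_drives * mttr_h)


N = 24                   # drives per Nimble RAID group, as stated in the post
MTTF = 1_200_000.0       # hypothetical per-drive MTTF in hours
MTTR = 24.0              # hypothetical rebuild time in hours
TRIPLE_PLUS_FACTOR = 5   # the post's "5x over triple parity" figure

triple_plus = TRIPLE_PLUS_FACTOR * mttdl_parity(N, 3, MTTF, MTTR)
for name, value in [
    ("RAID 5", mttdl_parity(N, 1, MTTF, MTTR)),
    ("RAID 6", mttdl_parity(N, 2, MTTF, MTTR)),
    ("RAID 10", mttdl_raid10(N, MTTF, MTTR)),
]:
    print(f"Triple+ vs {name}: {triple_plus / value:,.0f}x")
```

Each extra parity drive multiplies MTTDL by roughly MTTF / ((n - p) * MTTR), which is why the gap between parity levels is measured in orders of magnitude rather than percentages.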

Why this is important: Most vendors need the capacity-wasteful (yet far less reliable) RAID 10 to get high performance.

If the Dell system was configured with a less effective protection scheme than Triple+ (most likely RAID10), what would its performance be if the customer insisted on better resiliency and space efficiency?

Again, it’s unclear.

Unclear Whether Data Progression Was Used For Dell

Typical deployments of Dell SC will move ingested data from RAID10 to, potentially, other RAID types, to avoid short-term performance problems. This is called Data Progression. It even requires two sets of SSDs (write-intensive and read-intensive), each of which needs to be sized appropriately.

Did the test run long enough for Data Progression to even kick in? Unclear. If the benchmark data stayed in the RAID10 “staging area” while the rest of the array was set up as, say, RAID6, then the test simply didn’t run long enough for any I/O to reach RAID6.

Again, unclear. But I suspect it was all RAID 10 without Data Progression since only the “read intensive” SSDs were used in the test.

The Benchmarked Dell Platform Is Not Listed in Gartner’s Critical Capabilities Report

This point isn’t technical as such, but it speaks to fitness for purpose.

Gartner has an interesting class of reports called “Critical Capabilities”. In those reports, instead of comparing entire companies in general (as in the Magic Quadrant), specific product families are compared for specific workloads. Which makes sense since that’s how people buy stuff.

The report for All Flash Arrays is here.

What you’ll notice is that the benchmarked Dell platform isn’t even listed in that report. HPE platforms on the other hand are always within the top 3 for any application category.

Perhaps they’re trying to drum up recognition for a product not widely recognized as an AFA?

Bottom Line

Given the sheer amount of information in that report, why would anyone omit commonly listed details like RAID levels and compression? It did state that deduplication was turned off for Nimble, but made no mention of the other items, especially as they pertain to the Dell system.

How would the Dell system perform if the test was run in a sustained fashion, the way people expect to run modern AFAs, with plenty of controller headroom, dedupe and compression enabled, and RAID6? (They can’t do Triple+ or even plain triple parity, so RAID6 is the closest, even if it’s far less reliable. Seriously, do the MTTDL math.)

Sometimes, omission is as revealing as admission. At best, this is sloppy and unprofessional. At worst, intentionally misleading and unethical. I like this variation of Hanlon’s Razor:

Never attribute to malice that which is adequately explained by stupidity, but don’t rule out malice.

Dell is worried about HPE, and for good reason. Not only is HPE a “one stop shop” for enterprise needs, but we have incredibly innovative solutions like Nimble with InfoSight AI (which we are already extending to other HPE platforms). This allows us to lower business risk in ways Dell cannot.

Throwing Down The Gauntlet

If Dell wants to compare their platforms with Nimble, we are happy to oblige. We may even have some really cool testing suggestions… 😉

Reach out if your inner pugilist (and your legal department) dares.

D
