How to decipher EMC’s new VNX pre-announcement and look behind the marketing.

It was with interest that I watched some of EMC’s announcements during EMC World. Partly due to competitor awareness, and partly due to being an irrepressible nerd, hoping for something really cool.

BTW: Thanks to Mark Kulacz for assisting with the proof points. Mark, as much as it pains me to admit so, is quite possibly an even bigger nerd than I am.

So… EMC did deliver something. A demo of the possible successor to VNX (VNX2?), unavailable as of this writing (indeed, a lot of fuss was made about it being lab only etc).

One of the things they showed was increased performance vs their current top-of-the-line VNX7500.

The aim of this article is to prove that the increases are not proportionally as much as EMC claims they are, and/or they’re not so much because of software, and, moreover, that some planned obsolescence might be coming the way of the VNX for no good reason. Aside from making EMC more money, that is.

A lot of hoopla was made about software being the key driver behind all the performance increases, and how they are now able to use all CPU cores, whereas in the past they couldn’t. Software this, software that. It was the theme of the party.

OK – I’ll buy that. Multi-core enhancements are a common thing in IT-land. Parallelization is key.

So, they showed this interesting chart (hopefully they won’t mind me posting this – it was snagged from their public video):

[Image: EMC’s MCX core utilization chart, with arrows added]

I added the arrows for clarification.

Notice that the left side of the chart above shows the current VNX using, according to EMC, maybe a total of 2.5 out of the 6 cores if you stack everything up (for instance, Core 0 is maxed out, Core 1 is 50% busy, Cores 2-4 do little, Core 5 does almost nothing). This is important and we’ll come back to it. But, if true, this shows extremely poor multi-core utilization on the current box. It seems processes are dedicated to specific cores – Core 0 does RAID only, for example. Maybe a way to lower context switches?

Then they mentioned how the new box has 16 cores per controller (the current VNX7500 has 6 cores per controller).

OK, great so far.

Then they mentioned how, By The Holy Power Of Software, they can now utilize all cores on the upcoming 16-core box equally (right side of the chart above).

Then, comes the interesting part. They did an IOmeter test for the new box only.

They mentioned how the current VNX 7500 would max out at 170,000 8K random reads from SSD (this in itself a nice nugget when dealing with EMC reps claiming insane VNX7500 IOPS). And that the current model’s relative lack of performance is due to the fact its software can’t take advantage of all the cores.

Then they showed the experimental box doing over 5x that I/O. That is impressive indeed, even though it’s hardly a realistic way to prove performance. Still, I accept that they were trying to show how much more read-only speed they could get out of extra cores, plus it’s a cooler marketing number.

Writes are a whole separate wrinkle for arrays, of course. Then there are all the other ways VNX performance goes down dramatically.

However, all this leaves us with a few big questions:

  1. If this is really all about just optimized software for the VNX, will it also be available for the VNX7500?
  2. Why not show the new software on the VNX7500 as well? After all, it would probably increase performance by over 2x, since it would now be able to use all the cores equally. Of course, that would not make for good marketing. But if with just a software upgrade a VNX7500 could go 2x faster, wouldn’t that decisively prove EMC’s “software is king” story? Why pass up the opportunity to show this?
  3. So, if, with the new software the VNX7500 could do, say, 400,000 read IOPS in that same test, the difference between new and old isn’t as dramatic as EMC claims… right? :)
  4. But, if core utilization on the VNX7500 is not as bad as EMC claims in the chart (why even bother with the extra 2 cores on a VNX7500 vs a VNX5700 if that were the case), then the new speed improvements are mostly due to just a lot of extra hardware. Which, again, goes against the “software” theme!
  5. Why do EMC customers also need XtremIO if the new VNX is that fast? What about VMAX? :)
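Questions 2 and 3 can be put into rough numbers. The sketch below is a back-of-the-envelope projection, not anything EMC published: it takes the 170,000 IOPS figure and the roughly 2.5-of-6 busy cores from the demo, and assumes (generously) that IOPS would scale linearly if the software kept all 6 cores busy. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope projection. Inputs are the figures quoted from
# EMC's demo; linear scaling with busy cores is an assumption.

VNX7500_IOPS = 170_000      # 8K random reads from SSD, per EMC's demo
CORES_PER_CONTROLLER = 6    # current VNX7500
BUSY_CORES = 2.5            # rough stacked utilization read off the chart
NEW_BOX_MULTIPLIER = 5      # "over 5x", so treat this as a lower bound


def projected_iops(measured_iops, busy_cores, total_cores):
    """Scale measured IOPS as if every core were kept fully busy."""
    return measured_iops * total_cores / busy_cores


vnx7500_with_new_software = projected_iops(
    VNX7500_IOPS, BUSY_CORES, CORES_PER_CONTROLLER
)
new_box_iops = VNX7500_IOPS * NEW_BOX_MULTIPLIER
remaining_gap = new_box_iops / vnx7500_with_new_software

print(f"VNX7500 with all cores busy: {vnx7500_with_new_software:,.0f} IOPS")
print(f"New box (lower bound):       {new_box_iops:,.0f} IOPS")
print(f"New vs. upgraded VNX7500:    {remaining_gap:.2f}x")
```

Under those assumptions the existing box lands around 408,000 IOPS, which is where the “say, 400,000” in question 3 comes from, and the gap to the new box shrinks from 5x to roughly 2x.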

Point #4 above is important. For instance, EMC has been touting multi-core enhancements for years now. The current VNX FLARE release has 50% better core efficiency than the one before, supposedly. And, before that, in 2008, multi-core was advertised as getting 2x the performance vs the software before that. However, the chart above shows extremely poor core efficiency. So which is it? 

Or is it maybe that the box demonstrated is getting most of its speed increase not so much by the magic of better software, but mostly by vastly faster hardware – the fastest Intel CPUs (more clockspeed, not just more cores, plus more efficient instruction processing), latest chipset, faster memory, faster SSDs, faster buses, etc etc. A potential 3-5x faster box by hardware alone.
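To see how much of the headline number the extra cores alone could explain, here is a rough decomposition. It is only a sketch: it assumes throughput scales linearly with core count (real arrays rarely manage that), and it uses the per-controller core counts and the “over 5x” figure from the demo. The names are hypothetical.

```python
# Rough decomposition of the demo's speedup. Assumes throughput scales
# linearly with core count, which is an optimistic simplification.

OLD_CORES = 6            # per controller, current VNX7500
NEW_CORES = 16           # per controller, demo box
OBSERVED_SPEEDUP = 5.0   # "over 5x" in the IOmeter demo, so a lower bound


def per_core_gain(observed_speedup, old_cores, new_cores):
    """Speedup left over once the extra cores are accounted for."""
    core_ratio = new_cores / old_cores
    return observed_speedup / core_ratio


core_ratio = NEW_CORES / OLD_CORES
residual = per_core_gain(OBSERVED_SPEEDUP, OLD_CORES, NEW_CORES)

print(f"Extra cores alone:  {core_ratio:.2f}x")
print(f"Left over per core: {residual:.3f}x")
```

The leftover factor of about 1.9x per core is what better software has to share with faster clocks, memory, buses and SSDs.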

It doesn’t quite add up as being a software “win” here.

However – I (or at least current VNX customers) probably care more about #1. Since it’s all about the software after all :)

If the new software helps so much, will they make it available for the existing VNX? Seems like any of the current boxes would benefit since many of their cores are doing nothing according to EMC. A free performance upgrade!

However… If they don’t make it available, then the only rational explanation is that they want to force people into the new hardware – yet another forklift upgrade (CX -> VNX -> “new box”).

Or maybe that there’s some very specific hardware that makes the new performance levels possible. Which, as mentioned before, kinda destroys the “software magic” story.

If it’s all about “Software Defined Storage”, why is the software so locked to the hardware?

All I know is that I have an ancient NetApp FAS3070 in the lab. The box was released ages ago (2006 vintage), and yet it’s running the most current GA ONTAP code. That’s going back 3-4 generations of boxes, and it launched with software that was very, very different to what’s available today. Sometimes I think we spoil our customers.

Can a CX3-80 (the beefiest of the CX3 line, similar vintage to the NetApp FAS3070) take the latest code shown at EMC World? Can it even take the code currently GA for VNX? Can it even take the code available for CX4? Can a CX4-960 (again, the beefiest CX4 model) take the latest code for the shipping VNX? I could keep going. But all this paints a rather depressing picture of being able to stretch EMC hardware investments.

But dealing with hardware obsolescence is a very cool story for another day.




9 thoughts on “How to decipher EMC’s new VNX pre-announcement and look behind the marketing.”

  1. Denis here. And yes, I work for EMC and did the slide you posted. Thanks for helping spread the word. Core efficiency is a BIG DEAL in a FLASH world!

    I am glad we agree that parallelization is key. That is exactly what the demo showed. What you did not seem to catch on to here is the NEW core efficiency needed for FLASH. A small group of SSDs can easily swamp a single CPU core. And to remain relevant in the array business, we need to scale to large numbers of SSDs. In fact, EMC has NO LIMITS beyond the drive slots on our platforms as to how many SSDs a customer can deploy on a given array model.

    The very point of the demo was that when you drive, say, 96 modern SSDs, the pressure on the storage processors is dramatically increased. That in turn exposes older software stacks that were written for HDDs. Older architectures were static multi-core. MCX(tm) is DYNAMIC multi-core. And, as demonstrated, it takes full advantage of 32 cores.

    The EMC demo showed 1M 8K read-through IOPS across 32 cores of CPU. That equates to 31,250 8K IOPS per core.

    NetApp has 24 cores in the new FAS6290. So what is your 8K read-through IOPS from 96 SSDs? How many IOPS per core? What is NetApp’s core efficiency?

    1. Hi Denis, thanks for posting.

      Actually, I think you’re missing the entire point of the article.

      It’s not at all attacking the fact the fast lab box can do 1 million 8K read IOPS.

      Indeed, I was complimentary of that.

      I also won’t comment on our own lab gear. We don’t take the same stance as EMC regarding showing lab stuff (whether I agree with our practice or not is a different matter, there’s something to be said for showmanship and I applaud EMC’s).

      By EMC’s own admission during the demo, the VNX7500 does about 170,000 8K reads, that’s the shipping box and that’s what we’ll compare to.

      And, of course, we’re not limited to 2 controllers for block protocols, but rather 8 and 64TB cache… :)


  2. “Older architectures were static multi-core. MCX(tm) is DYNAMIC multi-core. And as demonstrated takes full advantage of 32 cores …”

    Was that 32 cores, or 16 cores in each of 2 SPs? Within the context of the claims of massively multi-core support that’s a big difference, and I smell a lot of “markitechture” around this supposed technical master-stroke. So when you say “DYNAMIC” (I note your use of capitals for emphasis) are you talking about something like this ?

    If so, then you can colour me impressed, and I’d have to assume EMC is doing something like actually writing brand new microprocessor architectures for Intel. If not, then I suspect that what you’re really talking about is hardly revolutionary. Maybe an actual SMP scheduler within the Clariion code base? While that is cool and dandy, that is hardly the kind of thing that classifies as “Dynamic Multicore Architecture” in my book. It’s more like an admission of past failure. That would make sense to me because, based on that flat random read workload, the smallest NetApp array controller (FAS2220) would come in around the same performance as the SPs in a current midrange VNX5100 (if it were writes, the 2220 would smoke the 5100; VNX SPs make really poor use of their CPU resources, which is why the X-Blades need to be overpowered).

    What I think you’re saying is that in the past EMC’s architecture statically tied specific tasks (e.g. RAID parity calculations) to specific cores in order to minimize the overheads of context switching. In other words, you didn’t have an SMP-style architecture. So now you’ve moved to something that takes better advantage of multi-core processors .. maybe something like this

    This is indeed a dynamic scheduler, maybe you scavenged that from the wreck of XtremIO (where is that product?), so maybe this is the kind of thing EMC is now claiming as a revolutionary “DYNAMIC” architecture, and trying to shoe-horn that into the VNX in a big hurry… It will be interesting to see what impact that has on the rest of the Clariion architecture, assuming this is actually going to be part of

    Before you try and brand NetApp with the same kind of “Older architectures” label that seems to apply to the Clariion part of VNX, you should know that ONTAP hasn’t had a completely static multi-core architecture since I joined, though I will say that, like Clariion, it did have one many years ago. The trend to large-scale multi-core architectures was clear to NetApp before I joined, so I’m pretty confident in saying that NetApp has been safely and gradually introducing the necessary changes to ONTAP for well over 6 years.

    Over that time the SMP scheduler built into ONTAP has been gradually improved over a series of releases. Major enhancements to the core (excuse the pun) architecture, which is built for the massively multi-core future, were introduced in ONTAP 8.0. This was further exploited in 8.1, and has been leveraged really nicely in 8.2. From my perspective I’m pretty happy with our progress through to a massively multi-core architecture in a single controller (I’ve got some really nice graphs of customer loads on the latest versions of Clustered Data ONTAP driving even utilization over all the cores on a 6280 today). But if you want to leverage the power of 50+ cores today, just implement Clustered Data ONTAP, and that will give you more usable IOPS than most people are likely to need, with plenty of room for expansion. Giving customers the choice to scale up and scale out is something NetApp thinks is valuable, though it seems EMC is still focused on asking their customers to do one or the other.

    On a final note, I’m kind of curious,

    1. What processor speed was EMC running those 32 cores at?
    2. Why didn’t EMC include a 50:50 read/write mix, or even something vaguely realistic?
    3. By the looks of things EMC only stressed one part of the array code (read from “disk”, assuming the majority of the reads weren’t from DRAM via fastpath code). What about the rest of it (write, RAID, mirroring, snapshots, reconstructs, disk scrubs, auto-tiering, checksum operations, TCP/IP, and iSCSI protocol handling)? Does that scale just as nicely?
    4. How do the context switches between the various array functions in a real-world workload impact EMC’s 1 meeeeeelion IOPS number?
    5. How much of all of the above is still a “work in progress”?
    6. Will any of EMC’s current VNX arrays be able to take advantage of this DYNAMIC improvement, or will this require yet another forklift upgrade?

    All in all, it seems like this is EMC leaving the Asymmetric Multi Processing world of the 1970’s behind and finally getting around to showing a modular block-based array controller with some SMP architecture, tissied up as “all new DYNAMIC yada yada”. From an innovation perspective all I see is marketecture covering some hurried development.

  3. The EMC software… Where’s my Block Dedup? Next release….. of HARDWARE. ohh Thanks for leaving that little point out.

    It’s amazing how much stuff is “left out”

  4. Wait till next week and you will get the answers in detail what is coming. And Jason you will definitely get an answer next week ;-)
