EMC’s VNX2 forklift: The importance of hardware re-use, slowing down obsolescence, and maximizing your investment

It was with interest that I watched the launch of EMC’s VNX refresh. The updated boxes got some long-awaited features, EMC talked a lot about how some pretty severe single-threaded bottlenecks were removed, more CPU and memory was put in, and there was much rejoicing.

Not really trying to pick on the new boxes (that will be in a future post, relax :)), but what I thought was interesting was that the code that makes most of the new features a possibility cannot be loaded on current-gen VNX boxes (not even the biggest, the 7500, which has plenty of CPU and RAM juice even compared to the next gen boxes).

Software-defined storage indeed.

The existing VNX disk shelves also seemingly can’t be re-used at the moment (correct me if I’m wrong please).

This forced obsolescence has been a theme with EMC: Clariion -> VNX -> VNX2 are all complete forklift upgrades. When the original VNX was released, it was utterly incompatible with the CX (Clariion) shelves (SAS replaced FC connectivity). Despite using the exact same code (FLARE).

Other vendors are guilty of this too – a new controller is released, and all prior investments on disk shelves are rendered useless (HDS did this with several iterations of AMS, maybe even with AMS -> HUS, HP with EVA…)

I understand that as technology progresses we sometimes have to abandon the old to make room for the new but, at the same time, customers make significant investments in N-1 technology – and often want to be able to re-use some of their investment with N (and sometimes N+1).

I just had this conversation with a customer, and he said “well, I throw away my gear every 3 years, why should I care?

Let’s try a thought experiment.

Imagine you just bought a system that’s running the fastest controllers a company sells today, and you got 1PB of storage behind it.

Now, imagine that a mere month after you purchased your controllers, new ones are released that are significantly faster. OK, that stuff happens.

Your gear is not 3 years old yet. It’s 1 month old. It’s running OK.

Now, imagine that your array runs out of steam 6 months later due to unprecedented performance growth. Your system is now 7 months old. You can’t just throw it away and start fresh.

You could buy a new storage system and migrate some of the data to share the load. However, you don’t need more space – you just ran out of controller headroom. Indeed, you still have tons of free space.

But what if you could replace the controllers with the new, beefier ones? And maintain your investment in the 1PB of storage, cache etc? Wouldn’t that be nice?

Or at least be able to move some of the storage pools you may have to the new family of controllers? Even if you had to reformat the disk?

Well – most vendors will tell you “sorry, no, you need to migrate your data to the new box”.

Let’s try another thought experiment.

You bought a storage system a year ago. It performs fine, but it lacks true deduplication capabilities. You have determined it would save you a lot of storage space (= money) if your array had deduplication.

The vendor you purchased the system from announces the refreshed storage OS that finally includes deduplication. And that same vendor made a truly gigantic fuss about software-defined storage, which made everyone feel software would be the big enabler and that coolness was a mere firmware upgrade away.

However, you are eventually told they will not allow the code that enables deduplication to be loaded to your array, and, instead, they ask you to migrate to the refreshed array that supports deduplication. Since the updated code somehow only runs on the new box. Something to do with unicorn milk.

But your array has plenty of CPU headroom to handle deduplication… and you could reformat the disks given some swing storage shelves if the underlying disk format is the issue. But the option is not provided.

How NetApp does things instead

At NetApp we sort of take it for granted so we don’t make a big fuss about software-defined storage, but hardware was always considered an enabler for the software and not the other way around. 

For instance: deduplication was released as a free software upgrade for Data ONTAP (the OS for our main line of storage). Back in 2007. For all storage protocols.

In general, we try to let systems be able to load at a minimum N+1 software releases, but most of the time we utterly spoil customers and go far above and beyond, unless we’re talking about the smallest boxes, which naturally have less headroom.

For example, the now aging FAS 3070 I have in the local lab (the bigger of the older midrange boxes, released in 2006) supports anything from ONTAP 7.2.1 (what it was released with) to ONTAP 8.1.3 (released in mid-2013).

This spans multiple major ONTAP releases – huge changes in the code have happened during those releases: 7.3, 8.0, 8.1… Multiple newer arrays were also released as replacements for the 3070: 3170, 3270, 3250.

Yet the 3070 soldiers on with a fully supported, modern OS, 7 years later.

What arrays did our competitors have back then? What is the most modern OS those same arrays can run today? What is that OS missing vs the OS that competitor’s more modern arrays have?

Let’s talk disk shelves.

We used FC loop connectivity for the older shelves (DS14). We then switched to fancy multi-channel SAS and totally different shelves and disks, but never stopped supporting the older shelf technology, even with newer controllers.

That’s the big one. I have customers with DS14 shelves that they purchased for a 3070 that they now have on a 3270 running 8.2. It all works, all supported. Other vendors cut support off after the transition from FC to SAS.

Will we support those older shelves forever? No, that’s impossible, but at least we give our customers a lot of leeway and let them stretch their hardware investments significantly longer than any other major storage vendor I can think of.

Think long term

I encourage customers to always think long term. Try to stop thinking in 3-year increments. Start thinking of other ways you can stretch your investment, other ways to deploy older gear while still keeping it interoperable with newer hardware.

And start thinking about what will happen to your investment once newer gear is released.


Technorati Tags: , , ,

How to decipher EMC’s new VNX pre-announcement and look behind the marketing.

It was with interest that I watched some of EMC’s announcements during EMC World. Partly due to competitor awareness, and partly due to being an irrepressible nerd, hoping for something really cool.

BTW: Thanks to Mark Kulacz for assisting with the proof points. Mark, as much as it pains me to admit so, is quite possibly an even bigger nerd than I am.

So… EMC did deliver something. A demo of the possible successor to VNX (VNX2?), unavailable as of this writing (indeed, a lot of fuss was made about it being lab only etc).

One of the things they showed was increased performance vs their current top-of-the-line VNX7500.

The aim of this article is to prove that the increases are not proportionally as much as EMC claims they are, and/or they’re not so much because of software, and, moreover, that some planned obsolescence might be coming the way of the VNX for no good reason. Aside from making EMC more money, that is.

A lot of hoopla was made about software being the key driver behind all the performance increases, and how they are now able to use all CPU cores, whereas in the past they couldn’t. Software this, software that. It was the theme of the party.

OK – I’ll buy that. Multi-core enhancements are a common thing in IT-land. Parallelization is key.

So, they showed this interesting chart (hopefully they won’t mind me posting this – it was snagged from their public video):

MCX core util arrow

I added the arrows for clarification.

Notice that the chart above left shows the current VNX using, according to EMCmaybe a total of 2.5 out of the 6 cores if you stack everything up (for instance, Core 0 is maxed out, Core 1 is 50% busy, Cores 2-4 do little, Core 5 does almost nothing). This is important and we’ll come back to it. But, currently, if true, this shows extremely poor multi-core utilization. Seems like there is a dedication of processes to cores – Core 0 does RAID only, for example. Maybe a way to lower context switches?

Then they mentioned how the new box has 16 cores per controller (the current VNX7500 has 6 cores per controller).

OK, great so far.

Then they mentioned how, By The Holy Power Of Software,  they can now utilize all cores on the upcoming 16-core box equally (chart above, right).

Then, comes the interesting part. They did an IOmeter test for the new box only.

They mentioned how the current VNX 7500 would max out at 170,000 8K random reads from SSD (this in itself a nice nugget when dealing with EMC reps claiming insane VNX7500 IOPS). And that the current model’s relative lack of performance is due to the fact its software can’t take advantage of all the cores.

Then they showed the experimental box doing over 5x that I/O. Which is impressive, indeed, even though that’s hardly a realistic way to prove performance, but I accept the fact they were trying to show how much more read-only speed they could get out of extra cores, plus it’s a cooler marketing number.

Writes are a whole separate wrinkle for arrays, of course. Then there are all the other ways VNX performance goes down dramatically.

However, all this leaves us with a few big questions:

  1. If this is really all about just optimized software for the VNX, will it also be available for the VNX7500?
  2. Why not show the new software on the VNX7500 as well? After all, it would probably increase performance by over 2x, since it would now be able to use all the cores equally. Of course, that would not make for good marketing. But if with just a software upgrade a VNX7500 could go 2x faster, wouldn’t that decisively prove EMC’s “software is king” story? Why pass up the opportunity to show this?
  3. So, if, with the new software the VNX7500 could do, say, 400,000 read IOPS in that same test, the difference between new and old isn’t as dramatic as EMC claims… right? 🙂
  4. But, if core utilization on the VNX7500 is not as bad as EMC claims in the chart (why even bother with the extra 2 cores on a VNX7500 vs a VNX5700 if that were the case), then the new speed improvements are mostly due to just a lot of extra hardware. Which, again, goes against the “software” theme!
  5. Why do EMC customers also need XtremeIO if the new VNX is that fast? What about VMAX? 🙂

Point #4 above is important. For instance, EMC has been touting multi-core enhancements for years now. The current VNX FLARE release has 50% better core efficiency than the one before, supposedly. And, before that, in 2008, multi-core was advertised as getting 2x the performance vs the software before that. However, the chart above shows extremely poor core efficiency. So which is it? 

Or is it maybe that the box demonstrated is getting most of its speed increase not so much by the magic of better software, but mostly by vastly faster hardware – the fastest Intel CPUs (more clockspeed, not just more cores, plus more efficient instruction processing), latest chipset, faster memory, faster SSDs, faster buses, etc etc. A potential 3-5x faster box by hardware alone.

It doesn’t quite add up as being a software “win” here.

However – I (or at least current VNX customers) probably care more about #1. Since it’s all about the software after all… 🙂

If the new software helps so much, will they make it available for the existing VNX? Seems like any of the current boxes would benefit since many of their cores are doing nothing according to EMC. A free performance upgrade!

However… If they don’t make it available, then the only rational explanation is that they want to force people into the new hardware – yet another forklift upgrade (CX->VNX->”new box”).

Or maybe that there’s some very specific hardware that makes the new performance levels possible. Which, as mentioned before, kinda destroys the “software magic” story.

If it’s all about “Software Defined Storage”, why is the software so locked to the hardware?

All I know is that I have an ancient NetApp FAS3070 in the lab. The box was released ages ago (2006 vintage), and yet it’s running the most current GA ONTAP code. That’s going back 3-4 generations of boxes, and it launched with software that was very, very different to what’s available today. Sometimes I think we spoil our customers.

Can a CX3-80 (the beefiest of the CX3 line, similar vintage to the NetApp FAS3070) take the latest code shown at EMC World? Can it even take the code currently GA for VNX? Can it even take the code available for CX4? Can a CX4-960 (again, the beefiest CX4 model) take the latest code for the shipping VNX? I could keep going. But all this paints a rather depressing picture of being able to stretch EMC hardware investments.

But dealing with hardware obsolescence is a very cool story for another day.



Technorati Tags: , , , , , , ,