Examining value for money regarding the SPEC benchmarks

Some of the comments in my previous post asked about $/IOPS and $/TB.

Since SPEC doesn’t require prices to be listed, I did my own analysis.

The NetApp numbers are simply 4x the existing 6240 result; that mirrors what EMC did with their submission, which used four separate VNX systems and aggregated their results.

Before anyone yells “but this is not published”, here’s the clarifying analogy I used over at Nigel’s blog to explain why this approach makes sense:

A storage system typically has some kind of bottleneck – cluster interconnect, number of drives, bandwidth to the controller, etc.

When you’re testing a single system, you’re ultimately hitting one of those bottlenecks.

If you’re testing multiple systems independently of each other, they do not share those bottlenecks (since they’re separate), so performance scales linearly as you add systems.

For example, if 1 truck can hold 10 tons of stuff, 4 such trucks will hold 40 tons, 10 trucks 100 tons, and so on. There’s no limit.

Once you inject a limiting factor (“the trucks all have to fit on a bridge, and the bridge can only take so much load and is only so big”), then there is a limit on how many trucks you can load and put on that bridge.

EMC tested 4 separate “trucks”. In the same way, I can add up the results of 4 separate NetApp “trucks”. Here are the results:

| | EMC | NetApp | Difference |
|---|---|---|---|
| Cost (approx. USD list) | 6,000,000 | 5,000,000 | NetApp is over 16% cheaper in absolute terms |
| SPEC SFS NFS IOPS | 497,623 | 762,700 | NetApp is 53% faster in absolute terms |
| Average latency (ORT, ms) | 0.96 | 1.17 | EMC’s latency is only 18% lower (at far fewer NFS ops), despite using nothing but SSDs! |
| Space (TB) | 60 | 343 | NetApp offers 5.7x more usable space |
| $/SPEC NFS IOPS | 12.06 | 6.56 | NetApp is 45.6% less expensive per SPEC NFS operation |
| $/TB | 100,000 | 14,577 | NetApp is less than 1/6 the price of EMC per TB |
| RAID | RAID5 | RAID-DP | NetApp is thousands of times more reliable |
| Boxes needed to accomplish result | 15 (4x separate VNX, each with 2 controllers, plus a total of 5x Celerra VG8 heads and 2 Control Stations) | 8 unified controllers | NetApp is far less complex (the benefit of a truly unified architecture) |
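
For the curious, here’s a quick sketch of the arithmetic behind the $/IOPS and $/TB columns. The list prices are my own rough approximations (SPEC doesn’t require pricing to be disclosed), so treat the output as estimates rather than quotes:

```python
# Back-of-the-envelope math for the $/IOPS and $/TB columns above.
# Prices are approximate list prices (my estimates), not published figures.

configs = {
    "EMC (4x VNX + VG8 heads)":     {"price_usd": 6_000_000, "spec_sfs_ops": 497_623, "usable_tb": 60},
    "NetApp (4x FAS6240 HA pairs)": {"price_usd": 5_000_000, "spec_sfs_ops": 762_700, "usable_tb": 343},
}

for name, c in configs.items():
    dollars_per_op = c["price_usd"] / c["spec_sfs_ops"]
    dollars_per_tb = c["price_usd"] / c["usable_tb"]
    print(f"{name}: ${dollars_per_op:.2f}/SPEC SFS op, ${dollars_per_tb:,.0f}/TB")

# Output (rounded):
#   EMC (4x VNX + VG8 heads):     $12.06/SPEC SFS op, $100,000/TB
#   NetApp (4x FAS6240 HA pairs): $6.56/SPEC SFS op,  $14,577/TB
```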

Who can spot the better deal? 🙂

I added latency to the chart; thanks to my buddy Mark Twomey for pointing it out.

You see, people needing enterprise NAS with that kind of performance usually need speed, plenty of space and high reliability. Not just one of the three. BTW, here’s a paper on relative RAID reliability.

NetApp provides all three, in spades, plus great value for money, a truly simple, flexible unified system, and efficiency.

Most customers want to see how a real configuration performs. I refer customers to our SPEC and SPC results constantly since quite frequently their desired configuration is very similar.

Which makes benchmarking realistic configurations actually useful – imagine that.

Maybe EMC needs to submit results with VNX the way they sell it to people, for example:

  • A mix of SSD cache, SSD, high-speed SAS and high-capacity SAS
  • Autotiering
  • RAID6
  • A typical amount of space for a configuration that size

Then submit results.

Keep your existing result, of course, but also show people how what you actually sell them really performs.

I still don’t understand why this is such a hard concept.

D


EMC conclusively proves that VNX bottlenecks NAS performance

A bit of a controversial title, no?

Allow me to elaborate.

EMC posted a new SPEC SFS result as part of a marketing stunt (which is working, look at what I’m doing – I’m talking about them, if only to clear the air).

In simple terms, EMC got almost 500,000 SPEC SFS NFS IOPS (not to be confused with, say, block-based SPC-1 IOPS) with the following configuration:

  1. Four (4) totally separate VNX arrays, each loaded with SSD storage, utterly unaware of each other (8 total controllers since each box has 2)
  2. Five (5) Celerra VG8 NAS heads/gateways (1 spare), one on top of each VNX box
  3. 2 Control Stations
  4. 8 exported filesystems (2 per VG8 head/VNX system)
  5. Multiple pools of storage (at least 1 per VG8) – not shared among the various boxes, no data mobility between boxes
  6. Only 60TB NAS space with RAID5 (or 15TB per box)

Now, this post is not about whether this configuration is unrealistic and expensive (almost nobody would pay $6m for merely 60TB of NAS, not today). I get it that EMC is trying to publish the best possible number by loading a bunch of separate arrays with SSD. It’s OK as long as everyone understands the details.

My beef has to do with how it’s marketed.

EMC is very vague about the configuration unless you look at the actual SPEC website. In the marketing materials they just mention VNX, as in “The EMC VNX performed at 497,623 SPECsfs2008_nfs.v3 operations per second”. Kinda like saying it’s OK to take three 5-year-olds and a 6-year-old to a bar because their ages add up to 21.

No – the far more accurate statement is “four separate VNXs, working independently and utterly unaware of each other, did roughly 124,406 SPECsfs2008_nfs.v3 operations per second each”.

All EMC did was add up the result of 4 boxes.

Heck, that’s easy to do!

NetApp already has a result for the 6240: just 2 controllers doing a respectable 190,675 SPEC NFS ops while handling both NAS and RAID all at once (since they’re actually unified, no cornucopia of boxes there), and without using solid-state drives (common SAS drives plus a large cache were used instead – a standard, realistic config we sell every day, not a “lab queen”).

If all we’re doing is adding up the results of different boxes, simply multiply this by 4 (plus we do have Cluster-Mode for NAS, so it would count as a single clustered system with failover etc. among the nodes) to end up with the following result:

  1. 762,700 SPEC SFS NFS operations
  2. 8 exported filesystems
  3. 343TB usable with RAID-DP (thousands of times more resilient than RAID5)

So, which one do you think is the better deal? More speed, 343TB and better protection, or less speed, 60TB and far less protection? 🙂

Customers curious about other systems can do the same multiplication trick for other configs, the sky is the limit!
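
In case anyone wants to sanity-check the arithmetic, here’s a trivial sketch of that multiplication trick. The aggregate() helper is just illustrative; the single-system numbers are the published results discussed in this post:

```python
# The "multiplication trick": independent systems share no bottleneck,
# so the aggregate is simply the single-system result times the box count.

def aggregate(single_system_ops: int, box_count: int) -> int:
    """Total for N identical systems benchmarked independently of each other."""
    return single_system_ops * box_count

# NetApp FAS6240 HA pair (published result): 190,675 SPEC SFS NFS ops
print(aggregate(190_675, 4))    # 762,700 -- the figure used above

# EMC's 497,623 aggregate, viewed the other way around:
print(497_623 / 4)              # ~124,406 ops per VNX/VG8 "truck"
```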

The other, more serious part, and what prompted me to title the post the way I did, is that EMC’s benchmarking made it pretty clear that the VNX itself is the bottleneck: it can only really support a single VG8 head at top speed, necessitating 4 separate VNX systems to accomplish the final result. So the fact that a VNX can have up to 8 Celerra heads on top of it means nothing, since the back-end is your limiting factor. You might as well stick to a dual-head VG8 config (1 active, 1 passive), since that’s all it can comfortably drive (otherwise, why benchmark it that way?).

But with only 1 active NAS head you’d be limited to just 256TB max NAS capacity, since that’s how much total space a Celerra head can address as of the time of this writing. Which is probably enough for most people.

I wonder if the NAS heads that can be bought as a package with VNX are slower than VG8 heads, and by how much. You see, most people buying the VNX will be getting the NAS heads that can be packaged with it since it’s cheaper that way. How fast does that go? I’m sure customers would like to know, since that’s what they will typically buy.

I also wonder how fast it would be with RAID6.

Here’s a novel idea: benchmark what customers will actually buy!

So apples-to-apples comparisons can become easier instead of something like this:

[Image: two very different apples, side by side]

For the curious: on the left you see an “Autumn Glory” Malus Floribunda (miniature apple). Photo courtesy of John Fullbright.

D


EMC finally joining the SPC – plus some advice

It just came to my attention that EMC is finally joining the SPC (Storage Performance Council). As I’ve pointed out in the past, their absence from this most standard of industry benchmarks was puzzling; kudos for rectifying the omission.

I do have some advice for EMC (and all other vendors that have already posted results):

  1. Do show results for your various supported RAID types, not just RAID10. After all, if most of your customers don’t just deploy RAID10, it makes sense to show RAID5 and RAID6 as well, especially if you want to compare results with NetApp RAID-DP (the protection equivalent of RAID6). This will increase your credibility. The argument that you only show RAID10 because it performs best doesn’t hold water: everyone knows the other RAID types will perform differently, and providing results for all of them will at least let customers gauge how much of a performance hit they’ll take with each RAID type under a write-intensive workload.
  2. Do show long-running benchmarks with auto-tiering enabled. After all, if you are claiming your auto-tiering implementation doesn’t hurt and can even improve results, this is your chance to show it.
  3. Enable features like snapshots.
  4. Use your large cache if you have one, especially if you keep advertising that it accelerates writes. It will just solidify your claims.

Welcome to the club!

D


Stack Wars: The Clone Wars

It seems that everyone and their granny is trying to create some sort of stack offering these days. Look at all the brouhaha – HP buying 3Par, Dell buying Compellent, all kinds of partnerships being formed left and right. Stacks are hot.

To the uninitiated, a stack is what you can get when a vendor is able to offer multiple products under a single umbrella. For instance, being able to get servers, an OS, a DB, an email system, storage and network switches from a single manufacturer (not a VAR) is an example of a single-sourced stack.

The proponents of stacks maintain that with stacks, customers potentially get simpler service and better integration – a single support number to call, “one throat to choke”, no finger-pointing between vendors. And that’s partially true. A stack potentially provides simpler access to support. On the “better integration” part – read on.

My main problem with stacks is that nobody really offers a complete one, and those that are more complete than others don’t necessarily offer best-of-breed products (not even “good enough” ones), nor particularly great integration between the products within the stack.

Personal anecdote: a few years ago I had the (mis)fortune of being the primary backup and recovery architect for one of the largest airlines in the world. Said airline had a very close relationship with a certain famous vendor. Said vendor only flew with that airline, always bought business class seats, the airline gave the vendor discounts, the vendor gave the airline discounts, and in general there was a lot of mutual back-scratching going on. So much so that the airline would give that vendor business before looking at anyone else, and only considered alternative vendors if the primary vendor didn’t have anything that even smelled like what the airline was looking for.

All of which resulted in the airline getting a backup system that was designed for small businesses since that’s all that vendor had to offer (and still does).

The problem is, that backup product simply could not scale to what I needed it to. I ended up having to stop file-level logging and could only restore entire directories since the backup database couldn’t handle the load, despite me running multiple instances of the tool for multiple environments. Some of those directories were pretty large, so you can imagine the hilarity that ensued when trying to restore stuff…

The vendor’s crack development team came over from overseas and spent days with me trying to figure out what they needed to change about the product to make it work in my environment (I believe it was the single largest installation they had).

Problem is, they couldn’t deliver soon enough, so, after much pain, the airline moved to a proper enterprise backup system from another vendor, which fixed most problems, given the technology I had to work with at the time.

Had the right decision been made up front, none of that pain would have been experienced. The world would have been one IT anecdote short, but that’s a small price to pay for a working environment. And this is but just one way that single vendor stacks can fail.

How does one decide on a stack?

Let’s examine a high-level view of a few stack offerings. By no means an all-inclusive list.

Microsoft: They offer an OS (catering from servers to phones), a virtualization engine, a DB, a mail system, a backup tool and the most popular office apps in the world, among many other things. Few will argue that all the bits are best-of-breed, despite being hugely popular. Microsoft doesn’t like playing in the hardware space, so they’re a pure software stack. Oh, there’s the XBox, too.

EMC: Various kinds of storage platforms (7-10 depending on how you count), all but one (Symmetrix) coming from acquisition. A virtualization engine (80% owner of VMware – even though it’s run as a totally separate entity). DB, many kinds of backup, document management, security and all other kinds of software. Some bits are very strong, others not so much.

Oracle: They offer an OS, a virtualization engine, a DB, middleware, servers and storage. An office suite. No networking. Oracle is a good example of an incomplete software/hardware stack. Aside from the ultra-strong DB and good OS, few will say their products are all best-of-breed.

Dell: They offer servers, desktops, laptops, phones, various flavors of storage, and switches. Dell is an example of a pure hardware stack. Not many software aspirations here. Few will claim any of their products are best-of-breed.

HP: They offer servers, desktops, laptops, phones, even more flavors of storage, a UNIX OS with its own type of virtualization (can’t run x86), switches, backup software, a big services arm, printers, calculators… All in all, great servers, calculators and printers; not so sure about the rest. Fairly complete stack.

IBM: Servers, 2 strong DBs, at least 3 different OSes, CPUs, many kinds of storage, an email system, middleware, backup software, and an immense services arm. No x86 virtualization (they do offer virtualization for their other platforms). Very complete stack, albeit without networking.

Cisco: All kinds of networking (including telephony), servers. Limited stack if networking is not your bag, but what it offers can be pretty good.

Apple: Desktops, laptops, phones, tablets, networking gear, software. Great example of a consumer-oriented hardware and software stack. They used to offer storage and servers but they exited that business.

Notice anything common about the various single-vendor stacks? Did you find a stack that can truly satisfy all your IT needs without giving anything up?

The fact of the matter is that none of the above companies, as formidable as they are, offers a complete stack – software and hardware. You can get some of the way there, but it’s next to impossible to single-source everything without shooting yourself in the foot. At a minimum, you’re probably looking at some kind of Microsoft stack + server/storage stack + networking stack – mixing a minimum of 3 different vendors to get what you need without sacrificing much (the above example assumes you either don’t want to virtualize or are OK with Hyper-V).

Most companies have something like this: Microsoft stack + virtualization + DB + server + storage + networking – 6 total stacks.

So why do people keep pushing single-vendor stacks?

Only a few valid reasons (aside from it being fashionable to talk about). One of them is control – the more stuff you have from a single company, the tighter their hold on you. The other is that it at least limits the number of support points you have to deal with, and can (theoretically) get you better pricing. For instance, Dell has “given away” many an Equallogic box. Guess what – the cost of that box was blended into everything else you purchased; it’s all a shell game. But if someone does buy a lot of gear from a single vendor, there are indeed ways to get better deals. You just won’t necessarily get best-of-breed, or even good-enough, gear.

What about integration?

One would think that buying as much as possible from a single vendor gets you better integration between the bits. Not necessarily. For instance, most large vendors acquire various technologies and/or have OEM deals – if one looks just at storage as an example, Dell has their own storage (Equallogic and Compellent – two different acquisitions) plus an OEM deal with EMC. There’s not much synergy between the various technologies.

HP has their own storage (EVA, Lefthand, Ibrix, Polyserve, 3Par – four different acquisitions) and two OEM deals with Dot Hill for the MSA boxes and HDS for the high-end XP systems. That’s a lot of storage tin to keep track of (all of which comes from 7 different places and 7 totally different codebases), and any HP management software needs to be able to work with all of those boxes (and doesn’t).

IBM has their own storage (XIV, DS6000, DS8000, SONAS, SVC – I believe three homegrown and two acquisitions) and two different OEM deals (NetApp for N Series and LSI Logic for DS5000 and below). The integration between those products and the rest of the IBM landscape should be examined on a case-by-case basis.

EMC’s challenge is that they have acquired too much stuff, making it difficult to provide proper integration for everything. Supporting and developing for that plethora of systems is not easy and teams end up getting fragmented and inconsistent. Far too many codebases to keep track of. This dilutes the R&D dollars available and prolongs development cycles.

Aren’t those newfangled special-purpose multi-vendor stacks better?

There’s another breed of stack: the special-purpose one, where a third party combines gear from different vendors, assembles it and sells it as a supported, pre-packaged solution for a specific application. Such stacks are actually not new – they have been sold for military, industrial and healthcare applications for the longest time. Recently, NetApp and EMC have been promoting different versions of what a “virtualization stack” should be (as usual, with very different approaches; check out FlexPod and Vblock).

The idea behind the “virtualization stack” is that you sell the customer a rack that has inside it network gear, servers, storage, management and virtualization software. Then, all the customer has to do is load up the gear with VMs and off they go.

With such a stack, you don’t limit the customer by making them buy their gear all from one vendor, but instead you limit them by pre-selecting vendors that are “best of breed”. Not everyone will be OK with the choice of gear, of course.

Then there’s the issue of flexibility – some of the special-purpose stacks literally are like black boxes – you are not supposed to modify them or you lose support. To the point where you’re not allowed to add RAM to servers or storage to arrays, both limitations that annoy most customers, but are viewed as a positive by some.

Is it a product or a “kit”?

Back to the virtualization-specific stacks: this is the main argument. Do you buy a ready-made “product”, or a “kit” that some third party assembles by following a detailed design guide? As of this writing (there have been multiple changes to how this is marketed), Vblock is built by a company known as VCE – effectively a third party that puts together a custom stack made of different kinds of EMC storage, Cisco switches and servers, VMware, and a management tool called UIM. It is not built by EMC, VMware or Cisco. VCE then resells the assembled system to other VARs or directly to customers.

NetApp’s FlexPod is built by VARs. The difference is that more than one VAR can build FlexPods (as long as they meet some specific criteria) and don’t need to involve a middleman (also translating to more profits for VARs). All VARs building FlexPods need to follow specific guidelines to build the product (jointly designed by VMware, Cisco and NetApp), use components and firmware tested and certified to work together, and add best-of-breed management software to manage the stack.

The FlexPod emphasis is on sizing and performance flexibility (from tiny to gigantic), Secure Multi Tenancy (SMT – a unique differentiator), space efficiency, application integration, extreme resiliency and network/workload isolation – all highly important features in virtualized environments. In addition, it supports non-virtualized workloads.

Ultimately, in both cases the customer ends up with a pre-built and pre-tested product.

What about support?

This has been both the selling point and the drawback of such multi-vendor stacks. In my opinion, it has been the biggest selling point for Vblock, since a customer calls VCE for support. VCE has support staff that is trained on Cisco, VMware and EMC and can handle many support cases via a single support number – obviously a nice feature.

Where this breaks down a bit: VCE has to engage VMware, EMC and Cisco for anything that’s serious. Furthermore, Vblock support doesn’t support the entire stack but stops at the hypervisor.

For instance, if a customer hits an Enginuity (Symmetrix OS) bug, then the EMC Symm team will have to be engaged, and possibly write a patch for the customer and communicate with the customer. VCE support simply cannot fix such issues, and is best viewed as first-level support. Same goes for Cisco or VMware bugs, and in general deeper support issues that the VCE support staff simply isn’t trained enough to resolve. In addition, Vblocks can be based on several different kinds of EMC storage, that itself requires different teams to support it.

Finally – ask VCE if they are legally responsible for maintaining the support SLAs. For instance, who is responsible if there is a serious problem with the array and the vendor takes 2 days to respond instead of 30 minutes?

FlexPod utilizes a cooperative support model between NetApp, Cisco and VMware, and cases are dealt with by experts from all three companies working in concert. The first company to be called owns the case.

When the going gets tough, both approaches work similarly. For easy cases that can be resolved by the actual VCE support personnel, Vblock probably has an edge.

Who needs the virtualization stack?

I see several kinds of customers that have a need for such a stack (combinations are possible):

  1. The technically/time constrained. For them, it might be easier to get a somewhat pre-integrated solution. That way they can get started more quickly.
  2. The customers needing to hide behind contracts for legal/CYA reasons.
  3. The large. They need such huge amounts of servers and storage, and so frequently, that they simply don’t have the time to put it in themselves, let alone do the testing. They don’t even have time to have a PS team come onsite and build it. They just want to buy large, ready-to-go chunks, shipped as ready to use as possible.
  4. The rapidly growing.
  5. Anyone wanting pre-tested, repeatable and predictable configurations.

Interesting factoid for the #3 case (large customer): They typically need extensive customizations and most of the time would prefer custom-built infrastructure pods with the exact configuration and software they need. For instance, some customers might prefer certain management software over others, and/or want systems to come preconfigured with some of their own customizations in-place – but they are still looking for the packaged product experience. FlexPod is flexible enough to allow that without deviating from the design. Of course, if the customer wants to dramatically deviate (i.e. not use one of the main components like Cisco switches or servers, for instance) – then it stops being a FlexPod and you’re back to building systems the traditional way.

What customers should really be looking for when building a stack?

In my opinion, customers should be looking for a more end-to-end experience.

You see – even with the virtualization stack in place, you will need to add:

  • OSes
  • DBs
  • Email
  • File Services
  • Security
  • Document Management
  • Chargeback
  • Backup
  • DR
  • etc etc.

You should partner with someone that can help you not just with the storage/virtualization/server/network stack, but also with:

  • Proper alignment of VMs to maintain performance
  • Application-level integration
  • Application protection
  • Application acceleration

In essence, treat things holistically. The vendor that provides your virtualization stack needs to be able to help you all the way to, say, configuring Exchange properly so it doesn’t break best practices, and ensuring that firmware revs on the hardware don’t clash with software patch levels.

You still won’t avoid support complexity. Sure, maybe you can have Microsoft do a joint support exercise with the hardware stack VAR, but, no matter how you package it, you are going to be touching the support of the various vendors making up the entire stack.

And you know what?

It’s OK.

D
