Tag Archives: RAID

How to decipher EMC’s new VNX pre-announcement and look behind the marketing.

It was with interest that I watched some of EMC’s announcements during EMC World. Partly due to competitor awareness, and partly due to being an irrepressible nerd, hoping for something really cool.

BTW: Thanks to Mark Kulacz for assisting with the proof points. Mark, as much as it pains me to admit so, is quite possibly an even bigger nerd than I am.

So… EMC did deliver something. A demo of the possible successor to VNX (VNX2?), unavailable as of this writing (indeed, a lot of fuss was made about it being lab only etc).

One of the things they showed was increased performance vs their current top-of-the-line VNX7500.

The aim of this article is to show that the increases are not proportionally as large as EMC claims, and/or that they’re not primarily due to software, and, moreover, that some planned obsolescence might be coming the way of the VNX for no good reason. Aside from making EMC more money, that is.

A lot of hoopla was made about software being the key driver behind all the performance increases, and how they are now able to use all CPU cores, whereas in the past they couldn’t. Software this, software that. It was the theme of the party.

OK – I’ll buy that. Multi-core enhancements are a common thing in IT-land. Parallelization is key.

So, they showed this interesting chart (hopefully they won’t mind me posting this – it was snagged from their public video):

[Chart: MCX per-core CPU utilization, current VNX (left) vs. the upcoming box (right)]

I added the arrows for clarification.

Notice that the chart above left shows the current VNX using, according to EMC, maybe a total of 2.5 out of the 6 cores if you stack everything up (for instance, Core 0 is maxed out, Core 1 is 50% busy, Cores 2-4 do little, Core 5 does almost nothing). This is important and we’ll come back to it. But, if true, this shows extremely poor multi-core utilization in the current code. It seems processes are dedicated to specific cores – Core 0 does RAID only, for example. Maybe a way to lower context switches?
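To make the arithmetic explicit, here’s a tiny sketch (the per-core percentages are my approximate readings off EMC’s chart, not official figures) adding the chart up to roughly 2.5 busy cores out of 6, and showing what that would imply if the software could spread the load evenly:

```python
# Per-core utilization as roughly read off EMC's chart (my estimates, not EMC's numbers)
core_util = [1.00, 0.50, 0.35, 0.30, 0.25, 0.05]   # Core 0 through Core 5

busy_core_equivalents = sum(core_util)              # ~2.5 cores' worth of work
total_cores = len(core_util)

print(f"Busy-core equivalents: {busy_core_equivalents:.2f} of {total_cores}")
# If software alone let the same 6-core box use every core equally,
# the implied headroom would be roughly:
print(f"Implied software-only speedup: {total_cores / busy_core_equivalents:.1f}x")
```

Keep that roughly 2.4x figure in mind for the questions further down.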

Then they mentioned how the new box has 16 cores per controller (the current VNX7500 has 6 cores per controller).

OK, great so far.

Then they mentioned how, By The Holy Power Of Software, they can now utilize all cores on the upcoming 16-core box equally (chart above, right).

Then, comes the interesting part. They did an IOmeter test for the new box only.

They mentioned how the current VNX7500 would max out at 170,000 8K random reads from SSD (in itself a nice nugget when dealing with EMC reps claiming insane VNX7500 IOPS), and that the current model’s relative lack of performance is due to the fact its software can’t take advantage of all the cores.

Then they showed the experimental box doing over 5x that I/O, which is impressive indeed, even though an all-read test is hardly a realistic way to prove performance. I accept that they were trying to show how much more read-only speed they could get out of the extra cores, plus it makes for a cooler marketing number.

Writes are a whole separate wrinkle for arrays, of course. Then there are all the other ways VNX performance goes down dramatically.

However, all this leaves us with a few big questions:

  1. If this is really all about just optimized software for the VNX, will it also be available for the VNX7500?
  2. Why not show the new software on the VNX7500 as well? After all, it would probably increase performance by over 2x, since it would now be able to use all the cores equally. Of course, that would not make for good marketing. But if with just a software upgrade a VNX7500 could go 2x faster, wouldn’t that decisively prove EMC’s “software is king” story? Why pass up the opportunity to show this?
  3. So, if, with the new software the VNX7500 could do, say, 400,000 read IOPS in that same test, the difference between new and old isn’t as dramatic as EMC claims… right? :)
  4. But, if core utilization on the VNX7500 is not as bad as EMC claims in the chart (why even bother with the extra 2 cores on a VNX7500 vs a VNX5700 if that were the case), then the new speed improvements are mostly due to just a lot of extra hardware. Which, again, goes against the “software” theme!
  5. Why do EMC customers also need XtremeIO if the new VNX is that fast? What about VMAX? :)

Point #4 above is important. For instance, EMC has been touting multi-core enhancements for years now. The current VNX FLARE release has 50% better core efficiency than the one before, supposedly. And, before that, in 2008, multi-core was advertised as getting 2x the performance vs the software before that. However, the chart above shows extremely poor core efficiency. So which is it? 

Or is it maybe that the box demonstrated is getting most of its speed increase not so much from the magic of better software, but mostly from vastly faster hardware – the fastest Intel CPUs (more clock speed, not just more cores, plus more efficient instruction processing), the latest chipset, faster memory, faster SSDs, faster buses and so on. A potential 3-5x faster box by hardware alone.

It doesn’t quite add up as being a software “win” here.

However – I (or at least current VNX customers) probably care more about #1, since it’s all about the software after all :)

If the new software helps so much, will they make it available for the existing VNX? Seems like any of the current boxes would benefit since many of their cores are doing nothing according to EMC. A free performance upgrade!

However… If they don’t make it available, then the only rational explanation is that they want to force people into the new hardware – yet another forklift upgrade (CX->VNX->”new box”).

Or maybe that there’s some very specific hardware that makes the new performance levels possible. Which, as mentioned before, kinda destroys the “software magic” story.

If it’s all about “Software Defined Storage”, why is the software so locked to the hardware?

All I know is that I have an ancient NetApp FAS3070 in the lab. The box was released ages ago (2006 vintage), and yet it’s running the most current GA ONTAP code. That’s going back 3-4 generations of boxes, and it launched with software that was very, very different to what’s available today. Sometimes I think we spoil our customers.

Can a CX3-80 (the beefiest of the CX3 line, similar vintage to the NetApp FAS3070) take the latest code shown at EMC World? Can it even take the code currently GA for VNX? Can it even take the code available for CX4? Can a CX4-960 (again, the beefiest CX4 model) take the latest code for the shipping VNX? I could keep going. But all this paints a rather depressing picture of being able to stretch EMC hardware investments.

But dealing with hardware obsolescence is a very cool story for another day.

D

 


More EMC VNX caveats

Lately, when competing with VNX, I see EMC using several points to prove they’re superior (or at least not deficient).

I’d already written this article a while back, and today I want to explore a few aspects in more depth since my BS pain threshold is getting pretty low. The topics discussed:

  1. VNX space efficiency
  2. LUNs can be served by either controller for “load balancing”
  3. Claims that autotiering helps most workloads
  4. Claims that storage pools are easier
  5. Thin provisioning performance (this one’s interesting)
  6. The new VNX snapshots

References to actual EMC documentation will be used. Otherwise I’d also be no better than a marketing droid.

VNX space efficiency

EMC likes claiming they don’t suffer from the 10% space “tax” NetApp has. Yet, linked here are the best practices showing that in an autotiered pool, at least 10% free space should be available per tier in order for autotiering to be able to do its thing (makes sense).

Then there’s also a 3GB minimum overhead per LUN, plus metadata overhead, calculated with a formula in the linked article. Plus possibly more metadata overhead if they manage to put dedupe in the code.
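As a quick illustration of how these overheads add up, here’s a hedged sketch: the 10% free space per tier and the roughly 3GB minimum per pool LUN come from the documents linked above, the metadata formula is deliberately left out (go read the linked article), and the capacities are hypothetical:

```python
# Hypothetical autotiered pool -- usable capacity per tier, in GB
tiers_gb = {"SSD": 800, "SAS": 10_000, "NL-SAS": 40_000}
lun_count = 50

FREE_PER_TIER = 0.10        # keep ~10% free per tier so autotiering can relocate slices
PER_LUN_OVERHEAD_GB = 3     # ~3GB minimum overhead per pool LUN (metadata formula not included)

tiering_reserve_gb = sum(cap * FREE_PER_TIER for cap in tiers_gb.values())
lun_overhead_gb = lun_count * PER_LUN_OVERHEAD_GB

total_gb = sum(tiers_gb.values())
overhead_gb = tiering_reserve_gb + lun_overhead_gb
print(f"Pool capacity: {total_gb} GB")
print(f"Set aside before any metadata: {overhead_gb:.0f} GB ({overhead_gb / total_gb:.1%})")
```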

My point is: There’s no free lunch. If you want certain pool-related features, there is a price to pay. Otherwise, keep using the old-fashioned RAID groups that don’t offer any of the new features but at least offer predictable performance and capacity utilization.

LUNs can be served by either controller for “load balancing”

This is a fun one. The claim is that LUN ownership can be instantly switched over from one VNX controller to another in order to load balance their utilization. Well – as always, it depends. It’s also important to note that VNX as of this writing does not do any sort of automatic load balancing of LUN ownership based on load.

  1. If it’s using old-fashioned RAID LUNs: Transferring LUN ownership is indeed doable with no issues. It’s been like that forever.
  2. If the LUN is in a pool – different story. There’s no quick way to shift LUN ownership to another controller without significant performance loss.

There’s copious information here. Long story short: You don’t change LUN ownership with pools, but rather need to do a migration of the LUN contents to the other controller (to another LUN, you can’t just move the LUN as-is – this also creates issues), otherwise there will be a performance tax to pay.

Claims that autotiering helps most workloads

Not so FAST. EMC’s own best practice guides are rife with caveats and cautions regarding autotiering. Yet this feature is used as a gigantic differentiator at every sales campaign.

For example, in the very thorough “EMC Scaling performance for Oracle Virtual Machine”, the following graph is shown on page 35:

[Graph from page 35: database performance with SSDs used as cache vs. adding SSDs as a tier]

The arrows were added by me. Notice that most of the performance benefit is provided once cache is appropriately sized. Adding an extra 5 SSDs for VNX tiering provides almost no extra benefit for this database workload.

One wonders how fast it would go if an extra 4 SSDs were added for even more cache instead of going to the tier… :)

Perchance the all-cache line with 8 SSDs would be faster than 4 cache SSDs and 5 tier SSDs, but that would make for some pretty poor autotiering marketing.

Claims that storage pools are easier

The typical VNX pitch to a customer is: Use a single, easy, happy, autotiered pool. Despite what marketing slicks show, unfortunately, complexity is not really reduced with VNX pools – simply because single pools are not recommended for all workloads. Consider this typical VNX deployment scenario, modeled after best practice documents:

  1. RecoverPoint journal LUNs in a separate RAID10 RAID group
  2. SQL log LUNs in a separate RAID10 RAID group
  3. Exchange 2010 log LUNs in a separate RAID10 RAID group
  4. Exchange 2010 data LUNs can be in a pool as long as it has a homogeneous disk type, otherwise use multiple RAID groups
  5. SQL data can be in an autotiered pool
  6. VMs might have to go in a separate pool or maybe share the SQL pool
  7. VDI linked clone repository would probably use SSDs in a separate RAID10 RAID group

OK, great. I understand that all the I/O separation above can be beneficial. However, the selling points of pooling and autotiering are that they should reduce complexity, reduce overall cost, improve performance and improve space efficiency. Clearly, that’s not the case at all in real life. What is the reason all the above can’t be in a single pool, maybe two, and have some sort of array QoS to ensure prioritization?

And what happens to your space efficiency if you over-allocate disks to the old-fashioned RAID groups above? How do you get the space back?

What if you under-allocated? How easy would it be to add a bit more space or performance? (not 2-3x – let’s say you need just 20% more). Can you expand an old-fashioned VNX RAID group by a couple of disks?

And what’s the overall space efficiency now that this kind of elaborate split is necessary? Hmm… ;)

For more detail, check these Exchange and SQL design documents.

Thin provisioning performance

This is just great.

VNX thin provisioning performs very poorly relative to thick LUNs, and even more poorly relative to standard RAID groups. The performance issue makes complete sense given how space is allocated when writing thin on a VNX, with 8KB blocks assigned as space is consumed. A nice explanation of how pool space is allocated is here. A VNX writes to pools using 1GB slices. Thick LUNs pre-allocate as many 1GB slices as necessary, which keeps performance acceptable. Thin LUNs obviously don’t pre-allocate space and currently have no way to optimize writes or reads – the result is fragmentation, on top of the higher CPU, disk and memory overhead needed to maintain thin LUNs :)
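To visualize why thin LUNs end up fragmented, here’s a toy model – purely conceptual, not EMC’s actual allocator: the pool hands out 1GB slices, a thick LUN reserves its slices contiguously up front, while thin LUNs take slices only as writes arrive, so slices belonging to different LUNs end up interleaved:

```python
# Toy model of pool slice allocation (conceptual illustration only, not EMC's real allocator)
SLICE_GB = 1

class Pool:
    def __init__(self, size_gb):
        self.slices = [None] * (size_gb // SLICE_GB)   # owner of each 1GB slice
        self.next_free = 0

    def allocate(self, owner, count=1):
        for _ in range(count):
            self.slices[self.next_free] = owner
            self.next_free += 1

pool = Pool(12)

# Thick LUN: all slices reserved at creation time -> contiguous layout
pool.allocate("thick_LUN", count=4)

# Thin LUNs: slices taken only as each LUN actually writes,
# so writes from different LUNs interleave over time
for writer in ["thin_A", "thin_B", "thin_A", "thin_C", "thin_B", "thin_A"]:
    pool.allocate(writer)

print(pool.slices)
# ['thick_LUN', 'thick_LUN', 'thick_LUN', 'thick_LUN',
#  'thin_A', 'thin_B', 'thin_A', 'thin_C', 'thin_B', 'thin_A', None, None]
# The thin LUNs' data is scattered across the pool -- on spinning disk that means
# extra seeks, which is the fragmentation the design documents warn about.
```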

From the Exchange 2010 design document again, page 23:

[Table from page 23 of the Exchange 2010 design document]

Again, I added the arrows to point out a couple of important things:

  1. Thin provisioning is not recommended for high performance workloads on VNX
  2. Indeed, it’s so slow that you should run your thin pools with RAID10!!!

But wait – thin provisioning is supposed to help me save space, and now I have to run it with RAID10, which chews up more space?

Kind of an oxymoron.

And what if the customer wants the superior reliability of RAID6 for the whole pool? How fast is thin provisioning then?

Oh, and the VNX has no way to fix the fragmentation that’s rampant in its thin LUNs. Short of a migration to another LUN (kind of a theme it seems).

The new VNX snapshots

The VNX has a way to somewhat lower the traditionally extreme impact of FLARE snapshots by switching from COFW (Copy On First Write) to ROFW (Redirect On First Write).
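For anyone unfamiliar with the difference, here’s a minimal sketch of the two approaches (a generic illustration, not EMC’s implementation): with copy-on-first-write, the first overwrite of a block after a snapshot costs an extra read and an extra write to preserve the old data; with redirect-on-first-write, the new data simply lands somewhere else and the old block is left alone for the snapshot to keep pointing at.

```python
# Generic COFW vs ROFW illustration (not EMC's actual code paths)
volume = {0: "old0", 1: "old1"}            # live blocks at snapshot time
snapped = set(volume)                      # blocks protected by the snapshot

# --- Copy On First Write (classic FLARE snapshots) ---
save_area = {}
def cofw_write(block, data):
    if block in snapped and block not in save_area:
        save_area[block] = volume[block]   # extra read + extra write to preserve old data
    volume[block] = data                   # then the actual overwrite, in place

# --- Redirect On First Write (the new snapshot style) ---
block_map = {b: b for b in volume}         # logical block -> physical location
def rofw_write(block, data, new_location):
    volume[new_location] = data            # a single write to a fresh location
    block_map[block] = new_location        # old block untouched; the snapshot still points at it

cofw_write(0, "new0")                      # costs: read old + write to save area + write new
rofw_write(1, "new1", new_location=2)      # costs: one write + a pointer update
print(save_area, block_map, volume)
```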

The problem?

The new VNX snapshots need a pool, and need thin LUNs. It makes sense from an engineering standpoint, but…

Those are exactly the 2 VNX features that lower performance.

There are many other issues with the new VNX snapshots, but that’s a story for another day. It’s no wonder EMC pushes RecoverPoint far more than their snaps…

The takeaway

There’s marketing, and then there’s engineering reality.

Since the VNX is able to run both pools and old-fashioned RAID groups, marketing wisely chooses to not be very specific about what works with what.

The reality though is that all the advanced features only work with pools. But those come with significant caveats.

If you’re looking at a VNX – at least make sure you figure out whether the marketed features will be usable for your workload. Ask for a full LUN layout.

And we didn’t even talk about having uniform RAID6 protection in pools, which is yet another story for another day.

D


Are you doing a disservice to your company with RFPs?

Whether we like it or not, RFPs (Request For Proposal) are a fact of life for vendors.

It usually works like this: A customer has a legitimate need for something. They decide (for whatever reason) to get bids from different vendors. They then craft an RFP document that is either:

  1. Carefully written, with the best intentions, so that they get the most detailed proposal possible given their requirements, or
  2. Carefully tailored by them and the help of their preferred vendor to box out the other vendors.

Both approaches have merit, even if #2 seems unethical and almost illegal. I understand that some people are just happy with what they have, so they word their document to block anyone from changing their environment, turning the whole RFP process into an exercise in futility. I doubt that whatever I write here will change that kind of mindset.

However – I want to focus more on #1. The carefully written RFP that truly has the best intentions (and maybe some of it will rub off on the #2 “blocking” RFP type folks).

Here’s the major potential problem with the #1 approach:

You don’t know what you don’t know. For example, maybe you are not an expert on how caching works at a very low level, but you are aware of caching and what it does. So – you know that you don’t know about the low-level aspects of caching (or whatever other technology) and word your RFP so that you learn in detail how the various vendors do it.

The reality is – there are things whose existence you can’t even imagine – indeed, most things:

[Diagram: what you know is only a small part of what there is to know]

By crafting your RFP around things you are familiar with, you are potentially (and unintentionally) eliminating solutions that may do things that are entirely outside your past experiences.

Back to our caching example – suppose you are familiar with arrays that need a lot of write cache in order to work well for random writes, so you put in your storage RFP requirements about very specific minimum amounts of write cache.

That’s great and absolutely applicable to the vendors that write to disk the way you are familiar with.

But what if someone writes to disk entirely differently than what your experience dictates and doesn’t need large amounts of write cache to do random writes even better than what you’re familiar with? What if they use memory completely differently in general?

Another example where almost everyone gets it wrong is specifying performance requirements. Unless you truly understand the various parameters a storage system needs in order to be properly sized, it’s almost guaranteed the requirements list will be incomplete. For example, specifying IOPS without an I/O size, a read/write blend, a latency target and a random vs. sequential mix – at a minimum – will not be sufficient to size a storage system (there’s a lot more here in case you missed it).
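To see why a bare IOPS number is useless on its own, here’s a small sketch with made-up workloads showing how the same “20,000 IOPS” requirement turns into wildly different throughput and back-end disk load depending on I/O size, read/write mix and RAID type:

```python
# Same headline IOPS, very different systems (illustrative numbers only)
def describe(iops, io_size_kb, write_pct, raid_write_penalty):
    throughput_mbs = iops * io_size_kb / 1024
    reads = iops * (1 - write_pct)
    writes = iops * write_pct
    backend_iops = reads + writes * raid_write_penalty   # what the disks actually have to absorb
    return throughput_mbs, backend_iops

# "We need 20,000 IOPS"... but which 20,000?
a = describe(20_000, io_size_kb=4,  write_pct=0.2, raid_write_penalty=2)   # small random, RAID10
b = describe(20_000, io_size_kb=64, write_pct=0.6, raid_write_penalty=6)   # large, write-heavy, RAID6

print(f"4K, 20% writes, RAID10 : {a[0]:7.0f} MB/s, {a[1]:7.0f} back-end IOPS")
print(f"64K, 60% writes, RAID6 : {b[0]:7.0f} MB/s, {b[1]:7.0f} back-end IOPS")
```

Both look identical in an RFP that only asks for “20,000 IOPS”.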

By setting an arbitrary limit to something that doesn’t apply to certain technologies, you are unintentionally creating a Type #2 RFP document – and you are boxing out potentially better solutions, which is ultimately not good for your business. And by not providing enough information, you are unintentionally making it almost impossible for the solution providers to properly craft something for you.

So what to do to avoid these RFP pitfalls?

Craft your RFP by asking questions about solving the business problem, not by trying to specify how the vendor should solve the business problem.

For example: Say something like this about space savings:

“Describe what, if any, technologies exist within the gizmo you’re proposing that will result in the reduction of overall data space consumption. In addition, describe what types of data and what protocols such technologies can work with, when they should be avoided, and what, if any, performance implications exist. Be specific.”

Instead of this:

“We need the gizmo to have deduplication that works this way with this block size plus compression that uses this algorithm but not that other one”.

Or, say something like this about reliability:

“Describe the technologies employed to provide resiliency of data, including protection from various errors, like lost or misplaced writes”.

Instead of:

“The system needs to have RAID10 disk with battery-backed write cache”.

It’s not easy. Most of us try to solve the problem and have at least some idea of how we think it should be solved. Just try to avoid that instinct while writing the RFP…

And, last but not least:

Get some help for crafting your RFP. We have this website that will even generate one for you. It’s NetApp-created, so take it with a grain of salt, but it was designed so the questions were fair and open-ended and not really vendor-specific. At least go through it and try building an RFP with it. See if it puts in questions you hadn’t thought of asking, and see how things are worded.

And get some help in getting your I/O requirements… most vendors have tools that can help with that. It may mean that you are repeating the process several times – but at least you’ll get to see how thorough each vendor is regarding the performance piece. Beware of the ones that aren’t thorough.

D

NetApp posts great Cluster-Mode SPC-1 result

<Edited to add some more information on how SPC-1 works since there was some confusion based on the comments received>

We’ve been busy at NetApp… busy perfecting the industry’s only scale-out unified platform, among other things.

We’ve already released ONTAP 8.1, which, in Cluster-Mode, allows 24 nodes (each with up to 8TB cache) for NAS workloads, and 4 nodes for block workloads (FC and iSCSI).

With ONTAP 8.1.1 (released on June 14th), we increased the node count to 6 for block workloads, plus we added some extra optimizations and features. FYI: the node count is just what’s officially supported now; there’s no hard limit.

After our record NFS benchmark results, people have been curious about the block I/O performance of ONTAP Cluster-Mode, so we submitted an SPC-1 benchmark result using part of the same gear left over from the SPEC SFS NFS testing.

To the people that think NetApp is not a fit for block workloads (typically the ones believing competitor FUD): These are among the best SPC-1 results for enterprise disk-based systems given the low latency for the IOPS provided (it’s possible to get higher IOPS with higher latency, as we’ll explain later on in this post).

Here’s the link to the result, and another to the page showing all the results.

This blog has covered SPC-1 tests before. A quick recap: The SPC-1 benchmark is an industry-standard, audited, tough, block-based benchmark (over Fibre Channel) that stress-tests disk subsystems with a lot of writes, overwrites, hotspots, a mix of random and sequential, write after read, read after write, etc. About 60% of the workload is writes. The I/O sizes vary widely, from small to large (so SPC-1 IOPS are decidedly not the same thing as fully random, uniform 4KB IOPS and should not be treated as such).

The benchmark access patterns do have hotspots that are a significant percentage of the total workload. Such hotspots can be either partially cached if the cache is large enough or placed on SSD if the arrays tested have an autotiering system granular and intelligent enough.

If an array can perform well in the SPC-1 workload, it will usually perform extremely well under difficult, latency-sensitive, dynamically changing DB workloads and especially OLTP. The full spec is here for the morbidly curious.

The trick with benchmarks is interpreting the results. A single IOPS number, while useful, doesn’t tell the whole story with respect to the result being useful for real applications. We’ll attempt to assist in the deciphering of the results in this post.

Before we delve into the obligatory competitive analysis, some notes for the ones lacking in faith:

  1. There was no disk short-stroking in the NetApp benchmark (a favorite way for many vendors to get good speeds out of disk systems by using only the outer part of the disk – the combination of higher linear velocity and smaller head movement providing higher performance and reduced seeks). Indeed, we used a tuning parameter that uses the entire disk surface, no matter how full the disks. Look at the full disclosure report here, page 61. For the FUD-mongers out there: This effectively pre-ages WAFL. We also didn’t attempt to optimize the block layout by reallocating blocks.
  2. There was no performance degradation over time.
  3. Average latency (“All ASUs” in the results) was flat and stayed below 5ms during multiple iterations of the test, including the sustainability test (page 28 of the full disclosure report).
  4. No extra cache beyond what comes with the systems was added (512GB comes standard with each 6240 node, 3TB per node is possible on this model, so there’s plenty of headroom for much larger working sets).
  5. It was not a “lab queen” system. We used very few disks to achieve the performance compared to the other vendors, and it’s not even the fastest box we have.


ANALYSIS

When looking at this type of benchmark, one should probably focus on:
  1. High sustained IOPS (inconsistency is frowned upon).
  2. IOPS/drive (a measure of efficiency – 500 IOPS/drive is twice as efficient as 250 IOPS/drive, meaning a lot fewer drives are needed, which results in lower costs, less physical footprint, etc. – see the quick sketch after this list)
  3. Low, stable latency over time (big spikes are frowned upon).
  4. IOPS as a function of latency (do you get high IOPS but also very high latency at the top end? Is that a useful system?)
  5. The RAID protection used (RAID6? RAID10? RAID6 can provide both better protection and better space efficiency than mirroring, resulting in lower cost yet more reliable systems).
  6. What kind of drives were used? Ones you are likely to purchase?
  7. Was autotiering used? If not, why not? Isn’t it supposed to help in such difficult scenarios? Some SSDs would be able to handle the hotspots.
  8. The amount of hardware needed to get the stated performance (are way too many drives and controllers needed to do it? Does that mean a more complex and costly system? What about management?)
  9. The cost (some vendors show discounts and others show list price, so be careful there).
  10. The cost/op (which is the more useful metric – assuming you compare list price to list price).

SPC-1 is not a throughput-type benchmark; for sheer GB/s, look elsewhere. Most of the systems didn’t do more than 4GB/s in this benchmark since a lot of the operations are random (and 4GB/s is quite a lot of random I/O).
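As a quick worked example of points 2 and 10 above (the figures below are invented for illustration – pull the real IOPS, drive counts and list prices from each full disclosure report):

```python
# Illustrative only -- substitute the numbers from the actual full disclosure reports
def spc1_efficiency(name, spc1_iops, drive_count, list_price_usd):
    iops_per_drive = spc1_iops / drive_count
    cost_per_iop = list_price_usd / spc1_iops
    print(f"{name}: {iops_per_drive:6.1f} IOPS/drive, ${cost_per_iop:.2f}/SPC-1 IOPS (list)")

# Two hypothetical submissions with the same headline IOPS:
spc1_efficiency("Array A", spc1_iops=250_000, drive_count=500,  list_price_usd=1_500_000)
spc1_efficiency("Array B", spc1_iops=250_000, drive_count=2000, list_price_usd=3_000_000)
# Same headline number, but Array A needs a quarter of the drives and half the money --
# which is exactly the efficiency story a single IOPS figure hides.
```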

SYSTEMS COMPARED

In this analysis we are comparing disk-based systems. Pure-SSD (or plain old RAM) performance-optimized configs can (predictably) get very high performance and may be a good choice if someone has a very small workload that needs to run very fast.

The results we are focusing on, on the other hand, are highly reliable, general-purpose systems that can provide high performance, low latency and high capacity at a reasonable cost to many hosts and applications, along with rich functionality (snaps, replication, megacaching, thin provisioning, deduplication, compression, multiple protocols incl. NAS etc. Whoops – none of the other boxes aside from NetApp do all this, but such is the way the cookie crumbles).

Here’s a list of the systems with links to their full SPC-1 disclosure where you can find all the info we’ll be displaying. Those are all systems with high results and relatively flat sustained latency results.

There are some other disk-based systems with decent IOPS results but if you look at their sustained latency (“Sustainability – Average Response Time (ms) Distribution Data” in any full disclosure report) there’s too high a latency overall and too much jitter past the initial startup phase, with spikes over 30ms (which is extremely high), so we ignored them.

Here’s a quick chart of the results sorted according to latency. In addition, the prices shown are the true list prices (which can be found in the disclosures) plus the true $/IO cost based on that list price (a lot of vendors show discounted pricing to make that seem lower):

…BUT THAT CHART SHOWS THAT SOME OF THE OTHER BIG BOXES ARE FASTER THAN NETAPP… RIGHT?

That depends on whether you value and need low latency or not (and whether you take RAID type into account). For the vast majority of DB workloads, very low I/O latencies are vastly preferred to high latencies.

Here’s how you figure out the details:
  1. Choose any of the full disclosure links you are interested in. Let’s say the 3Par one, since it shows both high IOPS and high latency.
  2. Find the section titled “Response Time – Throughput Curve”. Page 13 in the 3Par result.
  3. Check whether latency rises sharply as load is added to the system.

Shown below is the 3Par curve:

[Graph: 3Par Response Time – Throughput Curve]

Notice how latency rises quite sharply after a certain point.

Now compare this to the NetApp result (page 13):

[Graph: NetApp Response Time – Throughput Curve]

Notice how the NetApp result has in general much lower latency but, more importantly, the latency stays low and rises slowly as load is added to the system.

Which is why the column “SPC-1 IOPS around 3ms” was added to the table. Effectively, what would the IOPS be at around the same latency for all the vendors?

Once you do that, you realize that the 3Par system is actually slower than the NetApp system if a similar amount of low latency is desired. Plus it costs several times more.

You can get the exact latency numbers just below the graphs on page 13, the NetApp table looks like this (under the heading “Response Time – Throughput Data”):

[Table: NetApp Response Time – Throughput Data]
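If you want to put a number on “IOPS at around 3ms” yourself, a simple linear interpolation over that Response Time – Throughput table does the trick. The data points below are placeholders – plug in the actual pairs from page 13 of whichever disclosure you’re comparing:

```python
# Estimate IOPS at a target latency from a Response Time - Throughput table
# (the points below are placeholders -- use the real pairs from the disclosure report)
def iops_at_latency(points, target_ms):
    """points: list of (iops, avg_response_ms) pairs, in increasing load order."""
    for (io1, lat1), (io2, lat2) in zip(points, points[1:]):
        if lat1 <= target_ms <= lat2:
            frac = (target_ms - lat1) / (lat2 - lat1)
            return io1 + frac * (io2 - io1)
    return None   # target latency falls outside the measured range

example_points = [(25_000, 1.2), (50_000, 1.8), (100_000, 2.6), (125_000, 3.4), (150_000, 6.5)]
print(f"~{iops_at_latency(example_points, 3.0):,.0f} IOPS at ~3ms")
```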

Indeed, of all the results compared, only the IBM SVC (with a bunch of V7000 boxes behind it) is faster than NetApp at that low latency point. Which neatly takes us to the next section…

WHAT IS THE 100% LOAD POINT?

I had to add this since it is confusing. The 100% load point does not mean the arrays tested were necessarily maxed out. Indeed, most of the arrays mentioned could sustain bigger workloads given higher latencies. 3Par just decided to show the performance at that much higher latency point. The other vendors decided to show the performance at latencies more palatable to Tier 1 DB workloads.

The SPC-1 load generators are simply told to run at a specific target IOPS, and that target is chosen as the load level, the goal being to balance cost, IOPS and latency.

JUST HOW MUCH HARDWARE IS NEEDED TO GET A SYSTEM TO PERFORM?

Almost any engineering problem can be solved given the application of enough hardware. The IBM result is a great example of a very fast system built by adding a lot of hardware together:

  • 8 SVC virtualization engines plus…
  • …16 separate V7000 systems under the SVC controllers…
  • …each consisting of 2 more SVC controllers and 2 RAID controllers
  • 1,920 146GB 15,000 RPM disks (not quite the drive type people buy these days)
  • For a grand total of 40 Linux-based SVC controllers (8 larger and 32 smaller), 32 RAID controllers, and a whole lot of disks.

Putting aside for a moment the task of actually putting together and managing such a system, or the amount of power it draws, or the rack space consumed, that’s quite a bit of gear. I didn’t even attempt to add up all the CPUs working in parallel, I’m sure it’s a lot.

Compare it to the NetApp configuration:
  • 6 controllers in one cluster
  • 432 450GB 15,000 RPM disks (a pretty standard and common drive type as of the time of this writing in June 2012).

SOME QUESTIONS (OTHER VENDORS FEEL FREE TO RESPOND):

  1. What would performance be with RAID6 for the other vendors mentioned? NetApp always tests with our version of RAID6 (RAID-DP). RAID6 is more reliable than mirroring, especially when large pools are in question (not to mention more space-efficient). Most customers won’t buy big systems with all-RAID10 configs these days… (customers, ask your vendor. There is no magic – I bet they have internal results with RAID6, make them show you).
  2. Autotiering is the most talked-about feature it seems, with attributes that make it seem more important than the invention of penicillin or even the wheel, maybe even fire… However, none of the arrays mentioned are using any SSDs for autotiering (IBM published a result once – nothing amazing, draw your own conclusions). One would think that a benchmark that creates hot spots would be an ideal candidate… (and, to re-iterate, there are hotspots and of a percentage small enough to easily fit in SSD). At least IBM’s result proves that (after about 19 hours) autotiering works for the SPC-1 workload – which further solidifies the question: Why is nobody doing this if it’s supposed to be so great?
  3. Why are EMC and Dell unwilling to publish SPC-1 results? (they are both SPC members). They are the only 2 major storage vendors that won’t publish SPC-1 results. EMC said in the past they don’t think SPC-1 is a realistic test – well, only running your applications with your data on the array is ever truly realistic. What SPC-1 is, though, is an industry-standard benchmark for a truly difficult random workload with block I/O, and a great litmus test.
  4. For a box regularly marketed for Tier-1 workloads, the IBM XIV is, once more, suspiciously absent, even in its current Gen3 guise. It’s not like IBM is shy about submitting SPC-1 results :)
  5. Finally – some competitors keep saying NetApp is “not true SAN”, “emulated SAN” etc. Whatever that means – maybe the NetApp approach is better after all… the maximum write latency of the NetApp submission was 1.91ms for a predominantly write workload :)

FINAL THOUGHTS

With this recent SPC-1 result, NetApp showed once more that ONTAP running in Cluster-Mode is highly performing and highly scalable for both SAN and NAS workloads. Summarily, ONTAP Cluster-Mode:
  • Allows for highly performant and dynamically-scalable unified clusters for FC, iSCSI, NFS and CIFS.
  • Exhibits proven low latency while maintaining high performance.
  • Provides excellent price/performance.
  • Allows data on any node to be accessed from any other node.
  • Moves data non-disruptively between nodes (including CIFS, which normally is next to impossible).
  • Maintains the traditional NetApp features (write optimization, application awareness, snapshots, deduplication, compression, replication, thin provisioning, megacaching).
  • Can use the exact same FAS gear as ONTAP running in the legacy 7-mode for investment protection.
  • Can virtualize other arrays behind it.

Courteous comments always welcome.
D

Interpreting $/IOPS and IOPS/RAID correctly for various RAID types

<Article updated with more accurate calculation>

There are some impressive new scores at storageperformance.org, with the usual crazy configurations of thousands of drives etc.

Regarding price/performance:

When looking at $/IOP, make sure you are comparing list price (look at the full disclosure report, that has all the details for each config).

Otherwise, you could get the wrong $/IOP since some vendors have list prices, others show heavy discounting.

For example, a box that does $6.5/IOP after 50% discounting, would be $13/IOP using list prices.

Regarding RAID:

As I have mentioned in other posts, RAID plays a big role in both protection and performance.

Most SPC-1 results are using RAID10, with the notable exception of NetApp (we use RAID-DP, mathematically analogous to RAID6 in protection).

Here’s a (very) rough way to convert a RAID10 result to RAID6, if the vendor you’re looking for doesn’t have a RAID6 result, but you know the approximate percentage of random writes:

  1. SPC-1 is about 60% writes.
  2. Take any RAID10 result, let’s say 200,000 IOPS.
  3. 60% of that is 120,000, that’s the write ops. 40% is the reads, or 80,000 read ops.
  4. If using RAID6, you’d be looking at roughly a 3x slowdown for the writes: 120,000/3 = 40,000
  5. Add that to the 40% of the reads and you get the final result:
  6. 80,000 reads + 40,000 writes = 120,000 RAID6-corrected SPC-1 IOPS. Which is not quite as big as the RAID10 result… :)
  7. RAID5 would be the writes divided by 2.

All this happens because one random write can result in 6 back-end I/Os with RAID6, 4 with RAID5 and 2 with RAID10.
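Here’s the same rough conversion as a small script, using those back-end I/O counts (2, 4 and 6 per random write for RAID10, RAID5 and RAID6 respectively). It’s a very rough estimate, as stated – not a substitute for an actual RAID6 submission:

```python
# (Very) rough conversion of a RAID10 SPC-1 result to other RAID types,
# based on back-end I/Os per random write: RAID10 = 2, RAID5 = 4, RAID6 = 6
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

def convert_raid10_result(raid10_iops, target_raid, write_fraction=0.60):
    writes = raid10_iops * write_fraction
    reads = raid10_iops * (1 - write_fraction)
    slowdown = WRITE_PENALTY[target_raid] / WRITE_PENALTY["RAID10"]   # 3x for RAID6, 2x for RAID5
    return reads + writes / slowdown

print(convert_raid10_result(200_000, "RAID6"))   # 80,000 reads + 40,000 writes = 120,000
print(convert_raid10_result(200_000, "RAID5"))   # 80,000 reads + 60,000 writes = 140,000
```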

Just make sure you’re comparing apples to apples, that’s all. I know we all suffer from ADD in this age of information overload, but do spend some time going through the full disclosure, since there’s always interesting stuff in there…

D