NetApp posts SPC-1 Top Ten Performance results for its high end systems – Tier 1 meets high functionality and high performance

It’s been a while since our last SPC-1 benchmark submission with high-end systems in 2012. Since then we launched all new systems, and went from ONTAP 8.1 to ONTAP 8.3, big jumps in both hardware and software.

In 2012 we posted an SPC-1 result with a 6-node FAS6240 cluster  – not our biggest system at the time but we felt it was more representative of a realistic solution and used a hybrid configuration (spinning disks boosted by flash caching technology). It still got the best overall balance of low latency (Average Response Time or ART in SPC-1 parlance, to be used from now on), high SPC-1 IOPS, price, scalability, data resiliency and functionality compared to all other spinning disk systems at the time.

Today (April 22, 2015) we published SPC-1 results with an 8-node all-flash high-end FAS8080 cluster to illustrate the performance of the largest current NetApp FAS systems in this industry-standard benchmark.

For the impatient…

  • The NetApp All-Flash FAS8080 SPC-1 submission places the system in the #5 performance spot in the SPC-1 Top Ten by performance list.
  • And #3 if you look at performance at load points around 1ms Average Response Time (ART).
  • The NetApp system uses RAID-DP, similar to RAID-6, whereas the other entries use RAID-10 (typically, RAID-6 is considered slower than RAID-10).
  • In addition, the FAS8080 shows the best storage efficiency, by far, of any Top Ten SPC-1 submission (and without using compression or deduplication).
  • The FAS8080 offers far more functionality than any other system in the list.

We also recently posted results with the NetApp EF560  – the other major hardware platform NetApp offers. See my post here and the official results here. Different value proposition for that platform – less features but very low ART and great cost effectiveness are the key themes for the EF560.

In this post I want to explain the current Clustered Data ONTAP results and why they are important.

Flash performance without compromise

Solid state storage technologies are becoming increasingly popular.

The challenge with flash offerings from most vendors is that customers typically either have to give up a lot in order to get the high performance of flash, or have to combine 4-5 different products into a complex “solution” in order to satisfy different requirements.

For instance, dedicated all-flash offerings may not be able to natively replicate to less expensive, spinning-drive solutions.

Or, a flash system may offer high performance but not the functionality, scalability, reliability and data integrity of more mature solutions.

But what if you could have it all? Performance and reliability and functionality and scalability and maturity? That’s exactly what Clustered Data ONTAP 8.3 provides.

Here are some Clustered Data ONTAP 8.3 running on FAS8080 highlights:

  • All the NetApp signature ultra-tight application integration and automation for replication, SnapShots, Clones
  • Fancy write-accelerated RAID6-equivalent protection by default
  • Comprehensive data integrity and protection against insidious lost write/torn page/misplaced write errors that RAID and normal checksums don’t always catch
  • Non-disruptive data mobility for all protocols
  • Non-disruptive operations – no downtime even when doing things that would require downtime and extensive PS with other vendors
  • Granular QoS
  • Deduplication and compression
  • Highly scalable – 5,760 drives possible in an 8-node cluster, 17,280 drives possible in the max 24 nodes. Various drive types in the cluster, from SSD to SATA and everything else in between.
  • Multiprotocol (FC, iSCSI, NFS, SMB1,2,3) on the same hardware (no “helper” boxes needed, no dedicated SAN vs NAS pools needed)
  • 96,000 LUNs per 8-node cluster (that’s right, ninety-six thousand LUNs – about 50% more than the maximum possible with the other high-end systems)
  • ONTAP is VMware vVol ready
  • The only array that has been validated by VMware for VMware Horizon 6 with vVols – hopefully the competitors will follow our lead
  • Over 460TB (yes, TeraBytes) of usable cache after all overheads are accounted for (and without accounting for cache amplification through deduplication and clones) in an 8-node cluster. Makes competitor maximum cache amounts seem like rounding errors – indeed, the actual figure might be 465TB or more, but it’s OK… :) (and 3x that number in a 24-node cluster, over 1.3PB cache!)
  • The ability to virtualize other storage arrays behind it
  • The ability to have a cluster with dissimilar size and type nodes – no need to keep all engines the same (unlike monolithic offerings). Why pay the same for all nodes when some nodes may not need all the performance? Why be forced to keep all nodes in the same hardware family? What if you don’t want to buy all at once? Maybe you want to upgrade part of the cluster with a newer-gen system? :)
  • The ability to evacuate part of a cluster and build that part as a different cluster elsewhere
  • The ability to have multiple disk types in a cluster and, indeed, dedicate nodes to functions (for instance, have a few nodes all-flash, some nodes with flash-accelerated SAS and a couple with very dense yet flash-accelerated NL-SAS, with full online data mobility between nodes)
That last bullet deserves a picture:


“SVM” stands for Storage Virtual Machine –  it means a logical storage partition that can span one or more cluster nodes and have parts of the underlying capacity (performance and space) available to it, with its own users, capacity and performance limits etc.

In essence, Clustered Data ONTAP offers the best combination of performance, scalability, reliability, maturity and features of any storage system extant as of this writing. Indeed – look at some of the capabilities like maximum cache and number of LUNs. This is designed to be the cornerstone of a datacenter.

it makes most other systems seem like toys in comparison…


FUD buster

Another reason we wanted to show this result was FUD from competitors struggling to find an angle to fight NetApp. It goes a bit like this: “NetApp FAS systems aren’t real SAN, it’s all simulated and performance will be slow!”



Well – for a “simulated” SAN (whatever that means), the performance is pretty amazing given the level of protection used (RAID6-equivalent – far more resilient and capacity-efficient for large pooled deployments than the RAID10 the other submissions use) and all the insane scalability, reliability and functionality on tap :)

Another piece of FUD has been that ONTAP isn’t “flash-optimized” since it’s a very mature storage OS and wasn’t written “from the ground up for flash”. We’ll let the numbers speak for themselves. It’s worth noting that we have been incorporating a lot of flash-related innovations into FAS systems well before any other competitor did so, something conveniently ignored by the FUD-mongers. In addition, ONTAP 8.3 has a plethora of flash optimizations and path length improvements that helped with the excellent response time results. And lots more is coming.

The final piece of FUD we made sure was addressed was system fullness – last time we ran the test we didn’t fill up as much as we could have, which prompted the FUD-mongers to say that FAS systems need gigantic amounts of free space to perform. Let’s see what they’ll come up with this time 😉

On to the numbers!

As a refresher, you may want to read past SPC-1 posts here and here, and my performance primer here.

Important note: SPC-1 is a 100% block-based benchmark with its own I/O blend and, as such, the results from any vendor SPC-1 submission should not be compared to marketing IOPS numbers of all reads or metadata-heavy NAS benchmarks like SPEC SFS (which are far easier on systems than the 60% write blend of the SPC-1 workload). Indeed, the tested configuration might perform in the millions of “marketing” IOPS – but that’s decidedly not the point of this benchmark.

The SPC-1 Result links if you want the detail are here (summary) and here (full disclosure). In addition, here’s the link to the “Top 10 Performance” systems page so you can compare other submissions that are in the upper performance echelon (unfortunately, SPC-1 results are normally just alphabetically listed, making it time-consuming to compare systems unless you’re looking at the already sorted Top 10 list).

I recommend you look beyond the initial table in each submission showing the performance and $/SPC-1 IOPS and at least go to the price table to see the detail. The submissions calculate $/SPC-1 IOPS based on submitted price but not all vendors use discounted pricing. You may want to do your own price/performance calculations.

The things to look for in SPC-1 submissions

Typically you’re looking for the following things to make sense of an SPC-1 submission:

  • ART vs IOPS – many submissions will show high IOPS at huge ART, which would be rather useless when it comes to Flash storage
  • Sustainability – was performance even or are there constant huge spikes?
  • RAID level – most submissions use RAID10 for speed, what would happen with RAID6?
  • Application Utilization. This one is important yet glossed over. It signifies how much capacity the benchmark consumed vs the overall raw capacity of the system, before RAID, spares etc.

Let’s go over these one by one.


Our ART was 1.23ms at 685,281.71 SPC-1 IOPS, and pretty flat over time during the test:



The SPC-1 rules state the minimum runtime should be 8 hours. We ran the test for 18 hours to observe if there would be variation in the performance. There was no significant variation:


RAID level

RAID-DP was used for all FAS8080EX testing. This is mathematically analogous in protection to RAID-6. Given that these systems are typically deployed in very large pooled configurations, we elected long ago to not recommend single parity RAID since it’s simply not safe enough. RAID-10 is fast and fine for smaller capacity SSD systems but, at scale, it gets too expensive for anything but a lab queen (a system that nobody in their right mind will ever buy but which benchmarks well).

Application Utilization

Our Application Utilization was a very high 61.92% – unheard of by other vendors posting SPC-1 results since they use RAID10 which, by definition, wastes half the capacity (plus spares and other overheads to worry about on top of that).


Some vendors using RAID10 will fill up the resulting space after RAID, spares etc. to a very high degree, and call out the “Protected Application Utilization” as being the key thing to focus on.

This could not be further from the truth – Application Utilization is the only metric that really shows how much of the total possible raw capacity the benchmark actually used and signifies how space-efficient the storage was.

Otherwise, someone could do quadruple mirroring of 100TB, fill up the resulting 25TB to 100%, and call that 100% efficient… when in fact it only consumed 25% :)

It is important to note there was no compression or deduplication enabled by any vendor since it is not allowed by the current version of the benchmark.

Compared to other vendors

I wanted to show a comparison between the Top Ten Performance results both in absolute terms and also normalized around 1ms ART.

Here are the Top Ten highest performing systems as of April 22, 2015, with vendor results links if you want to look at things in detail:

FYI, the HP XP 9500 and the Hitachi system above it in the list are the exact same system, HP resells the HDS array as their high-end offering.

I will show columns that explain the results of each vendor around 1ms. Why 1ms and not more or less? Because in the Top Ten SPC-1 performance list, most results show fairly low ART, but some have very high ART, and it’s useful to show performance at that lower ART load point, which is becoming the ART standard for All-Flash systems. 1ms seems to be a good point for multi-function SSD systems (vs simpler, smaller but more speed-optimized architectures like the NetApp EF560).

The way you determine the 1ms ART load point is by looking at the table that shows ART vs SPC-1 IOPS. Let’s pick IBM’s 780 since it has a very interesting curve so you learn what to look for.

From page 5 of the IBM Power Server 780 SPC-1 Executive Summary:


IBM’s submitted SPC-1 IOPS are high but at a huge ART number for an all-SSD solution (18.90ms). Not very useful for customers picking an all-SSD system. Even the next load point, with an average ART of 6.41ms, is high for an all-flash solution.

To more accurately compare this to the rest of the vendors with decent ART, you need to look at the table to find the closest load point around 1ms (which, in this case, it’s the 10% load point at 0.71ms – the next one up is much higher at 2.65ms).

You can do a similar exercise for the rest, it’s worth a look – I don’t want to paste all these tables and graphs since this post will get too big. But it’s interesting to see how SPC-1 IOPS vs ART are related and translate that to your business requirements for application latency.

Here’s the table with the current Top Ten SPC-1 Performance results as of 4/22/2015. Click on it for a clearer picture, there’s a lot going on.


Key for the chart (the non-obvious parts anyway):

  • The “SPC-1 Load Level near 1ms” is the load point in each SPC-1 Report that corresponds to the SPC-1 IOPS achieved near 1ms. This is not how busy each array was (I see this misinterpreted all the time).
  • The “Total ASU Capacity” is the amount of capacity the test consumed.
  • The “Physical Storage Capacity” is the total amount of capacity in the array before RAID etc.
  • “Application Utilization” is ASU Capacity divided by Physical Storage Capacity.

What do the results show?

Predictably, all-flash systems trump disk-based and hybrid systems for performance and can offer very nice $/SPC-1 IOPS numbers. That is the major allure of flash – high performance density.

Some takeaways from the comparison:

  • Based on SPC-1 IOPs around 1ms Average Response Time load points, the FAS8080 EX shifts from 5th place to 3rd
  • The other vendors used RAID10 – NetApp used RAID-DP (similar to RAID6 in protection). What would happen to their results if they switched to RAID6 to provide a similar level of protection and efficiency?
  • Aside from the NetApp FAS result, the rest of the Top Ten Performance submissions offer vastly lower Application Utilization – about half! Which means that NetApp is able to use 2x the capacity vs raw compared to the other submissions. And that’s before starting to count the possible storage efficiencies we can turn on like dedupe and compression.

How does one pick a flash array?

It depends. What are you trying to do? Solve a tactical problem? Just need a lot of extra speed and far lower latency for some workloads? No need for the array to have a ton of functionality? A lot of the data management happens in the application? Need something cost-effective, simple yet reliable? Then an all-flash system like the NetApp EF560 is a solid answer, and it can still be front-ended by a Clustered Data ONTAP system to provide more functionality if the need arises in the future (we are firm believers in hardware reuse and investment protection – you see, some companies talk about Software Defined Storage, we do Software Defined Storage).

On the other hand, if you would prefer an Enterprise architecture that can serve as the cornerstone of your datacenter for almost any workload and protocol, offers rich data management functionality and tight application integration, insane scalability, non-disruptive everything and offers the most features (reliably) compared to any other platform – then the FAS line running Clustered Data ONTAP is the only possible answer.

Couple that with OnCommand Insight – the best multivendor fabric management tool on the planet – plus Workflow Automation, and we’ve got you covered.

In summary – the all-flash FAS8080EX gets a pretty amazing performance and efficiency SPC-1 result, especially given the extensive portfolio of functionality it offers. In my opinion, no competitor system offers the sheer functionality the FAS8080 does – not even close. Additionally, I believe that certain competitors have very questionable viability and/or tiny market penetration, making them a risky proposition for a high end system purchase.



Technorati Tags: , , , , , , , ,

NetApp Posts Top Ten SPC-1 Price-Performance Results for the new EF560 All-Flash Array

<edit: updated with the changes in the SPC-1 price/performance lineup as of 3/27/2015, fixed some typos>

I’m happy to announce that today we announced the new, third-gen EF560 all-flash array, and also posted SPC-1 results showing the impressive performance it is capable of in this extremely difficult benchmark.

If you have no time to read further – the EF560 achieves, by far, the absolute best price/performance at very low latencies in the SPC-1 benchmark.

The EF line has been enjoying great success for some time now with huge installations in some of the biggest companies in the world with the highest profile applications (as in, things most of us use daily).

The EF560 is the latest all-flash variant of the E-Series family, optimized for very low latency and high performance workloads while ensuring high reliability, cost effectiveness and simplicity.

EF560 highlights

The EF560 runs SANtricity – a lean, heavily optimized storage OS with an impressively short path length (the overhead imposed by the storage OS itself to all data going through the system). In the case of the EF the path length is tiny, around 30 microseconds. Most other storage arrays have a much longer path length as a result of more features and/or coding inefficiencies.

Keeping the path length this impressively short is one of the reasons the EF does away with fashionable All-Flash features like compression and deduplication –  make no mistake, no array that performs those functions is able to sustain that impressively short a path length. There’s just too much in the way. If you really want data reduction and an incredible number of features, we offer that in the FAS line – but the path length naturally isn’t as short as the EF560’s.

A result of the short path length is impressively low latency while maintaining high IOPS with a very reasonable configuration, as you will see further in the article.

Some other EF560 features:

  • No write cliff due to SSD aging or fullness
  • No performance impact due to SSD garbage collection
  • Enterprise components – including SSDs
  • Six-nines available
  • Up to 120x 1.6TB SSDs per system (135TB usable with DDP protection, even more with RAID5/6)
  • High throughput – 12GB/s reads, 8GB/s writes per system (many people forget that DB workloads need not just low latency and high IOPS but also high throughput for certain operations).
  • All software is included in the system price, apart from encryption
  • The system can do snaps and replication, including fully synchronous replication
  • Consistency Group support
  • Several application plug-ins
  • There are no NAS capabilities but instead there is a plethora of block connectivity options: FC, iSCSI, SAS, InfiniBand
  • The usual suspects of RAID types – 5, 10, 6 plus…
  • DDP – Dynamic Disk Pools, a type of declustered RAID6 implementation that performs RAID at the sub-disk level – very handy for large pools, rapid disk rebuilds with minimal performance impact and overall increased flexibility (for instance, you could add a single disk to the system instead of entire RAID groups’ worth)
  • T10-PI to help protect against insidious data corruption that might bypass RAID and normal checksums, and provide end-to-end protection, from the application all the way to the storage device
  • Can also be part of a Clustered Data ONTAP system using the FlexArray license on FAS.

The point of All-Flash Arrays

Going back to the short path length and low latency discussion…

Flash has been a disruptive technology because, if used properly, it allows an unprecedented performance density, at increasingly reasonable costs.

The users of All-Flash Arrays typically fall in two camps:

  1. Users that want lots of features, data reduction algorithms, good but not deterministic performance and not crazy low latencies – 1-2ms is considered sufficient for this use case (with the occasional latency spike), as it is better than hybrid arrays and way better than all-disk systems.
  2. Users that need the absolute lowest possible latency (starting in the microseconds – and definitely less than 1ms worst-case) while maintaining uncompromising reliability for their applications, and are willing to give up certain features to get that kind of performance. The performance for this type of user needs to be deterministic, without weird latency spikes, ever.

The low latency camp typically uses certain applications that need very low latency to generate more revenue. Every microsecond counts, while failures would typically mean significant revenue loss (to the point of making the cost of the storage seem like pocket change).

Some of you may be reading this and be thinking “so what, 1ms to 2ms is a tiny difference, it’s all awesome”. Well – at that level of the game, 2ms is twice the latency of 1ms, and it is a very big deal indeed. For the people that need low latency, a 1ms latency array is half the speed of a 500 microsecond array, even if both do the same IOPS.

You may also be thinking “SSDs that fit in a server’s PCI slot have low latency, right?”

The answer is yes, but what’s missing is the reliability a full-fledged array brings. If the server dies, access is lost. If the card dies, all is lost.

So, when looking for an All-Flash Array, think about what type of flash user you are. What your business actually needs. That will help shape your decisions.

All-Flash Array background operations can affect latency

The more complex All-Flash Arrays have additional capabilities compared to the ultra-low-latency gang, but also have a higher likelihood of producing relatively uneven latency under heavy load while full, and even latency spikes (besides their naturally higher latency due to the longer path length).

For instance, things like cleanup operations, various kinds of background processing that kicks off at different times, and different ways of dealing with I/O depending on how full the array is, can all cause undesirable latency spikes and overall uneven latency. It’s normal for such architectures, but may be unacceptable for certain applications.

Notably, the EF560 doesn’t suffer from such issues. We have been beating competitors in difficult performance situations with the slower predecessors of the EF560, and we will keep doing it with the new, faster system :)

Enough already, show me the numbers!

As a refresher, you may want to read past SPC-1 posts here and here, and my performance primer here.

Important note: SPC-1 is a block-based benchmark with its own I/O blend and, as such, the results from any vendor’s SPC-1 Result should not be compared to marketing IOPS numbers of all reads or metadata-heavy NAS benchmarks like SPEC SFS (which are far easier on systems than the 60% write blend and hotspots of the SPC-1 workload). Indeed, the tested configuration could perform way more “marketing” IOPS – but that’s decidedly not the point of this benchmark.

The EF560 SPC-1 Result links if you want the detail are here (summary) and here (full disclosure). In addition, here’s the link to the “Top 10 by Price-Performance” systems page so you can compare to other submissions (unfortunately, SPC-1 results are normally just alphabetically listed, making it time-consuming to compare systems unless you’re looking at the already sorted Top 10 lists).

The things to look for in SPC-1 submissions

Typically you’re looking for the following things to make sense of an SPC-1 submission:

  • Latency vs IOPS – many submissions will show high IOPS at huge latency, which would be rather useless for the low-latency crowd
  • Sustainability – was performance even or are there constant huge spikes?
  • RAID level – most submissions use RAID10 for speed, what would happen with RAID6?
  • Application Utilization. This one is important yet glossed over. It signifies how much capacity the benchmark consumed vs the overall raw capacity of the system, before RAID, spares etc.
  • Price – discounted or list?

Let’s go over these one by one.

Latency vs IOPS

Our average latency was 0.93ms at 245,011.76 SPC-1 IOPS, and extremely flat during the test:



The SPC-1 rules state the minimum runtime should be 8 hours. There was no significant variation in performance during the test:


RAID level

RAID-10 was used for all testing, with T10-PI Data Assurance enabled (which has a performance penalty but the applications these systems are used for typically need paranoid data integrity). This system would perform slower with RAID5 or RAID6. But for applications where the absolute lowest latency is important, RAID10 is a safe bet, especially with systems that are not write-optimized for RAID6 writes like Data ONTAP is. Not to fret though – the price/performance remained stellar as you will see.

Application Utilization

Our Application Utilization was a very high 46.90% – among the highest of any submission with RAID10 (and among the highest overall, only Data ONTAP submissions can go higher due to RAID-DP).


We did almost completely fill up the resulting RAID10 space, to show that the system’s performance is unaffected when very full. However, Application Utilization is the only metric that really shows how much of the total possible raw capacity the benchmark actually used and signifies how space-efficient the storage was.

Otherwise, someone could do quadruple mirroring of 100TB, fill up the resulting 25TB to 100%, and call that 100% efficient… when in fact it only consumed 25% :)

It is important to note there was no compression or deduplication enabled by any vendor since it is not allowed by the current version of the benchmark.

Compared to other vendors

I wanted to show a comparison between the SPC-1 Top Ten Price-Performance results both in absolute terms and also normalized around 500 microsecond latency to illustrate the fact that very low latency with great performance is still possible at a compelling price point with this solution.

Why 500 microseconds you might ask? Because that’s a good place for very low latency flash storage systems. Why not 1 millisecond you might also ask? Well, 1ms is more commonly found on systems that have more features and don’t concentrate on low latency as much (1ms is half the speed of 500 microseconds).

Here are the Top Ten Price-Performance systems as of March 27, 2015, with SPC-1 Results links if you want to look at things in detail:

  1. X-IO ISE 820 G3 All Flash Array
  2. Dell Storage SC4020 (6 SSDs)
  3. NetApp EF560 Storage System
  4. Huawei OceanStor Dorado2100 G2
  5. HP 3PAR StoreServ 7400 Storage System
  7. Kaminario K2 (28 nodes)
  8. Huawei OCEANSTOR Dorado 5100
  9. Huawei OCEANSTOR Dorado 2100

I will show columns that explain the results of each vendor around 500 microseconds, plus how changing the latency target affects SPC-1 IOPS and also how it affects $/SPC1-IOPS.

The way you determine that lower latency point (SPC calls it “Average Response Time“) is by looking at the graph that shows latency vs SPC-1 IOPS and finding the load point closest to 500 microseconds. Let’s pick Kaminario’s K2 so you learn what to look for:


Notice how the SPC-1 IOPS around half a millisecond is about 10x slower than the performance around 3ms latency. The system picks up after that very rapidly, but if your requirements are for latency to not exceed 500 microseconds, you will be better off spending your money elsewhere (indeed, a very high profile client asked us for 400 microsecond max response at the host level from the first-gen EF systems for their Oracle DBs – this is actually very realistic for many market segments).

Here’s the table with all this analysis done for you. BTW, the “adjusted latency” $/SPC-1 IOPS is not something in the SPC-1 Reports but simply calculated for our example by dividing system price by the SPC-1 IOPS found at the 500 microsecond point in all the reports.

What do the results show?

As submitted, the EF560 is #3 in the absolute Price-Performance ranking. Interestingly, once adjusted for latency around 500 microseconds at list prices (to keep a level playing field), the price/performance of the EF560 is far better than anything else on the chart.

Regarding pricing: Note that some vendors have discounted pricing and some not, always check the SPC-1 report for the prices and don’t just read the summary at the beginning (for example, Fujitsu has 30% discounts showing in the reports, Dell, X-IO and HP all at 45% off – the rest aren’t discounted).

Our price-performance is even better once you adjust for discounts in some of the other results. Update: In this edited version of the chart I show the list price calculations as well. We are #1 in price/performance when adjusted for list pricing even at the higher submitted latencies for all vendors… :)

Another interesting observation is the effects of longer path length on some platforms – for instance, Dell’s lowest reported latency is 0.70ms at a mere 11,249.97 SPC-1 IOPS. Clearly, that is not a system geared towards high performance at very low latency. In addition, the response time for the submitted max SPC-1 IOPS for the Dell system is 4.83ms, firmly in the “nobody cares” category for all-flash systems :) (sorry guys).

Conversely… The LRT (Least Response Time) we submitted for the EF560 was a tiny 0.18ms (180 microseconds) at 24,501.04 SPC-1 IOPS. This is the lowest LRT anyone has ever posted on any array for the SPC-1 benchmark.

Clearly we are doing something right :)

Final thoughts

If your storage needs require very low latency coupled with very high reliability, the EF560 would be an ideal candidate. In addition, the footprint of the system is extremely compact, the SPC-1 results shown are with just a 2U EF560 with 24x 400GB SSDs.

Coupled with Clustered Data ONTAP systems and OnCommand Insight and WorkFlow Automation, NetApp has an incredible portfolio, able to take on any challenge.



Technorati Tags: , , , ,

NetApp posts great Cluster-Mode SPC-1 result

<Edited to add some more information on how SPC-1 works since there was some confusion based on the comments received>

We’ve been busy at NetApp… busy perfecting the industry’s only scale-out unified platform, among other things.

We’ve already released ONTAP 8.1, which, in Cluster-Mode, allows 24 nodes (each with up to 8TB cache) for NAS workloads, and 4 nodes for block workloads (FC and iSCSI).

With ONTAP 8.1.1 (released on June 14th), we increased the node count to 6 for block workloads plus we added some extra optimizations and features. FYI: the node count is just what’s officially supported now, there’s no hard limit.

After our record NFS benchmark results, people have been curious about the block I/O performance of ONTAP Cluster-Mode, so we submitted an SPC-1 benchmark result using part of the same gear left over from the SPEC SFS NFS testing.

To the people that think NetApp is not a fit for block workloads (typically the ones believing competitor FUD): These are among the best SPC-1 results for enterprise disk-based systems given the low latency for the IOPS provided (it’s possible to get higher IOPS with higher latency, as we’ll explain later on in this post).

Here’s the link to the result and another with the page showing all the results.

This blog has covered SPC-1 tests before. A quick recap: The SPC-1 benchmark is an industry-standard, audited, tough, block-based benchmark (on Fiber Channel) that tries to stress-test disk subsystems with a lot of writes, overwrites, hotspots, a mix of random and sequential, write after read, read after write, etc. About 60% of the workload is writes. The I/O sizes are of a large variety – from small to large (so, SPC-1 IOPS are decidedly not the same thing as fully random uniform 4KB IOPS and should not be treated as such).

The benchmark access patterns do have hotspots that are a significant percentage of the total workload. Such hotspots can be either partially cached if the cache is large enough or placed on SSD if the arrays tested have an autotiering system granular and intelligent enough.

If an array can perform well in the SPC-1 workload, it will usually perform extremely well under difficult, latency-sensitive, dynamically changing DB workloads and especially OLTP. The full spec is here for the morbidly curious.

The trick with benchmarks is interpreting the results. A single IOPS number, while useful, doesn’t tell the whole story with respect to the result being useful for real applications. We’ll attempt to assist in the deciphering of the results in this post.

Before we delve into the obligatory competitive analysis, some notes for the ones lacking in faith:

  1. There was no disk short-stroking in the NetApp benchmark (a favorite way for many vendors to get good speeds out of disk systems by using only the outer part of the disk – the combination of higher linear velocity and smaller head movement providing higher performance and reduced seeks). Indeed, we used a tuning parameter that uses the entire disk surface, no matter how full the disks. Look at the full disclosure report here, page 61. For the FUD-mongers out there: This effectively pre-ages WAFL. We also didn’t attempt to optimize the block layout by reallocating blocks.
  2. There was no performance degradation over time.
  3. Average latency (“All ASUs” in the results) was flat and stayed below 5ms during multiple iterations of the test, including the sustainability test (page 28 of the full disclosure report).
  4. No extra cache beyond what comes with the systems was added (512GB comes standard with each 6240 node, 3TB per node is possible on this model, so there’s plenty of headroom for much larger working sets).
  5. It was not a “lab queen” system. We used very few disks to achieve the performance compared to the other vendors, and it’s not even the fastest box we have.


When looking at this type of benchmark, one should probably focus on :
  1. High sustained IOPS (inconsistency is frowned upon).
  2. IOPS/drive (a measure of efficiency – 500 IOPS/drive is twice as efficient as 250 IOPS/drive, meaning a lot less drives are needed, which results in lower costs, less physical footprint, etc.)
  3. Low, stable latency over time (big spikes are frowned upon).
  4. IOPS as a function of latency (do you get high IOPS but also very high latency at the top end? Is that a useful system?)
  5. The RAID protection used (RAID6? RAID10? RAID6 can provide both better protection and better space efficiency than mirroring, resulting in lower cost yet more reliable systems).
  6. What kind of drives were used? Ones you are likely to purchase?
  7. Was autotiering used? If not, why not? Isn’t it supposed to help in such difficult scenarios? Some SSDs would be able to handle the hotspots.
  8. The amount of hardware needed to get the stated performance (are way too many drives and controllers needed to do it? Does that mean a more complex and costly system? What about management?)
  9. The cost (some vendors show discounts and others show list price, so be careful there).
  10. The cost/op (which is the more useful metric – assuming you compare list price to list price).
SPC-1 is not a throughput-type benchmark, for sheer GB/s look elsewhere. Most of the systems didn’t do more than 4GB/s in this benchmark since a lot of the operations are random (and 4GB/s is quite a lot of random I/O).


In this analysis we are comparing disk-based systems. Pure-SSD (or plain old RAM) performance-optimized configs can (predictably) get very high performance and may be a good choice if someone has a very small workload that needs to run very fast.

The results we are focusing on, on the other hand, are highly reliable, general-purpose systems that can provide both high performance, low latency and high capacity at a reasonable cost to many hosts and applications, along with rich functionality (snaps, replication, megacaching, thin provisioning, deduplication, compression, multiple protocols incl. NAS etc. Whoops – none of the other boxes aside from NetApp do all this, but such is the way the cookie crumbles).

Here’s a list of the systems with links to their full SPC-1 disclosure where you can find all the info we’ll be displaying. Those are all systems with high results and relatively flat sustained latency results.

There are some other disk-based systems with decent IOPS results but if you look at their sustained latency (“Sustainability – Average Response Time (ms) Distribution Data” in any full disclosure report) there’s too high a latency overall and too much jitter past the initial startup phase, with spikes over 30ms (which is extremely high), so we ignored them.

Here’s a quick chart of the results sorted according to latency. In addition, the prices shown are the true list prices (which can be found in the disclosures) plus the true $/IO cost based on that list price (a lot of vendors show discounted pricing to make that seem lower):


That depends on whether you value and need low latency or not (and whether you take RAID type into account). For the vast majority of DB workloads, very low I/O latencies are vastly preferred to high latencies.

Here’s how you figure out the details:
  1. Choose any of the full disclosure links you are interested in. Let’s say the 3Par one, since it shows both high IOPS and high latency.
  2. Find the section titled “Response Time – Throughput Curve”. Page 13 in the 3Par result.
  3. Check whether latency rises sharply as load is added to the system.

Shown below is the 3Par curve:


Notice how latency rises quite sharply after a certain point.

Now compare this to the NetApp result (page 13):


Notice how the NetApp result has in general much lower latency but, more importantly, the latency stays low and rises slowly as load is added to the system.

Which is why the column “SPC-1 IOPS around 3ms” was added to the table. Effectively, what would the IOPS be at around the same latency for all the vendors?

Once you do that, you realize that the 3Par system is actually slower than the NetApp system if a similar amount of low latency is desired. Plus it costs several times more.

You can get the exact latency numbers just below the graphs on page 13, the NetApp table looks like this (under the heading “Response Time – Throughput Data”):


Indeed, of all the results compared, only the IBM SVC (with a bunch of V7000 boxes behind it) is faster than NetApp at that low latency point. Which neatly takes us to the next section…


I had to add this since it is confusing. The 100% load point does not mean the arrays tested were necessarily maxed out. Indeed, most of the arrays mentioned could sustain bigger workloads given higher latencies. 3Par just decided to show the performance at that much higher latency point. The other vendors decided to show the performance at latencies more palatable to Tier 1 DB workloads.

The SPC-1 load generators are simply told to run at a specific target IOPS and that is chosen to be the load level. The goal being to balance cost, IOPS and latency.


Almost any engineering problem can be solved given the application of enough hardware. The IBM result is a great example of a very fast system built by adding a lot of hardware together:

  • 8 SVC virtualization engines plus…
  • …16 separate V7000 systems under the SVC controllers…
  • …each consisting of 2 more SVC controllers and 2 RAID controllers
  • 1,920 146GB 15,000 RPM disks (not quite the drive type people buy these days)
  • For a grand total of 40 Linux-based SVC controllers (8 larger and 32 smaller), 32 RAID controllers, and a whole lot of disks.

Putting aside for a moment the task of actually putting together and managing such a system, or the amount of power it draws, or the rack space consumed, that’s quite a bit of gear. I didn’t even attempt to add up all the CPUs working in parallel, I’m sure it’s a lot.

Compare it to the NetApp configuration:
  • 6 controllers in one cluster
  • 432 450GB 15,000 RPM disks (a pretty standard and common drive type as of the time of this writing in June 2012).


  1. What would performance be with RAID6 for the other vendors mentioned? NetApp always tests with our version of RAID6 (RAID-DP). RAID6 is more reliable than mirroring, especially when large pools are in question (not to mention more space-efficient). Most customers won’t buy big systems with all-RAID10 configs these days… (customers, ask your vendor. There is no magic – I bet they have internal results with RAID6, make them show you).
  2. Autotiering is the most talked-about feature it seems, with attributes that make it seem more important than the invention of penicillin or even the wheel, maybe even fire… However, none of the arrays mentioned are using any SSDs for autotiering (IBM published a result once – nothing amazing, draw your own conclusions). One would think that a benchmark that creates hot spots would be an ideal candidate… (and, to re-iterate, there are hotspots and of a percentage small enough to easily fit in SSD). At least IBM’s result proves that (after about 19 hours) autotiering works for the SPC-1 workload – which further solidifies the question: Why is nobody doing this if it’s supposed to be so great?
  3. Why are EMC and Dell unwilling to publish SPC-1 results? (they are both SPC members). They are the only 2 major storage vendors that won’t publish SPC-1 results. EMC said in the past they don’t think SPC-1 is a realistic test – well, only running your applications with your data on the array is ever truly realistic. What SPC-1 is, though, is an industry-standard benchmark for a truly difficult random workload with block I/O, and a great litmus test.
  4. For a box regularly marketed for Tier-1 workloads, the IBM XIV is, once more, suspiciously absent, even in its current Gen3 guise. It’s not like IBM is shy about submitting SPC-1 results :)
  5. Finally – some competitors keep saying NetApp is “not true SAN”, “emulated SAN” etc. Whatever that means – maybe the NetApp approach is better after all… the maximum write latency of the NetApp submission was 1.91ms for a predominantly write workload :)


With this recent SPC-1 result, NetApp showed once more that ONTAP running in Cluster-Mode is highly performing and highly scalable for both SAN and NAS workloads. Summarily, ONTAP Cluster-Mode:
  • Allows for highly performant and dynamically-scalable unified clusters for FC, iSCSI, NFS and CIFS.
  • Exhibits proven low latency while maintaining high performance.
  • Provides excellent price/performance.
  • Allows data on any node to be accessed from any other node.
  • Moves data non-disruptively between nodes (including CIFS, which normally is next to impossible).
  • Maintains the traditional NetApp features (write optimization, application awareness, snapshots, deduplication, compression, replication, thin provisioning, megacaching).
  • Can use the exact same FAS gear as ONTAP running in the legacy 7-mode for investment protection.
  • Can virtualize other arrays behind it.
 Courteous comments always welcome.

Interpreting $/IOPS and IOPS/RAID correctly for various RAID types

<Article updated with more accurate calculation>

There are some impressive new scores at, with the usual crazy configurations of thousands of drives etc.

Regarding price/performance:

When looking at $/IOP, make sure you are comparing list price (look at the full disclosure report, that has all the details for each config).

Otherwise, you could get the wrong $/IOP since some vendors have list prices, others show heavy discounting.

For example, a box that does $6.5/IOP after 50% discounting, would be $13/IOP using list prices.

Regarding RAID:

As I have mentioned in other posts, RAID plays a big role in both protection and performance.

Most SPC-1 results are using RAID10, with the notable exception of NetApp (we use RAID-DP, mathematically analogous to RAID6 in protection).

Here’s a (very) rough way to convert a RAID10 result to RAID6, if the vendor you’re looking for doesn’t have a RAID6 result, but you know the approximate percentage of random writes:

  1. SPC-1 is about 60% writes.
  2. Take any RAID10 result, let’s say 200,000 IOPS.
  3. 60% of that is 120,000, that’s the write ops. 40% is the reads, or 80,000 read ops.
  4. If using RAID6, you’d be looking at roughly a 3x slowdown for the writes: 120,000/3 = 40,000
  5. Add that to the 40% of the reads and you get the final result:
  6. 80,000 reads + 40,000 writes = 120,000 RAID6-corrected SPC-1 IOPS. Which is not quite as big as the RAID10 result… :)
  7. RAID5 would be the writes divided by 2.
All this happens because one random write can result in 6 back-end I/Os with RAID6, 4 with RAID5 and 2 with RAID10.

Just make sure you’re comparing apples to apples, that’s all. I know we all suffer from ADD in this age of information overload, but do spend some time going through the full disclosure, since there’s always interesting stuff in there…