It’s been a while since our last SPC-1 benchmark submission with high-end systems, in 2012. Since then we have launched all-new systems and moved from ONTAP 8.1 to ONTAP 8.3: big jumps in both hardware and software.
In 2012 we posted an SPC-1 result with a 6-node FAS6240 cluster. It was not our biggest system at the time, but we felt it was more representative of a realistic solution, and it used a hybrid configuration (spinning disks boosted by flash caching technology). It still achieved the best overall balance of low latency, high SPC-1 IOPS, price, scalability, data resiliency and functionality of all spinning-disk systems at the time.
Today (April 22, 2015) we published SPC-1 results with an 8-node all-flash high-end FAS8080 cluster to illustrate the performance of the largest current NetApp FAS systems in this industry-standard benchmark.
For the impatient…
- The NetApp All-Flash FAS8080 SPC-1 submission places the system in the #5 performance spot in the SPC-1 Top Ten by performance list.
- And #3 if you look at performance at 1ms latency.
- Highest-performing All-Flash Enterprise Unified system.
- The NetApp system uses RAID-DP (similar to RAID-6), whereas the other entries use RAID-10. Performance would be far lower for the other entries with RAID-6.
- Price-performance-wise, the FAS8080 gets the #4 spot once all results are adjusted to list prices.
- In addition, the FAS8080 shows the best storage efficiency, by far, of any SPC-1 submission to date (and without using compression or deduplication).
- The FAS8080 offers far more functionality than any other system in the list.
We also recently posted results with the NetApp EF560, the other major hardware platform NetApp offers. See my post here and the official results here. That platform has a different value proposition: fewer features, but very low latency and great cost-effectiveness are the key themes for the EF560.
In this post I want to explain the current Clustered ONTAP results and why they are important.
Flash performance without compromise
Solid state storage technologies are becoming increasingly popular.
The challenge with flash offerings from most vendors is that customers typically either have to give up a lot in order to get the high performance of flash, or have to combine 4-5 different products into a complex “solution” in order to satisfy different requirements.
For instance, dedicated all-flash offerings may not be able to natively replicate to less expensive, spinning-drive solutions.
Or, a flash system may offer high performance but not the functionality, scalability, reliability and data integrity of more mature solutions.
But what if you could have it all? Performance and reliability and functionality and scalability and maturity? That’s exactly what Clustered ONTAP 8.3 provides.
Here are some highlights of Clustered ONTAP 8.3 running on the FAS8080:
- All the NetApp signature ultra-tight application integration and automation for replication, Snapshots and clones
- Fancy write-accelerated, RAID-6-equivalent protection by default
- Comprehensive data integrity and protection against insidious lost-write, torn-page and misplaced-write errors that RAID and normal checksums don’t always catch
- Non-disruptive data mobility for all protocols
- Non-disruptive operations: no downtime even when doing things that would require downtime and extensive professional services (PS) with other vendors
- Granular QoS
- Deduplication and compression
- Highly scalable: 5,760 drives possible in an 8-node cluster, and 17,280 drives possible at the maximum of 24 nodes. Various drive types are supported, from SSD to SATA and everything in between.
- Multiprotocol (FC, iSCSI, NFS, SMB 1, 2 and 3) on the same hardware (no “helper” boxes needed, no dedicated SAN vs NAS pools needed)
- 96,000 LUNs per 8-node cluster (that’s right, ninety-six thousand LUNs)
- ONTAP is VMware vVol ready
- The only array that has been validated by VMware for VMware Horizon 6 with vVols – hopefully the competitors will follow our lead
- Over 460TB (yes, TeraBytes) of usable cache in an 8-node cluster after all overheads are accounted for, and without counting cache amplification through deduplication and clones. That makes competitors’ maximum cache amounts seem like rounding errors; indeed, the actual figure might be 465TB or more, but it’s OK… (and 3x that number in a 24-node cluster: over 1.3PB of cache!)
- The ability to virtualize other storage arrays behind it
- The ability to have a cluster with nodes of dissimilar size and type: no need to keep all engines the same (unlike monolithic offerings). Why pay the same for all nodes when some may not need all the performance? Why be forced to keep all nodes in the same hardware family? What if you don’t want to buy everything at once? Maybe you want to upgrade part of the cluster with a newer-generation system?
- The ability to evacuate part of a cluster and build that part as a different cluster elsewhere
- The ability to have multiple disk types in a cluster and, indeed, dedicate nodes to functions (for instance, have a few nodes all-flash, some nodes with flash-accelerated SAS and a couple with very dense yet flash-accelerated NL-SAS, with full online data mobility between nodes)
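The drive and cache maximums quoted above scale with node count. The quick check below assumes every node contributes an equal share (the per-node values are derived here for illustration, not quoted from a spec sheet) and shows how the 24-node figures follow from the 8-node ones.

```python
# Figures quoted in the feature list; per-node values are derived, assuming equal nodes.
DRIVES_PER_8_NODES = 5_760
CACHE_TB_PER_8_NODES = 460  # "over 460TB" of usable cache in 8 nodes
MAX_NODES = 24


def scale_from_8_nodes(value_for_8: float, nodes: int) -> float:
    """Scale an 8-node figure to `nodes`, assuming each node contributes an equal share."""
    return value_for_8 / 8 * nodes


drives_per_node = DRIVES_PER_8_NODES / 8                            # 720.0
max_drives = scale_from_8_nodes(DRIVES_PER_8_NODES, MAX_NODES)      # 17280.0
max_cache_tb = scale_from_8_nodes(CACHE_TB_PER_8_NODES, MAX_NODES)  # 1380.0 TB (~1.38PB)

print(drives_per_node, max_drives, max_cache_tb)  # 720.0 17280.0 1380.0
```

Linear scaling is the simplest assumption; a real cluster mixing dissimilar nodes would sum per-node limits instead.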
“SVM” stands for Storage Virtual Machine – it means a logical storage partition that can span one or more cluster nodes and have parts of the underlying capacity (performance and space) available to it, with its own users, capacity and performance limits etc.
In essence, Clustered ONTAP offers the best combination of performance, scalability, reliability, maturity and features of any storage system extant as of this writing. Indeed, look at some of the capabilities, like maximum cache and number of LUNs. This is designed to be the cornerstone of a datacenter.
It makes most other systems seem like toys in comparison…
Another reason we wanted to show this result was FUD from competitors struggling to find an angle to fight NetApp. It goes a bit like this: “NetApp FAS systems aren’t real SAN, it’s all simulated and performance will be slow!”
Well, for a “simulated” SAN (whatever that means), the performance is pretty amazing given the level of protection used (RAID-6-equivalent, which is far more resilient and capacity-efficient for large pooled deployments than the RAID-10 the other submissions use) and all the insane scalability, reliability and functionality on tap.
Another piece of FUD has been that ONTAP isn’t “flash-optimized” since it’s a very mature storage OS and wasn’t written “from the ground up”. We’ll let the numbers speak for themselves. It’s worth noting that we have been incorporating a lot of flash-related innovations into FAS systems well before any other competitor did so, something conveniently ignored by the FUD-mongers. In addition, ONTAP 8.3 has a plethora of flash optimizations and path length improvements that helped with the good latency results. And lots more is coming.
The final piece of FUD we made sure to address was system fullness. Last time we ran the test we didn’t fill the system as much as we could have, which prompted the FUD-mongers to say that FAS systems need gigantic amounts of free space to perform. Let’s see what they’ll come up with this time.
On to the numbers!
Important note: SPC-1 is a 100% block-based benchmark with its own I/O blend and, as such, the results from any vendor’s SPC-1 submission should not be compared to all-read “marketing” IOPS numbers or to metadata-heavy NAS benchmarks like SPEC SFS (both of which are far easier on systems than the 60% write blend of the SPC-1 workload). Indeed, the tested configuration might perform in the millions of “marketing” IOPS, but that’s decidedly not the point of this benchmark.
If you want the detail, the SPC-1 result links are here (summary) and here (full disclosure). In addition, here’s the link to the “Top 10 Performance” systems page so you can compare other submissions in the upper performance echelon (unfortunately, SPC-1 results are normally just listed alphabetically, making it time-consuming to compare systems unless you’re looking at the already sorted Top 10 list).
I recommend you look beyond the initial table in each submission showing performance and $/IOPS, and at least go to the actual price list to see the detail. For instance, if you go to the detail here, HDS shows a 58% discount and calculates their $/IOPS number based on the discounted price. Just be aware, and remember: the only way to get a real price is to talk to your sales rep.
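To compare $/SPC-1 IOPS on an equal footing, you can back out the list price from a discounted one and redo the division. A minimal sketch, assuming a single flat discount across the whole configuration (real price lists often discount line items differently, so check the detail); the dollar and IOPS figures are hypothetical.

```python
def list_price_from_discounted(discounted_price: float, discount: float) -> float:
    """Recover the list price, assuming one flat discount fraction (e.g. 0.58 for 58%)."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return discounted_price / (1 - discount)


def dollars_per_iops(price: float, spc1_iops: float) -> float:
    """Price-performance as dollars per SPC-1 IOPS."""
    return price / spc1_iops


# Hypothetical submission: $420,000 after a 58% discount, 100,000 SPC-1 IOPS.
discounted = 420_000.0
list_price = list_price_from_discounted(discounted, 0.58)

print(round(list_price))                                # 1000000
print(round(dollars_per_iops(discounted, 100_000), 2))  # 4.2
print(round(dollars_per_iops(list_price, 100_000), 2))  # 10.0
```

The same system goes from $4.20 to $10.00 per SPC-1 IOPS once the discount is removed, which is why list-price normalization can reshuffle a ranking.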
The things to look for in SPC-1 submissions
Typically you’re looking for the following things to make sense of an SPC-1 submission:
- Latency vs IOPS – many submissions will show high IOPS at huge latency, which would be rather useless when it comes to Flash storage
- Sustainability – was performance even or are there constant huge spikes?
- RAID level – most submissions use RAID10 for speed, what would happen with RAID6?
- Application Utilization. This one is important yet glossed over. It signifies how much capacity the benchmark consumed vs the overall raw capacity of the system, before RAID, spares etc.
Let’s go over these one by one.
Latency vs IOPS
Our average latency was 1.23ms at 685,281.71 SPC-1 IOPS, and it stayed pretty flat over time during the test.
Sustainability
The SPC-1 rules state the minimum runtime should be 8 hours. We ran the test for 18 hours to observe whether performance would vary. There was no significant variation.
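One simple way to put a number on “even vs spiky” is to look at the spread of per-interval throughput readings relative to their mean. This is an illustrative sketch, not the SPC-1 methodology itself, and the sample readings are hypothetical.

```python
from statistics import mean, pstdev


def variation_summary(samples: list[float]) -> dict[str, float]:
    """Summarize how even a series of per-interval throughput readings is."""
    avg = mean(samples)
    return {
        "mean": avg,
        # Coefficient of variation: population std dev relative to the mean, in percent.
        "cv_pct": pstdev(samples) * 100 / avg,
        # Peak-to-trough swing relative to the mean, in percent.
        "range_pct": (max(samples) - min(samples)) * 100 / avg,
    }


steady = [100.0, 101.0, 99.0, 100.0]  # hypothetical readings (thousands of IOPS)
spiky = [100.0, 160.0, 40.0, 100.0]   # hypothetical run with large spikes

print(variation_summary(steady)["range_pct"])  # 2.0
print(variation_summary(spiky)["range_pct"])   # 120.0
```

Both runs average the same throughput; only the spread reveals that the second one would be painful to live with.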
RAID level
RAID-DP was used for all testing. Its protection is mathematically analogous to RAID-6. Given that these systems are typically deployed in very large pooled configurations, we long ago elected not to recommend single-parity RAID, since it’s simply not safe enough. RAID-10 is fast and fine for smaller-capacity SSD systems but, at scale, it gets too expensive for anything but a lab queen (a system that nobody in their right mind will ever buy but which benchmarks well).
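The capacity cost of each scheme is easy to sketch. This compares only the RAID layer, assuming two disks’ worth of parity per dual-parity group and hypothetical group sizes; spares, right-sizing and other overheads are ignored.

```python
def dual_parity_efficiency(group_size: int) -> float:
    """Usable fraction of a dual-parity (RAID-DP / RAID-6 style) group: 2 disks hold parity."""
    if group_size < 3:
        raise ValueError("a dual-parity group needs at least 3 disks")
    return (group_size - 2) / group_size


def mirror_efficiency() -> float:
    """RAID-10 stores every block twice, so half of the raw capacity is usable."""
    return 0.5


for n in (8, 12, 24):  # hypothetical RAID group sizes
    print(f"{n} disks: dual parity {dual_parity_efficiency(n):.1%}, RAID-10 {mirror_efficiency():.0%}")
# 8 disks: dual parity 75.0%, RAID-10 50%
# 12 disks: dual parity 83.3%, RAID-10 50%
# 24 disks: dual parity 91.7%, RAID-10 50%
```

Dual parity gets more efficient as groups grow, while mirroring stays at 50% regardless of scale.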
Application Utilization
Our Application Utilization was a very high 61.92%, unheard of among other vendors posting SPC-1 results, since they use RAID10, which by definition wastes half the capacity (with spares and other overheads on top of that).
Some vendors using RAID10 will fill up the resulting space after RAID, spares etc. to a very high degree, and call out the “Protected Application Utilization” as being the key thing to focus on.
This could not be further from the truth – Application Utilization is the only metric that really shows how much of the total possible raw capacity the benchmark actually used and signifies how space-efficient the storage was.
Otherwise, someone could quadruple-mirror 100TB, fill the resulting 25TB to 100%, and call that 100% efficient… when in fact the benchmark consumed only 25% of the raw capacity.
It is important to note there was no compression or deduplication enabled by any vendor since it is not allowed by the current version of the benchmark.
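The quadruple-mirroring example can be written out directly. Both functions are simplified illustrations of the idea, not the SPC-1 specification’s exact formulas for either metric.

```python
def application_utilization(asu_tb: float, raw_tb: float) -> float:
    """Benchmark capacity as a share of total raw capacity (before RAID, spares etc.)."""
    return asu_tb / raw_tb


def fill_of_protected_space(asu_tb: float, raw_tb: float, copies: int) -> float:
    """How full the post-mirroring space is (simplified, hypothetical metric)."""
    usable_tb = raw_tb / copies
    return asu_tb / usable_tb


raw_tb = 100.0
asu_tb = 25.0  # quadruple mirroring leaves 25TB, filled completely

print(f"{fill_of_protected_space(asu_tb, raw_tb, copies=4):.0%}")  # 100%
print(f"{application_utilization(asu_tb, raw_tb):.0%}")            # 25%
```

The first number looks perfect; only the second tells you how much of what you paid for actually holds data.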
Compared to other vendors
I wanted to show a comparison between the Top Ten Performance results both in absolute terms and also normalized around 1ms latency.
Here are the Top Ten highest performing systems as of April 22, 2015, with vendor results links if you want to look at things in detail:
- Hitachi Virtual Storage Platform G1000
- Kaminario K2
- Huawei OceanStor 18800
- IBM Power Server 780
- NetApp FAS8080
- Huawei OceanStor 6800 V3
- HDS VSP
- HP XP P9500 (same as the VSP above, HP resells it as their high end offering)
- Huawei OceanStor Dorado 5100
- IBM SVC with V7000
- IBM System Storage DS8870
I will show columns that explain the results of each vendor around 1ms. Why 1ms and not more or less? Because in the Top Ten SPC-1 performance list, most results show fairly low latency, but some have very high latency, and it’s useful to show performance at that lower latency point, which is becoming the latency standard for All-Flash systems. 1ms seems to be a good point for multi-function SSD systems (vs simpler, smaller but more speed-optimized architectures like the NetApp EF560).
The way you determine the 1ms latency point is by looking at the graph that shows latency vs SPC-1 IOPS. Let’s pick IBM’s 780, since it has a very interesting curve, so you can learn what to look for.
From page 5 of the IBM 780 SPC-1 report:
IBM’s submitted SPC-1 IOPS are high but at a huge latency number for an all-SSD solution (18.90ms). Not very useful for customers picking an all-SSD system. Even the next load point, with an average latency of 6.41ms, is high for an all-flash solution.
To more accurately compare this to the rest of the vendors with decent latency, you need to look at the chart around 1ms.
They didn’t publish a load point close to 1ms so I’ll “grant” them 200,000 SPC-1 IOPS at that point (the chart shows it’s probably less but it’s OK, it makes no difference to the overall standing in the end).
You can do a similar exercise for the rest, and it’s worth a look. I don’t want to paste all these graphs, since this post would get too big and land firmly in tl;dr territory, if it isn’t there already.
Here’s the table with the current Top Ten SPC-1 Performance results as of 4/22/2015. There’s a lot going on.
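When a submission does publish load points on either side of 1ms, a straight line between them gives a reasonable estimate. A minimal sketch using linear interpolation, with hypothetical load points; it returns nothing when no pair brackets the target, which is exactly the situation where you have to estimate from the chart instead.

```python
from typing import Optional


def iops_at_latency(points: list[tuple[float, float]], target_ms: float) -> Optional[float]:
    """Linearly interpolate SPC-1 IOPS at `target_ms` from (iops, latency_ms) load points.

    Returns None when no pair of adjacent load points brackets the target latency.
    """
    pts = sorted(points)  # order by IOPS, i.e. by increasing load
    for (iops0, lat0), (iops1, lat1) in zip(pts, pts[1:]):
        if lat0 <= target_ms <= lat1:
            if lat1 == lat0:
                return iops0
            return iops0 + (target_ms - lat0) * (iops1 - iops0) / (lat1 - lat0)
    return None


# Hypothetical load points: (SPC-1 IOPS, average response time in ms)
curve = [(100_000, 0.6), (200_000, 0.8), (300_000, 1.4), (400_000, 6.4)]
print(round(iops_at_latency(curve, 1.0)))  # 233333
print(iops_at_latency(curve, 0.3))         # None (below the lowest measured latency)
```

Linear interpolation is optimistic on curves that bend sharply upward between points, so treat the result as an upper-end estimate.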
What do the results show?
Predictably, all-flash systems trump disk-based and hybrid systems for performance and can offer very nice $/SPC-1 IOPS numbers. That is the major allure of flash – high performance density.
Some takeaways from the comparison:
- Once adjusted for 1ms latency and list price, the results shift dramatically: what once looked awesome suddenly no longer does.
- The other vendors used RAID10 – NetApp used RAID-DP (similar to RAID6 in protection). What would happen to their results if they switched to RAID6 to provide a similar level of protection and efficiency?
- Some vendors try to fit a lot of the benchmark in RAM. I show that calculation as “Working Set Size as a % of RAM”. You want that number to be comfortably above 100%; at 100% or below, there’s a high likelihood that much of the I/O was served from RAM. This is important, and possibly explains why some vendors used such a small capacity (indeed, on the verge of legality within the SPC-1 rules). FYI, the “hot” data in SPC-1 is about 6.75% of the overall capacity used.
- Aside from the NetApp FAS result, the rest of the Top Ten Performance submissions offer vastly lower Application Utilization: about half! This means NetApp is able to use about 2x as much of the raw capacity as the other submissions, and that’s before counting the storage efficiencies we can turn on, like dedupe and compression.
- No competitor system offers the sheer functionality the FAS8080 does – not even close.
- Certain competitors have very questionable viability and/or tiny market penetration, making them a risky proposition for a high end system purchase.
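A rough way to reproduce the working-set check, assuming the working set is approximated as the 6.75% “hot” share of the benchmark capacity (a simplification; the capacities and RAM sizes below are hypothetical).

```python
HOT_FRACTION = 0.0675  # "hot" share of SPC-1 capacity, as quoted above


def working_set_pct_of_ram(asu_tb: float, ram_tb: float, hot_fraction: float = HOT_FRACTION) -> float:
    """Approximate the hot working set as a percentage of system RAM (assumed formula)."""
    working_set_tb = asu_tb * hot_fraction
    return working_set_tb * 100 / ram_tb


# Hypothetical configurations, both with 4TB of RAM
print(round(working_set_pct_of_ram(100.0, 4.0)))  # 169: 6.75TB of hot data exceeds RAM
print(round(working_set_pct_of_ram(20.0, 4.0)))   # 34: 1.35TB of hot data fits in RAM easily
```

Same RAM, very different stories: shrinking the benchmark capacity is all it takes to let cache do most of the work.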
Overall – the all-flash FAS8080EX gets a pretty amazing performance and efficiency result, especially given the sheer amount of functionality it offers.
How does one pick a flash array?
It depends. What are you trying to do? Solve a tactical problem? Just need a lot of extra speed and far lower latency for some workloads? No need for the array to have a ton of functionality? A lot of the data management happens in the application? Need something cost-effective, simple yet reliable? Then an all-flash system like the NetApp EF560 is a solid answer, and it can still be front-ended by a Clustered ONTAP system to provide more functionality if the need arises in the future (we are firm believers in hardware reuse and investment protection – you see, some companies talk about Software Defined Storage, we do Software Defined Storage).
On the other hand, if you would prefer an Enterprise architecture that can serve as the cornerstone of your datacenter for almost any workload and protocol, with rich data management functionality, tight application integration, insane scalability and the most features (reliably) of any platform, then the FAS line running Clustered Data ONTAP is the only possible answer.