The EF line has enjoyed great success for some time now, with huge installations at some of the biggest companies in the world, running the highest-profile applications (as in, things most of us use daily).
The EF560 is the latest all-flash variant of the E-Series family, optimized for very low-latency, high-performance workloads while ensuring high reliability, cost-effectiveness and simplicity.
The EF560 runs SANtricity, a lean, heavily optimized storage OS with a remarkably short path length (the overhead the storage OS itself imposes on all data going through the system). In the case of the EF it’s around 30 microseconds. Most other storage arrays have a much longer path length, as a result of more features and/or coding inefficiencies.
Keeping the path length this short is one of the reasons the EF does away with fashionable All-Flash features like compression and deduplication. Make no mistake: no array that performs those functions can sustain a path length that short. There’s just too much in the way. If you really want data reduction and an incredible number of features, we offer that in the FAS line, but the path length naturally isn’t as short as the EF560’s.
The result of the short path length is very low latency at high IOPS with a very reasonable configuration, as you will see later in the article.
Some other EF560 features:
- No write cliff due to SSD aging or fullness
- No performance impact due to SSD garbage collection
- Up to 120x 1.6TB SSDs per system (135TB usable with DDP protection, even more with RAID5/6)
- High throughput – 12GB/s reads, 8GB/s writes per system (many people forget that DB workloads need not just low latency and high IOPS but also high throughput for certain operations).
- All software is included in the system price, apart from encryption
- The system can do snaps and replication, including fully synchronous replication
- Consistency Group support
- Several application plug-ins
- There are no NAS capabilities, but there is a plethora of block connectivity options: FC, iSCSI, SAS, InfiniBand
- The usual suspects of RAID types – 5, 10, 6 plus…
- DDP (Dynamic Disk Pools), a type of declustered RAID6 implementation that performs RAID at the sub-disk level. It is very handy for large pools, gives rapid disk rebuilds with minimal performance impact, and adds flexibility overall (for instance, you could add a single disk to the system instead of an entire RAID group’s worth)
- T10-PI to help protect against insidious data corruption that might bypass RAID and normal checksums, and provide end-to-end protection, from the application all the way to the storage device
- Can also be part of a Clustered Data ONTAP system using the FlexArray license on FAS.
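As a rough illustration of the T10-PI item above: in the standard T10 DIF format, each 512-byte sector carries an 8-byte protection tuple made up of a 2-byte guard tag (a CRC-16 of the sector data), a 2-byte application tag and a 4-byte reference tag (for Type 1 protection, the low 32 bits of the LBA). The Python sketch below shows how the guard tag catches corrupted data and the reference tag catches a block that landed in the wrong place (a misdirected write). It is a generic illustration of the format, not SANtricity code, and the helper names `make_pi` and `verify_pi` are hypothetical.

```python
import struct

T10_DIF_POLY = 0x8BB7  # CRC-16 polynomial used for the T10 DIF guard tag

def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16/T10-DIF: initial value 0, no bit reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def make_pi(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Hypothetical helper: build the 8-byte tuple (guard, application tag, reference tag)."""
    return struct.pack(">HHI", crc16_t10dif(sector), app_tag, lba & 0xFFFFFFFF)

def verify_pi(sector: bytes, pi: bytes, expected_lba: int) -> bool:
    """Hypothetical helper: check data integrity (guard) and placement (reference tag)."""
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(sector) and ref_tag == (expected_lba & 0xFFFFFFFF)

sector = bytes(range(256)) * 2      # one 512-byte sector of sample data
pi = make_pi(sector, lba=1234)

print(verify_pi(sector, pi, expected_lba=1234))      # True: intact and in the right place
corrupted = b"\xff" + sector[1:]                     # silent corruption of the first byte
print(verify_pi(corrupted, pi, expected_lba=1234))   # False: guard tag mismatch
print(verify_pi(sector, pi, expected_lba=1235))      # False: reference tag mismatch
```

Because the tuple travels with the data, any component between the application and the drive can run the same check.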
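The rebuild benefit of a declustered layout like DDP can be sketched with a toy simulation: spread many 8+2 stripes pseudo-randomly across a 24-drive pool, fail one drive, and count which survivors have to read data to rebuild it, compared with a classic 10-drive RAID6 group where the same 9 survivors carry the whole rebuild. The pool size, stripe width and random placement are illustrative assumptions, not a description of NetApp’s actual DDP placement algorithm, and the model simplifies by reading every surviving piece of an affected stripe.

```python
import random

def declustered_layout(num_drives, stripe_width, num_stripes, seed=0):
    """Place each stripe's pieces on a pseudo-random set of distinct drives."""
    rng = random.Random(seed)
    return [rng.sample(range(num_drives), stripe_width) for _ in range(num_stripes)]

def rebuild_load(layout, failed_drive):
    """Return (pieces lost, {surviving drive: pieces it must read}) for one failure."""
    lost = 0
    reads = {}
    for stripe in layout:
        if failed_drive in stripe:
            lost += 1
            # Toy model: rebuilding a lost piece reads all other pieces of its stripe.
            for drive in stripe:
                if drive != failed_drive:
                    reads[drive] = reads.get(drive, 0) + 1
    return lost, reads

# Declustered pool: 24 drives, 10-piece (8+2) stripes placed pseudo-randomly.
pool = declustered_layout(num_drives=24, stripe_width=10, num_stripes=1000)
pool_lost, pool_reads = rebuild_load(pool, failed_drive=0)

# Classic RAID6 group: every stripe lives on the same 10 drives.
group = [list(range(10)) for _ in range(1000)]
group_lost, group_reads = rebuild_load(group, failed_drive=0)

for name, lost, reads in (("pool", pool_lost, pool_reads), ("group", group_lost, group_reads)):
    busiest = max(reads.values()) / lost
    print(f"{name}: {len(reads)} survivors help; busiest reads {busiest:.0%} of lost pieces")
```

In the pool, each lost stripe still triggers 9 reads in this model, but they are spread over 23 survivors, so on average each survivor reads about 9/23 of the lost pieces instead of all of them. Spreading the work is what allows faster rebuilds with less impact on any single drive.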
The point of All-Flash Arrays
Going back to the short path length and low latency discussion…
Flash has been a disruptive technology because, if used properly, it allows unprecedented performance density at increasingly reasonable cost.
The users of All-Flash Arrays typically fall into two camps:
- Users who want lots of features and data reduction algorithms, and who can live with good but non-deterministic performance and latencies that are not crazy low: 1-2ms (with the occasional latency spike) is considered sufficient for this use case, as it is better than hybrid arrays and way better than all-disk systems.
- Users who need the absolute lowest possible latency (measured in microseconds, and definitely under 1ms worst-case) while maintaining uncompromising reliability for their applications, and who are willing to give up certain features to get that kind of performance. Performance for this type of user needs to be deterministic, without weird latency spikes, ever.
The low-latency camp typically runs applications where lower latency directly means more money. Every millisecond and, in some cases, every microsecond counts, while failures would typically mean significant revenue loss (to the point of making the cost of the storage seem like pocket change).
Some of you may be reading this and thinking “so what, 1ms to 2ms is a tiny difference, it’s all awesome”. Well, at that level of the game, 2ms is twice the latency of 1ms, and that is a very big deal indeed. For the people who need low latency, a 1ms array is half the speed of a 500 microsecond array, even if both do the same IOPS.
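One way to see why latency behaves like speed is Little’s Law: outstanding I/Os = throughput × latency. An application thread that waits for each I/O to complete before issuing the next one (queue depth 1) can never exceed 1/latency operations per second, no matter how many aggregate IOPS the array can deliver. A minimal Python sketch using the 1ms and 500 microsecond figures above; the last line applies the same law to the EF560’s SPC-1 result, and the function names are made up for the example.

```python
def max_iops(latency_s: float, queue_depth: int = 1) -> float:
    """Little's Law rearranged: throughput = outstanding I/Os / latency."""
    return queue_depth / latency_s

def outstanding_ios(iops: float, latency_s: float) -> float:
    """Little's Law: average number of I/Os in flight = throughput x latency."""
    return iops * latency_s

# A single-threaded, one-I/O-at-a-time workload (queue depth 1).
for latency_ms in (1.0, 0.5):
    print(f"{latency_ms} ms -> {max_iops(latency_ms / 1000):.0f} IOPS per thread")
# 1.0 ms -> 1000 IOPS per thread
# 0.5 ms -> 2000 IOPS per thread

# 245,011.76 SPC-1 IOPS at 0.93 ms average implies about 228 I/Os in flight.
print(round(outstanding_ios(245_011.76, 0.93 / 1000)))
```

So even if two arrays post identical aggregate IOPS, the one with half the latency lets each waiting thread go twice as fast.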
You may also be thinking “SSDs that fit in a server’s PCI slot have low latency, right?”
The answer is yes, but what’s missing is the reliability a full-fledged array brings. If the server dies, access is lost. If the card dies, all is lost.
So, when looking for an All-Flash Array, think about what type of flash user you are and what your business actually needs. That will help shape your decisions.
All-Flash Array background operations can affect latency
The more complex All-Flash Arrays have additional capabilities compared to the ultra-low-latency gang, but they are also more likely to produce uneven latency, and even latency spikes, under heavy load when full (on top of their naturally higher latency due to the longer path length).
For instance, cleanup operations, various kinds of background processing that kick off at different times, and different ways of handling I/O depending on how full the array is can all cause undesirable latency spikes and uneven latency overall. This is normal for such architectures, but may be unacceptable for certain applications.
Notably, the EF560 doesn’t suffer from such issues. We have been beating competitors in difficult performance situations with the EF560’s slower predecessors, and we will keep doing it with the new system.
Enough already, show me the numbers!
Important note: SPC-1 is a block-based benchmark with its own I/O blend, so any vendor’s SPC-1 Result should not be compared to all-read “marketing” IOPS numbers or to metadata-heavy NAS benchmarks like SPEC SFS (which are far easier on systems than the 60% write blend and hotspots of the SPC-1 workload). Indeed, the tested configuration could perform way more “marketing” IOPS, but that’s decidedly not the point of this benchmark.
If you want the detail, the EF560 SPC-1 Result links are here (summary) and here (full disclosure). In addition, here’s the link to the “Top 10 by Price-Performance” systems page so you can compare to other submissions (unfortunately, SPC-1 results are normally listed alphabetically, making it time-consuming to compare systems unless you’re looking at the already sorted Top 10 lists).
The things to look for in SPC-1 submissions
Typically you’re looking for the following things to make sense of an SPC-1 submission:
- Latency vs IOPS – many submissions will show high IOPS at huge latency, which would be rather useless for the low-latency crowd
- Sustainability – was performance even or are there constant huge spikes?
- RAID level – most submissions use RAID10 for speed, what would happen with RAID6?
- Application Utilization: this one is important yet often glossed over. It shows how much capacity the benchmark consumed relative to the overall raw capacity of the system, before RAID, spares etc.
- Price – discounted or list?
Let’s go over these one by one.
Latency vs IOPS
Our average latency was 0.93ms at 245,011.76 SPC-1 IOPS, and extremely flat during the test:
The SPC-1 rules state the minimum runtime should be 8 hours. There was no significant variation in performance during the test:
RAID10 was used for all testing, with T10-PI Data Assurance enabled (which carries a performance penalty, but the applications these systems are used for typically need paranoid data integrity). This system would perform slower with RAID5 or RAID6. But for applications where the absolute lowest latency is important, RAID10 is a safe bet, especially on systems that, unlike Data ONTAP, are not optimized for RAID6 writes. Not to fret though: the price/performance remained stellar, as you will see.
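The RAID10 vs RAID5/6 trade-off can be approximated with the textbook small-random-write penalty: each host write costs 2 back-end I/Os on RAID10, 4 on RAID5 (read old data and parity, write new data and parity) and 6 on RAID6 (two parity blocks). The Python below applies that model to a 60% write mix like SPC-1’s. It deliberately ignores write caching and full-stripe writes, which is exactly how write-optimized designs such as Data ONTAP avoid much of the RAID6 penalty, so read the output as relative back-end work, not as EF560 measurements; the 100,000 host IOPS figure is hypothetical.

```python
# Back-end I/Os per small random host write under the classic
# read-modify-write model (no write cache, no full-stripe writes).
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

def backend_iops(host_iops: float, write_fraction: float, raid: str) -> float:
    """Back-end drive I/Os needed for a host workload; each read costs one I/O."""
    reads = host_iops * (1 - write_fraction)
    writes = host_iops * write_fraction
    return reads + writes * WRITE_PENALTY[raid]

host_iops = 100_000  # hypothetical host load
baseline = backend_iops(host_iops, 0.6, "RAID10")
for raid in WRITE_PENALTY:
    total = backend_iops(host_iops, write_fraction=0.6, raid=raid)
    print(f"{raid}: {total:,.0f} back-end I/Os ({total / baseline:.2f}x RAID10)")
# RAID10: 160,000 back-end I/Os (1.00x RAID10)
# RAID5: 280,000 back-end I/Os (1.75x RAID10)
# RAID6: 400,000 back-end I/Os (2.50x RAID10)
```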
Our Application Utilization was a very high 46.90%, among the highest of any RAID10 submission (and among the highest overall; only Data ONTAP submissions can go higher, thanks to RAID-DP).
We almost completely filled the resulting RAID10 space, to show that the system’s performance is unaffected when very full. However, Application Utilization is the only metric that really shows how much of the total raw capacity the benchmark actually used, and therefore how space-efficient the storage was.
Otherwise, someone could do quadruple mirroring of 100TB, fill the resulting 25TB to 100%, and call that 100% efficient… when in fact it only consumed 25% of the raw capacity.
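The quadruple-mirroring example is simple arithmetic: Application Utilization divides the capacity the benchmark used by total raw capacity, regardless of how much of the post-RAID space was filled. A minimal Python sketch (the function names are illustrative), which also shows the 50% ceiling that 2-way mirroring puts on any RAID10 submission before spares and other overheads:

```python
def application_utilization(used_tb: float, raw_tb: float) -> float:
    """Capacity used by the benchmark as a fraction of total raw capacity."""
    return used_tb / raw_tb

def mirrored_usable(raw_tb: float, copies: int) -> float:
    """Usable capacity when every block is stored `copies` times."""
    return raw_tb / copies

raw_tb = 100.0
usable_tb = mirrored_usable(raw_tb, copies=4)   # quadruple mirroring: 25 TB usable
used_tb = usable_tb * 1.0                       # fill the usable space to 100%

print(f"usable space filled: {used_tb / usable_tb:.0%}")                           # 100%
print(f"application utilization: {application_utilization(used_tb, raw_tb):.0%}")  # 25%
print(f"RAID10 ceiling: {application_utilization(mirrored_usable(raw_tb, 2), raw_tb):.0%}")  # 50%
```

Against that 50% ceiling, a RAID10 result of 46.90% leaves very little raw capacity unused.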
Note that no vendor had compression or deduplication enabled, since the current version of the benchmark does not allow it.
Compared to other vendors
I wanted to compare the SPC-1 Top Ten Price-Performance results both in absolute terms and normalized around 500 microseconds of latency, to illustrate that very low latency with great performance is still possible at a compelling price point with this solution.
Here are the Top Ten Price-Performance systems as of Jan 27, 2015, with SPC-1 Results links if you want to look at things in detail:
- Kaminario K2-D
- NetApp EF560
- Huawei OceanStor Dorado 2100 G2
- HP 3PAR STORESERV 7400
- DELL STORAGE SC4020
- FUJITSU ETERNUS DX200 S3
- Kaminario K2 (28 nodes)
- Huawei OCEANSTOR Dorado 5100
- Huawei OCEANSTOR Dorado 2100
- FUJITSU ETERNUS DX100 S3
I will show columns that explain each vendor’s results around 500 microseconds, plus how changing the latency target affects both SPC-1 IOPS and $/SPC-1 IOPS.
To determine that lower-latency point, look at the graph of latency vs SPC-1 IOPS and find the load point closest to 500 microseconds. Let’s pick Kaminario’s K2 so you learn what to look for:
Notice how the SPC-1 IOPS at around half a millisecond are about 10x lower than at around 3ms latency. The system picks up very rapidly after that, but if your requirement is for latency not to exceed 500 microseconds, you will be better off spending your money elsewhere (indeed, a very high-profile client asked us for a 400 microsecond maximum response time from the older-generation EF systems for their Oracle DBs, and this is actually very realistic for many market segments).
Here’s the table with all this analysis done for you. BTW, the adjusted $/SPC-1 IOPS is simply calculated by dividing system price by the adjusted SPC-1 IOPS at the 500 microsecond point.
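The adjustment described above takes two steps: pick the reported load point whose average response time is closest to the 500 microsecond target, then divide the total system price by that point’s SPC-1 IOPS. A Python sketch; the load points and the $500,000 price are hypothetical placeholders, not figures from any vendor’s SPC-1 report.

```python
from typing import NamedTuple

class LoadPoint(NamedTuple):
    iops: float         # SPC-1 IOPS at this load level
    latency_ms: float   # average response time at this load level

def closest_point(points: list[LoadPoint], target_ms: float) -> LoadPoint:
    """Load point whose average response time is nearest the latency target."""
    return min(points, key=lambda p: abs(p.latency_ms - target_ms))

def adjusted_price_per_iops(price: float, points: list[LoadPoint], target_ms: float = 0.5) -> float:
    """System price divided by the SPC-1 IOPS achieved near the target latency."""
    return price / closest_point(points, target_ms).iops

# Hypothetical latency curve and price, for illustration only.
curve = [LoadPoint(10_000, 0.45), LoadPoint(50_000, 0.9),
         LoadPoint(90_000, 1.8), LoadPoint(100_000, 3.0)]
price = 500_000.0

print(closest_point(curve, 0.5))                 # LoadPoint(iops=10000, latency_ms=0.45)
print(price / max(p.iops for p in curve))        # 5.0 $/SPC-1 IOPS at the top load point
print(adjusted_price_per_iops(price, curve))     # 50.0 $/SPC-1 IOPS near 0.5 ms
```

With these made-up numbers, a 10x drop in IOPS at the latency target turns into a 10x worse $/SPC-1 IOPS, which is the effect the adjusted column captures for real submissions.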
What do the results show?
As submitted, the EF560 is #2 in the Price-Performance ranking, behind an all-DRAM array. Interestingly, once adjusted for latency around 500 microseconds, the price/performance of the EF560 is far ahead of anything else on the chart (plus, DRAM arrays are severely limited when it comes to capacity scalability).
Note that some vendors have discounted pricing and some do not, so always check the SPC-1 report for the prices (for example, Fujitsu has 30% discounts showing in the reports, Dell 48%, HP 45%). Our price-performance is even better than shown in the chart once you adjust for discounts.
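Adjusting for discounts works the same way in reverse: a price that already includes a discount d can be converted back to an approximate list price by dividing by (1 - d). A Python sketch using the discount levels quoted above; the $100,000 discounted price is a hypothetical placeholder, and since real SPC-1 reports may discount individual line items differently, this is only an approximation.

```python
def list_price(discounted_price: float, discount: float) -> float:
    """Approximate list price from a price that already includes `discount` (0 to 1)."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return discounted_price / (1 - discount)

discounts = {"Fujitsu": 0.30, "Dell": 0.48, "HP": 0.45}
for vendor, discount in discounts.items():
    print(f"{vendor}: $100,000 at {discount:.0%} off is about ${list_price(100_000, discount):,.0f} list")
```

Dividing the list-equivalent price by the same SPC-1 IOPS raises that vendor’s $/SPC-1 IOPS by the same factor of 1/(1 - d).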
Another interesting observation is the effect of longer path length on some platforms: for instance, Dell’s lowest reported latency is 0.72ms at a mere 10,599.32 SPC-1 IOPS. Clearly, that is not a system geared towards high performance at very low latency.
The LRT (Least Response Time) we submitted for the EF560 was a tiny 0.18ms (180 microseconds) at 24,501.04 SPC-1 IOPS. This is the lowest LRT anyone has ever posted on any array for the SPC-1 benchmark.
Clearly we are doing something right
If your storage needs require very low latency coupled with very high reliability, the EF560 would be an ideal candidate. In addition, the system’s footprint is extremely compact: the SPC-1 results shown were achieved with just a 2U EF560 and 24x 400GB SSDs.