Lately, when competing with VNX, I see EMC using several points to prove they’re superior (or at least not deficient).
I’d already written this article a while back, and today I want to explore a few aspects in more depth since my BS pain threshold is getting pretty low. The topics discussed:
- VNX space efficiency
- LUNs can be served by either controller for “load balancing”
- Claims that autotiering helps most workloads
- Claims that storage pools are easier
- Thin provisioning performance (this one’s interesting)
- The new VNX snapshots
References to actual EMC documentation will be used throughout. Otherwise I'd be no better than a marketing droid myself.
VNX space efficiency
EMC likes claiming they don't suffer from the 10% space “tax” NetApp has. Yet, linked here are the best practices showing that in an autotiered pool, at least 10% free space needs to be kept per tier for autotiering to be able to do its thing (which makes sense).
Then there's also a minimum 3GB overhead per LUN, plus per-LUN metadata overhead calculated with a formula in the linked article, and possibly yet more metadata overhead if they ever manage to put dedupe in the code.
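To put rough numbers on this (the constants below are my assumptions for illustration; check the linked article for the exact formula), here's what the overhead does to a 10TB pool:

```python
# Rough illustration of VNX pool overhead. The constants are
# assumptions for this example; the exact formula is in EMC's docs.

FIXED_OVERHEAD_GB = 3.0    # minimum per-LUN overhead
METADATA_FRACTION = 0.02   # assumed per-LUN metadata fraction
TIER_FREE_FRACTION = 0.10  # free space autotiering wants per tier

def usable_after_overhead(pool_gb, lun_sizes_gb):
    """Pool capacity left for data after the autotiering
    free-space reserve and per-LUN overhead."""
    reserve = pool_gb * TIER_FREE_FRACTION
    lun_overhead = sum(FIXED_OVERHEAD_GB + size * METADATA_FRACTION
                       for size in lun_sizes_gb)
    return pool_gb - reserve - lun_overhead

# A 10TB pool carved into twenty 400GB LUNs:
pool_gb = 10 * 1024
print(f"{usable_after_overhead(pool_gb, [400] * 20):.0f} GB usable "
      f"of {pool_gb} GB")  # ~8996 GB usable: a 12% 'tax' right there
```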
My point is: There’s no free lunch. If you want certain pool-related features, there is a price to pay. Otherwise, keep using the old-fashioned RAID groups that don’t offer any of the new features but at least offer predictable performance and capacity utilization.
LUNs can be served by either controller for “load balancing”
This is a fun one. The claim is that LUN ownership can be instantly switched from one VNX controller to the other in order to load balance their utilization. Well, as always: it depends. It's also important to note that, as of this writing, the VNX does not do any sort of automatic load balancing of LUN ownership based on load.
- If the LUN lives on an old-fashioned RAID group: transferring ownership is indeed doable with no issues, and has been forever.
- If the LUN is in a pool, it's a different story: there's no quick way to shift its ownership to the other controller without significant performance loss.
There's copious information here. Long story short: with pools you don't change LUN ownership; you migrate the LUN contents to a new LUN on the other controller (you can't just move the LUN as-is, which creates its own issues). Otherwise, there's a performance tax to pay.
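To make the difference concrete, here's a toy model (mine, not EMC code) of what trespassing a pool LUN actually does: the 1GB slices keep their original owner, so the new "current owner" has to redirect every I/O to its peer.

```python
# Toy model (not EMC code) of why trespassing a pool LUN hurts:
# a pool LUN's slices keep their original "allocation owner" SP,
# so after a trespass the new "current owner" must bounce I/O
# off the peer SP for slices it doesn't own.

class PoolLUN:
    def __init__(self, name, allocation_owner):
        self.name = name
        self.allocation_owner = allocation_owner  # SP owning the slices
        self.current_owner = allocation_owner     # SP serving host I/O

    def trespass(self, new_sp):
        # Only changes who *fronts* the I/O; the slices stay put.
        self.current_owner = new_sp

    def serve_io(self):
        if self.current_owner == self.allocation_owner:
            return "direct I/O"
        return "redirected I/O via peer SP (the performance tax)"

lun = PoolLUN("sql_data_01", allocation_owner="SPA")
print(lun.serve_io())   # direct I/O
lun.trespass("SPB")     # 'load balancing' the pool LUN
print(lun.serve_io())   # redirected I/O via peer SP (the performance tax)
```

An old-fashioned RAID-group LUN has no such split ownership, which is why trespassing those has always been painless.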
Claims that autotiering helps most workloads
Not so FAST. EMC's own best practice guides are rife with caveats and cautions regarding autotiering. Yet this feature is wheeled out as a gigantic differentiator in every sales campaign.
For example, in the very thorough “EMC Scaling performance for Oracle Virtual Machine”, the following graph is shown on page 35:
The arrows were added by me. Notice that most of the performance benefit comes once the cache is appropriately sized; adding an extra 5 SSDs as a VNX tier provides almost no extra benefit for this database workload.
One wonders how fast it would go if an extra 4 SSDs were added for even more cache instead of going to the tier… 🙂
Perchance an all-cache configuration with 8 SSDs would be faster than 4 cache SSDs plus 5 tier SSDs, but that would make for some pretty poor autotiering marketing.
Claims that storage pools are easier
The typical VNX pitch to a customer is: use a single, easy, happy, autotiered pool. Despite what the marketing slicks show, complexity is unfortunately not really reduced with VNX pools, simply because a single pool is not recommended for all workloads. Consider this typical VNX deployment scenario, modeled after the best practice documents:
- RecoverPoint journal LUNs in a separate RAID10 RAID group
- SQL log LUNs in a separate RAID10 RAID group
- Exchange 2010 log LUNs in a separate RAID10 RAID group
- Exchange 2010 data LUNs can be in a pool as long as the pool has a homogeneous disk type; otherwise, use multiple RAID groups
- SQL data can be in an autotiered pool
- VMs might have to go in a separate pool or maybe share the SQL pool
- VDI linked clone repository would probably use SSDs in a separate RAID10 RAID group
OK, great. I understand that all the I/O separation above can be beneficial. However, the selling points of pooling and autotiering are that they reduce complexity, reduce overall cost, improve performance and improve space efficiency. Clearly, that's not the case at all in real life. Why can't all of the above go in a single pool, maybe two, with some sort of array QoS to ensure prioritization?
And what happens to your space efficiency if you over-allocate disks to the old-fashioned RAID groups above? How do you get the space back?
What if you under-allocated? How easy would it be to add just a bit more space or performance (not 2-3x, say you need just 20% more)? Can you expand an old-fashioned VNX RAID group by a couple of disks?
And what’s the overall space efficiency now that this kind of elaborate split is necessary? Hmm… 😉
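To put a number on it, here's a back-of-the-envelope sketch (the disk counts and usage figures are invented purely for illustration) of how capacity gets stranded when every workload owns its own RAID group:

```python
# Back-of-the-envelope: capacity stranded in dedicated RAID groups.
# All usable and consumed figures below are invented for illustration.

# (name, usable GB, GB actually consumed)
raid_groups = [
    ("recoverpoint_journal_r10", 2400,  900),
    ("sql_log_r10",              1200,  300),
    ("exchange_log_r10",         1200,  400),
    ("vdi_clones_ssd_r10",        800,  500),
]

usable = sum(g[1] for g in raid_groups)
used = sum(g[2] for g in raid_groups)
stranded = usable - used
print(f"{stranded} GB stranded ({stranded / usable:.0%} of {usable} GB), "
      "and none of it can be lent to a pool that's running low")
```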
Thin provisioning performance
This is just great.
VNX thin provisioning performs very poorly relative to thick LUNs, and more poorly still relative to standard RAID groups. The performance hit makes complete sense given how space is allocated when writing thin on a VNX: capacity is assigned in 8KB blocks as it's consumed. A nice explanation of how pool space is allocated is here. A VNX writes to pools using 1GB slices; thick LUNs pre-allocate as many 1GB slices as necessary, which keeps performance acceptable. Thin LUNs obviously don't pre-allocate space, and currently have no way to optimize writes or reads. The result is fragmentation, on top of the higher CPU, disk and memory overhead needed to maintain thin LUNs 🙂
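A tiny simulation (my own simplification; only the 1GB slice and 8KB chunk granularities come from the documentation) shows why thin LUNs fragment while thick LUNs don't:

```python
# Simplified model of pool allocation: thick LUNs pre-allocate
# contiguous 1GB slices; thin LUNs get 8KB chunks on demand.
# Granularities per the docs; everything else is my invention.

def thick_layout(slices_needed):
    # Thick: contiguous 1GB slices reserved at creation time.
    return [("thick_lun", f"slice {s}") for s in range(slices_needed)]

def thin_layout(writers, chunks_each):
    # Thin: 8KB chunks handed out in arrival order, so LUNs
    # writing concurrently interleave on disk -> fragmentation.
    layout = []
    for chunk in range(chunks_each):
        for lun in writers:
            layout.append((lun, f"8KB chunk {chunk}"))
    return layout

print(thick_layout(3))
# [('thick_lun', 'slice 0'), ('thick_lun', 'slice 1'), ('thick_lun', 'slice 2')]
print(thin_layout(["thin_A", "thin_B"], 2))
# [('thin_A', '8KB chunk 0'), ('thin_B', '8KB chunk 0'),
#  ('thin_A', '8KB chunk 1'), ('thin_B', '8KB chunk 1')]
# Sequential reads of thin_A now seek past thin_B's chunks,
# and there's no defragmenter to clean it up afterwards.
```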
From the Exchange 2010 design document again, page 23:
Again, I added the arrows to point out a couple of important things:
- Thin provisioning is not recommended for high performance workloads on VNX
- Indeed, it’s so slow that you should run your thin pools with RAID10!!!
But wait: thin provisioning is supposed to help me save space, and now I have to run it on RAID10, which chews up more space?
Kind of an oxymoron.
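The space math makes the oxymoron plain. Assuming 16 disks of 600GB each (numbers invented for illustration):

```python
# RAID10 vs RAID6 usable capacity from the same 16 x 600GB disks
# (disk count and size invented for illustration).
disks, disk_gb = 16, 600

raid10_usable = disks // 2 * disk_gb                 # mirroring: 50% usable
raid6_usable = (disks - 2 * (disks // 8)) * disk_gb  # two 6+2 groups: 75% usable

print(raid10_usable, raid6_usable)  # 4800 GB vs 7200 GB
# Thin provisioning 'saves' space... which RAID10 promptly gives back.
```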
And what if the customer wants the superior reliability of RAID6 for the whole pool? How fast is thin provisioning then?
Oh, and the VNX has no way to fix the fragmentation that's rampant in its thin LUNs, short of a migration to another LUN (kind of a theme, it seems).
The new VNX snapshots
The VNX has a way to somewhat lower the traditionally extreme impact of FLARE snapshots by switching from COFW (Copy On First Write) to ROFW (Redirect On First Write).
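For the mechanics, here's a toy contrast (my own sketch, not FLARE internals): COFW has to copy the old data aside before the first overwrite of each block, while ROFW just writes the new data to a fresh location and updates a pointer.

```python
# Toy contrast (my sketch, not FLARE internals) of snapshot
# write overhead: COFW vs ROFW.

def cofw_write(volume, snap_area, block, new_data):
    # Copy On First Write: the OLD data must be copied aside
    # before the overwrite -> an extra read+write per first
    # overwrite of every block.
    if block not in snap_area:
        snap_area[block] = volume[block]  # the extra copy
    volume[block] = new_data              # then the actual write

def rofw_write(volume_map, store, next_free, block, new_data):
    # Redirect On First Write: new data goes to a fresh location
    # and the pointer moves; the old block stays put for the
    # snapshot. One write, no copy.
    store[next_free] = new_data
    volume_map[block] = next_free
    return next_free + 1

# COFW: two operations for the first overwrite of a block
vol, snap = {0: "old"}, {}
cofw_write(vol, snap, 0, "new")
print(vol, snap)    # {0: 'new'} {0: 'old'}

# ROFW: one write plus a pointer update
store, vmap = {0: "old"}, {0: 0}
rofw_write(vmap, store, 1, 0, "new")
print(store, vmap)  # {0: 'old', 1: 'new'} {0: 1}
```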
The new VNX snapshots need a pool, and need thin LUNs. It makes sense from an engineering standpoint, but…
Those are exactly the 2 VNX features that lower performance.
There are many other issues with the new VNX snapshots, but that’s a story for another day. It’s no wonder EMC pushes RecoverPoint far more than their snaps…
There’s marketing, and then there’s engineering reality.
Since the VNX is able to run both pools and old-fashioned RAID groups, marketing wisely chooses to not be very specific about what works with what.
The reality, though, is that all the advanced features only work with pools, and those come with significant caveats.
If you’re looking at a VNX – at least make sure you figure out whether the marketed features will be usable for your workload. Ask for a full LUN layout.
And we didn’t even talk about having uniform RAID6 protection in pools, which is yet another story for another day.