It’s that time of the year again. The usual websites are busy with news of the upcoming EMC midrange refresh called VNX. And records being broken.
(NEWSFLASH: Watching the webcast now, the record they kept saying they would break ended up being some guy jumping over a bunch of EMC arrays with a motorcycle – and here I was hoping to see some kind of performance record…)
I’m not usually one to rain on anyone’s parade, but I keep seeing the word “unified” a lot, and based on what I’m seeing, it’s all more of the same, albeit with newer CPUs, a different faceplate, and (join the club) SAS. I’m sure the new systems will be faster courtesy of faster CPUs, more RAM and SAS. But are they offering something materially closer to a unified architecture?
Note that I’m not attacking anything in the EMC announcement, merely the continued “unified” claim. I’m sure the new Data Domain, Isilon and VMAX systems are great.
So here are some questions to ask EMC regarding VNX. I’ll keep this as a list instead of a more verbose entry, to keep things easy for the ADD-afflicted and allow easier copy-paste into emails.
- Let’s say I have a 100TB VNX system, and I allocate all 100TB to NAS. Let’s say all 100TB really is chewed up in the beginning, but after a year my real data requirement is more like 70TB. Can I take the 30TB I’m no longer using and instantly use it for FC? Since it’s “unified” and all? Without breaking best practices for LUN allocation to Celerra? Or is it forever tied to the NAS part, so I have to buy all new storage if I don’t want to destroy what’s there and start from scratch?
- Is the VNX (or even the NS before it) 3rd-party verified as an over 5-nines system? (I believe the CX is, but is the CX/NS combo?)
- How is the architecture of these boxes any different than before? It looks like you still have 2 CX SPs, plus some NAS gateways. It seems like very much the same overall architecture, and there’s (still) nothing unified about it. I call for some truth in advertising! Only the little VNXe seems materially different (not in the software, but in the number of blades it takes to run it all).
- Are the new systems licensed by capacity?
- Can the new systems use more than 2TB of FAST Cache?
- On the subject of cache, what is the best practice regarding the minimum number of SSDs to use for cache? Is it 8? How many shelves/buses should they be distributed on?
- What is the best practice regarding cache oversubscription and how is this sized?
- Since the FAST Cache can also cache writes, what are the ramifications if the cache fails? How many customers have had this happen? After all, we are talking about SSDs, and even mirrored SSDs are much less reliable than mirrored RAM.
- What’s the granularity for using RecoverPoint to replicate the NAS piece? It seems the entire NAS side has to be replicated as one big consistency group, with Celerra Replicator needed for more granular replication.
- What’s the granularity for recovering NAS with RecoverPoint? It seems you can’t recover by file, or even by volume – the entire data mover may need to be recovered in one go, regardless of the volumes within.
- When using RecoverPoint, does one need to avoid storage pools for certain operations? And what does that mean for the complexity of the implementation?
- Speaking of storage pools, when are they recommended, when not, and why? And what does that mean about the complexity of administration?
- What functionality does one lose if one does not use pools?
- Can one prioritize FAST Cache in pool LUNs or is cache simply on or off for the entire pool?
- Can I do a data-in-place upgrade from CX3 or CX4 or is this a forklift upgrade?
- Why is FASTv2 not recommended for Exchange 2010 and various other DBs?
- If Autotiering is not really applicable to many workloads, what is it really good for?
- What percentage of flash is needed to properly do autotiering on a VNX? (It’s only 3% on VMAX, since VMAX uses a 7MB page, but VNX uses a 1GB page, which is far less efficient – see the back-of-the-envelope sketch after this list.) Why is FAST still at the grossly inefficient 1GB chunk?
- Can FAST on the VNX exclude certain time periods that can confuse the algorithms, like when backups occur?
- Is file-level FAST still a separate system?
- Why does the low-end VNXe not offer FC?
- Can I upgrade from VNXe to VNX?
- Does the VNXe offer FAST?
- Can a 1GB chunk span RAID groups, or is performance limited to one RAID group’s worth of drives?
- Why are functions like block, NAS and replication still in separate hardware and software?
- Why are there still 2 kinds of snapshotting systems?
- Are the block snaps finally without a huge write performance impact? How about the NAS snaps?
- Are the snaps finally able to be retained for years if needed?
- Why are there 4 kinds of replication? (MirrorView, Celerra Replicator, RecoverPoint, SAN Copy)
- Why are there still all these OSes to patch? (Win XP in the SPs, Linux on the Control Station and RecoverPoint, DART on the NAS blades, maybe more if they can run Rainfinity and Atmos on the blades as well)
- Why still no dedupe for FC and iSCSI?
- Why no dedupe for memory and cache?
- Why not sub-file dedupe?
- Why is Celerra still limited to 256TB per data mover?
- Is Celerra still limited to 16TB per volume? Or is yet another, completely separate system (Isilon) needed to do that?
- Is Celerra still limited to not being able to share a volume between data movers? Or is, again, Isilon needed to do that?
- Can Celerra non-disruptively move CIFS and NFS volumes between data movers?
- Why can there not be a single FCoE link to transfer all the protocols if the boxes are “unified”?
- Have the thin provisioning performance overheads been fixed?
- Have the pool performance bottlenecks been fixed? Or is it still recommended to use normal RAID LUNs for highest performance?
- Can one actually stripe/restripe within a FLARE pool now? When adding storage? With thin provisioning?
- What is the best practice for expanding, say, a 50 drive pool? How many drives do I have to expand by? Why?
- Does one still need to do a migration to use thin provisioning?
- Does one need to do yet another migration to “re-thin” a LUN once it gets temporarily chunky?
- Have the RAID5 and RAID6 write inefficiencies been fixed? And how?
- Will the benchmarks for the new systems use RAID6, or will they, once again, showcase RAID10? After all, most customers don’t deploy RAID10 for everything, and RAID5 is thousands of times less reliable than RAID6. How about some SPC-1 benchmarks?
- Why is EMC still not fessing up to using a filesystem for their new pools? Maybe because they keep saying doing so is not a “real” SAN, even in recent communication?
- Since EMC is using a filesystem in order to get functionality in the CX SPs like pools, thin provisioning, compression and auto-tiering (and probably dedupe in the future), how are they keeping fragmentation under control? (how the tables have turned!)
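Regarding the chunk-size question in the list above, here’s a minimal back-of-the-envelope sketch of why tiering granularity matters so much. It’s an illustrative worst-case model with numbers I made up for the example (not EMC’s actual sizing guidance), but it shows how a 1GB promotion unit can drag enormous amounts of cold data into flash:

```python
# Back-of-the-envelope: how tiering chunk size inflates the flash needed.
# Illustrative worst case with made-up numbers, not EMC sizing guidance:
# assume the hot working set is scattered 8KB blocks, so promoting one hot
# block drags its entire chunk (mostly cold data) into the flash tier.

MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4

total_capacity = 100 * TB          # hypothetical 100TB array
hot_blocks = 1_000_000             # hypothetical count of scattered hot 8KB blocks
hot_data = hot_blocks * 8 * 1024   # ~7.6GB of data that is actually hot

for label, chunk in (("7MB page (VMAX-style)", 7 * MB),
                     ("1GB chunk (VNX-style)", 1 * GB)):
    # Worst case: every hot block lands in a different chunk, so each one
    # pins a full chunk's worth of mostly cold data in the flash tier.
    flash_needed = min(hot_blocks * chunk, total_capacity)
    pct = 100.0 * flash_needed / total_capacity
    print(f"{label}: ~{flash_needed / TB:.1f}TB of flash ({pct:.0f}% of the "
          f"array) to keep ~{hot_data / GB:.1f}GB of hot data in the fast tier")
```

With these assumed numbers, the 7MB page needs roughly 6.7TB of flash, while the 1GB chunk saturates the entire hypothetical array. Real workloads are less pathological than the worst case, but the scaling direction is the point: the bigger the chunk, the more cold data rides along with every hot block.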
What I notice is a lack of thought leadership when it comes to technology innovation – EMC is still playing catch-up with other vendors in many important architectural areas, and keeps buying companies left and right to plug portfolio holes. All vendors play catch-up to some extent, the trick is finding the one playing catch-up in the fewest areas and leading in the most, with the fewest compromises.
Some areas of NetApp leadership, to answer a question in the comments:
- First Unified architecture (since 2002)
- First with RAID that has the space efficiency of RAID5, the performance of RAID10 and the reliability of RAID6 (quick numbers after this list)
- First with block-level deduplication for all protocols
- First with zero-impact snapshots
- First with Megacaches (up to 16TB cache per system possible)
- First with VMware integration including VM clones
- First with space- and time-efficient, integrated replication for all protocols
- First with snapshot-based archive storage (being able to store different versions of your data for years on nearline storage)
- First with Unified Connect and FCoE – single cable capability for all protocols (FC, iSCSI, NFS, CIFS)
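To make the RAID bullet above more concrete, here’s a tiny sketch of the space math (the performance part doesn’t reduce to four lines of arithmetic). The 16-drive count and group sizes are my own illustrative assumptions, not any vendor’s recommended configuration:

```python
# Quick numbers behind the RAID claim: usable capacity from 16 drives under
# three layouts. Group sizes are illustrative assumptions only.

drives = 16

layouts = [
    ("RAID10 (8 mirrored pairs)",    drives // 2),  # half the drives hold copies
    ("RAID5 (two 7+1 groups)",       drives - 2),   # one parity drive per group;
                                                    # survives 1 failure per group
    ("Dual parity (one 14+2 group)", drives - 2),   # RAID6/RAID-DP style;
                                                    # survives ANY 2 failures
]

for name, usable in layouts:
    print(f"{name}: {usable}/{drives} drives usable "
          f"({100 * usable / drives:.0f}% space efficiency)")
```

Same 16 spindles: RAID10 gives you 50% usable, while both RAID5 and dual parity give 87.5% – except dual parity survives any two simultaneous drive failures, which RAID5 cannot.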
However, EMC is strong when it comes to marketing, messaging and – wait for it – the management part. Since it’s amazingly difficult to integrate all the technologies EMC has acquired over the years (heck, it’s taking NetApp forever to properly integrate Spinnaker, and that’s just one other architecture), EMC is focusing instead on the management of the various bits (the current approach being Unisphere, which ties together a subset of EMC’s acquisitions).
So, Unified Storage in EMC-speak really means unified management. Which would be fine if they were upfront about it. Somehow, “our new arrays with unified management but not unified architecture” doesn’t quite roll off the tongue as easily as “unified storage”.
Mike Riley eloquently explains whether it’s easier to fix an architecture or fix management here. Ultimately, unified management can’t tackle all the underlying problems and limitations, but it does allow for some very nice demos.
A cool GUI with frankenstorage behind it is like putting lipstick on a pig, or putting a nice shell on top of a car cobbled together from disparate bits. The underlying build is masked superficially, until it’s not… usually, at the worst possible time.
Sure, ultimately, management is what the end user interfaces with. Many people won’t really care about what goes on inside, nor have the time or inclination to learn. I merely invite them to start thinking more about the inner bits, because when things get tricky, that’s also when a portal GUI meshing 4-5 different products together stops working as expected, and when you start bouncing between 3-4 completely different support teams, all trying to figure out which of the underlying products is causing the problem.
Always think in terms of what happens if something goes wrong with a certain subsystem and always assume things will break – only then can you have proper procedures and be prepared for the worst.
And always remember that the more complex a machine, the more difficult it can be to troubleshoot and fix when it does break (and it will break – everything does). There’s no substitute for clean and simple engineering.
Of course, Rube Goldberg-esque machines can be entertaining… if entertainment is what you’re after.