What exactly is Unified Storage and who can sell it to you?

It’s come to my attention that pretty much every storage manufacturer is trying to imitate NetApp’s thought leadership and keeps announcing “Unified Storage” products. Everyone can do it now, it seems 🙂

Now, this post is not going to bash those products or claim they don’t work.

This post is about what “Unified Storage” really means. And, more importantly, whether you should care about the differences.

Now, NetApp has been shipping Unified Storage for 8+ years and has shipped 150,000 Unified Storage systems to date. See here and here. So, I’d think nobody can argue that NetApp has quite a bit of experience in the technology and, indeed, was the very first to do it. Depending on your definition of “Unified”, NetApp may still be the only one doing it, but read on.

The crazy success of NetApp’s Unified Storage (just look at the company’s growth) has forced the other vendors, who initially dismissed the concept, to take a harder look – imagine that, customers actually like the idea of a Unified Storage System!

Here’s how most (if not all) other vendors approach “Unified Storage”:

  • Start with your legacy Fibre Channel array and use it to serve FC and maybe iSCSI. It’s probably a decent box, so there’s no reason to re-invent the wheel.
  • Connect some kind of Windows, Linux or UNIX server(s) to it to serve CIFS and NFS, and maybe iSCSI (this is the NAS part)
  • Replicate the two halves using different mechanisms for the FC and NAS parts

Pretty simple, really. You end up with the base legacy array plus more boxes on top: ideally two or more NAS heads to ensure redundancy, and, in at least one implementation, an extra box or two called a “Control Station”.

It all works – after all, it’s just like putting servers in front of your storage, which you’re doing anyway. You can serve FC, iSCSI, NFS and CIFS out of the same rack, provided you treat the rack as the termination point for the cables and don’t care much about exactly what happens inside it. So most C-level execs are OK with it: the rack can serve out all those protocols, ergo the “Unified Storage” claim seems justified.

Here are some potentially business-impacting issues with this approach:

  1. Aside from a couple of exceptions, the add-on boxes used to bolt on the NAS protocols aren’t even made by the array vendor (neither the OS nor the hardware). That raises obvious concerns about interoperability, manageability and the longevity of whichever NAS vendor was chosen, and support can be less robust since you’re relying on technology licensed from someone else.
  2. Replication gets complicated since you need to do it a few different ways depending on what protocol you’re replicating.
  3. Patching is more time-consuming since, apart from the legacy array, you need to also patch all the NAS paraphernalia.
  4. Management is frequently completely separate and laborious – you may have to manage the legacy array separately from the NAS part.
  5. Certain important features are available to only one part of the solution (file-level single-instancing/dedupe, for example, is available for CIFS and NFS but not for iSCSI or FC).
  6. And, finally, what I think is the biggest problem: space allocation is split between the FC and NAS parts, and you can’t shrink one to grow the other. Say you start with a 50/50 split. Once the space has been handed to the NAS side (which always has its own volume manager and now owns that 50% chunk of the array), and you later realize you’re only using 10% of it, you can’t return the remainder to the FC side. This can cause serious inefficiency, inflexibility, cost and manageability issues (a quick sketch of the arithmetic follows this list).
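
To make the arithmetic behind point 6 concrete, here’s a toy calculation in Python. The numbers are invented and it models no particular vendor’s tooling; it just contrasts a hard 50/50 carve-up with a single shared pool.

```python
# Toy numbers, no particular vendor's tooling: how a hard FC/NAS split
# strands capacity that a single shared pool would simply reuse.

ARRAY_TB = 100

# Split architecture: capacity is carved up front between the two sides.
fc_pool_tb = 50      # owned by the block controllers
nas_pool_tb = 50     # owned by the NAS head's own volume manager

nas_used_tb = 5      # six months in, NAS needed only 10% of its half
fc_needed_tb = 70    # ...while the FC side has outgrown its half

fc_shortfall_tb = fc_needed_tb - fc_pool_tb   # 20 TB short on the FC side
stranded_tb = nas_pool_tb - nas_used_tb       # 45 TB idle but unreachable
print(f"Split: FC is {fc_shortfall_tb} TB short while {stranded_tb} TB sits idle on the NAS side")

# Unified architecture: one pool, so the same workloads fit with room to spare.
unified_free_tb = ARRAY_TB - nas_used_tb - fc_needed_tb
print(f"Unified: both workloads fit, with {unified_free_tb} TB still free")
```

Same physical disks in both cases; only the ownership model changes.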

The NetApp approach

NetApp decided to do things a bit differently. Maybe by virtue of how the original systems started out, it turned out to be easier for NetApp to create what is effectively a protocol engine. Maybe “Protocol Engine with Integrated Disk Control, Space Efficiency Technologies and Protection” is more accurate than “Unified Storage”, but it’s a bit wordy…

In practice, a single NetApp box, without external hangers-on, allows you to:

  • Connect using a variety of methods – FC, 1GbE, 10GbE, FCoE
  • Use the proprietary NetApp RAID-DP protection for great performance and better protection than RAID10
  • Provision FC, iSCSI, CIFS and NFS out of the same pool of physical disk space
  • Reclaim space from FC, iSCSI, CIFS and NFS and put it back in the pool of space (sketched below)
  • Deduplicate FC, iSCSI, CIFS and NFS workloads
  • Perform application-aware replication regardless of protocol
  • Take application-aware snapshots regardless of protocol
  • Clone VMs, DBs and indeed, anything you like, without chewing up space and without impacting performance
  • Virtualize legacy arrays and extend the NetApp feature set to them
  • Perform workload and cache prioritization
  • Auto-tier hot blocks to gigantic cache to increase speeds (at a super-efficient 4K granularity)

As you can see, everything happens within one system: there’s no separate RAID controller, NAS box or replication box. And, like it or not, that’s a pretty impressive list of capabilities for a single architecture to provide.
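
As a rough illustration of the “one pool behind every protocol” idea from the bullets above, here’s a minimal sketch. It is not ONTAP code and every name in it is invented; it only shows the behavior being claimed: space freed by any protocol is immediately available to any other.

```python
# Minimal sketch with invented names -- not ONTAP code. One free-space pool
# sits behind every protocol; space freed by any protocol is reusable by all.

class UnifiedPool:
    def __init__(self, capacity_tb: int):
        self.capacity_tb = capacity_tb
        self.allocations = {}     # name -> (protocol, size_tb)

    @property
    def free_tb(self) -> int:
        return self.capacity_tb - sum(size for _, size in self.allocations.values())

    def provision(self, name: str, protocol: str, size_tb: int) -> None:
        assert protocol in {"FC", "iSCSI", "CIFS", "NFS"}
        if size_tb > self.free_tb:
            raise RuntimeError("not enough free space in the pool")
        self.allocations[name] = (protocol, size_tb)

    def reclaim(self, name: str) -> None:
        # Freed space goes straight back to the shared pool, whatever the protocol.
        del self.allocations[name]

pool = UnifiedPool(capacity_tb=100)
pool.provision("exchange_lun", "FC", 40)    # block workload
pool.provision("home_dirs", "CIFS", 30)     # file workload
pool.reclaim("exchange_lun")                # block space freed...
pool.provision("vm_datastore", "NFS", 60)   # ...and consumed by NAS right away
print(pool.free_tb)                         # -> 10
```

In the split design described earlier, space freed on one side goes back only to that side’s own pool, so the final NFS provisioning step above would simply fail.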

The potential business benefits of a true Unified Storage system:

  1. Single product, single OS, single architecture – you’re not relying on the marriage of completely different boxes.
  2. Better reliability – fewer things to break.
  3. Better support – no finger-pointing, it’s a single system from a single company.
  4. Consistent replication – one way to replicate things, yet still application-aware for 100% recoverability – improved CapEx and OpEx.
  5. Management simplicity – lower OpEx.
  6. All performance-enhancing and efficiency features are available to all protocols (a sketch of protocol-agnostic dedupe follows this list) – improved CapEx.
  7. There’s no dichotomy between FC, iSCSI and NAS space – allocations are fluid – improved CapEx and OpEx.
  8. Protect your existing investment by virtualizing existing legacy disk arrays – improved CapEx and OpEx.
  9. Overall lower OpEx and CapEx – in addition to the significant space-saving features (you avoid purchasing as much storage long-term), there’s substantial cost avoidance, since you potentially don’t need to purchase backup software, deduplication appliances, replication appliances, file servers or OS licenses.
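
As promised under point 6 above, here’s a hypothetical sketch (invented code, not NetApp’s actual dedupe engine) of why deduplication that keys on block content below the protocol layer naturally covers FC, iSCSI, CIFS and NFS alike, whereas file-level single-instancing bolted onto a NAS head never even sees the blocks behind a LUN.

```python
# Hypothetical sketch, not NetApp's dedupe engine: blocks are identified by
# content hash below the protocol layer, so identical 4KB blocks written over
# FC, iSCSI, CIFS or NFS collapse to a single physical copy.

import hashlib

class DedupedBlockStore:
    def __init__(self):
        self.blocks = {}     # content hash -> physical block data
        self.refcount = {}   # content hash -> number of logical references

    def write(self, data: bytes, protocol: str) -> str:
        # 'protocol' is deliberately ignored: dedupe happens beneath it.
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blocks:
            self.blocks[key] = data
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key

store = DedupedBlockStore()
chunk = b"\x00" * 4096                 # the same 4KB block from two clients
store.write(chunk, protocol="FC")      # written by a Fibre Channel host
store.write(chunk, protocol="NFS")     # written by an NFS client
print(len(store.blocks), sum(store.refcount.values()))  # -> 1 2: one copy, two references
```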

So, should you care how “Unified Storage” is architected?

Beyond the philosophical debate (one box vs multiple), given what you read, what do you think? I believe that the multi-box approach has some inherent drawbacks that are difficult to overcome. Comments welcome as always.

D

18 Replies to “What exactly is Unified Storage and who can sell it to you?”

  1. Maybe to prove your point, what strikes me is that the industry at large still refers to storage systems as either NAS or SAN devices.

    What I also find surprising is that there are still arguments about whether it is a good idea for SAN storage devices to take on the additional load of supporting file services (i.e. NAS) – which is really just fulfilling
    the original purpose of SANs in the first instance: consolidating storage.

  2. Hmm. Interesting. I was wondering where application-aware storage started and who actually innovated it. Did NetApp pick it up from some other storage vendor, the way everybody else picks up what’s in demand? If so, why should NetApp be crowned with thought leadership for “Unified Storage”?

  3. Shahid, unified and application-aware are 2 different things…

    Our replication, clones and snapshots are app-aware.

    What’s your point?

  4. Good blog.
    I work as a Storage Support Engineer for a well-known, established storage vendor that has introduced its own vision of Unified Storage by converging different specialized FC-only and NAS-only arrays. And yes, as described in your blog, supporting those arrays is a nightmare and there is a huge management overhead. Customers, for example, accidentally change the sizes of the underlying FC LUNs, unaware of the consequences, and then the NAS head is missing the metadata for that LUN and loses its volume, etc. We had to engage third-level support personnel from the vendor to solve the problem. Things you most likely will not encounter on a NetApp array.

  5. So – why doesn’t NetApp kill the V-Series if it’s that bad? I mean, if I’m looking at V-Series now, it sounds like it’s a mess and you should run away from it for all the reasons listed above. Also, your application-aware stuff – isn’t it OEM’d from Kroll-Ontrack?

    Better yet – take the V-Series and all the Kool-Aid you can produce and put it in front of a Xiotech Storage Blade that can make it SCREAM performance!!! Oh yeah, with a nice 5-year FREE hardware warranty on top of it all!!

    @StorageTexan

    http://storagetexan.com/2010/04/26/the-time-for-storage-blades/

  6. Thanks StorageTexan 🙂

    Well, we typically certify market-leading arrays behind the V-Series, so sorry, Xiotech is unfortunately not certified. Not that your boxes are bad, it’s more like we have received maybe 1-2 requests for supporting them, so it’s just not worth the effort. If you manage to capture more of the market, we’ll look at this again. Basic supply/demand stuff.

    V-Series is just a standard NetApp controller that lets you attach 3rd party stuff behind it AS WELL AS NETAPP DISK. Some of the same caveats apply with that approach, as with SVC and any other gateway solution!

    So yes, once the V-Series controls the disk you can’t easily give it back, you’re not supposed to mess with the back-end, etc etc.

    The HUGE difference:

    The V-Series allows you to COMPLETELY pass ALL traffic through it – FC, iSCSI and NAS. That’s how you’re supposed to use it.

    The reason V-Series exists:

    Most customers opt for 100% NetApp disk (RAID-DP is that cool), but some have enough legacy disk that they could get serious savings if they could enhance it BEYOND JUST ADDING NAS.

    A huge telco has over 10PB of a rather famous array vendor behind V-Series. I’ve checked some of their stats and several of their boxes are getting 3x space efficiency. Translation: totally worth it 🙂

    They’re passing all protocols through V-Series. Indeed, that 10+PB is mostly accessed via FC.

    I hope you see the difference between that and a gateway that only does NAS.

    Regarding the app-aware stuff: no, aside from a sub-function of one of the integrations, it’s not OEM’d from Kroll. The Kroll piece just extracts some data after the application-aware NetApp piece has done its job.

    I should maybe do a demo for you… 🙂

    There are also plenty on YouTube – go check them out.

    D

  7. Your point #6 (under business impact) is not valid for Compellent’s unified storage approach. Unless you know something I don’t, the zNAS filer looks like just another initiator to Storage Center, and thus its volumes will be thin-provisioned.

    With all due respect, I don’t see a problem with the “multi-box approach”, as you call it. Why force an architecture that was specifically designed to do one thing VERY well (virtualize 512-byte block storage) to perform something done better by another platform?

    I don’t mean to disparage the NTAP approach, because I have managed FAS systems in my career for both block and file and found them adequate to the task. Not superior, but adequate. 🙂

    I will say that the question you pose at the end is very important – buyers SHOULD consider the underlying architecture before selecting a Unified Storage solution.

  8. Hey John,

    So let’s go over this step by step, and let’s see if I’m wrong.

    1. You thin-provision a zpool with multiple virtual disks from your box
    2. It starts chewing up space at the back end – let’s say 10TB of REAL space, because you’ve really written 10TB
    3. Your space requirements after 6 months are now 1TB
    4. How do you reclaim the freed ZFS blocks? Because now you need to provision 9TB for Exchange 2010 and you kinda need the space back.

    And, of course you don’t see a problem with the multi-box approach, since that’s what you’re selling 🙂

    Thx

    D

  9. Sweet – thanks for pointing out you only certify market-leading arrays behind the V – I’ll bookmark this for future use.

    You need to see a demo of our Emprise 5000 Storage Blade – trust me – your Xyratex drive bays are the same ones everyone else uses – nothing special. Just make sure you keep your usable capacity to around 70% of total capacity or performance problems start 🙂 Or better yet – throw in some SSD to solve that performance problem!!

    LOVE IT.

    @StorageTexan
    http://storagetexan.com/2010/01/11/performance-starved-applications/

  10. You want to reclaim deleted space? That’s not what your question led me to believe – I thought you were talking about PRE-allocated space (as in, say, snapshot reserves or something?). Sure, I can get your space back easily.

    Not even going to address the last comment – that’s a sword with two edges my friend! 🙂

  11. @storagetexan: It’s not as if you’ve discovered some major secret, if there’s demand for integration, we’ll do it. That’s all. There has been almost zero demand for Xiotech.

    And who cares about the disk shelves? That’s not where the intelligence is.

    Please, try to not let the trees prevent you from seeing the forest.

    @JohnDias: I thought my #6 point was self-explanatory. Going back to the example: You have a Solaris server with ZFS, and you asked for a (thin-provisioned) 10TB. If you end up using that space for real, and later delete some files (BUT NOT DELETE THE ZPOOL OR ITS DISKS), can you still reclaim the freed space? And how?

    I’m not saying you can’t, I’m asking if yes, and how (obviously without divulging your innermost secrets).
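
    To pin it down, here’s a toy model of what I mean – not ZFS internals and not Compellent code, just the generic shape of the problem: the back end keeps thin-provisioned blocks allocated until something explicitly tells it they’re free, and a filesystem-level delete by itself doesn’t do that.

```python
# Toy model only -- not ZFS or any vendor's code.

class ThinBackend:
    """The array's view: blocks stay allocated until explicitly unmapped."""
    def __init__(self):
        self.allocated = set()

    def write(self, block: int) -> None:
        self.allocated.add(block)

    def unmap(self, blocks) -> None:
        self.allocated -= set(blocks)

class HostFilesystem:
    """The host's view (think zpool built on thin-provisioned LUNs)."""
    def __init__(self, backend: ThinBackend):
        self.backend = backend
        self.files = {}

    def write_file(self, name: str, blocks) -> None:
        self.files[name] = list(blocks)
        for b in self.files[name]:
            self.backend.write(b)

    def delete_file(self, name: str) -> None:
        # Only filesystem metadata changes; the back end never hears about it.
        del self.files[name]

backend = ThinBackend()
fs = HostFilesystem(backend)
fs.write_file("old_data", blocks=range(1000))   # stand-in for the 10TB
fs.delete_file("old_data")
print(len(backend.allocated))                   # -> 1000: still pinned at the back end
```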

    D

  12. Hmm – you probably need to double-check your requests – but I digress – and we agree on something: no intelligence in your disk shelves!! We are making progress now.

    What I think you find difficult to understand is that, at the end of the day, I think the V is a pretty cool solution, especially when married with intelligent disk subsystems like our Emprise 5000 Storage Blades. We can run our blades at 97% full with ZERO performance problems – that’s goodness for the V. Your performance problem has NOTHING to do with WAFL – it has everything to do with the “dumb disk shelves” and the fact that once a drive gets above about 70 to 80% full, performance problems start happening. You can’t argue with that – an empty drive is faster than a full drive, and the difference in performance as a drive fills up is STAGGERING!! That’s why the industry recommends not filling a drive more than about 70% to 80% full. On 100TB, that’s 20TB of capacity not utilized. YUCK.

    @StorageTexan

  13. Storagetexan,

    I don’t make the rules re: Xiotech and V-Series – indeed, I asked a couple of days ago if we had support. The answer is always the same: not enough demand. Have you tried officially approaching NetApp to get integrated?

    Agreed that filling disks up slows things down – it’s been like that forever unless you use SSD. Actually, there are certain things we can do in WAFL to alleviate this, interestingly never used by anyone showing anti-NetApp benchmarks…

    How about this: do you have some links that outline how Xiotech avoids the slowdown as used space grows? That, and the failure characteristics of ISEs, are things I just haven’t seen decent material on. For instance, how does your RAID compare with RAID 10, 5 and 6 – and why? (If you’ve seen the NetApp USENIX papers full of math, you’ll know what I mean.)

    D

  14. Thanks. What’s not clear to me at all (even after turning off the competitive part of my brain, which felt like being lobotomized):

    1. Are volumes capable of striping across multiple ISEs – and how many? (And what happens to RAID then?)
    2. The exact failure scenarios
    3. What does “striping the data at the level of an individual drive head” mean?

    The posts are severely lacking in detail. There’s a lot of interesting stuff about how you use custom firmware (as do we, BTW) and how low-level you get, but what’s not clear is how failures are handled. I get that the pack is “sealed”, but how many disks are dedicated to sparing, in case a motor gives out? Fixing UREs and surface errors is only part of the answer.

    How are lost writes detected and handled?

    At what point do you need to replace a pack, and what are the ramifications? (everything fails).

    Data resiliency is paramount and if I can’t understand it, either I’m being thick, or customers will indeed have a really hard time understanding it (or both).

    The way I understand it: a volume that’s R5 has bits of it on different drives, kinda like how Compellent (and I think 3Par) do it. Sounds like you don’t build RAID on discrete drives – you RAID chunklets (I do like 3Par’s term). There also seem to be echoes of XIV in the design.

    Am I right so far at all?

    So, no matter how you do it, is it true that if you lose 2 of said chunks on an R5 volume then it’s kaput?

    Ultimately, it’s all about the applications. The architecture is a means to an end. It’s only important inasmuch as it helps or hinders the applications, be that performance, resiliency, recoverability, etc.

    D

  15. I’m cross-posting a response I put in the Xiotech blog, they were pretty effective at derailing the conversation from Unified Storage and onto their stuff 🙂

    It is very interesting that you try to stay so close to the disk but I’m a bit sceptical about a few things:

    1. What do you do if there is a latent physical defect in your sole source of disk drives that is not recoverable with disk drive manufacturing IP? (happened with Seagate drives in the past).

    2. The spare capacity for rebuilds – this seems similar to XIV, only they just do mirroring. How did you decide on the percentage of spare space to leave for rebuilds? What if you were off?

    3. Staying with a single source manufacturer of drives (however big) has potential drawbacks. What if they have a bad batch, or what if some other vendor comes up with a dramatically better drive?

    4. I see the indisputable benefit of making the fault domains smaller (individual platters/heads), but I’m not sure I agree with the claim that something like RAID 6 is unnecessary because, the way you do things, failures are less likely than with normal R5. R6 is an orthogonal method to what you already do to help with reliability.

    5. The ISE hasn’t been out for 5 years, right? Are the 5-year warranty and all the reliability claims based on mathematical models? What data is there to prove the overall system is more reliable?

    6. 1 billion transactions per day – OK, how many operations per second on the disk drives at the busiest period? If it’s Oracle, 1 transaction could be multiple IOPS on the drives. A day has many many seconds 🙂

    7. If all that’s needed to make disks more reliable is smarter software, why is all this intelligence not already on the Seagate drives?

    8. Indeed, drive motors seldom fail. What happens with more frequency is that bearings and lubricants develop issues.

    9. Correlated failures can happen due to external events.

    10. Last but not least – if, for whatever reason, you have truly used up your 20% of spare space in the ISE due to whatever problems, and you’re not close to the end of your warranty, what’s the mechanism for regaining the peace-of-mind the original 20% of spare space provided?

    In general I don’t much like answers like “with our technology this is highly unlikely” – a good engineering design has answers for pretty hairy worst-case scenarios.

    The answers may not always be what the customer wants to hear, but at least they exist and show that thought has been put into the design.

    Thx

    D
