Do you need a VTL or not?

I first posted this as a comment on http://www.gotitsolutions.org but this is its rightful place.

Having deployed what was, at the time, the largest VTL in the world, and subsequently numerous other VTL and ATA solutions, I think I can offer a somewhat different perspective:

It depends on the number of data movers you have and how much manual work you’re prepared to do. Oh, and speed.

Licensing for VTL is now capacity-based for most packages (at least the famous/infamous/important ones like CommVault, Networker and NetBackup, not respectively).

Also, I’d forget about using VTL features such as replication and using the VTL to write directly to tape (unless you’re retarded, insane or the backup software is running ON the VTL, as is the case now with EMC’s CDL). Just use the VTL like tape. I’ve been so vehement about this that even the very stubborn and opinionated Curtis Preston is now afraid to say otherwise with me in the room… (I shut him up REALLY effectively during one Veritas Vision session we were co-presenting a couple of years ago. I like Curtis, but he’s too far removed from the real world. Great presenter, though, and funny.)

Even dedup features are suspect in my opinion, since they rely on hashes and searches of databases of hashes, which progressively get slower the more you store in them. Most companies selling dedup (Data Domain and Avamar, to name a couple of major names) are sorta cagey when you confront them with questions such as “I have 5 servers with 50 million files each, how well will this thing work?”

The answer is: it won’t, even for far fewer files. Just get some raw-based backup method that also indexes, such as Networker’s snapimage or NBU’s flashbackup.

Dedup also fails with very large files such as database files.

I can expand on any of the above comments if anyone cares.
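To make the hash-database concern concrete, here is a minimal sketch of hash-based dedup. It assumes fixed-size 4K chunks, SHA-256 fingerprints, and an in-memory dict standing in for the product’s hash database. It is an illustration only, not how Data Domain, Avamar or anyone else actually does it. Every incoming chunk costs one index search, and every unique chunk adds one index entry, so the index grows with the amount of unique data stored.

```python
import hashlib


class DedupStore:
    """Toy fixed-size-chunk dedup store (hypothetical; not any vendor's design)."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.index: dict[str, bytes] = {}  # fingerprint -> the one stored copy of that chunk
        self.lookups = 0                   # index searches performed so far

    def write(self, data: bytes) -> list[str]:
        """Store data; return the fingerprint "recipe" needed to rebuild it."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            self.lookups += 1  # every incoming chunk is searched for in the index
            if fingerprint not in self.index:
                self.index[fingerprint] = chunk  # only new chunks consume space
            recipe.append(fingerprint)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Rebuild data by looking up each fingerprint in order."""
        return b"".join(self.index[fp] for fp in recipe)


store = DedupStore()
first = store.write(b"A" * 8192 + b"B" * 4096)   # chunks: A, A, B
second = store.write(b"A" * 4096 + b"C" * 4096)  # chunks: A, C
print(len(store.index), store.lookups)           # unique chunks, index searches
```

The dict here lives in RAM and stays fast. The worry raised above is what happens when a real index like this outgrows memory and lookups turn into disk searches. For the 5 servers × 50 million files question, even one entry per file adds up to 250 million fingerprints, and that is a lower bound, since large files produce many chunks.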

But back on the data movers (Media Agents, Storage Nodes, Media Servers):

Whether you use VTL or ATA, you effectively need to divvy up the available space.

With ATA, you either allocate a fixed amount of space to each data mover, or use a cluster filesystem (such as ADIC’s StorNext) to allow all data movers to see the same disk.

With VTL, the smallest quantum of space you can allocate to a data mover is simply a virtual tape. A virtual tape, just like a real tape, gets allocated automatically, as needed.

So, imagine you have a large datacenter, with maybe 40 data movers and multiple backup masters.

Imagine you have a 64TB ATA array.

You can either:

1. Split the array into 40 chunks, and have a management nightmare
2. Deploy StorNext so all servers see a SINGLE 64TB filesystem (at an extra 3-4K per server, plus probably 50K more for maintenance, central servers and failover): easy to deal with, but complex to deploy, and more software on your boxes.
3. Deploy VTL and be done with it.
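The difference between options 1 and 3 can be sketched with a toy model. The per-mover demand figures and the 200GB virtual tape size below are made-up assumptions, and decimal units keep the numbers round. With fixed slices of 64TB / 40 = 1.6TB, a busy data mover runs out of space even though the array as a whole has room to spare. When every mover draws virtual tapes from one shared pool, the same demand fits.

```python
ARRAY_GB = 64_000  # 64TB array (decimal units, for round numbers)
MOVERS = 40
VTAPE_GB = 200     # hypothetical virtual tape size

# Hypothetical nightly demand per data mover: 5 heavy movers, 35 light ones.
demand_gb = [4_000] * 5 + [1_000] * 35  # 55TB in total, well under 64TB


def fixed_slices(demand, array_gb, movers):
    """Option 1: each mover owns array_gb // movers; return the movers that overflow."""
    slice_gb = array_gb // movers
    return [mover for mover, need in enumerate(demand) if need > slice_gb]


def shared_vtape_pool(demand, array_gb, vtape_gb):
    """Option 3: movers draw whole virtual tapes from one pool.

    Returns (tapes_needed, tapes_available).
    """
    tapes_needed = sum((need + vtape_gb - 1) // vtape_gb for need in demand)  # ceiling division
    return tapes_needed, array_gb // vtape_gb


overflowing = fixed_slices(demand_gb, ARRAY_GB, MOVERS)
needed, available = shared_vtape_pool(demand_gb, ARRAY_GB, VTAPE_GB)
print(f"fixed slices: {len(overflowing)} movers out of space")
print(f"shared pool: {needed} of {available} virtual tapes used")
```

Option 2 avoids the overflow too. By the figures above, though, it adds roughly 40 × 3-4K + 50K, or about 170-210K, in StorNext costs.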

For such a large environment, option #3 is the best choice, hands down.

With filesystems, you have to worry about space, fragmentation, mount options, filesystem creation-time tunables, runtime tunables, esoteric kernel tunings, fancy disk layouts, and so on. If you’re weird like me and thoroughly enjoy such things, then go for it. As time goes by, though, the novelty factor diminishes greatly. Been there, done that, smashed some speed records on the way.

What’s needed in the larger shops, aside from performance, is scalability, ease of use and deployment, and simplicity.

With VTL, you get all of that.

The other issue with disk is that backup vendors, while they’re getting better, impose restrictions on the number of streams in/out, copying to tape and so on. Tape has no such restrictions.

One issue with VTL: depending on your backup software, setting up all those new virtual drives etc. can be a pain (esp. on NBU).

For a small shop (fewer than 2 data movers), a VTL is probably overkill.

D

7 Replies to “Do you need a VTL or not?”

  1. Well, slap me thrice and call me retarded, then. 😉

    Ah, the memories… Dmitris-vs-Curtis Vision 2005… You were making the same point then that you are making now, that anyone that would want to use any of the advanced features of VTLs is “retarded.” Specifically, you said that the worst feature of all was an integrated VTL that copies the virtual tape to physical tape.

    And now, just as then, you aren’t actually saying WHY the integrated model is “retarded.” You just say it is so. And because you say it is so, it must be so. And then, just as now, you’re saying it with such strong words that a polite panel moderator (which I was and am) might have difficulty responding without getting all riled up, which wouldn’t have served the audience at all. So I chose to ignore it and move on. I’m not sure if that qualifies as “shut[ting] him up REALLY effectively.”

    Anyway, on to the topic at hand.

    I must back up statements that I make on the stage or in writing. (Maybe it’s because I’m so far “removed from the real world,” as you say. I think that the multiple multi-national oil-and-gas, telecom, banking, and pharmaceutical companies that I’ve worked with in the past year or two might feel differently. For the record, if you think I’m an analyst who doesn’t touch backup systems any more, you misunderstand my job completely.)

    On to backing up my statements. The only point I was making, and will continue to make, is that the integrated VTL model (where the VTL can copy the virtual tape to physical tape for you) has value for some environments. Backup software duplication/cloning is my favorite way to move data from virtual tape to physical tape, but it doesn’t fit the bill for everybody. For example:

    1. Customers who use NBU but haven’t bought Vault. (I think they should buy it, but they haven’t. Good luck automating tape duplication without it.)

    2. Customers who use NW and can’t use automated cloning — and don’t want to code the equivalent of NBU Vault for NW on their own using Perl.

    3. Customers who (for various reasons) are unable to get acceptable duplication/cloning performance out of their backup software. This was a topic of discussion just last week on the Veritas-bu mailing list, where one person (a former client of mine) was being as dismissive of NBU Vault as you are being of the integrated VTL model. They were getting single-digit MB/s performance out of it. It turns out it was because their backup images had too many files. They started using the integrated features of their VTL, and they’re happy as clams now.

    You can’t completely dismiss a feature like that just because you tried it and it didn’t work for you. I know it has limitations in some scenarios, but so does duplication/cloning via the backup software.

    As to your comments on de-duplication, I’d say you’re equating de-duplication software (e.g. Avamar, PureDisk, Asigra) with de-duplication devices (Data Domain, Diligent, FalconStor, NetApp, SEPATON, Quantum, and all the vendors that sell them). They are actually very different technologies.

    You said to ask de-dupe vendors, “I have 5 servers with 50 million files each, how well will this thing work? Answer is, it won’t, even for far fewer files.”

    You also said “Dedup also fails with very large files such as database files.”

    I’d say that those are probably true of de-duplication backup software, but this software is aimed at remote offices, not at backing up your 20 TB Oracle database. Don’t confuse that with the high-speed de-dupe engines available in VTLs. One is designed to go 10s or 100s of MB/s, the other is designed to go 1000s of MB/s. (Or at least that’s the performance I got out of them when I tested multiple de-dupe vendors at one of the largest banks in the world. Maybe in the “real world” that I don’t live in, they’re much slower.)
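    For a sense of scale, here is the transfer-time arithmetic for the 20 TB Oracle database mentioned above. The 100 MB/s and 1000 MB/s rates are picked from the two ranges quoted. They are not measurements. Decimal units are assumed.

```python
def transfer_hours(size_tb: float, rate_mb_s: float) -> float:
    """Hours to move size_tb terabytes at rate_mb_s megabytes per second (decimal units)."""
    size_mb = size_tb * 1_000_000
    return size_mb / rate_mb_s / 3600


for rate in (100, 1000):
    print(f"20 TB at {rate:>4} MB/s: {transfer_hours(20, rate):.1f} h")
```

    That works out to roughly 55.6 hours versus 5.6 hours for the same database.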

  2. Hey Curtis, glad to see you here. I won’t even slap you 🙂

    Glad to know you’re still touching real systems; I’ve heard you speak a few times and sometimes got the impression you didn’t. Maybe it’s your enthusiasm.

    This blog is not PC and will indeed get very opinionated. However, don’t for a second pretend you’re not opinionated; you’re at least as opinionated as I am 🙂

    Will VTL export to tape work? Yes it will; I never said it wouldn’t. However, I stick to my guns: the process is fraught with too many pitfalls to be truly safe. If there’s no other way to make it work, then by all means try it, but at least be fully aware of the caveats.

    About customers that don’t have vault: If you can’t afford vault then what the hell are you doing buying a VTL?

    As to why using the integrated copy is retarded: Happy to oblige with a full explanation (and, for the record, I did expand on it during our little session – maybe I should have expanded on it more). Anything that does stuff to physical tapes and leaves backup software unaware is retarded. Can you work around the problems? Possibly. Are you willing to wrestle with the NBU databases? I’ve done a lot of that wrestling and it gets pretty ugly. Yes, I agree that it may alleviate some issues, but maybe safer methods should be employed.

    Here’s another problem: compression on tapes. Say you use LTO2, and NBU, writing directly to it, gets 3:1 compression (not unheard of). The backup will take, say, 2 full tapes. Vaulting it takes 2 tapes.

    Now, say you’re writing to uncompressed VTL-based LTO2 – each tape is a hard 200GB in size. The backup takes 6 virtual tapes, same amount of data.

    Vaulting it using the VTL-based export will create 6 LTO2 tapes, each of them only 1/3 utilized (since it will try to match virtual tapes to physical tapes).
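    Here are the tape counts in this example, spelled out under its own assumptions: LTO2 native capacity of 200GB, 3:1 compression when NBU writes to the physical drive, uncompressed 200GB virtual tapes, and one physical tape per virtual tape on export.

```python
NATIVE_GB = 200                  # LTO2 native capacity
RATIO = 3                        # compression NBU gets writing straight to the physical drive
data_gb = 2 * NATIVE_GB * RATIO  # "2 full tapes" of backup data = 1200GB


def cartridges(total_gb: int, per_cartridge_gb: int) -> int:
    """Ceiling division: cartridges needed to hold total_gb."""
    return -(-total_gb // per_cartridge_gb)


direct = cartridges(data_gb, NATIVE_GB * RATIO)  # NBU -> physical drive, ~600GB per tape
virtual = cartridges(data_gb, NATIVE_GB)         # uncompressed VTL, hard 200GB per tape
exported = virtual                               # export maps one virtual tape to one physical tape
# Each exported tape receives 200GB that compresses to ~67GB of its 200GB native capacity.
utilization = (NATIVE_GB / RATIO) / NATIVE_GB

print(f"direct: {direct} tapes; VTL export: {exported} tapes at {utilization:.0%} full")
```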

    You can circumvent that by using compression on the VTL but will the algorithms be the same as the physical tape? Or am I missing something?

    Also, if you don’t use the VTL to export, you can use dissimilar tapes between virtual and physical (e.g. DLT/LTO), and more easily migrate to new tape technologies.

    If your VTL controls your tape library, what will you do if the VTL fails? At least if the two are separate, you can use one while repairing the other. Unless the VTL is just a Media Server/Storage node, as is the case with EMC’s latest EDL: possibly the only case that I would deem it “OK”.

    Some software (like Networker) actually has some pretty good algorithms for writing to tape, specifically when it comes to handling errors. The VTLs do not. What happens when you “eject” the virtual tape, causing a write to the real tape, and you get a media error? Yeah, that never happens. In truly busy shops and with media that are more physically sensitive it happens all the time. Then what? Unless, again, I’m missing something really obvious here.

    In large environments, most VTLs will NOT be faster than a series of decent media servers doing vaulting in parallel. Most VTLs are really just Linux boxes, they don’t have massive power, despite some hardware assist such as Quantum’s compression cards. If you have 10-20 decent Sun boxes as media servers, I’d think that your aggregate vaulting throughput would be pretty high…

    Maybe my view is tainted because I’ve only worked in huge shops, I’m sure smaller shops are perfectly happy with a PVX or whatever doing all the tape management. And mainframe shops do it all the time, right?

    If you’re backing up too many files, then use the special options in NBU, NW et al to get around the performance limitations. Were they vaulting a flashbackup backup or was it a normal backup?

    Some VTLs do have a cool feature: replication. However, is it not better to just replicate your storage as a matter of course? Add a proper CDP technology such as EMC’s RecoverPoint (previously Kashya) and you can replicate AND go back in time. So many options.

    Regarding dedup: software like Avamar is pretty scalable, since you just add nodes to make it go faster. Ergo, they can afford to get more granular with the chunk size (though I think Avamar doesn’t go below 12K, while Data Domain is appliance-based and goes down to 4K). Dedup in the VTLs is obviously faster since they don’t go too granular. However, Quantum has hardware assist, so it should be pretty quick anyway. But even Quantum told me that you probably shouldn’t keep stuff on it too long, plus they haven’t done serious long-term tests. And yes, Avamar/Evault/Asigra/Puredisk etc. are mostly geared towards remote office backup, but you can use them for whatever (I wouldn’t).
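    The granularity trade-off can be put in numbers. The fingerprint index holds roughly one entry per unique chunk, so it scales with unique data divided by chunk size. Here is a rough sketch for 1TB of unique data at the 4K and 12K chunk sizes mentioned above. The 1TB figure is arbitrary and binary units are assumed. Real products use variable-size chunks and compact index structures, so read these as orders of magnitude.

```python
TB = 2**40
KB = 2**10


def index_entries(unique_bytes: int, chunk_bytes: int) -> int:
    """Approximate fingerprint count: one entry per unique chunk (ceiling division)."""
    return -(-unique_bytes // chunk_bytes)


for chunk_kb in (4, 12):
    n = index_entries(1 * TB, chunk_kb * KB)
    print(f"{chunk_kb:>2}K chunks: {n:,} fingerprints")
```

    That is about 268 million fingerprints at 4K versus about 89 million at 12K, a third of the index to store and search.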

    Out of curiosity, which dedup products got GB/s and what was the config? How long did the tests run for? Is this stuff you can share with the rest of the world?

    Did the explanations satisfy you?

    D

  3. Yes, I was a bit miffed by your initial post, as it basically said I was out of touch. If I’m out of touch, then I’m a stuffed shirt, and why should anyone listen to me? I wanted you and your readers to know that wasn’t the case. I appreciate you allowing me to reply and not deleting my reply. I feel better now. 😉

    As to me touching real systems, I do it as much as I can. I don’t get to do it often enough, but I do. What I do more often is spend quality time with the people who DO touch real systems. A day doesn’t go by where I’m not advising someone (usually a GlassHouse consultant or customer) on how they should try and solve the problem in front of them. Then they report back to me on what worked and didn’t work, and I advise further. So, I don’t have to be the actual guy at the keyboard to stay connected with the real world. And the “consultant to the consultants” model allows me to “touch” much more of the real world than any end user or consultant would ever touch.

    As to your reply… I’ll give some paraphrased quotes and then reply to them:

    “no vault but they have a vtl?”

    Depending on the VTL in question, Vault can be much more expensive than a VTL. And if I only had $20,000 and I had to choose between the two, I’d choose VTL any day. (I’m a big Vault fan.)

    “are you willing to wrestle with the NBU databases”

    That’s just FUD, dude. If you do it right (i.e. match barcodes between physical and virtual), there are no database issues. NBU thinks it backed up to barcode 1000, and in the end that’s what happened — if you match the barcodes. Some vendors (like the one you like) allow you to NOT match the barcodes. Now THAT’s idiotic. Some vendors allow you to stack several virtual tapes onto one physical tape. Also idiotic. Anything that breaks the relationship between the backup catalog and real tape is wrong.
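    The rule described here can be expressed as a pre-export check: same barcode on both sides, one virtual tape per physical tape, no stacking. This is a hypothetical sketch. `check_export`, the barcode values and the plan format are illustrative and are not any VTL’s interface.

```python
def check_export(plan: dict[str, str], physical_inventory: set[str]) -> list[str]:
    """Flag any export that would break the catalog-to-physical-tape relationship.

    plan maps each virtual barcode to the physical barcode it will be copied to.
    """
    problems = []
    claimed: dict[str, str] = {}  # physical barcode -> first virtual tape using it
    for virtual, physical in sorted(plan.items()):
        if virtual != physical:
            problems.append(f"{virtual}: would land on mismatched barcode {physical}")
        if physical not in physical_inventory:
            problems.append(f"{virtual}: physical barcode {physical} is not in the library")
        if physical in claimed:
            problems.append(f"{virtual}: stacked onto {physical} with {claimed[physical]}")
        else:
            claimed[physical] = virtual
    return problems


inventory = {"001000", "001001", "002000"}
good = {"001000": "001000", "001001": "001001"}
bad = {"001000": "001000", "001001": "002000", "001002": "002000"}
print(check_export(good, inventory))
for line in check_export(bad, inventory):
    print(line)
```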

    “Compression”

    If this is an issue for you, then use VTL compression. Almost all of them now support hardware compression, so it’s no big deal. And yes, it’s the exact same algorithm they use in the tape drives. It’s often the SAME CHIP now. And if there’s any question as to any differences, the VTL only has to back off a little to make sure the virtual tape always fits on the physical tape, because if that breaks, then you are screwed. This is just something else to think about, not a show stopper.
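    Here is one way to read the “back off a little” idea, as a hypothetical sketch rather than any vendor’s actual logic. The VTL closes a virtual tape once its compressed contents would exceed the physical cartridge’s native capacity minus a safety margin, so the exported copy still fits if the drive compresses slightly worse. The 200GB capacity, 2% margin and block sizes are assumptions.

```python
NATIVE_GB = 200.0  # physical cartridge native capacity (LTO2, as in the example above)
MARGIN = 0.02      # hypothetical safety margin


def fill_virtual_tapes(compressed_blocks_gb: list[float]) -> list[float]:
    """Pack compressed blocks onto virtual tapes, closing each tape at the backed-off limit.

    Assumes every block is smaller than the limit. Returns compressed GB per virtual tape.
    """
    limit = NATIVE_GB * (1 - MARGIN)
    tapes = [0.0]
    for block in compressed_blocks_gb:
        if tapes[-1] + block > limit:  # would overshoot: start a new virtual tape
            tapes.append(0.0)
        tapes[-1] += block
    return tapes


tapes = fill_virtual_tapes([50.0] * 8)
print(tapes)
```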

    “In large environments, VTLs will not be faster”

    “Oh yeah? Care to put that to the test?”

    Draw!

    “too many files, just use special options”

    They are a HUGE NAS shop. Flashbackup not available. But I think you get the point. Vault’s performance (and any backup software’s copy facility) is susceptible to the number of files.

    The important thing here is that the VTLs AREN’T susceptible to this problem. They just do a bit-for-bit copy.

    “I’ve only worked in big shops”

    That’s all I work in, but I do my best to keep connected to the smaller ones as well. And it was a very big shop that switched from Vault to integrated.

    “Avamar is scalable”

    It actually scales only so far. I’m very familiar with the technology and am a big fan, but it still has limits when backing up a single host. All the nodes in the world aren’t going to change how much computation they have to do on each client to make the hashes. It’ll scale pretty far, but not as far as the high-end de-dupe vendors.

    “which de-dupe products got GB/s?”

    Now THAT’s a billable question. 😉 I’d be happy to send you a statement of work. And with that, I’m going to get back to my OWN blog (www.backupcentral.com).

    🙂

  4. Curtis, I delete nobody’s replies (at least I haven’t had to, so far, but it’s a fairly new blog).

    About touching stuff in the real world: getting the skinny from others is great, but personally wrestling with some really thorny issues and seeing how long they take to resolve also gives one a great perspective. For the record, my role nowadays is similar to yours as you describe it, though a bit more hands-on (I’d actually like even more; there’s just no time).

    Would I buy a VTL for $20K? Dunno, I think I’m definitely out of touch with what smaller shops are willing to spend. I’d probably get a ton of disk. If $20K is too much maybe NBU was the wrong software to begin with?

    I KNOW that if you match physical vs. virtual barcodes you’ll be fine. My concern is, what happens when the physical tape has issues? I always operate under worst-case assumptions and it’s helped me greatly so far (though the wife always tells me I’m a pessimist).

    I don’t think there’s a way to make VTLs “back off” when writing to tape so everything matches… I’d be curious to know which appliances do it.

    If you have an environment with 50 beefy media servers simultaneously doing I/O to high-end disk and tape, I’d think the aggregate throughput would be higher than any VTL out there doing the dump to tape (important distinction: I’m talking about the performance of copying to tape using the VTL, NOT aggregate backup or restore throughput). Most VTLs have very limited paths to tape. Again, I’m thinking about large environments, where you may have expanded STK 8500 boxes with dozens of high-end drives. I don’t think there’s any VTL that can simultaneously push them all… It’s just speeds and feeds, there is no magic.
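    The “speeds and feeds” argument amounts to taking the minimum bandwidth across the stages of the copy. Here is a rough sketch in which every figure is a hypothetical placeholder:
    - about 400 MB/s per 4Gbit FC path
    - 36 drives at 100 MB/s
    - 4 back-end paths on the VTL
    - 300 MB/s of disk read per media server

    The point is the shape of the calculation, not the numbers.

```python
def copy_to_tape_mb_s(source_mb_s: float, paths: int, path_mb_s: float,
                      drives: int, drive_mb_s: float) -> float:
    """Aggregate copy-to-tape rate is capped by the slowest stage in the chain."""
    return min(source_mb_s, paths * path_mb_s, drives * drive_mb_s)


# All figures hypothetical.
vtl = copy_to_tape_mb_s(source_mb_s=3_000, paths=4, path_mb_s=400,
                        drives=36, drive_mb_s=100)
media_servers = copy_to_tape_mb_s(source_mb_s=50 * 300, paths=50, path_mb_s=400,
                                  drives=36, drive_mb_s=100)
print(f"VTL export: {vtl} MB/s; 50 media servers: {media_servers} MB/s")
```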

    Billable question indeed. I should start doing that, then I won’t answer all the important questions 😉

    D

  5. Oh, yeah. I forgot to address the “what happens when the physical tape fails.” First, you need a tape type that supports removable bar codes (most 1/2″ tapes). You take out the bad tape, swap the old bar code to the new/good tape, and tell the VTL to rerun the copy. I know it sounds like a pain, but it works, and if you only have to do it every couple of months, big deal?

    As to matching the bar codes, you do it when you set up the VTL. The default way is “inventory my physical library and create virtual tapes with the same bar codes.” That’s how they match. The problem is that people set it up first w/o connecting their physical library on the back end. Then they fill up the VTL and want to know how to copy them to physical tape. If they call support, Falconstor will allow them to export to physical tapes that don’t have matching bar codes. I think that’s the equivalent of having a customer call and say “how do I set myself on fire?” I think Falconstor/EMC/IBM/Sun/Copan (all Falconstor VTLs) should all say: “bpduplicate. Learn it. Live it. Love it.”
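    The default setup described here mirrors the physical inventory into the VTL, which can be sketched as two set differences. This is hypothetical: the function and barcodes are illustrative, not FalconStor’s interface. The first difference lists the virtual tapes to create so every physical cartridge has a twin. The second flags virtual tapes with no physical twin, which are exactly the ones that can’t later be exported with matching barcodes.

```python
def plan_virtual_library(physical: set[str], virtual: set[str]) -> tuple[list[str], list[str]]:
    """Return (barcodes to create as virtual tapes, virtual tapes with no physical twin)."""
    to_create = sorted(physical - virtual)  # mirror every physical cartridge
    orphans = sorted(virtual - physical)    # these can't be exported with matching barcodes
    return to_create, orphans


physical = {"001000", "001001", "001002"}
virtual = {"001000", "V00001"}
to_create, orphans = plan_virtual_library(physical, virtual)
print(to_create, orphans)
```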

    There you go generalizing again with “most VTLs have limited paths to tape.” How about saying “most VTLs I’ve used have limited paths to tape”? How about saying “all of the re-branded VTLs that have to be pre-configured so that they’re SKU-able for EMC/Sun/IBM/HP have limited paths to tape”? (If you buy a Falconstor VTL from anybody but EMC/Sun/IBM, you can have all the paths to tape you want.)

    You’ve also spent way too much time with non-expandable VTLs. You need to look at ones like SEPATON, NetApp, or Diligent’s former (non-de-dupe) model. Those VTLs can go as big as you need them to be all within a single footprint.

    I think your perception of the VTL industry is tainted by your working at a company that required you to use major brands. While I understand the reasoning, the VTL market is an example where they don’t add value. They actually (IMHO) subtract value by putting a really good product into a box they can understand, and by holding back features (like de-dupe) that they don’t really like. Think outside the box. Either buy your own disk and server and install the VTL software yourself, or buy the whole thing from an integrator that will build it the way you want it. Or buy it from a VTL vendor (like SEPATON) that will put it all together for you. Their only limitation is that they’re not EMC/IBM/Sun. So what, I say. It’s a freakin’ tape drive, for goodness’ sake. Закупите его и потревожьтесь о вашей следующей проблеме. [Buy it and worry about your next problem.] (How’s my Russian?)

  6. True, there are work-arounds for almost anything. However, I believe in automation and removal of the human factor as much as possible. If you have thousands of tapes in each silo, multiple silos and the killer combo of inept operators and mechanically sensitive media and drives (say, 9940, sorry STK) then the rate of failures for media and drives is far higher than once every few months (with us it was several times a week), and it all becomes a huge pain in the ass. Though with more reliable media, vastly smaller numbers and more careful tape-jockeys I acknowledge it may not be a huge deal.

    And sometimes the labels are an utter bitch to remove, and peeling them off screws them up. Then you have to print new ones, and so on. Hardly a no-brainer.

    Matching the barcodes is not a problem, but in shops where you ingest thousands of new tapes on a regular basis, it becomes yet another step you absolutely must not forget.

    So you’re saying I can have something like 50x 4Gbit links to tape alone (in addition to the host interfaces) from a VTL if I roll my own? I know Falconstor is working on N+1 clustering but it’s not out yet, is it?

    And yes, if you install Falconstor on the box of your choice you can add as many HBAs as the box will take, but at some point the buses crap out and you get interrupt issues, even with fairly high-end boxes. Then you add another box and cluster, but what if you need even more speed? Just buy more I guess, but then you have to set them up as separate libraries, they’re not truly a cluster any more, etc.

    I’m aware of how big most of the various VTLs can get; the end size of the VTL was never the point of our discussion (though more is better). The funny thing is that we really haven’t disagreed on anything so far, yet here we are bickering 🙂

    “What we have here is a failure to communicate”

    Dude I LOVE VTL in general, I’m utterly sold on it and have been for years. I don’t question performance, utility or scalability. My only beef is the copy to tape mechanism, due to all the issues I’ve mentioned so far.

    BTW I thought Sepaton didn’t have a path to tape? You can have up to 16 FC ports to the hosts though, which is potentially very decent throughput assuming the box can keep up.

    Yes, most larger shops will go with someone like EMC, Sun etc. Is it the best idea? Technically, hell no. If you know how to roll your own you can invariably do a much better job for much less money. But politics come into it, the vendors make you offers you can’t refuse, and companies such as one of the biggest technology vendors out there (won’t name names since they WILL hire hitmen) have such connections as to make the US government make prime ministers of certain countries call the president of their national bank and make them order from said vendor, even if the product has been demonstrated not to work and currently rots away, purchased but uninstalled, in warehouses. Now, that’s power.

    BTW, I’m not Russian, I’m Greek. But what does it say, “buy it and it will fix your problems”?

    D

  7. Greetings,
    IMHO the performance achieved by the integrated VTL approach cannot be discounted. A standalone implementation would do very well if you are trying to move to disk-only backups in the near future (it doesn’t matter in this case) or if you archive only a few tapes a week/month.

    But if you do archive many tapes, you will feel the performance bottleneck. Having a separate backup server for the writes to physical tape is a good idea for a standalone solution, but then it needs additional resources.

    Keeping integrated VTLs in mind can be a good idea. Sooner or later you will have solutions, not workarounds, for the problems mentioned. Also, I believe the issues with integrated VTLs are addressed in the whitepapers at http://www.scache.com

    Regards,
