Should EMC move to more multi-functional devices?

Here’s the deal: EMC has a lot of cool stuff. Lots of it came through acquisitions. Lots of it runs on the x86 platform, believe it or not.

At the moment one needs to buy multiple boxes from EMC to do NAS, SAN, archiving, etc.

Imagine if, instead, you got generic boxes (a few models, priced according to their horsepower).

In each box you could run a CLARiiON, a Centera, a Celerra, a print server, WAAS (even though it’s Cisco’s, it’s really a Linux box), something like RecoverPoint, and so on.

All the products could be custom Virtual Machine Appliances, possibly running on a modified ESX platform (so you can’t just run them anywhere). You’d get all the benefits of cool technologies such as VMotion and HA. You could easily add to it.

This doesn’t preclude the use of specialized hardware to accelerate certain functions, though in this age of quad-core CPUs even that may be unnecessary.

Think about it. EMC owns the IP for all that technology.

They don’t need to make less money – if anything, since all the platforms would be virtual, production would be greatly streamlined. They could even have a single type of box (say a quad-socket, quad-core machine with tons of RAM and expansion capability) as the hardware. You need more speed for NAS? Add an extra box, an extra license, and load-balance a new virtual data mover.

This, of course, is unattainable at the moment – I don’t think VMware can provide such low latency and high throughput, but maybe I’m wrong.

Such a move won’t fix the proliferation of management interfaces, but EMC could build a common interface.



Do you need a VTL or not?

I first posted this as a comment elsewhere, but this is its rightful place.

Having deployed what was, at the time, the largest VTL in the world, and subsequently numerous other VTL and ATA solutions, I think I can offer a somewhat different perspective:

It depends on the number of data movers you have and how much manual work you’re prepared to do. Oh, and speed.

Licensing for VTL is now capacity-based for most packages (at least the famous/infamous/important ones like CommVault, Networker and NetBackup, not respectively).

Also, I’d forget about using VTL features such as replication, and about using the VTL to write directly to tape (unless you have no other choice, or the backup software is running ON the VTL, as is the case now with EMC’s CDL). Just use the VTL like tape. I’ve been so vehement about this that even the very stubborn and opinionated Curtis Preston is now afraid to say otherwise with me in the room… (I shut him up REALLY effectively during one Veritas Vision session we were co-presenting a couple of years ago. I like Curtis but he’s too far removed from the real world. Great presenter, though, and funny).

Even dedup features are suspect in my opinion, since they rely on hashes and searches of databases of hashes, which get progressively slower the more you store in them. Most companies selling dedup (Data Domain and Avamar, to name a couple of major names) are sorta cagey when you confront them with questions such as “I have 5 servers with 50 million files each, how well will this thing work?”
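To make the objection concrete, here’s a toy sketch of how hash-based dedup works (a hypothetical illustration, not any vendor’s actual design): every unique chunk adds an entry to an index, and that index must be consulted on every single write.

```python
import hashlib

class DedupStore:
    """Toy hash-based dedup store: chunks indexed by SHA-1.

    Real products keep the index on disk; the principle is the
    same -- the index grows with every unique chunk, and every
    incoming chunk triggers a lookup against it.
    """
    def __init__(self, chunk_size=8192):
        self.chunk_size = chunk_size
        self.index = {}  # hash -> chunk (in real life: an on-disk database)

    def write(self, data):
        refs = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            h = hashlib.sha1(chunk).hexdigest()
            if h not in self.index:   # index lookup on EVERY chunk
                self.index[h] = chunk
            refs.append(h)
        return refs

store = DedupStore()
a = store.write(b"x" * 32768)   # four identical 8K chunks
b = store.write(b"x" * 32768)   # the same data again
print(len(store.index))         # 1 -- only one unique chunk stored
```

Back-of-envelope: 5 servers with 50 million files each is 250 million objects, so even at one chunk per file the index holds hundreds of millions of entries – and each of them has to be searchable on the write path.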

The answer is: it won’t, even with far fewer files. Just get a raw-based backup method that also indexes files, such as NetWorker’s SnapImage or NBU’s FlashBackup.

Dedup also fails with very large files such as database files.
I can expand on any of the above comments if anyone cares.

But back on the data movers (Media Agents, Storage Nodes, Media Servers):

Whether you use VTL or ATA, you effectively need to divvy up the available space.

With ATA, you either allocate a fixed amount of space to each data mover, or use a cluster filesystem (such as ADIC’s StorNext) to allow all data movers to see the same disk.

With VTL, the smallest quantum of space you can allocate to a data mover is, simply, a virtual tape. A virtual tape, just like a real tape, gets automatically allocated as needed.
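The allocation model can be sketched in a few lines (a hypothetical illustration, not any vendor’s actual logic): the VTL holds one shared pool, and each data mover simply pulls the next free virtual tape when it needs one – no per-mover carve-up.

```python
class VirtualTapeLibrary:
    """Toy VTL: one shared pool of fixed-size virtual tapes,
    handed out to data movers on demand."""
    def __init__(self, pool_tb, tape_gb=400):
        self.free = int(pool_tb * 1024 // tape_gb)   # tapes left in pool
        self.allocated = {}                          # mover -> tape count

    def next_tape(self, mover):
        if self.free == 0:
            raise RuntimeError("pool exhausted")
        self.free -= 1
        self.allocated[mover] = self.allocated.get(mover, 0) + 1
        return f"{mover}-VT{self.allocated[mover]:05d}"

vtl = VirtualTapeLibrary(pool_tb=64)      # 64TB shared by everyone
print(vtl.next_tape("mover01"))           # mover01-VT00001
print(vtl.next_tape("mover02"))           # mover02-VT00001
```

A busy data mover consumes more tapes from the pool, an idle one consumes none, and nobody is stuck with a fixed slice of the disk.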

So, imagine you have a large datacenter, with maybe 40 data movers and multiple backup masters.

Imagine you have a 64TB ATA array.

You can either:

1. Split the array into 40 chunks, and have a management nightmare
2. Deploy StorNext so all servers see a SINGLE 64TB filesystem (at an extra $3-4K per server, plus probably $50K more for maintenance, central servers and failover) – easy to deal with once running, but complex to deploy, and it’s more software on your boxes
3. Deploy VTL and be done with it.

For such a large environment, option #3 is the best choice, hands down.
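The arithmetic behind that conclusion, using the numbers above (the $3,500 per-server figure is a hypothetical midpoint of the $3-4K range quoted):

```python
pool_tb, movers = 64, 40

# Option 1: fixed split -- each mover is stuck with its own small slice,
# and you now have 40 separate allocations to watch and rebalance.
slice_tb = pool_tb / movers
print(slice_tb)                       # 1.6 TB per mover

# Option 2: cluster filesystem -- one namespace, but real money.
per_server, extras = 3_500, 50_000    # assumed midpoint + quoted extras
print(movers * per_server + extras)   # 190000

# Option 3: VTL -- the whole 64TB is one pool; space moves between
# data movers a virtual tape at a time, with nothing to carve up.
```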

With filesystems, you have to worry about space, fragmentation, mount options, filesystem creation-time tunables, runtime tunables, esoteric kernel tunings, fancy disk layouts, and so on. If you’re weird like me and thoroughly enjoy such things, then go for it. As time goes by though, the novelty factor diminishes greatly. Been there, done that, smashed some speed records on the way.

What’s needed in the larger shops, aside from performance, is scalability, ease of use and deployment, and simplicity.

With VTL, you get all of that.

The other issue with disk is that backup vendors, while they’re getting better, impose restrictions on the # streams in/out, copy to tape and so on. No such restrictions on tape.

One issue with VTL: depending on your backup software, setting up all those new virtual drives etc. can be a pain (especially on NBU).

And for a small shop (fewer than two data movers), a VTL is probably overkill.