Pillar claiming their RAID5 is more reliable than RAID6? Wizardry or fiction?

Competing against Pillar at an account. One of the things they said: That their RAID5 is superior in reliability to RAID6. I wanted to put this on the public domain and, if true, invite Pillar engineers to comment here and explain how it works for all to see. If untrue, again I invite the Pillar engineers to comment and explain why it’s untrue.

The way I see it: very simply, RAID5 is N+1 protection, RAID6 is N+2. Mathematically, RAID5 is about 4,000 times more likely to lose data than a RAID6 group with the same number of data disks. Even RAID10 is about 160 times more likely to lose data than RAID6.

The only downside to RAID6 is performance – if you want the protection of RAID6 but with extremely high performance then look at NetApp, the RAID-DP NetApp employs by default has in many cases better performance than RAID10 even. Oracle has several PB of DB’s running on NetApp RAID-DP. Can’t be all that bad.

See here for some info…


About the Data Domain acquisition – and is EMC really the best place for Data Domain?

Much has already been written about this imminent acquisition of Data Domain by either NetApp or EMC and, since opinions are like you-know-what, and I have one, here it is… if I ramble, forgive me. I have too much to say and I’m trying to be PC… I wrote and subsequently erased all kinds of stuff that could probably get me in trouble (the more you work with a company the more dirt you uncover, and I have several earth movers’ worth).

I do think that both companies waited too long to try and acquire Data Domain – frankly, it’s staggering to me that other companies that make decent products like CommVault haven’t been acquired yet (I mean, seriously, if EMC want to compete in the backup software space they should just drop Networker and buy CommVault). Consolidation is the trend…

Maybe both NetApp and EMC thought their in-house deduplication would work out for everything, maybe they thought Data Domain wouldn’t become a contender. Maybe they thought it was just a phase. Either way, the backup market is still strong, most people don’t want to move en masse to something like Avamar, not everyone needs VTL, and Data Domain does provide a very convenient way to keep using your existing backup product, make next to no changes, and get better efficiencies.

The simple truth is that EMC needed SOMETHING to combat Data Domain so they signed the agreement with Quantum and rushed the product to market. And then tried to strong-arm the resellers into forgetting about Data Domain and instead selling the new and amazing DL3D (that backfired BTW).

As far as EMC is concerned, the attempt to acquire Data Domain is a slap in the face for Quantum and all the customers that have been pitched/sold DL3D (the OEM’ed Quantum DXi product). EMC has spent quite a bit of time belittling Data Domain and instead pushing a product that has seen very limited testing (I know, I’ve been burned personally by it several times). A good example: EMC recently released a patch to allow backups done with EMC’s Networker to actually be deduplicated (talk about a reason to return a product if there ever was one – like a car that can’t go faster than 10 mph or that gets 2 mpg instead of 20 mpg). You see, there was an issue with the filter that figures out what backup app you’re using, and Networker backups were getting only plain old compression, NO deduplication. This is no secret, if anyone bothers to read the release notes of the recent patches they’ll see this info. Maybe if you’re a DL3D customer you should insist on reading the release notes if they’re not easily available? After all, you have a right to know what’s changing!

Think about this: EMC’s own backup product was not tested with DL3D. Yet EMC happily sold DL3D to customers with Networker. To me, this is a sales-driven company, not a customer-driven company.

Not to mention other crippling bugs, slow startup times (especially in the case of unclean shutdowns) and the abysmal performance which simply stems from how the product is designed – it’s spindle-happy and needs about 2 trays of drives to work well. Oh, and don’t EVER fill it beyond 80% capacity. You’re also not supposed to use it as a normal CIFS/NFS share for archiving anything like email or normal files (arguably a great place for dedup).

So, EMC knew about the DL3D issues (well, some of them, it’s not their product after all, indeed I helped them identify some of the bugs) and played coy with customers. Then, they saw NetApp making a move for Data Domain and realized that by buying Data Domain EMC could accomplish several things:

  • Minimize NetApp’s cash reserves if NetApp does in the end succeed in acquiring Data Domain (but is that necessarily a bad thing for NetApp?)
  • Remove the flailing DL3D and replace it with a product that actually works and is selling very well
  • Get a bunch of solid deduplication and consistency checking algorithms
  • Assimilate a competitor that’s been a huge thorn on EMC’s side in that space
  • Reduce the efficiency of NetApp as a competitor

But think from the customer standpoint for a minute (most of the analysts so far seem to miss the most important player here – and that’s certainly not EMC, NetApp or Data Domain, but the customer). You’ve been pitched DL3D, and now you must forget about that and all the bad things you were told about Data Domain – it’s all good now that it belongs to EMC, you’ll be taken care of. Or you can buy the DL3D if you still want it (and I don’t see EMC derailing ANY existing DL3D campaign, no matter what).

I were a DL3D prospect/customer, I’d be worried no matter what.

Let’s talk about the best place for Data Domain to end up. As far as investors go of course, if they want to make a quick buck and run, the EMC cash offer is tantalizing. But for Data Domain employees, EMC can be a black hole and the added complexity and bureaucracy anything but fun. EMC has become almost too diversified – let’s look at just some of EMC’s storage solutions (I won’t mention the software since then it’d be a REALLY long and weird post):

  • Symmetrix
  • Clariion
  • Celerra
  • Centera
  • Atmos
  • EDL
  • DL3D
  • RecoverPoint
  • Avamar (that’s both a software solution and an appliance)

What’s interesting is that, by and large, the teams in charge of the above products don’t talk much, if at all, with each other. Talk about islands! And, when it comes to sales, EMC has internally competing groups of people that sell the above products – for instance, “NAS overlay” guys only get paid on Celerra sales, and I’ve seen them screw up campaigns that were clearly a pure Clariion play just so they could somehow get some Celerra in so they get paid. The basic EMC sales guy you meet can sell them all and indeed doesn’t care, but the people he relies on for support cannot sell them all and do care about what gets sold. It’s all very fragmented and, again, not a model that operates with the customer’s best interests always in mind. It always baffled me why EMC would allow so much fluff in their sales organization.

So, if Data Domain got absorbed, they’d probably not be enjoying all the “melting pot” advantages the EMC corporate bloggers seem so keen on advertising, and the “large startup” feel (maybe it’s like that in MA for a few chosen people – in most other locations it’s decidedly not like that). They’d just be another acquired unit, internally competing with other units, dealing with large-company politics and other inefficiencies. The EMC stock wouldn’t really become much higher than it is now, if at all. It’s been about the same for quite some time now.

Let’s examine the scenario of NetApp buying Data Domain:

  • NetApp is much more focused than EMC – indeed they have literally less than a handful of major offerings that don’t really compete with each other
  • The NetApp sales force is unified and doesn’t internally compete about what to sell
  • NetApp culture is much closer to Data Domain culture
  • It’s not good for innovation to have one company hoarding 3 dedup technologies, NetApp + Data Domain will actually push EMC more and be better for the customers
  • Data Domain could make NetApp much stronger against EMC, in turn driving NetApp’s stock price up significantly. Which, in turn, would give investors back much more than $2bn, thereby making this the better deal.

The only drawback I see (as do most writing about this) is NetApp’s relatively poor history in managing the few acquisitions they’ve made. But I believe that as long as they leave Data Domain alone and slowly try to integrate the technology in the other products it will all work out.

Hopefully all this made some sense…


The true XIV fail condition finally revealed (?)

I just got this information:

For XIV to be in jeopardy you need to lose 1 drive from one of the host-facing ingest nodes AND 1 drive from the normal data nodes within a few minutes (so there’s no time to rebuild) while writing to the thing.

Have no way of confirming this but it did come from a reliable source.

A customer recently tried pulling random drives and XIV didn’t shut down and was working fine, but they were from the data nodes.

Why can’t anyone post something concrete here? I’m sure IBM won’t post since the confusion serves them well.

For what it’s worth, the customer is really happy with the simplicity of the XIV GUI.


So what exactly is IBM trying to do with the XIV?

By now most people dealing with storage know that IBM acquired the XIV technology. What IBM is doing now is trying to push the technology to everyone and their dog, for reasons we’ll get into…

I just hope IBM gets their storage act together since now they’re selling products made by 4-5 different vendors, with zero interoperability between them (maybe SVC is the “one ring to rule them all”?)

In a nutshell, the way XIV works is by using normal servers running Linux and the XIV “sauce” and coupling them together via an Ethernet backbone. A few of the nodes get FC cards and can become FC targets. A few more of the features:

  • Thin provisioning
  • Snaps
  • Synchronous (only) replication
  • Easy to use (there’s not much you can do with it)
  • Uses RAID-X (no global spares, merely there’s space on each drive, faster rebuilds are possible)
  • Only mirrored
  • A good amount of total cache per system since each server has several GB of RAM BUT the cache is NOT global (each node simply caches the data for its local disks).

IBM claims insane performance numbers with the XIV (“it will destroy DMX/USP!” — sure). But let’s take a look at how everything looks:

  • 180 maximum (or minimum) drives (you can get a half config but I think you always get the 180 drives but license half, I might be mistaken – I believe you have to make a commitment that you’ll buy the whole thing in 1 year)
  • Normal Linux servers do everything
  • Only SATA
  • The backbone is Ethernet, not FC or Infiniband (much, much higher latency is incurred by Ethernet vs the other technologies)

The way IBM claims they can sustain high speed is to not try and make the SATA drives get bound by their low transactional performance vs 15K FC drives or, even worse, SSDs. From what I understand (and IBM employees feel free to chime in) XIV:

  1. Ingests data using a few of the front-end nodes
  2. Tries to break up the datastream into 1MB chunks
  3. The algorithm tries to pseudo-randomly spread the 1MB chunks and mirror them among the nodes (the simple rule being that a 1MB chunk cannot have a mirror on the same server/shelf of drives!)

Obviously, by doing effectively as much as possible large block writes to the SATA drives and using the cache to great effect, one should be able to see the 180 SATA drives perform pretty much as fast as possible (ideally, the drives should be seeing streaming instead of random data). However (there’s always that little word…)

  1. There is no magic!
  2. If the incoming random IOPS are coming at too great a rate (OLTP scenarios), any cache can get saturated (the writes HAVE to be flushed to disk, I don’t care what array you have!) and it all boils down to the actual number of disks in the box. The box is said to do 20,000 IOPS if that happens – which I think is optimistic at 111 IOPS/drive! At any rate, 20,000 IOPS is less than what even small boxes from EMC or other vendors can do when they run out of cache. Where’s the performance advantage of XIV?
  3. The “randomization removing algorithm”, if indeed there’s such a thing in the box, will have issues with more than 1-2 servers sending it stuff
  4. See #1!

Like with anything, you can only extract so much efficiency out of a given system before it blows up.

An EMC CX4-960 could be configured with 960 drives. Even assuming that not all are used due to spares etc. you are left with a system with over 5 times the number of physical disks vs an XIV, tons more capacity etc. Even if the “magic” of XIV makes it more efficient, are those XIV SATA drives really 5 times more efficient (5 times would make it EQUAL to the 960 performance, XIV would have to be well over 5 times more efficient than an EMC box of equivalent size to beat the 960).

Let’s put it that way:

If my system was as efficient as IBM claims, and I had IBM’s money, it’d buy all the competitive arrays, even at several times the size of my box, and publicize all kinds of benchmarks showing just how cool my box is vs the competition. You just can’t find that info anywhere, though.

Regarding innovation: Other vendors have had similar chunklet wide striping for years now (HP EVA, 3Par, Compellent if I’m not mistaken, maybe more). 3Par for sure does hot sparing similar to an XIV (they reserve space on each drive). 3Par can also grow way bigger than XIV (over 1,000 drives).

So, if I want a box with thin provisioning, wide striping, sparing like XIV but the ability to choose among different drive types, why not just get a 3Par? What is the compelling value of XIV, short of being able to push 180 SATA drives well? Nobody has been able to answer this.

I’m just trying to understand XIV’s value prop since:

  1. It’s not faster unless you compare it to poorly architected configs
  2. It has less than 50% efficiency at best, so it’s not good for bulk storage
  3. It’s not cheap from what I’ve seen
  4. Burns a ton of power
  5. Cannot scale AT ALL
  6. Cannot tier within the box (NO drive choices besides 1TB SATA)
  7. Cannot replicate asynchronously
  8. Has no application integration
  9. No Quality of Service performance guarantees
  10. No ability to granularly configure it
  11. Is highly immature technology with a small handful of reference customers and a tiny number of installs! (I guess everyone has to start somewhere but do YOU want to be the guinea pig?)

Unless your needs are exactly what XIV provides, why would you ever buy one? Even if your space/performance needs are in the XIV neighborhood there are other far more capable storage systems out there for less money!

IBM is not stupid, or at least I hope not. So, what IBM is doing is pretty much handing out XIVs to whoever will take one. If you get one, think of yourself as a beta tester. Because I hardly believe that IBM bought the XIV IP without seeing some kind of roadmap, otherwise the purchase would be kinda stupid! If you are a beta-tester, be aware that:

  • XIV cheats with benchmarks that write zeros to the disk or read from not previously-accessed addresses
  • XIV will be super-fast with 1-2 hosts pushing it, push it realistically with a real number of hosts
  • Try to load up the box since if it’s not full enough you’ll get an extremely skewed view of performance – put even dummy data inside but fill it to 80% and then run benchmarks!
  • Test with your applications, not artificial benchmarks
  • Do not accept the box in your datacenter before you see a quote! In at least 3 cases that I know of IBM drops off the box without giving you even a ballpark figure. I think that’s insane.

And last, but not least: I keep hearing and reading about the following being true, I’d love IBM engineers to disprove it:

If you remove 2-3 drives from different trays simultaneously from a loaded system then you will suffer a catastrophic failure (logically makes sense looking at how the chunks get allocated but I’d love to know how it works in real life). And before someone tells me that this never happens in real life, It’s personally happened to me at least once (lost 2 drives in rapid succession) and many other people I know that have any serious real-world experience…


Postmark on late 2008 Macbook Pro

So I’m now the proud owner of a tricked-out 2.8GHz MBP.

Naturally I’ve been tinkering with it already (only had it for 2 days). I’ve disabled swapfile encryption, for instance, and I think it makes it have teh snappy.

I compiled postmark for it with -O3 -m64 and ran the usual. Before doing so though I did disable the Spotlight indexer like this:

sudo launchctl unload /System/Library/LaunchDaemons/com.apple.metadata.mds.plist

PostMark v1.5 : 3/27/01
pm>set number 10000
pm>set transactions 20000
pm>set subdirectories 5
pm>set size 500 100000
pm>set read 4096
pm>set write 4096

273 seconds total
256 seconds of transactions (78 per second)

20092 created (73 per second)
Creation alone: 10000 files (833 per second)
Mixed with transactions: 10092 files (39 per second)
9935 read (38 per second)
10064 appended (39 per second)
20092 deleted (73 per second)
Deletion alone: 10184 files (2036 per second)
Mixed with transactions: 9908 files (38 per second)

548.25 megabytes read (2.01 megabytes per second)
1158.00 megabytes written (4.24 megabytes per second)

I then enabled spotlight and re-ran the benchmark:

483 seconds total
468 seconds of transactions (42 per second)

20092 created (41 per second)
Creation alone: 10000 files (909 per second)
Mixed with transactions: 10092 files (21 per second)
9935 read (21 per second)
10064 appended (21 per second)
20092 deleted (41 per second)
Deletion alone: 10184 files (2546 per second)
Mixed with transactions: 9908 files (21 per second)

548.25 megabytes read (1.14 megabytes per second)
1158.00 megabytes written (2.40 megabytes per second)

Obviously spotlight is very aggressive in its indexing and tries to do it ASAP – you lose half your performance when doing metadata-intensive processing. The results though, while sucky for the specs of the box, are far, far removed (and much better than) what an old colleague got on his beastie: http://recoverymonkey.net/wordpress/?p=62 – granted, my box is faster but it shouldn’t be THAT much faster.

I   urge my newfound Mac brethren to help out in determining the cause.

More benchmarks to follow.