FUD tales from the blogosphere: when vendors attack (and a wee bit on expanding and balancing RAID groups)

Haven’t blogged in a while, way too busy. Against my better judgment, I thought I’d respond to some comments I’ve seen on the blogosphere, adding one of my trademark extremely long titles. Part response, part tutorial. People with no time to read it all: Skip to the end and see if you know the answer to the question or if you have ideas on how to do such a thing.

It’s funny how some vendors won’t hesitate to wholeheartedly agree when some “independent” blogger criticizes their competition (before I get flamed, independent in quotes since, as I discussed before, there ain’t no such thing whether said blogger realizes it or not – being biased is a basic human condition).

The equivalent of someone posting in an Audi forum about excessive brake dust, and having guys from Mercedes and BMW chime in to claim how they “tested” Audis and indeed found issues (but of course!), how their own cars are better now, and how maybe Audi doesn’t have much of a lead any more (if, indeed, it ever did). I think the term for that is “shill”, but I can understand taking every opportunity to harm an opponent.

So the “Storage Architect” posted entries asking about certain features to be implemented on NetApp storage, one of them being the ability to reduce the size of an aggregate. Then everyone and their mum jumped in to complain about how on earth such an important feature isn’t there 🙂 BTW, I’m not saying such a thing wouldn’t be useful to have from time to time. I’ll just try to explain why it’s tricky to implement, and maybe some ways to avoid the problem in the first place.

For the uninitiated, a NetApp aggregate is a collection of RAID-DP RAID groups that are pooled and striped together, so that I/O hits all the drives across all RAID groups equally for performance. You then carve volumes out of that aggregate (the containers for NFS, CIFS, iSCSI and FC).
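In rough ASCII form (just an illustration of the layering, not output from any actual tool), it looks something like this:

    Aggregate (data striped across every RAID group it contains)
      RAID-DP group rg0: data disks + 2 parity disks
      RAID-DP group rg1: data disks + 2 parity disks
      ...
    Volumes (NFS, CIFS, iSCSI, FC) carved out of the aggregate and spread across all of it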

A pretty simple structure, really, but effective. Similar constructs are used by many other storage vendors that allow pooling.

So, the question was, why not be able to make an aggregate smaller? (you can already make it bigger on-the-fly, as well as grow or shrink the existing volumes within).

An HP guy then proceeded to complain about how he put too few drives in an aggregate and ended up with an imbalanced configuration while trying to test a NetApp box.

So, some basics: the following picture shows a well-balanced pool – notice the equal number of drives per RAID group:

The idea being that everything is load-balanced:

Makes sense, right?

You then end up with pieces of data across all disks, which is the intent. Growing it is easy – which is, after all, what 99.99% of customers ever want to do.

However, the HP dude didn’t have enough disks to create a balanced config with the default RAID group size (16 drives), so he ended up with something like this, which is not performance-optimal:
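Concretely, with 20 drives and the default raidsize of 16, the split looks roughly like this (the same 14+2 and 2+2 referred to in the question further down):

    rg0: 16 drives (14 data + 2 parity)
    rg1:  4 drives ( 2 data + 2 parity)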

So what the HP dude wanted to do was reduce the size of the RAID group and remove drives, even though he had originally expanded the aggregate (and, by extension, the RAID group).

Normally, before one starts creating pools of storage (with any storage system), one also knows (or should) what one has to play with in order to get the best overall config. It’s like “I want to build a 12-cylinder car engine, but I only have 9 cylinders”. Well – either buy more cylinders, or build an 8-cylinder engine! Don’t start building the 12-cylinder engine and go “oops” 🙂 This is just Storage 101. Mistakes can and do happen, of course.

So, with the current state of tech, if I only had 20 drives to play with (and no option to get more), assuming no spares, I’d rather do one of the following:

  1. Build an aggregate with 10 + 10 RAID groups inside (two 10-drive groups), or
  2. Use all 20 drives in a single RAID group for max space, or
  3. Ask someone who knows the system better than I do for some advice

This is common sense, and it’s both doable and trivial with a NetApp system: you set the desired RAID group size for that aggregate BEFORE you put in the disks. Not really difficult, and pretty logical.

For instance, running aggr options HPdudeAggr raidsize 10 before adding the drives would have achieved #1 above. The Web GUI exposes the same option when you modify an aggregate. The option exists and it’s well known and documented; not knowing about it is a basic education issue. Arguing that no education should be needed to use a storage device (with an extreme number of features) properly, even for deeply involved, low-level operations, is a romantic notion at best. Maybe some day. We are all working hard to make it a reality. Indeed, a lot of things that used to take a really long time (or still do, with other boxes) have become trivial – look at SnapDrive and the SnapManager products, for instance.
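To make that concrete, here’s a minimal sketch of the 20-drive case from the 7-mode command line (HPdudeAggr is just the example name from above, and exact syntax and defaults can vary between Data ONTAP versions, so treat this as illustrative rather than a runbook):

    aggr create HPdudeAggr -r 10 20       # create with raidsize 10: two 10-drive RAID-DP groups
    aggr status -r HPdudeAggr             # sanity-check the resulting RAID layout

    # or, if the aggregate already exists and is about to be grown:
    aggr options HPdudeAggr raidsize 10   # set the desired RAID group size BEFORE adding disks
    aggr add HPdudeAggr 10                # additions now fill out 10-drive groups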

Back to our example: if, in the future, 10 more disks were purchased, and approach #1 above had been taken, one would simply add the ten disks to the aggregate with aggr add HPdudeAggr 10, resulting in a 10+10+10 config.
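Again as a rough sketch, with the same caveats as above:

    aggr add HPdudeAggr 10        # raidsize is still 10, so the new disks form a third 10-drive group
    aggr status -r HPdudeAggr     # should now show three RAID-DP groups of 10 drives each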

But what if I had done #2 above (make a 20-drive RAID group the default for that aggregate)?

Then, simply, you’d end up imbalanced again, with a 20+10. Some thought is needed before embarking on such journeys.

Maybe a better approach would be to add a more reasonable number of drives to achieve good balance. Adding 12 more drives, for example, would allow for an aggregate with 16+16 drives. So one could simply change the raidsize with aggr options HPdudeAggr raidsize 16, then add the 12 disks to the aggregate with aggr add HPdudeAggr -g all 12.

This would dynamically expand both RAID groups contained within the aggregate to 16 drives each, resulting in a 16+16 configuration. Which, BTW, is not something you can easily do with most other storage systems!
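Putting those two commands together (same caveats as before – a sketch of the idea, not a cut-and-paste procedure):

    aggr options HPdudeAggr raidsize 16   # raise the RAID group size from 10 to 16
    aggr add HPdudeAggr -g all 12         # spread the 12 new disks across the existing groups
    aggr status -r HPdudeAggr             # should end up as two 16-drive RAID-DP groups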

Having said all that, I think that for people who are not storage savvy (or for the storage savvy suffering from temporary brain fog), a good enhancement would be for the interfaces to warn you about imbalanced final configs and show you what will be created in a nice graphical fashion, asking you if you agree (and possibly providing hints on how it could be done better).

I’m not aware of any other storage system that does that degree of handholding but hey, I don’t know everything.

Indeed, maybe the nature of the other posts was bait, so I’ll obligingly take the bait and ask a question of my own so you can advertise your wares here: 🙂

Is anyone aware of a well-featured storage system from an established, viable vendor that currently (Aug 7, 2010, not roadmap or “Real Soon Now”) allows the creation of a wide-striped pool of drives with some RAID structures underneath; then allows one to evacuate and then destroy some of those underlying RAID groups selectively, non-disruptively, without losing data, even though they already contain parts of the stripes; then change the RAID layout to something else using those same existing drives and restripe without requiring some sort of data migration to another pool and without needing to buy more drives? Again, NOT for expansion, but for the shrinking of the pool?

To clarify even further: what the HP guy did was exactly this: he had 20 drives to play with and, by mistake, created a pool with 2 RAID groups, a 14+2 and a 2+2. How would your solution take those 2 RAID groups, with data on them, and change the config to something like 10 + 10 without needing more drives or destroying anything?

Can you dynamically reduce a RAID group? (NetApp can dynamically expand, but not reduce a RAID group).

I’m not implying such a thing doesn’t exist; I’m merely curious. I could see ways to make this work by virtualizing RAID further. Still, it’s just one (small) part of the storage puzzle.

The one without sin may cast the first stone! 🙂

D


6 Replies to “FUD tales from the blogosphere: when vendors attack (and a wee bit on expanding and balancing RAID groups)”

  1. Well-featured storage system? – Yes, Compellent.

    Established? – Yes, Compellent.

    Viable vendor that currently allows the creation of a wide-striped pool of drives with some RAID structures underneath? – Yes, Compellent.

    Allows one to evacuate and then destroy some of those underlying RAID groups selectively, non-disruptively, without losing data, even though they already contain parts of the stripes? – Yes, Compellent. I actually just helped a customer do this very thing last week. Best part was the customer didn’t even know it happened and they now have 10TB more usable space. They switched from a lot of RAID 10 to RAID 5.

    Change the RAID layout to something else using those same existing drives and restripe without requiring some sort of data migration to another pool and without needing to buy more drives? – Yes, Compellent. Again, just did this last week.

    NOT for expansion, but for the shrinking of the pool? – Yes, Compellent. I’ve actually had customers who did it themselves. One removed 64 146GB 10K drives and then replaced them with 48 450GB 15K drives. No outage, in the background over a few days. The empty shelf is sitting there waiting for them to use in the future when they need more capacity and/or performance in their tier 1.

    I’m sure there will be questions…

  2. Thanks for the reply. I edited the question for clarification; here is the bit I added:

    To clarify even further: what the HP guy did was exactly this: he had 20 drives to play with and, by mistake, created a pool with 2 RAID groups, a 14+2 and a 2+2 (data + parity). How would your solution take those 2 RAID groups, with data (assuming they’re not completely full), and change the config on the fly to something like 8+2 and 8+2, or 18+2, without needing more drives or the destruction of anything?

    I understand Compellent (and 3Par, I believe) do RAID at the sub-disk level, so some temporary over-provisioning could do the trick (for instance, instead of allowing 1 RAID chunk per disk, allow 2; then, if there’s enough space, one could move the RAID chunks from the 2+2 group into the 14+2 and change the layout on the fly). Is this possible?

    Thx

    D

  3. You pretty much nailed it. The subject you are hitting on is at the very core of Compellent’s architecture and one of the features that is often not understood and almost always overlooked.

    Since Compellent does RAID by block (or “sub-disk” as you called it) and not by disk like most other systems, a new RAID device can easily be created on the same drives that already have the 14+2 and 2+2. In the Compellent world, the feature of RAID by block is called Dynamic Block Architecture, and it is what most of the other features are built on (e.g. thin provisioning, automated storage tiering and snapshots). Since multiple RAID devices with different stripe widths can place their blocks/pages on the same drive at the same time, two new 8+2 (RAID 6-10 in the Compellent world) RAID devices could be created in the raw space of the drives, and the original 14+2 and 2+2 could then have all of their blocks/pages of data moved in the background without any disruption.

    Hopefully, this answers your question.

  4. Thanks Aaron. But could you modify the EXISTING RAID groups? I.e. Dynamically shrink one and add to the other?

    NetApp lets you expand a RAID group but not shrink it.

    Thx

    D

  5. This ended up longer than I thought it would, but hopefully it answers one or two of your questions along the way.

    Technically, Compellent doesn’t have “RAID groups” like most arrays, but to answer your question: RAID devices, to the best of my knowledge, have fixed stripe widths. That doesn’t restrict Compellent from shrinking and expanding the number of drives, though; it’s just handled differently than how you are probably thinking about it.

    To expand upon my last response… If the system starts with, say, 16 drives, RAID devices would be created so that data is distributed as evenly as possible across all drives using the stripe widths available. Volumes would then place data into multiple RAID devices so that each volume ends up spanning all drives within the system.

    Time moves on and 16 more drives are added. The system would then create new RAID devices with the goal again being that they are distributed as evenly as possible across all drives (old and new). Once a new device is created, the system would then migrate data out of one of the old devices and into a new one. Once the old device is empty it is removed from the system. This is repeated until all of the old devices are removed. This entire process runs in the background using system idle cycles and is commonly referred to as a RAID Rebalance.

    Time moves on again and this time the original drives are to be replaced. For the purpose of this discussion, let’s say the original drives in the system were all 146GB 15K drives and the second set of drives were 450GB 15K drives. Without introducing a new enclosure, the original set of drives (146GB 15K) can be marked as being removed and all data will be evacuated from them using a RAID Rebalance. In that case the system follows the same steps as before, but is back to only having RAID devices that span the 16 450GB 15K drives. Once the RAID Rebalance is complete, the original 146GB 15K drives can be physically removed from the enclosure and the new drives we have can be installed in their place.

    Just to put sizes to this, let’s say the new drives now are 600GB 15K drives. Again, just like the last two cases, a RAID Rebalance is kicked off and the system would create new RAID devices across the now older 450GB 15K drives and the newer 600GB 15K drives.

    In short, the RAID device itself doesn’t change how wide it is; the system just manages how many devices to create in order to span all drives. Just to toss it out there, technically it doesn’t matter if the drives are the same size or speed, but different-speed drives are by default placed into separate tiers. While I haven’t ever replaced faster drives with slower drives, there is no reason this couldn’t be done, but generally everyone wants to go bigger and faster.

    I seriously doubt this will answer all of your questions; if anything, it probably just created several more.

  6. Thanks – so Compellent would just do it with slick migration, not by changing the RAID layouts on the fly. That still works, I guess, if you have enough free disk space; it won’t if you don’t.

    Ultimately, the point is that there’s no magic: regardless of the solution, you do need to know what you’re doing and what the ramifications of your actions are (and how to revert, just in case).

    D
