Progress Needs Trailblazers

I got the idea for this stream-of-consciousness (and possibly too obvious) post after reading some comments regarding new ways to do high speed I/O. Not something boring like faster media or protocols, but rather far more exotic approaches that require drastically rewriting applications to get the maximum benefits of radically different architectures.

The comments, in essence, stated that such advancements would be an utter waste of time since 99% of customers have absolutely no need for such exotica, but rather just need cheap, easy, reliable stuff, and can barely afford to buy infrastructure to begin with, let alone custom-code applications to get every last microsecond of latency out of gear.

Why Trailblazers Matter

It’s an interesting balancing game: There’s a group of people that want progress and innovation at all costs, and another group of people that think the status quo is OK, or maybe just “give me cheaper/easier/more reliable and I’m happy”.

Both groups are needed, like Yin and Yang.

The risk-averse are needed because without them it would all devolve into chaos.

But without people willing to take risks and do seemingly crazy and expensive things, progress would simply be much harder.

Imagine if the people that prevailed were the ones that thought trepanning and the balancing of humours was state-of-the-art medicine and needed no further advancements. Perchance the invention of MRI machines might have taken longer?


What if traditional planes were deemed sufficient, would a daring endeavor like the SR-71 have been undertaken?


Advanced Tech Filters Down to the Masses

It’s interesting how the crazy, expensive and “unnecessary” eventually manages to become part of everyday life. Think of such commonplace technologies like:

  • ABS brakes
  • SSDs
  • 4K TVs
  • High speed networking
  • Smartphones
  • Virtualization
  • Having more than 640KB of RAM in a PC

They all initially cost a ton of money and/or were impractical for everyday use (the first ABS brakes were in planes!)

What is Impractical or Expensive Now Will be Normal Tomorrow…

…but not every new tech will make it. Which is perfectly normal.

  • Rewriting apps to take advantage of things like containers, microservices and fast object storage? It will slowly happen. Most customers can simply wait for the app vendors to do it.
  • In-memory databases? Not a niche any more, when even the ubiquitous SQL Server is doing it…
  • Using advanced and crazy fast byte-addressable NVDIMM in storage solutions? Some mainstream vendors like Nimble and Microsoft are already doing it. Others are working on some really interesting implementations.
  • AI, predictive analytics, machine learning? Already happening and providing huge value (for example Nimble’s InfoSight).

Don’t Belittle the Risk Takers!

Even if you think the crazy stuff some people are working on isn’t a fit for you, don’t belittle it. It can be challenging to hold your anger, I understand, especially when the people working on crazy stuff are pushing their viewpoint as if it will change the world.

Because you never know, it just might.

If you don’t agree – get out of the way, or, even better, help find a better way.

Just Because it’s New to You, Doesn’t Mean it’s Really New, or Risky

Beware of turning into an IT Luddite.

Just because you aren’t using a certain technology, it doesn’t make the technology new or risky. If many others are using it, and reaping far more benefits than you with your current tech, then maybe they’re onto something… 🙂

At what point does your resistance to change become a competitive disadvantage?

Tempered by: is the cost of adopting the new technology overshadowed by the benefits?

Stagnation is Death

…or, at a minimum, boring. Τα πάντα ρει, as Heraclitus said. I, for one, am looking forward to the amazing innovations coming our way, and actively participate in trying to make them come to fruition faster.

What will you do?


Are some flash storage vendors optimizing too heavily for short-lived NAND flash?

I really resisted using the “flash in the pan” phrase in the title… first, because the term is overused and second, because I don’t believe solid state is of limited value. On the contrary.

However, I am noticing an interesting trend among some newcomers in the array business, desperate to find a flash niche to compete in:

Writing their storage OS around very specific NAND flash technologies. Almost as bad as writing an entire storage OS to support a single hypervisor technology, but that’s a story for another day.

Solid state technology is still too fluid. Unlike spinning disk technology that is overall very reliable and mature and likely won’t see huge advances in the years to come, solid state technology seems to advance almost weekly. New SSD controllers are coming out almost too frequently, and new kinds of solid state storage are either out now (Triple Level Cell, anyone?) or coming in the future (MRAM, ReRAM, FeRAM, PCM, PMC, and probably a lot more that I’m forgetting).

My point is:

How far ahead are certain vendors thinking if they are writing an entire storage OS around the limitations of a class of storage that may look very different in just a year or two?

Some of them go really deep and try to do all kinds of clever optimizations to ensure good wear leveling for the flash chips. Some write their own controller software and use bare NAND flash chips, not even off-the-shelf SSDs. Which is great, but what if you don’t need to do that in two years? Or what if the optimizations need to be drastically different for the new technologies? How long will coding for the new flash technologies take? Or will they be stuck using old technologies? Food for thought.

I guess some of us are in it for the long haul, and some aren’t. “Can’t see the forest for the trees” comes to mind. “Gold rush” also seems relevant.

I strongly believe general-purpose storage OSes need to be flexible enough to be reasonably adaptable to different underlying media. And storage OSes that are specifically designed for solid state storage need to be especially flexible regarding the underlying SSD technology to avoid the problems outlined above, and to avoid the relative lack of reliability of current SSD solutions (another story for another day).

At the moment I don’t see clear winners yet. I see a few great short-term stories, but who has the most flexible architecture to be able to deal with different kinds of technologies for years to come?


Technorati Tags: , ,

Has NetApp sold more flash than any other enterprise disk vendor?

NetApp has been selling our custom cache boards with flash chips for a while now. We have sold over 3PB of usable cache this way.

The question was raised in public forums such as Twitter – someone mentioned that this figure may be more usable Solid State storage than all other enterprise disk vendors have sold combined (whether it’s used for caching or normal storage – I know we have greatly outsold anyone else that does it for caching alone 🙂 ).

I don’t know if it is, maybe the boys from the other vendors can chime in on this and tell us, after RAID, how much usable SSD they’ve sold, but the facts remain:

  • NetApp has demonstrated thought leadership in pioneering the pervasive use of Megacaches
  • The market has widely adopted the NetApp Flash Cache technology (I’d say 3PB of usable cache is pretty wide adoption)
  • The performance benefits in the real world are great, due to the extra-granular nature of the cache (4KB blocks vs 64+ KB for others) and extremely intelligent caching algorithms
  • The cost of entry is extremely reasonable
  • It’s a very easy way to add extra performance without forcing data into faster tiers.

Comments welcome…


Technorati Tags: , , , , , , , , ,

A look at EMC’s FASTv2, FAST Cache and FLARE30 – EMC giveth, EMC taketh away

[Update: some grammar mistakes fixed and a few questions added]

Before anyone starts frothing at the mouth, notice that in the categories this post is part of FUD 🙂 Always do your own analysis… I just wanted to give people some food for thought, like I did when FASTv1 came out. I didn’t make this up, it’s all based on various EMC documents available online. I advise people looking at this technology to ask for extensive documentation regarding best practices before taking the leap.

As a past longtime user and sometimes pusher of EMC gear, some of the enhancements in FASTv2 seemed pretty cool to me, and potentially worrisome from a competitive standpoint. So I decided to do some reading to see how cool the new technology really is.

Summary of the new features:

  • Large heterogeneous pools (a single pool can consist of different drive types and encompass all drives in a box minus the vault and spares)
  • FASTv2 – sub-LUN movement of hot or cold chunks of data for auto-tiering between drive types
  • FAST Cache – add plain SSDs as cache
  • Much-touted feature: ability to use SSD as a write cache
  • LUN compression
  • Thin LUN space reclamation

It all sounds so good and, if functional, could bring Clariions to parity with some of the more advanced storage arrays out there. However, some examination of the features reveals a few things (I’m sure my readers will correct any errors). In no particular order:

EMC now uses a filesystem

It finally had to happen, thin LUN pools at the very least live on a filesystem laid on top of normal RAID groups (and I suspect all new pools on both Symm and CX now live on a filesystem). So is it real FC or some hokey emulation? Not that it matters if it provides useful functionality impossible to achieve otherwise, it’s just an about-face. But how mature is this new filesystem? Does it automatically defragment itself or at least provide tools for manual defragmentation? Filesystem design is not trivial.

LUN compression

  1. Best practices indicate compression should not be used for anything I/O intensive and is best suited for static workloads (i.e. not good for VMs or DBs). However, new data is compressed as a post-process, which theoretically doesn’t penalize new writes – which I find interesting. Also: What happens with overwrites? Do compressed blocks that need to be overwritten get uncompressed and re-laid down uncompressed until the next compression cycle? Do the blocks to be overwritten get overwritten in their original place or someplace new? What happens with fragmentation? It all sounds so familiar 🙂
  2. The read performance hit is reported to be about 25% – makes sense since the CPU has to work harder to uncompress the data.
  3. Turning on compression for an existing traditional LUN means the LUN will need to be migrated to a thin LUN in a pool (not converted, migrated – indeed, you need to select where the new LUN will go). Not an in-place operation, it seems.
  4. Does data need to be migrated to a lower tier in order to be compressed?
  5. It follows you need enough space for the conversion to take place… (can you do more than one in parallel? If so, quite a bit of extra space will be needed).
  6. How does this work with external replication engines like RecoverPoint? Does data need to be uncompressed? (probably counts as a normal “read” operation which will uncompress the data).
  7. Does this kind of compression mess with alignment of, say, VMs? This could have catastrophic consequences regarding performance of such workloads…

Thin LUN space reclamation

  1. Another case where migration from thick to thin takes place (doesn’t seem like the LUN is converted in-place to thin)
  2. Unclear whether an already thin LUN that has temporarily ballooned in size can have its space reclaimed (NetApp and a few other arrays can actually do this). You see, LUNs don’t only grow in size… several operations (i.e. MS Exchange checking) can cause a LUN to temporarily expand in space consumption, then go back down to its original size. Thin provisioning is only truly useful if it can help the LUN remain thin 🙂

Dual-drive ownership, especially when it pertains to pool LUNs

Dual-drive ownership is not strictly a new feature, but best practices is for a single CX controller (SP) to own a drive, and not have it shared. Furthermore, with pool LUNs, if you change the controller ownership of a pool LUN, I/O will see much higher latencies – it’s recommended to do a migration to a new LUN controlled by the other SP (yet another scenario that needs migration). I’m mentioning this since EMC likes to make a big deal about how both controllers can use all drives at the same time… obviously this is not nearly as clean as it’s made to appear. The Symmetrix does it properly.

Metadata used per thin LUN

3GB is the minimum space a thin LUN will occupy due to metadata and other structures. Indeed, LUN space is whatever you allocate plus another 3GB. Depending on how many LUNs you want to create, this can add up, especially if you need many small LUNs.

Loss of performance with thin LUNs and pools in general

It’s not recommended to use pools and especially thin LUNs for performance-sensitive apps, and in general old-style LUNs are recommended for the highest performance, not pools. Which is interesting, since most of the new features need pools in order to work… I heard 30% losses if thin LUNs are used in particular, but that’s unconfirmed. I’m sure someone from EMC can chime in.

Expansion, RAID and scalability caveats with pools

  1. To maintain performance, you need to expand the pool by adding as many drives as the pool already has – I suspect this has something to do with the way data is striped. This could cause issues as the system gets larger (who will really expand a CX4-960 by 180 drives a pop? Because best practices state that’s what you have to do if you start with 180 drives in the pool) .
  2. Another thing that’s extremely unclear is how data is load-balanced among the pool drives. Most storage vendors are extremely open about such things. All I could tell is that there are maximum increments at which you can add drives to a pool, ranging from 40 on a CX4-120 to 180 on a CX4-960. Since a pool can theoretically encompass all drives aside from vault and spares, does this mean that striping happens in groups of 180 in a CX4-960 and if you add another 180 that’s another stripe and the stripes concatenated?
  3. What if you don’t add drives by the maximum increment, and you only add them, say, 30 at a time? What do you give up, if anything?
  4. RAID6 is recommended for large pools (which makes total sense since it’s by far the most reliable RAID at the moment when many drives are concerned). However, RAID6 on EMC gear has a serious write performance penalty. Catch-22?

FAST Cache (includes being able to cache writes)

  1. Cache only on or off for the entire pool, can’t tune it per LUN (can only be turned on/off per LUN if old-style RAID groups and LUNs are used).
  2. 64KB block size (which means that a hot 4K block will still take 64K in cache – somewhat inefficient).
  3. A block will only be cached if it’s hit more than twice. Is that really optimal for the best hit rate? Can it respond quickly to a rapidly changing working set?
  4. Unclear set associativity (important for cache efficiency).
  5. No option to automatically optimize for sequential read after random write workloads (many DB workloads are like that).
  6. Flash drives aren’t that fast for writes as confirmed by EMC’s Barry Burke (the Storage Anarchist) in his comment here and by Randy Loeschner here. Is the write benefit really that significant? Maybe for Clariions with SATA, possibly due to the heavy RAID write penalties, especially with RAID6.
  7. It follows that highly localized overwrites could be significantly optimized since the Clariion RAID suffers a great performance degradation with overwrites, especially with RAID6 (something other vendors neatly sidestep).
  8. EMC Clariions don’t do deduplication so the cache isn’t deduplicated itself, but is it at least aware of compression? Or do blocks have to be uncompressed in cache? Either way, it’s a lot less efficient than NetApp Flash Cache for environments where there’s a lot of block duplication.
  9. The use of standard SSDs versus a custom cache board is a mixed blessing – by definition, there will be more latency. At the speeds these devices are going, those latencies add up (since it’s added latency per operation, and you’re doing way more than one operation). All high-end arrays add cache in system boards, not with drives…
  10. Smaller Clariions have severely limited numbers of flash drives that can be used for caching (2-8 depending on the model, with the smaller ones only able to use very small cache drives). Only the CX4-960 can do 20 mirrored cache drives, which I predict will provide good performance even for fairly heavy write workloads. However, that will come at a steep price. The idea behind caches like NetApp’s Flash Cache is to reduce costs

For a very detailed discussion regarding megacaches in general read here.

I can see FAST Cache helping significantly on a system with lots of SATA in a well-configured CX4-960. And I can definitely see it helping with heavy read workloads that have good locality of reference, since SSDs are very good for reads.

And finally, the pièce de résistance,


This is EMC’s sub-LUN auto-tiering feature. Meaning that a LUN is chopped up into 1GB chunks, and that the 1GB chunks move to slower or faster disks depending on how heavily accessed they are. The idea being that, after a little while, you will achieve steady state and end up with the most appropriate data on the most appropriate drives.

Other vendors (most notably Compellent and now also 3Par, IBM and HDS) have some form of that feature (Compellent pioneered this technology and has the smallest possible chunks I believe at 512KB).

The issues I can see with the CX approach of FASTv2:

  1. Gigantic 1GB slice to be moved. EMC admits this was due to the Clariion not being fast enough to deal with the increased metadata of many smaller slices (the far more capable Symmetrix can do 768KB per slice, offering far more granularity). It follows that the bigger the slice the less optimal the results are from an efficiency standpoint.
  2. All RAID groups within the pool have to be of the same RAID type (i.e, RAID6). So you can’t have, say, SATA as RAID6 and SSD as RAID5 in the same pool. Important since RAID6 on most arrays has a big performance impact.
  3. Unknown performance impact for keeping track of the slices (possibly the same as using thin provisioning – 30% or so?)
  4. The most important problem in my opinion: Too much data can end up in expensive drives. For instance, imagine a 1TB DB LUN. That LUN will be sliced into 1,000x 1GB chunks. Unless the hotspots of the DB are extremely localized, even if a few hundred blocks are busy per slice, that entire slice will get migrated to SSD the next day (it’s a scheduled move). Now imagine if, say, half the slices have blocks that are deemed busy enough – half the LUN (512GB in this example) will be migrated to SSD, even if the hot data in those slices were more like 5GB (say a 10% working set size, quite typical). Clearly, this is not the most effective use of fast disks. EMC has hand-waved this objection away in the past, but if it’s not important, why does the Symmetrix go with the smaller slice?
  5. Extremely slow transactional performance for the data that has been migrated to SATA, especially with RAID6 – EMC says you need to pair this with FAST Cache, which makes sense… Of course, come next day that data will move to SSD or FC drives, but will that be fast enough? Policies will have to be edited and maintained per application (often removing the auto-tiering by locking an app at a tier), which removes much of the automation on offer.
  6. The migration is I/O intensive, and we’re talking about migrations of 1GB slices (on a large array, many thousands of them). What does that mean for the back-end? After all, once a day all the migrations need to be processed… and will need to contend with normal I/O activity.
  7. Doesn’t support compressed blocks, data needs to be uncompressed in order to be moved.
  8. I still think this technology is most applicable to fairly predictable, steady workloads with good locality of reference.

Messaging inconsistencies

As I’ve mentioned before, I don’t have an issue with EMC’s technology, merely with the manner in which the capabilities and restrictions are messaged (or not, as the case may be). For instance, I’ve seen marketing announcements and blog entries talking about doing VMware on thin LUNs with compression etc. – sure, that could be space-efficient, but will it also be fast?

Now that the limitations of the new features are more understood, EMC’s marketing message loses some of its punch.

  • Will compression really work with busy VMware or DB setups?
  • Will thin LUNs be OK for busy systems?
  • Unless 20 disks are used for FAST Cache (only with a CX4-960), is the performance really enough to accelerate highly random writes on large systems?
  • What is the performance impact of thin LUNs for highly-intensive workloads?
  • What is the performance of a large system all running RAID6?
  • Last but not least – does the filesystem EMC uses allow defragmentation? By definition, features such as thin provisioning, compression and FAST will create fragmentation.

Moreover – what all the messaging lacks is some comparison to other people’s technology. Showing a video booting 1000 VMs in 50 minutes where before it took 100 is cool until you realize others do it in 12.

And why is EMC (I’m picking on them since they’re the most culpable in this aspect) ridiculing technologies such as NetApp’s Flash Cache and Compellent’s Data Motion only to end up implementing similar technologies and presenting things to the world as if they are unique in figuring this out? “You see, none of the other guys did it right, now that we did it it’s safe”.

Too many of the new features are extremely obscure in their design, if storage professionals can’t easily figure them out, how is the average consumer expected to? I think more openness is in order, otherwise it just looks like you have something to hide.

Ultimately – the devil is in the details, so why would you have to choose between space OR performance, and not be able to optimize space utilization AND performance?

I think it has to do with the original design of your storage system. Not all systems lend themselves to advanced features because of the way they started.

But that’s a subject for another day.


EMC’s incredible marketing and the FAST fairy tale (and a bit on how to reduce tiers)

I’m in MN prepping to teach a course (my signature anti-FUD extravaganza), and thought I’d get a few things off my chest that I’ve been meaning to write about for a while. Some Stravinsky to provide the vibes and I’m good to go. It’s getting really late BTW and I’m sure this will progressively get less coherent as time goes by, but I like to write my posts in one shot…

I never cease to be amazed by what’s possible with the power of great marketing/propaganda. And EMC is a company that has some of the best marketing anywhere. Other companies should take note!

Think about it: Especially on the CX, they took an auto-tiering implementation as baked as wheat that hasn’t been planted yet, and managed to create so much noise and excitement around it that many people think EMC actually invented the concept and, heavens, some even believe that the existing implementation is actually decent. Worse still, some have actually purchased it. Kudos to EMC. With the exception of some of Microsoft’s work, nobody reputable has the stones any more to release, amidst such fanfare, a product this unpolished. Talk about selling futures…

Perception is reality.

I’m an engineer by training and by trade first and foremost, and, regardless of bias, I consider the existing FAST implementation an affront. Allow me to explain, gentle reader…

The tiering concept

Some background info is in order. Most arrays of any decent size and complexity sold nowadays are configured with different kinds of disk, purely out of cost considerations. For instance, there may be 30 really fast drives where a bunch of important low-latency DBs live, another 100 pretty fast drives where most VMs and Exchange live, then 200 SATA drives for bulk storage and backups.

Don’t kid yourself: If the customer buying the aforementioned array had enough dough, they’d be getting the wunderbox with all super-fast drives inside – all the exact same kind of drives. That’s just simpler to deal with from a management standpoint and obviously the performance is stellar. Remember this point since we’ll get back to it…

Of course, not everyone is made of money, so arrays that look like the 3-tier example above are extremely common. Just enough drives of each type are purchased in order to achieve the end result.

What typically ends up happening is that, over time, some pieces of data end up in the wrong tier, for one reason or another. Maybe a DB that was super-important once now only needs to be accessed once a year; or a DB that was on SATA now has become the most frequently-accessed piece of data in the array. Or, perhaps, the importance of a DB flip-flops during a month, so it only needs to be fast maybe for month-end-processing. So now, you need to move stuff around so that what needs to be fast is shifted to the fast drives.

Pressure points and the need for passing the hot potato

But wait, there’s more…

The entire performance problem is created in the first place due to most array architectures being older than mud. In legacy array architectures, LUNs are carved out of RAID groups, typically made of relatively few disks. So, in an EMC Clariion, it’s best practices to have a 5-disk RAID5 group. You then ideally split up that group into no more than 2 LUNs and assign one to each controller.

With disks getting bigger and bigger, creating 1-2 LUNs can become exceedingly difficult – a 5-disk R5 group made with 450GB drives in a Clariion offers a bit over 1.5TB of space, which is too much for many application needs – maybe you just need 50GB here, another 300GB there… in the end, you may have 10 LUNs in that RAID group that’s supposed to have no more than 2. The new 600GB FC drives make this even worse.

So, in summary, what ends up happening is that you split up that RAID group into too many LUNs in order to avoid waste. And that’s where your array develops a serious pressure problem.

You see, now you may have 10 different servers hitting the exact same RAID group, creating undue pressure on the 5 poor disks struggling to cope with the crazy load. I/O service times get too high, queue lengths get crazy, users get cranky.

Again – this whole problem exists exactly because legacy array architectures don’t automatically balance I/O among all drives.

But for those afflicted Paleolithic systems, wouldn’t it be nice if we could move some of those hot LUNs, non-disruptively, to other RAID groups that don’t suffer from high pressure?

That’s what EMC’s FAST for the Symmetrix and CX does. It attempts to move entire LUNs to faster tiers like SSD. Which, BTW, is something you can do manually, but FAST attempts to automate the task (kinda, depends, etc).

The current FAST pitfalls

Let’s examine first how FAST (Fully Automated Storage Tiering) is implemented. Since it’s really 3 utterly different solutions, depending on whether you have Symm, CX or NS:

On the Symmetrix it’s always been there in the form of Symmetrix Optimizer, which may not have been aware of tiers but it definitely knew about migrating to less busy disks. Now you can teach it about tiers, too. But it’s not, in my mind, a new product, even if EMC would like you to believe it is. It looks to me too much like Optimizer + some new heuristics. But the Gods of Marketing managed to create unbelievable commotion about something that was an old feature. What amazes me is that nobody seems to have made the connection – maybe I’m really missing something. I’m sure someone from EMC will correct me if I’m wrong. In my experience, Optimizer, when purchased, often did more harm than good, was difficult to manage and, ultimately, was left inactive in many shops – with the beancounters lamenting the spending of precious funds on something that never quite worked that well. Oh, and it seems the current version doesn’t support thin LUNs. But of the FAST implementations on EMC gear it is the more complete version, exactly because Optimizer has been there for a long time…

On the far more popular CX platform, what happens is like a tribute to kludges everywhere. Consider this:

  1. Movement is one-way only (FC to SATA, or FC to SSD). More of a one-shot tool than continuous optimization!
  2. You need a separate PC that will crunch Navisphere Analyzer performance logs, this takes a while
  3. The PC will then provide a list of recommendations
  4. Depending on which LUNs you approve it will invoke a NaviCLI command to move the specified LUNs in the box
  5. Doesn’t support thin provisioning
  6. Not sure if it supports MetaLUNs
  7. It is NOT automatic since you have to approve the move! Ergo, it should not be sold under the name “FAST” since the “A” stands for “Automated”, aren’t there laws for false advertising?

On the Celerra NS platform (EMC’s NAS), one needs to purchase the Rainfinity FMA boxes, which then can move files between tiers of disk based on frequency of access. One is then limited by the scalability of the FMA – how many files can it track? How dynamically can it react to changing workloads? What if the FMA breaks? Why do I need yet more boxes to do this?

Ah, but it gets better with FASTv2! Or does it?

EMC has been upfront that FAST will become way cooler with v2. It better be, since as you can see it’s no great shakes at the moment. From what the various EMC bloggers have been posting, it seems FASTv2 will use the thin provisioning subsystem to go to a sub-LUN level of granularity.

The granularity will obviously depend on how many disks you have in the virtual provisioning pool, since a LUN (just like with MetaLUNs) will be split up so that it occupies all the disks in the pool. The bigger the pool, the better. This should provide better performance (it does with other vendors) yet EMC in their docs state the current version of virtual provisioning (at least on the CX) has higher overhead when compared to their traditional LUNs and will provide less performance. I guess that’s a subject for another day, and maybe they’ll finally revamp the architecture to fix it. Back to FASTv2:

The “busyness” of each LUN segment will be analyzed, and that segment will then move, if applicable, to another tier. Of course, how efficient that will end up being will depend on how you do I/O to the LUN in the first place! If the LUN I/O is fairly spatially uniform, then the whole thing will have to move just like FASTv1. But I guess with v2 there’s at least the potential of sub-LUN migration, for cases where a clearly delineated part of the LUN is really “hot” or “cold”. Obviously, since the chunk size will still be significantly large, expect a bunch of non-applicable data to move with the stuff that should be moved.

The real problem

First, to give credit where it’s due: Compellent already has had sub-LUN moves for a long, long time. Give those guys props. They actually deserve it.

However – both the Compellent approach as well as FASTv2 and, even worse, v1, suffer from this fundamental issue:

Lack of real-time acceleration.

Think about it – performance has to be analyzed periodically, heuristics followed, then LUNs or pieces of LUNs have to be moved around. This is not something that can respond instantly to performance demands.

Consider this scenario:

You have a payroll DB that, during most of the month, does absolutely nothing. A fully automated tiering system will say “hey, nobody has touched this LUN in weeks, I better move it to SATA!”

Then crunch time comes, and the DB is on the SATA drives. Oopsie.

People complain, and the storage admin is forced to manually migrate it back to SSD.

Kinda defeats the whole purpose… unless I’m missing something the size of Titanic.

So, you may have to write all kinds of exception rules (provided the system lets you). Some rules for most DBs, Exchange, a few apps here and there…

Soon, you’re actually in a worse state than where you begun: You have the added complexity and cost of FAST, plus you have to worry about creating exception rules.

Now here’s a novel idea…

What if you actually put your data in the right tier to begin with and what if, even if you didn’t, it didn’t matter too much?

For instance – normal fileshares, deep archives, large media files, backups to disk – most people would agree that those workloads should probably forever be on SATA if you’re trying to save some money. With 2TB drives, the SATA tier has become super-dense, which can be very useful for quite a few use cases.

DBs, VM OS files – should usually be on faster disk. But no need to go nuts with several tiers of fast disk, a single fast tier should be sufficient!

LUNs and other array objects should try to automatically span as many drives as possible by default without you having to tell the array to do that… that way you avoid the hot spots in the first place by design, thereby reducing or even removing the need for migrations (I can still see some very limited cases where migration would be useful).

And finally, large, intelligent cache (as in really large) to help with real-time workload demands, dynamically and as-needed, by caching tiny 4K chunks and not wasting space on gigantic pieces… with the ability to prioritize the caching if needed. Not to mention being deduplication-aware.

Wouldn’t that be a bit simpler to manage, more nimble and more useful in real-world scenarios? The cache will help out even the slower drives for both file and OLTP-type workloads.

Maybe life doesn’t need to be complicated after all.

It’s almost 0300 so I’d better go to bed…