Is convenience devaluing products? Does quality suffer because of it?

It's been a long hiatus since my last post (far too busy working on cool stuff), and if you're looking for a deep technical article this may not be it… but here goes anyway, since the subject also applies to my more usual topics.

Recently I decided to discard my Luddite membership card and join the hordes of people using network-based services for music.

The experiment is ongoing – I do like the convenience of being able to select almost any song or album for a monthly cost lower than the price of a single album.

It’s a pretty good deal if you listen to a lot of new music, and/or you don’t like listening to ads on the radio.

There’s a plethora of free offerings, but if you’re mobile and want to use the service on your phone, there’s usually a cost involved for the convenience of selecting the exact songs you like.

How convenience has affected me

I did notice several interesting aspects in which this newfound convenience has changed my listening habits in a positive way:

  1. I am discovering a lot more new music since it’s so incredibly easy to do so. And some old music I never gave a chance to.
  2. Sharing music with other people is easy and involves no illegal copying of data.
  3. I don’t have to worry about putting the “right” music in my portable device – I can stream what I want, from wherever I want, even on devices I don’t own, and even designate items for “offline use” – meaning they’ll be cached and playable even if I’m not connected to a network.
  4. I have easy access to most of my oldie favorites that I might not normally keep on my device for space reasons.
  5. The quality is very good. But we won’t go into psychoacoustics here :)

However, there have also been some pretty negative aspects to all this convenience… for instance:

  1. I realize I now suffer from music ADD – I seldom just sit down and listen to a whole album like we all used to do in the olden days.
  2. Albums now have zero monetary value in my mind – they’re just part of the low monthly fee.
  3. If albums have a perceived zero monetary value, they become a commodity and not something to be treasured. I remember when it was a huge deal to get a new album from my favorite artists: the anticipation, the excitement, the trip to the record store, waiting in line, scarcity, the artwork in the packaging, the sheer physicality of it all. This combination of attributes ensured I would at least give that album a chance – indeed, I was likely to listen to it repeatedly, analyze it and appreciate the artistry involved. I was invested.
  4. As a result of this devaluing, amazing works of art that were extremely difficult to accomplish may now be skipped altogether because they’re a bit time-consuming or even difficult to “get into” – some concept albums you just need to be in the right frame of mind for and/or have the requisite amount of time to listen to the story unfold. Since there’s no perceived investment and no excitement, I’m less likely to spend the energy trying to get into the album, no matter how rewarding it may be in the end.
  5. For something more practical: the toll on a mobile device’s battery is 2-3x that of just playing music natively (even without streaming – the tracks are encrypted so you can’t just lift them from storage, which adds CPU cycles to decrypt, plus some products use codecs more computationally intensive than MP3). Best have a device with a fast CPU.
  6. An extended unplanned network or music provider outage will mean no access to music.

How this applies to other aspects of our lives

I wonder now what other conveniences have affected our lives significantly?

And are we all looking for that quick fix, the easy way out?

Are we heading towards the world depicted in the movie Idiocracy? (very interesting flick – it’s worth watching for the premise alone).

Already, most of us in the more “civilized” parts of the globe don’t know how to hunt down and skin an animal, build a weapon, start a fire or build a shelter. That is knowledge convenience robbed us of many years ago. You can study how to do those things but, chances are, if you ever need to do them, you won’t have the training to be anywhere near as successful as our ancestors were in those endeavors.

The same goes for taking pictures – aside from a few people who still develop and print their own film, most of us use digital (with the same deluge-of-information problem described in the music section above: thousands of pictures may now be taken during a vacation, where previously no more than a hundred would be, shot with tremendous love and care – and most of that hundred were keepers).

Many of us are getting heavier, too – convenient access to food and low levels of physical activity (since locomotion is so convenient) being the killer combination.

Does quality suffer because of convenience?

Conveniences aren’t a bad thing overall – I am not hankering for the destruction of all things convenient. However, I posit that certain aspects of quality absolutely suffer because of convenience:

  1. Consumers are more likely to pick an easier-to-use, throwaway, even short-sighted product over a better-engineered, longer-lasting one – shifting the engineering emphasis to ease of use and disposability.
  2. The quality of workers in many fields isn’t what it used to be.
  3. We are heading towards more generalists and fewer specialists.
  4. Troubleshooting is becoming a lost art.

I’m not sure how to even conclude – I’m probably part of the problem since one of the things I do is help make very advanced technology easier to consume and more forgiving.

Just don’t get too comfortable.

[Image: toilet chair]

Thx

D

EMC’s VNX2 forklift: The importance of hardware re-use, slowing down obsolescence, and maximizing your investment

It was with interest that I watched the launch of EMC’s VNX refresh. The updated boxes got some long-awaited features, EMC talked a lot about how some pretty severe single-threaded bottlenecks were removed, more CPU and memory was put in, and there was much rejoicing.

Not really trying to pick on the new boxes (that will be in a future post, relax :) ), but what I thought was interesting was that the code that makes most of the new features a possibility cannot be loaded on current-gen VNX boxes (not even the biggest, the 7500, which has plenty of CPU and RAM juice even compared to the next gen boxes).

Software-defined storage indeed.

The existing VNX disk shelves also seemingly can’t be re-used at the moment (correct me if I’m wrong please).

This forced obsolescence has been a theme with EMC: Clariion -> VNX -> VNX2 have all been complete forklift upgrades. When the original VNX was released, it was utterly incompatible with the CX (Clariion) shelves (SAS replaced FC connectivity) – despite running the exact same code (FLARE).

Other vendors are guilty of this too – a new controller is released, and all prior investments on disk shelves are rendered useless (HDS did this with several iterations of AMS, maybe even with AMS -> HUS, HP with EVA…)

I understand that as technology progresses we sometimes have to abandon the old to make room for the new but, at the same time, customers make significant investments in N-1 technology – and often want to be able to re-use some of their investment with N (and sometimes N+1).

I just had this conversation with a customer, and he said “well, I throw away my gear every 3 years, why should I care?”

Let’s try a thought experiment.

Imagine you just bought a system that’s running the fastest controllers a company sells today, and you got 1PB of storage behind it.

Now, imagine that a mere month after you purchased your controllers, new ones are released that are significantly faster. OK, that stuff happens.

Your gear is not 3 years old yet. It’s 1 month old. It’s running OK.

Now, imagine that your array runs out of steam 6 months later due to unprecedented performance growth. Your system is now 7 months old. You can’t just throw it away and start fresh.

You could buy a new storage system and migrate some of the data to share the load. However, you don’t need more space – you just ran out of controller headroom. Indeed, you still have tons of free space.

But what if you could replace the controllers with the new, beefier ones? And maintain your investment in the 1PB of storage, cache etc? Wouldn’t that be nice?

Or at least be able to move some of the storage pools you may have to the new family of controllers? Even if you had to reformat the disk?

Well – most vendors will tell you “sorry, no, you need to migrate your data to the new box”.

Let’s try another thought experiment.

You bought a storage system a year ago. It performs fine, but it lacks true deduplication capabilities. You have determined it would save you a lot of storage space (= money) if your array had deduplication.

The vendor you purchased the system from announces the refreshed storage OS that finally includes deduplication. And that same vendor made a truly gigantic fuss about software-defined storage, which made everyone feel software would be the big enabler and that coolness was a mere firmware upgrade away.

However, you are eventually told they will not allow the code that enables deduplication to be loaded to your array, and, instead, they ask you to migrate to the refreshed array that supports deduplication. Since the updated code somehow only runs on the new box. Something to do with unicorn milk.

But your array has plenty of CPU headroom to handle deduplication… and you could reformat the disks given some swing storage shelves if the underlying disk format is the issue. But the option is not provided.

How NetApp does things instead

At NetApp we sort of take this for granted, so we don’t make a big fuss about software-defined storage – but hardware has always been considered an enabler for the software, not the other way around.

For instance: deduplication was released as a free software upgrade for Data ONTAP (the OS for our main line of storage). Back in 2007. For all storage protocols.

In general, we try to let systems load at least N+1 software releases, but most of the time we utterly spoil customers and go far above and beyond – unless we’re talking about the smallest boxes, which naturally have less headroom.

For example, the now aging FAS 3070 I have in the local lab (the bigger of the older midrange boxes, released in 2006) supports anything from ONTAP 7.2.1 (what it was released with) to ONTAP 8.1.3 (released in mid-2013).

This spans multiple major ONTAP releases – huge changes in the code have happened during those releases: 7.3, 8.0, 8.1… Multiple newer arrays were also released as replacements for the 3070: 3170, 3270, 3250.

Yet the 3070 soldiers on with a fully supported, modern OS, 7 years later.

What arrays did our competitors have back then? What is the most modern OS those same arrays can run today? What is that OS missing vs. the OS that the same competitor’s more modern arrays run?

Let’s talk disk shelves.

We used FC loop connectivity for the older shelves (DS14). We then switched to fancy multi-channel SAS and totally different shelves and disks, but never stopped supporting the older shelf technology, even with newer controllers.

That’s the big one. I have customers with DS14 shelves that they purchased for a 3070 that they now have on a 3270 running 8.2. It all works, all supported. Other vendors cut support off after the transition from FC to SAS.

Will we support those older shelves forever? No, that’s impossible, but at least we give our customers a lot of leeway and let them stretch their hardware investments significantly longer than any other major storage vendor I can think of.

Think long term

I encourage customers to always think long term. Try to stop thinking in 3-year increments. Start thinking of other ways you can stretch your investment, other ways to deploy older gear while still keeping it interoperable with newer hardware.

And start thinking about what will happen to your investment once newer gear is released.

D


How to decipher EMC’s new VNX pre-announcement and look behind the marketing.

It was with interest that I watched some of EMC’s announcements during EMC World. Partly due to competitor awareness, and partly due to being an irrepressible nerd, hoping for something really cool.

BTW: Thanks to Mark Kulacz for assisting with the proof points. Mark, as much as it pains me to admit so, is quite possibly an even bigger nerd than I am.

So… EMC did deliver something. A demo of the possible successor to VNX (VNX2?), unavailable as of this writing (indeed, a lot of fuss was made about it being lab only etc).

One of the things they showed was increased performance vs their current top-of-the-line VNX7500.

The aim of this article is to show that the increases are not proportionally as large as EMC claims, and/or that they’re not so much because of software – and, moreover, that some planned obsolescence might be coming the way of the VNX for no good reason. Aside from making EMC more money, that is.

A lot of hoopla was made about software being the key driver behind all the performance increases, and how they are now able to use all CPU cores, whereas in the past they couldn’t. Software this, software that. It was the theme of the party.

OK – I’ll buy that. Multi-core enhancements are a common thing in IT-land. Parallelization is key.

So, they showed this interesting chart (hopefully they won’t mind me posting this – it was snagged from their public video):

[Chart: per-core utilization – current VNX (left) vs. the upcoming box (right)]

I added the arrows for clarification.

Notice that the above-left chart shows the current VNX using, according to EMC, maybe a total of 2.5 out of the 6 cores if you stack everything up (for instance, Core 0 is maxed out, Core 1 is 50% busy, Cores 2-4 do little, Core 5 does almost nothing). This is important and we’ll come back to it. But, if true, this currently shows extremely poor multi-core utilization. It seems processes are dedicated to specific cores – Core 0 does RAID only, for example. Maybe a way to lower context switches?

Then they mentioned how the new box has 16 cores per controller (the current VNX7500 has 6 cores per controller).

OK, great so far.

Then they mentioned how, By The Holy Power Of Software,  they can now utilize all cores on the upcoming 16-core box equally (chart above, right).

Then, comes the interesting part. They did an IOmeter test for the new box only.

They mentioned how the current VNX7500 would max out at 170,000 8K random reads from SSD (this is, in itself, a nice nugget when dealing with EMC reps claiming insane VNX7500 IOPS), and that the current model’s relative lack of performance is due to the fact its software can’t take advantage of all the cores.

Then they showed the experimental box doing over 5x that I/O. Which is impressive indeed, even though an all-read IOmeter run is hardly a realistic way to prove performance – but I accept that they were trying to show how much more read-only speed they could get out of the extra cores, plus it makes for a cooler marketing number.

Writes are a whole separate wrinkle for arrays, of course. Then there are all the other ways VNX performance goes down dramatically.

However, all this leaves us with a few big questions:

  1. If this is really all about just optimized software for the VNX, will it also be available for the VNX7500?
  2. Why not show the new software on the VNX7500 as well? After all, it would probably increase performance by over 2x, since it would now be able to use all the cores equally. Of course, that would not make for good marketing. But if with just a software upgrade a VNX7500 could go 2x faster, wouldn’t that decisively prove EMC’s “software is king” story? Why pass up the opportunity to show this?
  3. So, if, with the new software the VNX7500 could do, say, 400,000 read IOPS in that same test, the difference between new and old isn’t as dramatic as EMC claims… right? :)
  4. But, if core utilization on the VNX7500 is not as bad as EMC claims in the chart (why even bother with the extra 2 cores on a VNX7500 vs a VNX5700 if that were the case), then the new speed improvements are mostly due to just a lot of extra hardware. Which, again, goes against the “software” theme!
  5. Why do EMC customers also need XtremeIO if the new VNX is that fast? What about VMAX? :)

Point #4 above is important. For instance, EMC has been touting multi-core enhancements for years now. The current VNX FLARE release has 50% better core efficiency than the one before, supposedly. And, before that, in 2008, multi-core was advertised as getting 2x the performance vs the software before that. However, the chart above shows extremely poor core efficiency. So which is it? 

Or is it maybe that the box demonstrated is getting most of its speed increase not so much by the magic of better software, but mostly by vastly faster hardware – the fastest Intel CPUs (more clockspeed, not just more cores, plus more efficient instruction processing), latest chipset, faster memory, faster SSDs, faster buses, etc etc. A potential 3-5x faster box by hardware alone.
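Here’s the back-of-envelope math I’m doing, as a deliberately crude sketch (my assumptions, using only the figures quoted above – it ignores everything except core counts, so treat it as directional only):

```python
# Crude per-core arithmetic using only the figures quoted above.
# Assumptions (mine): 170,000 8K random read IOPS on the current VNX7500,
# ~2.5 of its 6 cores effectively busy per EMC's chart, and "over 5x"
# that I/O on the 16-core demo box with all cores busy.

current_iops = 170_000
current_busy_cores = 2.5
new_iops = 5 * current_iops
new_cores = 16

per_core_current = current_iops / current_busy_cores   # ~68,000 IOPS per busy core
per_core_new = new_iops / new_cores                     # ~53,000 IOPS per core

# What the old box might do if the new software simply used all 6 cores
# as effectively as it uses 2.5 today:
projected_vnx7500 = per_core_current * 6                # ~408,000 IOPS

print(f"Per busy core, current VNX7500: ~{per_core_current:,.0f} IOPS")
print(f"Per core, demo box:             ~{per_core_new:,.0f} IOPS")
print(f"VNX7500 with full core usage:   ~{projected_vnx7500:,.0f} IOPS")
```

In this toy model the demo box actually delivers fewer IOPS per core than the current one – which is exactly why I suspect the 5x headline owes at least as much to the extra (and much faster) hardware as it does to the software.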

It doesn’t quite add up as being a software “win” here.

However – I (or at least current VNX customers) probably care more about #1. Since it’s all about the software, after all :)

If the new software helps so much, will they make it available for the existing VNX? Seems like any of the current boxes would benefit since many of their cores are doing nothing according to EMC. A free performance upgrade!

However… If they don’t make it available, then the only rational explanation is that they want to force people into the new hardware – yet another forklift upgrade (CX->VNX->”new box”).

Or maybe there’s some very specific hardware that makes the new performance levels possible. Which, as mentioned before, kinda destroys the “software magic” story.

If it’s all about “Software Defined Storage”, why is the software so locked to the hardware?

All I know is that I have an ancient NetApp FAS3070 in the lab. The box was released ages ago (2006 vintage), and yet it’s running the most current GA ONTAP code. That’s going back 3-4 generations of boxes, and it launched with software that was very, very different to what’s available today. Sometimes I think we spoil our customers.

Can a CX3-80 (the beefiest of the CX3 line, similar vintage to the NetApp FAS3070) take the latest code shown at EMC World? Can it even take the code currently GA for VNX? Can it even take the code available for CX4? Can a CX4-960 (again, the beefiest CX4 model) take the latest code for the shipping VNX? I could keep going. But all this paints a rather depressing picture of how far EMC hardware investments can be stretched.

But dealing with hardware obsolescence is a very cool story for another day.

D

 


More EMC VNX caveats

Lately, when competing with VNX, I see EMC using several points to prove they’re superior (or at least not deficient).

I’d already written this article a while back, and today I want to explore a few aspects in more depth since my BS pain threshold is getting pretty low. The topics discussed:

  1. VNX space efficiency
  2. LUNs can be served by either controller for “load balancing”
  3. Claims that autotiering helps most workloads
  4. Claims that storage pools are easier
  5. Thin provisioning performance (this one’s interesting)
  6. The new VNX snapshots

References to actual EMC documentation will be used. Otherwise I’d also be no better than a marketing droid.

VNX space efficiency

EMC likes claiming they don’t suffer from the 10% space “tax” NetApp has. Yet the best practices linked here show that, in an autotiered pool, at least 10% free space should be kept available per tier for autotiering to be able to do its thing (which makes sense).

Then there’s also a 3GB minimum overhead per LUN, plus metadata overhead, calculated with a formula in the linked article. Plus possibly more metadata overhead if they manage to put dedupe in the code.

My point is: There’s no free lunch. If you want certain pool-related features, there is a price to pay. Otherwise, keep using the old-fashioned RAID groups that don’t offer any of the new features but at least offer predictable performance and capacity utilization.
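To put rough numbers on it, here’s a hedged sketch using only the figures mentioned above (the 10% per-tier free-space recommendation and the 3GB minimum per pool LUN); EMC’s per-LUN metadata formula and any future dedupe overhead are deliberately not modeled:

```python
# Hedged, simplified pool-overhead estimate. Only the figures cited above
# are used (10% free per tier, 3GB minimum per pool LUN); the per-LUN
# metadata formula from EMC's article is NOT modeled here.

def usable_pool_capacity_gb(tier_sizes_gb, num_pool_luns,
                            per_tier_reserve=0.10, per_lun_min_gb=3):
    raw = sum(tier_sizes_gb)
    reserve = sum(t * per_tier_reserve for t in tier_sizes_gb)
    lun_overhead = num_pool_luns * per_lun_min_gb
    return raw - reserve - lun_overhead

# Example: a 3-tier pool (SSD / SAS / NL-SAS) carved into 50 LUNs.
# The tier sizes are made up purely for illustration.
tiers = [2_000, 20_000, 60_000]          # GB per tier
usable = usable_pool_capacity_gb(tiers, num_pool_luns=50)
print(f"Raw: {sum(tiers):,} GB, usable after overheads: ~{usable:,.0f} GB")
```

The exact result will vary by configuration, but the point stands: the pool features come with their own space tax.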

LUNs can be served by either controller for “load balancing”

This is a fun one. The claim is that LUN ownership can be instantly switched over from one VNX controller to another in order to load balance their utilization. Well – as always, it depends. It’s also important to note that VNX as of this writing does not do any sort of automatic load balancing of LUN ownership based on load.

  1. If it’s using old-fashioned RAID LUNs: Transferring LUN ownership is indeed doable with no issues. It’s been like that forever.
  2. If the LUN is in a pool – different story. There’s no quick way to shift LUN ownership to another controller without significant performance loss.

There’s copious information here. Long story short: you don’t change LUN ownership with pools – you need to migrate the LUN contents to a new LUN owned by the other controller (you can’t just move the LUN as-is, which creates its own issues); otherwise there’s a performance tax to pay.

Claims that autotiering helps most workloads

Not so FAST. EMC’s own best practice guides are rife with caveats and cautions regarding autotiering. Yet this feature is used as a gigantic differentiator at every sales campaign.

For example, in the very thorough “EMC Scaling performance for Oracle Virtual Machine”, the following graph is shown on page 35:

[Graph from page 35: performance with cache only vs. cache plus SSD tiering]

The arrows were added by me. Notice that most of the performance benefit is provided once cache is appropriately sized. Adding an extra 5 SSDs for VNX tiering provides almost no extra benefit for this database workload.

One wonders how fast it would go if an extra 4 SSDs were added for even more cache instead of going to the tier… :)

Perchance the all-cache line with 8 SSDs would be faster than 4 cache SSDs and 5 tier SSDs, but that would make for some pretty poor autotiering marketing.

Claims that storage pools are easier

The typical VNX pitch to a customer is: Use a single, easy, happy, autotiered pool. Despite what marketing slicks show, unfortunately, complexity is not really reduced with VNX pools – simply because single pools are not recommended for all workloads. Consider this typical VNX deployment scenario, modeled after best practice documents:

  1. RecoverPoint journal LUNs in a separate RAID10 RAID group
  2. SQL log LUNs in a separate RAID10 RAID group
  3. Exchange 2010 log LUNs in a separate RAID10 RAID group
  4. Exchange 2010 data LUNs can be in a pool as long as it has a homogeneous disk type, otherwise use multiple RAID groups
  5. SQL data can be in an autotiered pool
  6. VMs might have to go in a separate pool or maybe share the SQL pool
  7. VDI linked clone repository would probably use SSDs in a separate RAID10 RAID group

OK, great. I understand that all the I/O separation above can be beneficial. However, the selling points of pooling and autotiering are that they should reduce complexity, reduce overall cost, improve performance and improve space efficiency. Clearly, that’s not the case at all in real life. What is the reason all the above can’t be in a single pool, maybe two, and have some sort of array QoS to ensure prioritization?

And what happens to your space efficiency if you over-allocate disks to the old-fashioned RAID groups above? How do you get the space back?

What if you under-allocated? How easy would it be to add a bit more space or performance? (not 2-3x – let’s say you need just 20% more). Can you expand an old-fashioned VNX RAID group by a couple of disks?

And what’s the overall space efficiency now that this kind of elaborate split is necessary? Hmm… ;)

For more detail, check these Exchange and SQL design documents.

Thin provisioning performance

This is just great.

VNX thin provisioning performs very poorly relative to thick provisioning, and even more poorly relative to standard RAID groups. The performance issue makes complete sense given how space is allocated when writing thin on a VNX, with 8KB blocks assigned as space is being used. A nice explanation of how pool space is allocated is here. A VNX writes to pools using 1GB slices. Thick LUNs pre-allocate as many 1GB slices as necessary, which keeps performance acceptable. Thin LUNs obviously don’t pre-allocate space and currently have no way to optimize writes or reads – the result is fragmentation, in addition to the higher CPU, disk and memory overhead of maintaining thin LUNs :)
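To make the mechanics concrete, here’s a toy model of that allocation behavior (my own simplification for illustration only – not EMC’s allocator): a thick LUN reserves its slices up front, while thin LUNs grab slices from the shared pool only as writes arrive, so concurrently written thin LUNs end up with their slices interleaved.

```python
# Toy model of pool slice allocation (1GB slices, as described above).
# Illustration of the concept only, not EMC's actual allocator.

from collections import defaultdict

lun_slices = defaultdict(list)    # LUN name -> pool slice indexes it owns
next_slice = 0

def allocate_slice(lun):
    global next_slice
    lun_slices[lun].append(next_slice)
    next_slice += 1

# Thick LUN: all slices reserved contiguously at creation time.
for _ in range(4):
    allocate_slice("thick_lun")

# Two thin LUNs written to concurrently: slices allocated on demand,
# so each LUN's slices end up scattered across the pool.
for _ in range(4):
    allocate_slice("thin_lun_A")
    allocate_slice("thin_lun_B")

for lun, slices in lun_slices.items():
    print(f"{lun}: pool slices {slices}")
# thick_lun:  [0, 1, 2, 3]        <- contiguous
# thin_lun_A: [4, 6, 8, 10]       <- interleaved with thin_lun_B
# thin_lun_B: [5, 7, 9, 11]
```

On spinning disks that interleaving means extra seeks, which is presumably part of why the best practice below ends up recommending RAID10 for thin pools.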

From the Exchange 2010 design document again, page 23:

[Table from page 23 of the Exchange 2010 design document: thin vs. thick provisioning guidance]

Again, I added the arrows to point out a couple of important things:

  1. Thin provisioning is not recommended for high performance workloads on VNX
  2. Indeed, it’s so slow that you should run your thin pools with RAID10!!!

But wait – thin provisioning is supposed to help me save space, and now I have to run it with RAID10, which chews up more space?

Kind of an oxymoron.

And what if the customer wants the superior reliability of RAID6 for the whole pool? How fast is thin provisioning then?

Oh, and the VNX has no way to fix the fragmentation that’s rampant in its thin LUNs. Short of a migration to another LUN (kind of a theme it seems).

The new VNX snapshots

The VNX has a way to somewhat lower the traditionally extreme impact of FLARE snapshots by switching from COFW (Copy On First Write) to ROFW (Redirect On First Write).

The problem?

The new VNX snapshots need a pool, and need thin LUNs. It makes sense from an engineering standpoint, but…

Those are exactly the 2 VNX features that lower performance.

There are many other issues with the new VNX snapshots, but that’s a story for another day. It’s no wonder EMC pushes RecoverPoint far more than their snaps…

The takeaway

There’s marketing, and then there’s engineering reality.

Since the VNX is able to run both pools and old-fashioned RAID groups, marketing wisely chooses to not be very specific about what works with what.

The reality though is that all the advanced features only work with pools. But those come with significant caveats.

If you’re looking at a VNX – at least make sure you figure out whether the marketed features will be usable for your workload. Ask for a full LUN layout.

And we didn’t even talk about having uniform RAID6 protection in pools, which is yet another story for another day.

D


Are SSDs reliable enough? The importance of extensive testing under adverse conditions.

Recently, some interesting research from Ohio State (see here) was presented at USENIX.

To summarize, they tested 15 SSDs, several of them “Enterprise” grade, and subjected them to various power fault conditions. 

Almost all the drives suffered data loss that should not have occurred, and some were so corrupt as to be rendered utterly unusable (could not even be seen on the bus). It’s worth noting that spinning drives used in enterprise arrays would not have suffered the same way.

It’s not just an issue of whether or not the SSD has some supercapacitors in order to de-stage the built-in RAM contents to flash – a certain very prominent SSD vendor was hit with this issue even though the SSDs in question had the supercapacitors, generous overprovisioning and internal RAID. A firmware issue is suspected and this is not fixed yet.

You might ask, why am I mentioning this?

Several storage systems try to lower SSD costs by using cheap SSDs (often consumer models found in laptops, not even eMLC) and then try to get more longevity out of said SSDs by using clever write techniques to minimize the amount of data written (dedupe, compression), as well as make the most of wear-leveling across the flash chips in the box by writing in flash-friendly ways (more appends, fewer overwrites, moving data around as needed, and more).

However, all those (perfectly valid) techniques have a razor-sharp focus on the fact that cheaper flash has a very limited number of write/erase cycles, but are utterly unrelated to things like massive corruption stemming from weird power failures or firmware bugs (and, after having lived through multiple UPS and generator failures, I don’t accept those as a complete answer, either).

On the other hand, the Tier 1 storage vendors typically do pretty extensive component testing, including various power failure scenarios, from the normal to the very strange. The system has to withstand those, then come up no matter what. Edge cases are tested as a matter of course – a main reason people buy enterprise storage is how edge cases are handled… :)

At NetApp, when we certify SSDs, they go through an extra-rigorous process since we are paranoid and they are still a relatively new technology. We also offer our standard dual-parity RAID, along with multiple ways to safeguard against lost writes, for all media. The last thing one needs is multiple drives failing due to a strange power failure or a firmware bug.

Protection against failures is even more important in storage systems that lack the extra integrity checks NetApp offers. Non-NetApp systems that use SSDs either as their only storage or as part of a pool can suffer catastrophic failures if the integrity of the SSDs is sufficiently compromised: by definition, if part of the pool fails, the whole pool fails – which could mean the entire storage system has to be restored from backup.

For those systems where cheap SSDs are merely used as an acceleration mechanism, catastrophic performance failures are a very real potential outcome. 1000 VDI users calling the helpdesk is not my idea of fun.

Such component behavior is clearly unacceptable.

Proper testing comes with intelligence, talent, but also experience and extensive battle scarring. Back when NetApp was young, we didn’t know the things we know today, and couldn’t handle some of the fault conditions we can handle today. Test harnesses in most Tier 1 vendors become more comprehensive as new problems are discovered, and sometimes the only way to discover the really weird problems is through sheer numbers (selling many millions of units of a certain component provides some pretty solid statistics regarding its reliability and failure modes).

“With age comes wisdom”.

 

D

 


Beware of benchmarking storage that does inline compression

In this post I will examine the effects of benchmarking highly compressible data and why that’s potentially a bad idea.

Compression is not a new storage feature. Of the large storage vendors, at a minimum NetApp, EMC and IBM can do it (depending on the array). <EDIT (thanks to Matt Davis for reminding me): Some arrays also do zero detection and will not write zeroes to disk – think of it as a specialized form of compression that ONLY works on zeroes>

A lot of the newer storage vendors are now touting real-time compression for all data (often used instead of true deduplication – it’s just easier to implement compression).

Nothing wrong with real-time compression. However, and here’s where I have a problem with some of the sales approaches some vendors follow:

Real-time compression can provide grossly unrealistic benchmark results if the benchmarks used are highly compressible!

Compression can indeed provide a performance benefit for various data types (simply because less data has to be read from and written to disk), with the tradeoff being CPU. However, most normal data isn’t composed of all zeroes. Typically, compressing data will provide a decent benefit on average, but usually not a severalfold one.
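A quick way to see how extreme the difference is (a standalone illustration using zlib – not what any particular array does internally, but the principle is the same):

```python
# Compare how well zeroes vs. random data compress.
# zlib here is just a stand-in for whatever inline compression an array uses.
import os
import zlib

size = 4 * 1024 * 1024                      # 4 MiB test buffers
zeroes = bytes(size)                        # the kind of data many benchmarks write
random_data = os.urandom(size)              # far closer to already-compressed real data

for name, buf in (("all zeroes", zeroes), ("random", random_data)):
    compressed = zlib.compress(buf, level=1)
    ratio = len(buf) / len(compressed)
    print(f"{name:>10}: {len(buf):>9} -> {len(compressed):>9} bytes (~{ratio:.0f}:1)")
# Zeroes typically compress hundreds-to-thousands to one;
# random data barely compresses at all (~1:1).
```

An array doing inline compression or zero detection sees exactly this kind of data when a benchmark writes zeroes – hence the inflated numbers.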

So, what will typically happen is, a vendor will drop off one of their storage appliances and provide the prospect with some instructions on how to benchmark it with your garden variety benchmark apps. Nothing crazy.

Here’s the benchmark problem

A lot of the popular benchmarks just write zeroes – which, of course, are extremely easy for compression and zero-detect algorithms to deal with, resulting in extremely high (and unrealistic) benchmark performance.

I wanted to prove this out in an easy way that anyone can replicate with free tools. So I installed Fedora 18 with the btrfs filesystem and ran the bonnie++ benchmark with and without compression. The raw data, with mount options etc., is here; an explanation of the various fields is here. Not everything is accelerated by btrfs compression in the bonnie++ benchmark, but a few things really are (sequential writes, rewrites and reads):

[Chart: bonnie++ results on btrfs, with and without compression]

Notice the gigantic improvement (in write throughput especially) btrfs compression affords with all-zero data.

Now, does anyone think that, in general, the write throughput will be 300MB/s for a decrepit 5400 RPM SATA disk? That would be impossible unless the user is constantly writing all-zero data, at which point the bottlenecks lie elsewhere.

Some easy ways for dealing with the compressible benchmark issue

So what can you do in order to ensure you get a more realistic test for your data? Here are some ideas:

  • Always best is to use your own applications and not benchmarks. This is of course more time-consuming and a bigger commitment. If you can’t do that, then…
  • Create your own test data using, for example, dd and /dev/random as a source in some sort of Unix/Linux variant. Some instructions here. You can even move that data to use with Windows and IOmeter – just generate the random test data in UNIX-land and move the file(s) to Windows. (See the sketch after this list for a scripted alternative.)
  • Another, far more realistic way: use your own data. In IOmeter, you just copy one of your large DB files to iobw.tst and IOmeter will use your own data to test… Just make sure it’s large enough that it doesn’t all fit in the array cache; if it isn’t, you can probably get there by concatenating multiple data files and some random data together.
  • Use a tool that generates incompressible data automatically, like the AS-SSD benchmark (though it doesn’t look like it can deal with multi-TB stuff – but worth a try).
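If you’d rather script the random-data generation than remember the dd incantation, here’s a minimal sketch (Python, so it behaves the same on Windows and UNIX; the iobw.tst name is just the IOmeter convention mentioned above, and the size is an assumption you should adjust to comfortably exceed the cache of the system under test):

```python
# Generate an incompressible test file for benchmarking.
# os.urandom data does not compress, so inline compression / zero detection
# in the array under test gets no unfair advantage.
import os

def make_incompressible_file(path, size_gb, chunk_mb=64):
    chunk = chunk_mb * 1024 * 1024
    remaining = size_gb * 1024 * 1024 * 1024
    with open(path, "wb") as f:
        while remaining > 0:
            f.write(os.urandom(min(chunk, remaining)))
            remaining -= chunk

# Example: a 100GB test file - size it larger than the array cache.
make_incompressible_file("iobw.tst", size_gb=100)
```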

 

In all cases though, be aware of how you are testing. There is no magic :)

D

 


Are some flash storage vendors optimizing too heavily for short-lived NAND flash?

I really resisted using the “flash in the pan” phrase in the title… first, because the term is overused and second, because I don’t believe solid state is of limited value. On the contrary.

However, I am noticing an interesting trend among some newcomers in the array business, desperate to find a flash niche to compete in:

Writing their storage OS around very specific NAND flash technologies. Almost as bad as writing an entire storage OS to support a single hypervisor technology, but that’s a story for another day.

Solid state technology is still too fluid. Unlike spinning disk technology that is overall very reliable and mature and likely won’t see huge advances in the years to come, solid state technology seems to advance almost weekly. New SSD controllers are coming out almost too frequently, and new kinds of solid state storage are either out now (Triple Level Cell, anyone?) or coming in the future (MRAM, ReRAM, FeRAM, PCM, PMC, and probably a lot more that I’m forgetting).

My point is:

How far ahead are certain vendors thinking if they are writing an entire storage OS around the limitations of a class of storage that may look very different in just a year or two?

Some of them go really deep and try to do all kinds of clever optimizations to ensure good wear leveling for the flash chips. Some write their own controller software and use bare NAND flash chips, not even off-the-shelf SSDs. Which is great, but what if you don’t need to do that in two years? Or what if the optimizations need to be drastically different for the new technologies? How long will coding for the new flash technologies take? Or will they be stuck using old technologies? Food for thought.

I guess some of us are in it for the long haul, and some aren’t. “Can’t see the forest for the trees” comes to mind. “Gold rush” also seems relevant.

I strongly believe general-purpose storage OSes need to be flexible enough to be reasonably adaptable to different underlying media. And storage OSes that are specifically designed for solid state storage need to be especially flexible regarding the underlying SSD technology to avoid the problems outlined above, and to avoid the relative lack of reliability of current SSD solutions (another story for another day).

At the moment I don’t see clear winners yet. I see a few great short-term stories, but who has the most flexible architecture to be able to deal with different kinds of technologies for years to come?

D


Filesystem and OS benchmark extravaganza: Software makes a huge difference

I’ve published benchmarks of various OSes and filesystems in the past, but this time I thought I’d try a slightly different approach.

For the regular readers of my articles, I think there is something in here for you. The executive summary is this:

Software is what makes hardware sing. 

All too frequently I see people looking at various systems and focusing on what CPU each box has, how many GHz and cores, and how much memory. They will try to compare systems by putting the specs on a spreadsheet.

OK – let’s see how well that approach works.

For this article I used the NetApp postmark benchmark. You can find the source and various executables here. I’ve been using this version for many years. The way I always run it:

  • set directories 5
  • set number 10000
  • set transactions 20000
  • set read 4096
  • set write 4096
  • set buffering false
  • set size 500 100000
  • run

This way it creates 10,000 files spread across 5 directories, then runs 20,000 transactions against them with 4K reads and writes and no buffered I/O (i.e., it asks the OS to bypass the buffer cache), with file sizes ranging from 500 to 100,000 bytes. Then it deletes everything.

It’s fairly easy to interpret since the workload is pre-determined, and a low total time to complete the workload is indicative of good performance, instead of doing an IOPS/latency measurement.

This is very much a transactional/metadata-heavy way to run the benchmark, and not a throughput one (as you will see from the low MB/s figures). Overall, this is an old benchmark, not multithreaded, and probably not the best way to test stuff any more, but, again, for the purposes of this experiment it’s more than enough to illustrate the point of the article (if you want to see more comprehensive tests, visit phoronix.com).
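For readers who have never seen postmark in action, here’s a rough sketch of the same kind of metadata-heavy create/transact/delete pattern (Python, for illustration only – it is not postmark, and unlike the real run above it doesn’t bypass the buffer cache):

```python
# Rough postmark-style workload sketch: create many small files,
# run random metadata-heavy transactions, then delete everything.
# Illustration only - uses buffered I/O, unlike the real benchmark run.
import os
import random
import time

DIRS, FILES, TRANSACTIONS = 5, 10_000, 20_000
MIN_SIZE, MAX_SIZE, IO_SIZE = 500, 100_000, 4096

random.seed(42)
base = "pm_test"
paths = []
next_id = 0

def create_file():
    global next_id
    p = os.path.join(base, str(next_id % DIRS), f"file_{next_id}")
    next_id += 1
    with open(p, "wb") as f:
        f.write(os.urandom(random.randint(MIN_SIZE, MAX_SIZE)))
    paths.append(p)

start = time.time()
for d in range(DIRS):
    os.makedirs(os.path.join(base, str(d)), exist_ok=True)

for _ in range(FILES):                      # initial file set
    create_file()

for _ in range(TRANSACTIONS):               # random transactions
    op = random.choice(("read", "append", "create", "delete"))
    if op == "read":
        with open(random.choice(paths), "rb") as f:
            f.read(IO_SIZE)
    elif op == "append":
        with open(random.choice(paths), "ab") as f:
            f.write(os.urandom(IO_SIZE))
    elif op == "create":
        create_file()
    elif len(paths) > 1:                    # delete
        os.remove(paths.pop(random.randrange(len(paths))))

for p in paths:                             # final cleanup
    os.remove(p)

print(f"Total time: {time.time() - start:.1f}s")
```

As with the real thing, a low total time is the score – which is what makes it easy to compare OS/filesystem combinations.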

It’s very important to realize one thing before looking at the numbers: this article was NOT written to show which OS or filesystem is fastest overall – just to illustrate the difference an OS and filesystem can make to I/O performance given the exact same base hardware and the same application driving the same workload.

 

Configuration

OSes tested (all 64-bit, latest patches as of the time of writing):

  • Windows Server 2012
  • Ubuntu 12.10 (in Linux Mint guise)
  • Fedora 18
  • Windows 8
  • Windows 7

For Linux, I compiled postmark with the -O3 directive, the deadline scheduler was used, I disabled filesystem barriers and used the following sysctl tweaks:

  • vm.vfs_cache_pressure=50
  • vm.swappiness=20
  • vm.dirty_background_ratio=10
  • vm.dirty_ratio=30

Filesystems tested:

  • NTFS
  • XFS
  • BTRFS
  • EXT4

The hardware is unimportant – the point is that it was exactly the same for all tests and that there was no CPU or memory starvation. There was no SSD involved, BTW, even though some of the results may make it seem like there was.

 

The results

First, transactions per second. That doesn’t include certain operations. Higher is better.

[Chart: postmark transactions per second – higher is better]

 

Second, total time. This includes all transactions, file creation and deletion:

[Chart: postmark total time to complete the workload – lower is better]

 

Finally, the full table that includes the mount parameters and other details:

[Table: full results, including mount parameters and other details]

 

Some interesting discoveries…

…at least in the context of this test – the same findings may or may not apply to other workloads and hardware configurations:

  • Notice how incredibly long Windows 8 took with its default configuration. I noticed the ReadyBoost file being modified while the benchmark was running, which must be a serious bug, since ReadyBoost isn’t normally used for ephemeral stuff (Windows 7 didn’t touch that file while the benchmark was running). If you’re running Windows 8 you probably want to consider disabling the Superfetch service… (I believe it’s disabled by default if Windows detects an SSD).
  • BTRFS performance varies too much between Linux kernels (to be expected from a still-experimental filesystem). With the Ubuntu 12.10 kernel, it looks like it ignored the directive not to use the buffer cache – I didn’t see the drive light blink throughout the benchmark :) Unless its write allocator is as cool as the ONTAP one, that’s kinda suspect behavior, especially when compared to the Fedora BTRFS results, which were closer to what I was expecting.
  • BTRFS does use CPU much more than the other filesystems, with the box I was using it was OK but it’s probably not a good choice for something like an ancient laptop. But I think overall it shows great promise, given the features it can deliver.
  • The Linux NTFS driver did pretty well compared to Windows itself :)
  • Windows Server 2012 performed well, similar to Ubuntu Linux with ext4
  • XFS did better than ext4 overall

 

What did we learn?

We learned that the choice of OS and filesystem is enough to create a huge performance difference given the exact same workload generated by the exact same application compiled from the exact same source code.

When looking at purchasing something, looking at the base hardware specs is not enough.

Don’t focus too much on the underlying hardware unless the hardware you are looking to purchase is running the exact same OS (for example, you are comparing laptops running the same rev of Windows, or phones running the same rev of Android, or enterprise storage running the same OS, etc). You can apply the same logic to some extent to almost anything – for example, car engines. Bigger can mean more powerful but it depends.

And, ultimately, what really matters isn’t the hardware or even the OS it’s running. It’s how fast it will run your stuff in a reliable fashion with the features that will make you more productive.

D


Are you doing a disservice to your company with RFPs?

Whether we like it or not, RFPs (Request For Proposal) are a fact of life for vendors.

It usually works like this: A customer has a legitimate need for something. They decide (for whatever reason) to get bids from different vendors. They then craft an RFP document that is either:

  1. Carefully written, with the best intentions, so that they get the most detailed proposal possible given their requirements, or
  2. Carefully tailored by them and the help of their preferred vendor to box out the other vendors.

Both approaches have merit, even if #2 seems unethical and almost illegal. I understand that some people are just happy with what they have, so they word their document to block anyone from changing their environment, turning the whole RFP process into an exercise in futility. I doubt that whatever I write here will change that kind of mindset.

However – I want to focus more on #1. The carefully written RFP that truly has the best intentions (and maybe some of it will rub off on the #2 “blocking” RFP type folks).

Here’s the major potential problem with the #1 approach:

You don’t know what you don’t know. For example, maybe you are not an expert on how caching works at a very low level, but you are aware of caching and what it does. So – you know that you don’t know about the low-level aspects of caching (or whatever other technology) and word your RFP so that you learn in detail how the various vendors do it.

The reality is – there are things whose existence you can’t even imagine – indeed, most things:

[Diagram: what you know, what you know you don’t know, and what you don’t know you don’t know]

By crafting your RFP around things you are familiar with, you are potentially (and unintentionally) eliminating solutions that may do things that are entirely outside your past experiences.

Back to our caching example – suppose you are familiar with arrays that need a lot of write cache in order to work well for random writes, so you put in your storage RFP requirements about very specific minimum amounts of write cache.

That’s great and absolutely applicable to the vendors that write to disk the way you are familiar with.

But what if someone writes to disk entirely differently than what your experience dictates and doesn’t need large amounts of write cache to do random writes even better than what you’re familiar with? What if they use memory completely differently in general?

Another example where almost everyone gets it wrong is specifying performance requirements. Unless you truly understand the various parameters a storage system needs in order to be properly sized, it’s almost guaranteed the requirements list will be incomplete. For example, specifying IOPS without – at a minimum – an I/O size, a read/write blend, a latency target and a random vs. sequential mix will not be sufficient to size a storage system (there’s a lot more here in case you missed it).

By setting an arbitrary limit to something that doesn’t apply to certain technologies, you are unintentionally creating a Type #2 RFP document – and you are boxing out potentially better solutions, which is ultimately not good for your business. And by not providing enough information, you are unintentionally making it almost impossible for the solution providers to properly craft something for you.

So what to do to avoid these RFP pitfalls?

Craft your RFP by asking questions about solving the business problem, not by trying to specify how the vendor should solve the business problem.

For example: Say something like this about space savings:

“Describe what, if any, technologies exist within the gizmo you’re proposing that will result in the reduction of overall data space consumption. In addition, describe what types of data and what protocols such technologies can work with, when they should be avoided, and what, if any, performance implications exist. Be specific.”

Instead of this:

“We need the gizmo to have deduplication that works this way with this block size, plus compression that uses this algorithm but not that other one”.

Or, say something like this about reliability:

“Describe the technologies employed to provide resiliency of data, including protection from various errors, like lost or misplaced writes”.

Instead of:

“The system needs to have RAID10 disk with battery-backed write cache”.

It’s not easy. Most of us try to solve the problem and have at least some idea of how we think it should be solved. Just try to avoid that instinct while writing the RFP…

And, last but not least:

Get some help for crafting your RFP. We have this website that will even generate one for you. It’s NetApp-created, so take it with a grain of salt, but it was designed so the questions were fair and open-ended and not really vendor-specific. At least go through it and try building an RFP with it. See if it puts in questions you hadn’t thought of asking, and see how things are worded.

And get some help gathering your I/O requirements – most vendors have tools that can help with that. It may mean repeating the process several times, but at least you’ll get to see how thorough each vendor is regarding the performance piece. Beware of the ones that aren’t thorough.

D

So now it is OK to sell systems using “Raw IOPS”???

As the self-proclaimed storage vigilante, I will keep bringing these idiocies up as I come across them.

So, the latest “thing” now is selling systems using “Raw IOPS” numbers.

Simply put, some vendors are selling based on the aggregate IOPS the system will supposedly deliver, derived from per-disk statistics and nothing else.

They are not providing realistic performance estimates for the proposed workload, taking into account the appropriate RAID type, I/O sizes, hot vs. cold data, and what the storage controller overhead will be to do everything. That’s probably too much work.

For example, if one assumes 200 IOPS per disk, and 200 such disks are in the system, this vendor is showing 40,000 “Raw IOPS”.

This is about as useful as shoes on a snake. Probably less.

The reality is that this is the ultimate “it depends” scenario, since the achievable IOPS depend on far more than how many random 4K IOPS a single disk can sustain (just doing RAID6 could result in having to divide the raw IOPS by 6 where random writes are concerned – and that’s just one thing that affects performance, there are tons more!)
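To illustrate, here’s a deliberately simplified, textbook-style model (no cache, no write coalescing, no controller overhead – real systems will differ, so treat it purely as a sketch) of the kind of adjustment the “Raw IOPS” math skips:

```python
# Simplified back-end IOPS model, ignoring cache and controller effects.
# Classic RAID write penalties: RAID10 = 2, RAID5 = 4, RAID6 = 6 back-end
# I/Os per random front-end write.

def effective_frontend_iops(disks, iops_per_disk, read_pct, write_penalty):
    raw = disks * iops_per_disk
    read_fraction = read_pct / 100.0
    write_fraction = 1.0 - read_fraction
    # Each front-end read costs 1 back-end I/O; each write costs write_penalty.
    backend_cost_per_frontend_io = read_fraction + write_fraction * write_penalty
    return raw / backend_cost_per_frontend_io

raw = 200 * 200
print(f"'Raw IOPS' the vendor quotes:     {raw:,.0f}")          # 40,000
print(f"70/30 random read/write, RAID6:   "
      f"{effective_frontend_iops(200, 200, 70, 6):,.0f}")        # ~16,000
print(f"100% random write, RAID6:         "
      f"{effective_frontend_iops(200, 200, 0, 6):,.0f}")         # ~6,700
```

And this still says nothing about I/O size, latency, sequential vs. random mixes or what the controllers themselves can actually push.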

Please refer to prior articles on the subject such as the IOPS/latency primer here and undersizing here. And some RAID goodness here.

If you’re a customer reading this, you have the ultimate power to keep vendors honest. Use it!

D
