Tag Archives: NetApp

When competitors try too hard and miss the point

(edit: fixed the images)

After a long hiatus, we return to our regularly scheduled programming with a 2-part series that will address some wild claims Oracle has been making recently.

I’m pleased to introduce Jeffrey Steiner, ex-Oracle employee and all-around DB performance wizard. He helps some of our largest customers with designing high performance solutions for Oracle DBs:

Greetings from a guest-blogger.

I’m one of the original NetApp customers.

I bought my first NetApp in 1995 (I have a 3-digit support case in the system) and it was an F330. I think it came with 512MB SCSI drives, and maxed out at 16GB. It met our performance needs, it was reliable, and it was cost effective.  I continued to buy more over the following years at other employers. We must have been close to the first company to run Oracle databases on NetApp storage. It was late 1999. Again, it met our performance needs, it was reliable, and it was cost effective. My employer immediately prior to joining NetApp was Oracle.

I’m now with NetApp product operations as the principal architect for enterprise solutions, which usually means a big Oracle database is involved, but it can also include DB2, SAS, MongoDB, and others.

I normally ignore competitive blogs, and I had never commented on a blog in my life until I ran into something entitled “Why your NetApp is so slow…” and found this statement:

If an application such MS SQL is writing data in a 64k chunk then before Netapp actually writes it on disk it will have to split it into 16 different 4k writes and 16 different disk IOPS

That’s just openly false. I tried to correct the poster, but was met with nothing but other unsubstantiated claims and insults to the product line. It was clear the blogger wasn’t going to acknowledge their false premise, so I asked Dimitris if I could borrow some time on his blog.

Here’s one of the alleged results of this behavior with ONTAP – the blogger was nice enough to do this calculation for a system reading at 2.6GB/s:

 

erroneous_oracle_calcs


Seriously?

I’m not sure how to interpret this. Are they saying that this alleged horrible, awful design flaw in ONTAP leads to customers buying 50X more drives than required, and our evil sales teams have somehow fooled our customer base into believing this was necessary? Or, is this a claim that ZFS arrays have some kind of amazing ability to use 50X fewer drives?

Given the false premise about ONTAP chopping up any and all IO’s into little 4K blocks and spraying them over the drives, I’m guessing readers are supposed to believe the first interpretation.

Ordinarily, I enjoy this type of marketing. Customers bring this to our attention, and it allows us to explain how things actually work, plus it discredits the account team who provided the information. There was a rep in the UK who used to tell his customers that Oracle had replaced all competing storage arrays in their OnDemand centers with Pillar. I liked it when he said stuff like that. The reason I’m responding is not because I care about the existence of the other blog, but rather that I care about openly false information being spread about how ONTAP works.

How does ONTAP really work?

Some of NetApp’s marketing folks might not like this, but here’s my usual response:

Why does it matter?

It’s an interesting subject, and I’m happy to explain write tetrises, NVMEM write coalescence, and core utilization, but what does that have to do with your business? There was a time we dealt with accusations that NetApp was slow because we had 25 nanometer process CPUs while the state of the art was 17nm or something like that. These days ‘cores’ seems to come up a lot, as if this happens:

logic

That’s the Brawndo approach to storage sales (https://www.youtube.com/watch?v=Tbxq0IDqD04)

“Our storage arrays contain

5 kinds of technology

which make them AWESOME

unlike other storage arrays which are

NOT AWESOME.”

A Better Way

I prefer to promote our products based on real business needs. I phrase this especially bluntly when talking to our sales force:

When you are working with a new enterprise customer, shut up about NetApp for at least the first 45 minutes of the discussion

I say that all the time. Not everyone understands it. If you charge into a situation saying, “NetApp is AWESOME, unlike EMC who is NOT AWESOME” the whole conversation turns into PowerPoint wars, links to silly blog articles like the one that prompted this discussion, and whoever wins the deal will win it based on a combination of luck and speaking ability. Providing value will become secondary.

I’m usually working in engineeringland, but in major deals I get involved directly. Let’s say we have a customer with a database performance issue and they’re looking for new storage. I avoid PowerPoint and usually request Oracle AWR/statspack data. That allows me to size a solution with extreme accuracy. I know exactly what the customer needs, I know their performance bottlenecks, and I know whatever solution I propose will meet their requirements. That reduces risk on both sides. It also reduces costs because I won’t be proposing unnecessary capabilities.

None of this has anything to do with who’s got the better SPC-2 benchmark, unless you plan on buying that exact hardware described, configuring it exactly the same way, and then you somehow make money based on running SPC-2 all day.

Here’s an actual Oracle AWR report from a real customer using NetApp. I have pruned the non-storage related parameters to make it easier to read, and I have anonymized the identifying data. This is a major international insurance company calculating its balance sheet at end-of-month. I know of at least 9 or 10 customers that have similar workloads and configurations.

awr1

Look at the line that says “Physical reads”. That’s the blocks read per second. Now look at “Std Block Size”. That’s the block size. This is 90K physical block reads per second, which is 90K IOPS in a sense. The IO is predominantly db_file_scattered_read, which counter-intuitively is sequential IO. A parameter called db_file_multiblock_read_count is set to 128. This means Oracle is attempting to read 128 blocks at a time, which equates to 1MB block sizes. It’s a sequential IO read of a file.
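To make the arithmetic concrete, here’s a quick back-of-the-envelope sketch of what those AWR figures translate to (the block counts and sizes come from the report above; the result lines up with the ~800MB/sec figure mentioned a bit further down):

```python
# Back-of-the-envelope check of the AWR figures quoted above.
block_size = 8 * 1024          # Oracle "Std Block Size" of 8KB
reads_per_sec = 90_000         # "Physical reads" per second
multiblock_count = 128         # db_file_multiblock_read_count

throughput_mb_s = reads_per_sec * block_size / 1_000_000
max_read_size_kb = multiblock_count * block_size // 1024

print(f"Read throughput: ~{throughput_mb_s:.0f} MB/s")        # ~737 MB/s
print(f"Multiblock read size: {max_read_size_kb} KB (1MB)")   # 1024 KB
```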

Here’s what we got:

1) 89K read “IOPS”, sort of.

2) Those 89K read IOPS are actually packaged as units of 8 blocks in a single 64k unit.

3) 3K write IOPS

4) 8MB/sec of redo logging.

The most important point here is that the customer needed about 800MB/sec of throughput, they liked the cost savings of IP, and the storage system is meeting their needs. They refresh with NetApp on occasion, so obviously they’re happy with the TCO.

To put a final nail in the coffin of the Oracle blogger’s argument, if we are really doing 89K block reads/sec, and those blocks are really chopped up into 4k units, that’s a total of about 180,000 4k IOPS that would need to be serviced at the disk layer, per the blogger’s calculation.

  • Our opposing blogger thinks that would require about 1000 disks in theory
  • This customer is using 132 drives in a real production system.

There’s also a ton of other data on those drives for other workloads. That’s why we have QoS – it allows mixed workloads to play nicely on a single unified system.

To make this even more interesting, the data would have been randomly written in 8k units, yet they are still able to read at 800MB/sec? How is this possible? For one, ONTAP does NOT break up individual IO’s into 4k units. It tries very, very hard to never break up an IO across disks, although that can happen on occasion, notably if you fill your system up to 99% capacity or do something very much against best practices.

The main reason ONTAP can provide good sequential performance with randomly written data is the blocks are organized contiguously on disk. Strictly speaking, there is a sort of ‘fragmentation’ as our competitors like to say, but it’s not like randomly spraying data everywhere. It’s more like large contiguous chunks of data are evenly distributed across the disks. As long as those contiguous segments are sufficiently large, readahead can ensure good throughput.

That’s somewhat of an oversimplification, but it would take a couple of hours and a whiteboard to explain the complete details. 20+ years of engineering can’t exactly be summarized in a couple of paragraphs. The document misrepresented by the original blog was clearly dated 2006 (and that was only a slight refresh of the original posting from the nineties), and while it’s still correct as far as I can see, it lacks information on later enhancements and on how we package data onto disks.

By the way, this database mentioned above? It’s virtualized in VMware too.

Why did I pick an example of only 90K IOPS? My point is that this customer needed 90K IOPS, so they bought 90K IOPS.

If you need this performance:

awr2

then let us know. Not a problem. This is from a large SAP environment for a manufacturing company. It beats me what they’re doing, because this is about 10X more IO than what we typically see for this type of SAP application. Maybe they just built a really, really good network that permits this level of IO performance even though they don’t need it.

In any case, that’s 201,734 blocks/sec using a block size of 8k. That’s about 2GB/sec, and it’s from a dual-controller FAS3220 configuration which is rather old (and was the smallest box in its range when it was new).

Using the bizarro-universe math from the other blog, these 200K IOPS would have been chopped up into 4k blocks and require a total of 400K back-end disk IOPS to service the workload. Divided by 125 IOPS/drive, we have a requirement for 3200 drives. It was ACTUALLY using more like 200 drives.
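Here’s that bizarro-universe arithmetic as a quick sketch, applied to both examples in this post (the 2x multiplier and the 125 IOPS/drive figure are the other blogger’s assumptions, not mine):

```python
# The opposing blog's logic: every 8K block read allegedly becomes two 4K
# disk IOs, and each drive can service ~125 IOPS.

def blogger_math(block_reads_per_sec, iops_per_drive=125):
    four_k_iops = block_reads_per_sec * 2       # 8K "chopped" into 2x 4K
    return four_k_iops, four_k_iops / iops_per_drive

# Insurance customer: ~89-90K block reads/sec, actually served by 132 drives.
# (The blogger's own estimate was "about 1000 disks", presumably assuming a
# higher per-drive IOPS figure.)
print(blogger_math(90_000))      # (180000, 1440.0)

# SAP customer: ~200K block reads/sec, actually served by ~200 drives.
print(blogger_math(200_000))     # (400000, 3200.0)
```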

We can do a lot more, especially with the newer platforms and ONTAP clustering, which would enable up to 24 controllers in the storage cluster. So the performance limits per cluster are staggeringly high.

Futures

To put a really interesting (and practical) twist on this, sequential IO in the Oracle realm is probably going to become less important. You know why? Oracle’s new in-memory feature. Several others and I were floored when we got the first debrief on Oracle In-Memory. I couldn’t have asked for a better implementation if I were in charge of Oracle engineering myself. Folks at NetApp started asking what this means for us, and here’s my list:

  1. Oracle customers will be spending less on storage.

That’s it. That’s my list. The data format on disk remains unchanged, the backup/restore process is the same, the data commitment process is the same. All the NetApp features that earned us around 12,500 Oracle customers are still applicable.

The difference is customers will need smaller controllers, fewer disks, and less bandwidth because they’ll be able to replace a lot of the brute-force full table scan activity with a little In-Memory magic. No, the In-Memory licenses aren’t free, but the benefits will be substantial.

SPC-2 Benchmarks and Engenio Purchases

The other blog demanded two additional answers:

1) Why hasn’t NetApp done an SPC-2 benchmark?

2) Why did NetApp purchase Engenio?

SPC-2

I personally don’t know why we haven’t done an SPC-2 benchmark with ONTAP, but it’s a rather expensive benchmark to run and it’s aimed at large sequential IO processing. That’s not exactly the prime use case for FAS systems, but not because they’re weak on it. I’ve got AWR reports well into the GB/sec, so it certainly can do all the sequential IO you want in the real world, but what workloads are those?

I see little point in using an ONTAP system for most (but certainly not all) such workloads because the features overall aren’t applicable. I’m aware of some VOD applications on ONTAP where replication and backups were important. Overall, if you want that type of workload, you’d specify a minimum bandwidth requirement, capacity requirement, and then evaluate the proposals from vendors. Cost is usually the deciding factor.

Engenio Acquisition

Again, my personal opinion here on why NetApp acquired Engenio.

Tom Georgens, our CEO, spent 9 years leading Engenio and obviously knew the company and its financials well. I can’t think of a better way to know you’re getting value for money than having someone in Georgens’ position make this decision.

Here’s the press release about it:

Engenio will enable NetApp to address emerging and fast-growing market segments such as video, including full-motion video capture and digital video surveillance, as well as high performance computing applications, such as genomics sequencing and scientific research.

Yup, sounds about right. That’s all about maximum capacity, high throughput, and low cost. In contrast, ONTAP is about manageability and advanced features. Those are aimed at different sets of business drivers.

Hey, check this out. Here’s an SEC filing:

Since the acquisition of the Engenio business in May 2011, NetApp has been offering the formerly-branded Engenio products as NetApp E-Series storage arrays for SAN workloads. Core differentiators of this price-performance leader include enterprise reliability, availability and scalability. Customers choose E-Series for general purpose computing, high-density content repositories, video surveillance, and high performance computing workloads where data is managed by the application and the advanced data management capabilities of Data ONTAP storage operating system are not required.

Key point here is “where the advanced data management capabilities of Data ONTAP are not required.” It also reflected my logic in storage decisions prior to joining NetApp, and it reflects the message I still repeat to account teams:

  1. Is there any particular feature in ONTAP that is useful for your customer’s actual business requirements? Would they like to snapshot something? Do they need asynchronous replication? Archival? SnapLock? Scale-out clusters with many nodes? Non-disruptive everything? Think carefully, and ask lots of questions.
  2. If the answer is “yes”, go with ONTAP.
  3. If the answer is “no”, go with E-Series.

That’s what I did. I probably influenced or approved around $5M in total purchases. It wasn’t huge, but it wasn’t nothing either. I’d guess we went ONTAP about 70% of the time, but I had a lot of IBM DS3K arrays around too, now known as E-Series.

“Dumb Storage”

I’ve annoyed the E-Series team a few times by referring to it as “dumb storage”, but I mean that in the nicest possible way. Its primary job is to just sit there and work. It needs to do that fast, reliably, and cost effectively, but on a day-to-day basis it’s not generally doing anything all that advanced.

In some ways, the reliability was a weakness. It was so reliable that we forgot it was there at all; we’d do something like change the email server addresses and forget to update the RAS feature of the E-Series. Without email notification, it can take a couple of years before someone notices the LED that indicates a drive needs replacement.

 

EMC’s VNX2 forklift: The importance of hardware re-use, slowing down obsolescence, and maximizing your investment

It was with interest that I watched the launch of EMC’s VNX refresh. The updated boxes got some long-awaited features, EMC talked a lot about how some pretty severe single-threaded bottlenecks were removed, more CPU and memory were put in, and there was much rejoicing.

Not really trying to pick on the new boxes (that will be in a future post, relax :)), but what I thought was interesting was that the code that makes most of the new features a possibility cannot be loaded on current-gen VNX boxes (not even the biggest, the 7500, which has plenty of CPU and RAM juice even compared to the next gen boxes).

Software-defined storage indeed.

The existing VNX disk shelves also seemingly can’t be re-used at the moment (correct me if I’m wrong please).

This forced obsolescence has been a theme with EMC: Clariion -> VNX -> VNX2 are all complete forklift upgrades. When the original VNX was released, it was utterly incompatible with the CX (Clariion) shelves (SAS replaced FC connectivity), despite using the exact same code (FLARE).

Other vendors are guilty of this too – a new controller is released, and all prior investments on disk shelves are rendered useless (HDS did this with several iterations of AMS, maybe even with AMS -> HUS, HP with EVA…)

I understand that as technology progresses we sometimes have to abandon the old to make room for the new but, at the same time, customers make significant investments in N-1 technology – and often want to be able to re-use some of their investment with N (and sometimes N+1).

I just had this conversation with a customer, and he said “well, I throw away my gear every 3 years, why should I care?”

Let’s try a thought experiment.

Imagine you just bought a system that’s running the fastest controllers a company sells today, and you got 1PB of storage behind it.

Now, imagine that a mere month after you purchased your controllers, new ones are released that are significantly faster. OK, that stuff happens.

Your gear is not 3 years old yet. It’s 1 month old. It’s running OK.

Now, imagine that your array runs out of steam 6 months later due to unprecedented performance growth. Your system is now 7 months old. You can’t just throw it away and start fresh.

You could buy a new storage system and migrate some of the data to share the load. However, you don’t need more space – you just ran out of controller headroom. Indeed, you still have tons of free space.

But what if you could replace the controllers with the new, beefier ones? And maintain your investment in the 1PB of storage, cache etc? Wouldn’t that be nice?

Or at least be able to move some of the storage pools you may have to the new family of controllers? Even if you had to reformat the disk?

Well – most vendors will tell you “sorry, no, you need to migrate your data to the new box”.

Let’s try another thought experiment.

You bought a storage system a year ago. It performs fine, but it lacks true deduplication capabilities. You have determined it would save you a lot of storage space (= money) if your array had deduplication.

The vendor you purchased the system from announces the refreshed storage OS that finally includes deduplication. And that same vendor made a truly gigantic fuss about software-defined storage, which made everyone feel software would be the big enabler and that coolness was a mere firmware upgrade away.

However, you are eventually told they will not allow the code that enables deduplication to be loaded to your array, and, instead, they ask you to migrate to the refreshed array that supports deduplication. Since the updated code somehow only runs on the new box. Something to do with unicorn milk.

But your array has plenty of CPU headroom to handle deduplication… and you could reformat the disks given some swing storage shelves if the underlying disk format is the issue. But the option is not provided.

How NetApp does things instead

At NetApp we sort of take it for granted so we don’t make a big fuss about software-defined storage, but hardware was always considered an enabler for the software and not the other way around. 

For instance: deduplication was released as a free software upgrade for Data ONTAP (the OS for our main line of storage). Back in 2007. For all storage protocols.

In general, we try to ensure systems can load at least N+1 software releases, but most of the time we utterly spoil customers and go far above and beyond, unless we’re talking about the smallest boxes, which naturally have less headroom.

For example, the now aging FAS 3070 I have in the local lab (the bigger of the older midrange boxes, released in 2006) supports anything from ONTAP 7.2.1 (what it was released with) to ONTAP 8.1.3 (released in mid-2013).

This spans multiple major ONTAP releases – huge changes in the code have happened during those releases: 7.3, 8.0, 8.1… Multiple newer arrays were also released as replacements for the 3070: 3170, 3270, 3250.

Yet the 3070 soldiers on with a fully supported, modern OS, 7 years later.

What arrays did our competitors have back then? What is the most modern OS those same arrays can run today? What is that OS missing vs the OS that competitor’s more modern arrays have?

Let’s talk disk shelves.

We used FC loop connectivity for the older shelves (DS14). We then switched to fancy multi-channel SAS and totally different shelves and disks, but never stopped supporting the older shelf technology, even with newer controllers.

That’s the big one. I have customers with DS14 shelves that they purchased for a 3070 that they now have on a 3270 running 8.2. It all works, all supported. Other vendors cut support off after the transition from FC to SAS.

Will we support those older shelves forever? No, that’s impossible, but at least we give our customers a lot of leeway and let them stretch their hardware investments significantly longer than any other major storage vendor I can think of.

Think long term

I encourage customers to always think long term. Try to stop thinking in 3-year increments. Start thinking of other ways you can stretch your investment, other ways to deploy older gear while still keeping it interoperable with newer hardware.

And start thinking about what will happen to your investment once newer gear is released.

D


What is hard disk short stroking?

This is going to be a short post, to atone for my past sins of overly long posts but mostly because I want to eat dinner.

On storage systems with spinning disks, a favorite method for getting more performance is short-stroking the disk.

It’s a weird term but firmly based on science. Some storage vendors even made a big deal about being able to place data on certain parts of the disk, geometrically speaking.

Consider the relationship between angular and linear velocity first:

Linearvsangular

Assuming something round that rotates at a constant speed (say, 15 thousand revolutions per minute), the angular speed is constant.

The linear speed, on the other hand, increases the further you get away from the center of rotation.

Which means, the part furthest away from the center has the highest linear speed.
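A quick numeric illustration of this (the platter radii below are my own made-up but plausible values, not measurements of any particular drive):

```python
import math

rpm = 15_000                            # 15K RPM spindle
omega = rpm / 60 * 2 * math.pi          # angular speed in rad/s -- constant

inner_radius_m = 0.015                  # hypothetical innermost data track
outer_radius_m = 0.045                  # hypothetical outermost data track

for name, r in (("inner track", inner_radius_m), ("outer track", outer_radius_m)):
    print(f"{name}: {omega * r:.1f} m/s linear speed")
# inner track: ~23.6 m/s, outer track: ~70.7 m/s -- same angular speed,
# 3x the linear speed at the edge, which is why outer-edge data reads faster.
```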

Now imagine a hard disk, and let’s say you want to measure its performance for some sort of random workload demanding low latency.

What if you could put all the data of your benchmark at the very outer edge of the disk?

You would get several benefits:

  1. The data would enjoy the highest linear speed and
  2. The disk tracks at the outer edge store more data, further increasing speeds, plus
  3. The disk heads would only have to move a small amount to get to all the data (the short-stroking part). This leads to much reduced seek times, which is highly beneficial for most workloads.

I whipped this up to explain it pictorially:

Short stroking

Using a lot of data in a benchmark would not be enough to avoid short-stroking. One would also need to ensure that the access pattern touches the entire disk surface.

This is why NetApp always randomizes data placement for the write-intensive SPC-1 benchmark, to ensure we are not accused of short-stroking, no matter how much data is used in the benchmark.

Hope this clears it up.

If you are participating at any vendor proof-of-concept involving performance – I advise you to consider the implications of short-stroking. Caching, too, but I feel that has been tackled sufficiently.

D

NetApp delivers 1.3TB/s performance to giant supercomputer for big data

(Edited: My bad, it was 1.3TB/s, not 1TB/s).

What do you do when you need so much I/O performance that no one single storage system can deliver it, no matter how large?

To be specific: what if you needed to transfer data at 1TB per second (or 1.3TB/s, as it eventually turned out to be)?

That was the problem faced by the U.S. Department of Energy (DoE) and their Sequoia supercomputer at the Lawrence Livermore National Laboratory (LLNL), one of the fastest supercomputing systems on the planet.

You can read the official press release here. I wanted to get more into the technical details.

People have been talking a lot about “big data” recently – no clear definition seems to exist; in my opinion it’s something that has some of the following properties:

  • Too much data to be processed by a “normal” computer or cluster
  • Too much data to work with using a relational DB
  • Too much data to fit in a single storage system for performance and/or capacity reasons – or maybe just simply:
  • Too much data to process using traditional methods within an acceptable time frame

Clearly, this is a bit loose – how much is “too much”? How long is “too long”? For someone only armed with a subnotebook computer, “too much” does not have the same meaning as for someone rocking a 12-core server with 256GB RAM and a few TB of SSD.

So this definition is relative… but in some cases, such as the one we are discussing, absolute – given the limitations of today’s technology.

For instance, the amount of storage LLNL required was several tens of PB in a single storage pool that could provide unprecedented I/O performance to the tune of 1TB/s. Both size and performance needed to be scalable. It also needed to be reliable and fit within a reasonable budget and not require extreme space, power and cooling. A tall order indeed.

This created some serious logistics problems regarding storage:

  • No single disk array can hold that amount of data
  • No single disk array can perform anywhere close to 1TB/s

Let’s put this in perspective: The storage systems that scale the biggest are typically scale-out clusters from the usual suspects of the storage world (we make one, for example). Even so, they max out at fewer PB than this deployment required.

The even bigger problem is that a single large scale-out system can’t really deliver more than a few tens of GB/s under optimal conditions – more than fast enough for most “normal” uses but utterly unacceptable for this case.

The only realistic solution to satisfy the requirements was massive parallelization, specifically using the NetApp E-Series for the back-end storage and the Lustre cluster filesystem.

 

A bit about the solution…

Almost a year ago NetApp purchased the Engenio storage line from LSI. That storage line is resold by several companies like IBM, Oracle, Quantum, Dell, SGI, Teradata and more. IBM also resells the ONTAP-based FAS systems and calls them “N-Series”.

That purchase has made NetApp the largest provider of OEM arrays on the planet by far. It was a good deal – very rapid ROI.

There was a lot of speculation as to why NetApp would bother with the purchase. After all, the ONTAP-based systems have a ton more functionality than pretty much any other array and are optimized for typical mostly-random workloads – DBs, VMs, email, plus megacaching, snaps, cloning, dedupe, compression, etc – all with RAID6-equivalent protection as standard.

The E-Series boxes on the other hand don’t do thin provisioning, dedupe, compression, megacaching… and their snaps are the less efficient copy-on-first-write instead of redirect-on-write. So, almost the anti-ONTAP :)

The first reason for the acquisition was that, on purely financial terms, it was a no-brainer deal even if one sells shoes for a living, let alone storage. Even if there were no other reasons, this one would be enough.

Another reason (and the one germane to this article) was that the E-Series has a tremendous sustained sequential performance density. For instance, the E5400 system can sustain about 4GB/s in 4U (real GB/s, not out of cache), all-in. That’s 4U total for 60 disks including the controllers. Expandable, of course. It’s no slouch for random I/O either, plus you can load it with SSDs, too… :)

Again, note – 60 drives per 4U shelf and that includes the RAID controllers, batteries etc. In addition, all drives are front-loading and stay active while servicing the shelf – as opposed to most (if not all) dense shelves in the market that need the entire (very heavy) shelf pulled out and/or several drives offlined in order to replace a single failed drive… (there’s some really cool engineering in the shelf to do this without thermal problems, performance loss or vibrations). All this allows standard racks and no fear of the racks tipping over while servicing the shelves :) (you know who you are!)

There are some vendors that purely specialize in sequential I/O and tipping racks – yet they have about 3-4x less performance density than the E5400, even though they sometimes have higher per-controller throughput. In a typical marketing exercise, some of our more usual competitors have boasted 2GB/s/RU for their controllers, meaning that in 4U the controllers (which take up 4U in that example) can do 8GB/s, but that requires all kinds of extra rack space to achieve (extra UPSes, several shelves, etc.), making their resulting actual throughput well under 1GB/s/RU. Not to mention the cost (those systems are typically more expensive than a 5400), which is important with projects of the scale we are talking about.

Most importantly, what we accomplished at the LLNL was no marketing exercise…

 

The benefits of truly high performance density

Clearly, if your requirements are big enough, you end up spending a lot less money and needing a lot less rack space, power and cooling by going with a highly performance-dense solution.

However, given the requirements of the LLNL, it’s clear that you can’t use just a single E5400 to satisfy the performance and capacity requirements of this use case. What you can do though is use a bunch of them in parallel… and use that massive performance density to achieve about 40GB/s per industry-standard rack with 600x high-capacity disks (1.8PB raw per rack).

For even higher performance per rack, the E5400 can use the faster SAS or SSD drives – 480 drives per rack (up to 432TB raw), providing 80GB/s reads/60GB/s writes.
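Using the per-rack density figures above, here’s a rough sizing sketch for the 1.3TB/s target (ballpark numbers from this post, not the actual LLNL bill of materials):

```python
import math

target_gb_s = 1300                       # 1.3 TB/s expressed in GB/s

# High-capacity config: ~40 GB/s and ~1.8PB raw per rack (600 disks/rack)
racks = math.ceil(target_gb_s / 40)
print(f"Racks needed: ~{racks}")                             # ~33 racks
print(f"Raw capacity at that scale: ~{racks * 1.8:.0f} PB")  # ~59 PB

# Building-block view: one E5400 sustaining ~4 GB/s in 4U
print(f"E5400 building blocks needed: ~{math.ceil(target_gb_s / 4)}")  # ~325
```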

 

Enter the cluster filesystem

So, now that we picked the performance-dense, reliable, cost-effective building block, how do we tie those building blocks together?

The answer: By using a cluster filesystem.

Loosely defined, a cluster filesystem is simply a filesystem that can be accessed simultaneously by the servers mounting it. In addition, it also typically means it can span storage systems and make them look as one big entity.

It’s not a new concept – there are several examples, old and new: AFS, Coda, GPFS, and the more prevalent StorNext and Lustre, to name a few.

The LLNL picked Lustre for this project. Lustre is a distributed filesystem that spreads I/O across multiple Object Storage Servers (OSSes), each connected to storage (Object Storage Targets, or OSTs). Metadata is served by dedicated servers that are not part of the I/O stream and thus not a bottleneck. See below for a picture (courtesy of the Lustre manual) of how it is all connected:

 

Lustre Scaled Cluster

 

High-speed connections are used liberally for lower latency and higher throughput.

A large file can reside on many storage servers, and as a result I/O can be spread out and parallelized.
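To illustrate the striping idea, here’s a conceptual sketch of round-robin striping (not Lustre’s actual internal code; the stripe size and OST count are made-up parameters):

```python
# Conceptual sketch of round-robin striping: which object storage target
# (OST) serves a given byte offset of a striped file.

def ost_for_offset(offset, stripe_size=1 << 20, ost_count=48):
    """Map a file byte offset to (ost_index, offset_within_that_object)."""
    stripe_number = offset // stripe_size
    ost_index = stripe_number % ost_count
    object_offset = (stripe_number // ost_count) * stripe_size + offset % stripe_size
    return ost_index, object_offset

# A 1GB sequential read touches every OST, so the I/O gets parallelized:
touched = {ost_for_offset(off)[0] for off in range(0, 1 << 30, 1 << 20)}
print(f"OSTs involved in a 1GB sequential read: {len(touched)}")   # 48
```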

Lustre clients see a single large namespace and run a proprietary protocol to access the cluster.

It sounds good in theory – and it delivered in practice: 1.3TB/s sustained performance was demonstrated to the NetApp block devices. Work is ongoing to finalize the testing with the complete Lustre environment. Not sure what the upper limit would be. But clearly it’s a highly scalable solution.

 

Putting it all together

NetApp has fully realized solutions for the “big data” applications out there – complete with the product and services needed to complete each engagement. The Lustre solution employed by the LLNL is just one of the options available. There is Hadoop, Full Motion uncompressed HD video, and more.

So – how fast do you need to go?

D

 

 


 

NetApp posts world-record SPEC SFS2008 NFS benchmark result

Just as NetApp dominated the older version of the SPEC SFS97_R1 NFS benchmark back in May of 2006 (and was unsurpassed in that benchmark with 1 million SFS operations per second), the time has come to once again dominate the current version, SPEC SFS2008 NFS.

Recently we have been focusing on benchmarking realistic configurations that people might actually put in their datacenters, instead of lab queens with unusable configs focused on achieving the highest result regardless of cost.

However, it seems the press doesn’t care about realistic configs (or to even understand the configs) but instead likes headline-grabbing big numbers.

So we decided to go for the best of both worlds – a headline-grabbing “big number” but also a config that would make more financial sense than the utterly crazy setups being submitted by competitors.

Without further ado, NetApp achieved over 1.5 million SPEC SFS2008 NFS operations per second with a 24-node cluster based on FAS6240 boxes running ONTAP 8 in Cluster Mode. Click here for the specific result. There are other results in the page showing different size clusters so you can get some idea of the scaling possible.

See the table below for a high-level analysis (including the list pricing I could find for these specific performance-optimized configs, for whatever that’s worth). The comparison is between NetApp and the nearest scale-out competitor result (one of EMC’s many recent acquisitions – Isilon, the niche, dedicated NAS box – nothing else is close enough to bother including in the comparison).

BTW – the EMC price list is publicly available from here (and other places I’m sure): http://www.emc.com/collateral/emcwsca/master-price-list.pdf

From page 422:

S200-6.9TB & 200GB SSD, 48GB RAM, 2x10GE SFP+ & 2x1G: $84,061. Times 140…

Before we dive into the comparison, an important note since it seems the competition doesn’t understand how to read SPEC SFS results:

Out of 1728 450GB disks (the number includes spares and OS drives, otherwise it was 1632 disks), the usable capacity was 574TB (73% of all raw space – even more if one considers a 450GB disk never actually provides 450 real GB in base2). The exported capacity was 288TB. This doesn’t mean we tried to short-stroke or that there is a performance benefit exporting a smaller filesystem – the way NetApp writes to disk, the size of the volume you export has nothing to do with performance. Since SPEC SFS doesn’t use all the available disk space, the person doing the setup thought like a real storage admin and didn’t give it all the available space. 
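For anyone who wants to check the capacity math themselves, a tiny sketch (raw space computed from the drive count and marketing capacity; the exact percentage shifts a bit with base-2 vs base-10 accounting):

```python
# Capacity sanity check for the configuration described above.
total_disks = 1728
disk_size_gb = 450                      # marketing (base-10) GB per disk

raw_tb = total_disks * disk_size_gb / 1000      # ~777.6 TB raw
usable_tb = 574
print(f"Raw: {raw_tb:.1f} TB, usable: {usable_tb} TB "
      f"({usable_tb / raw_tb:.0%} of raw)")
# ~74% of raw -- in line with the ~73% quoted above; the small difference
# is just rounding and base-2 vs base-10 accounting.
```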

Lest we be accused of tuning this config or manually making sure client accesses were load-balanced and going to the optimal nodes, please understand this:  23 out of 24 client accesses were not going to the nodes owning the data and were instead happening over the cluster interconnect (which, for any scale-out architecture, is worst-case-scenario performance). Look under the “Uniform Access Rules Compliance” in the full disclosure details of the result in the SPEC website here. This means that, compared to the 2-node ONTAP 7-mode results, there is a degradation due to the cluster operating (intentionally) through non-optimal paths.

| Metric | EMC | NetApp | Difference |
|---|---|---|---|
| Cost (approx. USD list) | 11,800,000 | 6,280,000 | NetApp is almost half the cost while offering much higher performance |
| SPEC SFS2008 NFS operations per second | 1,112,705 | 1,512,784 | NetApp is over 35% faster, while using potentially better RAID protection |
| Average latency (ORT, ms) | 2.54 | 1.53 | NetApp offers almost 40% better average latency without using costly SSDs, and is usable for challenging random workloads like DBs, VMs etc. |
| Space (TB) | 864 (of which 128,889GB was used in the test) | 574 (of which 176,176GB was used in the test) | Isilon offers about 50% more usable space (coming from a lot more drives, 28% more raw space and potentially less RAID protection – N+2 results from Isilon would be different) |
| $/SPEC SFS2008 NFS operation | 10.6 | 4.15 | NetApp is less than half the cost per SPEC SFS2008 NFS operation |
| $/TB | 13,657 | 10,940 | NetApp is about 20% less expensive than EMC per usable TB |
| RAID | Per-file protection. Files < 128K are at least mirrored; files over 128K are at a 13+1 level of protection in this specific test. | RAID-DP | Ask EMC what 13+1 protection means in an Isilon cluster (I believe 1 node can be completely gone, but what about simultaneous drive failures that contain the same protected file?). NetApp RAID-DP is mathematically analogous to RAID6 and has a parity penalty of 2 drives every 16-20 drives. |
| Boxes needed to accomplish result | 140 nodes, 3,360 drives (incl. 25TB of SSDs for cache), 1,120 CPU cores, 6.7TB RAM | 24 unified controllers, 1,728 drives, 12.2TB Flash Cache, 192 CPU cores, 1.2TB RAM | NetApp is far more powerful per node, and achieves higher performance with a lot fewer drives, CPUs, RAM and cache. In addition, NetApp can be used for all protocols (FC, iSCSI, NFS, CIFS) and all connectivity methods (FC 4/8Gb, Ethernet 1/10Gb, FCoE). |
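The $/operation and $/TB rows are simple divisions of the list prices by the published numbers; here’s the arithmetic for anyone who wants to plug in their own pricing:

```python
# Value-for-money math behind the table above.
systems = {
    "EMC Isilon": {"price": 11_800_000, "ops": 1_112_705, "usable_tb": 864},
    "NetApp":     {"price": 6_280_000,  "ops": 1_512_784, "usable_tb": 574},
}

for name, s in systems.items():
    print(f"{name}: ${s['price'] / s['ops']:.2f} per SPEC SFS2008 NFS op, "
          f"${s['price'] / s['usable_tb']:,.0f} per usable TB")
# EMC Isilon: $10.60 per op, $13,657 per TB
# NetApp:     $4.15 per op,  $10,941 per TB
```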

 

Notice the response time charts:

IsilonVs6240response

NetApp exhibits traditional storage system behavior – latency is very low initially and gradually gets higher the more the box is pushed, as one would expect. Isilon on the other hand starts out slow and gets faster as more metadata gets cached, until the controllers run out of steam (SPEC SFS is very heavy in NAS metadata ops, and should not be compared to heavy-duty block benchmarks like SPC-1).

This is one of the reasons an Isilon cluster is not really applicable for low-latency DB-type apps, or low-latency VMs. It is a great architecture designed to provide high sequential speeds for large files over NAS protocols, and is not a general-purpose storage system. Kudos to the Isilon guys for even getting the great SPEC result in the first place, given that this isn’t what the box is designed to do (the extreme Isilon configuration needed to run the benchmark is testament to that). The better application for Isilon would be capacity-optimized configs (which is what the system is designed for to begin with).

 

Some important points:

  1. First and foremost, the cluster-mode ONTAP architecture now supports all protocols, it is the only unified scale-out architecture available. Any competitors playing in that space only have NAS or SAN offerings but not both in a single architecture.
  2. We didn’t even test with the even faster 6280 box and extra cache (that one can take 8TB cache per node). The result is not the fastest a NetApp cluster can go :) With 6280s it would be a healthy percentage faster, but we had a bunch of the 6240s in the lab so it was easier to test them, plus they’re a more common and less expensive box, making for a more realistic result.
  3. ONTAP in cluster-mode is a general-purpose storage OS, and can be used to run Exchange, SQL, Oracle, DB2, VMs, etc. etc. Most other scale-out architectures are simply not suitable for low-latency workloads like DBs and VMs and are instead geared towards high NAS throughput for large files (IBRIX, SONAS, Isilon to name a few – all great at what they do best).
  4. ONTAP in cluster mode is, indeed, a single scale-out cluster and administered as such. It should not be compared to block boxes with NAS gateways in front of them like VNX, HDS + Bluearc, etc.
  5. In ONTAP cluster mode, workloads and virtual interfaces can move around the cluster non-disruptively, regardless of protocol (FC, iSCSI, NFS and yes, even CIFS can move around non-disruptively assuming you have clients that can talk SMB 2.1 and above).
  6. In ONTAP cluster mode, any data can be accessed from any node in the cluster – again, impossible with non-unified gateway solutions like VNX that have individual NAS servers in front of block storage, with zero awareness between the NAS heads aside from failover.
  7. ONTAP cluster mode can allow certain cool things like upgrading storage controllers from one model to another completely non-disruptively, most other storage systems need some kind of outage to do this. All we do is add the new boxes to the existing cluster :)
  8. ONTAP cluster mode supports all the traditional NetApp storage efficiency and protection features: RAID-DP, replication, deduplication, compression, snaps, clones, thin provisioning. Again, the goal is to provide a scale-out general-purpose storage system, not a niche box for only a specific market segment. It even supports virtualizing your existing storage.
  9. There was a single namespace for the NFS data. Granted, not the same architecture as a single filesystem from some competitors.
  10. Last but not least – no “special” NetApp boxes are needed to run Cluster Mode. In contrast to other vendors selling a completely separate scale-out architecture (different hardware and software and management), normal NetApp systems can enter a scale-out cluster as long as they have enough connectivity for the cluster network and can run ONTAP 8. This ensures investment protection for the customer plus it’s easier for NetApp since we don’t have umpteen hardware and software architectures to develop for and support :)
  11. Since people have been asking: The SFS benchmark generates about 120MB per operation. The slower you go, the less space you will use on the disks, regardless of how many disks you have. This creates some imbalance in large configs (for example, only about 128TB of the 864TB available was used on Isilon).
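Item 11 is easy to sanity-check against both results (the used-capacity figures come from the comparison table earlier; the ~120MB/op figure is the rule of thumb quoted in that item):

```python
# Dataset size scales with the achieved op rate: roughly 120MB per op/s.
mb_per_op = 120

for name, ops, used_tb in (("Isilon", 1_112_705, 128.9),
                           ("NetApp", 1_512_784, 176.2)):
    predicted_tb = ops * mb_per_op / 1_000_000
    print(f"{name}: predicted ~{predicted_tb:.0f} TB, reported ~{used_tb} TB used")
# Both predictions land within a few percent of the space reported
# in the disclosures.
```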

Just remember – in order to do what ONTAP in Cluster Mode does, how many different architectures would other vendors be proposing?

  • Scale-out SAN
  • Scale-out NAS
  • Replication appliances
  • Dedupe appliances
  • All kinds of management software

How many people would it take to keep it all running? And patched? And how many firmware inter-dependencies would there be?

And what if you didn’t need, say, scale-out SAN to begin with, but some time after buying traditional SAN realized you needed scale-out? Would your current storage vendor tell you you needed, in addition to your existing SAN platform, that other one that can do scale-out? That’s completely different than the one you bought? And that you can’t re-use any of your existing stuff as part of the scale-out box, regardless of how high-end your existing SAN is?

How would that make you feel?

Always plan for the future…

Comments welcome.

D

PS: Made some small edits in the RAID parts and also added the official EMC pricelist link.

 


 

NetApp vs EMC usability report: malice, stupidity or both?

Most are familiar with Hanlon’s Razor:

Never attribute to malice that which is adequately explained by stupidity.

A variation of that is:

Never attribute to malice that which is adequately explained by stupidity, but don’t rule out malice.

You see, EMC sponsored a study comparing their systems to ones from the company they look up to and try to emulate. The report deals with ease-of-use (and I’ll be the first to admit the current iteration of EMC boxes is far easier to use than in the past, and the GUI has some cool stuff in it). I was intrigued, but after reading the official-looking report posted by Chuck Hollis, I wondered who in their right mind would lend it credence, and ignored it since I have a real day job solving actual customer problems and can’t possibly respond to every piece of FUD I see (and I see a lot).

Today I’m sitting in a rather boring meeting so I thought I’d spend a few minutes to show how misguided the document is.

In essence, the document tackles the age-old dilemma of which race car to get by comparing how easy it is to change the oil, and completely ignores the “winning the race with said car” part. My question would be: “which car allows you to win the race more easily and with the least headaches, least cost and least effort?”

And if you think winning a “race” is just about performance, think again.

It is also interesting how the important aspects of efficiency, reliability and performance are not tackled, but I guess this is a “usability” report…

Strange that a company named “Strategic Focus” reduces itself to comparing arrays by measuring the number of mouse clicks. Not sure how this is strategic for customers. They were commissioned by EMC, so maybe EMC considers this strategic.

I’ll show how wrong the document is by pointing at just some of the more glaring issues, but I’ll start by saying a large multinational company has many PB of NetApp boxes around the globe and 3 relaxed guys to manage it all. How’s that for a real example? :)

  1. Page 2, section 4, “Methodology”: EMC’s own engineers set up the VNX properly. No mention of who did the NetApp testing, what their qualifications are, and so on. So, first question: “Do these people even know what they’re doing? Have they really used a NetApp system before?”
  2. Page 10, Table A, showing the configurations. A NetApp FAS3070 was used, running the latest code at this moment (8.01). Thanks EMC for the unintended compliment – you see, that system is 2 generations old (the current one is 3270 and the previous one is 3170) yet it can still run the very latest 64-bit ONTAP code just fine. What about the EMC CX3? Can it run FLARE31? Or is that a forklift upgrade? Something to be said for investment protection :)
  3. Page 3 table 5-1, #1. Storage pools on all modern arrays would typically be created upfront, so the wording is very misleading. In order to create a new LUN one does NOT NEED to create a pool. Same goes for all vendors.
  4. Same table and section (also mentioned in section 7): Figuring out the space available is as simple as going to the aggregate page, where the space is clearly shown for the aggregates. So, unsure what is meant here.
  5. Regarding LUN creation… Let me ask you a question: After you create a LUN on any array, what do you need to do next? You see, the goal is to attach the LUN to a host, do alignment, partition creation, multipathing and create a filesystem and write stuff to it. You know, use it. NetApp largely automates end-to-end creation of host filesystems and, indeed, does not need an administrator to create a LUN on the array at all. Clearly the person doing the testing is either not aware of this or decided to omit this fact.
  6. Page 4, item 4 (thin provisioning). Asinine statement – plus, any NetApp LUN can be made thick or thin with a single click, whereas a VNX needs to do a migration. Indeed, NetApp does not complicate things whether thin or thick is required, does not differentiate between thin and thick when writing, and therefore does not incur a performance penalty, whereas EMC does (according to EMC documentation).
  7. Page 4, item 5 (Creation of virtual CIFS servers). The Multistore feature is free of charge on all new systems and allows one to create fully segregated, secure multitenancy virtual CIFS, NFS and iSCSI NetApp “partitions” – far beyond the capabilities of EMC. Again, misleading.
  8. Page 4, item 6 (growing storage elements). No measurable difference? Kindly show all the steps to grow a LUN until the new space is visible from the host side. End-to-end is important to real users since they want to use the storage. Or maybe not, for the authors of this document.
  9. Page 5, Item 1. We are really talking here about EMC snapshots? Seriously? Versus NetApp? To earn the right to do so assumes your snapshots are a usable and decent feature and that you can take a good number of them without the box crumbling to pieces. Ask any vendor about a production array with the most snaps and ask to talk to the customer using it. Then compare the number of snaps to a typical NetApp customer’s. Don’t be surprised if one number is a few hundred times less than the other.
  10. Page 5, item 3 (storage tiering): part of a longer conversation but this assumes all arrays need to do tiering. If my solution is optimized to the level that it doesn’t need to do this but yours is not optimized so it needs tiering, why on earth am I being penalized for doing storage more efficiently than you? (AKA the “not invented here” syndrome).
  11. Page 6 item 1 (VMware awareness): NetApp puts all the awareness inside vCenter and, indeed, datastore creation (including volume/LUN/NAS creation and resizing), VM cloning etc. all from within vCenter itself. Ask for a demo and prepare to be amazed.
  12. Page 6, item 2 – (dedupe/compress individual VMs): This one had my blood boiling. You see, EMC cannot even dedupe individual VMs, (impossible, given the fact that current DART code only does dedupe at the file and not block level and no two active VMs will ever be exactly the same), can’t dedupe at all for block storage (maybe in the future but not today), and in general doesn’t recommend compression for VMs! Ask to see the best practices guide that states all this is supported and recommended for active production VMs, and to talk to a customer doing it at scale (not 10 VMs). A feature you can theoretically turn on but that will never work is not quite useful, you see…
  13. Page 8, entire table: Too much to comment on, suffice it to say that NetApp systems come with tools not mentioned in this report that go so far beyond what Unisphere does that it’s not even funny (at no additional cost). Used by customers that have thousands of NetApp systems. That’s how much those tools scale. EMC would need vast portions of the Ionix suite to do anything remotely similar (at $$$). Of course, mentioning that would kinda derail this document… and the piece about support and upgrades is utterly wrong, but I like to keep the surprise for when I do the demos and not share cool IP ideas here :)
  14. Page 11, Table B1: In the end, the funniest one of all! If you add up the total number of mouse clicks, NetApp needed 92 vs EMC’s 111. Since the whole point of this usability report is to show overall ease of use by measuring the total number of clicks to do stuff, it’s interesting that they didn’t do a simple total to show who won in the end… :)

I could keep going but I need to pay attention to my meeting now since it suddenly became interesting.

Ultimately, when it comes to ease of use, it’s simple to just do a demo and have the customer decide for themselves which approach they like best. Documents such as this one mean less than nothing for actual end users.

I should have another similar list showing clicks and TIME needed to do certain other things. For instance, using RecoverPoint (or any other method), kindly show the number of clicks and time (and disk space) for creating 30 writable clones of a 10TB SQL DB and mounting them on 30 different DB servers simultaneously. Maintaining unique instance names etc. Kinda goes a bit beyond LUN creation, doesn’t it? :)

All this BTW doesn’t mean any vendor should rest on their laurels and stop working on improving usability. It’s a never-ending quest. Just stop it with the FUD, please…

Finishing with something funny: Check this video for a good demonstration of something needing few clicks yet not being that easy to do.

Comments welcome.

D


Examining value for money regarding the SPEC benchmarks

Some of the comments in my previous post asked about $/IOPS and $/TB.

Since SPEC doesn’t require prices to be listed, I did my own analysis.

The NetApp numbers are simply 4x the existing 6240 result, which is what EMC did with their submission, they used 4x separate VNX systems and aggregated the result.

I used this clarifying analogy over at Nigel’s blog to explain why this makes sense before anyone yells “but this is not published”:

A storage system typically has some kind of bottleneck – cluster interconnect, number of drives, bandwidth to the controller, etc.

When you’re testing a single system, you’re ultimately hitting one of those bottlenecks.

If you’re testing multiple systems independent of each other, they do not share the bottlenecks (since they’re separate), and your performance will scale linearly as you add systems.

For example, if 1 truck can hold 10 tons of stuff, 4 like trucks will hold 40 tons of stuff, 10 trucks 100 tons, etc. There’s no limit.

Once you inject a limiting factor (“the trucks all have to fit on a bridge and the bridge can take this much load and it’s this big”) then you will have a limitation on how many trucks you can load and put on that bridge.

EMC tested 4 separate “trucks”. In that same way, I can add up the result of 4 separate NetApp “trucks”. Here are the results:

| Metric | EMC | NetApp | Difference |
|---|---|---|---|
| Cost (approx. USD list) | 6,000,000 | 5,000,000 | NetApp is over 16% cheaper in absolute terms |
| SPEC SFS2008 NFS ops/sec | 497,623 | 762,700 | NetApp is 53% faster in absolute terms |
| Average latency (ORT, ms) | 0.96 | 1.17 | EMC offers a mere 18% lower latency (with fewer NFS ops) despite using only SSDs! |
| Space (TB) | 60 | 343 | NetApp offers 5.7 times more usable space |
| $/SPEC SFS2008 NFS operation | 12.06 | 6.56 | NetApp is 45.6% less expensive per SPEC NFS operation |
| $/TB | 100,000 | 14,577 | NetApp is less than 1/6 the price of EMC per TB |
| RAID | RAID5 | RAID-DP | NetApp is thousands of times more reliable |
| Boxes needed to accomplish result | 15 (4x separate VNX, each with 2 controllers, plus a total of 5x Celerra VG8 heads and 2 Control Stations) | 8x unified controllers | NetApp is far less complex (the benefit of a truly unified architecture) |

Who can spot the better deal? :)

I added the latency in the chart, thanks to my buddy Mark Twomey for pointing it out.

You see, people needing enterprise NAS with that kind of performance usually need speed, plenty of space and high reliability. Not just one of the three. BTW, here’s a paper on relative RAID reliability.

NetApp provides all three, in spades, plus great value for money, a truly simple, flexible unified system, and efficiency.

Most customers want to see how a real configuration performs. I refer customers to our SPEC and SPC results constantly since quite frequently their desired configuration is very similar.

Which makes benchmarking realistic configurations actually useful – imagine that.

Maybe EMC needs to submit results with VNX the way they sell it to people, for example:

  • A mix of SSD cache, SSD, high-speed SAS and high-capacity SAS
  • Autotiering
  • RAID6
  • A typical amount of space for a configuration that size

Then submit results.

Keep your existing result of course, but also show the people how what you actually sell them really performs.

I still don’t understand why this is such a hard concept.

D


EMC conclusively proves that VNX bottlenecks NAS performance

A bit of a controversial title, no?

Allow me to elaborate.

EMC posted a new SPEC SFS result as part of a marketing stunt (which is working, look at what I’m doing – I’m talking about them, if only to clear the air).

In simple terms, EMC got almost 500,000 SPEC SFS NFS IOPS (not to be confused with, say, block-based SPC-1 IOPS) with the following configuration:

  1. Four (4) totally separate VNX arrays, each loaded with SSD storage, utterly unaware of each other (8 total controllers since each box has 2)
  2. Five (5) Celerra VG8 NAS heads/gateways (1 spare), one on top of each VNX box
  3. 2 Control Stations
  4. 8 exported filesystems (2 per VG8 head/VNX system)
  5. Multiple pools of storage (at least 1 per VG8) – not shared among the various boxes, no data mobility between boxes
  6. Only 60TB NAS space with RAID5 (or 15TB per box)

Now, this post is not about whether this configuration is unrealistic and expensive (almost nobody would pay $6m for merely 60TB of NAS, not today). I get it that EMC is trying to publish the best possible number by loading a bunch of separate arrays with SSD. It’s OK as long as everyone understands the details.

My beef has to do with how it’s marketed.

EMC is very vague about the configuration, unless you look at the actual SPEC website. In the marketing materials they just mention VNX, as in “The EMC VNX performed at 497,623 SPECsfs2008_nfs.v3 operations per second”. Kinda like saying it’s OK to take 3 5-year olds and a 6-year old to a bar because their age adds up to 21.

No – the far more accurate statement is “four separate VNXs working independently and utterly unaware of each other did 124,405 SPECsfs2008_nfs.v3 operations per second each”.

All EMC did was add up the result of 4 boxes.

Heck, that’s easy to do!

NetApp already has a result for the 6240 (just 2 controllers doing a respectable 190,675 SPEC NFS ops taking care of NAS and RAID all at once since they’re actually unified, no cornucopia of boxes there) without using Solid State Drives (common SAS drives plus a large cache were used instead – a standard, realistic config we sell every day, and not a “lab queen”).

If all we’re doing is adding up the results of different boxes, simply multiply this by 4 (plus we do have Cluster-Mode for NAS, so it would count as a single clustered system with failover etc. among the nodes) and you end up with the following:

  1. 762,700 SPEC SFS NFS operations
  2. 8 exported filesystems
  3. 343TB usable with RAID-DP (thousands of times more resilient than RAID5)

So, which one do you think is the better deal? More speed, 343TB and better protection, or less speed, 60TB and far less protection? :)

Customers curious about other systems can do the same multiplication trick for other configs – the sky is the limit!
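
Since we are only doing arithmetic on published numbers, here is the entire “analysis” as a few lines of Python. The figures are the ones quoted above; the only operations are division and multiplication, which is rather the point.

```python
# The "add up separate boxes" arithmetic, spelled out.
# All figures are the published ones quoted in the text above.

emc_aggregate_ops = 497_623   # combined SPEC SFS NFS ops from 4 independent VNXs
emc_boxes = 4
print(f"Per VNX: ~{emc_aggregate_ops // emc_boxes:,} SPEC SFS NFS ops")

netapp_6240_ops = 190_675     # one 2-controller FAS6240 submission (SAS + cache)
print(f"4x 6240 the same way: {netapp_6240_ops * 4:,} SPEC SFS NFS ops")

# The capacity side of the same comparison:
emc_usable_tb = 60            # RAID5, across all 4 VNX systems
netapp_usable_tb = 343        # RAID-DP, for the 4x 6240 configuration
print(f"Usable space: {emc_usable_tb}TB (RAID5) vs {netapp_usable_tb}TB (RAID-DP)")
```

That is the full extent of the math behind a headline number that is simply the sum of independent boxes.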

The other, more serious part, and what prompted me to title the post the way I did, is that EMC’s own benchmarking makes it pretty clear that the VNX is the bottleneck: it can only really support a single VG8 head at top speed, which is why 4 separate VNX systems were needed to reach the final result. So the fact that a VNX can have up to 8 Celerra heads on top of it means little, since the back-end is your limiting factor. You might as well stick to a dual-head VG8 config (1 active, 1 passive), since that’s all the VNX can comfortably drive (otherwise, why benchmark it that way?).

But with only 1 active NAS head you’d be limited to just 256TB max NAS capacity, since that’s how much total space a Celerra head can address as of the time of this writing. Which is probably enough for most people.

I wonder if the NAS heads that can be bought as a package with the VNX are slower than VG8 heads, and by how much. You see, most people buying a VNX will get the packaged NAS heads, since that’s the cheaper way to do it. How fast do those go? I’m sure customers would like to know, since that’s what they’ll typically buy.

I also wonder how fast it would be with RAID6.

Here’s a novel idea: benchmark what customers will actually buy!

So that apples-to-apples comparisons can become easier, instead of something like this:

[Image: “Bothapples” – photo of two rather different apples]

For the curious: on the left you see an “Autumn Glory” Malus Floribunda (miniature apple). Photo courtesy of John Fullbright.

D


EMC finally joining the SPC – plus some advice

It just came to my attention that EMC is finally joining the SPC (Storage Performance Council). As I’ve pointed out in the past, their absence from this most standard of industry benchmarks was puzzling, so kudos for rectifying the omission.

I do have some advice for EMC (and all other vendors that have already posted results):

  1. Do show results for your various supported RAID types, not just RAID10 – after all, since most of your customers don’t deploy only RAID10, it makes sense to show RAID5 and RAID6 as well, especially if you want to compare results with NetApp RAID-DP (the protection equivalent of RAID6). This will increase your credibility. The argument that you only show RAID10 because it’s the best-performing option doesn’t hold water – everyone knows the other RAID types perform differently, and providing results for all of them will at least give customers an idea of how much of a performance hit to expect with each RAID type under a write-intensive workload.
  2. Do show long-running benchmarks with auto-tiering enabled. After all, if you are claiming your auto-tiering implementation doesn’t hurt and can even improve results, this is your chance to show it.
  3. Enable features like snapshots.
  4. Use your large cache if you have one, especially if you keep advertising that it accelerates writes. It will just solidify your claims.

Welcome to the club!

D

 


Stack Wars: The Clone Wars

It seems that everyone and their granny is trying to create some sort of stack offering these days. Look at all the brouhaha – HP buying 3Par, Dell buying Compellent, all kinds of partnerships being formed left and right. Stacks are hot.

To the uninitiated, a stack is what you can get when a vendor is able to offer multiple products under a single umbrella. For instance, being able to get servers, an OS, a DB, an email system, storage and network switches from a single manufacturer (not a VAR) is an example of a single-sourced stack.

The proponents of stacks maintain that with stacks, customers potentially get simpler service and better integration – a single support number to call, “one throat to choke”, no finger-pointing between vendors. And that’s partially true. A stack potentially provides simpler access to support. On the “better integration” part – read on.

My main problem with stacks is that nobody really offers a complete stack, and those that are more complete than others don’t necessarily offer best-of-breed products (not even “good enough” ones), nor do they offer particularly great integration between the products within the stack.

Personal anecdote: a few years ago I had the (mis)fortune of being the primary backup and recovery architect for one of the largest airlines in the world. Said airline had a very close relationship with a certain famous vendor. Said vendor only flew with that airline, always bought business class seats, the airline gave the vendor discounts, the vendor gave the airline discounts, and in general there was a lot of mutual back-scratching going on. So much so that the airline would give that vendor business before looking at anyone else, and only considered alternative vendors if the primary vendor didn’t have anything that even smelled like what the airline was looking for.

All of which resulted in the airline getting a backup system that was designed for small businesses since that’s all that vendor had to offer (and still does).

The problem is that the backup product simply could not scale to what I needed. I ended up having to stop file-level logging and could only restore entire directories, since the backup database couldn’t handle the load even with multiple instances of the tool running for multiple environments. Some of those directories were pretty large, so you can imagine the hilarity that ensued when trying to restore stuff…

The vendor’s crack development team came over from overseas and spent days with me trying to figure out what they needed to change about the product to make it work in my environment (I believe it was the single largest installation they had).

Problem is, they couldn’t deliver soon enough, so, after much pain, the airline moved to a proper enterprise backup system from another vendor, which fixed most problems, given the technology I had to work with at the time.

Had the right decision been made up front, none of that pain would have been experienced. The world would have been one IT anecdote short, but that’s a small price to pay for a working environment. And this is but just one way that single vendor stacks can fail.

How does one decide on a stack?

Let’s examine a high-level view of a few stack offerings. By no means an all-inclusive list.

Microsoft: They offer an OS (catering from servers to phones), a virtualization engine, a DB, a mail system, a backup tool and the most popular office apps in the world, among many other things. Few will argue that all the bits are best-of-breed, despite being hugely popular. Microsoft doesn’t like playing in the hardware space, so they’re a pure software stack. Oh, there’s the XBox, too.

EMC: Various kinds of storage platforms (7-10 depending on how you count), all but one (Symmetrix) coming from acquisition. A virtualization engine (80% owner of VMware – even though it’s run as a totally separate entity). DB, many kinds of backup, document management, security and all other kinds of software. Some bits are very strong, others not so much.

Oracle: They offer an OS, a virtualization engine, a DB, middleware, servers and storage. An office suite. No networking. Oracle is a good example of an incomplete software/hardware stack. Aside from the ultra-strong DB and good OS, few will say their products are all best-of-breed.

Dell: They offer servers, desktops, laptops, phones, various flavors of storage, and switches. Dell is an example of a pure hardware stack. Not many software aspirations here. Few will claim any of their products are best-of-breed.

HP: They offer servers, desktops, laptops, phones, even more flavors of storage, a UNIX OS with its own type of virtualization (it can’t run x86), switches, backup software, a big services arm, printers, calculators… All in all, great servers, calculators and printers; not so sure about the rest. Fairly complete stack.

IBM: Servers, 2 strong DBs, at least 3 different OSes, CPUs, many kinds of storage, an email system, middleware, backup software, an immense services arm. No x86 virtualization (they do offer virtualization for their other platforms). Very complete stack, albeit without networking.

Cisco: All kinds of networking (including telephony), servers. Limited stack if networking is not your bag, but what it offers can be pretty good.

Apple: Desktops, laptops, phones, tablets, networking gear, software. Great example of a consumer-oriented hardware and software stack. They used to offer storage and servers but they exited that business.

Notice anything common about the various single-vendor stacks? Did you find a stack that can truly satisfy all your IT needs without giving anything up?

The fact of the matter is that none of the above companies, as formidable as they are, offers a complete stack – software and hardware. You can get some of the way there, but it’s next to impossible to single-source everything without shooting yourself in the foot. At a minimum, you’re probably looking at some kind of Microsoft stack + server/storage stack + networking stack – mixing a minimum of 3 different vendors to get what you need without sacrificing much (the above example assumes you either don’t want to virtualize or are OK with Hyper-V).

Most companies have something like this: Microsoft stack + virtualization + DB + server + storage + networking – 6 total stacks.

So why do people keep pushing single-vendor stacks?

Only a few valid reasons (aside from it being fashionable to talk about). One of them is control – the more stuff you have from a company, the tighter their hold on you. The other is that it at least limits the support points you have to deal with, and can potentially get you better pricing (theoretically). For instance, Dell has “given away” many an Equallogic box. Guess what – the cost of that box was blended into everything else you purchased; it’s all a shell game. But if someone does buy a lot of gear from a single vendor, there are indeed ways to get better deals. You just won’t necessarily get best-of-breed or even good-enough gear.

What about integration?

One would think that buying as much as possible from a single vendor gets you better integration between the bits. Not necessarily. For instance, most large vendors acquire various technologies and/or have OEM deals – if one looks just at storage as an example, Dell has their own storage (Equallogic and Compellent – two different acquisitions) plus an OEM deal with EMC. There’s not much synergy between the various technologies.

HP has their own storage (EVA, Lefthand, Ibrix, Polyserve, 3Par – four different acquisitions) and two OEM deals with Dot Hill for the MSA boxes and HDS for the high-end XP systems. That’s a lot of storage tin to keep track of (all of which comes from 7 different places and 7 totally different codebases), and any HP management software needs to be able to work with all of those boxes (and doesn’t).

IBM has their own storage (XIV, DS6000, DS8000, SONAS, SVC – I believe three homegrown and two acquisitions) and two different OEM deals (NetApp for N Series and LSI Logic for DS5000 and below). The integration between those products and the rest of the IBM landscape should be examined on a case-by-case basis.

EMC’s challenge is that they have acquired too much stuff, making it difficult to provide proper integration for everything. Supporting and developing for that plethora of systems is not easy and teams end up getting fragmented and inconsistent. Far too many codebases to keep track of. This dilutes the R&D dollars available and prolongs development cycles.

Aren’t those newfangled special-purpose multi-vendor stacks better?

There’s another breed of stack: the special-purpose one, where a third party combines gear from different vendors, assembles it and sells it as a supported, pre-packaged solution for a specific application. Such stacks are actually not new – they have been sold for military, industrial and healthcare applications for the longest time. Recently, NetApp and EMC have been promoting different versions of what a “virtualization stack” should be (as usual, with very different approaches – check out FlexPod and Vblock).

The idea behind the “virtualization stack” is that you sell the customer a rack that has inside it network gear, servers, storage, management and virtualization software. Then, all the customer has to do is load up the gear with VMs and off they go.

With such a stack, you don’t limit the customer by making them buy their gear all from one vendor, but instead you limit them by pre-selecting vendors that are “best of breed”. Not everyone will be OK with the choice of gear, of course.

Then there’s the issue of flexibility – some of the special-purpose stacks literally are like black boxes – you are not supposed to modify them or you lose support. To the point where you’re not allowed to add RAM to servers or storage to arrays, both limitations that annoy most customers, but are viewed as a positive by some.

Is it a product or a “kit”?

Back to the virtualization-specific stacks: this is the main argument – do you buy a ready-made “product”, or a “kit” that some third party assembles by following a detailed design guide? As of this writing (there have been multiple changes to how this is marketed), Vblock is built by a company known as VCE – effectively a third party that puts together a custom stack made of different kinds of EMC storage, Cisco switches and servers, VMware, and a management tool called UIM. It is not built by EMC, VMware or Cisco. VCE then resells the assembled system to other VARs or directly to customers.

NetApp’s FlexPod is built by VARs. The difference is that more than one VAR can build FlexPods (as long as they meet specific criteria), and they don’t need to involve a middleman (which also translates to more profit for the VARs). All VARs building FlexPods need to follow specific guidelines (jointly designed by VMware, Cisco and NetApp), use components and firmware tested and certified to work together, and add best-of-breed management software to manage the stack.

The FlexPod emphasis is on sizing and performance flexibility (from tiny to gigantic), Secure Multi Tenancy (SMT – a unique differentiator), space efficiency, application integration, extreme resiliency and network/workload isolation – all highly important features in virtualized environments. In addition, it supports non-virtualized workloads.

Ultimately, in both cases the customer ends up with a pre-built and pre-tested product.

What about support?

This has been both the selling point and the drawback of such multi-vendor stacks. In my opinion, it has been the biggest selling point for Vblock, since a customer calls VCE for support. VCE has support staff that is trained on Cisco, VMware and EMC and can handle many support cases via a single support number – obviously a nice feature.

Where this breaks down a bit: VCE has to engage VMware, EMC and Cisco for anything serious. Furthermore, Vblock support doesn’t cover the entire stack; it stops at the hypervisor.

For instance, if a customer hits an Enginuity (Symmetrix OS) bug, the EMC Symm team will have to be engaged, and will possibly have to write a patch and communicate with the customer directly. VCE support simply cannot fix such issues, and is best viewed as first-level support. The same goes for Cisco or VMware bugs, and in general for the deeper support issues the VCE support staff simply isn’t trained to resolve. In addition, Vblocks can be based on several different kinds of EMC storage, each of which requires a different team to support it.

Finally – ask VCE if they are legally responsible for maintaining the support SLAs. For instance, who is responsible if there is a serious problem with the array and the vendor takes 2 days to respond instead of 30 minutes?

FlexPod utilizes a cooperative support model between NetApp, Cisco and VMware, and cases are dealt with by experts from all three companies working in concert. The first company to be called owns the case.

When the going gets tough, both approaches work similarly. For easy cases that can be resolved by the actual VCE support personnel, Vblock probably has an edge.

Who needs the virtualization stack?

I see several kinds of customers that have a need for such a stack (combinations are possible):

  1. The technically/time constrained. For them, it might be easier to get a somewhat pre-integrated solution. That way they can get started more quickly.
  2. The customers needing to hide behind contracts for legal/CYA reasons.
  3. The large. They need such huge amounts of servers and storage, and so frequently, that they simply don’t have the time to put it in themselves, let alone do the testing. They don’t even have time to have a PS team come onsite and build it. They just want to buy large, ready-to-go chunks, shipped as ready to use as possible.
  4. The rapidly growing.
  5. Anyone wanting pre-tested, repeatable and predictable configurations.

Interesting factoid for the #3 case (large customers): they typically need extensive customizations, and most of the time would prefer custom-built infrastructure pods with the exact configuration and software they need. For instance, some customers might prefer certain management software over others, and/or want systems to come preconfigured with some of their own customizations in place – but they are still looking for the packaged-product experience. FlexPod is flexible enough to allow that without deviating from the design. Of course, if the customer wants to deviate dramatically (e.g. not use one of the main components, like the Cisco switches or servers), then it stops being a FlexPod and you’re back to building systems the traditional way.

What should customers really be looking for when building a stack?

In my opinion, customers should be looking for a more end-to-end experience.

You see – even with the virtualization stack in place, you will need to add:

  • OSes
  • DBs
  • Email
  • File Services
  • Security
  • Document Management
  • Chargeback
  • Backup
  • DR
  • etc etc.

You should partner with someone that can help you not just with the storage/virtualization/server/network stack, but also with:

  • Proper alignment of VMs to maintain performance
  • Application-level integration
  • Application protection
  • Application acceleration

In essence, treat things holistically. The vendor that provides your virtualization stack needs to be able to help you all the way up to, say, configuring Exchange properly so it doesn’t break best practices, and ensuring that firmware revs on the hardware don’t clash with software patch levels.
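
To make that firmware/patch-level point a bit more concrete, here is a tiny, purely hypothetical sketch of the kind of interoperability check a good stack partner effectively does for you. Every component name and version combination below is invented for illustration; real interop matrices come from the vendors themselves.

```python
# Hypothetical interop-matrix check: is the deployed combination of versions
# one that was actually tested together? All names/versions are made up.

TESTED_COMBINATIONS = {
    # (hypervisor, storage firmware, switch firmware)
    ("Hypervisor 4.1", "StorageOS 8.0.1", "SwitchOS 5.0"),
    ("Hypervisor 4.1", "StorageOS 8.0.2", "SwitchOS 5.1"),
}

def validate_stack(hypervisor: str, storage_fw: str, switch_fw: str) -> bool:
    """Warn when a deployed combination was never tested as a whole."""
    deployed = (hypervisor, storage_fw, switch_fw)
    if deployed in TESTED_COMBINATIONS:
        print(f"OK: {deployed} is a tested combination")
        return True
    print(f"WARNING: {deployed} was never tested together - expect finger-pointing")
    return False

# A storage firmware upgrade done in isolation quietly breaks the matrix:
validate_stack("Hypervisor 4.1", "StorageOS 8.0.2", "SwitchOS 5.0")
```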

You still won’t avoid support complexity. Sure, maybe you can have Microsoft do a joint support exercise with the hardware stack VAR, but no matter how you package it, you are going to be touching the support organizations of the various vendors that make up the entire stack.

And you know what?

It’s OK.

D

 
