Recovery Monkey
Musings on storage, tuning, backups and more - delivered with a no-nonsense approach

The Importance of Automated Headroom Management
Published August 22, 2016

Before we begin: This is another vendor-neutral post. I realize there may be no architecture that can do everything I’m proposing, but some may come closer to what you need than others. Whether you’re a vendor or a customer, see it as stuff you should be doing or be asking for respectively…

Headroom!

Headroom is a term that applies to almost all technologies, and it’s crucially important for all of them. Some examples:

  • Photography
  • Cars
  • Bridges
  • Storage arrays…

Why is Sufficient Headroom Important?

Maintaining sufficient headroom in any solution is a way to ensure safety and predictability of operation under most conditions (especially under unfavorable ones).

For instance, if the maximum load an evenly loaded bridge can bear before collapsing is X, the overall recommended load will be a fraction of that. In addition, the weight, length and axle count of any single truck on the bridge are subject to strict limits, in order to avoid excessive localized stress on the structure.

Headroom in Storage Arrays

Apologies to the seasoned storage pros for all the foundational material but it’s crucial to take this step by step.

It is important to note that headroom in arrays is not necessarily as simple as how busy the CPU is. Headroom is a multi-dimensional concept.

More factors than just CPU come into play, including how busy the underlying storage media are, how saturated various buses are, and how much of the CPU is spent on true workload vs opportunistic tasks (that could be deferred). Not to mention that in some systems, certain tasks are single-threaded and could pose an overall headroom bottleneck by maxing out a single CPU core, while the rest of the CPU is not busy at all.

Maintaining sufficient headroom in storage arrays is necessary in order to provide acceptable latency, especially in the event of high load during a controller failover. Depending on the underlying architecture of an array, different headroom approaches and calculations are necessary.

Some examples of different architectures:

  • Active-Active controllers, per-controller pool
  • Active-Standby, single pool
  • Active-Active, single pool
  • Grid, single pool
  • Permutations thereof (it’s beyond the scope of this article to explore all possible options)

The single vs multiple pool question complicates things a bit, plus things like disk ownership are also hugely important. This isn’t an argument about which architecture is better (it depends anyway), but rather about headroom management in different architectures.

Dual-Controller Headroom

Dual-controller architectures need to be extremely careful with headroom management. After all, there are only two controllers in play. Here’s what sufficient headroom looks like in a dual-controller system:

[Figure: HeadroomHA2]

There are not many options to keep things healthy in a dual-controller architecture. In an Active-Standby system, the Standby controller is ready to immediately take over. There is no danger in loading up the Active controller, aside from expected load-related latency.

In an Active-Active HA system, maintaining a healthy amount of headroom has to be managed so that there is, overall, an entire controller’s worth of free headroom available.

Headroom in a Cluster of HA Pairs Architecture

There are several implementations that make use of a multiple HA Pair architecture. Often, the multiple HA pairs present a virtual pool to the outside world, even if, internally, there are multiple private pools. Some implementations just keep it to pools owned by each controller.

Here’s an example of healthy headroom in such a system:

[Figure: HeadroomMultiHA2]

Even though there are multiple controllers (at least 4 total), in order to maintain an overall healthy system, a full controller’s worth of headroom (100%) needs to be maintained in each HA pair; otherwise the performance of an underlying private pool (in green) might suffer, making the overall virtual pool performance (light blue) unpredictable.

Headroom in a True Grid Architecture

Grid Architectures spread overall load among multiple nodes (often plain servers with some disks inside and connected via a network).

In such a scheme, overall headroom that needs to be maintained per node as a percentage is 100/N, where N is the number of nodes in the storage cluster.

So, in a 4-node cluster, 100/4=25% headroom per node needs to be maintained.

This doesn’t account for the significant work that rebalancing after a node failure takes in such architectures, nor the capacity headroom needed, but it’s roughly accurate enough for our purposes.
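The 100/N rule is trivial to express in code. The snippet below is just my own illustration, and exactly as noted above it ignores rebalancing work and capacity headroom:

  def grid_headroom_per_node(node_count):
      # Reserve enough idle performance on every node so that the work of
      # one failed node can be absorbed by the survivors.
      if node_count < 2:
          raise ValueError("a grid needs at least 2 nodes")
      return 100.0 / node_count

  for n in (4, 8, 16):
      print(n, "nodes ->", round(grid_headroom_per_node(n), 1), "% headroom per node")
  # 4 nodes -> 25.0 % headroom per node
  # 8 nodes -> 12.5 % headroom per node
  # 16 nodes -> 6.2 % headroom per node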

Schematically:

[Figure: HeadroomGrid2]

How Headroom is Managed is Crucial

In order to manage headroom, four things need to be able to happen first:

  1. Be able to calculate headroom
  2. Be able to throttle workloads
  3. Be able to prioritize between types of workload
  4. Be able to move workloads around (architecture-dependent).

The only architecture that inherently makes this a bit easier is Active-Standby since there is always a controller waiting to take over if anything bad happens. But even with a single active controller, headroom needs to be managed in order to avoid bad latency conditions during normal operation (see here for an example approach). Remember, headroom is a multi-dimensional thing.

Example Problem Case: Imbalanced & Overloaded Controllers

Consider the following scenario: An Active-Active system has both controllers overloaded, and one of them is really busy:

[Figure: Headroom Imbalanced2]

Clearly, there are a few problems with this picture:

  1. It may be impossible to fail over in the event of a controller failure (total load is 165% of a single controller’s headroom)
  2. The first controller may already be experiencing latency issues
  3. Why did the system even get to this point in the first place?

This is a commonplace occurrence unfortunately.
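A back-of-the-envelope check shows why this picture is broken. The sketch below is purely illustrative; any load split adding up to the 165% from the example (say 90% and 75%) fails the test:

  def can_fail_over(load_a_pct, load_b_pct):
      # In an Active-Active pair, the surviving controller must be able to
      # absorb the failed peer's load; the total must fit in one controller.
      return (load_a_pct + load_b_pct) <= 100

  print(can_fail_over(90, 75))   # False -> 165% cannot fit on one controller
  print(can_fail_over(50, 45))   # True  -> survivor runs at 95%, tight but possible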

Automation is Key in Managing Headroom

The biggest problem in our example is actually the last point: Why was the system allowed to get to that state to begin with?

Even if a system is able to calculate headroom, throttle workloads and move workloads around, if nothing is done automatically to prevent problems, it’s extremely easy for users to get into the problem situation depicted above. I’ve seen it affect critical production systems far too many times for comfort.

Manual QoS is Not The Best Answer

Being able to manually throttle workloads can obviously help in such a situation. The problems with the manual QoS approach are outlined in a past article, but, in summary, most users simply have absolutely no idea what the actual limits should be (nor should they be expected to). Most importantly, placing QoS limits up front doesn’t result in balanced controllers… and may even result in other kinds of performance problems.

Of course, using QoS limits reactively is not going to prevent the problem from occurring in the first place.

Some companies offer Data Classification as a Professional Services engagement, in order to try and figure out an IOPS/TB/Application metric. Even if that is done, it doesn’t result in balanced controllers… it’s also not very useful in dynamic environments. It’s more used as a guideline for setting up manual QoS.

Automation Mechanisms to Consider for Managing Headroom

Clearly, pervasive automation is needed in order to keep headroom at safe levels.

I will split up the proposed mechanisms per architecture. There is some common functionality needed regardless of architecture:

Common Automation Needed

Every architecture needs to have the ability to automatically achieve the following, without user intervention at any point:

  1. Conserve headroom per controller
  2. Differentiate between different kinds of user workloads
  3. Differentiate between different kinds of system workloads
  4. Automatically prioritize between different workloads, especially under pressure
  5. Automatically throttle different kinds of workloads, especially under pressure
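None of the above maps to any particular vendor's implementation, but the decision logic is easy to sketch. Everything in the following toy example (the workload classes, the thresholds, the "each throttle buys ~10%" guess) is invented purely for illustration:

  # Toy policy: when measured headroom drops below a floor, defer or throttle
  # opportunistic system work before touching latency-sensitive user I/O.
  PRIORITY = {"user_latency_sensitive": 0, "user_sequential": 1,
              "system_critical": 2, "system_opportunistic": 3}

  def throttle_plan(headroom_pct, workloads):
      if headroom_pct >= 30:
          return []                      # plenty of headroom, touch nothing
      # Throttle lowest-priority (highest number) workloads first.
      ordered = sorted(workloads, key=lambda w: PRIORITY[w["class"]], reverse=True)
      to_throttle = []
      for w in ordered:
          if w["class"] == "user_latency_sensitive":
              break                      # never starve latency-sensitive user I/O
          to_throttle.append(w["name"])
          if headroom_pct + 10 * len(to_throttle) >= 30:
              break                      # crude guess: each throttle buys ~10%
      return to_throttle

  work = [{"name": "OLTP DB", "class": "user_latency_sensitive"},
          {"name": "backup stream", "class": "user_sequential"},
          {"name": "post-process dedupe", "class": "system_opportunistic"}]
  print(throttle_plan(12, work))   # ['post-process dedupe', 'backup stream']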

And now for the extra automation needed per architecture:

Active-Standby Automation

If in a single HA pair, nothing else is needed. If in a scale-out cluster of Active-Standby pairs:

  1. Automatically balance capacity and headroom utilization between HA pairs even if they’re different types
  2. Be able to auto-migrate workloads to other cluster nodes (if using multiple pools instead of one)

Active-Active Automation

  1. Automatically conserve one node’s worth of headroom across the HA pair (50/50, 60/40, 70/30 – all are OK)
  2. When provisioning new workloads, auto-balance them by performance and capacity across the nodes
  3. Be able to balance by auto-migrating workloads to the other node (if using multiple pools instead of one)

Active-Active with Multiple HA Pairs Automation

  1. Automatically conserve one node’s worth of headroom per HA pair
  2. Be able to auto-migrate workloads to any other node
  3. Automatically balance workloads and capacity utilization in the underlying per-HA pools

Grid Automation

  1. Automatically conserve at least one node’s worth of headroom across the grid
  2. Automatically conserve enough capacity to be able to lose one node, rebalance, and have enough capacity left to lose another one (the more cautious may want the capability to lose 2-3 nodes simultaneously)
  3. Automatically take into account grid size and rebalancing effort in order to conserve the right amount of headroom
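The capacity part of point 2 reduces to simple arithmetic. The sketch below is mine, with made-up node counts: after losing k nodes and rebalancing, all data must still fit on the survivors.

  def max_fill_fraction(node_count, tolerated_node_losses):
      # After losing k nodes and rebalancing, the data must fit on (N - k) nodes.
      survivors = node_count - tolerated_node_losses
      if survivors < 1:
          raise ValueError("cannot tolerate that many losses")
      return survivors / node_count

  print(max_fill_fraction(8, 1))   # 0.875 -> keep the grid below ~87% full
  print(max_fill_fraction(8, 2))   # 0.75  -> below 75% full to survive two losses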

In Closing…

If you’re a consumer of storage systems, always remember to be running your storage with sufficient headroom to be able to sustain a major failure without overly affecting your performance.

In addition, when looking to refresh your storage system, always ask the vendors about how they automate their headroom management.

Finally, if any vendor is quoting you performance numbers, always ask them how much headroom is left on the array at that performance level… (in addition to the extra questions about read/write percentages and latencies you should be asking already).

The answer may surprise you.

D

The Importance of SSD Firmware Updates
Published August 3, 2016

I wanted to bring this crucial issue to light since I’m noticing several storage vendors being either cavalier about this or simply unaware.

I will explain why solutions that don’t offer some sort of automated, live SSD firmware update mechanism are potentially extremely risky propositions. Yes, this is another “vendor hat off, common sense hat on” type of post.

Modern SSD Architecture is Complex

The increased popularity and lower costs of fast SSD media are good things for storage users, but there is some inherent complexity within each SSD that many people are unaware of.

Each modern SSD is, in essence, an entire pocket-sized storage array, that includes, among other things:

  • An I/O interface to the outside world (often two)
  • A CPU
  • An OS
  • Memory
  • Sometimes Compression and/or Encryption
  • What is, in essence, a log-structured filesystem, complete with complex load balancing and garbage collection algorithms
  • An array of flash chips driven in parallel through multiple channels
  • Some sort of RAID protection for the flash chips, including sparing, parity, error checking and correction…
  • A supercapacitor to safely flush cache to the flash chips in case of power failure.

Sounds familiar?

With Great Power and Complexity Come Bugs

To make something clear: This discussion has nothing to do with overall SSD endurance & hardware reliability. Only the software aspect of the devices.

All this extra complexity in modern SSDs means that an increased number of bugs compared to simpler storage media is a statistical certainty. There is just a lot going on in these devices.

Bugs aren’t necessarily the end of the world. They’re something understood, a fact of life, and there’s this magical thing engineers thought of called… Patching!

As a fun exercise, go to the firmware download pages of various popular SSDs and check the release notes for some of the bugs fixed. Many fixes address some rather abject gibbering horrors… 🙂

Even costlier enterprise SSDs have been afflicted by some really dangerous bugs – usually latent defects (as in: they don’t surface until you’ve been using something for a while, which may explain why these bugs were missed by QA).

I fondly remember a bug that hit some arrays at a previous place of employment: the SSDs would work great but after a certain number of hours of operation, if you shut your machine down, the SSDs would never come up again. Or, another bug that hit a very popular SSD that would downsize itself to an awesome 8MB of capacity (losing all existing data of course) once certain conditions were met.

Clearly, these are some pretty hairy situations. And, what’s more, RAID, checksums and node-level redundancy wouldn’t protect against all such bugs.

For instance, think of the aforementioned power off bug – all SSDs of the same firmware vintage would be affected simultaneously and the entire array would have zero SSDs that functioned. This actually happened, I’m not talking about a theoretical possibility. You know, just in case someone starts saying “but SSDs are reliable, and think of all the RAID!”

It’s all about approaching correctness from a holistic point of view. Multiple lines of defense are necessary.

The Rules: How True Enterprise Storage Deals with Firmware

Just like with Fight Club, there are some basic rules storage systems need to follow when it comes to certain things.

  1. Any firmware patching should be a non-event. Doesn’t matter what you’re updating, there should be no downtime.
  2. ANY firmware patching should be a NON-EVENT. Doesn’t matter what you’re updating, there should be NO downtime!
  3. Firmware updates should be automated even when dealing with devices en masse.
  4. The customer should automatically be notified of important updates they need to perform.
  5. Different vintage and vendor component updates should be handled automatically and centrally. And, most importantly: Safely.

If these rules are followed, bug risks are significantly mitigated and higher uptime is possible. Enterprise arrays typically will follow the above rules (but always ask the vendor).

Why Firmware Updating is a Challenge with Some Storage Solutions

Certain kinds of solutions make it inherently harder to manage critical tasks like component firmware updates.

You see, being able to hot-update different kinds of firmware in any given set of hardware means that the mechanism doing the updating must be intimately familiar with the underlying hardware & software combination, however complex.

Consider the following kind of solution, maybe for someone sold on the idea that white box approaches are the future:

  • They buy a bunch of diskless server chassis from Vendor A
  • They buy a bunch of SSDs from Vendor B
  • They buy some Software Defined Storage offering from Vendor C
  • All running on the underlying OS of Vendor D…

Now, let’s say Vendor B has an emergency SSD firmware fix they made available, easily downloadable on their website. Here are just some of the challenges:

  1. How will that customer be notified by Vendor B that such a critical fix is available?
  2. Once they have the fix located, which Vendor will automate updating the firmware on the SSDs of Vendor B, and how?
  3. How does the customer know that Vendor B’s firmware fix doesn’t violently clash with something from Vendor A, C or D?
  4. How will all that affect the data-serving functionality of Vendor C?
  5. Can any of Vendors A, B, C or D orchestrate all the above safely?
  6. With no downtime?

In most cases I’ve seen, the above chain of events will not even progress past #1. The user will simply be unaware of any update, simply because component vendors don’t usually have a mechanism that alerts individual customers regarding firmware.

You could inject a significant permutation here: What if you buy the servers pre-built, SSDs included, from Vendor A, with full certification for Vendors C and D?

Sure – it still does not materially change the steps above. One of Vendors A, C or D still needs to somehow:

  1. Automatically alert the customer about the critical SSD firmware fix being available
  2. Be able to non-disruptively update the firmware…
  3. …While not clashing with the other hardware and software from Vendors A, C and D

I could expand this type of conversation to other things like overall environmental monitoring and checksums but let’s keep it simple for now and focus on just component firmware updates…

Always Remember – Solve Business Problems & Balance Risk

Any solution is a compromise. Always make sure you are comfortable with the added risk certain areas of compromise bring (and that you are fully aware of said risk).

The allure of certain approaches can be significant (at the very least because of lower promised costs). It’s important to maintain a balance between increased risk and business benefit.

In the case of SSDs specifically, the utter criticality of certain firmware updates means that it’s crucially important for any given storage solution to be able to safely and automatically address the challenge of updating SSD firmware.

D

The Importance of the Effective Capacity Ratio in Modern Storage Systems
Published May 19, 2016

In modern storage devices (especially All Flash Arrays), extensive data reduction techniques are commonplace and expected by customers.

This has, unavoidably, led to various marketing schemes that aim to make certain systems seem more appealing than the rest. Or at least not less appealing…

I will attempt to explain what customers should be looking for when trying to decipher capacity claims from a manufacturer.

In a nutshell – and for the ADD-afflicted – the most important number you should be looking for is the Effective Capacity Ratio, which is simply: (Effective Capacity)/(Raw Capacity). Ignore the more common but far less useful Data Reduction Ratio, which is: (Effective Capacity)/(Usable Capacity).

Read on for more detail…

The problem

Most modern (and some not so modern) storage vendors claim a similar Data Reduction Ratio for their devices. Numbers like 5:1 seem commonplace these days. Which, to anyone with a modicum of common sense, simply means “I can store five times more stuff in the same space”. Unfortunately, the same 5:1 ratio does not mean the same end result for all vendors…

Why this is a problem

Customers simply want to get a good deal for their money. The reality is that the exact same 5:1 Data Reduction Ratio may mean very different things between storage devices.

For instance: Not all vendors count space savings using the same math. Is the virtual size of snapshots calculated in the final savings ratio? (for example, if I take 5 snaps of a 1TB volume one after another, is that showing up as 6:1 savings?)

How about Thin Provisioning? Is that included? Such math can wildly alter the overall efficiency numbers.

Here’s a clear example of why counting thin provisioning in the overall ratio is probably misleading, and best used only for educational purposes… Remember that overall savings ratios are multiplicative. For example, 2:1 compression and 2:1 deduplication mean 4:1 overall savings:

[Figure: Thin Craziness]

Does anyone really believe that such a system is actually providing 1000:1 savings? Or even 10:1? Yet that’s what many vendors are counting towards savings ratios, without separating the thin provisioning savings from the overall savings number.
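To see how quickly thin provisioning distorts a multiplicative savings number, here is the arithmetic spelled out (the 10TB volume with only 40GB written is my own made-up example):

  def overall_ratio(compression, dedupe, thin_provisioning=1.0):
      # Savings ratios multiply; including thin provisioning can dominate the result.
      return compression * dedupe * thin_provisioning

  # 2:1 compression and 2:1 dedupe -> an honest 4:1
  print(overall_ratio(2, 2))                          # 4
  # Add a volume thin-provisioned at 10 TB with only 40 GB written (250:1)...
  print(overall_ratio(2, 2, thin_provisioning=250))   # 1000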

However, there is another dimension, and it has to do with how efficiently the raw capacity is actually utilized. It will be one of my many Captain Obvious moments for some of you, but I’ve seen enough people get confused, so it’s worth explaining.

The Effective Capacity Ratio

Different storage systems have different ways of utilizing their raw capacity. For example, a mirrored system can never have better than a 50% Usable:Raw ratio. By definition, it’s mathematically impossible since 2 copies are needed. That’s not even counting spares and other possible overheads.

Systems that do triple mirroring can’t do better than a theoretical 33.3% Usable:Raw etc.

Some definitions are in order:

  • Usable Capacity: How much data I can store in a system after overheads such as RAID, sparing etc. but before data reduction techniques.
  • Raw Capacity: Add up the capacity of all the storage media in the system. Usually a Base 10 number (TB/GB, not Base 2, TiB/GiB)
  • Effective Capacity: How much data I can store in a system after data reduction techniques like Deduplication and Compression but not Thin Provisioning
  • Data Reduction Ratio: (Effective Capacity)/(Usable Capacity)
  • Effective Capacity Ratio: (Effective Capacity)/(Raw Capacity)
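Expressed as code, the two ratios look like this (a simple sketch based on the definitions above; the two systems in the example are hypothetical):

  def data_reduction_ratio(effective_tb, usable_tb):
      return effective_tb / usable_tb

  def effective_capacity_ratio(effective_tb, raw_tb):
      return effective_tb / raw_tb

  # Two hypothetical systems, both claiming 5:1 data reduction on 100 TB raw:
  for usable_pct in (10, 70):
      usable = 100.0 * usable_pct / 100    # usable capacity in TB
      effective = usable * 5               # 5:1 data reduction applied to usable
      print(usable_pct, "% usable:raw -> DRR =",
            data_reduction_ratio(effective, usable),
            ", ECR =", effective_capacity_ratio(effective, 100.0))
  # 10 % usable:raw -> DRR = 5.0 , ECR = 0.5
  # 70 % usable:raw -> DRR = 5.0 , ECR = 3.5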

Who is More Efficient?

If every vendor is claiming 5:1 average savings, who is truly more efficient? A Reductio ad Absurdum example makes it pretty clear:

  1. A vendor that can do a Data Reduction Ratio of 5:1 but has a Usable Capacity of 10% vs Raw or…
  2. A vendor that can do a Data Reduction Ratio of 5:1 but has a Usable Capacity of 70% vs Raw?

Let’s put some numbers on a table. They roughly correspond to some existing storage vendors today (there may be some variation depending on whether the numbers for each line are TB vs TiB – everyone shows numbers differently – but the overall point remains the same):

[Figure: Effective Cap Ratio]

As you can see, the same Data Reduction Ratio, on the same amount of Raw Capacity, can have wildly different Effective Capacity results, depending on the system.

Clearly, a truly efficient system is one that can both:

  • Provide a high Usable:Raw ratio and
  • Provide a high Data Reduction Ratio (that does not include fluff like Thin Provisioning)

The Business Benefits of a High Effective Capacity Ratio

There are multiple business reasons why chasing a high Effective Capacity Ratio is important:

  • High rack density. Some vendors are eight times more dense than others (ironically, the ones claiming “white box economics” are the least dense and most expensive). This could lead to significant power/cooling and $/rack savings
  • Lower CapEx – if a vendor needs 2x the Raw Capacity vs another, guess who’s paying for the difference?
  • Less complexity – denser devices mean fewer devices, fewer cables and less switching

What you can do as a Customer

Level the playing field. It’s actually pretty easy:

  • Insist on seeing all capacity numbers expressed as TiB from every vendor – and if they don’t know what that means, run… (or at least subtract 9% from any capacity number you see and you’ll be safer)
  • Insist on seeing the Data Reduction Ratio without including Thin Provisioning or Snapshots. If they can’t break out the savings by category, run
  • Insist on seeing the Usable:Raw percentage for various configurations. Is it better in larger configs vs smaller? Is it following all best practices? What are the best practices for space consumption? If they tell you to ignore the man behind the curtain, run… (there is a theme developing)
  • Do your own Effective Capacity Ratio calculation and assign a $/Effective TiB to each vendor
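For the first and last bullets, the arithmetic is easy to script. The sketch below is mine, and the price, capacity and ratios in the example are invented:

  def tb_to_tib(tb):
      # Vendors quote decimal TB; binary TiB is a ~9% smaller number.
      return tb * 1e12 / 2**40

  def dollars_per_effective_tib(price_usd, raw_tb, usable_to_raw, data_reduction):
      effective_tib = tb_to_tib(raw_tb) * usable_to_raw * data_reduction
      return price_usd / effective_tib

  print(round(tb_to_tib(100), 1))       # 90.9 (TiB in 100 TB)
  # Invented example: $250,000 system, 100 TB raw, 70% usable:raw, 5:1 reduction
  print(round(dollars_per_effective_tib(250000, 100, 0.7, 5.0), 2))   # about 785 per effective TiB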

D


The Well-Behaved Storage System: Automatic Noisy Neighbor Avoidance
Published April 8, 2016

This topic is very near and dear to me, and is one of the big reasons I came over to Nimble Storage.

I’ve always believed that storage systems should behave gracefully and predictably under pressure. Automatically. Even under complex and difficult situations.

It sounds like a simple request and it makes a whole lot of sense, but very few storage systems out there actually behave this way. This creates business challenges and increases risk and OpEx.

The Problem

The simplest way to state the problem is that most storage systems can enter conditions where workloads can suffer from unfair and abrupt performance starvation under several circumstances.

OK, maybe that wasn’t the simplest way.

Consider the following scenarios:

  1. A huge sequential I/O job (backup, analytics, data loads etc.) happening in the middle of latency-sensitive transaction processing
  2. Heavy array-generated workloads (garbage collection, post-process dedupe, replication, big snapshot deletions etc.) happening at the same time as user I/O
  3. Failed drives
  4. Controller failover (due to an actual problem or simply a software update)

#3 and #4 are more obvious – a well-behaved system will ensure high performance even during a drive failure (or three), and after a controller fails over. For instance, if total system headroom is automatically kept at 50% for a dual-controller system (or, simplistically, 100/n, where n is the controller count for shared-everything architectures), even after a controller fails, performance should be fine.

#1 and #2 are a bit more complicated to deal with. Let’s look at this in more detail.

The Case of Competing Workloads During Hard Times

Inside every array, at any given moment, a balancing act occurs. Multiple things need to happen simultaneously.

Several user-generated workloads, for instance:

  • DB
  • VDI
  • File Services
  • Analytics

Various internal array processes – they also are workloads, just array-generated, and often critical:

  • Data reduction (dedupe, compression)
  • Cleanup (object deletion, garbage collection)
  • Data protection (integrity-related)
  • Backups (snaps, replication)

If the system has enough headroom, all these things will happen without performance problems.

If the system runs out of headroom, that’s where most arrays have challenges with prioritizing what happens when.

The most common way a system may run out of headroom is the sudden appearance of a hostile “bully” workload. This is also called a “noisy neighbor”. Here’s an example of system behavior in the presence of a bully workload:

[Figure: bully_vs_victim_2]

In this example, the latency-sensitive workload will greatly and unfairly suffer after the “noisy neighbor” suddenly appears. If the latency-sensitive workload is a mission-critical application, this could cause a serious business problem (slow processing of financial transactions, for instance).

This is an extremely common scenario. A lot of the time it’s not even a new workload. Often, an existing workload changes behavior (possibly due to an application change – for instance a patch or a modified SQL query). This stuff happens.

How some vendors have tried to fix the issue with Manual “QoS”

As always, there is more than one way to skin a cat, if one is so inclined. Here are a couple of manual methods to fix workload contention:

  • Some arrays have a simple IOPS or throughput limit that an administrator can manually adjust in order to fix a performance problem. This is an iterative and reactive method and hard to automate properly in real time. In addition, if the issue was caused by an internal array-generated workload, there is often no tooling available to throttle those processes.
  • Other arrays insist on the user setting up minimum, maximum and burst IOPS values for every single volume in the system, upon volume creation. This assumes the user knows in advance what performance envelope is required, in detail, per volume. The reality is that almost nobody knows these things beforehand, and getting the numbers wrong can itself cause a huge problem with latencies. Most people just want to get on with their lives and have their stuff work without babysitting.

Manual mechanisms for fixing the “bully” workload challenge result in systems that are hard to consume and complex to support while under performance pressure. Moreover, when a performance issue occurs, speed of resolution is critical. The issue needs to be resolved immediately, especially for latency-sensitive workloads. Manual methods will simply not be fast enough. Business will be impacted.
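To make the limitation concrete, a static manual IOPS cap amounts to little more than a token bucket with a hard-coded ceiling. The toy sketch below is not any vendor's QoS implementation, just an illustration of why such a limit is blind to actual array conditions:

  import time

  class ManualIopsCap:
      # A fixed ceiling chosen by an administrator; it knows nothing about
      # current headroom, other workloads, or why the number was picked.
      def __init__(self, iops_limit):
          self.iops_limit = iops_limit
          self.tokens = float(iops_limit)
          self.last = time.monotonic()

      def allow(self):
          now = time.monotonic()
          elapsed = now - self.last
          self.last = now
          self.tokens = min(self.iops_limit, self.tokens + elapsed * self.iops_limit)
          if self.tokens >= 1:
              self.tokens -= 1
              return True
          return False    # I/O waits or is rejected, regardless of actual array load

  cap = ManualIopsCap(5000)    # a guess somebody made months ago
  print(cap.allow())           # True (until the workload changes and 5000 is wrong)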

How Nimble Storage Fixed the Noisy Neighbor Issue

No cats were harmed in the process. Nimble engineers looked at the extensive telemetry in InfoSight, used data science, and neatly identified areas that could be massively automated in order to optimize system behavior under a wide variety of adverse conditions. Some of what was done:

  • Highly advanced Fair Share disk scheduling (separate mechanisms that deal with different scenarios)
  • Fair Share CPU scheduling
  • Dynamic Weight Adjustment – automatically adjust priorities in various ways under different resource contention conditions, so that the system can always complete critical tasks and not fall dangerously behind
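Nimble has not published the internals of these schedulers, so the following is nothing more than a generic illustration of weighted fair sharing with weights that shift under pressure; every class name, weight and threshold in it is invented:

  # Generic weighted fair sharing: each workload class gets CPU/disk time in
  # proportion to its weight, and weights shift when the system is under pressure.
  BASE_WEIGHTS = {"user_io": 6, "replication": 2, "garbage_collection": 2}

  def adjusted_weights(under_pressure, gc_backlog_high=False):
      w = dict(BASE_WEIGHTS)
      if under_pressure:
          w["user_io"] += 4              # favor latency-sensitive user I/O
          w["garbage_collection"] = 1    # defer opportunistic work...
      if gc_backlog_high:
          w["garbage_collection"] += 3   # ...unless it is falling dangerously behind
      return w

  def shares(weights):
      total = sum(weights.values())
      return {k: round(v / total, 2) for k, v in weights.items()}

  print(shares(adjusted_weights(under_pressure=False)))
  # {'user_io': 0.6, 'replication': 0.2, 'garbage_collection': 0.2}
  print(shares(adjusted_weights(under_pressure=True)))
  # {'user_io': 0.77, 'replication': 0.15, 'garbage_collection': 0.08}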

The end result is a system that:

  • Lets system latency increase gracefully and progressively as load increases
  • Carefully and automatically balances user and system workloads
  • Achieves I/O deadlines and preemption behavior
  • Eliminates the Noisy Neighbor problem without the need for any manual QoS adjustments
  • Allows latency-sensitive small-block I/O to proceed without interference from bully workloads

Such automation achieves a better business result: less risk, less OpEx, easy supportability, simple and safe overall consumption even under difficult conditions.

What Should Nimble Customers do to get this Capability?

As is typical with Nimble systems and their impressive Ease of Consumption, nothing fancy needs to be done apart from simply upgrading to a specific release of the code (in this case 3.1 and up – 2.3 did some of the magic but 3.1 is the fully realized vision).

A bit anticlimactic, apologies… if you like complexity, watching this instead is probably more fun than juggling QoS manually.

D


Going Green: Why I Joined Nimble Storage
Published March 28, 2016

I am proud to announce that, as of today, I am a member of the Nimble Storage team.

[Image: Nimble Logo]

This marks the end of an era – I spent quite a bit of time at NetApp: learned a lot, did a lot – by the end I had my hands in all kinds of sausage making… 🙂

I wish my friends at NetApp the best of luck for the future. The storage industry is a very tough arena, and one that will be increasingly harder and with less tolerance than ever before.

Why?

I compared Nimble Storage with many competitors before making my decision. Quite simply, Nimble’s core values agree with mine. It goes without saying that I wouldn’t choose to move to a company unless I believed they had the best technology (and the best support), but the core values is where it all starts. The product is built upon those core values.

I firmly believe that modern storage should be easy to consume. Indeed, it should be a joy to consume, even for complex multi-site environments. It should not be a burden to deal with. Nor should it be a burden to sell.

Systems that are holistically easy to consume have several business benefits, some of which are: lower OPEX and CAPEX, increased productivity, less risk, easier planning, faster problem resolution.

It’s important to understand that easy to consume is not at all the same as easy to use – that is but a very small subset of easy consumption.

The core value of easy consumption encompasses several aspects, many of which are ignored by most storage vendors. Most modern players will focus on ease of use, show demos of a pretty GUI and suchlike. “Look how easy it is to install” or “look how easy it is to create a LUN”. Well – there’s a lot more to worry about in real life.

The lifecycle of a storage system

Beyond initial installation and simple element creation, there is a multitude of things users and vendors need to be concerned with. Here’s a partial list:

  • Installation
  • Migration to/from
  • Provisioning
  • Host/fabric configuration
  • Backups, restores, replication
  • Scaling up/out
  • Upgrading from a smaller/older version
  • Firmware updates for all components (including drives)
  • Tech refresh
  • Support

What about more advanced topics?

A storage solution cannot exist in a vacuum. There are several ancillary (but extremely important) services needed in order to help consume storage, especially at scale. Services that typically cannot (and many that should not) reside on the storage system itself. How about…

  • Initial and future sizing
  • Capacity planning based on long-term usage data
  • Performance analysis and profiling
  • Performance issue resolution/recommendations
  • Root cause analysis
  • What-if scenario modeling
  • Support case resolution
  • Comprehensive end-to-end monitoring and alerting
  • Comprehensive reporting (including auditing)
  • Security (including RBAC and delegation)
  • Upgrade planning
  • Pervasive automation (including host-side)
  • Ensuring adherence to best practices

If a storage solution doesn’t make all or most of the above straightforward, then it is not truly easy to consume.

The problem:

Storage vendors will typically either be lacking in many of the above areas, or may need many different tools, and significant manual effort, in order to provide even some of these services.

Not having the tools creates an obvious problem – the customer and vendor simply can’t perform these functions, or the implementation is too basic. Most smaller vendors are in this camp. Not much functionality beyond what’s inside the storage device itself. Long-term consumption, especially at scale, becomes challenging.

On the other hand, having a multitude of tools to help with these areas also makes the solution hard to consume overall. Larger vendors fall into this category.

For instance: Customers may need to access many different tools just to monitor and alert on various metrics. One tool may provide certain information, another tool provides different metrics (often with significant overlap with the first tool), and so on. And not all tools work with all versions of the product. This increases administrative complexity and overall time and money spent. And the end result is often compromised and incredibly hard to support.

Vendors that need many different tools also create a problem for themselves: Almost nobody on staff will have the expertise to deal with the plethora of tools necessary to do certain things like sizing, performance troubleshooting or even a tech refresh. Or optimizing a product for specific workloads. Deep expertise is often needed to interpret the results of the tools. This causes interminable delays in problem resolution, lengthens sales cycles, complicates product development, creates staffing challenges, increases costs, and in general makes life miserable.


How?

What always fascinated me about Nimble Storage is that not only did they recognize these challenges, they actually built an entire infrastructure and innovative approach in order to solve the problem.

Nimble recognized the value of Predictive Analytics.

The challenge: How to use Big Data to solve the challenges faced by storage customers and storage vendors. And how to do this in a way that achieves a dramatically better end result.

While most vendors have call-home features, and some even have rudimentary capacity, configuration and maybe even performance telemetry being sent to some central repository (usually very infrequently), Nimble elected instead to send extremely comprehensive sensor telemetry to a huge analytics engine. A difficult undertaking, but one that would define the company in the years to come.

Nimble also recognized the need to do this from the very beginning. Each Nimble array sends 30-70 million data points back to Nimble every day. Trying to retrofit telemetry of this scope would be extremely difficult if not impossible to achieve effectively.

This wealth of data (the largest storage-related analytics engine in the world, by far) is used to help customers with the challenges mentioned previously, while at the same time lowering complexity.

It also, crucially, helps Nimble better support customers and design better products without having to bother customers for data dumps.

For example: What if a Nimble engineer trying to optimize SQL I/O performance wants to see detailed I/O statistics only for SQL workloads on all Nimble arrays in the world? Or on one array? Or on all arrays at a certain customer? It’s only a simple query away… and that’s just scratching the surface of what’s possible. It certainly beats trying to design storage based on arbitrary synthetic benchmarks, or manually grabbing performance traces from customer gear…

What?

Enter InfoSight. That’s the name of the gigantic analytics engine currently ingesting trillions of anonymized sensor data points every week. And growing. Check some numbers here

Nimble Storage customers do not need to install custom monitoring tools to perform highly advanced storage analytics, performance troubleshooting, and even hardware upgrade recommendations based on automated performance analysis heuristics.

No need to use the CLI, no need to manually send data dumps to the vendor, no need to use 10 different tools.

All the information customers need is available through a browser GUI. Even the vast majority of support cases are automatically handled by InfoSight, and I’m not talking about simply sending replacement hardware (that’s trivial).

I always saw InfoSight as the core offering of Nimble Storage, the huge differentiator that works hand in hand with the hardware and helps customers consume storage easily. Yes, Nimble Storage arrays are fast, reliable, easy to use, have impressive data reduction abilities, scale nicely, have great features, are cost-effective etc. But other vendors can claim they can satisfy at least some of those attributes.

Nobody else has anything even remotely the depth and capability of InfoSight. This is why Nimble calls their offering the Predictive Flash Platform. InfoSight Predictive Analytics + great hardware = Predictive Flash.

I will be covering this fascinating topic in a lot more depth in the future. An AI Expert System powered by a behemoth analytics engine, helping reduce complexity and making the solution Easy To Consume is a pretty impressive piece of engineering.

Watch this space…

D


7-Mode to Clustered ONTAP Transition
Published February 5, 2016

I normally deal with different aspects of storage (arguably far more exciting) but I thought I would write something to provide some common sense perspective on the current state of 7-Mode to cDOT adoption.

I will tackle the following topics:

  1. cDOT vs 7-Mode capabilities
  2. Claims that not enough customers are moving to cDOT
  3. 7-Mode to cDOT transition is seen by some as difficult and expensive
  4. Some argue it might make sense to look at competitors and move to those instead
  5. What programs and tools are offered by NetApp to make transition easy and quick
  6. Migrating from competitors to cDOT

cDOT vs 7-Mode capabilities

I don’t want to make this into a dissertation or take a trip down memory lane. Suffice it to say that while cDOT has most of the 7-Mode features, it is internally very different but also much, much more powerful than 7-Mode – cDOT is a far more capable and scalable storage OS in almost every possible way (and the roadmap is utterly insane).

For instance, cDOT is able to nondisruptively do anything, including crazy stuff like moving SMB shares around cluster nodes (moving LUNs around is much easier than dealing with the far more quirky SMB protocol). Some reasons to move stuff around the cluster could be node balancing, node evacuation and replacement… all done on the fly in cDOT, regardless of protocol.

cDOT also handles Flash much better (cDOT 8.3.x+ can be several times faster than 7-Mode for Flash on the exact same hardware). Even things like block I/O (FC and iSCSI) are completely written from the ground up in cDOT. Cloud integration. Automation. How failover is done. Or how CPU cores are used, how difficult edge conditions are handled… I could continue but then the ADD-afflicted would move on, if they haven’t already…

In a nutshell, cDOT is a more flexible, forward-looking architecture that respects the features that made 7-Mode so popular with customers, but goes incredibly further. There is no competitor with the breadth of features available in cDOT, let alone the features coming soon.

cDOT is quite simply the next logical step for a 7-Mode customer.

Not enough customers moving to cDOT?

The reality is actually pretty straightforward.

  • Most new customers simply go with cDOT, naturally. 7-Mode still has a couple of features cDOT doesn’t, and if those features are really critical to a customer, that’s when someone might go with 7-Mode today. With each cDOT release the feature delta list gets smaller and smaller. Plus, as mentioned earlier, cDOT has a plethora of features and huge enhancements that will never make it to 7-Mode, with much more coming soon.
  • The things still missing in cDOT (like WORM) aren’t even offered by the majority of storage vendors… many large customers use our WORM technology.
  • Large existing customers, especially ones running critical applications, naturally take longer to cycle technologies. The average time to switch major technologies (irrespective of vendor) is around 5 years. The big wave of cDOT transitions hasn’t even hit yet!
  • Given that cDOT 8.2 with SnapVault was the appropriate release for many of our customers and 8.3 the release for most of our customers, a huge number of systems are still within that 5-year window prior to converting to cDOT, given when those releases came out.
  • Customers with mission-critical systems will typically not convert an existing system – they will wait for the next major refresh cycle. Paranoia rules in those environments (that’s a general statement regardless of vendor). And we have many such customers.

7-Mode to cDOT transition is seen by some as difficult and expensive

This is a fun one, and a favorite FUD item for competitors and so-called “analysts”. I sometimes think we confused people by calling cDOT “ONTAP”. I bet expectations would be different if we’d called it “SuperDuper ClusterFrame OS”.

You see, cDOT is radically different in its internals versus 7-Mode – however, it’s still officially also called “ONTAP”. As such, customers are conditioned to super-easy upgrades between ONTAP releases (just load the new code and you’re done). cDOT is different enough that we can’t just do that.

I lobbied for the “ClusterFrame” name but was turned down BTW. I still think it rocks.

The fact that you can run either 7-Mode or cDOT on the same physical hardware confuses people even further. It’s a good thing to be able to reuse hardware (software-defined and all that). Some vendors like to make each new rev of the same family line (and its code) utterly incompatible with the last one… we don’t do that.

And for the startup champions: Startups haven’t been around long enough to have seen a truly major hardware and/or software change! (another thing conveniently ignored by many). Nor do they have the sheer amount of features and ancillary software ONTAP does. And of course, some vendors forget to mention what even a normal tech refresh looks like for their fancy new “built from the ground up” box with the extremely exciting name.

We truly know how to do upgrades… probably better than any vendor out there. For instance: What most people don’t know is that WAFL (ONTAP’s underlying block layout abstraction layer) has been quietly upgraded many, many times over the years. On the fly. In major ways. With a backout option. Another vendor’s product (again the one with the extremely exciting name) needed to be wiped twice by as many “upgrades” in one year in order to have its block layout changed.

Here’s the rub:

Transition complexity really depends on how complex your current deployment is, your appetite for change and tolerance of risk. But transition urgency depends on how much you need the fully nondisruptive nature of cDOT and all the other features it has vs 7-Mode.

What I mean by that:

We have some customers that lose upwards of $4m/hour of downtime. The long-term benefits of a truly nondisruptive architecture make any arguments regarding migration efforts effectively moot.

If you are using a lot of the 7-Mode features and companion software (and it has more features than almost any other storage OS), specific tools written only for 7-Mode, older OS clients only supported on 7-Mode, tons of snaps and clones going back to several years’ retention etc…

Then, in order to retain that kind of similar elaborate deployment in cDOT, the migration effort will also naturally be a bit more complex. But still doable. And we can automate most of it. Including moving over all the snapshots and archives!

On the other hand, if you are using the system like an old-fashioned device and aren’t taking advantage of all the cool stuff, then moving to anything is relatively easy. And especially if you’re close to 100% virtualized, migration can be downright trivial (though simply moving VMs around storage systems ignores any snapshot history – the big wrinkle with VM migrations).

Some argue it might make sense to look at competitors and move to those instead

Looking at options is something that makes business sense in general. It would be very disingenuous of me to say it’s foolish to look at options.

But this holds firmly true: If you want to move to a competitor platform, and use a lot of the 7-Mode features, it would arguably be impossible to do cleanly and maintain full functionality (at a bare minimum you’d lose all your snaps and clones – and some customers have several years’ worth of backup data in SnapVault – try asking them to give that up).

This is true for all competitor platforms: Someone using specific features, scripts, tools, snaps, clones etc. on any platform, will find it almost impossible to cleanly migrate to a different platform. I don’t care who makes it. Doesn’t matter. Can you cleanly move from VMware + VMware snaps to Hyper-V and retain the snaps?

Backup/clone retention is really the major challenge here – for other vendors. Do some research and see how frequently customers switch backup platforms… 🙂 We can move snaps etc. from 7-Mode to cDOT just fine 🙂

The less features you use, the easier the migration and acclimatization to new stuff becomes, but the less value you are getting out of any given product.

Call it vendor lock-in if you must, but it’s merely a side effect of using any given device to its full potential.

The reality: it is incredibly easier to move from 7-Mode to cDOT than from 7-Mode to other vendor products. Here’s why…

What programs and tools are offered by NetApp to make transition easy and quick?

Initially, migration of a complex installation was harder. But we’ve been doing this a while now, and can do the following to make things much easier:

  1. The very cool 7MTT (7-Mode Transition Tool). This is an automation tool we keep rapidly enhancing that dramatically simplifies migrations of complex environments from 7-Mode to cDOT. Any time and effort analysis that ignores how this tool works is quite simply a flawed and incomplete analysis.
  2. After migrating to a new cDOT system, you can take your old 7-Mode gear and convert it to cDOT (another thing that’s impossible with a competitor – you can’t move from, say, a VNX to a VMAX and then convert the VNX to a VMAX).
  3. As of cDOT 8.3.0: We made SnapMirror replication work for all protocols from 7-Mode to cDOT! This is the fundamental way we can easily move over not just the baseline data but also all the snaps, clones etc. Extremely important, and something that moving to a competitor would simply be impossible to carry forward.
  4. As of cDOT 8.3.2 we are allowing something pretty amazing: CFT (Copy Free Transition). Which does exactly what the name suggests: Allows not having to move any data over to cDOT! It’s a combination ONTAP and 7MTT feature, and allows disconnecting the shelves from 7-Mode controllers and re-attaching them to cDOT controllers, and thereby converting even a gigantic system in practically no time. See here for a quick guide, here for a great blog post.

And before I forget…

What about migrating from other vendors to cDOT?

It cuts both ways – anything less would be unfair. Not only is it far easier to move from 7-Mode to cDOT than to a competitor, it’s also easy to move from a competitor to cDOT! Since it’s all about growth, and that’s the only way real growth happens.

As of version 8.3.1 we have what’s called Online Foreign LUN Import (FLI). See link here. It’s all included – no special licenses needed.

With Online FLI we can migrate LUNs from other arrays and maintain maximum uptime during the migration (a cutover is needed at some point of course but that’s quick).

And all this we do without external “helper” gear or special software tools.

In the case of NAS migrations, we have the free and incredibly cool XCP software that can migrate things like high file count environments 25-100x faster than traditional methods. Check it out here.

In summary

I hardly expect to change the minds of people suffering from acute confirmation bias (I wish I could say “you know who you are”, not knowing you are afflicted is the major problem), but hopefully the more level-headed among us should recognize by now that:

  • 7-Mode to cDOT migrations are extremely straightforward in all but the most complex and custom environments
  • Those same complex environments would find it impossible to transparently migrate to anything else anyway
  • Backups/clones is one of those things that complicates migrations for any vendor – ONTAP happens to be used by a lot of customers to handle backups as part of its core value prop
  • NetApp provides extremely powerful tools to help with migrations from 7-Mode to cDOT and from competitors to cDOT (with amazing tools for both SAN and NAS!) that will also handle the backups/clones/archives!
  • The grass isn’t always greener on the other side – The transition from 7-Mode to cDOT is the first time NetApp has asked customers to do anything that major in over 20 years. Other, especially younger, vendors haven’t even seen a truly major code change yet. How will they react to such a thing? NetApp is handling it just fine 🙂

D


NetApp E-Series posts top ten price performance SPC-2 result
Published December 7, 2015

NetApp published a Top Ten Price/Performance SPC-2 result for the E5660 platform (the all-SSD EF560 variant of the platform also comfortably placed in the Top Ten Price/Performance for the SPC-1 test). In this post I will explain why the E-Series platform is an ideal choice for workloads requiring high performance density while being cost effective.

Executive Summary

The NetApp E5660 offers a unique combination of high performance, low cost, high reliability and extreme performance density, making it perfectly suited for applications that require high capacity, very high speed sequential I/O and a dense form factor. In EF560 form with all SSD, it offers that same high sequential speed shown here coupled with some of the lowest latencies for extremely hard OLTP workloads as demonstrated by the SPC-1 results.

The SPC-2 test

Whereas the SPC-1 test deals with an extremely busy OLTP system (high percentage of random I/O, over 60% writes, with SPC-1 IOPS vs latency being a key metric), SPC-2 instead focuses on throughput, with SPC-2 MBPS (MegaBytes Per Second) being a key metric.

SPC-2 tries to simulate applications that need to move a lot of data around very rapidly. It is divided into three workloads:

  1. Large File Processing: Applications that need to do operations with large files, typically found in analytics, scientific computing and financial processing.
  2. Large Database Queries: Data mining, large table scans, business intelligence…
  3. Video on Demand: Applications that deal with providing multiple simultaneous video streams (even of the same file) by drawing from a large library.

The SPC-2 test measures each workload and then provides an average SPC-2 MBPS across the three workloads. Here’s the link to the SPC-2 NetApp E5660 submission.

Metrics that matter

As I’ve done in previous SPC analyses, I like using the published data to show business value with additional metrics that may not be readily spelled out in the reports. Certainly $/SPC-2 MBPS is a valuable metric, as is the sheer SPC-2 MBPS metric that shows how fast the array was in the benchmark.

However… think of what kind of workloads represent the type of I/O shown in SPC-2. Analytics, streaming media, data mining. 

Then think of the kind of applications driving such workloads.

Such applications typically treat storage like lego bricks and can use multiple separate arrays in parallel (there’s no need or even advantage in having a single array). Those deployments invariably need high capacities, high speeds and at the same time don’t need the storage to do anything too fancy beyond being dense, fast and reliable.

In addition, such solutions often run in their own dedicated silo, and don’t share their arrays with the rest of the applications in a datacenter. Specialized arrays are common targets in these situations.

It follows that a few additional metrics are of paramount importance for these types of applications and workloads to make a system relevant in the real world:

  • I/O density – how fast does this system go per rack unit of space?
  • Price per used (not usable) GB – can I use most of my storage space to drive this performance, or just a small fraction of the available space?
  • Used capacity per Rack Unit – can I get a decent amount of application space per Rack Unit and maintain this high performance?

Terms used

Some definitions of the terms used in the chart are necessary. The first four are standard metrics found in all SPC-2 reports:

  • $/SPC-2 MBPS – the standard SPC-2 reported metric of price/performance
  • SPC-2 MBPS – the standard SPC-2 reported metric of the sheer performance attained for the test. It’s important to look at this number in the context of the application used to generate it – SPC-2. It will invariably be a lot less than marketing throughput numbers… Don’t compare marketing numbers for systems that don’t have SPC-2 results to official, audited SPC-2 numbers. Ask the vendor that doesn’t have SPC-2 results to submit their systems and get audited numbers.
  • Price – the price (in USD) submitted for the system, after all discounts etc.
  • Used Capacity – this is the space actually consumed by the benchmark, in GB. “ASU Capacity” in SPC parlance. This is especially important since many vendors will use a relatively small percentage of their overall capacity (Oracle for example uses around 28% of the total space in their SPC-2 submissions, NetApp uses over 79%). Performance is often a big reason to use a small percentage of the capacity, especially with spinning disks.

This type of table is found in all submissions; the sections before it explain how capacity was calculated. From page 8 of the E5660 submission:

E5660 Utilization

The next four metrics are very easily derived from existing SPC-2 metrics in each report:

  • $/ASU GB: How much do I pay for each GB actually consumed by the benchmark? Since that’s what the test actually used to get the reported performance… metrics such as cost per raw GB are immaterial. To calculate, just divide Price by ASU Capacity.
  • Rack Units: Simply the rack space consumed by each system based on the list of components. How much physical space does the system consume?
  • SPC-2 MBPS/Rack Unit: How much performance is achieved by each Rack Unit of space? Shows performance density. Simply divide the posted SPC-2 MBPS by the total Rack Units.
  • ASU GB/Rack Unit: Of the capacity consumed by the benchmark, how much of it fits in a single Rack Unit of space? Shows capacity density. To calculate, divide ASU Capacity by the total Rack Units.
A balanced solution needs to look at a combination of the above metrics.
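
If you want to do this math yourself, here is a quick sketch in Python of how the four derived metrics fall out of numbers you can read in any Full Disclosure Report. The inputs below are made-up placeholders rather than figures from any actual submission – plug in the real ones:

# Derive the extra metrics from numbers published in an SPC-2 Full Disclosure
# Report. The example inputs are hypothetical placeholders, not real results.

def derived_metrics(price_usd, spc2_mbps, asu_capacity_gb, rack_units):
    """Return the density and cost metrics discussed above."""
    return {
        "$/SPC-2 MBPS": price_usd / spc2_mbps,
        "$/ASU GB": price_usd / asu_capacity_gb,
        "SPC-2 MBPS per Rack Unit": spc2_mbps / rack_units,
        "ASU GB per Rack Unit": asu_capacity_gb / rack_units,
    }

# Hypothetical system: $500,000 price, 8,000 SPC-2 MBPS, 100,000 GB ASU, 8U.
for name, value in derived_metrics(500_000, 8_000, 100_000, 8).items():
    print(f"{name}: {value:,.2f}")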

Analysis of Results

Here are the Top Ten Price/Performance SPC-2 submissions as of December 4th, 2015:

SPC 2 Analysis

Notice that while the NetApp E5660 doesn’t dominate in sheer SPC-2 MBPS in a single system, it utterly annihilates the competition when it comes to performance and capacity density and cost/used GB. The next most impressive system is the SGI InfiniteStorage 5600, which is the SGI OEM version of the NetApp E5600 🙂 (tested with different drives and cost structure).

Notice that the NetApp E5660 with spinning disks is faster per Rack Unit than the fastest all-SSD system HP has to offer, the 20850… this is not contrived, it’s simple math. Even after a rather hefty HP discount, the HP system is over 20x more expensive per used GB than the NetApp E5600.

If you were building a real system for large-scale, heavy-duty sequential I/O, what would you rather have, assuming non-infinite budget and rack space?

Another way to look at it: one could buy ten of the NetApp systems for the cost of a single 3Par system and, running them in parallel (easy for these workloads), achieve:

  • 30% faster performance than the 3Par system
  • 19.7x the ASU Capacity of the 3Par system
These are real business metrics – same cost, almost 20x the capacity and 30% more performance.

This is where hero numbers and Lab Queens fail to match up to real world requirements at a reasonable cost. You can do similar calculations for the rest of the systems.
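
For the curious, the “similar calculations” boil down to something like the sketch below. The inputs are hypothetical placeholders, not the actual submission numbers – treat it as a template for your own comparisons:

# "Buy N cheaper systems for the price of one big one" math. All inputs are
# hypothetical placeholders; pull the real Price, SPC-2 MBPS and ASU Capacity
# from the respective Full Disclosure Reports.

def parallel_comparison(big, small):
    """big/small are dicts with price, mbps and asu_gb; returns the ratios."""
    n = big["price"] / small["price"]  # how many small systems fit in the same budget
    return {
        "small systems per budget": n,
        "aggregate MBPS ratio": (n * small["mbps"]) / big["mbps"],
        "aggregate ASU capacity ratio": (n * small["asu_gb"]) / big["asu_gb"],
    }

big_array = {"price": 2_000_000, "mbps": 60_000, "asu_gb": 100_000}   # made up
small_array = {"price": 200_000, "mbps": 8_000, "asu_gb": 200_000}    # made up
print(parallel_comparison(big_array, small_array))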

A real-life application: The Lawrence Livermore National Laboratory. They run multiple E-Series systems in parallel and can achieve over 1TB/s throughput. See here.

Closing Thoughts

With Top Ten Price/Performance rankings for both the SPC-1 and SPC-2 benchmarks, the NetApp E-Series platform has proven that it can be the basis for a highly performant, reliable yet cost-efficient and dense solution. In addition, it’s the only platform that is in the Top Ten Price/Performance for both SPC-1 and SPC-2.

A combination of high IOPS, extremely low latency and very high throughput at low cost is possible with this system. 

When comparing solutions, look beyond the big numbers and think how a system would provide business value. Quite frequently, the big numbers come with big caveats…

D

]]>
http://recoverymonkey.org/2015/12/07/netapp-e-series-posts-top-ten-price-performance-spc-2-result/feed/ 4 766
Architecture has long term scalability implications for All Flash Appliances http://recoverymonkey.org/2015/11/26/architecture-has-long-term-scalability-implications-for-all-flash-appliances/ http://recoverymonkey.org/2015/11/26/architecture-has-long-term-scalability-implications-for-all-flash-appliances/#comments Thu, 26 Nov 2015 13:16:43 +0000 http://recoverymonkey.org/?p=759 Continue reading "Architecture has long term scalability implications for All Flash Appliances"]]> Recently, NetApp announced the availability of a 3.84TB SSD. It’s not extremely exciting – it’s just a larger storage medium. Sure, it’s really advanced 3D NAND, it’s fast and ultra-reliable, and will allow some nicely dense configurations at a reduced $/GB. Another day in Enterprise Storage Land.

But, ultimately, that’s how drives roll – they get bigger. And in the case of SSD, the roadmaps seem extremely aggressive regarding capacities.

Then I realized that several of our competitors don’t have this large SSD capacity available. Some don’t even have half that.

But why? Why ignore such a seemingly easy and hugely cost-effective way to increase density?

In this post I will attempt to explain why certain architectural decisions may lead to rigid design constructs with long-term flexibility and scalability ramifications.

Design Center

Each product has its genesis somewhere. It is designed to address certain key requirements in specific markets and behave in a better/different way than competitors in some areas. Plug specific gaps. Possibly fill a niche or even become a new product category.

This is called the “Design Center” of the product.

Design centers can evolve over time. But, ultimately, every product’s Design Center is an exercise in compromise and is one of the least malleable parts of the solution.

There’s no such thing as a free lunch. Every design decision has tradeoffs. Often, those tradeoffs sacrifice long term viability for speed to market. There’s nothing wrong with tradeoffs as long as you know what those are, especially if the tradeoffs have a direct impact on your data management capabilities long term.

It’s all about the Deduplication/RAM relationship

Aside from compression, scale up and/or scale out, deduplication is a common way to achieve better scalability and efficiencies out of storage.

There are several ways to offer deduplication in storage arrays: Inline, post-process, fixed chunk, variable chunk, per volume scope, global scope – all are design decisions that can have significant ramifications down the line and various business impacts.

Some deduplication approaches require very large amounts of memory to store metadata (hashes representing unique chunk signatures). This may limit scalability or make a product more expensive, even with scale-out approaches (since many large, costly controllers would be required).

There is no perfect answer, since each kind of architecture is better at certain things than others. This is what is meant by “tradeoffs” in specific Design Centers. But let’s see how things look for some example approaches (this is not meant to be a comprehensive list of all permutations).

I am keeping it simple – I’m not showing how metadata might get shared and compared between nodes (in itself a potentially hugely impactful operation as some scale-out AFA vendors have found to their chagrin). In addition, I’m not exploring container vs global deduplication or different scale-out methods – this post would become unwieldy… If there’s interest drop me a line or comment and I will do a multi-part series covering the other aspects.

Fixed size chunk approach

In the picture below you can see the basic layout of a fixed size chunk deduplication architecture. Each data chunk is represented by a hash value in RAM. Incoming new chunks are compared to the RAM hash store in order to determine where and whether they may be stored:

Hashes fixed chunk

The benefit of this kind of approach is that it’s relatively straightforward from a coding standpoint, and it probably made a whole lot of sense a couple of years ago, when small SSDs were all that was available and speed to market was a major design driver.

The tradeoff is that a truly exorbitant amount of memory is required in order to store all the hash metadata values in RAM. As SSD capacities increase, the linear relationship of SSD size vs RAM size results in controllers with multi-TB RAM implementations – which gets expensive.

It follows that systems using this type of approach will find it increasingly difficult (if not impossible) to use significantly larger SSDs without either a major architectural change or the cost of multiple TB of RAM dropping dramatically. You should really ask the vendor what their roadmap is for things like 10+TB SSDs… and whether you can expand by adding the larger SSDs into a current system without having to throw everything you’ve already purchased away.
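
To make the linear RAM-to-capacity relationship concrete, here is a back-of-the-envelope sketch. The 4KB chunk size and ~40 bytes per hash entry (hash plus pointer plus bookkeeping) are assumptions picked purely for illustration – real implementations vary in both – but the linear scaling is the point:

# Rough estimate of fixed-chunk dedupe metadata RAM. Chunk size and bytes per
# entry are illustrative assumptions only.

def metadata_ram_gb(flash_capacity_tb, chunk_kb=4, bytes_per_entry=40):
    entries = (flash_capacity_tb * 1024**3) / chunk_kb  # capacity in KB / chunk size
    return entries * bytes_per_entry / 1024**3          # bytes -> GB

for capacity_tb in (50, 200, 1000):
    print(f"{capacity_tb:>5} TB of flash -> ~{metadata_ram_gb(capacity_tb):,.0f} GB of RAM just for hashes")

Whatever the exact per-entry overhead, the RAM requirement grows in lockstep with the flash capacity – which is exactly the problem.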

Variable size chunk approach

This one is almost identical to the previous example, but instead of a small, fixed block, the architecture allows for variable size blocks to be represented by the same hash size:

Hashes variable chunk

This one is more complex to code, but the massive benefit is that metadata space is hugely optimized, since much larger data chunks are represented by the same hash size as smaller data chunks. The system does this chunk division automatically. Fewer hashes are needed with this approach, leading to better utilization of memory.

Such an architecture needs far less memory than the previous example. However, it is still plagued by the same fundamental scaling problem – just to a far lesser degree. Conversely, it allows a less expensive system to be manufactured than in the previous example, since less RAM is needed for the same amount of storage. By combining multiple inexpensive systems via scale-out, significant capacity at scale can be achieved at a lower cost than with the previous example.
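
Running the same back-of-the-envelope arithmetic with a larger average chunk size shows why. Again, the 40 bytes per entry and the specific chunk sizes are illustrative assumptions, not anyone’s actual implementation details:

# Same estimate as before, varying the average chunk size to show how variable
# chunking shrinks the metadata footprint. All numbers are illustrative.

def metadata_ram_gb(flash_capacity_tb, avg_chunk_kb, bytes_per_entry=40):
    entries = (flash_capacity_tb * 1024**3) / avg_chunk_kb
    return entries * bytes_per_entry / 1024**3

for avg_chunk_kb in (4, 16, 32, 64):
    ram = metadata_ram_gb(200, avg_chunk_kb)
    print(f"200 TB flash, {avg_chunk_kb:>2} KB average chunk -> ~{ram:,.0f} GB of hash metadata")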

Fixed chunk, metadata both in RAM and on-disk

An approach to lower the dependency on RAM is to have some metadata in RAM and some on SSD:

Hashes fixed chunk metadata on disk

This type of architecture finds it harder to do full speed inline deduplication since not all metadata is in RAM. However, it also offers a more economical way to approach hash storage. SSD size is not a concern with this type of approach. In addition, being able to keep dedupe metadata on cold storage aids in data portability and media independence, including storing data in the cloud.
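
Purely to illustrate the tradeoff (and emphatically not how any shipping array implements it), here is a toy two-tier lookup: a bounded “hot” set of hashes in RAM, with misses falling through to a slower on-media store. sqlite3 stands in for the SSD-resident metadata in this sketch:

# Toy model of a two-tier hash store: RAM for hot entries, a database file
# standing in for SSD-resident metadata. Illustration only.
import sqlite3

class TwoTierHashStore:
    def __init__(self, ram_slots=1_000_000, path=":memory:"):
        self.ram = {}                      # hot hashes kept in memory
        self.ram_slots = ram_slots
        self.db = sqlite3.connect(path)    # cold hashes on "disk"
        self.db.execute("CREATE TABLE IF NOT EXISTS hashes (h TEXT PRIMARY KEY, loc INTEGER)")

    def lookup(self, h):
        if h in self.ram:                  # fast path: RAM hit
            return self.ram[h]
        row = self.db.execute("SELECT loc FROM hashes WHERE h=?", (h,)).fetchone()
        return row[0] if row else None     # slow path: media lookup

    def insert(self, h, location):
        if len(self.ram) < self.ram_slots: # keep only a bounded set in RAM
            self.ram[h] = location
        self.db.execute("INSERT OR REPLACE INTO hashes VALUES (?, ?)", (h, location))

store = TwoTierHashStore()
store.insert("a1b2c3", 42)
print(store.lookup("a1b2c3"))              # served from RAM; a miss would hit the slower store

The slow path is exactly why full-speed inline dedupe is harder here – some lookups simply cost more.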

Variable chunk, multi-tier metadata store

Now that you’ve seen examples of various approaches, it becomes clearer what kinds of architectural compromises are necessary to achieve both high deduplication performance and capacity scale.

For instance, how about variable blocks and the ability to store metadata on multiple tiers of storage? Upcoming, ultra-fast Storage Class Memory technologies are a good intermediate step between RAM and SSD. Lots of metadata can be placed there yet retain high speeds:

Hashes variable chunk metadata on 3 tiers

Coding for this approach is of course complex, since SCM and SSD have to be treated as a sort of Level 2/Level 3 cache combination, but with cache residency measured in days or weeks and parts of the cache never going “cold”. It’s algorithmically more involved, and it relies on technologies not yet widely available… but it does solve multiple problems at once. One could of course use just SCM for the entire metadata store and simplify the implementation, but that would somewhat reduce the performance afforded by the approach shown (RAM is still faster). But if the SCM is fast enough… 🙂

However, being able to embed dedupe metadata in cold storage can still help with data mobility and being able to retain deduplication even across different types of storage and even cloud. This type of flexibility is valuable.

Why should you care?

Aside from the academic interest and nerd appeal, different architecture approaches have a business impact:

  • Will the storage system scale large enough for significant future growth?
  • Will I be able to use significantly new media technologies and sizes without major disruption?
  • Can I use extremely large media sizes?
  • Can I mix media sizes?
  • Can I mix controller types in a scale-out cluster?
  • Can I use cost-optimized hardware?
  • Does deduplication at scale impact performance negatively, especially with heavy writes?
  • If inline efficiencies aren’t comprehensive, how does that affect overall capacity sizing?
  • Does the deduplication method enforce a single large failure domain? (single pool – meaning that any corruption would result in the entire system being unusable)
  • What is the interoperability with Cloud and Disk technologies?
  • Can data mobility from All Flash to Disk to Cloud retain deduplication savings?
  • What other tradeoffs is this shiny new technology going to impose now and in the future? Ask to see a 5-year vision roadmap!

Always look beyond the shiny feature and think of the business benefits/risks. Some of the above may be OK for you. Some others – not so much.

There’s no free lunch.

D

]]>
http://recoverymonkey.org/2015/11/26/architecture-has-long-term-scalability-implications-for-all-flash-appliances/feed/ 3 759
Proper Testing vs Real World Testing http://recoverymonkey.org/2015/10/01/proper-testing-vs-real-world-testing/ http://recoverymonkey.org/2015/10/01/proper-testing-vs-real-world-testing/#respond Thu, 01 Oct 2015 22:13:11 +0000 http://recoverymonkey.org/?p=745 Continue reading "Proper Testing vs Real World Testing"]]> The idea for this article came from seeing various people attempt product testing. Though I thought about storage when writing this, the ideas apply to most industries.

Three different kinds of testing

There are really three different kinds of testing.

The first is the incomplete, improper kind of testing that is useless in almost any situation. It is typically done by people with little training on the subject, almost always ends in misleading results, and is arguably dangerous, especially if used to make purchasing decisions.

The second is what’s affectionately and romantically called “Real World Testing”. Typically done by people that will try to simulate some kind of workload they believe they encounter in their environment, or use part of their environment to do the testing. Much more accurate than the first kind, if done right. Usually the workload is decided arbitrarily 🙂

The third and last kind is what I term “Proper Testing”. This is done by professionals (who usually do this type of testing for a living) who understand how complex it is to test for a broad range of conditions. It’s really hard to do, but pays amazing dividends if done thoroughly.

Let’s go over the three kinds in more detail, with some examples.

Useless Testing

Hopefully after reading this you will know if you’re a perpetrator of Useless Testing and never do it again.

A lot of what I deal with is performance, so I will draw examples from there.

Have you or someone you know done one or more of the following after asking to evaluate a flashy, super high performance enterprise storage device?

  • Use tiny amounts of test data, ensuring it will all be cached
  • Try to test said device with a single server
  • With only a couple of I/O ports
  • With a single LUN
  • With a single I/O thread
  • Doing only a certain kind of operation (say, all random reads)
  • Using only a single I/O size (say, 4K or 32K) for all operations
  • Not looking at latency
  • Using extremely compressible data on an array that does compression

I could go on but you get the idea. A good move would be to look at the performance primer here before doing anything else…

Pitfalls of Useless Testing

The main reason people do such poor testing is usually that it’s easy to do. Another reason is that it’s easy to use it to satisfy a Confirmation Bias.

The biggest problem with such testing is that it doesn’t tell you how the system behaves if exposed to different conditions.

Yes, you see how it behaves in a very specific situation, and that situation might even be close to a very limited subset of what you need to do in “Real Life”, but you learn almost nothing about how the system behaves in other kinds of scenarios.

In addition, you might eliminate devices that would normally behave far better for your applications, especially under duress, because they might fail Useless Testing scenarios (they simply weren’t designed for such unrealistic situations).

Making purchasing decisions after doing Useless Testing will invariably result in making bad purchasing decisions unless you’re very lucky (which usually means that your true requirements weren’t really that hard to begin with).

Part of the problem of course is not even knowing what to test for.

“Real World” Testing

This is what most sane people strive for.

How to test something in conditions most approximating how it will be used in real life.

Some examples of how people might try to perform “Real World” testing:

  • One could use real application I/O traces and advanced benchmarking software that can replay them, or…
  • Spin synthetic benchmarks designed to simulate real applications, or…
  • Put one of their production applications on the system and use it like they normally would.

Clearly, any of this would result in more accurate results than Useless Testing. However…

Pitfalls of “Real World” Testing

The problem with “Real World” testing is that it invariably does not reproduce the “Real World” closely enough to be comprehensive and truly useful.

Such testing addresses only a small subset of the “Real World”. The omissions dictate how dangerous the results are.

In addition, extrapolating larger-scale performance isn’t really possible. Knowing how the system ran one application doesn’t tell you how it will run ten applications in parallel.

Some examples:

  • Are you testing one workload at a time, even if that workload consists of multiple LUNs? Shared systems will usually have multiple totally different workloads hitting them in parallel, often wildly different and conflicting, coming and going at different times, each workload being a set of related LUNs. For instance, an email workload plus a DSS plus an OLTP plus a file serving plus a backup workload in parallel… 🙂
  • Are you testing enough workloads in parallel? True Enterprise systems thrive on crunching many concurrent workloads.
  • Are you injecting other operations that you might be doing in the “Real World”? For instance, replicate and delete large amounts of data while doing I/O on various applications? Replications and deletions have been known to happen… 😉
  • Are you testing how the system behaves in degraded mode while doing all the above? Say, if it’s missing a controller or two, or a drive or two… also known to happen.

That’s right, this stuff isn’t easy to do properly. It also takes a very long time. Which is why serious Enterprise vendors have huge QA organizations going through these kinds of scenarios. And why we get irritated when we see people drawing conclusions based on incomplete testing 😉
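
To get a feel for why it takes so long, here is a tiny illustration of how quickly a test matrix explodes. The dimensions and values are arbitrary examples, not anyone’s actual QA plan:

# Combinatorial explosion of even a modest test matrix. Dimensions and values
# are arbitrary examples.
from itertools import product

dimensions = {
    "workload mix":   ["OLTP", "DSS", "email", "file serving", "backup"],
    "block size":     ["4K", "8K", "32K", "256K", "1M"],
    "read/write mix": ["100/0", "70/30", "40/60", "0/100"],
    "concurrency":    ["1 host", "4 hosts", "16 hosts", "64 hosts"],
    "array state":    ["healthy", "drive failed", "controller failed", "rebuild running"],
    "background ops": ["none", "replication", "snapshot deletes", "both"],
}

combinations = list(product(*dimensions.values()))
print(f"{len(combinations):,} combinations")                       # 6,400
print(f"at 1 hour each: ~{len(combinations) / 24:.0f} days of runtime")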

Proper Testing

See, the problem with the “Real World” concept is that there are multiple “Real Worlds”.

Take car manufacturers for instance.

Any car manufacturer worth their salt will torture-test their cars in various wildly different conditions, often rapidly switching from one to the other if possible to make things even worse:

  • Arctic conditions
  • Desert, dusty conditions plus harsh UV radiation
  • Extremely humid conditions
  • All the above in different altitudes

You see, maybe your “Real World” is Alaska, but another customer’s “Real World” will be the very wet Cherrapunji, and a third customer’s “Real World” might be the Sahara desert. A fourth might be a combination (cold, very high altitude, harsh UV radiation, or hot and extremely humid).

These are all valid “Real World” conditions, and the car needs to be able to deal with all of them while maintaining full functioning until beyond the warranty period.

Imagine if moving from Alaska to the Caribbean meant your car would cease functioning. People have been known to move, too… 🙂

Storage manufacturers that actually know what they’re doing have an even harder job:

We need to test multiple “Real Worlds” in parallel. Oh, and we do test the different climate conditions as well, don’t think we assume everyone operates their hardware in ideal conditions… especially folks in the military or any other life-or-death situation.

Closing thoughts

If I’ve succeeded in stopping even one person from doing Useless Testing, this article was a success 🙂

It’s important to understand that Proper Testing is extremely hard to do. Even gigantic QA organizations miss bugs – imagine someone doing incomplete testing. There are just too many permutations.

Another sobering thought is that very few vendors are actually able to do Proper Testing of systems. The know-how, sheer time and number of personnel needed are enough to make most smaller vendors skimp on the testing out of pure necessity. They test what they can, otherwise they’d never ship anything.

A large number of data points helps. Selling millions of units of something provides a vendor with way more failure data than selling a thousand units of something possibly can. Action can be taken to characterize the failure data and see what approach is best to avoid the problem, and how common it really is.

One minor failure in a million may not even be worth fixing, whereas if your sample size is ten, one failure means 10%. It might be the exact same failure, but you don’t know that. Or you might have zero failures in a sample size of ten, which might make you think your solution is better than it really is…

But I digress into other, admittedly interesting areas.

All I ask is… think really hard before testing anything in the future. Think what you really are trying to prove. And don’t be afraid to ask for help.

D

]]>
http://recoverymonkey.org/2015/10/01/proper-testing-vs-real-world-testing/feed/ 0 745
NetApp Enterprise Grade Flash http://recoverymonkey.org/2015/06/24/netapp-enterprise-grade-flash/ http://recoverymonkey.org/2015/06/24/netapp-enterprise-grade-flash/#comments Wed, 24 Jun 2015 06:55:59 +0000 http://recoverymonkey.org/?p=733 Continue reading "NetApp Enterprise Grade Flash"]]> June 23rd marked the release of new NetApp Enterprise Flash technology. The release consists of:

  • New ONTAP 8.3.1 with significant performance, feature and usability enhancements
  • New Inline Storage Efficiencies
  • New All-Flash FAS systems (AFF works only with SSDs)
  • New, aggressive pricing model
  • New maintenance options (price protection on warranty for up to 7 years, even if you buy less warranty up front)

Enterprise Grade Flash

So what does “Enterprise Grade Flash” mean?

Simple:

The combination of new technologies that Flash made possible, like high performance and inline storage efficiencies, plus ease of use, plus Enterprise features such as:

  • Deep application integration for clones, backups, restores, migrations
  • Comprehensive automation
  • Scale up
  • Scale out
  • Non-disruptive everything
  • Multiprotocol (SAN and NAS in the same system)
  • Secure multi-tenancy
  • Encryption (FIPS 140-2 validated hardware)
  • Hardened storage OS
  • Enterprise hardware
  • Comprehensive backup, cloning, archive
  • Comprehensive integration with leading backup software
  • Synchronous replication
  • Active-Active Datacenters
  • The ability to have AFF systems in the same cluster as hybrid
  • Seamless data movement between All-Flash and Hybrid
  • Easy replication between All-Flash and Hybrid
  • Cloud capable (storage OS ability to run in the cloud) – with replication to and from the cloud instances of ONTAP
  • QoS and SLO provisioning
  • Inline Foreign LUN Import (to make migrations from third-party arrays less impactful to the users)

The point is that competitor All-Flash offerings typically focus on a few of these points (for example, performance, ease of use and maybe inline storage efficiencies) but are severely lacking in the rest. This limited vision approach creates inflexibility and silos, and neither inflexibility nor silos are particularly desirable elements in Enterprise IT.

But what if you could have Flash without compromises? That’s what we are offering with the new AFF systems. A high performance offering with all the “coolness” but also all the “seriousness” and features Enterprise Grade storage demands.

It’s a bit like this Venn diagram:

NewImage

AFF simply offers far more flexibility than All-Flash competitors. There may be certain things AFF doesn’t do vs some of the competitors, but they pale next to the things AFF does that the competitors cannot.

And even if you don’t need all the features – at least they’re there waiting for you in case you do need them in the future (for instance, you may not need to replicate to the cloud and back today, but knowing that you have the option is reassuring in case your IT strategy changes).

Architecture

It’s far harder to add serious enterprise data management features to newly built architectures than it is to add new architectural benefits to a platform that already has the Enterprise Grade stuff down pat.

WAFL (the block layout engine of ONTAP) is already naturally well suited to working with SSDs:

  • Avoids modifying data in place
  • Writes to free space
  • Performs I/O coalescing in order to lump many operations in a single large I/O
  • Preserves the temporal locality of user data with metadata to further reduce I/O
  • Achieves a naturally low Write Amplification Factor (it’s worth noting that in all the years we’ve been selling Flash and after hundreds of PB, we have had exactly zero worn out SSDs – they’re not even close to wearing out).

So we optimized ONTAP where it mattered, while keeping the existing codebase where needed. It helps that the ONTAP architecture is already modular – it was fairly straightforward to enhance the parts of the code that dealt with storage media, for instance, or the parts that dealt with the I/O path. This significant optimization started with 8.3.0 and continued with 8.3.1.

The overall effect has been dramatically reduced latencies, enhanced code parallelism and increased resiliency, while at the same time enabling inline storage efficiencies. For customers still on 8.2.x and prior, or on 7-mode ONTAP, the differences with 8.3.1 will be pretty extreme… 🙂

Ease of Use

New with ONTAP 8.3.1 is a completely redesigned administrative GUI, plus a SAN-optimized config that lets the system serve I/O within 15 minutes of unpacking.

In addition, wizards allow the easy creation of LUNs for Databases with just 3 questions, and ONTAP can now be upgraded from the GUI.

Storage Efficiencies

ONTAP has had various flavors of storage efficiencies for a while. Those efficiencies typically had to be turned on manually, and often affected performance.

With ONTAP 8.3.1, Inline Compression is on by default, as is Inline Zero Deduplication (very helpful in VM deployments that use EagerZeroThick). In addition, Always On Deduplication is also available – a deduplication that runs very frequently (every 5 minutes).

In conjunction with mature thin provisioning plus state-of-the-art cloning and snapshot capabilities, some excellent efficiency ratios are possible. We can show up to 30:1 for certain kinds of VDI deployments, while things like databases won’t squeeze down nearly that much (especially if DB-side compression is already active). The overall efficiency ratio will vary depending on how the system is used.

Ultimately, Storage Efficiencies aim to reduce overall cost. Focus not so much on the actual efficiency ratio. Instead look at the effective price/TB. The efficiency differences between most vendors are probably less than most vendors want you to think they are – in real terms you will not save more than a few SSD’s worth of capacity. The real value lies elsewhere.

Performance

The AFF systems are fast. A maximum-size AFF cluster will do about 4 million IOPS at 1ms latency (8K random, with inline efficiencies). Throughput-wise, 100GB/s is possible.

We wanted to have performance stop being a discussion point except in the most extreme of situations. The reality is that any top-tier Flash solution will be fast. Once again – the real value lies elsewhere.

For the curious, here’s an IOPS vs Latency chart for a 2-node AFF8080 system, given that this level of performance is more than enough for the vast majority of customers out there (the max speed of a cluster is 12x what’s shown in this graph):

NewImage

Many optimizations were implemented in ONTAP 8.3.1 – certain operations are over 4x faster than with ONTAP 8.2.x, and almost 50% faster than with 8.3.0. SSDs, being as fast as they are, benefit from those optimizations the most.

If you want more performance proof points, we have previously shown SSD performance for a very difficult workload (over 60% writes, combination of block sizes, random and sequential I/O) with 8.3.0 in our SPC-1 benchmarks (needs to be updated for 8.3.1 but even the 8.3.0 result is solid). We also have SQL, Oracle and VDI Technical Reports.

We are also always happy to demonstrate these systems for you.

Pricing

There are some significant pricing changes, leading to an overall far more cost-effective solution. For instance, the cost delta between different controller models with the same amount of storage is far smaller than in the past. The scalability of all the systems is the same (240 SSDs * 1.6TB max currently per 2 nodes). The only difference is performance and the amount of connectivity possible.

Warranty pricing is now stable even if you buy 3 years up front and extend later on.

Oh – and all AFF models now include all NetApp FAS software: All the protocols, all the SnapManager application integration modules, replication, the works.

Final Words

The pace of innovation at NetApp is accelerating dramatically. We had to work hard to bring Clustered ONTAP to feature parity with the older 7-mode, which delayed things. Now, with 8.3.x dropping 7-mode altogether, we have many more developers focused on improving Clustered ONTAP. The big enhancements in 8.3.1 came very rapidly after 8.3.0 became GA… and there’s a lot more to come soon.

Make no mistake: NetApp is a storage giant and has an Engineering organization not to be trifled with.

Now, how do you like your Flash? With or without compromises? 🙂

D

]]>
http://recoverymonkey.org/2015/06/24/netapp-enterprise-grade-flash/feed/ 6 733
Calculating the true cost of space efficient Flash solutions http://recoverymonkey.org/2015/06/15/calculating-the-true-cost-of-space-efficient-flash-solutions/ http://recoverymonkey.org/2015/06/15/calculating-the-true-cost-of-space-efficient-flash-solutions/#comments Tue, 16 Jun 2015 00:58:50 +0000 http://recoverymonkey.org/?p=728 Continue reading "Calculating the true cost of space efficient Flash solutions"]]> In this post I will try to help you understand how to objectively calculate the cost of space-efficient storage solutions – there’s just too much misinformation out there and it’s getting irritating since certain vendors aren’t exactly honest with how they do certain calculations…

A brief history lesson:

The faster a storage device, the smaller and more expensive it usually is. Flash was initially insanely expensive relative to spinning disk, so it was used in small amounts, typically as a tier and/or cache augmentation.

And so it came to be that flash-based storage systems started implementing some of the more interesting space efficiency techniques around. Interesting because it’s algorithmically easy to reduce data dramatically, but hard to do under high load while maintaining impressive IOPS and low latency.

Space efficiencies plus lower flash media costs bring us to today’s ability to use all-flash storage in ever-increasingly cost-effective amounts.

But how does one figure out the best deal?

There are some factors I won’t get into in this article. Company size and viability, support staff strength, maturity of the code, automation, overall features etc. all may play a huge role depending on the environment and requirements (and, indeed, will often eliminate several of the players from further consideration). However, I want to focus on the basics.

Recommended metric: Cost per effective TB

It’s easy to get lost in the hype. One company says they reduce by 3:1, another might say 5:1, yet another 10:1, etc. The high efficiency ratios seem to be attractive, right?

Well – you’re not paying for a high efficiency ratio. What you are paying for is usable capacity.

If all solutions cost the same, the systems with high efficiency ratios would win this battle every day of the week and twice on Sundays.

However, solutions don’t all cost the same. Ask your vendor what the projected effective capacity will be for each specific configuration, and the Cost/Effective TB is a trivial calculation.

But there’s one more thing to do in order for the calculation to be correct:

Insist on calculating the efficiency ratio yourself.

Most storage systems will show a nice picture in the GUI with an overall efficiency ratio. Looks nice and easy. Well – the devil is in the details.

If a vendor is upfront about how they measure efficiency, your numbers might make sense.

This is where you trust but verify. Some pointers:

  • Take a note of the initial usable space before putting anything on the system.
  • If you store a 1TB DB and do nothing else to the data, what’s the efficiency?
  • Calculate the ratio yourself! Divide the amount of capacity the data is taking in the OS by the amount it’s taking on the storage.
  • Does the number make sense given the size of the data you just put on the system and how much usable space is left now?
  • If you take 10 snapshots of the data, what’s the efficiency? How about if you delete the snaps, does the efficiency change?
  • If you take a clone of the DB, what’s the efficiency?
  • If you delete the clone you just took, what’s the efficiency?
  • Create a large LUN (10TB for example) and only store 1TB of data in it. What’s the efficiency? Do you count thin provisioning as data reduction?
  • Does this all add up if you do the math manually instead of the GUI doing it for you?
  • Does it all meet your expectations? For example, if a vendor is claiming 5:1 reduction, can you actually store 5 different DBs in the space of one? Or do they really mean something else? That’s a pretty easy test…

You see, most vendors count savings a bit differently. In the examples above, that 1TB DB, if stored in a 10TB LUN, and cloned 10 times, will probably result in a very high efficiency number. It doesn’t mean however that 10 different DBs of the same size would have nearly the same efficiency ratio.

If you don’t have time to do a test in-house, have the vendor prove their claims and show how they do their math in their labs while you watch. You will typically find that each data type has a wildly different space efficiency ratio.

The bottom line

It’s pretty easy. Figure out the efficiency ratio on your own based on how you expect to use the system, then plug that ratio into the Price/Effective TB formula like so:

Real Cost per TB = Price/(Usable TB * Real Efficiency Ratio as a multiplier)
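
Here is the formula worked through with made-up numbers – substitute the prices and the efficiency ratio you verified yourself:

# Worked example of the Real Cost per TB formula. All inputs are hypothetical.

def real_cost_per_tb(price_usd, usable_tb, real_efficiency_ratio):
    """real_efficiency_ratio is the multiplier you verified, e.g. 3.0 for 3:1."""
    return price_usd / (usable_tb * real_efficiency_ratio)

# Vendor A: cheaper box with a verified 3:1 ratio for your data.
# Vendor B: pricier box claiming 6:1, but your own testing showed 4:1.
print(real_cost_per_tb(300_000, 50, 3.0))   # 2000.0 -> ~$2,000 per effective TB
print(real_cost_per_tb(500_000, 50, 4.0))   # 2500.0 -> ~$2,500 per effective TB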

And, finally, a word on capacity guarantees:

Some vendors will guarantee capacity efficiencies. Always, always demand to see the fine print. If a vendor insists they will guarantee x:1 efficiency, have them sign an official legally binding agreement that has the backing of the vendor’s HQ (and isn’t some desperate local sales office ploy that might not be worth the paper it’s printed on).

Insist the guarantee states you will get that claimed efficiency no matter what you’re storing on the box.

Notice how quickly the small print will come 🙂

D

]]>
http://recoverymonkey.org/2015/06/15/calculating-the-true-cost-of-space-efficient-flash-solutions/feed/ 2 728
NetApp posts SPC-1 Top Ten Performance results for its high end systems – Tier 1 meets high functionality and high performance http://recoverymonkey.org/2015/04/22/netapp-posts-spc-1-top-ten-performance-results-for-its-high-end-systems-tier-1-meets-high-functionality-and-high-performance/ http://recoverymonkey.org/2015/04/22/netapp-posts-spc-1-top-ten-performance-results-for-its-high-end-systems-tier-1-meets-high-functionality-and-high-performance/#comments Wed, 22 Apr 2015 20:15:43 +0000 http://recoverymonkey.org/?p=653 Continue reading "NetApp posts SPC-1 Top Ten Performance results for its high end systems – Tier 1 meets high functionality and high performance"]]> It’s been a while since our last SPC-1 benchmark submission with high-end systems in 2012. Since then we launched all new systems, and went from ONTAP 8.1 to ONTAP 8.3, big jumps in both hardware and software.

In 2012 we posted an SPC-1 result with a 6-node FAS6240 cluster  – not our biggest system at the time but we felt it was more representative of a realistic solution and used a hybrid configuration (spinning disks boosted by flash caching technology). It still got the best overall balance of low latency (Average Response Time or ART in SPC-1 parlance, to be used from now on), high SPC-1 IOPS, price, scalability, data resiliency and functionality compared to all other spinning disk systems at the time.

Today (April 22, 2015) we published SPC-1 results with an 8-node all-flash high-end FAS8080 cluster to illustrate the performance of the largest current NetApp FAS systems in this industry-standard benchmark.

For the impatient…

  • The NetApp All-Flash FAS8080 SPC-1 submission places the system in the #5 performance spot in the SPC-1 Top Ten by performance list.
  • And #3 if you look at performance at load points around 1ms Average Response Time (ART).
  • The NetApp system uses RAID-DP, similar to RAID-6, whereas the other entries use RAID-10 (typically, RAID-6 is considered slower than RAID-10).
  • In addition, the FAS8080 shows the best storage efficiency, by far, of any Top Ten SPC-1 submission (and without using compression or deduplication).
  • The FAS8080 offers far more functionality than any other system in the list.

We also recently posted results with the NetApp EF560  – the other major hardware platform NetApp offers. See my post here and the official results here. Different value proposition for that platform – less features but very low ART and great cost effectiveness are the key themes for the EF560.

In this post I want to explain the current Clustered Data ONTAP results and why they are important.

Flash performance without compromise

Solid state storage technologies are becoming increasingly popular.

The challenge with flash offerings from most vendors is that customers typically either have to give up a lot in order to get the high performance of flash, or have to combine 4-5 different products into a complex “solution” in order to satisfy different requirements.

For instance, dedicated all-flash offerings may not be able to natively replicate to less expensive, spinning-drive solutions.

Or, a flash system may offer high performance but not the functionality, scalability, reliability and data integrity of more mature solutions.

But what if you could have it all? Performance and reliability and functionality and scalability and maturity? That’s exactly what Clustered Data ONTAP 8.3 provides.

Here are some Clustered Data ONTAP 8.3 running on FAS8080 highlights:

  • All the NetApp signature ultra-tight application integration and automation for replication, SnapShots, Clones
  • Fancy write-accelerated RAID6-equivalent protection by default
  • Comprehensive data integrity and protection against insidious lost write/torn page/misplaced write errors that RAID and normal checksums don’t always catch
  • Non-disruptive data mobility for all protocols
  • Non-disruptive operations – no downtime even when doing things that would require downtime and extensive PS with other vendors
  • Granular QoS
  • Deduplication and compression
  • Highly scalable – 5,760 drives possible in an 8-node cluster, 17,280 drives possible in the max 24 nodes. Various drive types in the cluster, from SSD to SATA and everything else in between.
  • Multiprotocol (FC, iSCSI, NFS, SMB1,2,3) on the same hardware (no “helper” boxes needed, no dedicated SAN vs NAS pools needed)
  • 96,000 LUNs per 8-node cluster (that’s right, ninety-six thousand LUNs – about 50% more than the maximum possible with the other high-end systems)
  • ONTAP is VMware vVol ready
  • The only array that has been validated by VMware for VMware Horizon 6 with vVols – hopefully the competitors will follow our lead
  • Over 460TB (yes, TeraBytes) of usable cache after all overheads are accounted for (and without accounting for cache amplification through deduplication and clones) in an 8-node cluster. Makes competitor maximum cache amounts seem like rounding errors – indeed, the actual figure might be 465TB or more, but it’s OK… 🙂 (and 3x that number in a 24-node cluster, over 1.3PB cache!)
  • The ability to virtualize other storage arrays behind it
  • The ability to have a cluster with dissimilar size and type nodes – no need to keep all engines the same (unlike monolithic offerings). Why pay the same for all nodes when some nodes may not need all the performance? Why be forced to keep all nodes in the same hardware family? What if you don’t want to buy all at once? Maybe you want to upgrade part of the cluster with a newer-gen system? 🙂
  • The ability to evacuate part of a cluster and build that part as a different cluster elsewhere
  • The ability to have multiple disk types in a cluster and, indeed, dedicate nodes to functions (for instance, have a few nodes all-flash, some nodes with flash-accelerated SAS and a couple with very dense yet flash-accelerated NL-SAS, with full online data mobility between nodes)
That last bullet deserves a picture:
 MixedCluster

 

“SVM” stands for Storage Virtual Machine –  it means a logical storage partition that can span one or more cluster nodes and have parts of the underlying capacity (performance and space) available to it, with its own users, capacity and performance limits etc.

In essence, Clustered Data ONTAP offers the best combination of performance, scalability, reliability, maturity and features of any storage system extant as of this writing. Indeed – look at some of the capabilities like maximum cache and number of LUNs. This is designed to be the cornerstone of a datacenter.

It makes most other systems seem like toys in comparison…

Ships

FUD buster

Another reason we wanted to show this result was FUD from competitors struggling to find an angle to fight NetApp. It goes a bit like this: “NetApp FAS systems aren’t real SAN, it’s all simulated and performance will be slow!”

Right…

Drevilsimulated

Well – for a “simulated” SAN (whatever that means), the performance is pretty amazing given the level of protection used (RAID6-equivalent – far more resilient and capacity-efficient for large pooled deployments than the RAID10 the other submissions use) and all the insane scalability, reliability and functionality on tap 🙂

Another piece of FUD has been that ONTAP isn’t “flash-optimized” since it’s a very mature storage OS and wasn’t written “from the ground up for flash”. We’ll let the numbers speak for themselves. It’s worth noting that we have been incorporating a lot of flash-related innovations into FAS systems well before any other competitor did so, something conveniently ignored by the FUD-mongers. In addition, ONTAP 8.3 has a plethora of flash optimizations and path length improvements that helped with the excellent response time results. And lots more is coming.

The final piece of FUD we made sure was addressed was system fullness – last time we ran the test we didn’t fill up as much as we could have, which prompted the FUD-mongers to say that FAS systems need gigantic amounts of free space to perform. Let’s see what they’ll come up with this time 😉

On to the numbers!

As a refresher, you may want to read past SPC-1 posts here and here, and my performance primer here.

Important note: SPC-1 is a 100% block-based benchmark with its own I/O blend and, as such, the results from any vendor SPC-1 submission should not be compared to marketing IOPS numbers of all reads or metadata-heavy NAS benchmarks like SPEC SFS (which are far easier on systems than the 60% write blend of the SPC-1 workload). Indeed, the tested configuration might perform in the millions of “marketing” IOPS – but that’s decidedly not the point of this benchmark.

The SPC-1 Result links if you want the detail are here (summary) and here (full disclosure). In addition, here’s the link to the “Top 10 Performance” systems page so you can compare other submissions that are in the upper performance echelon (unfortunately, SPC-1 results are normally just alphabetically listed, making it time-consuming to compare systems unless you’re looking at the already sorted Top 10 list).

I recommend you look beyond the initial table in each submission showing the performance and $/SPC-1 IOPS and at least go to the price table to see the detail. The submissions calculate $/SPC-1 IOPS based on submitted price but not all vendors use discounted pricing. You may want to do your own price/performance calculations.

The things to look for in SPC-1 submissions

Typically you’re looking for the following things to make sense of an SPC-1 submission:

  • ART vs IOPS – many submissions will show high IOPS at huge ART, which would be rather useless when it comes to Flash storage
  • Sustainability – was performance even or are there constant huge spikes?
  • RAID level – most submissions use RAID10 for speed, what would happen with RAID6?
  • Application Utilization. This one is important yet glossed over. It signifies how much capacity the benchmark consumed vs the overall raw capacity of the system, before RAID, spares etc.

Let’s go over these one by one.

ART vs IOPS

Our ART was 1.23ms at 685,281.71 SPC-1 IOPS, and pretty flat over time during the test:

Response_time_complete

Sustainability

The SPC-1 rules state the minimum runtime should be 8 hours. We ran the test for 18 hours to observe if there would be variation in the performance. There was no significant variation:

IOdistributionRamp

RAID level

RAID-DP was used for all FAS8080EX testing. This is mathematically analogous in protection to RAID-6. Given that these systems are typically deployed in very large pooled configurations, we elected long ago to not recommend single parity RAID since it’s simply not safe enough. RAID-10 is fast and fine for smaller capacity SSD systems but, at scale, it gets too expensive for anything but a lab queen (a system that nobody in their right mind will ever buy but which benchmarks well).

Application Utilization

Our Application Utilization was a very high 61.92% – unheard of by other vendors posting SPC-1 results since they use RAID10 which, by definition, wastes half the capacity (plus spares and other overheads to worry about on top of that).

AppUtilization

Some vendors using RAID10 will fill up the resulting space after RAID, spares etc. to a very high degree, and call out the “Protected Application Utilization” as being the key thing to focus on.

This could not be further from the truth – Application Utilization is the only metric that really shows how much of the total possible raw capacity the benchmark actually used and signifies how space-efficient the storage was.

Otherwise, someone could do quadruple mirroring of 100TB, fill up the resulting 25TB to 100%, and call that 100% efficient… when in fact it only consumed 25% 🙂
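
Here is that thought experiment as a quick calculation, with illustrative numbers only:

# Quadruple mirroring example: 100 TB raw leaves 25 TB protected; filling that
# to 100% looks great on "Protected Application Utilization" but not on
# Application Utilization. Numbers are illustrative.

physical_tb = 100                   # raw capacity purchased
protected_tb = physical_tb / 4      # quadruple mirroring leaves 25 TB usable
asu_tb = 25                         # fill the protected space completely

print(f"Protected Application Utilization: {asu_tb / protected_tb:.0%}")  # 100%
print(f"Application Utilization:           {asu_tb / physical_tb:.0%}")   # 25%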

It is important to note there was no compression or deduplication enabled by any vendor since it is not allowed by the current version of the benchmark.

Compared to other vendors

I wanted to show a comparison between the Top Ten Performance results both in absolute terms and also normalized around 1ms ART.

Here are the Top Ten highest performing systems as of April 22, 2015, with vendor results links if you want to look at things in detail:

FYI, the HP XP 9500 and the Hitachi system above it in the list are the exact same system, HP resells the HDS array as their high-end offering.

I will show columns that explain the results of each vendor around 1ms. Why 1ms and not more or less? Because in the Top Ten SPC-1 performance list, most results show fairly low ART, but some have very high ART, and it’s useful to show performance at that lower ART load point, which is becoming the ART standard for All-Flash systems. 1ms seems to be a good point for multi-function SSD systems (vs simpler, smaller but more speed-optimized architectures like the NetApp EF560).

The way you determine the 1ms ART load point is by looking at the table that shows ART vs SPC-1 IOPS. Let’s pick IBM’s 780 since it has a very interesting curve so you learn what to look for.

From page 5 of the IBM Power Server 780 SPC-1 Executive Summary:

IBM780

IBM’s submitted SPC-1 IOPS are high but at a huge ART number for an all-SSD solution (18.90ms). Not very useful for customers picking an all-SSD system. Even the next load point, with an average ART of 6.41ms, is high for an all-flash solution.

To more accurately compare this to the rest of the vendors with decent ART, you need to look at the table to find the closest load point around 1ms (which, in this case, is the 10% load point at 0.71ms – the next one up is much higher at 2.65ms).

You can do a similar exercise for the rest, it’s worth a look – I don’t want to paste all these tables and graphs since this post will get too big. But it’s interesting to see how SPC-1 IOPS vs ART are related and translate that to your business requirements for application latency.
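
If you want to script the exercise, it is simply “find the load point with the ART nearest 1ms”. The ART values below are the IBM 780 figures quoted above; the per-load-point IOPS are not reproduced in this post, so they are left out of the sketch:

# Pick the load point whose Average Response Time is closest to 1ms. Only the
# ART values quoted above are used; unlabeled load levels are noted as such.

load_points_ms = {
    "10% load":                0.71,   # quoted above
    "next load point up":      2.65,   # quoted above; exact load % not shown here
    "next-to-last load point": 6.41,   # quoted above; exact load % not shown here
    "100% load":               18.90,  # quoted above
}

target_ms = 1.0
name, art = min(load_points_ms.items(), key=lambda kv: abs(kv[1] - target_ms))
print(f"Load point nearest {target_ms}ms ART: {name} at {art}ms")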

Here’s the table with the current Top Ten SPC-1 Performance results as of 4/22/2015. Click on it for a clearer picture, there’s a lot going on.

8080_chart_spc1

Key for the chart (the non-obvious parts anyway):

  • The “SPC-1 Load Level near 1ms” is the load point in each SPC-1 Report that corresponds to the SPC-1 IOPS achieved near 1ms. This is not how busy each array was (I see this misinterpreted all the time).
  • The “Total ASU Capacity” is the amount of capacity the test consumed.
  • The “Physical Storage Capacity” is the total amount of capacity in the array before RAID etc.
  • “Application Utilization” is ASU Capacity divided by Physical Storage Capacity.

What do the results show?

Predictably, all-flash systems trump disk-based and hybrid systems for performance and can offer very nice $/SPC-1 IOPS numbers. That is the major allure of flash – high performance density.

Some takeaways from the comparison:

  • Based on SPC-1 IOPS around the 1ms Average Response Time load points, the FAS8080 EX shifts from 5th place to 3rd
  • The other vendors used RAID10 – NetApp used RAID-DP (similar to RAID6 in protection). What would happen to their results if they switched to RAID6 to provide a similar level of protection and efficiency?
  • Aside from the NetApp FAS result, the rest of the Top Ten Performance submissions offer vastly lower Application Utilization – about half! Which means that NetApp is able to use roughly 2x as much of the raw capacity as the other submissions. And that’s before starting to count the possible storage efficiencies we can turn on, like dedupe and compression.

How does one pick a flash array?

It depends. What are you trying to do? Solve a tactical problem? Just need a lot of extra speed and far lower latency for some workloads? No need for the array to have a ton of functionality? A lot of the data management happens in the application? Need something cost-effective, simple yet reliable? Then an all-flash system like the NetApp EF560 is a solid answer, and it can still be front-ended by a Clustered Data ONTAP system to provide more functionality if the need arises in the future (we are firm believers in hardware reuse and investment protection – you see, some companies talk about Software Defined Storage, we do Software Defined Storage).

On the other hand, if you would prefer an Enterprise architecture that can serve as the cornerstone of your datacenter for almost any workload and protocol, offers rich data management functionality and tight application integration, insane scalability, non-disruptive everything and offers the most features (reliably) compared to any other platform – then the FAS line running Clustered Data ONTAP is the only possible answer.

Couple that with OnCommand Insight – the best multivendor fabric management tool on the planet – plus Workflow Automation, and we’ve got you covered.

In summary – the all-flash FAS8080EX gets a pretty amazing performance and efficiency SPC-1 result, especially given the extensive portfolio of functionality it offers. In my opinion, no competitor system offers the sheer functionality the FAS8080 does – not even close. Additionally, I believe that certain competitors have very questionable viability and/or tiny market penetration, making them a risky proposition for a high end system purchase.

Thx

D

]]>
http://recoverymonkey.org/2015/04/22/netapp-posts-spc-1-top-ten-performance-results-for-its-high-end-systems-tier-1-meets-high-functionality-and-high-performance/feed/ 15 653
Marketing fun: NetApp industry first of up to 13 million IOPS in a single rack http://recoverymonkey.org/2015/02/18/marketing-fun-netapp-industry-first-of-up-to-13-million-iops-in-a-single-rack/ http://recoverymonkey.org/2015/02/18/marketing-fun-netapp-industry-first-of-up-to-13-million-iops-in-a-single-rack/#comments Thu, 19 Feb 2015 00:24:06 +0000 http://recoverymonkey.org/?p=686 Continue reading "Marketing fun: NetApp industry first of up to 13 million IOPS in a single rack"]]> I’m seeing some really “out there” marketing lately, every vendor (including us) trying to find an angle that sounds exciting while not being an outright lie (most of the time).

A competitor recently claimed an industry first of up to 1.7 million (undefined type) IOPS in a single rack.

The number (which admittedly sounds solid) got me thinking. Was the “industry first” that nobody else did up to 1.7 million IOPS in a single rack?

Would that statement also be true if someone else did up to 5 million IOPS in a rack?

I think that, in the world of marketing, it would – since the faster vendor doesn’t do up to 1.7 million IOPS in a rack, they do up to 5! It’s all about standing out in some way.

Well – let’s have some fun.

I can stuff 21x EF560 systems in a single rack.

Each of those systems can do 650,000 random 4K reads at a stable 800 microseconds (since I like defining my performance stats), 600,000 random 8K reads at under 1ms, and over 300,000 random 32KB reads at under 1ms. Also each system can do 12GB/s of large block sequential reads. This is sustained I/O straight from the SSDs and not RAM cache (the I/O from cache can of course be higher but let’s not count that).

See here for the document showing some of the performance numbers.

Well – some simple math shows a standard 42U rack fully populated with EF560 will do the following:

  • 13,650,000 IOPS.
  • 252GB/s throughput.
  • Up to 548TB of usable SSD capacity using DDP protection (up to 639TB with RAID5).

Not half bad.
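
If you want to check the arithmetic, it is straightforward. The only assumption baked in is 2U per EF560 enclosure (hence 21 systems in a 42U rack); capacity does not simply multiply since it depends on how many drives each system carries, so only IOPS and throughput are checked here:

# Rack math check using the per-system figures quoted above. Assumes 2U per
# EF560 enclosure; capacity is deliberately left out.

systems_per_rack = 42 // 2           # 21 systems in a standard 42U rack
iops_per_system = 650_000            # random 4K reads at ~800 microseconds
gbps_per_system = 12                 # GB/s of large-block sequential reads

print(f"IOPS per rack:       {systems_per_rack * iops_per_system:,}")     # 13,650,000
print(f"Throughput per rack: {systems_per_rack * gbps_per_system} GB/s")  # 252 GB/s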

Doesn’t quite roll off the tongue though – industry first of up to thirteen million six hundred and fifty thousand IOPS in a single rack. 🙂

I hope rounding down to 13 million is OK with everyone.

 

D

NetApp Posts Top Ten SPC-1 Price-Performance Results for the new EF560 All-Flash Array http://recoverymonkey.org/2015/01/27/netapp-posts-top-ten-spc-1-price-performance-results-for-the-new-ef560-all-flash-array/ http://recoverymonkey.org/2015/01/27/netapp-posts-top-ten-spc-1-price-performance-results-for-the-new-ef560-all-flash-array/#comments Tue, 27 Jan 2015 10:55:50 +0000 http://recoverymonkey.org/?p=669 Continue reading "NetApp Posts Top Ten SPC-1 Price-Performance Results for the new EF560 All-Flash Array"]]> <edit: updated with the changes in the SPC-1 price/performance lineup as of 3/27/2015, fixed some typos>

I’m happy to report that today we announced the new, third-gen EF560 all-flash array, and also posted SPC-1 results showing the impressive performance it is capable of in this extremely difficult benchmark.

If you have no time to read further – the EF560 achieves, by far, the absolute best price/performance at very low latencies in the SPC-1 benchmark.

The EF line has been enjoying great success for some time now with huge installations in some of the biggest companies in the world with the highest profile applications (as in, things most of us use daily).

The EF560 is the latest all-flash variant of the E-Series family, optimized for very low latency and high performance workloads while ensuring high reliability, cost effectiveness and simplicity.

EF560 highlights

The EF560 runs SANtricity – a lean, heavily optimized storage OS with an impressively short path length (the overhead imposed by the storage OS itself to all data going through the system). In the case of the EF the path length is tiny, around 30 microseconds. Most other storage arrays have a much longer path length as a result of more features and/or coding inefficiencies.

Keeping the path length this short is one of the reasons the EF does away with fashionable All-Flash features like compression and deduplication – make no mistake, no array that performs those functions can sustain such a short path length. There’s just too much in the way. If you really want data reduction and an incredible number of features, we offer that in the FAS line – but the path length naturally isn’t as short as the EF560’s.

A result of the short path length is impressively low latency while maintaining high IOPS with a very reasonable configuration, as you will see further in the article.

Some other EF560 features:

  • No write cliff due to SSD aging or fullness
  • No performance impact due to SSD garbage collection
  • Enterprise components – including SSDs
  • Six-nines available
  • Up to 120x 1.6TB SSDs per system (135TB usable with DDP protection, even more with RAID5/6)
  • High throughput – 12GB/s reads, 8GB/s writes per system (many people forget that DB workloads need not just low latency and high IOPS but also high throughput for certain operations).
  • All software is included in the system price, apart from encryption
  • The system can do snaps and replication, including fully synchronous replication
  • Consistency Group support
  • Several application plug-ins
  • There are no NAS capabilities but instead there is a plethora of block connectivity options: FC, iSCSI, SAS, InfiniBand
  • The usual suspects of RAID types – 5, 10, 6 plus…
  • DDP – Dynamic Disk Pools, a type of declustered RAID6 implementation that performs RAID at the sub-disk level – very handy for large pools, rapid disk rebuilds with minimal performance impact and overall increased flexibility (for instance, you could add a single disk to the system instead of entire RAID groups’ worth)
  • T10-PI to help protect against insidious data corruption that might bypass RAID and normal checksums, and provide end-to-end protection, from the application all the way to the storage device
  • Can also be part of a Clustered Data ONTAP system using the FlexArray license on FAS.

The point of All-Flash Arrays

Going back to the short path length and low latency discussion…

Flash has been a disruptive technology because, if used properly, it allows an unprecedented performance density, at increasingly reasonable costs.

The users of All-Flash Arrays typically fall in two camps:

  1. Users that want lots of features, data reduction algorithms, good but not deterministic performance and not crazy low latencies – 1-2ms is considered sufficient for this use case (with the occasional latency spike), as it is better than hybrid arrays and way better than all-disk systems.
  2. Users that need the absolute lowest possible latency (starting in the microseconds – and definitely less than 1ms worst-case) while maintaining uncompromising reliability for their applications, and are willing to give up certain features to get that kind of performance. The performance for this type of user needs to be deterministic, without weird latency spikes, ever.

The low latency camp typically uses certain applications that need very low latency to generate more revenue. Every microsecond counts, while failures would typically mean significant revenue loss (to the point of making the cost of the storage seem like pocket change).

Some of you may be reading this and be thinking “so what, 1ms to 2ms is a tiny difference, it’s all awesome”. Well – at that level of the game, 2ms is twice the latency of 1ms, and it is a very big deal indeed. For the people that need low latency, a 1ms latency array is half the speed of a 500 microsecond array, even if both do the same IOPS.

You may also be thinking “SSDs that fit in a server’s PCI slot have low latency, right?”

The answer is yes, but what’s missing is the reliability a full-fledged array brings. If the server dies, access is lost. If the card dies, all is lost.

So, when looking for an All-Flash Array, think about what type of flash user you are. What your business actually needs. That will help shape your decisions.

All-Flash Array background operations can affect latency

The more complex All-Flash Arrays have additional capabilities compared to the ultra-low-latency gang, but also have a higher likelihood of producing relatively uneven latency under heavy load while full, and even latency spikes (besides their naturally higher latency due to the longer path length).

For instance, things like cleanup operations, various kinds of background processing that kicks off at different times, and different ways of dealing with I/O depending on how full the array is, can all cause undesirable latency spikes and overall uneven latency. It’s normal for such architectures, but may be unacceptable for certain applications.

Notably, the EF560 doesn’t suffer from such issues. We have been beating competitors in difficult performance situations with the slower predecessors of the EF560, and we will keep doing it with the new, faster system 🙂

Enough already, show me the numbers!

As a refresher, you may want to read past SPC-1 posts here and here, and my performance primer here.

Important note: SPC-1 is a block-based benchmark with its own I/O blend and, as such, any vendor’s SPC-1 results should not be compared to all-read marketing IOPS numbers or to metadata-heavy NAS benchmarks like SPEC SFS (both of which are far easier on systems than the 60% write blend and hotspots of the SPC-1 workload). Indeed, the tested configuration could perform way more “marketing” IOPS – but that’s decidedly not the point of this benchmark.

The EF560 SPC-1 Result links if you want the detail are here (summary) and here (full disclosure). In addition, here’s the link to the “Top 10 by Price-Performance” systems page so you can compare to other submissions (unfortunately, SPC-1 results are normally just alphabetically listed, making it time-consuming to compare systems unless you’re looking at the already sorted Top 10 lists).

The things to look for in SPC-1 submissions

Typically you’re looking for the following things to make sense of an SPC-1 submission:

  • Latency vs IOPS – many submissions will show high IOPS at huge latency, which would be rather useless for the low-latency crowd
  • Sustainability – was performance even or are there constant huge spikes?
  • RAID level – most submissions use RAID10 for speed, what would happen with RAID6?
  • Application Utilization. This one is important yet glossed over. It signifies how much capacity the benchmark consumed vs the overall raw capacity of the system, before RAID, spares etc.
  • Price – discounted or list?

Let’s go over these one by one.

Latency vs IOPS

Our average latency was 0.93ms at 245,011.76 SPC-1 IOPS, and extremely flat during the test:

[Chart: EF560 SPC-1 response time distribution]

Sustainability

The SPC-1 rules state the minimum runtime should be 8 hours. There was no significant variation in performance during the test:

[Chart: EF560 SPC-1 performance during the sustained test run]

RAID level

RAID-10 was used for all testing, with T10-PI Data Assurance enabled (which has a performance penalty, but the applications these systems are used for typically need paranoid data integrity). This system would perform slower with RAID5 or RAID6. But for applications where the absolute lowest latency is important, RAID10 is a safe bet, especially on systems that, unlike Data ONTAP, are not write-optimized for RAID6 writes. Not to fret though – the price/performance remained stellar as you will see.

Application Utilization

Our Application Utilization was a very high 46.90% – among the highest of any submission with RAID10 (and among the highest overall, only Data ONTAP submissions can go higher due to RAID-DP).

[Chart: EF560 SPC-1 Application Utilization]

We did almost completely fill up the resulting RAID10 space, to show that the system’s performance is unaffected when very full. However, Application Utilization is the only metric that really shows how much of the total possible raw capacity the benchmark actually used and signifies how space-efficient the storage was.

Otherwise, someone could do quadruple mirroring of 100TB, fill up the resulting 25TB to 100%, and call that 100% efficient… when in fact it only consumed 25% 🙂
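
To make the metric concrete, here is a tiny sketch that follows the definition above; the 100TB/25TB figures are just the quadruple-mirroring example, not a real submission:

    def application_utilization(benchmark_used_tb, total_raw_tb):
        # SPC-1 Application Utilization: capacity the benchmark consumed vs total raw capacity
        return benchmark_used_tb / total_raw_tb

    # Quadruple mirroring of 100TB raw leaves 25TB usable; filling that 25TB to 100%
    # still only consumes a quarter of the raw capacity.
    print(f"{application_utilization(25, 100):.0%}")   # 25%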

It is important to note there was no compression or deduplication enabled by any vendor since it is not allowed by the current version of the benchmark.

Compared to other vendors

I wanted to show a comparison between the SPC-1 Top Ten Price-Performance results both in absolute terms and also normalized around 500 microsecond latency to illustrate the fact that very low latency with great performance is still possible at a compelling price point with this solution.

Why 500 microseconds you might ask? Because that’s a good place for very low latency flash storage systems. Why not 1 millisecond you might also ask? Well, 1ms is more commonly found on systems that have more features and don’t concentrate on low latency as much (1ms is half the speed of 500 microseconds).

Here are the Top Ten Price-Performance systems as of March 27, 2015, with SPC-1 Results links if you want to look at things in detail:

  1. X-IO ISE 820 G3 All Flash Array
  2. Dell Storage SC4020 (6 SSDs)
  3. NetApp EF560 Storage System
  4. Huawei OceanStor Dorado2100 G2
  5. HP 3PAR StoreServ 7400 Storage System
  6. FUJITSU ETERNUS DX200 S3
  7. Kaminario K2 (28 nodes)
  8. Huawei OCEANSTOR Dorado 5100
  9. Huawei OCEANSTOR Dorado 2100
  10. FUJITSU ETERNUS DX100 S3

I will show columns that explain the results of each vendor around 500 microseconds, plus how changing the latency target affects SPC-1 IOPS and also how it affects $/SPC1-IOPS.

The way you determine that lower latency point (SPC calls it “Average Response Time“) is by looking at the graph that shows latency vs SPC-1 IOPS and finding the load point closest to 500 microseconds. Let’s pick Kaminario’s K2 so you learn what to look for:

[Chart: Kaminario K2 SPC-1 response time vs IOPS curve]

Notice how the SPC-1 IOPS around half a millisecond is about 10x slower than the performance around 3ms latency. The system picks up after that very rapidly, but if your requirements are for latency to not exceed 500 microseconds, you will be better off spending your money elsewhere (indeed, a very high profile client asked us for 400 microsecond max response at the host level from the first-gen EF systems for their Oracle DBs – this is actually very realistic for many market segments).

Here’s the table with all this analysis done for you. BTW, the “adjusted latency” $/SPC-1 IOPS is not something in the SPC-1 Reports but simply calculated for our example by dividing system price by the SPC-1 IOPS found at the 500 microsecond point in all the reports.
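
In case you want to build a similar table yourself, here is roughly how the adjusted figures were derived. It is a sketch, not an official SPC metric, and the prices and IOPS below are placeholders you would replace with the values from each full disclosure report:

    # For each submission: total price from the report, and the SPC-1 IOPS at the
    # load point closest to 0.5ms on its response time vs IOPS curve.
    submissions = {
        "Example AFA A": (500_000, 200_000),   # placeholder price (USD), placeholder IOPS
        "Example AFA B": (400_000, 40_000),
    }

    for name, (price_usd, iops_at_500us) in submissions.items():
        print(f"{name}: ${price_usd / iops_at_500us:.2f} per SPC-1 IOPS at ~500 microseconds")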

What do the results show?

As submitted, the EF560 is #3 in the absolute Price-Performance ranking. Interestingly, once adjusted for latency around 500 microseconds at list prices (to keep a level playing field), the price/performance of the EF560 is far better than anything else on the chart.

Regarding pricing: Note that some vendors have discounted pricing and some not, always check the SPC-1 report for the prices and don’t just read the summary at the beginning (for example, Fujitsu has 30% discounts showing in the reports, Dell, X-IO and HP all at 45% off – the rest aren’t discounted).

Our price-performance is even better once you adjust for discounts in some of the other results. Update: In this edited version of the chart I show the list price calculations as well. We are #1 in price/performance when adjusted for list pricing even at the higher submitted latencies for all vendors… 🙂

Another interesting observation is the effects of longer path length on some platforms – for instance, Dell’s lowest reported latency is 0.70ms at a mere 11,249.97 SPC-1 IOPS. Clearly, that is not a system geared towards high performance at very low latency. In addition, the response time for the submitted max SPC-1 IOPS for the Dell system is 4.83ms, firmly in the “nobody cares” category for all-flash systems 🙂 (sorry guys).

Conversely… The LRT (Least Response Time) we submitted for the EF560 was a tiny 0.18ms (180 microseconds) at 24,501.04 SPC-1 IOPS. This is the lowest LRT anyone has ever posted on any array for the SPC-1 benchmark.

Clearly we are doing something right 🙂

Final thoughts

If your storage needs require very low latency coupled with very high reliability, the EF560 would be an ideal candidate. In addition, the footprint of the system is extremely compact: the SPC-1 results shown are from just a 2U EF560 with 24x 400GB SSDs.

Coupled with Clustered Data ONTAP systems and OnCommand Insight and WorkFlow Automation, NetApp has an incredible portfolio, able to take on any challenge.

Thx

D

Beware of storage performance guarantees http://recoverymonkey.org/2014/12/23/beware-of-storage-performance-guarantees/ http://recoverymonkey.org/2014/12/23/beware-of-storage-performance-guarantees/#respond Wed, 24 Dec 2014 01:23:23 +0000 http://recoverymonkey.org/?p=640 Continue reading "Beware of storage performance guarantees"]]> Ah, nothing to bring joy to the holidays like a bit of good old-fashioned sales craziness.

Recently we started seeing weird performance “guarantees” from some storage vendors, who, it seems, will try anything for a sale.

Probably by people that haven’t read this.

It goes a bit like this:

“Mr. Customer, we guarantee our storage will do 100,000 IOPS no matter the I/O size and workload”.

Next time a vendor pulls this, show them the following chart. It’s a simple plot of I/O size vs throughput for 100,000 IOPS:

[Chart: throughput vs I/O size at a constant 100,000 IOPS]

Notice that at a 1MB I/O size the throughput is a cool 100GB/s 🙂
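
The chart is trivial to reproduce yourself, since throughput is simply IOPS multiplied by I/O size. A quick sketch (decimal units for simplicity):

    iops = 100_000
    for io_size_kb in (4, 8, 64, 256, 1024):
        gb_per_sec = iops * io_size_kb / 1_000_000    # KB x IOPS to GB/s (decimal units)
        print(f"{io_size_kb:>5}KB I/O size: {gb_per_sec:>6.1f} GB/s at {iops:,} IOPS")
    # At a 1MB I/O size, 100,000 IOPS works out to roughly 100GB/s, as the chart shows.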

Then ask that vendor again if they’re sure they still want to make that guarantee. In writing. With severe penalties if it’s not met. As in free gear UNTIL the requirement is met. At any point during the lifetime of the equipment.

Then sit back and enjoy the backpedalling. 

You can make it even more fun, especially if it’s a hybrid storage vendor (mixed spinning and flash storage for caching, with or without autotiering):

  • So you will guarantee those IOPS even if the data is not in cache?
  • For completely random reads spanning the entire pool?
  • For random overwrites? (that should be a fun one, 100GB/s of overwrite activity).
  • For non-zero or at least not crazily compressible data?
  • And what’s the latency for the guarantee? (let’s not forget the big one).
  • etc. You get the point.
 
Happy Holidays everyone!
 
Thx
 
D
 

 

When competitors try too hard and miss the point – part two http://recoverymonkey.org/2014/09/18/when-competitors-try-too-hard-and-miss-the-point-part-two/ http://recoverymonkey.org/2014/09/18/when-competitors-try-too-hard-and-miss-the-point-part-two/#comments Thu, 18 Sep 2014 17:25:35 +0000 http://recoverymonkey.org/?p=629 Continue reading "When competitors try too hard and miss the point – part two"]]> This will be another FUD-busting post in the two-part series (first part here).

It’s interesting how some competitors, in their quest to beat us at any cost, set aside all common sense.

Recently, an Oracle blogger attempted to understand a document NetApp originally wrote in the 90’s (and which we haven’t really updated since, which is admittedly our bad) that explains how WAFL, the block layout engine of Data ONTAP (the storage OS on the FAS platform) works at a high level.

Apparently, he thinks that we turn everything into 4K I/Os, so if someone tried to read 256K, it would have to become 64 separate I/Os, and, by extension, believes this means no NetApp system running ONTAP can ever sustain good read throughput since the back-end would be inundated with IOPS.

The conclusions he comes to are interesting to say the least. I will copy-paste one of the calculations he makes for a 100% read workload:

[Image: the blogger’s erroneous calculation for a 100% read workload]

I like the SAS logo, I guess this is meant to make the numbers look legit, as if they came from actual SAS testing 🙂

So this person truly believes that to read 2.6GB/s we need 5,120 drives due to the insane back-end IOPS we purportedly generate 🙂

This would be hilarious if it were true since it would mean NetApp managed to quietly perpetrate the biggest high tech scam in history, fooling customers for 22 years, and somehow managing to become the industry’s #1 storage OS and remain so.

Because customers are that gullible.

Right.

Well – here are some stats from a single 8040 controller (not an HA system with at least 2 controllers, I really mean a single controller doing work, not two or more), with 24 drives, doing over 2.7GB/s reads, at well under 1ms latency, so it’s not even stressed. Thanks to the Australian team for providing the stats:

[Screenshot: statistics from a single FAS8040 node]

In this example, 2.74GB/s are being read. From stable storage, not cache.

Now, if we do the math the way the competitor would like, it means the back-end is running at over 700,000 4K IOPS. On a single mid-range controller 🙂

That would be really impressive and hugely wasteful at the same time. Wait – maybe I should turn this around and claim 700,000 4K IOPS at 0.6ms capability per mid-range controller! Imagine how fast the big ones go!

It would also assume 35,000 IOPS per disk at a consistent speed and sub-millisecond response (0.64ms) – because the numbers above are from a single node with only about 20 data SSDs (plus parity and spares).
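
Here is the arithmetic behind those statements; the throughput and approximate drive count come from the screenshot and description above, and whether you land nearer 685,000 or 720,000 back-end IOPS depends on decimal GB vs GiB:

    read_throughput_gb_s = 2.74    # measured on the single FAS8040 node above
    claimed_io_size_kb = 4         # the "everything becomes 4K" premise
    data_ssds = 20                 # approximate data drive count behind that node

    backend_iops = read_throughput_gb_s * 1_000_000 / claimed_io_size_kb
    print(f"Implied back-end 4K IOPS: {backend_iops:,.0f}")             # ~685,000
    print(f"Implied IOPS per SSD:     {backend_iops / data_ssds:,.0f}")  # ~34,000 per drive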

SSDs are fast but they’re not really that fast, and the purpose of this blog is to illuminate and not obfuscate.

Remember Occam’s razor. What explanation do you think makes more sense here? Pixie-dust drives and controllers, or that the Oracle blogger is massively wrong? 🙂

Another example – with spinning disks this time

This is a different output, to also illustrate our ability to provide detailed per-disk statistics.

From a single 8060 node, running at over 3GB/s reads during an actual RMAN job and not a benchmark tool (to use a real Oracle application example). There are 192x 10,000 RPM 600GB disks in the config (180x data, 24x parity – we run dual-parity RAID, there were 12x 16-drive RAID groups in a 14+2 config).

Numbers kindly provided by the legendary neto from Brazil (@netofrombrazil on Twitter). Check the link for his blog and all kinds of DB coolness.

This is part of the statit command’s output. I’m not showing all the disks since there are 192 of them after all and each one is a line in the output:

[Output: statit per-disk statistics excerpt]

The key in these stats is the “chain” column. This shows, per read command, how many blocks were read as a single entity. In this case, the average is about 49, or 196KB per read operation.

Notice the “xfers” – these drives are only doing about 88 physical IOPS on average per drive, and each operation just happens to be large. They could go faster (see the “ut%” column) but that’s just how much they were loaded during the RMAN job.

Again, if we used the blogger’s calculations, this system would have needed over 5,000 drives and generated over 750,000 back-end disk IOPS.
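
For the curious, here is how the statit columns translate into throughput, and what the same workload looks like through the blogger’s 4K lens. The averages are the ones quoted above and every drive in the output is counted, so treat it as a ballpark:

    avg_chain_blocks = 49      # blocks read per disk read command (statit "chain" column)
    wafl_block_kb = 4
    avg_xfers_per_disk = 88    # physical reads per second per drive (statit "xfers")
    drives = 192               # total drives in the output

    kb_per_read = avg_chain_blocks * wafl_block_kb                       # ~196KB per operation
    actual_gb_s = drives * avg_xfers_per_disk * kb_per_read / 1_000_000  # ~3.3 GB/s
    claimed_backend_iops = actual_gb_s * 1_000_000 / 4                   # if every read became 4K ops
    print(f"~{actual_gb_s:.1f} GB/s from only ~{avg_xfers_per_disk} physical IOPS per spinning disk")
    print(f"Back-end 4K IOPS the blogger's math would require: ~{claimed_backend_iops:,.0f}")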

A public apology and retraction would be nice, guys…

Let’s extrapolate this performance at scale.

My examples are for single mid-range controllers. You can multiply that by 24 to see how fast it could go in a full cluster (yes, it’s linear). And that’s not the max these systems will do – just what was in the examples I found that were close to the competitor’s read performance example.

You see, where most of the competition is still dealing with 2-controller systems, NetApp FAS systems running Clustered ONTAP can run 8 engines for block workloads and 24 engines for NAS (8 if mixed), and each engine can have multiple TB of read/write cache (18TB max cache per node currently with ONTAP 8.2.x).

Even if a competitor’s 2 engines are faster than 2 FAS engines, if they stop at 2 and FAS stops at 24, the fight is over before it begins.

People that live in glass houses shouldn’t throw stones.

Since the competitor questioned why NetApp bought Engenio (the acquisition for our E-Series), I have a similar question: Why did Oracle buy Pillar Data? It was purchased after the Sun acquisition. Does that signify a major lack in the ZFS boxes that Pillar is supposed to address?

The Oracle blogger mentioned how their ZFS system had a great score in the SPC-2 tests (which measure throughput and not IOPS). Great.

Interestingly, Oracle ZFS systems can significantly degrade in performance over time (see here http://blog.delphix.com/uday/2013/02/19/78/) especially after writes, deletes and overwrites. Unlike ONTAP systems, ZFS boxes don’t have mechanisms to perform the necessary block reallocations to optimize the data layout in order to bring performance back to original levels (backing up, wiping the box, rebuilding and restoring is not a solution, sorry). There are ways to delay the inevitable, but nothing to fix the core issue.

It follows that the ZFS performance posted in the benchmarks may not be anywhere near what one will get long-term once the ZFS pools are fragmented and full. Making the ZFS SPC-2 benchmark result pretty useless.

NetApp E-Series inherently doesn’t have this fragmentation problem (and is near the top as a price-performance leader in the SPC-2 benchmark, as tested by SGI that resells it). Since there is no long-term speed deterioration issue with E-Series, the throughput you see in the SPC-2 benchmark will be perpetually maintained. The box is in it for the long haul.

Wouldn’t E-Series then be a better choice for a system that needs to constantly deal with such a workload? Both cost-effective and able to sustain high throughput no matter what?

As an aside, I do need to write an article on block layout optimizations available in ONTAP. Many customers are unaware of the possibilities, and competitors use FUD based on observations from back when mud was a novelty. In the meantime, if you’re a NetApp FAS customer, ask your SE and/or check your documentation for the volume option read_realloc space_optimized – great for volumes containing DB data files. Also, check the documentation for the Aggregate option free_space_realloc.

So you’re fast. What else can you do?

There were other “fighting words” in the blogger’s article and they were all about speed and how much faster the new boxes from the competitor are versus some ancient boxes they had from us. Amazing, new controllers being faster than old ones! 🙂

I see this trend recently, new vendors focusing solely on speed. Guess what – it’s easy to go fast. It’s also easy to be cheap. I’ll save that for a full post another time. But I fully accept that speed sells.

I can build you a commodity-based million-IOPS box during my lunch break. It’s really not that hard. Building a server with dozens of cores and TB of RAM is pretty easy.

But for Enterprise Storage, Reliability is extremely important, far more than sheer speed.

Plus Availability and Serviceability (where the RAS acronym comes from).

Predictability.

Non-Disruptive Operations, even during events that would leave other systems down for extended periods of time.

Extensive automation, management, monitoring and alerting at scale as well.

And of crucial importance is Application Integration, including the ability to perform application-aware data manipulation (fully consistent backups, restores, clones, replication).

So if a system can go fast but can’t do much else, its utility is more towards being a point solution rather than as part of a large, strategic, long-term deployment. Point solutions are useful, yes – but they are also interchangeable with the next cheap fast thing. Most won’t survive.

You know who you are.

D

When competitors try too hard and miss the point http://recoverymonkey.org/2014/09/09/when-competitors-try-too-hard-and-miss-the-point/ http://recoverymonkey.org/2014/09/09/when-competitors-try-too-hard-and-miss-the-point/#comments Tue, 09 Sep 2014 20:07:38 +0000 http://recoverymonkey.org/?p=611 Continue reading "When competitors try too hard and miss the point"]]> (edit: fixed the images)

After a long hiatus, we return to our regularly scheduled programming with a 2-part series that will address some wild claims Oracle has been making recently.

I’m pleased to introduce Jeffrey Steiner, ex-Oracle employee and all-around DB performance wizard. He helps some of our largest customers with designing high performance solutions for Oracle DBs:

Greetings from a guest-blogger.

I’m one of the original NetApp customers.

I bought my first NetApp in 1995 (I have a 3-digit support case in the system) and it was an F330. I think it came with 512MB SCSI drives, and maxed out at 16GB. It met our performance needs, it was reliable, and it was cost effective.  I continued to buy more over the following years at other employers. We must have been close to the first company to run Oracle databases on NetApp storage. It was late 1999. Again, it met our performance needs, it was reliable, and it was cost effective. My employer immediately prior to joining NetApp was Oracle.

I’m now with NetApp product operations as the principal architect for enterprise solutions, which usually means a big Oracle database is involved, but it can also include DB2, SAS, MongoDB, and others.

I normally ignore competitive blogs, and I had never commented on a blog in my life until I ran into something entitled “Why your NetApp is so slow…” and found this statement:

If an application such MS SQL is writing data in a 64k chunk then before Netapp actually writes it on disk it will have to split it into 16 different 4k writes and 16 different disk IOPS

That’s just openly false. I tried to correct the poster, but was met with nothing but other unsubstantiated claims and insults to the product line. It was clear the blogger wasn’t going to acknowledge their false premise, so I asked Dimitris if I could borrow some time on his blog.

Here’s one of the alleged results of this behavior with ONTAP – the blogger was nice enough to do this calculation for a system reading at 2.6GB/s:

 

[Image: the alleged drive-count calculation from the competing blog]


Seriously?

I’m not sure how to interpret this. Are they saying that this alleged horrible, awful design flaw in ONTAP leads to customers buying 50X more drives than required, and our evil sales teams have somehow fooled our customer base into believing this was necessary? Or, is this a claim that ZFS arrays have some kind of amazing ability to use 50X fewer drives?

Given the false premise about ONTAP chopping up any and all IO’s into little 4K blocks and spraying them over the drives, I’m guessing readers are supposed to believe the first interpretation.

Ordinarily, I enjoy this type of marketing. Customers bring this to our attention, and it allows us to explain how things actually work, plus it discredits the account team who provided the information. There was a rep in the UK who used to tell his customers that Oracle had replaced all competing storage arrays in their OnDemand centers with Pillar. I liked it when he said stuff like that. The reason I’m responding is not because I care about the existence of the other blog, but rather that I care about openly false information being spread about how ONTAP works.

How does ONTAP really work?

Some of NetApp’s marketing folks might not like this, but here’s my usual response:

Why does it matter?

It’s an interesting subject, and I’m happy to explain write tetrises, NVMEM write coalescence, and core utilization, but what does that have to do with your business? There was a time we dealt with accusations that NetApp was slow because we had 25 nanometer process CPUs while the state of the art was 17nm or something like that. These days ‘cores’ seems to come up a lot, as if this happens:

[Image: sales “logic”]

That’s the Brawndo approach to storage sales (https://www.youtube.com/watch?v=Tbxq0IDqD04)

“Our storage arrays contain

5 kinds of technology

which make them AWESOME

unlike other storage arrays which are

NOT AWESOME.”

A Better Way

I prefer to promote our products based on real business needs. I phrase this especially bluntly when talking to our sales force:

When you are working with a new enterprise customer, shut up about NetApp for at least the first 45 minutes of the discussion

I say that all the time. Not everyone understands it. If you charge into a situation saying, “NetApp is AWESOME, unlike EMC who is NOT AWESOME” the whole conversation turns into PowerPoint wars, links to silly blog articles like the one that prompted this discussion, and whoever wins the deal will win it based on a combination of luck and speaking ability. Providing value will become secondary.

I’m usually working in engineeringland, but in major deals I get involved directly. Let’s say we have a customer with a database performance issue and they’re looking for new storage. I avoid PowerPoint and usually request Oracle AWR/statspack data. That allows me to size a solution with extreme accuracy. I know exactly what the customer needs, I know their performance bottlenecks, and I know whatever solution I propose will meet their requirements. That reduces risk on both sides. It also reduces costs because I won’t be proposing unnecessary capabilities.

None of this has anything to do with who’s got the better SPC-2 benchmark, unless you plan on buying that exact hardware described, configuring it exactly the same way, and then you somehow make money based on running SPC-2 all day.

Here’s an actual Oracle AWR report from a real customer using NetApp. I have pruned the non-storage related parameters to make it easier to read, and I have anonymized the identifying data. This is a major international insurance company calculating its balance sheet at end-of-month. I know of at least 9 or 10 customers that have similar workloads and configurations.

[Screenshot: Oracle AWR report excerpt]

Look at the line that says “Physical reads”. That’s the blocks read per second. Now look at “Std Block Size”. That’s the block size. This is 90K physical block reads per second, which is 90K IOPS in a sense. The IO is predominantly db_file_scattered_read, which counter-intuitively is sequential IO. A parameter called db_file_multiblock_read_count is set to 128. This means Oracle is attempting to read 128 blocks at a time, which equates to 1MB block sizes. It’s a sequential IO read of a file.

Here’s what we got:

  1. 89K read “IOPS”, sort of.
  2. Those 89K read IOPS are actually packaged as units of 8 blocks in a single 64k unit.
  3. 3K write IOPS
  4. 8MB/sec of redo logging.

The most important point here is that the customer needed about 800MB/sec of throughput, they liked the cost savings of IP, and the storage system is meeting their needs. They refresh with NetApp on occasion, so obviously they’re happy with the TCO.

To put a final nail in the coffin of the Oracle blogger’s argument, if we are really doing 89K block reads/sec, and those blocks are really chopped up into 4k units, that’s a total of about 180,000 4k IOPS that would need to be serviced at the disk layer, per the blogger’s calculation.

  • Our opposing blogger thinks that would require about 1000 disks in theory
  • This customer is using 132 drives in a real production system.
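
Putting the numbers side by side, using the figures from the AWR excerpt and the blogger’s premise (the 125 and 180 IOPS-per-disk values are illustrative assumptions on my part, not anything from the report):

    block_reads_per_sec = 89_000    # "Physical reads" from the AWR excerpt
    oracle_block_kb = 8             # "Std Block Size"
    claimed_split_kb = 4            # the "everything becomes 4K" premise

    throughput_mb_s = block_reads_per_sec * oracle_block_kb / 1000
    claimed_backend_iops = block_reads_per_sec * oracle_block_kb / claimed_split_kb
    print(f"Throughput actually needed: ~{throughput_mb_s:.0f} MB/s")    # ~712 MB/s
    print(f"Claimed back-end 4K IOPS:   ~{claimed_backend_iops:,.0f}")   # ~178,000
    for iops_per_disk in (125, 180):    # illustrative per-disk assumptions
        print(f"  at {iops_per_disk} IOPS/disk that's ~{claimed_backend_iops / iops_per_disk:,.0f} drives")
    print("Drives actually in use: 132, shared with other workloads")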

There’s also a ton of other data on those drives for other workloads. That’s why we have QoS – it allows mixed workloads to play nicely on a single unified system.

To make this even more interesting, the data would have been randomly written in 8k units, yet they are still able to read at 800MB/sec? How is this possible? For one, ONTAP does NOT break up individual IOs into 4k units. It tries very, very hard to never break up an IO across disks, although that can happen on occasion, notably if you fill your system up to 99% capacity or do something very much against best practices.

The main reason ONTAP can provide good sequential performance with randomly written data is the blocks are organized contiguously on disk. Strictly speaking, there is a sort of ‘fragmentation’ as our competitors like to say, but it’s not like randomly spraying data everywhere. It’s more like large contiguous chunks of data are evenly distributed across the disks. As long as those contiguous segments are sufficiently large, readahead can ensure good throughput.

That’s somewhat of an oversimplification, but it would take a couple hours and a whiteboard to explain the complete details. 20+ years of engineering can’t exactly be summarized in a couple paragraphs. The document misrepresented by the original blog was clearly dated 2006 (and that was to slightly refresh the original posting back in the nineties), and while it’s still correct as far as I can see, it’s also lacking information on the enhancements and how we package data onto disks.

By the way, this database mentioned above? It’s virtualized in VMware too.

Why did I pick an example of only 90K IOPS?  My point was this customer needed 90K IOPS, so they bought 90K IOPS.

If you need this performance:

[Screenshot: Oracle AWR report excerpt showing ~200K block reads/sec]

then let us know. Not a problem. This is from a large SAP environment for a manufacturing company. It beats me what they’re doing, because this is about 10X more IO than what we typically see for this type of SAP application. Maybe they just built a really, really good network that permits this level of IO performance even though they don’t need it.

In any case, that’s 201,734 blocks/sec using a block size of 8k. That’s about 2GB/sec, and it’s from a dual-controller FAS3220 configuration which is rather old (and was the smallest box in its range when it was new).

Using the bizarro-universe math from the other blog, these 200K IOPS would have been chopped up into 4k blocks and would require a total of 400K back-end disk IOPS to service the workload. Divided by 125 IOPS/drive, we have a requirement for 3200 drives. It was ACTUALLY using more like 200 drives.

We can do a lot more, especially with the newer platforms and ONTAP clustering, which would enable up to 24 controllers in the storage cluster. So the performance limits per cluster are staggeringly high.

Futures

To put a really interesting (and practical) twist on this, sequential IO in the Oracle realm is probably going to become less important.  You know why? Oracle’s new in-memory feature. Me and several others were floored when we got the first debrief on Oracle In-Memory. I couldn’t have asked for a better implementation if I was in charge of Oracle engineering myself. Folks at NetApp started asking what this means for us, and here’s my list:

  1. Oracle customers will be spending less on storage.

That’s it. That’s my list. The data format on disk remains unchanged, the backup/restore process is the same, the data commitment process is the same. All the NetApp features that earned us around 12,500 Oracle customers are still applicable.

The difference is customers will need smaller controllers, fewer disks, and less bandwidth because they’ll be able to replace a lot of the brute-force full table scan activity with a little In-Memory magic. No, the In-Memory licenses aren’t free, but the benefits will be substantial.

SPC-2 Benchmarks and Engenio Purchases

The other blog demanded two additional answers:

  1. Why hasn’t NetApp done an SPC-2 benchmark?
  2. Why did NetApp purchase Engenio?

SPC-2

I personally don’t know why we haven’t done an SPC-2 benchmark with ONTAP, but those benchmarks are rather expensive, and SPC-2 is aimed at large sequential IO processing. That’s not exactly the prime use case for FAS systems, but not because they’re weak on it. I’ve got AWR reports well into the GB/sec, so ONTAP certainly can do all the sequential IO you want in the real world, but what workloads are those?

I see little point in using an ONTAP system for most (but certainly not all) such workloads because the features overall aren’t applicable. I’m aware of some VOD applications on ONTAP where replication and backups were important. Overall, if you want that type of workload, you’d specify a minimum bandwidth requirement, capacity requirement, and then evaluate the proposals from vendors. Cost is usually the deciding factor.

Engenio Acquisition

Again, my personal opinion here on why NetApp acquired Engenio.

Tom Georgens, our CEO, spent 9 years leading Engenio and obviously knew the company and its financials well. I can’t think of a better way to know you’re getting value for money than having someone in Georgens’ position make this decision.

Here’s the press release about it:

Engenio will enable NetApp to address emerging and fast-growing market segments such as video, including full-motion video capture and digital video surveillance, as well as high performance computing applications, such as genomics sequencing and scientific research.

Yup, sounds about right. That’s all about maximum capacity, high throughput, and low cost. In contrast, ONTAP is about manageability and advanced features. Those are aimed at different sets of business drivers.

Hey, check this out. Here’s an SEC filing:

Since the acquisition of the Engenio business in May 2011, NetApp has been offering the formerly-branded Engenio products as NetApp E-Series storage arrays for SAN workloads. Core differentiators of this price-performance leader include enterprise reliability, availability and scalability. Customers choose E-Series for general purpose computing, high-density content repositories, video surveillance, and high performance computing workloads where data is managed by the application and the advanced data management capabilities of Data ONTAP storage operating system are not required.

Key point here is “where the advanced data management capabilities of Data ONTAP are not required.” It also reflected my logic in storage decisions prior to joining NetApp, and it reflects the message I still repeat to account teams:

  1. Is there any particular feature in ONTAP that is useful for your customer’s actual business requirements? Would they like to snapshot something? Do they need asynchronous replication? Archival? SnapLock? Scale-out clusters with many nodes? Non-disruptive everything? Think carefully, and ask lots of questions.
  2. If the answer is “yes”, go with ONTAP.
  3. If the answer is “no”, go with E-Series.

That’s what I did. I probably influenced or approved around $5M in total purchases. It wasn’t huge, but it wasn’t nothing either. I’d guess we went ONTAP about 70% of the time, but I had a lot of IBM DS3K arrays around too, now known as E-Series.

“Dumb Storage”

I’ve annoyed the E-Series team a few times by referring to it as “dumb storage”, but I mean that in the nicest possible way. Its primary job is to just sit there and work. It needs to do it fast, reliably, and cost effectively, but on a day-to-day basis it’s not generally doing anything all that advanced.

In some ways, the reliability was a weakness. It was so reliable that we forgot it was there at all; we’d do something like change the email server addresses and forget to update the RAS feature of the E-Series. Without email notification, it can take a couple of years before someone notices the LED that indicates a drive needs replacement.

 

Is convenience devaluing products? Does quality suffer because of it? http://recoverymonkey.org/2014/01/24/is-convenience-devaluing-products-does-quality-suffer-because-of-it/ http://recoverymonkey.org/2014/01/24/is-convenience-devaluing-products-does-quality-suffer-because-of-it/#comments Fri, 24 Jan 2014 07:45:02 +0000 http://recoverymonkey.org/?p=607 Continue reading "Is convenience devaluing products? Does quality suffer because of it?"]]> Kind of a long hiatus posting (far too busy working on cool stuff) and for you looking for a deep technical post this may not be it… but here goes anyway since the content may also apply to my more usual subjects.

Recently I decided to discard my Luddite membership card and join the hordes of people using network-based services for music.

The experiment is ongoing – I do like the convenience of being able to select almost any song or album for the monthly equivalent cost of less than what an album is worth.

It’s a pretty good deal if you listen to a lot of new music, and/or you don’t like listening to ads on the radio.

There’s a plethora of free offerings but if you are mobile and want to use it on your phone, there’s usually a cost involved to have the convenience of selecting the exact songs you like.

How convenience has affected me

I did notice several interesting aspects in which this newfound convenience has changed my listening habits in a positive way:

  1. I am discovering a lot more new music since it’s so incredibly easy to do so. And some old music I never gave a chance to.
  2. Sharing music with other people is easy and involves no illegal copying of data.
  3. I don’t have to worry about putting the “right” music in my portable device – I can stream what I want, from wherever I want, even on devices I don’t own, and even designate items for “offline use” – meaning they’ll be cached and playable even if I’m not connected to a network.
  4. I have easy access to most of my oldie favorites that I might normally not keep in my device due to space reasons.
  5. The quality is very good. But we won’t go into psychoacoustics here 🙂

However, there have also been some pretty negative aspects to all this convenience… for instance:

  1. I realize I now suffer from music ADD – I seldom just sit down and listen to a whole album like we all used to do in the olden days.
  2. Albums now have zero monetary value in my mind – they’re just part of the low monthly fee.
  3. If albums have a perceived zero monetary value, they become a commodity and not something to be treasured. I remember when it was a huge deal to get a new album from my favorite artists: the anticipation, the excitement, the trip to the record store, waiting in line, scarcity, the artwork in the packaging, the sheer physicality of it all. This combination of attributes ensured I would at least give that album a chance – indeed, I was likely to listen to it repeatedly, analyze it and appreciate the artistry involved. I was invested.
  4. As a result of this devaluing, amazing works of art that were extremely difficult to accomplish may now be skipped altogether because they may be a bit time-consuming or even difficult to “get into” – some concept albums you just need to be in the right frame of mind for and/or have the requisite amount of time to listen to the story unfold. Since there’s no perceived investment and no excitement, it’s less likely to spend the energy trying to get into the album, no matter how rewarding it may be in the end.
  5. For something more practical: The toll on the mobile devices’ batteries is 2-3x that of just playing music natively (even without streaming – the tracks are encrypted so you can’t just lift them from the storage, which adds CPU cycles to decrypt, plus some products use codecs more computationally intensive than mp3). Best have a device with a fast CPU.
  6. An extended unplanned network or music provider outage will mean no access to music.

How this applies to other aspects of our lives

I wonder now what other conveniences have affected our lives significantly?

And are we all looking for that quick fix, the easy way out?

Are we heading towards the world depicted in the movie Idiocracy? (very interesting flick – it’s worth watching for the premise alone).

Already, most of us in the more “civilized” parts of the globe don’t know how to hunt down and skin an animal, build a weapon, start a fire, build a shelter. That is knowledge that convenience robbed us of many years ago. You can study how to do those things, but chances are, if you’re in need to do so, you won’t have the training to be anywhere near as successful as our ancestors were in those endeavors.

Same goes for taking pictures – aside from a few people that still develop and print their own film, most of us use digital (with the same deluge of information problem described in the music section above – thousands of pictures may now be taken during a vacation, where previously no more than a hundred would, with tremendous love and care – but most of the hundred were keepers).

Many of us are getting heavier, too – convenient access to food and low levels of physical activity (since locomotion is so convenient) being the killer combination.

Does quality suffer because of convenience?

Conveniences aren’t a bad thing overall – I am not hankering for the destruction of all things convenient. However, I posit that certain aspects of quality absolutely suffer because of convenience:

  1. Consumers are more likely to pick an easier to use, throw-away and even short-sighted product over a better-engineered, longer-lasting one – shifting the engineering emphasis instead to ease-of-use and disposability.
  2. The quality of workers in many fields isn’t what it used to be.
  3. We are heading towards more generalists and less specialists.
  4. Troubleshooting is becoming a lost art.

I’m not sure how to even conclude – I’m probably part of the problem since one of the things I do is help make very advanced technology easier to consume and more forgiving.

Just don’t get too comfortable.

[Image: toilet chair]

Thx

D

EMC’s VNX2 forklift: The importance of hardware re-use, slowing down obsolescence, and maximizing your investment http://recoverymonkey.org/2013/09/24/emcs-vnx2-forklift-the-importance-of-hardware-re-use-slowing-down-obsolescence-and-maximizing-your-investment/ http://recoverymonkey.org/2013/09/24/emcs-vnx2-forklift-the-importance-of-hardware-re-use-slowing-down-obsolescence-and-maximizing-your-investment/#comments Tue, 24 Sep 2013 15:48:48 +0000 http://recoverymonkey.org/?p=604 Continue reading "EMC’s VNX2 forklift: The importance of hardware re-use, slowing down obsolescence, and maximizing your investment"]]> It was with interest that I watched the launch of EMC’s VNX refresh. The updated boxes got some long-awaited features, EMC talked a lot about how some pretty severe single-threaded bottlenecks were removed, more CPU and memory was put in, and there was much rejoicing.

Not really trying to pick on the new boxes (that will be in a future post, relax :)), but what I thought was interesting was that the code that makes most of the new features a possibility cannot be loaded on current-gen VNX boxes (not even the biggest, the 7500, which has plenty of CPU and RAM juice even compared to the next gen boxes).

Software-defined storage indeed.

The existing VNX disk shelves also seemingly can’t be re-used at the moment (correct me if I’m wrong please).

This forced obsolescence has been a theme with EMC: Clariion -> VNX -> VNX2 are all complete forklift upgrades. When the original VNX was released, it was utterly incompatible with the CX (Clariion) shelves (SAS replaced FC connectivity). Despite using the exact same code (FLARE).

Other vendors are guilty of this too – a new controller is released, and all prior investments on disk shelves are rendered useless (HDS did this with several iterations of AMS, maybe even with AMS -> HUS, HP with EVA…)

I understand that as technology progresses we sometimes have to abandon the old to make room for the new but, at the same time, customers make significant investments in N-1 technology – and often want to be able to re-use some of their investment with N (and sometimes N+1).

I just had this conversation with a customer, and he said “well, I throw away my gear every 3 years, why should I care?”

Let’s try a thought experiment.

Imagine you just bought a system that’s running the fastest controllers a company sells today, and you got 1PB of storage behind it.

Now, imagine that a mere month after you purchased your controllers, new ones are released that are significantly faster. OK, that stuff happens.

Your gear is not 3 years old yet. It’s 1 month old. It’s running OK.

Now, imagine that your array runs out of steam 6 months later due to unprecedented performance growth. Your system is now 7 months old. You can’t just throw it away and start fresh.

You could buy a new storage system and migrate some of the data to share the load. However, you don’t need more space – you just ran out of controller headroom. Indeed, you still have tons of free space.

But what if you could replace the controllers with the new, beefier ones? And maintain your investment in the 1PB of storage, cache etc? Wouldn’t that be nice?

Or at least be able to move some of the storage pools you may have to the new family of controllers? Even if you had to reformat the disk?

Well – most vendors will tell you “sorry, no, you need to migrate your data to the new box”.

Let’s try another thought experiment.

You bought a storage system a year ago. It performs fine, but it lacks true deduplication capabilities. You have determined it would save you a lot of storage space (= money) if your array had deduplication.

The vendor you purchased the system from announces the refreshed storage OS that finally includes deduplication. And that same vendor made a truly gigantic fuss about software-defined storage, which made everyone feel software would be the big enabler and that coolness was a mere firmware upgrade away.

However, you are eventually told they will not allow the code that enables deduplication to be loaded to your array, and, instead, they ask you to migrate to the refreshed array that supports deduplication. Since the updated code somehow only runs on the new box. Something to do with unicorn milk.

But your array has plenty of CPU headroom to handle deduplication… and you could reformat the disks given some swing storage shelves if the underlying disk format is the issue. But the option is not provided.

How NetApp does things instead

At NetApp we sort of take it for granted so we don’t make a big fuss about software-defined storage, but hardware was always considered an enabler for the software and not the other way around. 

For instance: deduplication was released as a free software upgrade for Data ONTAP (the OS for our main line of storage). Back in 2007. For all storage protocols.

In general, we try to let systems be able to load at a minimum N+1 software releases, but most of the time we utterly spoil customers and go far above and beyond, unless we’re talking about the smallest boxes, which naturally have less headroom.

For example, the now aging FAS 3070 I have in the local lab (the bigger of the older midrange boxes, released in 2006) supports anything from ONTAP 7.2.1 (what it was released with) to ONTAP 8.1.3 (released in mid-2013).

This spans multiple major ONTAP releases – huge changes in the code have happened during those releases: 7.3, 8.0, 8.1… Multiple newer arrays were also released as replacements for the 3070: 3170, 3270, 3250.

Yet the 3070 soldiers on with a fully supported, modern OS, 7 years later.

What arrays did our competitors have back then? What is the most modern OS those same arrays can run today? What is that OS missing vs the OS that competitor’s more modern arrays have?

Let’s talk disk shelves.

We used FC loop connectivity for the older shelves (DS14). We then switched to fancy multi-channel SAS and totally different shelves and disks, but never stopped supporting the older shelf technology, even with newer controllers.

That’s the big one. I have customers with DS14 shelves that they purchased for a 3070 that they now have on a 3270 running 8.2. It all works, all supported. Other vendors cut support off after the transition from FC to SAS.

Will we support those older shelves forever? No, that’s impossible, but at least we give our customers a lot of leeway and let them stretch their hardware investments significantly longer than any other major storage vendor I can think of.

Think long term

I encourage customers to always think long term. Try to stop thinking in 3-year increments. Start thinking of other ways you can stretch your investment, other ways to deploy older gear while still keeping it interoperable with newer hardware.

And start thinking about what will happen to your investment once newer gear is released.

D

How to decipher EMC’s new VNX pre-announcement and look behind the marketing. http://recoverymonkey.org/2013/05/16/how-to-decipher-emcs-new-vnx-pre-announcement-and-look-behind-the-marketing/ http://recoverymonkey.org/2013/05/16/how-to-decipher-emcs-new-vnx-pre-announcement-and-look-behind-the-marketing/#comments Thu, 16 May 2013 15:53:06 +0000 http://recoverymonkey.org/?p=598 Continue reading "How to decipher EMC’s new VNX pre-announcement and look behind the marketing."]]> It was with interest that I watched some of EMC’s announcements during EMC World. Partly due to competitor awareness, and partly due to being an irrepressible nerd, hoping for something really cool.

BTW: Thanks to Mark Kulacz for assisting with the proof points. Mark, as much as it pains me to admit so, is quite possibly an even bigger nerd than I am.

So… EMC did deliver something. A demo of the possible successor to VNX (VNX2?), unavailable as of this writing (indeed, a lot of fuss was made about it being lab only etc).

One of the things they showed was increased performance vs their current top-of-the-line VNX7500.

The aim of this article is to prove that the increases are not proportionally as much as EMC claims they are, and/or they’re not so much because of software, and, moreover, that some planned obsolescence might be coming the way of the VNX for no good reason. Aside from making EMC more money, that is.

A lot of hoopla was made about software being the key driver behind all the performance increases, and how they are now able to use all CPU cores, whereas in the past they couldn’t. Software this, software that. It was the theme of the party.

OK – I’ll buy that. Multi-core enhancements are a common thing in IT-land. Parallelization is key.

So, they showed this interesting chart (hopefully they won’t mind me posting this – it was snagged from their public video):

[Image: EMC chart comparing per-core utilization on the current VNX vs the upcoming box]

I added the arrows for clarification.

Notice that the left-hand chart above shows the current VNX using, according to EMC, maybe a total of 2.5 out of the 6 cores if you stack everything up (for instance, Core 0 is maxed out, Core 1 is 50% busy, Cores 2-4 do little, Core 5 does almost nothing). This is important and we’ll come back to it. But, if true, this shows extremely poor multi-core utilization today. It looks like processes are dedicated to specific cores – Core 0 does RAID only, for example. Maybe a way to lower context switches?
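To put a number on that “stack everything up” figure, here’s a minimal sketch. The per-core percentages are rough values eyeballed off EMC’s chart – assumptions for illustration only:

# Rough "effective cores" math for the current VNX7500, per EMC's own chart.
# The per-core utilization figures are eyeballed estimates, not measurements.
core_util = {
    "core0": 1.00,  # maxed out (RAID?)
    "core1": 0.50,  # about half busy
    "core2": 0.30,  # "does little"
    "core3": 0.30,
    "core4": 0.30,
    "core5": 0.10,  # almost nothing
}

effective_cores = sum(core_util.values())
print(f"Effective cores in use: {effective_cores:.1f} of {len(core_util)}")
print(f"Overall CPU utilization: {effective_cores / len(core_util):.0%}")
# -> roughly 2.5 of 6 cores, i.e. ~42% of the available CPU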

Then they mentioned how the new box has 16 cores per controller (the current VNX7500 has 6 cores per controller).

OK, great so far.

Then they mentioned how, By The Holy Power Of Software, they can now utilize all cores on the upcoming 16-core box equally (the right-hand chart above).

Then comes the interesting part: they did an IOmeter test on the new box only.

They mentioned how the current VNX7500 would max out at 170,000 8K random read IOPS from SSD (this is, in itself, a nice nugget for anyone dealing with EMC reps claiming insane VNX7500 IOPS numbers), and that the current model’s relative lack of performance is due to the fact that its software can’t take advantage of all the cores.

Then they showed the experimental box doing over 5x that I/O. Which is impressive, indeed, even though an all-read test is hardly a realistic way to prove performance. I accept that they were trying to show how much more read-only speed they could get out of the extra cores – plus it makes for a cooler marketing number.

Writes are a whole separate wrinkle for arrays, of course. Then there are all the other ways VNX performance goes down dramatically.

However, all this leaves us with a few big questions:

  1. If this is really all about just optimized software for the VNX, will it also be available for the VNX7500?
  2. Why not show the new software on the VNX7500 as well? After all, it would probably increase performance by over 2x, since it would now be able to use all the cores equally. Of course, that would not make for good marketing. But if with just a software upgrade a VNX7500 could go 2x faster, wouldn’t that decisively prove EMC’s “software is king” story? Why pass up the opportunity to show this?
  3. So, if, with the new software, the VNX7500 could do, say, 400,000 read IOPS in that same test, the difference between new and old isn’t as dramatic as EMC claims… right? (See the back-of-envelope sketch right after this list.) 🙂
  4. But, if core utilization on the VNX7500 is not as bad as EMC claims in the chart (why even bother with the extra 2 cores on a VNX7500 vs a VNX5700 if that were the case), then the new speed improvements are mostly due to just a lot of extra hardware. Which, again, goes against the “software” theme!
  5. Why do EMC customers also need XtremIO if the new VNX is that fast? What about VMAX? 🙂
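Here’s the back-of-envelope sketch behind points #3 and #4. The 170,000 IOPS figure is EMC’s own number for the current VNX7500; the 400,000 “software-fixed” figure and the 5x multiplier are the hypotheticals discussed above, so treat the output as illustration rather than measurement:

# Rough attribution of the claimed gain between "software" and "hardware".
# 170K IOPS is EMC's stated VNX7500 maximum for this 8K random read test;
# the other inputs are hypotheticals from the discussion above.
old_box_iops        = 170_000           # VNX7500, current software (EMC's number)
old_box_new_sw_iops = 400_000           # hypothetical: VNX7500 if new code used all 6 cores
new_box_iops        = 5 * old_box_iops  # the "over 5x" claim => roughly 850K

software_gain  = old_box_new_sw_iops / old_box_iops   # ~2.4x from software alone
remaining_gain = new_box_iops / old_box_new_sw_iops   # ~2.1x left for the extra hardware

print(f"Gain attributable to software: {software_gain:.1f}x")
print(f"Gain left over for hardware:   {remaining_gain:.1f}x")

# Per-core sanity check: 400K / 6 cores is ~67K IOPS per core, while
# 850K / 16 cores is ~53K IOPS per core - so the new box's advantage
# would come mostly from having many more cores, not from each core
# doing dramatically more work.

In other words, once you grant the software fix to the old box, most of what’s left is simply more hardware – which is exactly point #4.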

Point #4 above is important. For instance, EMC has been touting multi-core enhancements for years now. The current VNX FLARE release has 50% better core efficiency than the one before, supposedly. And, before that, in 2008, multi-core was advertised as getting 2x the performance vs the software before that. However, the chart above shows extremely poor core efficiency. So which is it? 

Or is it maybe that the demonstrated box gets most of its speed increase not so much from the magic of better software, but mostly from vastly faster hardware – the fastest Intel CPUs (higher clock speed and more efficient instruction processing, not just more cores), the latest chipset, faster memory, faster SSDs, faster buses, and so on? That alone could make for a 3-5x faster box.
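As a sanity check on that “hardware alone” estimate, here’s a minimal sketch that just multiplies out a few plausible hardware ratios. All the inputs below are assumed, hypothetical placeholders – not published specs – so the point is the shape of the math, not the exact number:

# Purely illustrative compounding of hardware improvements.
# None of these ratios are published specs; they are hypothetical placeholders.
core_ratio  = 16 / 6  # 16 cores per controller vs. 6
ipc_ratio   = 1.2     # hypothetical: newer microarchitecture, more work per clock
clock_ratio = 1.1     # hypothetical: modestly higher clock speed

hardware_only = core_ratio * ipc_ratio * clock_ratio
print(f"Plausible speedup from hardware alone: ~{hardware_only:.1f}x")
# With these placeholder ratios, hardware alone lands around 3.5x - most of
# the way to the "over 5x" being claimed, before faster SSDs and buses.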

It doesn’t quite add up as being a software “win” here.

However – I (or at least current VNX customers) probably care more about #1, since it’s all about the software, after all 🙂

If the new software helps so much, will they make it available for the existing VNX? Seems like any of the current boxes would benefit since many of their cores are doing nothing according to EMC. A free performance upgrade!

However… If they don’t make it available, then the only rational explanation is that they want to force people into the new hardware – yet another forklift upgrade (CX->VNX->”new box”).

Or maybe that there’s some very specific hardware that makes the new performance levels possible. Which, as mentioned before, kinda destroys the “software magic” story.

If it’s all about “Software Defined Storage”, why is the software so locked to the hardware?

All I know is that I have an ancient NetApp FAS3070 in the lab. The box was released ages ago (2006 vintage), and yet it’s running the most current GA ONTAP code. That’s going back 3-4 generations of boxes, and it launched with software that was very, very different to what’s available today. Sometimes I think we spoil our customers.

Can a CX3-80 (the beefiest of the CX3 line, similar vintage to the NetApp FAS3070) take the latest code shown at EMC World? Can it even take the code currently GA for VNX? Can it even take the code available for CX4? Can a CX4-960 (again, the beefiest CX4 model) take the latest code for the shipping VNX? I could keep going. But all this paints a rather depressing picture of being able to stretch EMC hardware investments.

But dealing with hardware obsolescence is a very cool story for another day.

D