With the advent of performance-altering technologies (notice the word choice), storage sizing is just not what it used to be.
I’m writing this post because, more and more, I see vendors sizing their solutions without any scientific method. Instead, they aim for a price point and hope the technology will deliver the requisite performance. And if it doesn’t, the box is sold anyway: they can either give away some free gear to make the problem go away, or the customer can always buy more, right?
Back in the “good old days”, with legacy arrays one could (and still can) get fairly deterministic performance: knowing the required workload and the RAID type, one could work out roughly how many disks would be needed to sustain the required performance, as long as the controller and buses were not overloaded.
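To make that “old school” arithmetic concrete, here is a minimal sketch. The RAID write penalties are the usual textbook values for arrays that update data in place, the workload numbers are illustrative, and the per-disk IOPS figure is exactly the kind of rule of thumb that hides a lot of detail. The function name is made up for this example and does not come from any vendor tool.

```python
import math

# Textbook back-end I/Os generated per host write (assumed values for
# in-place-update RAID; real arrays can behave very differently).
RAID_WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}


def disks_needed(host_iops, read_fraction, raid_type, iops_per_disk):
    """Rough spindle count for a sustained random workload with no cache help."""
    reads = host_iops * read_fraction
    writes = host_iops * (1.0 - read_fraction)
    backend_iops = reads + writes * RAID_WRITE_PENALTY[raid_type]
    return math.ceil(backend_iops / iops_per_disk)


# Illustrative workload: 20,000 host IOPS, 80% reads, 180 IOPS per disk (assumed).
for raid in RAID_WRITE_PENALTY:
    print(raid, disks_needed(20_000, 0.8, raid, iops_per_disk=180))
```

Treat the result as a starting point for a conversation, not as a sizing: controller limits, I/O size and sequentiality are all ignored here.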
With modern systems, there is now a plethora of options that can be used to get more performance out of the array, or, alternatively, get the same average performance as before, using less hardware (hopefully for less money).
If anything, advanced technologies have made array sizing more complex than before.
For instance, Megacaches can be used to dramatically change the I/O reaching the back-end disks of the array. NetApp FAS systems can have up to 16TB of deduplication-aware, ultra-granular (4K) and intelligent read cache. Truly a gigantic size, bigger than the vast majority of storage users will ever need (and bigger than many customers’ entire storage systems). One could argue that with such an enormous amount of cache, one could dispense with most of the fast disk drives and save money by using SATA instead (indeed, several customers are doing exactly that). Other vendors are following NetApp’s lead and starting to implement similar technologies, simply because it makes a lot of sense.
It is crucial that, when relying on caching, extra care is taken to size the solution properly, if a reduction in the number and speed of the back-end disks is desired.
You see, caches only work well if they can cache the majority of what’s called the active working set.
Simply put, the working set is not all your data, but the subset of the data you’re “touching” constantly over a period of time. For a customer that has, say, a 20TB Database, the true working set may only be something as small as 5% — enabling most of the active data to fit in 1TB of cache. So, during daily use, a 1TB cache could satisfy most of the I/O requirements of the DB. The back-end disks could comfortably be just enough SATA to fit the DB.
But what about the times when I/O is not what’s normally expected? Say, during a re-indexing, or a big DB export, or maybe month-end batch processing. Such operations could vastly change the working set and temporarily raise it from 5% to something far larger — at which point, a 1TB cache and a handful of back-end SATA may not be enough.
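A quick back-of-the-envelope sketch of the example above. It assumes accesses are spread evenly over the working set, so the hit ratio is simply how much of the working set fits in cache. That is a deliberate oversimplification; the 25% month-end figure and the 16,000 read IOPS are hypothetical numbers picked for illustration.

```python
def working_set_tb(dataset_tb, active_fraction):
    """Size of the actively touched subset of the data."""
    return dataset_tb * active_fraction


def backend_read_iops(read_iops, cache_tb, ws_tb):
    """Reads that miss the cache, assuming accesses are spread evenly
    over the working set (a simplifying assumption)."""
    hit_ratio = min(1.0, cache_tb / ws_tb)
    return read_iops * (1.0 - hit_ratio)


DATASET_TB = 20.0
CACHE_TB = 1.0
READ_IOPS = 16_000  # hypothetical front-end read load

# 5% comes from the example above; 25% is a made-up month-end figure.
for label, fraction in [("daily use", 0.05), ("month-end batch", 0.25)]:
    ws = working_set_tb(DATASET_TB, fraction)
    misses = backend_read_iops(READ_IOPS, CACHE_TB, ws)
    print(f"{label}: working set {ws:.1f} TB, ~{misses:.0f} read IOPS reach disk")
```

Even with this crude model, the point stands: the same cache that absorbs nearly all reads on a normal day leaves most of them to the back-end disks once the working set grows.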
Which is why, when sizing, multiple measurements need to be taken, and not just average or even worst-case.
Let’s use a database as an example again (simply because the I/O can change so dramatically with DBs). You could easily have the following I/O types:
- Normal use – 20,000 IOPS, all random, 8K I/O size, 80% reads
- DB exports – high MB/s, mostly sequential writes, large I/O size, relatively few IOPS
- Sequential read after random write – maybe data is added to the DB randomly, then a big sequential read (or maybe many parallel ones) is launched.
You see, the I/O profile can change dramatically. If you only size for case 1, you may not have enough back-end disk to sustain the DB exports or the parallel sequential table scans. If you only size for case 2, you may think you don’t need much cache since the I/O is mostly sequential (and most caches are bypassed for sequential I/O). But that would be totally wrong during normal operation.
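One way to see why a single measurement is not enough is to put the profiles side by side and look at which one peaks on which axis. In the sketch below, only the “normal use” numbers come from the list above; the export and table-scan figures are hypothetical, as is the `Profile` class itself.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    iops: int
    io_size_kb: int
    read_fraction: float
    sequential: bool

    @property
    def mb_per_s(self) -> float:
        # Throughput implied by the IOPS rate and the I/O size.
        return self.iops * self.io_size_kb / 1024


profiles = [
    Profile("normal use", 20_000, 8, 0.80, sequential=False),
    Profile("DB export", 2_000, 512, 0.05, sequential=True),             # hypothetical
    Profile("parallel table scans", 3_000, 256, 1.00, sequential=True),  # hypothetical
]

peak_iops = max(profiles, key=lambda p: p.iops)
peak_mb = max(profiles, key=lambda p: p.mb_per_s)

for p in profiles:
    print(f"{p.name:22} {p.iops:>6} IOPS  {p.mb_per_s:>7.1f} MB/s")
print("Most IOPS:", peak_iops.name, "| Most MB/s:", peak_mb.name)
```

With these numbers, the profile with the most IOPS is not the one with the most MB/s, so “we sized for the peak” means nothing until you know which peak.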
If your storage vendor has told you they sized for what generates the most I/O, then the question is, what kind of I/O was it?
The other new trendy technology (and the most likely to be under-sized) is Autotiering.
Autotiering, simply put, allows moving chunks of data around the array depending on their “heat index”. Chunks that are very active may end up on SSD, whereas chunks that are dormant could safely stay on SATA.
Different arrays do different kinds of Autotiering, mostly based on various underlying architectural characteristics and limitations. For example, on an EMC Symmetrix the chunk size is about 7.5MB. On an HDS VSP, the chunk is about 40MB. On an IBM DS8000, SVC or EMC Clariion/VNX, it’s 1GB.
With Autotiering, just like with caching, the smaller the chunk size, the more efficient the end result will ultimately be. For instance, a 7.5MB chunk could need as little as 3-5% of ultra-fast disk as a tier, whereas a 1GB chunk may need as much as 10-15%, due to the larger chunk containing not very active data mixed together with the active data.
Since most arrays write data with a geometric locality of reference (in contrast, NetApp uses geometric and temporal), with large-chunk autotiering you end up with pieces of data that are “hot” that always occupy the same chunk as neighboring “cool” pieces of data. This explains why the smaller the chunk, the better off you are.
So, with a large chunk, a small amount of hot data can end up dragging a whole chunk of mostly cool data along with it.
The array will try to cache as much as it can, then migrate chunks depending on whether or not they are consistently busy. But the whole chunk has to move, not just the active bits within the chunk… which may be just fine, as long as you have enough of everything.
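The chunk-size effect is easy to play with in a toy model. The sketch below scatters a few hundred small hot extents over a volume and counts how much capacity gets promoted if every chunk touching a hot extent has to move as a whole. All the numbers (volume size, extent size and count, the random layout) are made up, and 7.5MB is rounded to 8MB for simplicity. It shows the trend only and is not meant to reproduce any vendor’s percentages.

```python
import random


def promoted_fraction(total_mb, hot_extents, chunk_mb):
    """Share of capacity that moves to the fast tier when every chunk
    touching a hot extent is promoted as a whole."""
    hot_chunks = set()
    for start, length in hot_extents:
        first = start // chunk_mb
        last = (start + length - 1) // chunk_mb
        hot_chunks.update(range(first, last + 1))
    return min(1.0, len(hot_chunks) * chunk_mb / total_mb)


rng = random.Random(7)   # fixed seed so the run is repeatable
TOTAL_MB = 10_000_000    # hypothetical volume of roughly 10TB
EXTENT_MB = 64           # hypothetical size of each hot region
extents = [(rng.randrange(TOTAL_MB - EXTENT_MB), EXTENT_MB) for _ in range(500)]

for chunk_mb in (8, 40, 1024):
    share = promoted_fraction(TOTAL_MB, extents, chunk_mb)
    print(f"{chunk_mb:>5} MB chunks: {share:.2%} of capacity promoted")
```

The hot data is identical in all three runs; only the chunk size changes, and with it the amount of cool data that rides along to the expensive tier.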
So what can you do to ensure correct sizing?
There are a few things you can do to make sure you get accurate sizing with modern technologies.
- Provide performance statistics to vendors — the more detailed the better. If we don’t know what’s going on, it’s hard to provide an engineered solution.
- Provide performance expectations — i.e. “I want Oracle queries to finish in 1/4th the time compared to what I have now” — and tie those expectations to business benefits (makes it easier to justify).
- Ask vendors to show you their sizing tools and explain the math behind the sizing — there is no magic!
- Ask vendors if they are sizing for all the workloads you have at the moment (not just different apps but different workloads within each app) — and how.
- Ask them to show you what your working set is and how much of it will fit in the cache.
- Ask them to show you how your data would be laid out in an Autotiered environment and what bits of it would end up on what tier. How is that being calculated? Is the geometry of the layout taken into consideration?
- Do you have enough capacity for each tier? On Autotiering architectures with large chunks, do you have 10-15% of total storage being SSD?
- Have the controller RAM and CPU overheads due to caching and autotiering been taken into account? Such technologies do need extra CPU and RAM to work. Ask to see the overhead (the smaller the Autotiering chunk size, the more metadata overhead, for example). Nothing is free.
- Beware of sizings done verbally or on cocktail napkins, calculators, or even spreadsheets – I’ve yet to see a spreadsheet model storage performance accurately.
- Beware of sizings of the type “a 15K disk can do 180 IOPS” — it’s a lot more complicated than that!
- Understand how sequential vs. random I/O, reads vs. writes, and I/O size are handled by each proposed architecture. The differences in how I/O is done depending on the platform are staggering and can result in vastly different disk requirements, which makes apples-to-apples comparisons challenging.
- Understand the extra I/O and capacity impact of certain CDP/Replication devices — it can be as much as 3x, and needs to be factored in.
- What RAID type is each vendor using? That can have a gigantic performance impact on write-intensive workloads (in addition to the reliability aspect).
- If you are getting unbelievably low pricing — ask for a contract ensuring upgrade pricing will be along the same lines. “The first hit is free” is true in more than one line of business.
- And, last but by no means least — ask how busy the proposed solution will be given the expected workload! It surprises me that people will try to sell a box that can do the workload but will be 90% busy doing so. Are you OK with that kind of headroom? Remember – disk arrays are just computers running specialized software and hardware, and as such their CPU can run out of steam just like anything else.
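On that last point about headroom, a rough illustration of why “90% busy” is uncomfortable. The sketch uses the classic single-server queueing approximation (response time = service time / (1 - utilization)). That model is my assumption for illustration and real arrays are far more complicated, but the shape of the curve is the thing to remember.

```python
def response_time_ms(service_ms, utilization):
    """Approximate response time for a single-server queue (M/M/1 model,
    used here purely as an illustrative assumption)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)


# Hypothetical 1 ms service time at increasing levels of busyness.
for busy in (0.5, 0.7, 0.9, 0.95):
    print(f"{busy:.0%} busy -> {response_time_ms(1.0, busy):.1f} ms per I/O")
```

In this model, latency does not degrade linearly as the box gets busier; the last few percent of utilization cost far more than the first fifty.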
If this all seems hard — it’s because it is. But see it as due diligence — you owe it to your company, plus you probably don’t want to be saddled with an improperly-sized box for the next 3-5 years, just because the offer was too good to refuse…