NetApp posts new SPEC SFS NFS results – far faster than V-Max with Celerra VG8

Following the new NetApp block-based SPC-1 results yesterday, here is some NAS benchmark action. This page contains all the SPEC SFS results. SPEC SFS is the NAS equivalent of SPC-1.

SPEC SFS is more cache-friendly than the brutal SPC-1; click here for more information regarding this industry-standard NAS benchmark. The idea is that thousands of CIFS and NFS servers have been profiled, and the benchmark reflects real-life NAS usage patterns.

In the same vein as the SPC-1 benchmarks, the configurations we submit to the standard benchmarking authorities are based on realistic systems customers could buy, not $7m lab queens. So, NetApp SPEC and SPC submissions:

  • Are always tested with RAID-DP (RAID-6 protection equivalent) – other vendors test with RAID10 most of the time, and never with RAID-6 (ask them why that is; BlueArc gets respect for being the only other vendor in the list running our level of protection)
  • Have a target of using the most cost-effective configuration possible
  • Provide not just high IOPS but also very low latency
  • Are a realistic, deployable configuration, not just the fastest box we have (we still hold the 1 million SPEC ops record for a 24-node system; that’s kind of pricey, plus the result is old and can’t be compared with the current benchmark code – still, look at the rankings).

So, with those lofty goals in mind, we have 3 new submissions:

  1. CIFS benchmark, 3210 w/ SATA drives – typical low/mid-range system
  2. NFS benchmark, 3270 w/ SAS drives – typical mid-range system, no Flash Cache used in this one.
  3. NFS benchmark, 6240 w/ SAS drives – typical high-end (but not highest) system.

All NetApp systems included some Flash Cache memory boards to provide further acceleration (EDIT: aside from the 3270). We have an even faster system (6280) that we will be submitting later on as a special treat (there’s a certain degree of red tape and ceremony to even do one submission…)

Here’s an abbreviated chart in easily digestible form – showing the most recent results from perennial rivals NetApp and EMC (BTW – of all the systems in the chart, only one of them is truly unified and can provide block and NAS on the same architecture without the need for contortions).

| System | Result (higher is better) | Overall Response Time (lower is better) | # Disks | Exported Capacity in TB | RAID | Protocol |
|---|---|---|---|---|---|---|
| NetApp 3210 | 64292 | 1.50 | 144x 1TB SATA | 87 | RAID-DP | CIFS |
| NetApp 3270 | 101183 | 1.66 | 360x 15K RPM 450GB SAS | 110 | RAID-DP | NFS |
| NetApp 6240 | 190675 | 1.17 | 288x 15K RPM 450GB SAS | 85 | RAID-DP | NFS |
| EMC NS-G8 on V-Max | 118463 | 1.92 | Bunch o’ SSD (96 fancy STEC 400GB ZeusIOPS) | 17 | RAID-10 | CIFS |
| EMC NS-G8 on V-Max | 110621 | 2.32 | Bunch o’ SSD (96 fancy STEC 400GB ZeusIOPS) | 17 | RAID-10 | NFS |
| EMC VG8 on V-Max | 135521 | 1.92 | 312x 15K RPM 450GB FC | 19 | RAID-10 | NFS |

Guide to reading the chart, and lessons learned:

  • A “puny” NetApp 3210 with SATA gets better overall response time than an all-SSD V-Max costing well over 10x as much
  • Notice the amount of usable space on NetApp systems, with even better protection than RAID10
  • The 6240 scored far higher even though it had fewer disks (quick per-disk math below)
  • The NetApp systems have “just” 2 controllers that do everything, vs. the EMC submissions with 4 V-Max engines, plus extra Celerra Data Movers and Control Stations on top. What do you think is more efficient?
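
To make the per-disk comparison concrete, here is the same arithmetic as a small script (a rough illustration only – every input number comes straight from the chart above):

```python
# Per-disk efficiency computed from the chart above: SPEC SFS ops/sec per
# disk and exported TB per disk. All input figures are copied from the table.
systems = [
    ("NetApp 3210 (SATA, CIFS)",    64_292, 144,  87),
    ("NetApp 3270 (SAS, NFS)",     101_183, 360, 110),
    ("NetApp 6240 (SAS, NFS)",     190_675, 288,  85),
    ("EMC VG8 on V-Max (FC, NFS)", 135_521, 312,  19),
]

for name, ops, disks, exported_tb in systems:
    print(f"{name:28s} {ops / disks:6.0f} ops/disk   "
          f"{exported_tb / disks:5.2f} TB exported/disk")
```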


I do have some questions to ask certain other vendors as a parting shot:

  1. Sun/Oracle – you keep saying your new boxes are a cheaper way to get NetApp-type functionality, and you’ve had them for a while, so why not submit to SPEC or SPC? (There is not a single SPEC result from Sun.)
  2. EMC – maybe show the world how a system not based on V-Max runs? With RAID-6? (Even V-Max with RAID6, no problem…)
  3. EMC: What’s the deal with the exported capacity, even with 312x drives?
  4. All of you with large striped pools of RAID5 – have you bothered explaining to your customers what will happen to the pool if you have a dual-drive failure in any RAID group? Unacceptable.

D

16 thoughts on “NetApp posts new SPEC SFS NFS results – far faster than V-Max with Celerra VG8”

  1. JohnFul from NetApp PS here.

    This is coming from the perspective of the guy that has to go on site and make it happen…

    It’s great to see standard benchmark results coming out with the release of new tin. As Moore’s law progresses, each iteration brings faster processors that can make better decisions about what to cache and where to optimally place writes. Brute force, however, isn’t everything. These results show that intelligently using the additional processing resources provided by the endless march of Moore’s law can result in truly spectacular performance AND efficiency. You don’t have to choose one OR the other.

    Wow! 5 times as much exported space as the VMAX! Not to rag on the competition; their results are typical of what traditional legacy array architectures can provide. Still, an easy 25% better response time with half the spindles (half the IOPS, but it is SATA) for the low end of the midrange, the 3210, vs. the top-of-the-line VMAX. Flipping to 15K SAS (lest we forget the new high-density 2.5″ SAS shelf) with fewer spindles (and much better density, lower power, cooling and space requirements) beats the top-of-the-line VMAX on performance by 50%. Then we have the midrange 3270, with no Flash Cache and SAS spindles, beating the “Lab Queen” VMAX/EF(U)D configuration by 14% on performance… all the while providing 5 times the exported space of the best the competition has to offer. Hmmm, 5X… now I know where that 20% number came from.

    In the end, all the competition has to say about it is “Nothing too surprising or game-changing here”. Guess they don’t know a game changer when they see it.

    Great results Dimitris, and great blog. This certainly is an eye opener.

    J

  2. D,

    What about adding two columns to the chart to clarify:
    – exported capacity per IOP (GB / IOP)
    – cost per IOP ($ / IOP)

    Just two important figures to show value for money at a glance, WITH a performance-winning configuration…

    All in all, great blog.

  3. If a 3210 w/ 512GB of Flash Cache and 144 SATA drives delivers lower latency than a Celerra NS-G8 + VMAX using 312 FC drives, why do I need FAST again? What problem is FAST trying to solve? Remind us again.

  4. These results are misleading — not sure if it’s intentional but this needs to be identified because the real world requires it. Here’s why:

    NetApp’s numbers are based on an active/active 6240 cluster while EMC’s numbers are based on an N+1 configuration (1 active Data Mover, 1 standby). This is called out in the “System Under Test Configuration Notes” section of each vendor’s SPECsfs2008_nfs.v3 results page.

    If I’m running in the real world with this NetApp config and I get a controller failure and subsequent failover, I will experience performance degradation until I can get the failed controller repaired and operational again (assuming these results ran the controller’s CPUs above an aggregate 100% utilization).
    If I’m running in the real world with this EMC config and I get a Data Mover failure and subsequent failover, I will only experience temporary impact on failover (also would possibly occur with NetApp). Because I have an unused, standby Data Mover, I can guarantee no performance impact.

    Maybe some businesses have deemed it okay to run both NetApp controllers above an aggregate 100% CPU utilization, but many would not be okay with this. Therefore, it’s extremely important to qualify these results with this kind of detail.

    A test I would like to see is one where both vendors put forth a configuration that enables me to run in a way that won’t impact performance on controller failure. This would mean no more than 100% aggregate CPU utilization on a clustered NetApp and some combination of Active/Standby Data Movers from EMC.

  5. Hi David,

    Insightful comment, and correct for the VG8 result (the other EMC submissions had more than 1 active DM), but the real world (since, as you said, this needs to reflect reality) also doesn’t run all its NAS on a 4-node V-Max with RAID10 and SSDs, or hundreds of RAID10 FC drives.

    The Celerra can’t really run active/active and maintain auto-failover; it needs at least 1 passive node. So, EMC would need to submit a 2+1 to do this – they are more than welcome to, and, in the process, run the box at a realistic RAID level that doesn’t give up anything in protection vs. the NetApp approach. RAID-6 is their only choice then.

    You have to look at the system as a whole (there’s an entire separate array as the back-end with the EMC configs) – at the moment, nobody really knows where the bottleneck is with the EMC result.

    Is it the V-Max? The Celerra? Both?

    Look at the results as a balanced, realistic whole, and draw your own conclusions – the NetApp solution is far less expensive, has tons more space, great performance and better RAID protection. Arguably, it is the better realistic system.

    Could EMC build a system that’s even faster? Most probably. How much will it cost? Their current result is already almost double the cost. You’re probably looking at a fully maxed-out V-Max and a full complement of VG8 nodes.

    EMC should submit “normal” Celerra configs with Clariions at the back end (since they always like to compare our systems to the Clariion and not V-Max). Indeed, if you listen to EMC sales, they would say that a NetApp system is more like an NS-480 or -960.

    You see, if I’m a consumer, the EMC result is meaningless – it’s a config 99.99% of customers can’t possibly afford. Which is why NetApp has various results and not just the biggest, baddest boxes we make (that we didn’t even test yet :) )

    Even people that have the money won’t go with such a large system and RAID-10, since NAS workloads typically demand decent amounts of space and efficiency.

    I’m rambling, too many hours of work today…

    D

  6. I think you’ve stumbled across the reason why I think all of these types of benchmarks are useless to me (an end user), and are used pretty much as marketing to try to sell you something.

    i.e.
    NetApp 3270 raw disk size=162TB, fileset size=11.7497TB
    NetApp 6240 raw disk size=129.6TB, fileset size=22.0475TB
    EMC FC VMAX raw disk size=140.4TB, fileset size=15.8065TB

    Seriously? You have >100TB of raw storage on all these frames and you are using at most 22TB of space (and the big NetApp system has >1TB of cache, ~4x more than the VMAX and 28x more than the 3270!!). Tell me what customer is really going to do that? Tell me where this has any real-world reality to it at all? Really, show me why this isn’t just sales marketing. Does NetApp really expect me to purchase a 3270 and only use ~7% of the raw drive space? You throw away the million-ops result because it’s not reality, so why do these smaller tests seem anything like reality… why should they not be thrown away as well?

    Since you are constantly pushing on about benchmarks, why doesn’t NetApp do a proper, realistic customer benchmark that would actually use >100TB of the space, consuming ~85-90% of the raw space? We both know the answer to that… real-life numbers from filling an array up don’t create flashy marketing glossies and subsequently don’t move frames in *sales* (but they would be something actually useful to me as a customer)

    I’ll say this: if any vendor brought these types of benchmarks to my door as proof… well, I’d be personally insulted at how they were trying to pull one over on me, or at best they’re not dishonest but ignorant, and probably useless for fixing my problems.

  7. InsaneGeek,

    SPEC SFS is a totally different type of benchmark than SPC-1, but I wanted to address your utilization question.

    I think you’re confusing SPEC with SPC, and don’t really know how much huge amounts of cache from NetApp cost on the street (almost all my customers have at least 512GB, many 1TB, and several 2TB or more – this is not something high-end, costly or rare). The systems presented are realistic from a price standpoint. I’m sure you pay dearly for DMX/USP cache…

    But, most importantly (which will address your main concern):

    For our SPC-1 publication on the 3270, we used

    • Raw storage: 35.8 TB (manufacturer’s rated capacity of the drives × number of drives).
    • Usable storage: 25.9 TB (72%), after subtracting all overheads including spare drives, /vol/vol0, etc.
    • Used storage: 21.7 TB.
    These numbers come straight from the full disclosure report.

    For SPC-1 we used 84% of the usable storage. Not an atypical number for a production storage system.

    The application utilization ratio is 60% – the ratio of the size of the application data set to the size of all of the raw disks. Our application utilization ratio is an industry-leading number.

    SPC-1 has a lot of random writes. The vast majority of vendors use RAID 1/0 to accommodate those writes. There are a couple of SSD results that use parity RAID, and a couple of HDD results that use RAID-0 (no protection). Only the RAID-0 results and one of the SSD/RAID-5 results get better than 60% application utilization.

    We use 1 TB of Flash Cache to accelerate this workload. This amounts to 4.6% of the active data set. Our best practices suggest using flash equal to between 3% and 5% of the size of the active data set.

    In summary, for our SPC-1 publication we believe the system is configured with a typical amount of flash and that we’ve used a typical amount of the usable space. We have an industry leading overall utilization because we can use RAID-DP where others must use RAID-1/0. We’ve followed our best practices on how much flash to use and achieved a record high SPC-1 IOPS/disk drive for any solution that isn’t pure SSD.
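
    For those who want to check the math, the ratios come straight from the figures above (a quick sketch, nothing more):

```python
# Sanity-checking the SPC-1 utilization figures quoted above.
raw_tb, usable_tb, used_tb, flash_tb = 35.8, 25.9, 21.7, 1.0

print(f"usable / raw        : {usable_tb / raw_tb:.0%}")   # ~72%
print(f"used / usable       : {used_tb / usable_tb:.0%}")  # ~84%
print(f"application / raw   : {used_tb / raw_tb:.0%}")     # ~61%, i.e. the ~60% application utilization ratio
print(f"flash / active data : {flash_tb / used_tb:.1%}")   # ~4.6%
```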

    Benchmarks are not useless if you know what to look for. A good example:

    Since SPC-1 does almost 60% writes, it’s very indicative of the write efficiency of a system.

    SPEC SFS is different, it’s a NAS benchmark and is heavy on metadata and not so heavy on writes, and shows other aspects of storage.

    D

  8. Nope, I clicked on your link to the *NAS* benchmarks.

    In your post you supplied the quantity and size of the drives; all one needs to do is click on your NAS benchmark link, open up the appropriate report and read the fileset size… or, even easier, per the SFS guidelines the fileset size is directly proportional to the number of ops: 1x SFS op/sec == 120MB of fileset. Multiply the number of SFS ops by 120MB and you’ll get the space that was required by the test… and the hardware used to run the test is so *dramatically* larger than this number that it’s laughable.

    So again, why do NetApp, EMC, etc. require so much raw space? Because everybody knows they couldn’t get such numbers without “rigging the test” to the point where it’s unrealistic compared to reality. Hence it’s only good for selling hardware by smoke and mirrors, because when you start using the frame as a customer actually would, the performance is going to start plummeting, and these tests show nothing as to how the system scales over a large dataset, only over a comparatively tiny one. I need to know how a 100TB array runs when I’m using it like it will be used in reality, not how it runs when I’ve got hardly anything on it.

  9. Thanks for the new comment. Unfortunately, your logic is not valid regarding the SPC-1 benchmark (84% full yet we get better performance per spindle than anyone else – is that level of utilization acceptable to you? :) )

    You should spend some time examining what the benchmarks do, both organizations have extensive documentation. I’ll post something explaining SPEC.

    D

  10. Not sure why you keep going back to SPC since I don’t think I’ve ever mentioned it (maybe you are grabbing onto the word *all* in my statement “all of these types of benchmarks are useless”), but I’m talking about SFS. I’ve got my opinions on SPC, but I’ve never said the NetApp SPC test has the same issue as SFS – it’s meaningless to this conversation.

    So really it comes down to this: do you agree or disagree that the SPEC SFS NFS results you posted, from both NetApp & EMC, are meaningless to me as a customer (and I mean customer truthfully, as I have both companies on my floor)? If you are going to convince me they are meaningful, you are going to have to do something other than talk about a completely different benchmark.

    I actually have spent lots of time examining what a good portion of benchmarks do… and I still haven’t found them useful to my business; but they sure do make flashy headlines for someone trying to sell me something.

  11. I keep going back to SPC simply because your main argument seems to be that all vendors are trying to cheat by short-stroking drives. We had 84% space utilization on SPC-1, and got a record result in IOPS/spindle efficiency, which negates that argument. It IS a different benchmark than SPEC and, you know what, it’s the tougher benchmark by FAR. Utterly brutal on disk systems. SPEC is way easier and completely different.

    SPEC is different in the way it touches files and how much space is allocated.

    The SPEC – spec (sorry) is like this:

    SPEC is not a write-heavy benchmark, so NetApp doesn’t get a huge advantage running it (NetApp gear is write-optimized).

    The amount of space used by SFS2008 is considerably more than its predecessor’s. Space allocated is 120MB per requested op. (In SFS97, it was 10 MB/op.)

    Of 120 MB/op allocated, 30 percent will be accessed during the execution of the benchmark. So, more space is allocated than touched. This is so that the accesses do not have unrealistic physical locality and is modeled after actual fileserver workloads.
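
    To put rough numbers on that for the two NFS results in this post (approximations only – strictly, the fileset is sized from the requested load, not the achieved result):

```python
# Rough SFS2008 space math for the two NFS results in the post above:
# ~120MB of fileset allocated per op/sec, of which ~30% gets touched.
MB = 10**6

for system, ops in [("FAS3270", 101_183), ("FAS6240", 190_675)]:
    allocated_tb = ops * 120 * MB / 10**12
    touched_tb = 0.30 * allocated_tb
    print(f"{system}: ~{allocated_tb:.0f} TB allocated, ~{touched_tb:.1f} TB actually touched")
```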

    Notes:

    Of the disk space that is available for data, NetApp makes the majority of the physical space available for actual use (exported capacity is a large percentage of physical disk space). However, some other vendors do tend to export only a very small fraction of the physical disk space. This is a way of forcing the benchmark to run in a narrower band of tracks on the disks. NetApp does not artificially constrain the benchmark, and prefers to operate in the same way that the system will be used in the real world :)

    NetApp also prefers to use configurations that are realistic. For example, stuffing a room full of SSDs, and striping across them, is probably not realistic, yet we have an example from one vendor that has demonstrated that one can construct a pretty expensive experiment in an artificial world.

    If anyone were to construct a benchmark that always touched all available space, on very large systems with hundreds of spindles, the benchmark’s runtime would likely be unrealistic and unusable.

    Do your workloads touch every single bit of capacity simultaneously?

    And no, I don’t agree that the SPEC results are meaningless. They do show controller efficiency in a metadata-heavy NAS workload, and since the NetApp controllers do RAID and NAS and everything else at the same time, it shows that the controllers can sustain some pretty decent I/O and low latencies.

    EMC in their benchmarks has a pretty loaded V-Max as the back-end, initially with just SSD. They got heat about that then changed it to hundreds of short-stroked and mirrored FC drives, but the fact remains that it’s not a realistic solution people would get to serve files (some might, but it’s kinda rare – most get the “standard” Celerra that has a Clariion back-end).

    D

  12. I was doing some quick math based on the benchmark definition at http://www.spec.org/sfs2008/docs/usersguide.html#_Toc191888937 – there is 120MB of data for every op per second:
    At 1 op per 120MB, that is roughly 8.53 IOPS/GiB.
    If you were to do this at 100% capacity on a 300GB drive (which is about 272GiB) you get about 2,320 IOPS per drive, and on a 1TB drive that would require around 7,800 IOPS per spindle.
    This assumes no RAID or any other overhead. It is about 10x more than most people assume is possible out of a naked 15K drive and about 80x more than you’d expect from SATA. SSD should get close to this kind of IOPS density, but even EMC with an all-SSD config only managed around 1,100 IOPS per “disk” with EFDs and a whopping 264GB of DRAM cache.
    If you were to ask me whether this IOPS/GB density is representative of how hot end-user workloads are in production environments, then I’d say yes, but only for the files in active use, which are typically less than 10% of the total files in the filesystem. (As a side note, one of the interesting things the SPEC SFS benchmark does is demonstrate the ability of the array to effectively stripe a very hot workload across a number of disks.)
    The relevant question, which I think is implied by some of the other comments, is whether the benchmark would have had significantly different results if the underlying filesystem(s) were filled with varying quantities of inactive files. From experience, on a NetApp array, the results would remain substantially the same. You can fill the filesystems to about 90% full while still maintaining extremely high performance for writes (for reads there is no difference regardless of how full the filesystem is). This is borne out by NetApp TR-3647, which states:
    “For write-intensive, high-performance workloads we recommend leaving available approximately 10% of the usable space for this optimization process. This space not only ensures high-performance writes but also functions as a buffer against unexpected demands of free space for applications that burst writes to disk.”
    Leaving an extra 10% free space to maintain extremely high write performance is an acceptable tradeoff in most situations, especially when compared to other solutions which would require RAID-10 to maintain similar levels of write performance at an immediate 50% space penalty.
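
    The IOPS-density math from the top of this comment, as a quick script (same assumptions: ~1 op/sec for every 120MB of fileset, and the usual binary drive capacities):

```python
# IOPS density implied by the SFS2008 sizing rule if the fileset had to
# cover 100% of each drive: ~1 op/sec for every 120MB of fileset.
iops_per_gib = 1024 / 120   # ~8.53 ops/sec per GiB

for label, capacity_gib in [("300GB drive", 272), ("1TB drive", 931)]:
    print(f"{label}: ~{capacity_gib * iops_per_gib:,.0f} ops/sec per spindle at 100% fill")
```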

  13. Just because you make the space available for use doesn’t mean anything if you aren’t using it. On pretty much any extent-based filesystem, if you don’t use the capacity you aren’t jumping around on the drive; i.e. a 1TB filesystem with only 1GB written to it generally will not be randomly spread around and will exist in one section of the drive (one of my problems with Oracle ASM in OLTP mode). By not using the space you are in effect using (your words here) a “narrower band of tracks on the disks”.

    Let’s get down to the nitty gritty. In general, a Seagate 450GB 15K SAS drive has:

    Usable size/disk=~429GB
    Average Latency ~2ms
    Avg seek ~3.5ms (think this number is way too low… but that’s the spec #)

    In the NetApp 3270 there are 352x (308x data & 44x parity) drives in a 14+2 raid configuration (.875 == usable after raid).

    ~429 * 352 * 0.875 = ~129TB usable in the array after raid (ignoring extra metadata overhead in WAFL, etc)
    Fileset size = 11.7497TB

    11.7497 / 129 = ~9.1 % of each disk to hold the entire fileset

    Let’s add in some WAFL, etc. overhead and round up from 9.1% to an even 10% of the 450GB drive to hold the entire dataset, which really becomes the size of a full stroke during the SFS NAS test. All things being equal, the average seek time should now be 0.35ms = 3.5ms * 10%. This number is significantly weighted in NetApp’s favor, as the data/track ratio from outer to inner is over 2x, further reducing the seek distance, but I’ll use it (off the cuff I’m guessing a 1.3-1.5x further seek-time reduction) to cover any other bits of unknown stuff within WAFL.

    1sec=1000ms
    1000 / (2ms latency + 3.5ms seek) = ~181 iops/disk
    1000 / (2ms latency + 0.35ms seek) = ~425 iops/disk

    181/disk * 308/data disks = 55748 disk iops (ignoring parity intentionally)
    425/disk * 308/data disks = 130900 disk iops

    I have to say that the claim that using a greater portion of the drive wouldn’t make much of a difference doesn’t fly. The difference in the number of IOPS supportable by the disk alone is >2.34x. True, it doesn’t make a caching difference, as the amount of active data would be about the same, but it makes a *significant* seek-time difference, which adds up. Really, you are going to look me in the face and say that a drive that only has to seek over 9.1% of the platter won’t be much different from one seeking over 90% of it? The amount of data to be accessed isn’t changed (~30% of 11TB), but it has to seek over huge distances.

    How about this: if you want to convince me, run it with 2.34x fewer drives (150 instead of 352) – that’s the ratio of the short-stroke to full-stroke IOPS rate. You still have way more free space than you will ever need (56TB for 11TB of data on disk) – now go hit that 101k number.

    And if you guys are really saying that for real customers the active size is only 10% of the dataset size: the fileset size for 101k is 11.7497TB, and the test touches 30% of that, or 3.524TB. So for a test that would be close to a customer with 10% active data, your total size doesn’t need to be > 35.2TB (you’ve still got 37% more free, which is way more than enough for WAFL, aggregate, etc. needs). 37% after RAID overhead is still pretty bad.

    Also, I really think you should cut the numbers in half, as I can’t run that way in production. NetApp will directly tell you that you shouldn’t run that hot, because in case of a failure you are screwed. Could I really run a production load with high availability at 101k and sustain a head failure? Any truthful NetApp field person would look you in the eye and say “hell no,” and that’s another reason why these types of benchmarks are BS to me… you have to go through things with such a fine-tooth comb to find the places where vendors aren’t running like a sane person, which just further proves my point that reality != benchmark and it’s just marketing to sell stuff.
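
    Here is the same seek-time arithmetic as a script, for anyone who wants to rerun it with different assumptions (drive specs and counts are the ones quoted above):

```python
# Reproducing the short-stroke arithmetic above: Seagate 450GB 15K SAS
# (~429GB usable, ~2ms rotational latency, ~3.5ms average seek),
# 352 drives in a 14+2 RAID layout (308 data drives), 11.7497TB fileset.
usable_gb, drives, data_drives = 429, 352, 308
raid_factor = 14 / 16                                   # 0.875 usable after 14+2

usable_tb = usable_gb * drives * raid_factor / 1024     # ~129 "TB" (binary), as above
stroke = 11.7497 / usable_tb                            # fileset covers ~9.1% of each disk

latency_ms, full_seek_ms = 2.0, 3.5
full_stroke  = int(1000 / (latency_ms + full_seek_ms))         # ~181 IOPS/disk
short_stroke = int(1000 / (latency_ms + full_seek_ms * 0.10))  # ~425 IOPS/disk (seek rounded to 10%)

print(f"fileset covers ~{stroke:.1%} of each disk")
print(f"full stroke : {full_stroke} IOPS/disk -> {full_stroke * data_drives:,} total")
print(f"short stroke: {short_stroke} IOPS/disk -> {short_stroke * data_drives:,} total")
print(f"ratio       : {short_stroke / full_stroke:.2f}x")
```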

  14. I still think there’s an issue with the understanding of how SPEC SFS works.

    You see, with SPEC SFS, one CANNOT make the system more or less full. The benchmark itself fills up the box, we don’t do it.

    The way SPEC SFS works, it creates capacity for every operation, so, the faster a box is, the more capacity gets consumed, until no more operations can be performed.

    It’s not like we could make it take 100TB. Even if we wanted to we couldn’t fill up the space and stay within the benchmark rules. It eats as much space as it can before running out of steam.

    So, the way all vendors benchmark SFS is that we add drives until the front end simply can’t do any more. Using fewer drives was not saturating the front-end (we tried it), so we kept adding drives until that happened. Simple as that.

    Without Flash Cache, the 3270 needs 360 drives. With Flash Cache, the 6240 needs only 288 drives but delivers almost 2x the performance. So, the combo of faster controllers and extra cache means that fewer spindles are needed to get far better performance, yet almost 2x the space was consumed because it could sustain about 2x the performance of the 3270.

    The only way to get the effect you want (full drives) is to use tiny drives that simply aren’t made any more (and haven’t been produced in a long while).

    Each benchmark shows a different thing – if you want to see how a NetApp system operates under a crazy block workload while 84% full, look at the SPC-1 results. I’d welcome comments from you on that post.

    If you want to see how much throughput you can get for NAS, SPEC SFS is pretty much the only industry standard now, like it or not… There are efforts underway for a new version of SPEC; if you want to get involved then please do so and assist the entire community.

    Another way to look at it:

    If you connect the optimal number of disks, this result is the peak NAS ops/sec that the system will provide; it’s not designed to be a disk benchmark per se (SPC-1 is the pure disk benchmark). The amount of available space will vary with changes in technology, but that is not what this benchmark measures. Its main goal is to measure the maximum throughput in NAS ops/sec that the system can sustain, not how much space the latest disk technology provides.

    Is it reasonable to connect so much space? Imagine that I used 10GB drives. I would have a very small amount of space, but reach the optimal number of spindles pretty quickly. But that would not be a reasonable configuration as the drives are not available. So… one uses the disks that are most commonly used by customers and a sufficient number of them to reach the limit of the system. If a customer needs this number of ops/sec then the spindle count is one area that the customer could easily address.

    How much space this provides would again be dependent on the selection of the size of the disks.

    Regarding the active fileset: That’s again how SPEC works (and, typically, 5-10% is what I see as a typical working set with all customers).

    Here’s the alternative with some rough calculations:

    500GB drives, times 300 of them, with 4k transfers, and random I/O – how long would it take to fully touch every byte of storage ?

    Back of the envelope:
    250 ops/sec/drive (typical SAS or FC drive)
    250 * 300 = 75000 (total ops/sec)
    500 GB * 300 = 150TB
    150TB / 4k = 40265318400 4k chunks
    40265318400 / 75000 = ~536871 seconds
    536871 / 60 = ~8948 minutes
    8948 / 60 = ~149 hours
    149 / 24 = ~6 days

    So… if one argues that one must touch every byte of storage, then one may have a very long running benchmark.
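
    Or, as a quick script (same assumptions as the envelope above):

```python
# Back-of-the-envelope: time to touch every 4K block once, given
# 300 x 500GB drives and ~250 random ops/sec per typical SAS/FC drive.
drives, ops_per_drive, drive_tb = 300, 250, 0.5

total_ops = drives * ops_per_drive                  # 75,000 ops/sec
chunks = int(drives * drive_tb * 2**40 // 4096)     # 4K chunks (binary TB, as above)
seconds = chunks / total_ops

print(f"{chunks:,} chunks / {total_ops:,} ops/sec = {seconds:,.0f} s "
      f"= {seconds / 3600:,.0f} hours = {seconds / 86400:.1f} days")
```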

    And, you’re right, the benchmark shows 2 maxed-out controllers for a full system; you absolutely couldn’t run that fast on 1 controller. My argument regarding that: it’s up to the customer to decide whether they are OK with performance degradation in such a situation or not. ANY system with 2 active controllers falls under the same umbrella – FAS, CX, AMS, DS, etc.

    On systems with more than 2 controllers (NetApp Cluster-Mode, V-Max, Celerra with 8 nodes, Isilon, VMware clusters to name a non-storage example) – you decide whether you want to run with all systems active, or keep one as spare (which is what Celerra does).

    The tradeoff with a spare is that it does nothing, it just sits there waiting for the other controller to break. So, on a 2-node Celerra, one node does absolutely nothing while the other does all the work. In a 4-node Celerra, only 3 of the 4 nodes can do useful work, yet you pay the hardware, software and maintenance price for all 4.

    The tradeoff with all active is just that – they’re all active, so if you lose 1 node from a 4-node system (like the V-Max EMC uses for their tests), you’ve neatly lost 25% of your performance. But, while all the nodes are operational, you retain 100% of performance, and utilize your investment.

    How you run your systems is totally up to you.

    From the SPEC manuals:

    Setting up the System Under Test (SUT)

    There are several things you must set up on your server before you can successfully execute a benchmark run.

    1. Configure enough disk space. SPECsfs2008 needs 120 MB of disk space for each NFS or CIFS ops/sec you will be generating, with space for 10% growth during a typical benchmark run (10 measured load levels, 5 minutes per measured load). You may mount your test disks anywhere in your server’s file space that is convenient for you. The maximum NFS or CIFS ops/sec a server can process is often limited by the number of independent disk drives configured on the server. In the past, a disk drive could generally sustain on the order of 100-200 NFS or CIFS ops/sec. This was only a rule of thumb, and this value will change as new technologies become available. However, you will need to ensure you have sufficient disks configured to sustain the load you intend to measure.

    2. Initialize and mount all file systems. According to the Run and Reporting Rules, you must completely initialize all file systems you will be measuring before every benchmark run. On UNIX systems, this is accomplished with the “newfs” command. On a Windows system the “FORMAT” utility may be used. Just deleting all files on the test disks is not sufficient because there can be lingering effects of the old files (e.g. the size of directory files, location of inodes on the disk) which affect the performance of the server. The only way to ensure a repeatable measurement is to re-initialize all data structures on the disks between benchmark runs. However, if you are not planning on disclosing the result, you do not need to perform this step.

    3. Export or share all file systems to all clients. This gives the clients permission to mount, read, and write to your test disks. The benchmark program will fail without this permission.

    4. Verify that all RPC services work. The benchmark programs use port mapping, mount, and NFS services, or Microsoft name services, and file sharing, provided by the server. The benchmark will fail if these services do not work for all clients on all networks. If your client systems have NFS client software installed, one easy way to do this is to attempt mounting one or more of the server’s exported file systems on the client. On a Windows client one may try mapping the shares to ensure that the services are correctly configured on the CIFS server.

    5. NFS servers generally allow you to tune the number of resources to handle TCP requests. When benchmarking using the TCP protocol, TCP support is of course required, and you must also make sure that UDP support is at least minimally configured or the benchmark will fail to initialize.

    6. Ensure your server is idle. Any other work being performed by your server is likely to perturb the measured throughput and response time. The only safe way to make a repeatable measurement is to stop all non-benchmark related processing on your server during the benchmark run.

    7. Ensure that your test network is idle. Any extra traffic on your network will make it difficult to reproduce your results, and will probably make your server look slower. The easiest thing to do is to have a separate, isolated network between the clients and the server during the test. Results obtained on production networks may not be reproducible. Furthermore, the benchmark may fail to correctly converge to the requested load rate and behave erratically due to varying ambient load on the network.

  15. Just happened to see the long discussion above, and InsaneGeek has really nailed it — dead-on right.

    In short, storage benchmarks that rely on the classic “bench-marketeering” trick of short-stroking the disks are useless for any purpose, and intentionally deceptive.

    This is true whether it’s a filesystem benchmark or a block-level benchmark.

    This deceptive practice was identified a very long time ago, and it is the reason why short-stroking is expressly forbidden in (for example) SPC-1.

    I cannot understand the argument made here that shortstroking is somehow acceptable in a filesystem benchmark vs a block-level benchmark?!?!

    How is a test that restricts the user to only using 5-10% of the capacity they purchased relevant anywhere in the real world?
