Is EMC under-sizing RecoverPoint and Avamar deals to win business?

It’s been a while since I wrote anything – unlike some, I actually have a day job! Well, at least that’s my excuse.

My admiration for RecoverPoint is well known (see older post, which is referenced internally within EMC as a great pro-RecoverPoint article). It really is a good product and, next to VMware, my favorite EMC acquisition.

So it incenses me when I see a good product being misconfigured, and it reminds me of Hanlon’s Razor: “Never attribute to malice that which can be adequately explained by stupidity”. You see, I’d rather chalk this up to sales not knowing what they’re doing than assume that EMC knows full well the ramifications of its decisions and goes ahead and does the dirty deed anyway.

However, I’ve seen multiple cases recently where RecoverPoint and/or Avamar were most decidedly incorrectly sized to support the customer’s workload. The customer likes the price and goes for the solution, only to be in for a nasty surprise later on. Not to worry, everything can be fixed with some more boxes, licenses and hard disks! After all, it’s tough and expensive to rip the stuff out!

To start with RecoverPoint: it can be a wonderful DR tool but, like any tool, needs to be used correctly in order to be most effective. For instance, there are several aspects to consider when designing a RecoverPoint solution:

  • One needs to take into account the sustained throughput each device can handle (minuscule when compared to the total bandwidth of a CX4 or V-Max), and add extra devices in order to comfortably sustain the throughput the customer needs – even if that means you go beyond the 2-device-per-site RecoverPoint SE maximum and into the realm of “full” RecoverPoint (which can do more than 2 appliances per site, for added performance).
  • To expand on the previous point, assume that one of the RecoverPoint devices is “gravy” and is there to fail over if another box breaks. So, you effectively don’t want to be relying on having the full complement of RecoverPoint boxes working. This is especially important in 2-box RecoverPoint SE configs. If one box breaks (and they’re plain Dell 1950 servers) then that should not be debilitating to your performance while you’re waiting for a new box.
  • Licensing is capacity-based, which also needs to be explained to the customer (including what it means price-wise if you go beyond what RecoverPoint SE will support).
  • There is an absolute ceiling for TB replicated
  • There’s a different price depending on whether you want to do local only, remote only or both kinds of replication (CDP, CRR and CLR licenses)
  • Beware of the increased I/O on the array! When doing any kind of traffic through RecoverPoint, at the very least you get quite a bit more I/O on the “journal” (the redo log part of RecoverPoint) in addition to your main disk. If you want to also do local recovery, you could be doing as much as 3x the I/O! You see, you have to send the normal copy of the data through first, then Clariion splits off the I/Os to RecoverPoint, which then writes data to a full local mirror, then also to the journal. Obviously, the array needs enough fast disks to cope with this.
  • As a corollary to the previous point, to do CDP you need at least 2x the space plus a percentage for the journal (which depends on the change rate and how far back in time you want to be able to go)
  • Additionally, you can’t present multiple clones of the data simultaneously, from different points in time – you have to do them one at a time. Could be important in some use cases.
  • Creating a full-speed-access snapshot of your data can take quite a while, which, again, could be important in some cases.
  • Last but not least – RecoverPoint, while efficient, is still subject to the laws of physics, so if you are told you’ll get zero RPO/RTO over a multi-thousand-mile link, stop what you’re doing, email me and I’ll overnight you an industrial-strength cattleprod, gratis… which you can then use on the rep in question.
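To make the arithmetic behind the first few points concrete, here is a minimal back-of-the-envelope sketch. It is not an EMC sizing tool: the function names are hypothetical, the per-appliance throughput and journal fraction are placeholder inputs you would have to get from the vendor and from your own change rate, and the 3x write multiplier is simply the upper bound described above (production copy, local mirror, journal).

```python
import math


def appliances_needed(peak_mb_s, per_appliance_mb_s, spares=1):
    """Appliances per site: enough to carry the peak load, plus failover spares."""
    if peak_mb_s < 0 or per_appliance_mb_s <= 0:
        raise ValueError("throughput figures must be positive")
    working = max(1, math.ceil(peak_mb_s / per_appliance_mb_s))
    return working + spares


def cdp_array_writes(host_writes_per_s, multiplier=3):
    """Upper-bound array writes per second when doing local protection.

    multiplier=3 assumes one write each to production, local mirror and journal.
    """
    return host_writes_per_s * multiplier


def cdp_capacity_tb(production_tb, journal_fraction):
    """Space for CDP: production copy + full local mirror + journal."""
    return 2 * production_tb + journal_fraction * production_tb


if __name__ == "__main__":
    # All inputs below are made-up placeholders, not vendor figures.
    n = appliances_needed(peak_mb_s=150, per_appliance_mb_s=75)
    print(f"appliances per site (incl. 1 spare): {n}")
    print(f"array writes/s for 2000 host writes/s: {cdp_array_writes(2000)}")
    print(f"CDP capacity for 10 TB, 20% journal: {cdp_capacity_tb(10, 0.2):.1f} TB")
```

With these placeholder inputs the count comes out above two appliances per site, which is exactly the situation where the SE-versus-full pricing conversation needs to happen before the PO, not after.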

So – all I’m saying is, ask all the right questions before sending that PO over…
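As for the laws-of-physics point in the list above, a rough sketch of the latency floor is easy to work out. It assumes light travels through fiber at roughly 200,000 km/s (about two thirds of its speed in a vacuum) and ignores switches, routers, protocol overhead and the fact that real links never follow a straight line, so actual numbers will be worse. The function names are mine, purely for illustration.

```python
FIBER_KM_PER_S = 200_000   # approximate speed of light in optical fiber
KM_PER_MILE = 1.609344


def min_round_trip_ms(distance_miles):
    """Best-case round-trip time over fiber, in milliseconds."""
    km = distance_miles * KM_PER_MILE
    return 2 * km / FIBER_KM_PER_S * 1000


def max_serialized_sync_writes_per_s(distance_miles):
    """Ceiling on back-to-back synchronous writes if each waits for one round trip."""
    return 1000 / min_round_trip_ms(distance_miles)


if __name__ == "__main__":
    for miles in (100, 1000, 3000):
        rtt = min_round_trip_ms(miles)
        rate = max_serialized_sync_writes_per_s(miles)
        print(f"{miles:>5} miles: >= {rtt:.1f} ms per round trip, "
              f"<= {rate:.0f} serialized sync writes/s")
```

A truly zero RPO means every write waits for the remote acknowledgement, so tens of milliseconds get added to each write at those distances. Anything faster is asynchronous, and then the RPO is not zero.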

Avamar is a different case altogether. It’s a dedup backup appliance that dedupes the data before it’s sent over the network. It’s very efficient at doing rapid backups over poor WAN connections. You don’t have to pay per-client fees, it supports most major OSes and applications, and is fairly easy to use. However – the original use case for the product was doing centralized backups of multiple small remote sites that are connected via poor links, and it still excels in that. Doing backups of large datasets at the datacenter, on the other hand, is not really what it was designed to do, yet I see it positioned in such a way.

I also see EMC selling really, really small Avamar configs (1-2 boxes), the hope being that dedup will be so effective that it’ll all be a wash in the end. Well – deduplication, in general, is the ultimate “it depends” solution!

Here are some considerations:

  • Not all data deduplicates equally! Make sure you run the EMC dedup estimator not just on fileserver data but also on your DBs! (DBs don’t really dedupe well, and media files and in general anything compressed dedupes even worse). Make sure you really get a good sample of your data analyzed, ideally all of it if possible.
  • If the sizer and dedup tool have only been run for plain fileserver data and that’s not what you have, don’t believe anything you see…
  • Explain your desired retentions and insist you see the Avamar sizer results. A good rule of thumb is that if your data is 5TB, then even with dedupe and compression, you’ll still need about 5TB once you factor in retention, unless you’re one of those rare cases that had tremendous duplication to begin with.
  • Make sure you understand the ramifications of not going to the RAIN grid in the first place – if you get a couple of Avamar boxes they can’t be part of the RAIN architecture, and if you lose one then the entire system is down hard. If you have RAIN, you could lose an entire node and it will be OK (kinda like RAID5 for servers) but migrating from non-RAIN to RAIN is non-trivial. Ask for the details. Ideally, even if you don’t need enough capacity to go RAIN, just buy the appliances to go RAIN but don’t buy the capacity licenses (i.e. you could buy 1TB of capacity yet have 5 nodes that theoretically can have a bunch more capacity).
  • Figure out if you want fast backups or fast recovery or both, and choose product accordingly (the fastest recovery is always replication/snapshots of primary data). Remember – usually, the desired end result is to recover, not to back up!
  • Understand exactly how Avamar can go to tape – the solution is not clean and it’s excessively slow. The product is really meant for those that want to go tapeless.
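To see why retention eats most of the dedup savings, here is a deliberately crude model. It is not the Avamar sizer and does not claim to reflect how Avamar stores data: it assumes the first backup shrinks by some initial reduction ratio, and that every day of retention adds a fixed fraction of the dataset as new unique data. Both inputs are hypothetical and should come from running the estimator on your real data.

```python
def stored_capacity_tb(primary_tb, initial_reduction, daily_unique_fraction,
                       retention_days):
    """Rough stored capacity after dedup, for a given retention period.

    initial_reduction: e.g. 2.0 means the first backup shrinks to half its size.
    daily_unique_fraction: share of the dataset that is new, unique data per day.
    """
    if initial_reduction <= 0:
        raise ValueError("initial_reduction must be positive")
    first_backup = primary_tb / initial_reduction
    daily_growth = primary_tb * daily_unique_fraction
    return first_backup + retention_days * daily_growth


if __name__ == "__main__":
    # Placeholder inputs: 5 TB of data, 2x initial reduction,
    # 0.5% unique change per day, 90 days of retention.
    need = stored_capacity_tb(5, 2.0, 0.005, 90)
    print(f"approx. capacity needed: {need:.2f} TB")
```

With those made-up inputs the result lands close to the size of the original data, in line with the rule of thumb above. Data that barely dedupes (databases, compressed media) pushes the initial reduction towards 1 and the number up further.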

That’s all I have for now.

D

 

14 Replies to “Is EMC under-sizing RecoverPoint and Avamar deals to win business?”

  1. Thanks for the post — many of us at EMC are reading it carefully.

    The initial take? No systematic issues on first glance (whether through malice or stupidity!)

    Some customers do have a very limited budget and prefer to go with an initial implementation, with the idea that they’ll see how it does and spend more if necessary. I don’t know if that game plan is always communicated to the people doing the implementation, though.

    And our sizing tools and methodologies are pretty good but, despite our best efforts, environments change dynamically, making the overall outcome an imperfect science at best.

    What’s worse: getting something too small, or overspending with something too big? Either way, it’s not good.

    Several people are going to keep an eye on this, though.

    Thanks for the post!

    — Chuck

  2. Hi Chuck,

    This was aimed more at the cases where the ramifications of starting small are not adequately explained to the customer – for instance, the huge price delta between RecoverPoint SE and full-blown RecoverPoint…

    And there have been many of those cases recently, both sold and in ongoing campaigns.

    RecoverPoint is still new enough to EMC that the full effect won’t be seen for another year or two I believe. But when it hits it could be bad.

    What happens at the trenches doesn’t necessarily reach the upper echelons quickly enough, and depending on the geo/rep/manager/time-of-year/competitor, things can get real weird real quick sometimes.

    D

  3. Hi,
    Is there any way an IF-based splitter can overcome the full sweep in case of a single path failure?
    In a large environment, replacing an SFP is an everyday thing.

  4. I have a customer that is looking at purchasing Avamar and backing up to a third-party DAS (Nexsan). He’s been told by a consultant that he HAS to use EMC storage to back up to. Is this correct? I’m aware of the EMC Data Store solution as an all-in-one appliance (server, software, storage), but what if the customer just buys the Avamar software?

  5. If you want to buy avamar software and “roll your own”, then you can only buy what’s in the approved BOM – a while ago it was select Dell, IBM and HP servers and very, very limited external storage options.

  6. Hi Dimitri-
    I’ll make a few comments about the RecoverPoint portion of your post ( I haven’t read the Avamar part yet but will come back when I have more time)

    CDP for local protection is great, and a use case that usually requires different trade-offs. Isn’t that why NetApp bought the TOPIO solution for ~$160M? What happened to that product, btw?

    Perhaps your knowledge is a little outdated – let me see if I can clarify a few things for you.

    RecoverPoint/SE had supported expandability up to 8 active nodes for some time now (not just 2 like you mention above). Pricing is extremely competitive, even compared to less capable traditional array replication methods and will continue to get even better over time. I have many customers who are hitting RPOs of

  7. Nobody’s blocking you. Maybe there’s some issue with your browser or my version of wordpress. Send it all to me via email and I’ll post the complete text.

  8. okay..trying again. I was using Firefox before. I’ll give IE a run

    Thanks for the kind words. That’s rare and admirable from a formidable competitor.

    I’ll make a few comments about the RecoverPoint portion of your post ( I haven’t read the Avamar part yet but will come back when I have more time)

    CDP for local protection is great, and a use case that usually requires different trade-offs. Isn’t that why NetApp bought the TOPIO solution for ~$160M? What happened to that product, btw?

    Perhaps your knowledge is a little outdated – let me see if I can clarify a few things for you.

    RecoverPoint/SE now supports expandability up to 8 active nodes, not just 2 like you mention above. Pricing is extremely competitive, even to less capable array replication methods and will continue to get even better over time.

    I have many customers who are hitting RPOs of < 1 minute over async distances with relatively small pipes. This is much harder to deliver with hourly snap-and-replicate approaches. The RP data reduction we see on average with traditional workloads is 5-10X. Each RPA supports up to 80MB sustained bandwidth.

    Another huge value point of RecoverPoint is the consistency group approach, which prevents customers from having to quiesce applications numerous times (usually every few hours) to achieve consistency before replication. We recently introduced spanned consistency groups that can go across up to 4 nodes and produce an aggregate bandwidth of 320MB per consistency group. Consistency groups also allow for federated consistency across application roles, databases and servers. SharePoint and SAP are great examples of apps that have many moving parts and are tough to reliably and consistently stand up at the DR site.

    Another thing that RecoverPoint offers (and I have several customers using) is RecoverPoint/CE, or Cluster Enabler. This plugs directly into the Microsoft cluster console and allows you to fail over your Microsoft clusters through the traditional cluster console for automated failover. It, too, is available in the SE edition (it used to be enterprise only).

    You are correct - CDP can add additional I/O, but perhaps you're forgetting the value that it brings in terms of its "TiVo-like" granular functionality for local protection that snapshots will always struggle to provide. CDP is usually used for very critical apps, not the entire array. The journal also has a new feature called “snapshot consolidation”, which allows users to optimize the space usage to meet a variety of protection windows - daily, weekly, monthly etc.

    Thanks,
    Jonas

  9. Thanks (obviously, as time goes by older articles will have somewhat obsolete info but at the time of writing I believe it was accurate).

    However, my article had nothing to do with whether RP is good or not – my original article on it was famous within EMC.

    No – it has all to do with not explaining all the ramifications to customers.

    You spent a tiny number of words on the IOPS hit, and I’ve recently experienced various EMC account teams selling CDP for the whole environment simply to box out the competition.

    Which is fine if that’s what the customer needs, but, again, the performance ramifications were not explained to them.

    I’m sure this varies between account teams but, at least in IL, this has been my experience so far.

    But enough of this – happy Thanksgiving!

    D

  10. I have a question about EMC RecoverPoint (RP-HW-1U-GN4B). I have 100TB of traffic for remote replication, but I don’t know which license I must order and in what quantity. For example, there are many licenses: “rp-rr 1tb(0-14tb)”, “rp-rr (61-100 TB)”, …
    Please tell me, for my scenario (100TB remote replication), which license and what quantity do I need?
