The Architectural Benefits of HPE Alletra MP – Plus R4 Coolness

When we first released the new HPE Alletra MP platforms, I wrote a few articles going over the benefits and how the flexible new hardware platform manifests as different “personalities” for high-end block and file solutions.

This time I want to take a deeper dive into the architectural benefits of our approach and how the new R4 software for Alletra MP Block enables certain things no other vendor can come close to – plus give a taste of what may be possible in the future given the amazing flexibility of the underlying architecture (it’s a blog, I can’t provide roadmaps here).

I will cover things like fractional multi-dimensional scaling (which allows things impossible with other vendors, such as adding a single controller node without needing to add capacity), as well as resiliency in the face of simultaneous failures that would cripple every other storage system I’m aware of. This isn’t meant to be comprehensive coverage of everything, but hopefully it’s enough to give you a taste.

Let’s go!

R4 Efficiency and Scale Enhancements for HPE Alletra MP Block

Apart from the recently announced all-SDS offering (initially in AWS but hey, it’s just software…), there is a lot of extra goodness for everyone in the new release. Some notable benefits revolve around improved storage efficiency. Two major boosts in this area, plus a scale enhancement:

  • Enhanced compression and dedupe – more intelligence around real-time decisions on how to reduce certain data means more efficiency – overall there can be a double-digit percentage benefit in many use cases.
  • Up to 25% more usable space with certain small or large configurations, including much more efficient use of spare space and improved heuristics for certain other defaults. This one is partially my baby, enjoy 🙂
  • 5.6PB max capacity per system (actual capacity, not counting data reduction).

For the rest of this article I want to focus on more general architectural possibilities (some of which are enabled by R4, others have been there since R1, and some may be coming a bit later – I will point out availability in each section as needed).

No More Controllers in HA Pairs

A fundamental aspect of this architecture is that there is no concept of HA pairs of controllers. “Classic” storage systems normally have pairs of controllers mirroring writes between them, which limits resiliency in the face of failures and also reduces flexibility, since everything must be done in pairs.

Controllers Do Not Matter for Write Cache Resiliency

The second fundamental technology is that all the write cache resiliency has been moved out of the controllers. It goes hand-in-hand with not having a concept of HA pairs of controllers. Ergo, losing a controller doesn’t reduce write cache integrity – unlike most other storage systems. Oh, and we don’t need batteries or supercapacitors to protect the write cache. Again, unlike most storage systems. The less that can break, the better off you are.

No More Controller Ownership of Disks/Shelves

The third fundamental technology is that there is no need for controllers to “own” disks or shelves. Indeed, all controllers see all disks and shelves, all the time. Again, this is unlike most storage systems, which enforce strict ownership of disks by controllers – for those systems, losing both controllers in an HA pair always means losing access to that data, no matter how “orderly” the loss may be.

These three fundamental principles are what allow all the flexibility and resiliency I’m about to show in the following scenarios.

Scenario 1: What if I Need Just a Bit More Speed? Fractional Node Scaling!

So let’s say my storage system is getting busy, but not busy enough to justify moving up to the next model or adding 2 more controllers. I just need a bit more speed and/or headroom.

Simple: I can just add a single controller node! And since there’s no concept of controllers having to own capacity, I can simply add the new node without having to add more capacity! The system will just balance itself and give workloads to the new controller automatically.

And in the future I can add yet another controller – again without needing to add capacity. Very clean and easy.

In all other systems I’m aware of, I would normally need to do one of the following:

  1. Replace both controllers with faster ones (and what if no faster ones exist?), or
  2. Add two new controllers plus disk, and then have to balance things and deal with the extra complexity and hassle. What if I already have enough space? Why should I be forced to buy more capacity? (This is also the problem with traditional grid-type HCI solutions.)
  3. Or, in the case of monolithic systems, replace everything 🙂

All of those options are more expensive and time-consuming than what HPE provides with Alletra MP Block. This granular node addition capability is available in the R4 release and is unique.
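By the way, if you like to think in code, here is a tiny conceptual sketch of what “the system will just balance itself” could look like. To be clear, this is purely my own illustration (the controller and volume names are made up), not HPE’s actual balancing logic – the point is simply that when no controller owns capacity, adding a node only changes who serves which volume; no data has to move.

```python
# Conceptual sketch only: NOT HPE's actual balancing code.
# It illustrates why adding one node "just works" when no controller owns
# capacity: only the "which controller serves this volume" label changes,
# and no data moves because every controller already sees every drive.

from collections import defaultdict

def rebalance(assignments, controllers):
    """Spread volume ownership across the current controller list with
    the minimum number of reassignments."""
    load = defaultdict(list)
    for vol, ctrl in assignments.items():
        load[ctrl].append(vol)
    for ctrl in controllers:
        load.setdefault(ctrl, [])  # newly added controllers start empty

    # Ideal per-controller load: some controllers get one extra volume.
    base, extra = divmod(len(assignments), len(controllers))
    busiest_first = sorted(controllers, key=lambda c: -len(load[c]))
    targets = {c: base + (1 if i < extra else 0)
               for i, c in enumerate(busiest_first)}

    surplus = []  # volumes that must change their serving controller
    for ctrl in controllers:
        while len(load[ctrl]) > targets[ctrl]:
            surplus.append(load[ctrl].pop())

    new_assignments = dict(assignments)
    for ctrl in controllers:
        while len(load[ctrl]) < targets[ctrl]:
            vol = surplus.pop()
            load[ctrl].append(vol)
            new_assignments[vol] = ctrl
    return new_assignments

# Example: 8 volumes served by 2 controllers, then a third controller joins
# without any capacity being added.
vols = {f"vol{i}": ("ctrl-A" if i % 2 else "ctrl-B") for i in range(8)}
print(rebalance(vols, ["ctrl-A", "ctrl-B", "ctrl-C"]))
```

Running it, the new ctrl-C simply picks up its fair share of volumes from the existing controllers with the fewest possible reassignments – which is the whole point of fractional scaling.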

Scenario 2: I Need to Add Some Capacity

Adding capacity is simple and generally the least interesting thing in storage – the major difference with Alletra MP Block compared to other systems is that the capacity doesn’t belong to any specific controller: I just plug it into the fabric and all controllers then happily and automatically share the new space. No need to worry about balancing things or assigning anything to anything. All this is already available; I’m just listing it for completeness.

Scenario 3: I Need to Add Even More Speed but Not More Capacity

If I need even more speed later on, I can just add even more controllers – again without having to add more capacity. This is a natural evolution of Scenario 1. Going beyond 4 nodes is supported by the architecture and planned for later.

And let’s start exploring problems now: What if I lose a whole shelf? No problem!

The ability to survive the loss of a whole disk shelf is already available – you can do this with as few as 3 shelves, without resorting to wasteful mirroring. You can also protect against a dual simultaneous drive shelf failure if you have 8 or more shelves!
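I won’t reconstruct the exact data layout here, but the generic arithmetic of why shelf-level parity beats mirroring is easy to show. This is just my own back-of-the-envelope illustration (generic numbers, not the actual Alletra MP layout):

```python
# Back-of-the-envelope illustration only – not the actual Alletra MP data
# layout. It just shows why spreading parity across shelves is far less
# wasteful than mirroring, while still tolerating whole-shelf loss.

def usable_fraction(shelves: int, shelf_losses_tolerated: int) -> float:
    """Fraction of raw capacity left for data when the equivalent of
    'shelf_losses_tolerated' shelves is reserved for protection."""
    assert shelves > shelf_losses_tolerated
    return (shelves - shelf_losses_tolerated) / shelves

print(f"Mirroring across 2 shelves (1 loss):  {usable_fraction(2, 1):.0%} usable")
print(f"Parity across 3 shelves (1 loss):     {usable_fraction(3, 1):.0%} usable")
print(f"Parity across 8 shelves (2 losses):   {usable_fraction(8, 2):.0%} usable")
```

Mirroring always costs you half your capacity; parity spread across more shelves keeps the protection while leaving far more space for data.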

Scenario 4: My Hypothetical Problems Continue! What if I Lose Multiple Nodes SIMULTANEOUSLY?

This one is the bane of storage systems, regardless of how fancy they may think they are. Several vendors will claim the ability to lose multiple controllers, but since they still rely on mirroring things, they don’t tell you they can’t lose ANY 2+ controllers SIMULTANEOUSLY.

The “ANY” and “SIMULTANEOUSLY” words are emphasized on purpose. Truly simultaneous loss of the wrong two controllers would be catastrophic in most systems I’m aware of (actually, all I’m aware of, but I can’t claim I know everything, so I’m happy to be corrected). That’s the big downside of mirroring cache between two nodes 🙂

For this level of protection to happen properly, the right cluster conditions need to exist, of course – (N/2)-1 nodes can be lost at the same time for Alletra MP Block. This rule exists to avoid cluster split-brain issues. So in a 6-node cluster, ANY 2 nodes can be lost SIMULTANEOUSLY.

Note that the max cluster size with the R4 release of Alletra MP Block is 4 nodes; more than 4 nodes is supported by the architecture and planned for a future release. But the (N/2)-1 code is already in R4, waiting for increased node counts to be certified.

Conceptually, the architecture allows arbitrary node counts – so if in the future we do, say, 8-node clusters, (8/2)-1 = 3, so ANY 3 nodes could be TRULY SIMULTANEOUSLY lost without issues, and so on and so forth as cluster size increases (which is why I’m providing the formula, so you can work out the various scenarios on your own 🙂)

Note that even with 7 nodes you could still lose 3 controllers simultaneously, since you’d still have a majority available (4 remaining). Same thing with 5 nodes – you could lose 2 simultaneously. Just mentioning this for clarity, since you can have odd node counts – we don’t force even numbers 😉
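If you want to play with the numbers yourself, here is a tiny sketch of that majority rule (again, my own illustration, not HPE code): the cluster has to keep a strict majority of nodes to avoid split-brain, which is exactly where (N/2)-1 comes from for even node counts.

```python
# My own sketch of the quorum math, not HPE code. A strict majority of nodes
# must survive to avoid split-brain, so the number of nodes that can fail
# truly simultaneously is N minus a majority – i.e. (N/2)-1 for even N.

def max_simultaneous_losses(nodes: int) -> int:
    """Largest number of nodes that can fail at once while a majority remains."""
    majority = nodes // 2 + 1
    return nodes - majority

for n in (4, 5, 6, 7, 8):
    print(f"{n}-node cluster: ANY {max_simultaneous_losses(n)} nodes can be lost simultaneously")
```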

And in case there is any confusion: “Simultaneous” means losing more than 1 node before the cluster has a chance to “calm down”. So you lose 1 node, and then, before the cluster can stabilize again, you lose another one (most clusters take some time to stabilize; there’s no such thing as “instant” in clusters).

I’ve seen this failure happen with systems that claim multiple node loss protection – they really mean rolling node loss protection, not simultaneous.

What I’m describing in Scenario 4 is losing things with the worst possible timing 🙂

If a competitor says “who needs that anyway, buy our cheap stuff instead”, that’s a bit disingenuous since the art of enterprise storage is precisely about providing protection and flexibility for as many use cases as possible. We must always strive to do better.

Scenario 5: I am Losing All my Controllers, One after Another (Rolling Failure). The Sky is Falling!

Imagine you simultaneously lost a couple of controllers and now you’re gradually losing all others – but it’s happening slowly, giving the cluster a chance to breathe. Perhaps due to environmental reasons.

We now allow N/2 rolling failures in R4, and may potentially allow more later.
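To put numbers on that with today’s 4-node maximum: N/2 works out to 2 controllers lost one after another – compare that to the (4/2)-1 = 1 node that can be lost truly simultaneously. The rolling case tolerates more precisely because the cluster gets a chance to stabilize between failures.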

Admittedly, it’s not a very probable scenario for most customers, but it may help in situations where you need a system to stay up under horrible conditions, without the possibility of spares arriving on time – or maybe ever, in some deployments.

Add our Active Peer Persistence clustering and you could engineer a system that can survive not just a site disaster but also further massive failures in each site – multiple controllers, whole disk shelves, etc. – and never suffer downtime.

Scenario 6: Replace Any Number of Controllers with Dissimilar Types for Non-Disruptive Tech Refresh & Lifecycle

Another thing that’s impossible for other architectures: replacing any number (including odd numbers) of controllers with dissimilar ones. With the Alletra MP architecture that’s possible, since there are no node pairs to worry about and you may have an odd number of nodes anyway (3, 5, etc.), so why not? 🙂

The disaggregated, shared-everything architecture makes lifecycle refreshes easier, since everything is modular and nothing belongs to anything. Why be shackled to a specific enclosure type?

This is not yet available in R4 – it’s coming in a later release – but I’m including it for completeness in describing the architectural benefits.

All this Flexibility and Resiliency – Possible from Small to Huge Solutions

This architecture can start small and scale big (systems with as little as 15TB and as much as 5.6PB). Maybe you don’t need all this capability up front – that’s OK. But you can feel safe knowing that if you need it in the future, you can do those things, with investment protection, instead of being trapped with architectures that can never do them and force you into forklift upgrades.

I didn’t even cover other cool stuff like manageability or the ease of migrations from older gear (totally non-disruptive) but this post is already long enough. Until next time!

D