A critical part of HPE Storage's April 4, 2023 announcements was the scale-out HPE GreenLake for File Storage.
For the foundational piece explaining the common hardware between the various offerings, please go here. For the Block storage piece, here.
The new HPE File offering is based on the HPE Alletra Storage MP hardware, and uses a common management interface for both File and Block, providing a seamless, centralized, multiprotocol management experience.
For people who like looking at boxes, a small configuration looks like this:

What is it Good For?
NAS can mean different things to different people.
Here’s a non-exhaustive list of examples:
- Low latency (metadata) NAS
- Low latency (operations) NAS
- Huge throughput NAS
- Huge capacity NAS
- Huge file count NAS
- Huge IOPS NAS
- Space-efficient NAS
- Flexible NAS
- Easily scalable NAS
- General purpose NAS for small requirements like home directories for a small number of users, where one doesn’t really care about any of the above.
Most NAS offerings focus on certain areas more than others, and no single offering out there covers every use case.
For instance, there are some solutions that use embedded NAS for small, low performance requirements (typically seen in primarily Block products like Dell EMC PowerStore or Pure FlashArray), or dedicated NAS for large scale workloads, like VAST Data, Weka, Qumulo, Scality or Dell EMC PowerScale.
Some would ask: why not build a super-duper unified offering that can satisfy all protocols? Well… even my alma mater, NetApp, which tries to do almost everything with ONTAP, sees most customers deploying big ONTAP clusters for large file workloads without using them for multiple protocols. Instead, they dedicate those clusters to NAS duties.
It’s really hard to excel at everything. An efficient system perfect for all workloads, reasonably priced, that didn’t make sacrifices around usability, reliability and supportability? It would own 100% of the market. But such a system, quite simply, doesn’t exist.
At HPE, we did some forward-looking analysis and opted to satisfy the (by far) fastest growing market that is typically after higher capacities and speeds, for workloads such as:
- ML/AI
- Analytics
- Seismic
- Stock Trading
- Fraud Detection
- Genomics
- Data Warehouses
- NoSQL Data Lakes
That’s not to say other workloads aren’t important, but we see the biggest growth and opportunity to help customers in the areas above.
A Solid Foundation
It is very hard to build enterprise NAS, especially if one desires certain attributes.
For instance, the go-forward architecture for HPE Storage in general is a Disaggregated, Shared Everything (DASE) scale-out architecture. It provides the most flexibility: customers can easily scale compute and capacity independently, mix and match components, and upgrade separate pieces on their own schedules.
For the Block MP offering we developed our code all in-house, but for the File MP offering, we decided to OEM the VAST Data software, since their architecture matched our architecture vision, in addition to being amazingly capable, resilient, efficient, and able to easily satisfy our target workloads.
Other offerings from the competition, like Dell EMC’s PowerScale (the artist formerly known as Isilon) or Pure’s FlashBlade, follow the architecture of having capacity per node, similar to how HCI works – and are susceptible to similar challenges:
- Lose the node, or even upgrade it – you lose the capacity on that node, and any resiliency tied to that capacity
- No ability to have dissimilar type compute and/or capacity nodes
- No ability to scale things independently
- Poor data reduction due to the complexities of handling cluster-wide metadata
- No ability to share hardware with other solutions (for example, a Pure FlashBlade can’t share hardware with a Pure FlashArray, and one can’t ever convert from one to another if there’s some large excess capacity available).
Extreme Data Integrity with the Highest Efficiency
I will not cover all aspects of the File MP architecture, but will instead focus on some of the cooler bits.
One of the most interesting capabilities is how efficient, yet resilient, it is.
Locally Decodable Erasure Codes
For instance, one can have 146+4 protection, meaning the ability to withstand the simultaneous loss of any 4 devices, yet waste only 2.7% of capacity to provide this level of protection. In MTTDL terms (Mean Time To Data Loss): tens of millions of years, firmly in the “one less thing to worry about” category.
This technique not only allows high efficiency, but it also helps recover from any faults extremely quickly.
Another significant advantage is hugely improved performance in failure scenarios, since far fewer operations are needed to recover from problems compared to traditional Reed-Solomon encoding (a 146+4 RAID set would be absolutely insane for any other architecture, which is why nobody else is doing it).
The locally decodable erasure codes allow recovery using 1/P the amount of data required by Reed-Solomon codes, where P is the number of parity elements – so 1/4 in the case of 4 parity elements.
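To make the arithmetic concrete, here is a minimal sketch of the overhead and rebuild-read math. The function names are my own for illustration (not any product API), and the rebuild ratio simply encodes the 1/P claim above:

```python
def parity_overhead(data_elems: int, parity_elems: int) -> float:
    """Fraction of raw capacity consumed by parity in a data+parity stripe."""
    return parity_elems / (data_elems + parity_elems)

def rebuild_read_ratio(parity_elems: int) -> float:
    """Locally decodable codes read roughly 1/P of the data a classic
    Reed-Solomon rebuild would, per the 1/P claim above."""
    return 1 / parity_elems

# 146+4 stripe: ~2.7% overhead, tolerates any 4 simultaneous device losses
print(f"{parity_overhead(146, 4):.1%}")   # 2.7%
# A classic 8+2 Reed-Solomon layout for comparison: 20% overhead
print(f"{parity_overhead(8, 2):.1%}")     # 20.0%
# With 4 parity elements, a rebuild reads ~1/4 of the data RS would
print(rebuild_read_ratio(4))              # 0.25
```

The wide stripe is what makes this work: spreading 4 parity elements across 150 devices keeps the protection level high while driving the capacity tax down to a rounding error.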
This is unprecedented in the storage world, and is one of two main methods to help customers maximize their investment.
Similarity-Based Data Reduction
The second method to maximize the use of capacity is Similarity-Based Data Reduction.
The challenge with normal deduplication and compression is that they work within fairly limited domains. Deduplication, even variable-block, looks at finite blocks and compares them.
Compression normally uses a “window” that can range anywhere from a few KB to several MB, or even whole files, but it can’t look at data across an entire system.
That’s where Similarity-Based Data Reduction comes in.
The order of operations is as follows:
- Adaptive chunking (variable at byte-level granularity) intelligently separates the data into blocks
- Global deduplication is then performed across the chunks
- The unique part of the solution: A Similarity hash then finds chunks of data that are similar enough to each other.
- Data-aware compression is then performed: it separates the common parts of the similar blocks from their unique deltas, and uses appropriate algorithms for different kinds of data. This makes compression far more efficient than any other method, and lets the compression “window” transcend blocks and files to span an entire storage system – finding commonality where no other system could.
This approach allows reduction even for data that normally can’t be reduced on other storage, like already compressed and deduped backups, video, compressed logs, etc.
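The order of operations above can be sketched in toy form. Everything here is an illustrative assumption, not the actual VAST/HPE implementation: the rolling-sum chunker stands in for real content-defined chunking, a min-hash sketch stands in for the similarity hash, and compressing similar chunks as one stream stands in for true delta encoding.

```python
import hashlib
import zlib

def chunks(data: bytes, mask: int = 0x3FF) -> list[bytes]:
    """Toy content-defined chunking: cut where a rolling byte hash hits a
    boundary pattern, so chunk boundaries track content rather than offsets."""
    out, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFF
        if (h & mask) == mask or i - start >= 4096:   # boundary or max size
            out.append(data[start:i + 1]); start = i + 1; h = 0
    if start < len(data):
        out.append(data[start:])
    return out

def similarity_sketch(chunk: bytes, k: int = 8) -> bytes:
    """Toy similarity hash: min-hash over 4-byte shingles. Chunks that share
    most of their content land in the same bucket even when not identical."""
    shingles = [chunk[i:i + 4] for i in range(max(1, len(chunk) - 3))]
    mins = sorted(hashlib.blake2b(s, digest_size=4).digest() for s in shingles)[:k]
    return hashlib.blake2b(b"".join(mins), digest_size=8).digest()

def reduce_pipeline(data: bytes) -> int:
    """Run the steps in order and return the reduced size in bytes."""
    seen, unique = set(), []
    for c in chunks(data):                        # 1. adaptive chunking
        fp = hashlib.sha256(c).digest()
        if fp not in seen:                        # 2. global deduplication
            seen.add(fp); unique.append(c)
    buckets: dict[bytes, list[bytes]] = {}
    for c in unique:                              # 3. similarity grouping
        buckets.setdefault(similarity_sketch(c), []).append(c)
    # 4. compress each similar group as one stream: bytes shared across
    #    similar chunks compress away, approximating delta encoding
    return sum(len(zlib.compress(b"".join(g))) for g in buckets.values())
```

A quick run on repetitive input, e.g. `reduce_pipeline(b"log line: user login ok\n" * 1000)`, shows the combined effect: dedupe removes the identical chunks, and grouping the near-identical leftovers before compression shrinks what remains.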
Here are a couple of GIFs showing, at a high level, the data reduction process, taken from some of the product slides:


What HPE has Added
One might ask: apart from the fancy MP hardware, what is HPE bringing to the table with the VAST Data partnership?
The short answer is, plenty. This wasn’t “just” a case of making enough changes to allow the VAST software to run on HPE hardware. Significant extra work had to be done, in two major places:
- The ability to be managed from the same portal as the other HPE solutions. This means that storage (all protocols), networking and servers, plus data services like Backup and DR, are all handled by the same management plane. This is unique in the industry, either because vendors that don’t make servers and networking have only a single product to manage, or because vendors that do make everything are far behind on this journey. The opportunities for cross-stack automation, ease of support and sheer convenience are just staggering.
- The support experience is unified. The cool Wellness & Support Automation from Nimble is also used for both the new File and Block offerings, which opens up some really amazing possibilities and maintains our differentiation in the area of proactive AI-based support automation.
And of course, since HPE makes the complete stack, troubleshooting becomes easier.
A further change HPE made was more powerful compute nodes, allowing a higher compute-to-JBOF ratio. This helps achieve higher densities with less hardware, without sacrificing performance.
Some other practical benefits of course include HPE’s global reach, logistics and finance, plus existing agreements with clients and existing compliance with various standards, making it simpler to acquire this technology than certifying another vendor.
Regarding management – here’s a video showing a quick demo of the HPE interface. One can manage both File and Block from the same place, plus all the other goodies like backups, DR etc. – with more to come.
In Conclusion
The new HPE GreenLake for File Storage is HPE’s go-to solution for customers looking to accelerate large file workloads. It provides unprecedented resiliency and the best capacity efficiency of any storage platform on the market, which allows huge-scale, all-flash performance at a reasonable cost, even for workloads that normally can’t be data reduced by other technologies.
D


