20x Faster Time to First Token: The HPE Alletra X10000 Edge for AI

In March 2026, HPE, NVIDIA, Kamiwaza and Signal65 published a paper (check here and here) showing roughly a 20x improvement in both time-to-first-token (TTFT) and token generation rate, using an HPE Alletra X10000 to store the KV cache. The storage system used S3 over RDMA to achieve this (and is in fact the first NVIDIA-certified object storage system).

As of this writing, the test has the most complete benchmark disclosure of all the competitive KV cache tests I could find. It also uses an extremely demanding, high-concurrency workload.

I will explain why all this is important and why it’s different from competitor numbers, and offer some insights into what it means for overall system efficiency.

Because the goal isn’t just to keep GPUs busy. It’s to keep them busy generating new stuff, not recalculating old stuff.
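To make the "not recalculating old stuff" point concrete, here is a toy back-of-the-envelope model. It is only a sketch: the function name, the parameters and every number in it are made up for illustration (none come from the paper), and it assumes that fetching cached KV data and recomputing the uncached tokens happen one after the other, with no queuing or scheduling effects.

```python
def ttft_seconds(prompt_tokens, cached_tokens, prefill_tokens_per_s,
                 kv_bytes_per_token, fetch_bytes_per_s):
    """Toy estimate of time to first token (hypothetical model).

    Tokens already in the KV cache are fetched from storage;
    the rest have to be recomputed (prefilled) on the GPU.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    # Time spent recomputing the part of the prompt that is not cached.
    recompute_s = (prompt_tokens - cached_tokens) / prefill_tokens_per_s
    # Time spent pulling the cached KV data back from storage.
    fetch_s = cached_tokens * kv_bytes_per_token / fetch_bytes_per_s
    return recompute_s + fetch_s


# Illustrative numbers only (assumptions, not benchmark results).
cold = ttft_seconds(100_000, 0, 10_000, 100_000, 10e9)
warm = ttft_seconds(100_000, 96_000, 10_000, 100_000, 10e9)
print(f"cold: {cold:.2f} s, warm: {warm:.2f} s, speedup: {cold / warm:.1f}x")
```

The takeaway from the toy model: once most of a prompt is already cached, TTFT is dominated by how fast the cache can be read back, which is why the storage path matters so much.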

The benefits of this solution are numerous:

  • Far more workload becomes possible, but also…
  • Alternatively, the same workload needs far less infrastructure, thanks to much more efficient use of the hardware, which means…
  • Lower power and rack space requirements, all of which leads to…
  • Lower Watt/token and lower $/token.

The Architectural Benefits of HPE Alletra MP – Plus R4 Coolness

When we first released the new HPE Alletra MP platforms, I wrote a few articles going over the benefits and how the flexible new hardware platform takes on different “personalities” for high-end block and file solutions.

This time I want to take a deeper dive into the architectural benefits of our approach and how the new R4 software for Alletra MP Block enables certain things no other vendor can come close to – plus give a taste of what may be possible in the future given the amazing flexibility of the underlying architecture (it’s a blog, I can’t provide roadmaps here).

I will cover things like fractional multi-dimensional scaling (which allows things impossible with other vendors, like adding a single controller node without needing to add capacity) but also resiliency in the face of simultaneous failures that would cripple all other storage systems I’m aware of. This isn’t meant to be comprehensive coverage of everything, but hopefully it’s enough to give you a taste.

Let’s go!


HPE GreenLake for File Storage

A critical part of the recent April 4th, 2023 announcements from HPE Storage was the scale-out HPE GreenLake for File Storage.

For the foundational piece explaining the common hardware between the various offerings please go here. For the Block storage piece, here.

The new HPE File offering is based on the HPE Alletra Storage MP hardware, and uses a common management interface for both File and Block, providing a seamless, centralized, multiprotocol management experience.

For the people that like looking at boxes, a small one would look like this:

A Small HPE GreenLake for File Storage System – Compute Separate from Capacity

Beware of Cloud Sizing Tools and avoid Reliability Angst

Some time ago I wrote about the dangers of taking certain things for granted with new technologies.

This time I wanted to use a more specific enterprise application example to show that customers need to be extra careful when comparing solutions, especially for mission-critical apps.

Sometimes being too high-level means missing the unspeakable horrors lurking under the covers. And ignorance doesn’t mean bliss… just nasty surprises.

To summarize: watch out for bait-and-switch so you avoid surprise costs and pain.

  • Ensure all components in any sizing tools reflect your business requirements.
  • The simpler the infrastructure, the more reliable it is. If one must do things like stripe across many volumes in order to get decent performance even for medium-sized solutions, that may be a warning sign that the solution is lacking.
  • Ensure all the underlying components in any pricing you see would fit your company’s mission-critical needs. For instance, what reliability and resiliency are the storage components rated for? And is that sufficient for your needs?
  • Ensure you are accounting for the right number of systems (production-spec vs. non-production, times the number of applications, etc.). This can quickly add up with certain apps.


HPE Alletra 9000 – Primera Evolved

I’m excited to announce that the evolution of the HPE Primera (which was the evolution of 3PAR) is now available.

It’s called the HPE Alletra 9000 and is the mission-critical Tier-0 complement to the Tier-1 Alletra 6000 (which in turn is the evolution of Nimble).

It retains the rich feature set of Primera and the 100% uptime guarantee. The main enhancements vs Primera are the increased speed and the fact that all the performance is possible in just a 4U configuration, making it the most performance-dense full-feature Tier-0 system in the world (by far). It is managed via the HPE Data Services Cloud Console.

A welcome enhancement (that is also coming to Primera) is Active Peer Persistence, which allows a LUN to be read from and written to simultaneously at two sites that replicate synchronously. This means that each site can do local writes to a sync-replicated LUN without the hosts needing to cross the network to the other site.
