20x Faster Time to First Token: The HPE Alletra X10000 Edge for AI

In March 2026, HPE, NVIDIA, Kamiwaza and Signal65 published a paper (check here and here) showing roughly a 20x improvement in both time-to-first-token (TTFT) and token generation rate, using an HPE Alletra X10000 to store the KV cache. The storage system used S3 over RDMA to achieve this (and is in fact the first NVIDIA-certified object storage system).

I will explain why all this is important and why it’s different from competitor numbers, and offer some insight into what it means for overall system efficiency.

Because the goal isn’t just to keep GPUs busy. It’s to keep them busy generating new stuff, not recalculating old stuff.

The benefits of this solution are numerous:

  • Far more workload becomes possible on the same hardware, but also…
  • Looked at the other way, the same workload needs a lot less infrastructure because the hardware is used far more efficiently, which means…
  • Lower power and rack space requirements, which all leads to…
  • Lower Watt/token and lower $/token.
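
To make the chain of benefits above a bit more concrete, here is a minimal back-of-the-envelope sketch. It is not the method or the data from the paper: the function names and every number in it are hypothetical, picked only to show the arithmetic. It also ignores the time needed to load the cached KV entries from storage, which a real system has to account for.

```python
def prefill_tokens(prompt_tokens: int, cached_prefix_tokens: int) -> int:
    """Prompt tokens whose KV entries still have to be computed on the GPU.

    Tokens covered by a cached prefix are loaded instead of recalculated.
    """
    if not 0 <= cached_prefix_tokens <= prompt_tokens:
        raise ValueError("cached prefix must be between 0 and the prompt length")
    return prompt_tokens - cached_prefix_tokens


def cost_per_token(system_cost_per_hour: float, tokens_per_second: float) -> float:
    """$/token: hourly system cost divided by tokens produced in that hour."""
    return system_cost_per_hour / (tokens_per_second * 3600)


def energy_per_token(power_watts: float, tokens_per_second: float) -> float:
    """Power divided by throughput, i.e. joules per token."""
    return power_watts / tokens_per_second


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    prompt = 32_000   # tokens in the prompt
    cached = 30_000   # tokens already covered by a stored KV cache prefix
    print("tokens to recompute without cache:", prefill_tokens(prompt, 0))
    print("tokens to recompute with cache:   ", prefill_tokens(prompt, cached))

    # Same (assumed) system cost and power draw, different useful throughput.
    for label, tps in (("lower throughput", 100.0), ("higher throughput", 500.0)):
        print(label,
              "| $/token:", cost_per_token(36.0, tps),
              "| J/token:", energy_per_token(1000.0, tps))
```

The point of the sketch is simply that, for a fixed system cost and power draw, anything that raises useful throughput lowers both $/token and energy per token in direct proportion.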

HPE X10000 Deep Dive – Differentiation For Unstructured Data

At HPE Discover Barcelona 2024, HPE released the Alletra Storage MP X10000, the latest in our new line of shared hardware platform storage offerings.

It’s an innovative new platform specially made for unstructured data, and a long time in the making. This is HPE tech, not a partnership.

The initial workloads this solution is aimed at are anything requiring fast S3 performance, including AI workloads, data lakes, cloud-native app development and high-speed backup and restore.

It has several innovations, such as RDMA for object, and is highly differentiated. It also makes this kind of technology available at a small starting capacity, instead of focusing only on the huge end of the scale.

As usual, my aim is not to regurgitate basic information but rather to explain the true technical differentiation and get people excited about the possibilities on offer here. 

In summary, the X10000 benefits are:

  1. Disaggregation flexibility for separately expanding compute and/or capacity
  2. Ability to scale down and not need huge capacities to get good performance
  3. Balanced read/write performance and low latency for all workloads
  4. Flexible, fully container-based architecture that opens up tons of possibilities for running customer code inside the storage solution.

Let’s get to it:
