20x Faster Time to First Token: The HPE Alletra X10000 Edge for AI

In March 2026, HPE, NVIDIA, Kamiwaza and Signal65 published a paper (see here and here) showing roughly a 20x acceleration in both time-to-first-token (TTFT) and token generation rate when using an HPE Alletra X10000 to store the KV cache. The storage system uses S3 over RDMA to achieve this (and is, in fact, the first NVIDIA-certified object storage system).

As of this writing, the test has the most complete benchmark disclosure of any competitive KV cache test I could find. It also uses an extremely demanding workload with high concurrency.

I will explain why all this is important, why it’s different from competitor numbers, and offer some insights into what this means for overall system efficiency.

Because the goal isn’t just to keep GPUs busy. It’s to keep them busy generating new tokens, not recalculating old ones.

The benefits with this solution are numerous:

  • Far more workload becomes possible, but also…
  • Far less infrastructure is needed, thanks to much more efficient use of the hardware, which means…
  • Lower power and rackspace requirements, which all leads to…
  • Lower Watts/token and lower $/token.

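To see why caching the KV state helps TTFT so much, here is a toy back-of-envelope model in Python. The rates and token counts are illustrative assumptions I made up for the sketch, not figures from the HPE/Signal65 benchmark: without a cache, TTFT is dominated by recomputing the prefill over the whole prompt; with a cache, most of that is replaced by a much faster bulk restore of precomputed KV state from storage.

```python
# Toy model of why KV-cache reuse cuts TTFT. All numbers below are
# illustrative assumptions, NOT figures from the benchmark discussed above.

PREFILL_RATE = 10_000      # prompt tokens a GPU can prefill per second (assumed)
CACHE_LOAD_RATE = 200_000  # cached KV tokens/s restorable from storage (assumed)

def ttft_seconds(prompt_tokens: int, cached_tokens: int = 0) -> float:
    """Time to first token: restore any cached KV state, then prefill the rest."""
    cached = min(cached_tokens, prompt_tokens)
    load_time = cached / CACHE_LOAD_RATE                    # bulk-restore cached KV
    prefill_time = (prompt_tokens - cached) / PREFILL_RATE  # recompute the remainder
    return load_time + prefill_time

cold = ttft_seconds(100_000)                        # no cache: full prefill
warm = ttft_seconds(100_000, cached_tokens=95_000)  # 95% of the prompt cached
print(f"cold TTFT: {cold:.2f}s  warm TTFT: {warm:.2f}s  speedup: {cold/warm:.1f}x")
```

The qualitative takeaway holds regardless of the exact numbers: the larger the reusable prefix and the faster the cache restore path (here, S3 over RDMA), the more TTFT collapses toward the cost of computing only the genuinely new tokens.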
Continue reading “20x Faster Time to First Token: The HPE Alletra X10000 Edge for AI”

HPE X10000 Deep Dive – Differentiation For Unstructured Data

At HPE Discover Barcelona 2024, HPE released the Alletra Storage MP X10000, the latest in our new line of shared hardware platform storage offerings.

It’s an innovative new platform purpose-built for unstructured data, and a long time in the making. This is HPE technology, not a partnership.

The initial workloads this solution is aimed at are anything requiring fast S3 performance, including AI workloads, data lakes, cloud-native app development, and high-speed backup and restore.

It includes several innovations, such as RDMA for object storage, and is highly differentiated. It also makes this kind of technology available at much smaller starting capacities, instead of focusing only on the huge end of the scale.

As usual, my aim is not to regurgitate basic information but rather to explain the true technical differentiation and get people excited about the possibilities on offer here. 

In summary, the X10000’s benefits are:

  1. Disaggregation flexibility, so compute and capacity can be expanded separately
  2. The ability to scale down, without needing huge capacities to get good performance
  3. Balanced read/write performance and low latency for all workloads
  4. A flexible, fully container-based architecture that opens up many possibilities for running customer code inside the storage solution

Let’s get to it:

Continue reading “HPE X10000 Deep Dive – Differentiation For Unstructured Data”

The Loss of Important Knowledge and Acumen Through Perceived Commoditization

I posit that we now have a whole new class of consumer that is completely oblivious to certain hitherto fundamental concepts, and this can lead to poor business decisions and overall sub-optimal execution and results.

I got the idea after a discussion with a former colleague (now working for a cloud vendor) who proudly proclaimed that infrastructure is unimportant and uninteresting.

I’ll start generically and shift to IT. The generic aspect of this problem is very interesting, since it’s lowering quality in all sorts of fields.

And never forget: Just because something is widely and easily available doesn’t mean it’s better. It simply means that more people have access to it.

Continue reading “The Loss of Important Knowledge and Acumen Through Perceived Commoditization”

HPE Memory-Driven Architectures Extend to 3PAR and Nimble Storage

HPE has been innovating in the Memory-Driven Compute space for a while now (for example, HPE Labs’ The Machine project and Gen-Z).

The driver behind this has been to transform application performance, not by increments but by leaps and bounds. Think orders of magnitude in reduction of execution time. For instance, an organization researching a cure for Alzheimer’s had a key analytics operation that took 22 minutes per iteration (and they need to run many, many iterations). With a Memory-Driven system from HPE, it now takes 13 seconds. This allows the researchers to reach useful results much faster – which, in turn, means the cure could materialize in a much shorter timeframe.

Continue reading “HPE Memory-Driven Architectures Extend to 3PAR and Nimble Storage”

Media is Not Created Equal and NVMe is Just a Protocol

In this era of over-marketing and misinformation, it can be refreshing to clarify things for customers.

Allow me to be refreshing regarding NVMe 🙂

NVMe is simply a protocol. Just like SCSI is a protocol. NVMe is most assuredly not a media type. Yet, storage vendors keep talking about “NVMe drives” and customers often think those devices are equal as long as “NVMe” is mentioned.

Alas, that’s not how things work…

Strictly speaking, there’s no such thing as an NVMe drive. Or, at the very least, calling something an “NVMe drive” isn’t enough to describe what that media is, and it’s especially not enough to describe how fast it may be.
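The point above can be sketched in a few lines of Python. The drive names and latency figures below are invented purely for illustration (they describe no real product): both devices “are NVMe” in the sense that they speak the same protocol, yet the underlying media behave very differently.

```python
# Illustrative sketch: both devices below "are NVMe" (same protocol),
# yet their media perform very differently. Names and latency figures
# are made-up examples, not measurements of real products.
from dataclasses import dataclass

@dataclass
class Drive:
    model: str
    protocol: str          # how the host talks to the device
    media: str             # what actually stores the bits
    read_latency_us: float # typical read latency, microseconds (invented)

drives = [
    Drive("Hypothetical-QLC-1", "NVMe", "QLC NAND", 90.0),
    Drive("Hypothetical-SCM-1", "NVMe", "Storage-class memory", 10.0),
]

# Same protocol on both...
assert all(d.protocol == "NVMe" for d in drives)
# ...but a large gap in media performance.
ratio = drives[0].read_latency_us / drives[1].read_latency_us
print(f"Both speak NVMe, yet read latency differs by {ratio:.0f}x")
```

In other words, “NVMe” tells you how the host talks to the device, not what the device is made of or how fast it will be.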

Continue reading “Media is Not Created Equal and NVMe is Just a Protocol”