KV Cache | Recovery Monkey

In March 2026, HPE, NVIDIA, Kamiwaza and Signal65 published a paper (see here and here) showing roughly a 20x improvement in both time-to-first-token (TTFT) and token generation rate when an HPE Alletra X10000 is used to store the KV cache. The storage system achieved this with S3 over RDMA (and it is, in fact, the first NVIDIA-certified object storage system).
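Why put the KV cache on external storage at all? Because it gets large quickly. Here is a minimal Python sketch of the usual sizing formula, assuming a standard transformer that keeps one key and one value vector per layer, per KV head, per token. The model dimensions (a 70B-class model with grouped-query attention in FP16) are illustrative assumptions, not the configuration used in the published test.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                             bytes_per_elem: int = 2) -> int:
    """KV cache bytes per token: one K and one V vector per layer, per KV head."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def kv_cache_bytes(tokens: int, **model: int) -> int:
    """Total KV cache bytes for a context of `tokens` tokens."""
    return tokens * kv_cache_bytes_per_token(**model)


# Hypothetical 70B-class model: 80 layers, 8 KV heads (grouped-query attention),
# head_dim 128, FP16 (2 bytes per element). Assumed values, not the paper's setup.
MODEL = dict(num_layers=80, num_kv_heads=8, head_dim=128, bytes_per_elem=2)

per_token = kv_cache_bytes_per_token(**MODEL)
one_context = kv_cache_bytes(128_000, **MODEL)  # a single 128k-token context

print(f"{per_token / 1024:.0f} KiB per token")                   # 320 KiB per token
print(f"{one_context / 2**30:.1f} GiB per 128k-token context")  # 39.1 GiB
```

Multiply that by many concurrent sessions and the total quickly exceeds GPU memory, which is where an external KV cache tier comes in.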

As of this writing, the test has the most complete benchmark disclosure of any competitive KV cache test I could find. It also uses an extremely demanding, high-concurrency workload.

I will explain why all this matters, how it differs from competitor numbers, and what it means for overall system efficiency.

Because the goal isn’t just to keep GPUs busy. It’s to keep them busy generating new stuff, not recalculating old stuff.
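That distinction is what prefix (KV) cache reuse is about. Below is a toy Python sketch, assuming block-aligned reuse of a previous conversation turn; `PrefixKVStore`, `prefill`, and the 16-token block size are hypothetical names chosen for illustration, not the API of any inference engine or of the HPE/NVIDIA stack.

```python
from typing import Dict, List, Tuple

BLOCK = 16  # hypothetical block size, in tokens


class PrefixKVStore:
    """Toy KV cache keyed by block-aligned token prefixes (hypothetical, not a real API)."""

    def __init__(self) -> None:
        self._blocks: Dict[Tuple[int, ...], str] = {}

    def longest_cached_prefix(self, tokens: List[int]) -> int:
        """How many leading tokens already have stored KV entries."""
        cached = 0
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            if tuple(tokens[:end]) not in self._blocks:
                break
            cached = end
        return cached

    def store(self, tokens: List[int]) -> None:
        """Save KV entries for every complete block of the prompt."""
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            self._blocks[tuple(tokens[:end])] = f"kv[0:{end}]"  # stand-in for real K/V tensors


def prefill(store: PrefixKVStore, prompt: List[int]) -> int:
    """Return how many prompt tokens the GPU must actually recompute."""
    hit = store.longest_cached_prefix(prompt)
    store.store(prompt)
    return len(prompt) - hit


store = PrefixKVStore()
turn1 = list(range(1000))                 # first request: 1,000-token prompt
turn2 = turn1 + list(range(5000, 5050))   # next turn: same history plus 50 new tokens
print(prefill(store, turn1))  # 1000: nothing cached, full prefill
print(prefill(store, turn2))  # 58: only the partial last block plus the new tokens
```

The toy keys its dictionary by the full token prefix for readability; production engines typically hash each block together with its parent instead, and store real K/V tensors rather than placeholder strings.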

The benefits of this solution are numerous:

Far more workload becomes possible on the same hardware, but also…
Looked at the other way, far less infrastructure is needed thanks to far more efficient use of the hardware, which means…
Lower power and rack space requirements, all of which leads to…
Lower Watt/token and lower $/token.
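The chain from throughput to Watt/token and $/token is simple division. A rough Python sketch follows; every number in it (power draw, hourly cost, throughput, and the 4x uplift) is a hypothetical placeholder, not a result from the HPE/NVIDIA paper. Strictly speaking, the energy metric is joules per token: watts divided by tokens per second.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    power_watts: float        # total draw of GPUs, storage and network (hypothetical)
    cost_per_hour: float      # amortized hardware plus power, in $/hour (hypothetical)
    tokens_per_second: float  # sustained generation throughput (hypothetical)


def joules_per_token(c: Cluster) -> float:
    # A watt is one joule per second, so W / (tokens/s) = J/token.
    return c.power_watts / c.tokens_per_second


def dollars_per_million_tokens(c: Cluster) -> float:
    tokens_per_hour = c.tokens_per_second * 3600
    return c.cost_per_hour / tokens_per_hour * 1_000_000


baseline = Cluster(power_watts=10_000, cost_per_hour=50.0, tokens_per_second=5_000)
# Assumed: the storage tier adds a little power and cost, while avoided
# recomputation lifts throughput 4x. Illustrative only, not the paper's figures.
with_kv_reuse = Cluster(power_watts=10_500, cost_per_hour=52.0, tokens_per_second=20_000)

for name, c in (("baseline", baseline), ("with KV reuse", with_kv_reuse)):
    print(f"{name}: {joules_per_token(c):.3f} J/token, "
          f"${dollars_per_million_tokens(c):.2f} per 1M tokens")
```

Because power and cost sit in the numerator and throughput in the denominator, a large enough throughput gain pulls both per-token figures down even if the storage tier adds some power and cost.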


20x Faster Time to First Token: The HPE Alletra X10000 Edge for AI