How Nimble Storage Systems do Block Folding

I got the idea for this post after seeing certain vendors claim they were the first and only with certain data reduction technologies (I’m not talking about dedupe and compression). I thought – how come Nimble never made a big deal about this? After all, what those vendors were claiming didn’t seem to be very interesting compared to how Nimble systems efficiently write data… yet those vendors were acting as if they’d cured a particularly virulent disease.

Executive Summary

Nimble systems naturally avoid any wasted space when writing. This is an inherent part of the design and not something that was added later.

The Problem With Being Forced to Write White Space

Many filesystems, on both storage devices and operating systems, have a minimum, immutable block size that can be stored on disk. It’s usually something like 4KB, sometimes bigger, and often configurable for performance or metadata-count reasons.

The challenge with some filesystems is this: If a block is partially filled, what gets stored is still the minimum size block. So, to use the 4KB example again, if I only use 1KB out of the 4KB, 4KB is still the absolute minimum that gets stored.

This is actually a big deal since in the real world there is a large percentage of partially filled blocks and/or small files, which leads to reduced storage efficiency and negates some of the benefits of deduplication and compression. It all adds up.

The following figure illustrates what happens in such a case:

[Figure: Fixed block]
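To put numbers on the 4KB example, here is a small sketch of the arithmetic. The block size and the sample payload sizes are assumptions chosen for illustration, not measurements from any real system.

```python
import math

BLOCK_SIZE = 4096  # assumed minimum block size in bytes (the 4KB example)


def stored_bytes(payload_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes consumed on disk when every write is rounded up to whole blocks."""
    if payload_bytes == 0:
        return 0
    return math.ceil(payload_bytes / block_size) * block_size


def wasted_bytes(payloads, block_size: int = BLOCK_SIZE) -> int:
    """Total white space stored alongside the real data."""
    return sum(stored_bytes(p, block_size) - p for p in payloads)


# Hypothetical payload sizes: three partially filled blocks and one full one.
payloads = [1024, 600, 3000, 4096]
print(stored_bytes(1024))      # 4096: 1KB of data still costs a full 4KB block
print(wasted_bytes(payloads))  # 7664 bytes of white space out of 16384 stored
```

In this made-up mix, almost half of what lands on disk is white space.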

Block Folding to the Rescue

A way around this inefficiency is to implement a technique called Block Folding. Different vendors may call this feature different names, but, in essence, it allows for multiple logical blocks to be stuffed into a single physical one, thereby reducing the amount of waste.

Here’s a figure showing what theoretically perfect Block Folding would look like for combining 4 partially filled blocks optimally into a single block:

[Figure: Fixed folded]

As you can see, the end result can be zero wasted space.
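As a rough sketch of the idea (not any vendor’s actual implementation), folding can be modeled as packing fragments back to back into one physical block and keeping a small index of where each one lives. The function names, the index layout and the fragment sizes here are all hypothetical.

```python
BLOCK_SIZE = 4096  # assumed physical block size in bytes


def fold(fragments, block_size: int = BLOCK_SIZE):
    """Pack logical fragments back to back into one physical block.

    Returns (block, index, white_space), where index maps the fragment
    number to its (offset, length) inside the physical block.
    """
    total = sum(len(f) for f in fragments)
    if total > block_size:
        raise ValueError("fragments do not fit in one physical block")
    block = bytearray(block_size)
    index = {}
    offset = 0
    for i, frag in enumerate(fragments):
        block[offset:offset + len(frag)] = frag
        index[i] = (offset, len(frag))
        offset += len(frag)
    return bytes(block), index, block_size - total


def read_fragment(block: bytes, index, i: int) -> bytes:
    """Recover one logical fragment from the folded physical block."""
    offset, length = index[i]
    return block[offset:offset + length]


# Four partially filled logical blocks that happen to add up to exactly 4KB.
fragments = [b"a" * 1024, b"b" * 512, b"c" * 2048, b"d" * 512]
block, index, white_space = fold(fragments)
print(white_space)  # 0
```

The price of the saved space is the extra index entry per fragment, which is the metadata overhead any folding scheme has to keep in check.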

The Realities of Block Folding

There are always caveats, and depending on the implementation, it may not be possible to perfectly stuff any small size into a bigger size. For instance, perhaps logical units smaller than 1KB aren’t allowed, or they may have to be padded to fit in some minimum structure (padding 600 bytes to fit in 1KB, 1.2KB to fit in 1.5KB, etc).

In the following example, some padding occurs, and block #4 cannot fit, and will have to be placed in the next block:

[Figure: Imperfect Block Folding]

With this approach, certain data patterns will defeat the folding mechanism. It’s still better than not having this feature of course, as long as it doesn’t generate too much extra overhead in the form of reduced performance, increased metadata count etc.
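Here is a minimal sketch of how padding can defeat folding. It assumes a 512-byte padding granule (which matches the 600 bytes to 1KB and 1.2KB to 1.5KB examples above) and a simple in-order placement policy. Both are illustrative assumptions, not a description of any specific product.

```python
GRANULE = 512      # assumed padding granularity in bytes
BLOCK_SIZE = 4096  # assumed physical block size in bytes


def padded_size(size: int, granule: int = GRANULE) -> int:
    """Round a fragment size up to the next multiple of the granule."""
    return -(-size // granule) * granule


def fold_in_order(sizes, block_size: int = BLOCK_SIZE, granule: int = GRANULE):
    """Place padded fragments in arrival order, opening a new physical
    block whenever the next fragment no longer fits in the current one."""
    blocks = [[]]
    used = 0
    for size in sizes:
        slot = padded_size(size, granule)
        if slot > block_size:
            raise ValueError("fragment larger than a physical block")
        if used + slot > block_size:
            blocks.append([])
            used = 0
        blocks[-1].append(size)
        used += slot
    return blocks


def white_space(sizes, blocks, block_size: int = BLOCK_SIZE) -> int:
    """Bytes stored that are not real data (padding plus unused tail)."""
    return len(blocks) * block_size - sum(sizes)


# Hypothetical fragment sizes; they pad to 1024, 1536, 1536 and 1024 bytes.
sizes = [600, 1229, 1500, 1000]
blocks = fold_in_order(sizes)
print(blocks)                      # [[600, 1229, 1500], [1000]]
print(white_space(sizes, blocks))  # 3863
```

The four fragments hold 4329 bytes of real data, so even perfect packing would need two 4KB blocks here. The point is that the three padded fragments consume the first block completely, leaving 767 bytes of padding that can never be used.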

The ultimate goal is to achieve a very low percentage of white space after the folding is done, no matter what the incoming data is.

How HPE Nimble Systems Perform Block Folding

Yes, Nimble Systems do perform Block Folding to avoid wasted space. We just never really talked about this, thinking it’s implicit given how we’ve been doing writes from day 1. Let this be a lesson for all of us: never assume certain things are understood by everyone if you don’t spell them out.

The primary problems with how certain other systems approach Block Folding really are threefold:

  1. Trying to stuff things into too small a chunk due to legacy design reasons (“Technical Debt”).
  2. Having limitations regarding how big something can be before it can be stuffed.
  3. Challenges with garbage collection and recovery of fragmented space.

Nimble Storage systems naturally avoid these problems since they follow these simple rules:

  1. Reduce everything up front, not later.
  2. Only store real data, no white space, regardless of the incoming I/O sizes.
  3. Always write the same large size to stable media (the actual size varies since it’s optimized for the underlying media; the example below shows 512K per device).
  4. Strict, true Log Structured layout with amazing garbage collection.

This figure shows the Nimble Block Folding process:

[Figure: Nimble Block Folding]

The end result is perfect Block Folding into large, always sequentially-written chunks, with the bonus of lowered write amplification, which leads to prolonged SSD life.
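The rules above can be illustrated with a toy log-structured writer. To be clear, this is a simplified sketch under my own assumptions and not the actual Nimble implementation: the class and method names are hypothetical, zlib stands in for whatever reduction is really used, and garbage collection is left out entirely. It only shows the general shape: reduce each write up front, append the variable-sized results back to back with no padding, and flush fixed-size chunks sequentially.

```python
import zlib

CHUNK_SIZE = 512 * 1024  # assumed per-device write size (the 512K example)


class LogWriter:
    """Toy log-structured writer; all names here are hypothetical."""

    def __init__(self, chunk_size: int = CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.buffer = bytearray()  # reduced data not yet on stable media
        self.chunks = []           # full chunks, "written" sequentially
        self.index = {}            # block id -> (log offset, stored length)
        self.log_end = 0           # next free byte offset in the log

    def write(self, block_id, data: bytes) -> None:
        payload = zlib.compress(data)  # reduce up front, not later
        # An overwrite just repoints the index; the old copy becomes
        # garbage that a real system would reclaim later.
        self.index[block_id] = (self.log_end, len(payload))
        self.buffer += payload         # no padding between blocks
        self.log_end += len(payload)
        while len(self.buffer) >= self.chunk_size:
            # Every write to media has the same large size.
            self.chunks.append(bytes(self.buffer[:self.chunk_size]))
            del self.buffer[:self.chunk_size]

    def read(self, block_id) -> bytes:
        offset, length = self.index[block_id]
        # Joining the whole log is wasteful but keeps the sketch short.
        log = b"".join(self.chunks) + bytes(self.buffer)
        return zlib.decompress(log[offset:offset + length])


writer = LogWriter(chunk_size=1024)  # tiny chunk size, just for the demo
writer.write("a", b"hello " * 100)   # incoming I/O sizes can be anything
writer.write("b", b"x" * 7000)
print(writer.read("a") == b"hello " * 100)
```

In this sketch a reduced block may straddle two chunks. That is a design choice I made to keep the white space at zero; a real system may handle chunk boundaries differently.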

In customer tests, this approach provides extra capacity savings for certain workloads that are difficult to reduce for other vendors.

This is one of the reasons Nimble systems are great at data reduction, but also why they’re great at coalescing many writes into few, sequential operations, which results in high random write performance with low latency.

It’s all Relative

Excitement about a capability typically depends on whether that capability truly offers major business value. But, within a vendor’s walls, it also depends on whether the capability is pioneering in the industry, and on whether it gets rid of a long-standing problem that vendor has had.

Which explains why certain vendors are so proud of their new Block Folding capabilities.

However, for vendors that never had the problem to begin with, a lot of things are taken for granted. Block Folding is a prime example of a capability that certain vendors never marketed for exactly that reason.

So – Nimble does Block Folding. There you have it. Consider it marketed.

Should we keep that name or use something fancier? Suggestions welcome 🙂

