Ransomware Detection Failure – Where does Technology Fail

This is a generic, foundational article to support related materials.

Modern ransomware that encrypts your data tries to evade detection – since that’s how the hackers maximize their payout.

The way they evade detection is by doing 2 main things:

  1. Not encrypting everything – that evades detectors that look for massive changes in your data. So they might encrypt just a tiny bit in each file.
  2. Encrypting in a way that doesn’t look encrypted 🙂

#2 is the interesting one and we will focus on that one in this article since that’s where most encryption detectors fail.

Why do detectors fail to detect data that doesn’t look encrypted?

Detectors that look for encryption measure entropy (randomness). They typically have a fixed threshold for declaring something encrypted.

So – if something encodes data in a way that doesn’t look very random, and can even be compressed – the detectors will fail to see the data as encrypted. How could it be encrypted if it can still be compressed? 🙂
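To make "measuring entropy" concrete, here is a minimal sketch of the usual byte-level Shannon entropy calculation. It is an illustration only, not how any particular product does it; the function name shannon_entropy is my own.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    # Sum p * log2(1/p) over every byte value that actually appears.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(shannon_entropy(b"aaaaaaaa"))          # a single repeated symbol: 0.0
print(shannon_entropy(bytes(range(256))))    # every byte value exactly once: 8.0
print(shannon_entropy(os.urandom(1 << 16)))  # random bytes: just under 8
```

The more evenly the 256 possible byte values are used, the closer the result gets to 8. Good encryption and good compression both push data toward that ceiling, which is why a detector can't tell them apart by entropy alone.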

Example time!

Let’s take a file from the Silesia corpus – a common resource used to benchmark data reduction (you can grab it from multiple places, here’s one: https://github.com/MiloszKrajewski/SilesiaCorpus).

I used the Webster dictionary file.

First, let’s compress it to see how compressible it is. I’m using zstd as the algorithm (there are other ones but in storage arrays we tend to use stuff that’s a balance between fast and efficient and zstd is good at both). You can grab it here: https://github.com/facebook/zstd

zstd webster
webster: 29.20%   (39.5 MiB =>   11.5 MiB, webster.zst)

So after compression, it’s 29.20% of the original size. So – nicely compressible.

Then, I encoded the source file using the common base64 (one of the techniques used by some ransomware to make data not look encrypted).

On Windows you can use the certutil utility to do this; on UNIX there’s a built-in base64 command.

certutil -encode webster webster.b64
Input Length = 41458703
Output Length = 57005774
CertUtil: -encode command completed successfully.
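If you're wondering where that output length comes from, the arithmetic can be sketched as follows. The layout here is my assumption about what certutil writes (PEM-style BEGIN/END CERTIFICATE lines, Base64 text wrapped at 64 characters, CRLF line endings), and the function name is made up for this example. With those assumptions the number matches the one reported above.

```python
def certutil_encoded_size(input_len: int, line_width: int = 64) -> int:
    """Estimate the size of `certutil -encode` output for `input_len` bytes.

    Assumed layout (not taken from certutil documentation): a header and a
    footer line, Base64 text wrapped at `line_width` characters, and CRLF
    after every line.
    """
    b64_chars = 4 * ((input_len + 2) // 3)           # every 3 bytes become 4 chars
    lines = (b64_chars + line_width - 1) // line_width
    header = len("-----BEGIN CERTIFICATE-----") + 2  # +2 for CRLF
    footer = len("-----END CERTIFICATE-----") + 2
    return b64_chars + 2 * lines + header + footer


size = certutil_encoded_size(41458703)
print(size)                          # 57005774
print(round(size / 41458703, 3))     # 1.375
```

So the encoding itself costs 4/3, and the line breaks push the total inflation to roughly 1.375x.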

Now let’s compress the encoded file and see if it gets very different compression:

zstd webster.b64
webster.b64: 31.77%   (54.4 MiB =>   17.3 MiB, webster.b64.zst)

It’s a slightly lower reduction, but the file is still highly compressible even after encoding! And yes, the file gets larger due to the encoding, but real ransomware wouldn’t encode the whole file anyway, just bits and pieces – not enough to significantly inflate the capacity consumed on the storage.

The main point here is that I encoded something and only slightly changed how compressible it is.
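If you don't have zstd or the corpus handy, you can see the same effect with a rough stand-in. In this sketch I swap in zlib from the Python standard library for zstd, and a seeded "word soup" for the Webster file. Both are assumptions made purely for illustration, so the exact percentages will not match the numbers above.

```python
import base64
import random
import zlib

# Hypothetical stand-in for a text file: seeded pseudo-random words.
WORDS = ("storage entropy detect random file backup array data "
         "snapshot volume cipher block stream dictionary").split()
rng = random.Random(42)
plain = " ".join(rng.choice(WORDS) for _ in range(50_000)).encode("ascii")
encoded = base64.b64encode(plain)


def ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original size (lower = more compressible)."""
    return len(zlib.compress(data, 6)) / len(data)


print(f"plain:  {len(plain):>8} bytes, compresses to {ratio(plain):.1%}")
print(f"base64: {len(encoded):>8} bytes, compresses to {ratio(encoded):.1%}")
```

The encoded copy is bigger and compresses a bit worse, but it still shrinks a lot, which is the behavior a compressibility-based check would read as "not encrypted".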

OK. Now what if I wanted to simulate a ransomware detector? We need something that detects entropy.

I found this utility: merces/entropy (downloadable from its GitHub Releases page).

It calculates the entropy of a file. 8 means maximum entropy.

Let’s compare our files so far:

Uncompressed file entropy:

entropy webster
4.97 webster

Encoded file entropy:

entropy webster.b64
5.77 webster.b64

So – both these files compress at over 3:1, and their entropy is (predictably) not very different.
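There's a simple reason the encoded file can never look "fully random": Base64 only uses 64 different characters, so its entropy tops out at log2(64) = 6 bits per byte, no matter what went in. The sketch below shows this with random bytes standing in for real ciphertext (that substitution is my assumption; the helper name is also mine).

```python
import base64
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    total = len(data)
    counts = Counter(data)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# os.urandom stands in for ciphertext here (an assumption for illustration).
# The length is a multiple of 3, so Base64 adds no '=' padding characters.
ciphertext_like = os.urandom(3 * 2**15)
encoded = base64.b64encode(ciphertext_like)

print(f"random bytes:       {shannon_entropy(ciphertext_like):.2f} bits/byte")
print(f"Base64 of the same: {shannon_entropy(encoded):.2f} bits/byte")
```

In other words, even properly encrypted data drops to about 6 once it's Base64-encoded, well under the near-8 values that raw ciphertext or compressed data produce.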

Let’s check the entropy of the compressed file:

entropy webster.zst
8.00 webster.zst

Of course, the compressed file has maximum entropy.

So what does that mean for a typical detector?

If a detector has a fixed threshold above which it assumes a piece of data is encrypted (and therefore it has to take action by alerting you) – what threshold do you think it’s using? 🙂

Indeed: most detectors have a fixed threshold, so in this example it might be anything over 7.5 or exactly 8.

So if pieces of your file went from a much lower entropy to all of a sudden having maximum entropy, they’d count as encrypted.

And they’d definitely not count as encrypted if their compressibility or entropy changed only minimally: alerting on small changes like that would generate way too many false positives.

There are many obvious problems with that detection approach, yet, sadly, this is how most detectors work!
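Here is a toy version of such a fixed-threshold detector, so you can watch it miss. Everything in it is hypothetical: the 4 KiB chunk size, the 7.5 cutoff, the function names, and the use of random bytes as a stand-in for ciphertext. It is a sketch of the general approach, not a model of any specific product.

```python
import base64
import math
import os
from collections import Counter

CHUNK = 4096      # hypothetical scan granularity
THRESHOLD = 7.5   # hypothetical fixed cutoff, in bits per byte


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    total = len(data)
    counts = Counter(data)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def flagged_chunks(data: bytes) -> int:
    """Count the chunks whose entropy exceeds the fixed threshold."""
    chunks = (data[i:i + CHUNK] for i in range(0, len(data), CHUNK))
    return sum(shannon_entropy(c) > THRESHOLD for c in chunks)


# Plain text, trimmed to exactly 20 chunks.
original = (b"The quick brown fox jumps over the lazy dog. " * 2000)[:20 * CHUNK]
# Attack A: overwrite with raw ciphertext-like bytes.
raw_attack = os.urandom(len(original))
# Attack B: overwrite with Base64-encoded ciphertext-like bytes.
b64_attack = base64.b64encode(os.urandom(len(original)))[:len(original)]

for name, blob in [("original", original), ("raw", raw_attack), ("base64", b64_attack)]:
    print(f"{name:>8}: {flagged_chunks(blob)} of {len(blob) // CHUNK} chunks flagged")
```

The raw overwrite trips the alarm on every chunk, while the Base64 variant sails through with zero alerts, even though the original data is just as gone in both cases.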

A picture to put all this into perspective:

Call to Action

When comparing solutions that detect encryption, ensure you find out whether they’re intelligent enough to detect modern, sneaky ransomware that encrypts sporadically and with techniques that make the data not look encrypted.

If the detector only detects older ransomware, what use is it?
