May 24, 2026

libwce: the entropy layer of a wavelet codec, on its own

Most image codecs you know about, such as JPEG, JPEG 2000, JPEG XS, and WebP, are built like layer cakes. You have the transform sitting on top, entropy coding at the bottom, and rate control floating somewhere in the middle, with a metadata layer wrapping it all up. The interesting bits are hidden under tons of framing code, profile parsers, and standards plumbing. If you just want to see how wavelet coefficients become bits, you have to dig deep into the guts of the codec.

I wrote libwce as a bare-bones implementation: a single lib.rs file weighing in at 500 lines. It implements a patent-clean Bit-Plane Count (BPC)-style entropy layer in the spirit of JPEG XS, and nothing else. There is no boilerplate and there are no dependencies; the library relies solely on the standard library.

A two-minute primer on BPC coding

A raw video stream is basically a grid of pixels, most of which share very similar color and brightness values with their immediate neighbors. Storing every pixel individually wastes a ton of bandwidth since there is a lot of repeated data in the stream. Codecs are used to compress this information by transforming the image from the spatial domain into the frequency domain. Rather than tracking individual pixels, a codec uses mathematical frequencies to describe color changes across the image. Older formats like standard JPEG end up chopping the image into squares and applying a discrete cosine transform, leading to the blocky artifacts we all know and love.

A wavelet transform is a newer approach that avoids the blocking problem by applying the transform to the whole image at once, splitting the signal into low-frequency structural data and high-frequency detail data across multiple scales. After the wavelet transform, you end up with a 2D array of signed integer coefficients, most of which are near zero, with a long Laplacian tail. The purpose of the entropy layer is to compress this array down to a small number of significant bits.
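
To make the low/high split concrete, here is a minimal sketch of one level of an integer Haar transform on a single row. It is an illustration only: the function names are hypothetical, and the lifting form used here (the S-transform) is an assumption, not necessarily the Haar variant the demos use.

```rust
/// One level of the integer Haar (S-transform) on a row.
/// Returns (low, high): low holds floor averages of each pair,
/// high holds the pairwise differences. A trailing odd sample is ignored.
fn haar_forward(row: &[i32]) -> (Vec<i32>, Vec<i32>) {
    let mut low = Vec::with_capacity(row.len() / 2);
    let mut high = Vec::with_capacity(row.len() / 2);
    for pair in row.chunks_exact(2) {
        let d = pair[1] - pair[0]; // detail: near zero in smooth regions
        let s = pair[0] + (d >> 1); // floor((a + b) / 2) without overflow
        low.push(s);
        high.push(d);
    }
    (low, high)
}

/// Exact inverse of `haar_forward`.
fn haar_inverse(low: &[i32], high: &[i32]) -> Vec<i32> {
    let mut row = Vec::with_capacity(low.len() * 2);
    for (&s, &d) in low.iter().zip(high) {
        let a = s - (d >> 1);
        row.push(a);
        row.push(a + d);
    }
    row
}
```

Applied to a smooth row, the high half comes out mostly as small values around zero, which is exactly the kind of data the entropy layer is meant to squeeze.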

BPC coding works on groups of four coefficients at a time. For each group, you determine the smallest bpc such that the magnitude of every coefficient in the group fits in bpc bits. This is the bit-plane count: every magnitude bit at plane bpc or higher is zero for the whole group. In libwce, all the bpc values are written first into a single bitstream, then for each group the four coefficients are emitted coeff-major: the magnitude bits of each coefficient, followed immediately by a single sign bit when that coefficient is nonzero. That covers the coefficient data itself. The actual compression comes from how the bpc values are encoded. Neighboring groups tend to have similar bpc values, so instead of writing each bpc as a raw 6-bit number, you can predict it from its neighbors and write only the residual, which tends to be tiny.
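
The per-group step can be sketched as follows. This is not libwce's code: the function names are hypothetical, quantization by lossy_bits is left out, and the MSB-first bit order and the sign polarity (1 for negative) are assumptions made for the example.

```rust
/// Bit-plane count for a group of four: the number of bits needed
/// to hold the largest magnitude in the group (0 for an all-zero group).
fn group_bpc(group: &[i32; 4]) -> u32 {
    let max_mag = group.iter().map(|c| c.unsigned_abs()).max().unwrap_or(0);
    32 - max_mag.leading_zeros()
}

/// Emit one group coeff-major as a string of '0'/'1' characters:
/// `bpc` magnitude bits per coefficient (MSB first, an assumption),
/// then one sign bit if the coefficient is nonzero.
fn emit_group(group: &[i32; 4], bpc: u32) -> String {
    let mut bits = String::new();
    for &c in group {
        let mag = c.unsigned_abs();
        for plane in (0..bpc).rev() {
            bits.push(if (mag >> plane) & 1 == 1 { '1' } else { '0' });
        }
        if c != 0 {
            bits.push(if c < 0 { '1' } else { '0' });
        }
    }
    bits
}
```

For the group [3, -1, 0, 2] the largest magnitude is 3, so bpc is 2 and the group costs 11 bits: three coefficients at 2 magnitude bits plus a sign bit, and one zero coefficient at 2 bits with no sign.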

Here, libwce offers two predictors: RUNNING (a DPCM delta against the previous group's bpc, zigzag-mapped and Rice-coded) and ZERO (an unsigned residual against lossy_bits). Either can optionally be combined with a 1-bit-per-8-group sparse-block flag that short-circuits all-deadzone blocks. That leaves you with four predictor × flag combinations, and the encoder sweeps Rice-k across seven values inside each, picking the best per band via a single-pass cost search. All combinations decode to the same result, but they produce different bitstreams. Each one works best for a different kind of content, such as textured regions, flat areas, or sub-bands that are mostly zeros.
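
A rough sketch of the RUNNING predictor's cost search is below. The names are hypothetical, and several details are assumptions rather than libwce's actual choices: the first group is predicted from 0, the seven Rice-k values are taken to be 0 through 6, and the sparse-block flag is ignored.

```rust
/// Map a signed residual to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4.
fn zigzag(v: i32) -> u32 {
    ((v << 1) ^ (v >> 31)) as u32
}

/// Length in bits of a Rice code with parameter k:
/// unary quotient, one stop bit, then k remainder bits.
fn rice_len(u: u32, k: u32) -> u64 {
    (u >> k) as u64 + 1 + k as u64
}

/// Cost of coding a band's bpc values as deltas against the previous group.
/// Assumes the predictor starts at 0 for the first group.
fn running_cost(bpcs: &[u32], k: u32) -> u64 {
    let mut prev = 0i32;
    let mut bits = 0u64;
    for &b in bpcs {
        bits += rice_len(zigzag(b as i32 - prev), k);
        prev = b as i32;
    }
    bits
}

/// Try k = 0..=6 and keep the cheapest; ties go to the smaller k.
fn best_k(bpcs: &[u32]) -> (u32, u64) {
    (0..7)
        .map(|k| (k, running_cost(bpcs, k)))
        .min_by_key(|&(_, cost)| cost)
        .unwrap()
}
```

Because the cost of a Rice code depends only on the residual and k, the search never has to produce a bitstream: it just adds up lengths for each candidate and keeps the minimum.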

What it looks like to use

Here's a complete decoder for one sub-band:

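// `buf` holds one encoded sub-band and `N` is its coefficient count; both come from the caller.
let mut coeffs = vec![0i32; N];
// decode fills the caller-owned buffer and returns the band's lossy_bits.
let lossy_bits = decode(buf, &mut coeffs).unwrap();
dequantize_optimal(&mut coeffs, lossy_bits, scale_b);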

The library itself is stateless. It does no I/O, keeps no hidden globals, and works purely through caller-owned buffers (apart from a small BPC scratch buffer that is allocated internally).

Compressing an image end-to-end

The repo has three demos. The most fun one is image_compress, which is a full codec built on top of libwce: a Haar wavelet on the way in, libwce in the middle, and an inverse Haar on the way out, run across four quality presets.

  preset          lossy_bits      payload     .wce file    ratio    PSNR
                  LL  HL  LH  HH    bytes        bytes
  near-lossless    2   4   4   5    146537       146597    1.52x   49.06 dB
  balanced         4   6   6   7     92631        92691    2.40x   37.54 dB
  aggressive       6   8   8   9     49516        49576    4.48x   28.79 dB
  very lossy       8  10  10  11     21923        21983   10.11x   21.62 dB

The whole process, consisting of the DWT, sub-band coding, quantization, and writing to a container, takes under 500 lines of code. If you open the four reconstructed PGMs side by side, you'll see quality degrade as compression increases. At q1, the image is indistinguishable from the original; q2 has minor smoothing in flat areas; q3 starts to show noticeable wavelet ringing around edges; and q4 is blocky in a recognizable wavelet way, looking eldritch but still legible.
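
For reference, the PSNR column can be reproduced with the standard definition for 8-bit samples, 10 * log10(255^2 / MSE). The sketch below assumes that definition and a hypothetical function name; how the demo itself computes the figure is not shown here.

```rust
/// PSNR in dB between two non-empty 8-bit images of equal size.
/// Returns infinity when the images are identical.
fn psnr_8bit(original: &[u8], decoded: &[u8]) -> f64 {
    assert_eq!(original.len(), decoded.len());
    let sum_sq: f64 = original
        .iter()
        .zip(decoded)
        .map(|(&a, &b)| {
            let d = a as f64 - b as f64;
            d * d
        })
        .sum();
    let mse = sum_sq / original.len() as f64;
    if mse == 0.0 {
        f64::INFINITY
    } else {
        10.0 * (255.0f64 * 255.0 / mse).log10()
    }
}
```

An average error of one gray level per pixel already lands at about 48 dB, which gives a feel for how close the near-lossless preset stays to the source.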

The second demo, mode_shootout, runs a synthetic Laplacian sub-band through every predictor × flag combination and displays the winner.

  mode             total   ratio   ok
  --------------   -----  ------   --
  RUN, flag=off      658   12.45x   Y
  RUN, flag=on       666   12.30x   Y
  ZERO, flag=off     652   12.56x   Y
  ZERO, flag=on      660   12.41x   Y
  auto-pick          612   13.39x   Y

  best forced: ZERO, flag=off  (652 bytes)
  auto-pick beat best forced by 40 bytes (better rice_k).

This is precisely the kind of thing that's a pain to do within the confines of a full codec, where you'd have to instrument internals, disable rate control, and mock the framing layer. With libwce, mode comparison is just how the API works. You run the same sub-band through encode_with_options with each predictor × flag combination, count the bytes, and pick the winner, which is exactly what encode itself does internally.

The third demo, stream_surgery, does 256 random bitflips and 256 random byte scrambles across the encoded bitstream, 300 truncation points covering every 4-byte prefix, and a set of adversarial cases including all-ones “unary bombs” along with crafted bad headers.

  bit-flip (anywhere)           : 256/256 returned, avg 36 / max 1024 coeffs differ (of 1024)
  random byte (anywhere)        : 256/256 returned without crash
  truncation (every prefix)     : 300/300 prefix lengths returned
  adversarial (bombs + bad hdrs): 7 cases returned cleanly

The demo shows that in every case the decoder returns without hanging or crashing.
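
One common way to get this behavior from a Rice decoder is to bound every unary read and to treat running out of input as an error instead of a panic. The sketch below illustrates that idea only. It is not libwce's decoder: the BitReader type, the cap parameter, and the unary polarity (1-bits terminated by a 0) are all assumptions.

```rust
/// Minimal MSB-first bit reader over a borrowed byte slice.
struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // position in bits
}

impl<'a> BitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0 }
    }

    /// Returns None at end of input instead of indexing out of bounds.
    fn read_bit(&mut self) -> Option<bool> {
        let byte = *self.data.get(self.pos / 8)?;
        let bit = (byte >> (7 - self.pos % 8)) & 1 == 1;
        self.pos += 1;
        Some(bit)
    }

    /// Count 1-bits up to a terminating 0, rejecting runs longer than `cap`.
    fn read_unary(&mut self, cap: u32) -> Result<u32, &'static str> {
        let mut n = 0;
        loop {
            match self.read_bit() {
                None => return Err("truncated"),
                Some(false) => return Ok(n),
                Some(true) => {
                    n += 1;
                    if n > cap {
                        return Err("unary run too long");
                    }
                }
            }
        }
    }
}
```

With a cap in place, an all-ones "unary bomb" costs at most cap + 1 bit reads before it is rejected, and a truncated stream surfaces as an error the caller can handle.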

What it isn't

Finally, it's worth reiterating that I intentionally didn't write libwce to be a full codec implementation, which would necessitate adding a container format, rate control, and other plumbing. It's designed to illustrate how the most conceptually interesting layer of a mezzanine codec works and to make it easier to study and modify without the weight of the full codec around it. What you get is just the entropy layer that you can wire into your own pipeline.

The repo is at https://github.com/yogthos/libwce. Clone it and play with it. It's written with readability in mind.


Tags: compression rust entropy-coding programming