June 2, 2026

Putting Code Under a Microscope: Wavelet-Based Context for LLMs

Every developer who has tried an AI coding tool knows the feeling of watching the model fumble through the codebase looking for the sections it needs to edit. Since an entire codebase won't fit into the context on a large project, the model greps through a few files to build some context and then guesses what to do next. But code has a hierarchical structure with layers and boundaries. Functions sit inside classes. Classes live in files. Files make up modules. A single 400-line file can contain six different conceptual areas, each with its own distinct purpose.

When we read code as human developers, we lean on its structure to understand it. We look at how files and classes are organized and use that to find the relevant logic. Wouldn't it be nice if the agent could do the same, zooming out to see the big picture and then jumping straight to the exact function it needs without opening the entire file?

That’s what WaveScope does. It’s an MCP server that uses wavelet transforms to give LLMs a multi-resolution view of a codebase, something like progressive image loading for source files. Newer models can handle large contexts on paper, but in practice they start losing focus. The more context the agent holds, the harder it becomes to figure out what it actually needs to work on and what to prioritize, which leads to context rot.

Currently, there are two main ways of dealing with this problem. Grep-based search finds exact matches but ignores structure, since pattern matching on individual lines can’t tell you where a class boundary is or where an error-handling region starts. Embedding-based RAG captures semantic meaning but loses position and structure. Neither gives the model a real sense of the code's architecture.

Code Has a Rhythm

Open a file in any language, whether Clojure, TypeScript, Rust, or Go, and you’ll see common structures repeating throughout the code. Imports sit at the top. Class and function definitions pop up at regular intervals. Indentation has its own distinct pattern in every language. Comment blocks and blank lines act as pauses in between. What if there were a way to extract these patterns and build a structure similar to an AST without having to know the syntax of the language?

Luckily for us, wavelets were made for exactly this sort of signal: they decompose it into multiple scales at once, giving you the fine details and the large-scale patterns at the same time. This family of algorithms is very versatile and has been used in many different fields. Seismologists use them to spot earthquakes, doctors sharpen MRI scans with them, and audio engineers separate basslines from vocals. Code structure, it turns out, is just another kind of signal that can be decomposed the same way.

So how does that actually work? First, each line gets a score reflecting how structurally dense it is, based on weighted keywords (more on that below). Once every line has a score, the file can be treated as a 1D signal: a sequence of numbers that rises and falls with the density of the structure. The Ricker wavelet is a little template shaped like a bump with a dip on each side. You slide it across the signal one position at a time and, at every position, measure how well the signal underneath matches that shape. A strong match means there's an elevated region sitting between two quieter regions, which suggests a structural boundary. The output is a coefficient at every line measuring how much that line looks like a boundary.
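To make the sliding-template idea concrete, here is a minimal TypeScript sketch of a Ricker transform over a per-line score array. It is not WaveScope's implementation: the function names (ricker, rickerResponse, cwt) are hypothetical, the normalization is the standard textbook Ricker formula, and cutting the kernel off at four widths on each side is an assumption.

```typescript
// Sketch of a Ricker ("Mexican hat") continuous wavelet transform over a
// per-line score signal. Normalization follows the common textbook formula;
// WaveScope's exact choice may differ.

// The eight widths mentioned in the post, in lines.
const SCALES = [1, 2, 4, 8, 16, 32, 64, 128];

// Sample the Ricker wavelet of width sigma at integer offsets -half..half.
// Positive at the center with a negative lobe on each side.
function ricker(sigma: number): number[] {
  const half = Math.ceil(4 * sigma); // assumption: treat the wavelet as 0 beyond 4 sigma
  const amp = 2 / (Math.sqrt(3 * sigma) * Math.pow(Math.PI, 0.25));
  const kernel: number[] = [];
  for (let t = -half; t <= half; t++) {
    const x2 = (t * t) / (sigma * sigma);
    kernel.push(amp * (1 - x2) * Math.exp(-x2 / 2));
  }
  return kernel;
}

// Slide the wavelet across the signal; lines outside the file count as 0.
// The Ricker wavelet is symmetric, so correlation and convolution coincide.
function rickerResponse(signal: number[], sigma: number): number[] {
  const kernel = ricker(sigma);
  const half = (kernel.length - 1) / 2;
  return signal.map((_, i) => {
    let sum = 0;
    for (let k = 0; k < kernel.length; k++) {
      const j = i + k - half;
      if (j >= 0 && j < signal.length) sum += signal[j] * kernel[k];
    }
    return sum;
  });
}

// One row of coefficients per scale: result[scaleIndex][line].
function cwt(signal: number[], scales: number[] = SCALES): number[][] {
  return scales.map((s) => rickerResponse(signal, s));
}
```

Calling cwt on a toy signal like [0, 0, 0, 1.6, 0, 0, 0] yields eight rows of coefficients; at scale 1 the maximum sits on the spike at index 3, with zeros one line away and negative lobes two lines away. The direct sliding sum costs O(lines × kernel width) per scale, which is cheap for files of a few thousand lines; FFT-based convolution is the usual upgrade for much longer signals.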

The trick for encoding different resolutions lies in the width of the template. The same shape can be stretched to many widths, each representing a different scale. A narrow wavelet only matches when the elevated region is a line or two wide, so it fires on small, sharp features. A wide one ignores line-level noise and responds to larger regions that are elevated relative to their surroundings, so it fires on big structures. The next trick is to run many widths at once and look for boundaries that light up across a band of widths rather than at just one. Features that stay consistent across scales tend to be the genuine structural edges we care about.
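One simple way to express "lights up across a band of widths" is to require a line to stay a local maximum over several consecutive scales. The sketch below assumes the coefficients are already arranged as one row per scale; persistentBoundaries and its minRun threshold are hypothetical, and a production version would probably let a peak drift by a line or two between scales instead of demanding the exact same line.

```typescript
// Hypothetical sketch: keep lines that are local maxima at several adjacent scales.

// True when row[i] is positive and not smaller than its immediate neighbors.
function isLocalMax(row: number[], i: number): boolean {
  const left = i > 0 ? row[i - 1] : -Infinity;
  const right = i < row.length - 1 ? row[i + 1] : -Infinity;
  return row[i] > 0 && row[i] >= left && row[i] >= right;
}

// coeffs[s][line] is the wavelet response of each line at scale index s.
// Returns the lines that stay local maxima for at least minRun consecutive scales.
function persistentBoundaries(coeffs: number[][], minRun = 3): number[] {
  const lineCount = coeffs.length > 0 ? coeffs[0].length : 0;
  const result: number[] = [];
  for (let i = 0; i < lineCount; i++) {
    let run = 0;
    let best = 0;
    for (const row of coeffs) {
      run = isLocalMax(row, i) ? run + 1 : 0; // reset when the peak disappears
      best = Math.max(best, run);
    }
    if (best >= minRun) result.push(i);
  }
  return result;
}
```

A boundary that only appears at a single width (line 3 in the test data, for instance) is filtered out, while one that persists across three neighboring scales survives.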

Since code structure nests at different sizes, each structure shows up at the width that fits it. Running WaveScope on its own src/context.ts gives a concrete example. A single line, such as a lone import statement or the export class FileContext { declaration, stands out over its quieter neighbors as a sharp one-line spike that the narrow wavelets lock onto. Its coefficient peaks at scale 1 or 2 and fades away at wider scales. At the other extreme is the long, keyword-dense cascade inside the inferLabel method: a wide run of consecutive if (tokens.includes("class")), interface, enum, and struct branches. That whole region reads as one broad elevated plateau, and its coefficient climbs steadily as the wavelet widens, reaching roughly 0.5 at scale 16, 1.0 at scale 32, 1.3 at scale 64, and about 2.3 at scale 128, where it peaks as the strongest structural response in the entire file. The biggest structure produces the strongest coarse-scale signal, which is exactly the spot you want surfaced first when you zoom out.
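The shift from narrow to wide scales can be checked with a back-of-the-envelope calculation. Treating the signal as continuous and using the textbook Ricker normalization A(σ) = 2 / (√(3σ)·π^(1/4)) (an assumption; WaveScope may normalize differently, so these numbers won't match the context.ts figures above), a one-line spike of height h produces h·A(σ) at its center, which only shrinks as σ grows. A flat plateau of height h and half-width a produces 2·h·a·A(σ)·exp(−a²/2σ²), which is maximized at σ = √2·a, so wider plateaus favor wider wavelets. The helper names below are hypothetical.

```typescript
// Closed-form center responses of a Ricker wavelet to two toy shapes.
const SCALES = [1, 2, 4, 8, 16, 32, 64, 128];

// Textbook Ricker amplitude; proportional to 1/sqrt(sigma).
const amp = (sigma: number): number =>
  2 / (Math.sqrt(3 * sigma) * Math.pow(Math.PI, 0.25));

// A single spike of height h: only the wavelet's center sample contributes.
const spikeResponse = (h: number, sigma: number): number => h * amp(sigma);

// A plateau of height h spanning [-a, a]. Since d/dt[t * exp(-t^2 / 2s^2)]
// equals (1 - t^2/s^2) * exp(-t^2 / 2s^2), the integral has a closed form.
const plateauResponse = (h: number, a: number, sigma: number): number =>
  h * amp(sigma) * 2 * a * Math.exp(-(a * a) / (2 * sigma * sigma));

// The scale (from the eight used in the post) with the strongest response.
function bestScale(response: (sigma: number) => number): number {
  return SCALES.reduce((best, s) => (response(s) > response(best) ? s : best));
}
```

For a plateau spanning about 40 lines (a = 20), the continuous optimum is σ ≈ 28, so among the eight scales the strongest response lands at 32; widening the plateau to a = 60 pushes it to 64. A spike always responds most strongly at scale 1.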

The line scores that feed all of this come from a simple keyword-weighting pass. In our example, export class FileContext { scores 1.6 because the class keyword weighs 1.0 and export adds 0.6. The one-line get lineCount() accessor scores about 0.58, the readonly field declarations beneath it score roughly 0.08 each, and the comments and blank lines around the class score a flat 0.0. The class declaration towers over its own body, the body towers over the whitespace around it, and the wavelet reads those relative heights at every width.
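A keyword scorer along these lines can be sketched in a few lines of TypeScript. Only the class (1.0) and export (0.6) weights come from the example above; the other weights, the tokenizer, and the names scoreLine, weightOf, and toSignal are hypothetical. This simplified table won't reproduce the 0.58 and 0.08 figures for the accessor and fields, since those depend on weights the post doesn't list.

```typescript
// Keyword weights: class and export are the post's figures; the rest are hypothetical.
const WEIGHTS: { [keyword: string]: number } = {
  class: 1.0,
  export: 0.6,
  function: 0.8, // hypothetical
  interface: 0.8, // hypothetical
  import: 0.5, // hypothetical
  if: 0.3, // hypothetical
};

// Look up a token without picking up inherited properties such as "constructor".
function weightOf(token: string): number {
  return Object.prototype.hasOwnProperty.call(WEIGHTS, token) ? WEIGHTS[token] : 0;
}

// Score one line by summing its keyword weights; blank lines and // comments score 0.
function scoreLine(line: string): number {
  const trimmed = line.trim();
  if (trimmed === "" || trimmed.slice(0, 2) === "//") return 0;
  return trimmed
    .split(/[^A-Za-z_$]+/)
    .filter((tok) => tok.length > 0)
    .reduce((sum, tok) => sum + weightOf(tok), 0);
}

// The file becomes a 1D signal with one score per line.
function toSignal(source: string): number[] {
  return source.split("\n").map(scoreLine);
}
```

Keeping the weights in a flat table is what makes the approach language-agnostic: swapping the table changes what counts as "structure" without touching the wavelet code.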

The intuition above covers the front half of the pipeline: score every line into a 1D signal, then slide the Ricker wavelet across it at eight scales (1, 2, 4, 8, 16, 32, 64, and 128 lines), giving a coefficient at every line and width. A couple more steps turn these raw coefficient arrays into something the model can use.

The first step is multi-scale peak detection, where the arrays are scanned for local maxima that are then ranked by magnitude. The strongest boundaries correspond to features such as the beginning of a class or the transition between imports and code. Because a real boundary shows up at several adjacent scales, as we saw above, these repeats can be safely collapsed into a single peak so the ranking isn't flooded with duplicates of the same spot.
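The peak-detection step might look something like the following sketch, which collects local maxima from every scale, sorts them by magnitude, and keeps only the strongest peak within a small neighborhood. The Peak type, the detectPeaks name, and the merge radius of two lines are illustrative assumptions rather than WaveScope's actual API.

```typescript
// Hypothetical sketch of multi-scale peak detection with duplicate collapsing.
interface Peak {
  line: number;
  scale: number;
  magnitude: number;
}

// coeffs[s][line] holds the response at scales[s]; mergeRadius is an assumed tolerance.
function detectPeaks(coeffs: number[][], scales: number[], mergeRadius = 2): Peak[] {
  const candidates: Peak[] = [];
  coeffs.forEach((row, s) => {
    for (let i = 0; i < row.length; i++) {
      const left = i > 0 ? row[i - 1] : -Infinity;
      const right = i < row.length - 1 ? row[i + 1] : -Infinity;
      // Strict on the left, non-strict on the right, so a flat top yields one peak.
      if (row[i] > 0 && row[i] > left && row[i] >= right) {
        candidates.push({ line: i, scale: scales[s], magnitude: row[i] });
      }
    }
  });
  // Strongest first, so the copy of a repeated boundary that survives is its best scale.
  candidates.sort((a, b) => b.magnitude - a.magnitude);
  const kept: Peak[] = [];
  for (const p of candidates) {
    if (!kept.some((k) => Math.abs(k.line - p.line) <= mergeRadius)) kept.push(p);
  }
  return kept;
}
```

Sorting before merging means each surviving peak carries the scale where its boundary responded most strongly, which is handy when peaks are later assigned to zoom bands.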

The second step is band assembly, where the peaks are separated into three broad zoom bands. The fine band (scales 1–2) shows raw source lines in a close window around the query center. The medium band (scales 4–16) tracks function and class signatures with some context around them. Finally, the coarse band (scales 32–128) compresses the whole radius into a section-level summary.
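Band assembly boils down to bucketing peaks by scale and cutting a window of raw lines for the fine band. The scale ranges below follow the three bands just described; the helper names and the explicit window radius are hypothetical.

```typescript
// Sketch of grouping peaks into the three zoom bands.
type Band = "fine" | "medium" | "coarse";

function bandForScale(scale: number): Band {
  if (scale <= 2) return "fine"; // scales 1-2: raw source lines
  if (scale <= 16) return "medium"; // scales 4-16: signatures with some context
  return "coarse"; // scales 32-128: section-level summary
}

// Raw lines around the query center, clamped to the file's bounds (radius is assumed).
function fineWindow(lines: string[], center: number, radius: number): string[] {
  const start = Math.max(0, center - radius);
  const end = Math.min(lines.length, center + radius + 1);
  return lines.slice(start, end);
}

// Bucket peaks by band so each band can be rendered separately.
function groupByBand<T extends { scale: number }>(peaks: T[]): Record<Band, T[]> {
  const out: Record<Band, T[]> = { fine: [], medium: [], coarse: [] };
  for (const p of peaks) out[bandForScale(p.scale)].push(p);
  return out;
}
```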

All of that processing is handled by the MCP server, and the model simply sees structured JSON with bands and peak positions without having to worry about any of the wavelet math.

What the Model Actually Gets

Let's say an agent calls query_wavelet_context centered on line 150 of a 500-line TypeScript file that contains some authentication logic. The fine band will hold the actual lines of code being inspected, and the medium band the function and class signatures around them. The coarse band will summarize lines 0–400, guided by peaks such as the imports at the top and the test helpers at the bottom.

The model learns what updateUser does from the fine band, while the coarse band tells it about the surrounding authentication context. It can jump to related code by recognizing the class and function boundaries in the wavelet peaks, without needing to see all 500 lines of the file.

There is also a tool called get_important_positions that operates on the whole project. It goes through every source file, smooths out the wavelet peaks, and returns a ranked list of the most important places in the code.
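A project-wide ranking could be sketched as merging every file's peaks into one list and, separately, ranking files by their average line score. The FileSignal shape and both function names are hypothetical, and the sketch skips whatever smoothing WaveScope applies to the peaks.

```typescript
// Hypothetical per-file input: the line scores plus the peaks found in them.
interface FileSignal {
  file: string;
  scores: number[];
  peaks: { line: number; magnitude: number }[];
}

// Merge every file's peaks into one list and keep the n strongest.
function topPositions(files: FileSignal[], n: number): { file: string; line: number; magnitude: number }[] {
  const all: { file: string; line: number; magnitude: number }[] = [];
  for (const f of files) {
    for (const p of f.peaks) all.push({ file: f.file, line: p.line, magnitude: p.magnitude });
  }
  return all.sort((a, b) => b.magnitude - a.magnitude).slice(0, n);
}

// Rank files by average structural score per line (a keyword-density measure).
function rankFilesByDensity(files: FileSignal[]): { file: string; density: number }[] {
  return files
    .map((f) => ({
      file: f.file,
      density: f.scores.length > 0 ? f.scores.reduce((a, b) => a + b, 0) / f.scores.length : 0,
    }))
    .sort((a, b) => b.density - a.density);
}
```

Averaging per line is one reason a small, declaration-heavy file can outrank a large implementation file whose structure is spread thin.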

Beyond locating structural boundaries, the server can also measure how complex or irregular those structures are, using a pair of entropy analysis tools built on a Haar discrete wavelet transform and the bit-cost estimation I discussed in my last post. The get_entropy_bands tool quantizes the Ricker coefficients at each scale and computes their bit-plane counts, with a higher cost indicating more structural irregularity at that resolution. The get_complexity_heatmap tool decomposes the original per-line structural signal through multiple Haar levels and projects the entropy cost back onto a per-line irregularity score. The model can use these scores as a sort of texture channel that shows where the gnarly parts of the code live. Boilerplate regions end up with low scores, so they can be safely summarized or skipped, while high-entropy regions likely contain dense logic or unusual patterns that warrant extra attention. Together, these tools give the model a data-driven way to triage code at a high level. They work especially well for refactoring tasks, where the model can quickly find sections with tangled logic and break them up.
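Here is a minimal sketch of a multi-level Haar decomposition with a crude bit-plane cost per level. The cost model (quantize each detail coefficient with a fixed step, count the bits its magnitude needs, and add a sign bit), the step of 0.1, and the function names are all assumptions for illustration; the encoder from the earlier post may estimate costs differently.

```typescript
// Orthonormal Haar DWT: returns the detail coefficients of each level, finest first.
function haarLevels(signal: number[], maxLevels: number): number[][] {
  const details: number[][] = [];
  let approx = signal.slice();
  for (let level = 0; level < maxLevels && approx.length > 1; level++) {
    if (approx.length % 2 === 1) approx.push(approx[approx.length - 1]); // pad odd length
    const nextApprox: number[] = [];
    const detail: number[] = [];
    for (let i = 0; i < approx.length; i += 2) {
      nextApprox.push((approx[i] + approx[i + 1]) / Math.SQRT2);
      detail.push((approx[i] - approx[i + 1]) / Math.SQRT2);
    }
    details.push(detail);
    approx = nextApprox;
  }
  return details;
}

// Assumed cost model: bit planes needed for |round(c / step)|, plus a sign bit if nonzero.
function bitCost(coeff: number, step: number): number {
  let q = Math.abs(Math.round(coeff / step));
  if (q === 0) return 0;
  let bits = 1; // sign bit
  while (q > 0) {
    bits++;
    q = Math.floor(q / 2);
  }
  return bits;
}

// Estimated bits per level: smooth levels cost little, irregular ones cost more.
function levelCosts(signal: number[], maxLevels: number, step = 0.1): number[] {
  return haarLevels(signal, maxLevels).map((d) => d.reduce((sum, c) => sum + bitCost(c, step), 0));
}
```

For the alternating signal [0, 1, 0, 1], levelCosts(signal, 2) returns [8, 0]: every finest-level pair differs, while the next level sees two identical averages. A constant signal costs nothing at any level, which is the "boilerplate scores low" behavior in miniature.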

The key advantage of wavelets is that they capture the overall structure of the code, something tools like regex can only infer indirectly. It's worth mentioning that AST parsing can provide much of the same analysis even more precisely. However, AST tools require a parser for each language, while wavelets are completely agnostic to the semantics of the text they're applied to. They simply flag statistical regularities, and that's what makes them such a broadly applicable tool in the first place.

WaveScope’s approach strikes a happy medium between grepping and full-blown AST analysis, since the transform works on any 1D signal. Language awareness comes from a simple keyword-weighting layer rather than a full parser, so each language needs only a 10-line config describing its core keywords. The whole thing is also cheap to run, processing an entire file in milliseconds to produce hierarchical, multi-scale output. These scales happen to be a natural representation of code structure, which gives the LLM a map to navigate it.
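To give a feel for what such a config might contain, here is a hypothetical shape for it. The field names, file extensions, and weights are illustrative only, except the TypeScript class (1.0) and export (0.6) weights quoted earlier, and configFor is an assumed helper for picking a config by file extension.

```typescript
// Hypothetical per-language config; not WaveScope's actual format.
interface LanguageConfig {
  name: string;
  extensions: string[];
  lineComment: string;
  keywords: { [keyword: string]: number };
}

const LANGUAGES: LanguageConfig[] = [
  {
    name: "typescript",
    extensions: [".ts", ".tsx"],
    lineComment: "//",
    keywords: { class: 1.0, export: 0.6, interface: 0.8, function: 0.8, import: 0.5 },
  },
  {
    name: "go", // weights below are illustrative
    extensions: [".go"],
    lineComment: "//",
    keywords: { func: 0.8, type: 0.8, struct: 0.8, interface: 0.8, import: 0.5 },
  },
];

// Pick a config by file extension; unknown files return undefined.
function configFor(path: string): LanguageConfig | undefined {
  const dot = path.lastIndexOf(".");
  const ext = dot >= 0 ? path.slice(dot) : "";
  for (const lang of LANGUAGES) {
    if (lang.extensions.indexOf(ext) >= 0) return lang;
  }
  return undefined;
}
```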

Since the wavelet locates edge positions at all levels simultaneously, it readily picks up structural changes that even sophisticated parsers might struggle to detect. For example, a long sequence of if/else blocks looks structurally different from a class with many short methods, while regions of documentation appear as valleys between peaks. The wavelet doesn't need to know what these things are to perceive that they are structurally different, and that's often just what the model needs to figure out what to do.

By the Numbers

To illustrate the concept, I ran three realistic development tasks against WaveScope's own 14-file codebase, just under five thousand lines of TypeScript, and compared the token cost with the traditional approach. "Traditional" here means what LLM coding agents actually do: grep for landmarks, read targeted chunks of code, and skim file headers to get a sense of the structure.

One common task is understanding the structure of a large file. So, I had the agent analyze the 854-line index.ts to find where the imports end, where the tool registrations cluster, and where the startup code lives. The typical approach is to grep for export landmarks and section comments, then read the import block, find a registration example, and recognize the startup tail. That costs about 2,000 tokens and produces a patchy, heuristic picture. WaveScope's coarse band plus the top 15 important positions give the same structural overview in just 750 tokens, with section-level boundaries that grep can't surface because it only matches lines without the structural context around them.

Now let's look at a different kind of task, which I alluded to earlier: finding tangled code that needs refactoring. A naive approach would be to run wc -l to find the biggest files, then skim large chunks looking for deep nesting and high cyclomatic complexity. That runs to about 5,200 tokens to read 600 lines across two files, and you would still miss tangled code in any files you didn't scan. WaveScope's get_complexity_heatmap, on the other hand, produces per-line irregularity scores for every file you point it at. Running it on the three largest files, index.ts, wce.ts, and context.ts, costs a mere 436 tokens. That's a 92% reduction, and it also surfaces precise hotspots, such as line 287 in handleDiffWaveletContext scoring 0.92, line 59 in FileContext scoring 0.92, and so on. The result is a ranked list of spots worth a closer look across every file the analysis covers.

Another example is identifying which files are architecturally core versus peripheral. The traditional approach runs head -25 on every file to read its imports, which costs around 2,900 tokens for all 14 files. That tells you what each file imports, but not which files define the project's architecture, so the model has to spend more tokens digging deeper based on its best guesses. WaveScope's project-wide get_important_positions returns a structural-density ranking of every file in just 1,700 tokens. The ranking shows that signal.ts tops the list with the highest keyword density per line. The heavyweight algorithmic files, wce.ts, context.ts, and index.ts, have lower average density because their structural features are spread across hundreds of lines of implementation. Utilities such as haar.ts and file-cache.ts rank low. With that, it's clear which files to read first to understand the conceptual skeleton of the project.

Across all three tasks, WaveScope used significantly fewer tokens while producing answers that are structural rather than textual, which is precisely what we were after. The structure gives the model an understanding of relationships within the codebase that regex-based heuristics simply can't provide. On top of that, the traditional approach would burn about 8% of a 128K-token window on these three tasks, while WaveScope needs only 2%. The gap should grow with the size of the codebase, which makes the tool particularly effective for analyzing large projects.
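The window percentages follow directly from the per-task token counts reported above. This snippet just redoes that arithmetic; treating 128K as 128,000 tokens is an assumption, though 131,072 rounds to the same 8% and 2%.

```typescript
// Token counts per task, as reported in the post.
const tasks = [
  { name: "file structure", traditional: 2000, wavescope: 750 },
  { name: "refactoring hotspots", traditional: 5200, wavescope: 436 },
  { name: "core vs peripheral", traditional: 2900, wavescope: 1700 },
];

const WINDOW = 128000; // assumption: "128K" taken as 128,000 tokens

// Sum one column across all three tasks.
const total = (key: "traditional" | "wavescope"): number =>
  tasks.reduce((sum, t) => sum + t[key], 0);

// Percentage of tokens saved per task.
const savings = tasks.map((t) => ({ name: t.name, saved: 100 * (1 - t.wavescope / t.traditional) }));

const traditionalShare = (100 * total("traditional")) / WINDOW; // 10,100 tokens, about 7.9%
const wavescopeShare = (100 * total("wavescope")) / WINDOW; // 2,886 tokens, about 2.3%
```

The savings also vary by task: about 62.5% for the file-structure overview, about 92% for the refactoring hotspots, and about 41% for the core-versus-peripheral ranking.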

It's also worth noting that running all 14 files through signal extraction, the Ricker CWT at 8 scales, peak detection, and band assembly averages just 3 milliseconds per file, and the entropy heatmap adds about 1ms per file on top of that. Compared to the model's reasoning time, this is effectively free.

Hopefully, the examples above already make the value of structural peak analysis clear, but we can push the information that's already been produced even further by passing it through an entropy encoder to see just how complex those regions are. Running get_entropy_bands against index.ts gives a breakdown of bit-cost estimates per wavelet scale, an indirect measure of structural irregularity. Here, the finest detail scale costs 613 bits while the coarsest costs 322, suggesting that the file is structurally busy at the line level. The pattern comes from the high density of handler functions and schema definitions in a tool-registration class.

Next, get_complexity_heatmap takes things further by back-projecting entropy onto individual lines via a Haar DWT. That identifies the top irregularity hotspots in index.ts, such as handleDiffWaveletContext with a score of 0.92 and handleGetCursorImportantPositions with a score of 0.90. These are non-trivial async handler functions with the most branching logic, and the heatmap flags them without knowing anything about TypeScript! The entire heatmap costs just 474 tokens, giving the model a data-driven triage list that tells it where to focus and which boilerplate it can safely skip.
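Back-projection can be sketched by letting every Haar detail coefficient spread its cost over the lines it summarizes and then normalizing the totals. The assumptions here are that level 0 is the finest level (each coefficient covers two lines, doubling at every level), that costs simply add up, and that scores are normalized to [0, 1] by dividing by the maximum; backProject is a hypothetical name, not WaveScope's API.

```typescript
// costsByLevel[level][k] is the estimated bit cost of detail coefficient k at that level.
// Returns one irregularity score per line, normalized so the hottest line scores 1.
function backProject(costsByLevel: number[][], lineCount: number): number[] {
  const heat: number[] = [];
  for (let i = 0; i < lineCount; i++) heat.push(0);
  costsByLevel.forEach((costs, level) => {
    const span = Math.pow(2, level + 1); // lines covered by one coefficient at this level
    costs.forEach((cost, k) => {
      const end = Math.min((k + 1) * span, lineCount); // padding may overshoot the file
      for (let line = k * span; line < end; line++) heat[line] += cost;
    });
  });
  const max = heat.reduce((m, v) => Math.max(m, v), 0);
  return max > 0 ? heat.map((v) => v / max) : heat;
}
```

Lines touched by expensive fine-level coefficients and expensive coarse-level ones accumulate the most cost, so a region that is irregular at several resolutions rises to the top of the heatmap.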

WaveScope is open source and can be found at https://github.com/yogthos/wavescope-mcp. Add it as a drop-in MCP server to give your agent a zoom lens for code.

Tags: compression llm entropy-coding programming mcp wavelets