← teaching

Overcomplete Dictionaries
& Sparse Coding

why more atoms than dimensions enables sparser, more robust representations


§ 1

The problem: representing a signal

Suppose you want to describe a signal — a pattern of light on your retina, or a neural activity pattern — as a weighted sum of simple building blocks called atoms. You scale each atom by a coefficient and add them up to reconstruct the signal.

s  ≈  c₁ · a₁  +  c₂ · a₂  +  …  +  cₙ · aₙ

The coefficients c are what you must store or transmit — they are the representation. The question is: how do you choose your atoms so the coefficients are as simple, as sparse, as possible?

Sparse means most coefficients are zero. Only a handful of atoms are active at once. This is metabolically cheap for neurons, and it makes the code easier to read out downstream.

With a complete basis — exactly as many atoms as dimensions — every signal has a unique representation. But uniqueness is not the same as sparsity. Drag the red dot below to see: the x/y basis rarely yields a sparse code.

complete basis — 2 atoms in 2D — drag the signal

Both bars are almost always nonzero. The x/y basis cannot produce a sparse code for a diagonal signal.


§ 2

Overcomplete dictionary

Now add a third atom at 60°. We have 3 atoms in 2D — more than strictly necessary. Representations are no longer unique, but that redundancy is a feature: any signal is now close to some atom, so sparse representations become achievable. Toggle between dense and sparse solutions below.

overcomplete dictionary — 3 atoms in 2D — drag the signal

In sparse mode, the algorithm uses only the single best-matching atom. Compare the bar heights to the complete basis above.


§ 3

The geometry of sparsity

Here is the core geometric insight. Each atom defines a direction in signal space. Sparsity means your signal is close to one of those directions — it projects strongly onto one atom and weakly onto all others.

With a complete basis, the directions are fixed and sparse. A signal pointing between two atoms must use both. With an overcomplete dictionary you pack more directions into the same space, so any signal is likely to be close to some atom. Increase the atom count below and watch the worst-case gap shrink.

atom directions & angular coverage

The shaded wedge is the worst-case angle to the nearest atom. More atoms → smaller wedge → any signal projects strongly onto one atom → sparser code.

Minimum projection onto the best atom = cos(π/n). At n = 2 (complete): cos(90°) = 0 — the worst-case signal is orthogonal to both atoms. At n = 8: cos(22.5°) ≈ 0.92 — almost any signal projects strongly onto one atom.

§ 4

Matching pursuit — step by step

How does a brain (or algorithm) actually find the sparse representation? One elegant method is Matching Pursuit — a greedy algorithm that proceeds exactly as intuition suggests: at each step, find the atom most correlated with what remains unexplained (the residual), subtract its contribution, and repeat.

matching pursuit — 8-dimensional signal — 12-atom overcomplete dictionary
residual energy remaining
coefficients used so far

Notice how the residual (red) collapses rapidly in the first few steps and then flattens. This is characteristic of sparse signals in overcomplete dictionaries: a few atoms capture the bulk of the energy, and the rest is noise.


§ 5

Sparsity vs. reconstruction error

There is a tradeoff: using fewer atoms is cheaper but less accurate. The key advantage of an overcomplete dictionary is that it reaches lower error at the same atom count — because its denser coverage of directions means any signal is closer to some atom from the start.

atoms used vs. reconstruction error — drag to explore different signals

Blue = overcomplete (8 atoms). Red = complete (2 atoms). The gap is largest when the signal falls between the complete basis directions.


§ 6

Grid cells as an overcomplete spatial dictionary

Now lift this to neural coding. A grid cell's firing rate map is a sum of three cosines at 60° offsets — a hexagonally periodic atom in 2D space. The full grid cell population provides atoms at multiple phases and spatial scales (modules), forming an overcomplete dictionary for position.

r(x,y)  =  cos(2π f₁·x + φ₁)  +  cos(2π f₂·x + φ₂)  +  cos(2π f₃·x + φ₃)

Different cells within a module have different phases (φ), tiling all possible locations. Different modules have different spatial frequencies, roughly scaling by a factor of 1.4. Click any atom below to expand it.

hexagonal grid atoms — each panel is one grid cell’s firing rate map
Place cells emerge as a sparse readout: a rat at any given location activates a small subset of grid atoms — exactly a sparse code in an overcomplete dictionary. The overcomplete covering of 2D space is precisely what allows a sparse, robust place code.

§ 7

Why overcomplete? Three reasons

Overcompleteness is not redundancy in a wasteful sense. It buys three concrete properties, all present in the grid cell system.

I
Flexibility
More atom directions means any signal is close to some atom, enabling a sparse representation for any input — not only those aligned with a fixed basis. A grid cell module at a single scale and orientation could only sparsely encode signals in certain directions of neural activity space.
II
Robustness to loss
A complete basis has zero redundancy — lose one atom and reconstruction fails along that dimension. An overcomplete dictionary degrades gracefully. Grid cells are routinely lost to aging, injury, and disease; the system continues to function because neighboring atoms (cells with similar phases and scales) can partially compensate.
III
Metabolic efficiency
Sparse codes use fewer active neurons per stimulus — lower energy cost, less interference between simultaneously stored memories, and higher representational capacity. A rat at any location activates only a small fraction of all place cells, consistent with sparse coding.

Drag the slider below to remove atoms from both dictionaries and compare how gracefully they degrade.

robustness — drop atoms and watch error across all signal angles — red = complete — blue = overcomplete

The overcomplete dictionary (blue) degrades smoothly. The complete basis (red) fails catastrophically when a critical atom is removed — the complete basis has no slack.