Overcomplete Dictionaries
& Sparse Coding
why more atoms than dimensions enables sparser, more robust representations
§ 1
The problem: representing a signal
Suppose you want to describe a signal — a pattern of light on your retina, or a neural activity pattern — as a weighted sum of simple building blocks called atoms. You scale each atom by a coefficient and add them up to reconstruct the signal.
The coefficients c are what you must store or transmit — they are the representation. The question is: how do you choose your atoms so the coefficients are as simple, as sparse, as possible?
With a complete basis — exactly as many atoms as dimensions — every signal has a unique representation. But uniqueness is not the same as sparsity. Drag the red dot below to see: the x/y basis rarely yields a sparse code.
Both bars are almost always nonzero. The x/y basis cannot produce a sparse code for a diagonal signal.
§ 2
Overcomplete dictionary
Now add a third atom at 60°. We have 3 atoms in 2D — more than strictly necessary. Representations are no longer unique, but that redundancy is a feature: any signal is now close to some atom, so sparse representations become achievable. Toggle between dense and sparse solutions below.
In sparse mode, the algorithm uses only the single best-matching atom. Compare the bar heights to the complete basis above.
§ 3
The geometry of sparsity
Here is the core geometric insight. Each atom defines a direction in signal space. Sparsity means your signal is close to one of those directions — it projects strongly onto one atom and weakly onto all others.
With a complete basis, the directions are fixed and sparse. A signal pointing between two atoms must use both. With an overcomplete dictionary you pack more directions into the same space, so any signal is likely to be close to some atom. Increase the atom count below and watch the worst-case gap shrink.
The shaded wedge is the worst-case angle to the nearest atom. More atoms → smaller wedge → any signal projects strongly onto one atom → sparser code.
§ 4
Matching pursuit — step by step
How does a brain (or algorithm) actually find the sparse representation? One elegant method is Matching Pursuit — a greedy algorithm that proceeds exactly as intuition suggests: at each step, find the atom most correlated with what remains unexplained (the residual), subtract its contribution, and repeat.
Notice how the residual (red) collapses rapidly in the first few steps and then flattens. This is characteristic of sparse signals in overcomplete dictionaries: a few atoms capture the bulk of the energy, and the rest is noise.
§ 5
Sparsity vs. reconstruction error
There is a tradeoff: using fewer atoms is cheaper but less accurate. The key advantage of an overcomplete dictionary is that it reaches lower error at the same atom count — because its denser coverage of directions means any signal is closer to some atom from the start.
Blue = overcomplete (8 atoms). Red = complete (2 atoms). The gap is largest when the signal falls between the complete basis directions.
§ 6
Grid cells as an overcomplete spatial dictionary
Now lift this to neural coding. A grid cell's firing rate map is a sum of three cosines at 60° offsets — a hexagonally periodic atom in 2D space. The full grid cell population provides atoms at multiple phases and spatial scales (modules), forming an overcomplete dictionary for position.
Different cells within a module have different phases (φ), tiling all possible locations. Different modules have different spatial frequencies, roughly scaling by a factor of 1.4. Click any atom below to expand it.
§ 7
Why overcomplete? Three reasons
Overcompleteness is not redundancy in a wasteful sense. It buys three concrete properties, all present in the grid cell system.
Drag the slider below to remove atoms from both dictionaries and compare how gracefully they degrade.
The overcomplete dictionary (blue) degrades smoothly. The complete basis (red) fails catastrophically when a critical atom is removed — the complete basis has no slack.