The Compression Canopy
Every word is a compressed handle on reality. AI learned to read the handles, not the reality, and that's both why it works and where the next edge lives.
A token is a handle on reality.
When you type a sentence into ChatGPT, the system does not see words the way you do. It breaks text into tokens: chunks of characters converted into numbers and placed into a representational space. That sounds like an engineering detail. It points to something more general.
A token is the level at which a substrate becomes usable. Atoms become molecules. Molecular motion becomes temperature. Customer behavior becomes churn. A thousand operational details become a business model.
The point of a token is not that it is small. The point is that it is operable. It lets a system hold the world without carrying the entire world around.
A token is a compression that still knows what matters.
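The move from raw characters to operable units can be sketched in a few lines. This is a toy greedy longest-match tokenizer over a hand-written vocabulary, not the learned byte-pair encoding that systems like ChatGPT actually use; it only shows the shape of the idea.

```python
# Toy tokenizer: greedy longest-match against a tiny fixed vocabulary.
# Real systems (e.g. OpenAI's tiktoken) learn the vocabulary from data;
# this sketch only shows the shape of the idea: raw characters become a
# shorter sequence of operable integer handles.

VOCAB = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, " ": 5,
         "c": 6, "a": 7, "t": 8, "s": 9, "o": 10, "n": 11, "m": 12,
         "h": 13, "e": 14}

def tokenize(text: str) -> list[int]:
    ids, i = [], 0
    longest = max(len(t) for t in VOCAB)
    while i < len(text):
        # Try the longest matching vocabulary entry first.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i += size
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return ids

print(tokenize("the cat sat on the mat"))
# 22 characters compress to 11 token ids: [0, 5, 1, 5, 2, 5, 3, 5, 0, 5, 4]
```

The compression is the point: the model never sees 22 characters, it sees 11 handles, each of which already knows what matters.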
More is different, but only at the right level.
The world does not become easier because it has fewer parts. It becomes easier when a new unit appears above those parts.
If you know everything about the atoms in a molecule, you do not automatically have the useful language of chemistry. If you know everything about the molecules in a body, you do not automatically have the useful language of organs, symptoms, or treatment. New levels need new handles.
That is the non-mystical core of emergence. The higher-level object is not fake. It is a real feature of the lower-level substrate, because it preserves predictive power upward.
And here is the move that matters for what comes next: every level the world produced eventually got a name. Atoms. Molecules. Cells. Bodies. People. Teams. Markets. Each compression earned a word. Those words turned out to be how the rest of the stack would be carried forward.
Thermodynamics is not intelligent. But it is a powerful example of why a macro-variable can be more useful than the full microstate. Temperature lets you reason about molecular motion without tracking every molecule.
The right level is not the deepest level. It is the level where the system becomes predictable without becoming false.
Language is one compression layer that carries all the others.
Language is not the compression layer humans built. It is a compression layer, one of several. But it is the one that carries the rest. Every level below it eventually got a word, and each word arrived already loaded with the work of the levels beneath.
The word molecule is a handle on chemistry, which is a handle on atomic interactions. The word fatigue is a handle on a body, which is a handle on cells, which is a handle on the chemistry of energy. The word EBITDA is a handle on operating economics, which is a handle on thousands of customer decisions, which are handles on incentives, which are handles on minds. The word trust is a handle on a relationship that is itself a handle on hundreds of small interactions extending back through human cooperation.
Every word in your vocabulary is sitting on top of a stack like that. Some of those stacks took thousands of years to settle. Some took millions. By the time a word like fatigue or moat or brand exists in ordinary language, it is a multi-millennia stack of compressions wearing a single label.
This is what makes LLMs feel sudden. The model did not begin with atoms, photons, rooms, people, incentives, and histories. It began with text, and the text was already pre-compressed by every level the world built underneath it.
LLMs did not have to tokenize the world from scratch. They inherited a vocabulary into which the world had already been compressed, level by level, over millennia.
Intelligent Space is where intelligence moves.
Once tokens exist, they form paths, distances, bottlenecks, analogies, and tradeoffs. That navigable structure is the space of thought.
Intelligent Space is not a physical place. It is not a cosmic field. It is the cross-level geometry that appears when stable tokens relate to one another and can be used for prediction or action.
A doctor moves from symptoms to organs to mechanisms to treatments. An investor moves from customer behavior to unit economics to financing to exit. A model moves from token to token through probability space. The substrates differ. The shape rhymes.
And the shapes rhyme because the words rhyme. Signal, mechanism, bottleneck, leverage, compression, capacity: these show up across medicine, engineering, and investing not by coincidence but because the underlying compressions rhyme, and the language inherited that rhyme. Borrow the vocabulary of one domain and you've borrowed its map.
The same fact compresses into different tokens depending on what sits next to it. Top 3 customers are 48% of revenue: read on its own, that tokenizes as concentration risk. Top 3 customers are 48% of revenue, held by long-term contracts: now it might tokenize as a moat. Is this a moat, or a single point of failure? The answer lives in which handle you reach for.
Intelligence is the ability to move through the right space without getting trapped at the wrong level of detail.
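The geometry can be made literal in miniature. The vectors below are hand-set for illustration, not learned embeddings, and the token names are the essay's own; the sketch only shows how distance in a token space turns "these compressions rhyme" into something computable.

```python
# Toy "intelligent space": hand-set 3-d vectors (illustrative only, not
# learned embeddings). Distance between tokens stands in for how related
# two compressions are; cross-domain cousins sit close together.
import math

SPACE = {
    # medicine                      # investing
    "symptom":    (0.9, 0.1, 0.0),  "signal":    (0.85, 0.15, 0.0),
    "mechanism":  (0.1, 0.9, 0.0),  "unit_econ": (0.15, 0.85, 0.0),
    "treatment":  (0.0, 0.1, 0.9),  "exit":      (0.0, 0.15, 0.85),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def nearest(token):
    """Closest other token in the space: the analogy the geometry suggests."""
    return max((t for t in SPACE if t != token),
               key=lambda t: cosine(SPACE[token], SPACE[t]))

print(nearest("symptom"))  # prints "signal": the investor's nearest handle
```

With learned embeddings the principle is the same, only the space is high-dimensional and the coordinates are fit to data rather than written by hand.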
Analogy is how token-spaces talk.
A token makes one domain usable. An analogy makes one domain usable through another.
Analogy is not decorative language. It is relational compression. When we say electricity flows like water, or a company has a moat, we are not saying the domains are identical. We are saying the relationships in one token-space can help us navigate another.
This is why analogy creates understanding. It imports a map.
A useful analogy can become so stable that it hardens into a business token. Moat, flywheel, runway, platform, pipeline, and optionality all began as mappings before they became ordinary language.
Analogy is relational tokenization. It turns an unfamiliar space into a navigable one by borrowing the structure of a familiar space.
First-principles thinking is token hygiene.
It strips away inherited tokens, tests which ones are real, descends to more basic tokens, and rebuilds upward.
Not all tokens are first principles. Many are useful shortcuts. Moat, brand, platform, culture, AI-native, and recurring revenue can be real. They can also be lazy compressions.
First-principles thinking asks which handles are load-bearing. It keeps drilling until the token becomes difficult to fake.
Fake-token detector
- Cannot name who pays.
- Cannot name what gets cheaper at scale.
- Cannot trace the token to behavior.
- Cannot name what would break the thesis.
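The checklist above can be read as a literal predicate. The sketch below is hypothetical: the field names are illustrative, not a standard diligence schema, and a real audit would interrogate evidence rather than check for the presence of an answer.

```python
# Hypothetical sketch of the fake-token detector as code. A candidate
# token (e.g. "moat") survives only if the thesis behind it can answer
# all four questions; the field names are illustrative, not a real
# diligence schema.

CHECKS = {
    "who_pays": "Cannot name who pays.",
    "scale_economics": "Cannot name what gets cheaper at scale.",
    "observed_behavior": "Cannot trace the token to behavior.",
    "kill_condition": "Cannot name what would break the thesis.",
}

def audit_token(thesis: dict) -> list[str]:
    """Return the failures that mark a token as a lazy compression."""
    return [msg for field, msg in CHECKS.items() if not thesis.get(field)]

lazy = {"who_pays": "enterprise IT buyers"}  # three answers missing
print(audit_token(lazy))
```

An empty list does not prove the token is real; it only means the thesis has stopped being unfalsifiable.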
Analogies create maps. First principles audit those maps. Tokens let us think. Analogies let us transfer thought. First principles keep thought honest.
First principles are the tokens you are willing to build from because they still explain the system when the higher-level language is stripped away.
Diligence is token discovery.
A data room is not a business. It is a messy substrate waiting to be compressed into a thesis.
A company gives you 4,000 files, 12 years of transactions, 200 employees, 17 systems, and a founder who says the business has never been stronger. Nobody underwrites that directly. You compress.
A good investor is not someone who sees every detail. A good investor is someone who knows which details deserve to become tokens.
Diligence is not data collection. It is the search for the compressed variables that still predict the business.
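Diligence-as-compression can be shown in miniature. The rows below are made up for illustration (and chosen to land on the 48% figure used earlier); the point is that thousands of transaction lines collapse into one candidate token, top-3 customer revenue share.

```python
# Diligence as compression, in miniature: transaction rows collapse into
# one candidate token, top-3 customer revenue share. The data is made up
# for illustration and chosen to land on the essay's 48% example.
from collections import defaultdict

transactions = [
    ("acme", 250), ("globex", 130), ("initech", 100), ("umbrella", 90),
    ("stark", 85), ("wayne", 80), ("vandelay", 75), ("hooli", 70),
    ("dunder", 65), ("pied", 55),
]

def top3_share(rows) -> float:
    """Compress raw transactions into a single concentration variable."""
    by_customer = defaultdict(float)
    for customer, amount in rows:
        by_customer[customer] += amount
    totals = sorted(by_customer.values(), reverse=True)
    return sum(totals[:3]) / sum(totals)

print(f"top-3 customer share: {top3_share(transactions):.0%}")  # 48%
```

The variable is only a candidate token: whether 48% reads as moat or as single point of failure still depends on the context around it.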
AI works where the tokens are clean.
AI progress is fastest where the substrate already has stable, machine-readable tokens: text, code, contracts, spreadsheets, logs. It is slower where the world is continuous, noisy, embodied, path-dependent, or strategically reflexive.
That does not make messy domains hopeless. It means better instruments can move them rightward: more structured data, better labels, cleaner feedback, richer measurement, more useful abstractions.
The limits of the frame.
Tokens are not magic. They are useful handles. They are real in the same sense that churn, temperature, price, and brand are real: they preserve enough structure to help you predict or act.
Every useful abstraction throws something away. EBITDA can hide capex. Recurring revenue can hide weak retention. "AI-native" can hide a thin wrapper. "Moat" can hide a business with no customer switching friction.
There is also a shadow side. Wolfram calls it computational irreducibility: some systems may not have a clean higher-level handle that predicts what happens next. You have to run the process. A subscription churn pattern can be modeled. A founder negotiation cannot. Each move changes the next move. Intelligence lives where compression gives a shortcut. It is humbled where reality must be lived forward.
Intelligence is powerful where reality gives it handles. It is humbled where the only honest prediction is to run the system.
The next edge is knowing what to tokenize.
The first wave of AI made text, code, documents, and chat cheaper. That wave worked because language had already been compressed for thousands of years before the model arrived. The model just had to read the canopy.
The next wave is different. The next wave is for domains that don't yet have a vocabulary: operating substrates that haven't been compressed into language because nobody has needed to. Permits. Routes. Tickets. Claims. Calls. Inspection notes. These are still raw. They have not yet earned their words.
A token like building permit velocity is not a trick. It is what humans did with fatigue and EBITDA and moat over centuries, compressed into months because the compression itself can now be partially automated. The frontier is manufacturing the next layer of language for the parts of the world that don't yet have one.
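Manufacturing such a token can look as plain as this. The records and the metric definition below are hypothetical (median days from filing to issuance is one reasonable reading of "permit velocity", not an established standard):

```python
# Hypothetical sketch of manufacturing a new token, "permit velocity":
# raw permit records (dates made up for illustration) compress into one
# number a builder, investor, or operator can act on. Defining the
# metric as median filing-to-issuance days is an assumption, not a
# standard.
from datetime import date
from statistics import median

permits = [
    (date(2024, 1, 5),  date(2024, 2, 19)),   # (filed, issued)
    (date(2024, 1, 12), date(2024, 3, 1)),
    (date(2024, 2, 2),  date(2024, 2, 28)),
    (date(2024, 2, 20), date(2024, 4, 15)),
]

def permit_velocity(records) -> float:
    """Median days from filing to issuance: the candidate token."""
    return median((issued - filed).days for filed, issued in records)

print(f"permit velocity: {permit_velocity(permits)} days")
```

Once the number exists and stays stable, it can be tracked, compared across jurisdictions, and traded on, which is exactly what earning a word means.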
This is where the builder, investor, and operator meet. The builder creates the tokenization layer, the new words. The investor reads the new vocabulary as signal before the rest of the market notices it exists. The operator acts on what the new vocabulary makes visible.
The most valuable AI systems will not merely answer questions. They will create new units of analysis for domains that were previously too messy to navigate.
The level at which the world becomes usable.
Each level is the previous level made operable.
The question for any domain (a model, a market, a company, a cell, a mind) is not only how complex it is. The question is: what are its tokens, and can anything learn to move through them?
Notes and scientific footing
- OpenAI documentation on tokenization and counting tokens with tiktoken. OpenAI Cookbook.
- Philip W. Anderson, "More Is Different." Science.
- Shalizi and Crutchfield, "Computational Mechanics: Pattern and Prediction, Structure and Simplicity." arXiv.
- Bengio, Courville, and Vincent on representation learning. arXiv.
- Dedre Gentner's structure-mapping theory of analogy. Cognitive Science.
- Wolfram's computational irreducibility. MathWorld.