When Meta tried to deploy AI coding agents across a large internal data pipeline last month, it ran into a number that should bother anyone planning enterprise agent rollouts. Of 4,100 files spanning three languages and four repositories, roughly 5% had the kind of structured documentation that agents could actually navigate. The rest worked fine. Humans maintained them, understood them, kept them running. They just weren't legible in the way machines require.
The knowledge agents would need to avoid mistakes existed. It lived in engineers' heads. That's a specific instance of something much older than software.
The philosopher Michael Polanyi observed in 1966 that "we can know more than we can tell."
He meant it literally: skilled practitioners navigate exceptions and recognize patterns in ways they genuinely cannot articulate as rules. The knowledge is contextual, adaptive, built through practice rather than instruction. You learn which field names are load-bearing the same way you learn which floorboards creak. By walking the building.
Resilience engineering has a name for the resulting gap: "work as imagined" versus "work as done." The documented version of any complex operation is narrower than the real one. Always narrower. The humans filling those gaps are what makes the system function. They're also what makes the system opaque to anything that can only read the formal version.
Enterprise agent deployments are stalling at exactly this layer. McKinsey's 2025 data shows fewer than one in four organizations have scaled an agentic AI system to production. A March 2026 survey of 650 enterprise technology leaders found that organizations successfully scaling agents spent differently: proportionally more on evaluation infrastructure and operational tooling, proportionally less on model selection. They'd located a different bottleneck entirely.
A case from OpenAI's own internal deployment makes this concrete. A data agent, asked how many enterprise customers renewed last quarter, counted every account with a renewal date in range. A reasonable approach. Also completely wrong. "Enterprise" meant contracts above a specific value threshold with a minimum employee count. "Renewed" excluded trials converting to paid. "Last quarter" followed the fiscal calendar, not the calendar calendar. Encoding the organization's definitions of its own terms produced a 5x accuracy improvement. The model had been reasoning correctly from the wrong map. Nothing in the pipeline caught it, because the answer looked plausible. The agent completed the task, returned a number, and was wrong in a way that didn't announce itself.
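In miniature, the fix means encoding those definitions somewhere the agent can apply them. A hedged sketch of the gap, not OpenAI's actual schema: every field name, threshold, and date below is invented for illustration.

```python
from datetime import date

# Illustrative only: field names, thresholds, and dates are invented.
ENTERPRISE_MIN_CONTRACT_VALUE = 250_000  # assumed "enterprise" value floor
ENTERPRISE_MIN_EMPLOYEES = 1_000         # assumed "enterprise" headcount floor

def renewed_naive(accounts, q_start, q_end):
    """What the agent did: count any account with a renewal date in range."""
    return [a for a in accounts if q_start <= a["renewal_date"] <= q_end]

def renewed_actual(accounts, fiscal_q_start, fiscal_q_end):
    """What the organization meant, once its definitions are written down."""
    return [
        a for a in accounts
        if fiscal_q_start <= a["renewal_date"] <= fiscal_q_end    # fiscal quarter, not calendar
        and a["contract_value"] >= ENTERPRISE_MIN_CONTRACT_VALUE  # "enterprise" = value threshold...
        and a["employee_count"] >= ENTERPRISE_MIN_EMPLOYEES       # ...plus headcount minimum
        and not a["converted_from_trial"]                         # "renewed" excludes trial conversions
    ]

accounts = [
    {"renewal_date": date(2026, 2, 10), "contract_value": 400_000,
     "employee_count": 5_000, "converted_from_trial": False},  # counts under both readings
    {"renewal_date": date(2026, 2, 18), "contract_value": 30_000,
     "employee_count": 200, "converted_from_trial": False},    # too small to be "enterprise"
    {"renewal_date": date(2026, 2, 25), "contract_value": 500_000,
     "employee_count": 8_000, "converted_from_trial": True},   # a conversion, not a renewal
]

# Same date window passed to both here; in practice the fiscal window
# itself would also differ from the calendar one.
print(len(renewed_naive(accounts, date(2026, 1, 1), date(2026, 3, 31))))   # 3
print(len(renewed_actual(accounts, date(2026, 1, 1), date(2026, 3, 31))))  # 1
```

Both answers look plausible on their own. Only the encoded definitions make the difference visible.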
Agents need the map written down.
That implication is worth sitting with: the gap between what organizations know and what they've made legible looks less like documentation debt that gets paid off with a sufficiently ambitious project and more like a structural feature of how complex institutions work. Every workaround, every contextual judgment, every "we just know not to do that" accumulated over years of practice represents knowledge that resists formalization. And it resists formalization in a specific way: the act of writing it down changes what gets captured. Better models don't close that gap; they just guess across it more fluently.
Things to follow up on...
- Meta's fix for this: A swarm of 50+ specialized agents systematically mapped undocumented patterns across Meta's pipeline, producing concise context files that reduced agent tool calls by roughly 40% per task (a hypothetical sketch of such a file appears after this list), though the team flags that stale context is worse than no context at all.
- Capability isn't reliability: A recent framework paper found that frontier models exhibit the highest behavioral meltdown rates in long-horizon tasks, not the lowest, because they pursue ambitious strategies that compound failure over time.
- The governance retrofit: Organizations that launched agent pilots in 2025 without audit infrastructure are now discovering that production deployment requires rebuilding permission and logging architecture, with production costs running 5-10x pilot budgets.
- Forrester's readiness test: Analyst research proposes a disarmingly simple question for any leader considering agent deployment: does formal documentation on how a task is done exist, and does it reflect how the task is actually done?
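
The "concise context file" idea from the Meta item above is easiest to grasp by example. Meta hasn't published its format, so the file below is purely hypothetical; every field name and rule is invented:

```
# CONTEXT: billing/daily_rollup pipeline
# Load-bearing facts not visible in the code or the formal docs.

- acct_tier is stale for ~2% of rows; when it disagrees with
  contract_value, trust contract_value.
- Never backfill partitions older than 90 days: a downstream finance
  job snapshots them and will silently double-count.
- region = 'EU-LEGACY' is a migration artifact, not a real region;
  exclude it from per-region aggregates.
```

The point is the content, not the syntax: each line is a creaky floorboard written down, the kind of knowledge that otherwise only leaves someone's head by apprenticeship.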

