Practitioner's Corner
When a production agent fails, the post-mortem always finds a fixable cause that isn't the agent itself. That pattern deserves a closer look.

The Agent's Alibi

Seventy-eight percent of enterprises are piloting AI agents. Fourteen percent have made it to production. That gap has held steady through billions of dollars spent improving data pipelines, prompts, tooling, and models. Every fix is genuine. The gap barely moves.
When an agent produces a wrong output, the post-mortem has five plausible places to land before it ever reaches the agent's own reasoning. All five deserve attention. But there is something structural in that fact, and it may explain why so much real progress does so little to close the distance.

One Production Failure, Thirty Scenarios, and the Debugging Infrastructure That Didn't Exist

A customer support agent submits a credit card replacement before the customer agrees to it. The session log looks clean. Every tool call completed, every response was generated on time, and the conversation reached a natural close. Standard monitoring sees a successful run. The agent made a bad judgment call inside a reasonable-looking transcript, and nothing in the observability stack can tell you where.
That gap between "something failed" and "here is specifically what failed" is wide enough that closing it required building an entire forensic environment from scratch. The infrastructure one team constructed to get there says a lot about how little already existed.
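The failure mode above is invisible to infrastructure-level monitoring because it lives in the transcript, not in latency or error codes. A minimal sketch of what a transcript-level check might look like: scan the session events for a guarded tool call that fires before any user turn that reads as consent. The event shapes, tool name, and confirmation phrases here are hypothetical illustrations, not any real observability API.

```python
# Hedged sketch: a post-hoc transcript check for the failure the article
# describes -- a sensitive tool call fired before the customer agreed.
# Event schema, tool name, and the confirmation phrase list are all
# assumptions for illustration.

CONFIRMATIONS = {"yes", "please do", "go ahead", "confirm"}

def premature_tool_calls(events, guarded_tool="replace_card"):
    """Return indices of guarded tool calls made before any user confirmation."""
    confirmed = False
    flagged = []
    for i, event in enumerate(events):
        if event["type"] == "user" and event["text"].strip().lower() in CONFIRMATIONS:
            confirmed = True
        elif event["type"] == "tool_call" and event["name"] == guarded_tool:
            if not confirmed:
                flagged.append(i)
    return flagged

session = [
    {"type": "user", "text": "I think my card was stolen"},
    {"type": "assistant", "text": "I can order a replacement. Shall I?"},
    {"type": "tool_call", "name": "replace_card"},  # fires before the answer
    {"type": "user", "text": "yes"},
]
print(premature_tool_calls(session))  # [2]
```

A check like this would report the exact event index where the judgment error occurred, which is precisely the "here is specifically what failed" answer that standard monitoring cannot give. The hard part in practice is that real consent rarely matches a phrase list, which is why the team in this story had to build a forensic environment rather than a one-off script.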


The Agent Incident Investigator Who Keeps a Second Set of Notes
The Systems Thinking Trap
Aviation spent fifty years writing "pilot error" on accident reports before recognizing the phrase explained nothing fixable. The corrective was systems thinking. Failures became organizational, distributed, contextual. Better, and also convenient: when every failure belongs to the system, no specific component bears obligation to change.
Agent systems are walking into the same room through the back door. Failures seed at step three and surface at step seventeen. "The system failed" becomes the default attribution not as a philosophical correction, but because isolating the causal thread is genuinely, architecturally hard. The diffusion of responsibility that aviation chose, agent teams inherit by default.
