CURRENT | Practitioner's Corner

The Invisible Overhead

The Work the Spreadsheet Can't See

By Nora Kaplan, March 25, 2026

A single agent step running at 95% reliability sounds fine. Chain twenty such steps and end-to-end success falls below 36%. That gap has to be managed by someone, and the work takes the form of prompt maintenance, drift detection, and failure triage across layers that didn't exist before deployment. None of it appears in the business case that funded the project. The accounting framework used to justify automation has no line item for work the automation itself generates. The costs are real, and they accumulate where no instrument exists to catch them.
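The compounding is easy to check. Below is a minimal Python sketch, assuming every step succeeds independently with the same probability. Real pipelines often have correlated failures, so treat this as an idealization. The helper names `chain_success` and `required_step_reliability` are hypothetical, chosen for illustration.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """End-to-end success of n_steps steps, ASSUMING each one succeeds
    independently with the same probability p_step."""
    return p_step ** n_steps


def required_step_reliability(target: float, n_steps: int) -> float:
    """Per-step success rate needed for n_steps independent steps
    to reach an end-to-end success rate of `target`."""
    return target ** (1.0 / n_steps)


if __name__ == "__main__":
    print(f"{chain_success(0.95, 20):.4f}")              # 0.3585
    print(f"{required_step_reliability(0.90, 20):.4f}")  # 0.9947
```

Under the same independence assumption, keeping a twenty-step chain at 90% end-to-end requires roughly 99.5% reliability at every step.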

The Builder Profile

Sumeet Vaidya and the Distance Between Writing Code and Shipping It

By Rina Takahashi, March 25, 2026

An AI agent writes a code change in seconds. It compiles. It passes the sandbox. It touches a database schema, a caching layer, and an auth service, and nobody finds out whether it actually works until the cost of finding out has already multiplied. Sumeet Vaidya spent a decade at Facebook, Uber, and Discord watching that distance between "looks right" and "works in production" grow wider with every new service dependency. With Crafting, he's placed a specific bet on where the wall is: the space between generated code and the production environment that has to accept it.

The Practitioner's Day

The Professional Noticer Keeping AI Agents From Quietly Losing Their Minds

The Maintenance Curve

The Agentic AI Cost Curve: Fast Builds, Slow Drowns

Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027. Most will be narrated as technology failures. Look closer and the pattern is financial: teams that built fast discover they've inherited platform-scale obligations on a prototype-scale budget.

The trajectory is remarkably consistent. Ship an agent, wire up basic logging, call it supervised. Within months, evaluation suites, audit infrastructure, model migration cycles, and governance layers arrive uninvited. Engineering maintenance alone runs $3,000 to $6,000 monthly per mid-complexity agent. Development environments, with their clean data and cooperative inputs, never hinted at any of this.
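To see how quickly that maintenance figure compounds, here is a rough Python sketch that annualizes the $3,000 to $6,000 monthly range and scales it across a fleet. The function name and the fleet sizes are hypothetical inputs for illustration. The model deliberately leaves out evaluation, audit, and migration costs, so it gives a floor rather than an estimate of total operating cost.

```python
def annual_maintenance(
    n_agents: int, monthly_low: int = 3_000, monthly_high: int = 6_000
) -> tuple[int, int]:
    """Return the (low, high) yearly engineering-maintenance cost in USD.

    Defaults use the $3,000 to $6,000 monthly range per mid-complexity agent.
    Evaluation runs, audit tooling, and model migrations are NOT included,
    so the result is a floor on operating cost, not a total.
    """
    if n_agents < 0:
        raise ValueError("n_agents must be non-negative")
    months = 12
    return monthly_low * months * n_agents, monthly_high * months * n_agents


if __name__ == "__main__":
    # Fleet sizes are hypothetical, chosen only for illustration.
    for fleet in (1, 5, 20):
        low, high = annual_maintenance(fleet)
        print(f"{fleet:>2} agent(s): ${low:,} to ${high:,} per year")
```

One agent comes to $36,000 to $72,000 a year in maintenance alone. Twenty agents of similar complexity come to $720,000 to $1,440,000 before any of the other layers arrive.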

By the time the true operating cost surfaces, the project is already under executive scrutiny with no clean exit.

Eval gap: 89% of teams have agent observability running, but only 52% run systematic evaluations, so at least a third of teams can watch their agents act without ever validating that they act correctly.

Testing costs: Non-deterministic behavior means every prompt change triggers thousands of simulation reruns, pushing per-agent evaluation costs into the tens of thousands.

Model churn: Budget for one to two model migrations a year, each consuming up to two weeks of engineering time and restarting the full evaluation cycle.

Cognitive debt: Researchers now distinguish debt in code from debt in developers' minds, where fast-built agent systems quietly erode the team's shared understanding of how the system works.

Remediation market: Gartner anticipates specialized tools and consulting services emerging specifically to audit and refactor AI-generated technical debt at enterprise scale.
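The rerun burden follows from sampling. When the same input can produce different outputs, one passing run proves little, so each scenario has to be rerun many times and scored as a pass rate with an uncertainty band. The minimal Python sketch below is built on that idea. `make_flaky_agent` is a hypothetical stand-in for a real agent call, and the Wilson score interval is one common way to bound a pass rate, not the only one.

```python
import math
import random
from typing import Callable


def wilson_interval(passes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 gives roughly 95%)."""
    if trials == 0:
        return 0.0, 1.0  # no data: the pass rate could be anything
    p = passes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return max(0.0, centre - half), min(1.0, centre + half)


def evaluate(agent: Callable[[str], str], prompt: str, expected: str,
             trials: int) -> tuple[float, tuple[float, float]]:
    """Rerun one scenario `trials` times; return the pass rate and its interval."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    passes = sum(agent(prompt) == expected for _ in range(trials))
    return passes / trials, wilson_interval(passes, trials)


def make_flaky_agent(success_rate: float, seed: int) -> Callable[[str], str]:
    """HYPOTHETICAL stand-in for a real agent: ignores the prompt and answers
    "ok" with probability success_rate, otherwise "wrong"."""
    rng = random.Random(seed)

    def agent(prompt: str) -> str:
        return "ok" if rng.random() < success_rate else "wrong"

    return agent


if __name__ == "__main__":
    agent = make_flaky_agent(success_rate=0.9, seed=0)
    # The same scenario, scored with increasing numbers of reruns.
    for n in (10, 100, 1000):
        rate, (lo, hi) = evaluate(agent, "example scenario", "ok", n)
        print(f"{n:>4} runs: pass rate {rate:.2f}, interval [{lo:.2f}, {hi:.2f}]")
```

The interval's width shrinks roughly with the square root of the number of runs, so halving the uncertainty takes about four times as many reruns. A prompt change means collecting those samples all over again.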

Further Reading

State of AI Agents (LangChain, 2025 Survey): The gap between watching agents run and knowing they ran correctly, quantified across hundreds of teams.

Why AI Agent Pilots Fail in Production (Composio): A detailed anatomy of integration brittleness and the ongoing engineering tax it creates.

Quick links

AI Agents in 2026: From Hype to Enterprise Reality (Kore.ai)

Browser Agent Reliability: Benchmarks, Hype Gaps, and Real Task Performance (SoftwareSeni)

Crafting Raises $5.5M to Build Testing Infrastructure for AI-Driven Engineering

Indirect Prompt Injection Observed in the Wild (Palo Alto Networks Unit 42)

Past Articles

Reading an Architecture as an Argument About Failure

During their January 2026 Launch Week, Skyvern shipped a feature that lets users upload a PDF of a human standard operat...

The Math That Quietly Decides Which Agent Workflows Survive

A single step succeeds 95% of the time. Chain twenty of those steps together and the workflow completes 36% of the time....

What Browser Use Found When They Stopped Looking at Screenshots

Most browser agent frameworks begin by taking a screenshot. Feed pixels to a model, ask it where to click. Magnus Müller...

The Next Translation Layer

The IRS's Individual Master File has been running since 1961. It was supposed to be replaced decades ago. Instead, layer...