Practitioner's Corner
Agent systems are being measured by autonomous completion. The actual value might live in what they prepare for the human who finishes the job.

Practitioner's Corner
Agent systems are being measured by autonomous completion. The actual value might live in what they prepare for the human who finishes the job.

The Problem That Takes Twenty-Two Months to Start Solving

A startup spent twenty-two months building infrastructure before it had a product to sell. At pre-seed, that tempo looks like a mistake. But a study tracking 151 million IDE window activations found something uncomfortable: developers using AI tools were context-switching *more* over time, not less. Seventy-four percent didn't notice the increase.
Most AI tooling treats professional context as a retrieval problem. Find the right document, surface the right snippet. For most knowledge work, that's close enough. For developers, it misses something about where meaning actually lives. The twenty-two months start to make sense once you see what that something is.
The Problem That Takes Twenty-Two Months to Start Solving
A startup spent twenty-two months building infrastructure before it had a product to sell. At pre-seed, that tempo looks like a mistake. But a study tracking 151 million IDE window activations found something uncomfortable: developers using AI tools were context-switching *more* over time, not less. Seventy-four percent didn't notice the increase.
Most AI tooling treats professional context as a retrieval problem. Find the right document, surface the right snippet. For most knowledge work, that's close enough. For developers, it misses something about where meaning actually lives. The twenty-two months start to make sense once you see what that something is.

The Handoff Is the Product

On the benchmarks that matter most to the agent industry, a browser agent that completes 90% of a task scores identically to one that never starts. Both register as zero. The measurement is binary: full completion or nothing. And the best systems currently finish about 62% of their runs.
Meanwhile, research shows nearly half of agentic workflows end before five steps, a human stepping in to carry the work forward. That transition point is already the most common interaction pattern in production, and almost universally undesigned.

The Handoff Is the Product
On the benchmarks that matter most to the agent industry, a browser agent that completes 90% of a task scores identically to one that never starts. Both register as zero. The measurement is binary: full completion or nothing. And the best systems currently finish about 62% of their runs.
Meanwhile, research shows nearly half of agentic workflows end before five steps, a human stepping in to carry the work forward. That transition point is already the most common interaction pattern in production, and almost universally undesigned.

A Middleware Engineer on Why the Agent's Best Work Happens at the Handoff
CONTINUE READINGProtocol-Level Handoffs
The 2026 MCP roadmap, published March 9 by Lead Maintainer David Soria Parra, introduces the Tasks primitive (SEP-1686) as an experimental feature giving long-running operations a proper lifecycle: working, completed, failed, cancelled, and input_required, a first-class state for when a task can't finish autonomously.
Production use has already found the seams. Retry semantics are undefined because requestors can't know the task ID before receiving the initial response. Lose that response, and there's no safe retry without risking duplicates. Result expiry is server-defined with no standard policy. These are handoff design problems living at the protocol level, and they surface precisely because the protocol now has enough structure to expose them.
Further Reading




Past Articles

A single step succeeds 95% of the time. Chain twenty of those steps together and the workflow completes 36% of the time....

United Nations translators don't eliminate language barriers—they make collaboration possible despite them. Enterprise s...

An agent delegates a task to another system and waits for a response that might never come. The network didn't fail. The...

A login button moves from top-right to bottom-left during a site redesign. The HTML is identical—same element, same attr...


