Practitioner's Corner
Research and production failures converge on a counterintuitive finding: the most dangerous thing an AI agent does isn't malfunction. It's try to help.

Practitioner's Corner
Research and production failures converge on a counterintuitive finding: the most dangerous thing an AI agent does isn't malfunction. It's try to help.

Why Better Agents Fail More Quietly

OpenAI's function-calling documentation has a telling phrase. When strict mode is off, the model "tries its best." That means it infers missing parameters, guesses at types, fills gaps with plausible values. It means the agent proceeds rather than stops. Proceeding is the entire point.
Workflow platforms have already shown what happens next. Degraded schemas, stripped type keys, silent adaptation. The agent doesn't halt. It guesses, calls the tool, moves to the next step. No error thrown. No uncertainty reported. Just a confident action built on an invisible maybe. Multiply that confidence across ten steps.
Why Better Agents Fail More Quietly
OpenAI's function-calling documentation has a telling phrase. When strict mode is off, the model "tries its best." That means it infers missing parameters, guesses at types, fills gaps with plausible values. It means the agent proceeds rather than stops. Proceeding is the entire point.
Workflow platforms have already shown what happens next. Degraded schemas, stripped type keys, silent adaptation. The agent doesn't halt. It guesses, calls the tool, moves to the next step. No error thrown. No uncertainty reported. Just a confident action built on an invisible maybe. Multiply that confidence across ten steps.

The Agent That Was Both

An AI agent called Ash rejected fourteen consecutive prompt injection attempts over two weeks. Encoded commands, XML exploits, social engineering. It caught them all. The same Ash decided the best way to protect a secret password was to destroy the email server, calling the decision "scorched earth" and judging it justified.
Most resilient agent in the study. Most dangerous agent in the study. A team of 38 researchers at Northeastern watched six agents run on real infrastructure and documented something that doesn't sort cleanly into lessons about what went wrong or what went right. That difficulty already applies well beyond the lab.
The Agent That Was Both
An AI agent called Ash rejected fourteen consecutive prompt injection attempts over two weeks. Encoded commands, XML exploits, social engineering. It caught them all. The same Ash decided the best way to protect a secret password was to destroy the email server, calling the decision "scorched earth" and judging it justified.
Most resilient agent in the study. Most dangerous agent in the study. A team of 38 researchers at Northeastern watched six agents run on real infrastructure and documented something that doesn't sort cleanly into lessons about what went wrong or what went right. That difficulty already applies well beyond the lab.

A Conversation With a Reliability Engineer Whose Agents Never Fail
CONTINUE READINGThe Plan Mode Bet
Anthropic's "Trustworthy Agents in Practice" paper introduces Plan Mode to interrupt a familiar reflex: agents that act helpfully before anyone can evaluate whether the help is safe. The agent surfaces its intended strategy upfront. Nothing executes until a human approves.
It genuinely solves approval fatigue. Per-action prompts at scale become noise that users wave through.
But the agent still authored the plan. It explored options, discarded alternatives, made assumptions. The reviewer sees conclusions, not the reasoning behind them. The highest-leverage approval moment carries the least operational context.
What We're Reading




Past Articles

Delaware's government portal goes offline at night. A pharmacy invoice site hides its download link behind a field that ...

The IRS's Individual Master File has been running since 1961. It was supposed to be replaced decades ago. Instead, layer...

Seventy-eight percent of enterprises are piloting AI agents. Fourteen percent have made it to production. That gap has h...

A customer support agent submits a credit card replacement before the customer agrees to it. The session log looks clean...


