Ask an enterprise team running agent workflows for the approval rate on their human-in-the-loop controls. Most have never checked. One management consultancy reports that the figure runs above 95% across every sales organization they've worked with. Nearly everything gets waved through.
That number is diagnostic, though of the institution rather than the agents or the reviewers. It reveals how the checkpoint got placed where it sits.
Trace how these gates originate and you find a pattern so consistent it's almost procedural. A compliance or legal team identifies a risk surface in an agent workflow. They flag it. Engineering inserts an approval gate. This is the responsible thing to do. Then: nothing. Nobody revisits whether the gate sits at a moment where human judgment would actually change the outcome. Nobody asks whether the person reviewing has the context or time to exercise real discretion. The gate exists. It fires. It gets logged.
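In code, the pattern is about as blunt as it sounds. Here's a minimal sketch, with hypothetical names throughout, of what a flagged step tends to become: a gate that stops everything, records a decision, and never asks whether the decision mattered.

```python
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("audit")

@dataclass
class Action:
    kind: str      # e.g. "send_quote", "update_crm"
    payload: dict

@dataclass
class ReflexiveGate:
    """Inserted where the risk was flagged, not where judgment helps."""
    decisions: list = field(default_factory=list)

    def review(self, action: Action) -> bool:
        # In production this blocks on a human queue. Here the reviewer
        # behaves the way the approval-rate data says reviewers behave.
        approved = True
        self.decisions.append(approved)
        audit.info("gate fired: %s approved=%s", action.kind, approved)
        return approved  # the gate exists; it fires; it gets logged

gate = ReflexiveGate()
gate.review(Action("send_quote", {"amount": 1200}))
```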
The compliance requirement is satisfied.
And the gate persists for exactly this reason. The EU AI Act's Article 14 mandates that high-risk AI systems be designed for effective human oversight, but as legal scholars have observed, the regulation requires the capability to intervene without defining what makes intervention effective. SOC 2 auditors verify that controls exist and fire consistently. They don't evaluate whether the human at the control is exercising judgment or processing a queue. A 98% approval rate satisfies every framework it faces. No framework asks whether the rate should trouble anyone.
So the gate stays. And it compounds.
Stanford's study of 51 enterprise AI deployments found that escalation-based models, where agents handle routine work and humans intervene only at genuine exceptions, delivered median productivity gains of 71%. Full-approval models delivered roughly 30%. That 41-point gap is almost entirely a function of where the checkpoint sits.
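The study doesn't publish its implementations, so the structural difference is easiest to see as a sketch. Everything below, the risk scorer and the threshold included, is hypothetical; the point is only where the human sits in each loop.

```python
from typing import Callable

Action = dict  # hypothetical: whatever your workflow passes around

def full_approval(
    actions: list[Action],
    review: Callable[[Action], bool],
) -> list[Action]:
    # Every action waits on a human, routine or not. Throughput is
    # capped by the reviewer, and the reviewer mostly says yes.
    return [a for a in actions if review(a)]

def escalation_based(
    actions: list[Action],
    review: Callable[[Action], bool],
    risk: Callable[[Action], float],
    threshold: float = 0.8,  # hypothetical; tuned per workflow
) -> list[Action]:
    completed = []
    for a in actions:
        if risk(a) < threshold:
            completed.append(a)   # agent handles the routine work
        elif review(a):           # humans see only genuine exceptions
            completed.append(a)
    return completed
```

Where the threshold sits is the whole game: set it from measured outcomes, not from wherever anxiety happens to concentrate.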
But the productivity gap is only the visible cost. The less visible one is what the checkpoint does to the people inside it. Research on automation bias shows that when professionals override their own judgment to agree with an automated recommendation, they make worse decisions in the majority of cases. The checkpoint erodes the capacity it was designed to preserve.
Reviewers learn that approving is almost always correct, because the gate was never placed where the hard calls live. Approval rates climb. The checkpoint looks increasingly like it's working.
Replacing a reflexive gate with a deliberate one is structurally harder than anyone expects. Relocating a checkpoint means someone has to identify where the irreversible moments in the workflow actually are, which requires understanding the process at a depth the compliance team who placed the original gate never had. That knowledge lives with operators, not auditors. And removing a gate, even temporarily, creates a documented gap in oversight. In a regulatory environment that measures the existence of controls rather than their effectiveness, a missing gate is more legible than a misplaced one.
Deliberate checkpoints sit at moments of irreversibility, where a human decision changes the trajectory. Reflexive checkpoints sit wherever institutional anxiety concentrates. One produces governance. The other produces a record that governance occurred. Most organizations, if they checked their approval rates, would discover which one they built. And what comes after that discovery is genuinely unclear, because knowing the gate is in the wrong place doesn't help you find the right one.
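One starting point, though not a solution, is to make the placement criterion explicit. A sketch, under the assumption that actions can be classified by reversibility at all; filling in that table is exactly the operator-level knowledge the original gate placement never drew on.

```python
from enum import Enum

class Reversibility(Enum):
    REVERSIBLE = "reversible"      # can be undone: draft, stage, propose
    IRREVERSIBLE = "irreversible"  # changes the trajectory: send, sign, pay

# Hypothetical table; the entries are illustrative, not prescriptive.
REVERSIBILITY = {
    "draft_email":  Reversibility.REVERSIBLE,
    "send_email":   Reversibility.IRREVERSIBLE,
    "stage_refund": Reversibility.REVERSIBLE,
    "issue_refund": Reversibility.IRREVERSIBLE,
}

def needs_human(action_kind: str) -> bool:
    # Gate only where a human decision changes the outcome, and default
    # unknown actions to gated rather than waved through.
    return REVERSIBILITY.get(
        action_kind, Reversibility.IRREVERSIBLE
    ) is Reversibility.IRREVERSIBLE
```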
Things to follow up on...
- Automation bias under pressure: A systematic review of 35 studies on automation bias in human-AI collaboration found that time pressure amplifies the severity of over-reliance on automated recommendations, even when it doesn't increase its frequency.
- The escalation rate benchmark: Production engineering guidance from Mavik Labs targets a 10-15% escalation rate as the sustainable operating point for agent workflows, implying any checkpoint with an 85%+ pass-through rate is structurally misplaced (a minimal version of that check is sketched after this list).
- Article 14's effectiveness gap: Legal scholar Melanie Fink's analysis of EU AI Act human oversight provisions argues that Article 14's success depends on implementation that acknowledges cognitive constraints and automation bias, rather than treating human presence as a standalone safeguard.
- Silent failures behind the gate: A taxonomy drawn from 591 documented agent incidents found that 88% of classifiable failures trace to infrastructure gaps like missing guardrails and monitoring, not model quality — the kind of failures approval checkpoints aren't designed to catch.
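On the escalation benchmark: a minimal health check, assuming you log one boolean per gate firing. The 85% cutoff is carried over from the Mavik Labs note above, not a standard.

```python
def gate_health(decisions: list[bool], max_pass_through: float = 0.85) -> str:
    """decisions: one entry per gate firing, True = approved."""
    if not decisions:
        return "unmeasured: the gate has never been checked"
    rate = sum(decisions) / len(decisions)
    if rate > max_pass_through:
        return f"approval rate {rate:.0%}: likely reflexive; ask where the hard calls live"
    return f"approval rate {rate:.0%}: reviewers appear to be deciding, not processing"
```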

