In a LangGraph config file, a practitioner stares at a step called "extract pricing data from competitor pages" and has to decide: is this execution or planning? Execution gets routed to a cheap model. Planning gets a frontier model. The cost architecture of the entire system rides on this classification. And no framework documentation draws a formal boundary between the two. The LangChain blog calls execution "tool invocations" and planning "thinking about the whole trajectory." In practice, extracting pricing data might require judgment about what counts as a comparable product. That's reasoning. Or it might not be. You won't know until the step runs.
Plan-and-Execute works like this: a planner generates a multi-step strategy using a frontier model, then executors carry out each step with something cheaper. One expensive call to reason, many cheap calls to act.
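The shape of that split can be sketched in a few lines. This is a minimal illustration, not the LangGraph implementation: `frontier_model` and `cheap_model` are stand-ins for real LLM clients, and the plan format is assumed.

```python
# Minimal Plan-and-Execute skeleton. Model calls are stubbed; in a real
# system `frontier_model` and `cheap_model` would be LLM clients.
from dataclasses import dataclass, field

@dataclass
class Plan:
    steps: list[str]
    results: list[str] = field(default_factory=list)

def frontier_model(prompt: str) -> str:
    # Stub: a real planner would return a numbered strategy from an LLM.
    return "1. gather sources\n2. extract figures\n3. summarize"

def cheap_model(prompt: str) -> str:
    # Stub: a real executor would call a small model with tool access.
    return f"done: {step_prompt}" if (step_prompt := prompt) else ""

def plan(task: str) -> Plan:
    # One expensive call to reason about the whole trajectory.
    raw = frontier_model(f"Plan the steps to accomplish: {task}")
    return Plan(steps=[line.split(". ", 1)[1] for line in raw.splitlines()])

def execute(p: Plan) -> Plan:
    # Many cheap calls to act, one per step.
    for step in p.steps:
        p.results.append(cheap_model(step))
    return p

result = execute(plan("compare competitor pricing"))
```

The economics only hold if `execute` never has to call back into the planner.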
The LangGraph reference implementation reveals how quickly that logic bends. It wires a replanning node to the same frontier model as the planner. After each execution step, the replanner inspects results and decides whether the plan still holds. This is reasoning at frontier pricing, and the architecture assumes it's rare. The same tutorial documents what happens when you economize by using a smaller planning model: frequent replanning. The savings invert.
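The inversion is easy to see in back-of-envelope arithmetic. The prices and token counts below are illustrative assumptions, not any vendor's actual rates:

```python
# How replanning frequency erodes the one-expensive-call economics.
# All numbers are illustrative assumptions.
FRONTIER_COST = 10.0     # $ per 1M tokens (assumed)
CHEAP_COST = 0.5         # $ per 1M tokens (assumed)
TOKENS_PER_CALL = 2_000  # tokens per call (assumed flat for simplicity)

def run_cost(steps: int, replans: int) -> float:
    """Dollar cost of one planning call, `steps` executor calls,
    and `replans` frontier-priced replanning calls."""
    frontier_calls = 1 + replans   # replanner runs on the frontier model
    per_token = TOKENS_PER_CALL / 1_000_000
    return (frontier_calls * FRONTIER_COST + steps * CHEAP_COST) * per_token

baseline = run_cost(steps=10, replans=0)  # plan once, never replan
chatty   = run_cost(steps=10, replans=8)  # weak planner, replan most steps
```

Under these assumptions the chatty run costs more than six times the baseline: the frontier-priced replanner, not the cheap executors, dominates spend.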
Per-agent context scoping attacks a different dimension entirely: information volume. Each agent sees only what it needs. Google's ADK documentation recommends scoping by default. LangChain's multi-agent docs show isolated subagents processing 67% fewer tokens than a shared-context approach on parallel, domain-separable tasks. That 67% is real, but it depends entirely on the tasks being cleanly separable.
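Mechanically, scoping is a partition decision made by the parent. A toy sketch, with a crude word-count token heuristic and made-up domain content standing in for real context:

```python
# Per-agent context scoping: the parent partitions shared context by
# domain and each child sees only its slice. Names and the token
# heuristic are illustrative, not any framework's API.
def tokens(text: str) -> int:
    # Crude estimate: ~1 token per whitespace-delimited word.
    return len(text.split())

shared_context = {
    "pricing":  "competitor A lists the widget at $40; competitor B at $35",
    "legal":    "terms of service forbid scraping pages behind login walls",
    "branding": "tone guide: plain language, no superlatives",
}

def scoped_context(domains: list[str]) -> str:
    # A child agent receives only the domains the parent assigns it.
    return "\n".join(shared_context[d] for d in domains)

full = "\n".join(shared_context.values())
pricing_only = scoped_context(["pricing"])
savings = 1 - tokens(pricing_only) / tokens(full)
```

The savings are real only when the parent's partition matches what the child turns out to need; a pricing step that quietly depends on the legal slice is exactly the starvation case below.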
When they aren't, the parent agent must decide what context each child needs. That decision is itself a context engineering problem. Get it wrong and the child starves. Production teams have documented downstream agents receiving 60% of the original intent after summarization strips away what turned out to matter.
Plan-and-Execute routes across time. Context scoping routes across information. Nobody in the public literature seems to be writing about what happens when they interact.
You scope an executor's context to save tokens. The executor, running a cheap model with limited visibility, hits an ambiguous step it can't resolve. It fails.
Now the replanning loop fires, re-engaging the frontier model. The replanner consumes the accumulated step history from the failed execution. Its context window grows with each cycle.
So an architecture designed to minimize token spend starts burning frontier-model tokens on replanning passes triggered by information starvation in the execution layer. The two savings mechanisms are feeding each other's failure mode.
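A toy cost model makes the interaction concrete. Each starvation-triggered failure fires a frontier-priced replan whose prompt includes the accumulated step history, so replan cost grows each cycle. Every number here is an illustrative assumption:

```python
# Toy model of scoping failures feeding the replanning loop.
# All rates and token counts are illustrative assumptions.
FRONTIER = 10.0 / 1_000_000  # $ per token (assumed)
CHEAP    = 0.5 / 1_000_000   # $ per token (assumed)
STEP_TOKENS = 2_000          # tokens per executor step (assumed)
HISTORY_PER_STEP = 1_500     # step-history tokens carried into replans

def total_cost(steps: int, failures: int) -> float:
    cost = STEP_TOKENS * FRONTIER        # initial planning call
    cost += steps * STEP_TOKENS * CHEAP  # scoped, cheap execution
    history = 0
    for _ in range(failures):
        # Each replan re-reads everything executed so far, so its
        # context (and price) grows with every cycle.
        history += HISTORY_PER_STEP * (steps // max(failures, 1))
        cost += (STEP_TOKENS + history) * FRONTIER
    return cost

smooth  = total_cost(steps=12, failures=0)
starved = total_cost(steps=12, failures=4)
```

Under these assumptions, four starvation failures in a twelve-step run make the "cheap" architecture cost more than an order of magnitude above the smooth run, with the growing replan context doing most of the damage.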
The conditions that trigger this are precisely the ones hardest to predict at design time: ambiguity that surfaces only during execution, cross-domain dependencies that scope isolation can't accommodate, steps that look like extraction but turn out to require judgment.
The 90% cost reduction circulating in practitioner guides traces to a single blog post with no methodology. More granular accounts put the range at 30–60%, contingent on how cleanly tasks decompose.
The honest range is probably 30–60% on well-structured workflows, with a long tail of cases where savings erode or reverse. Still meaningful. Just not the number anyone quotes.
The fragility is invisible until the replanning loop starts running.
Things to follow up on...
- Replanning frequency in production: No public dataset documents how often the Execute→Replan→Execute loop fires across different task types, a gap that makes cost projections speculative; this LangGraph tutorial is the closest thing to empirical guidance on when replanning cascades.
- Context engineering as discipline: Jeremy Daly's February 2026 post on context engineering for commercial agent systems provides the sharpest practitioner account of how scoped delegation fails at scale and why the parent's context-packaging decision is itself an unsolved design problem.
- Gartner's 40% cancellation warning: The projection that over 40% of agentic AI projects will be canceled by end of 2027 due to escalating costs and inadequate risk controls suggests that the cost architecture fragility described here is already showing up in enterprise retention data.
- Cascading hallucination across boundaries: A documented failure mode where one agent's hallucinated policy propagates as fact through scoped handoffs is detailed in this multi-agent orchestration analysis, illustrating how scope isolation can amplify rather than contain errors.

