Practitioners describe the failure modes of multi-agent coding in matter-of-fact terms. Agent A refactors a function signature. Agent B, still working from the original state, calls the old version in new code. The error surfaces at merge time, far from its cause. Silent overwrites, divergent git histories, state inconsistencies that only become visible when two reasonable-looking outputs collide. Both agents did exactly what they were told. Nobody scoped the tasks to be independent.
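A minimal sketch of that collision, with every name invented for illustration: each agent's change is locally correct, the merge is textually clean, and the breakage only appears when the combined code runs.

```python
from dataclasses import dataclass

@dataclass
class Order:
    price: float
    discount_rate: float

# Baseline both agents started from.
def apply_discount(price, rate):
    return price * (1 - rate)

# Agent A's branch: refactors the signature to take the whole Order.
def apply_discount(order):  # redefinition stands in for A's merged change
    return order.price * (1 - order.discount_rate)

# Agent B's branch: new checkout code, written against the signature B
# last saw. Against B's stale copy of the repo, this is correct.
def checkout(order):
    return apply_discount(order.price, order.discount_rate)

# The edits touch different regions, so the merge applies cleanly.
# The failure only surfaces when the merged code runs, far from its cause.
if __name__ == "__main__":
    try:
        checkout(Order(price=100.0, discount_rate=0.15))
    except TypeError as exc:
        print(f"surfaced at merge time: {exc}")
```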
The upstream problem in every case is decomposition. Someone decided how to break the work apart, and the seams were wrong.
A Google DeepMind and MIT study of 180 agent configurations put numbers to this. Multi-agent coordination improved performance by up to 80% on tasks that decompose cleanly into parallel subtasks. On sequential tasks, it degraded performance by up to 70%. Independent agents working without coordination amplified errors 17.2 times. Even centralized orchestration only contained the amplification to 4.4 times. The difference between these outcomes is almost entirely a function of how the work was divided.
Frederick Taylor, stopwatch in hand on the factory floor in 1911, was doing a version of this. He watched men load pig iron, broke their labor into discrete elements, and reassembled the elements more efficiently. At Ford, the same logic cut Model T chassis assembly from over twelve hours to ninety-three minutes. But Taylor had an advantage the modern decomposer lacks: he could observe the complete process before dividing it. He watched, timed, understood, then divided. The developer spawning subagents today decomposes work they haven't fully done themselves, into tasks whose interactions they can't fully predict. Each subagent runs in its own context window and reports back summaries. The quality of those summaries depends on whether the original scoping was right. And knowing whether the scoping was right requires the kind of domain knowledge that comes from having done the work by hand, repeatedly, over years.
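What that loop looks like reduced to its skeleton, as a hedged sketch rather than any real framework's API: run_subagent() here is a hypothetical stand-in for whatever actually spawns an agent, and the only thing the orchestrator ever sees is the summary string each subagent chooses to return.

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task: str) -> str:
    # Hypothetical stand-in: each call gets a fresh, isolated context.
    # It sees only its own task description, never the other agents' work,
    # and all that comes back is the summary it composes.
    return f"done: {task!r} (a summary, not the work itself)"

def orchestrate(tasks: list[str]) -> list[str]:
    # The mechanical part: fan out, collect summaries.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_subagent, tasks))

# The part that determines whether the summaries compose into working
# software happens before any of this executes: how `tasks` was divided.
summaries = orchestrate([
    "refactor the pricing module",
    "add discount handling to checkout",
])
print(summaries)
```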
Here is where the problem folds in on itself. Decomposition demands deep familiarity with the territory being divided, and agents are commoditizing that familiarity faster than new practitioners can accumulate it. The DeepMind study found a 45% accuracy threshold above which adding more agents yields diminishing or negative returns. As base models improve, fewer tasks will sit in the zone where careful human decomposition adds value. The skill may be most critical during exactly the period when it's hardest to develop, because the operational work that would build the intuition is the first thing to be automated away. And once the work is divided, the question of how to supervise what you've set in motion is its own problem, one that may prove equally nameless.
Taylor's stopwatch measured seconds. The modern decomposer is measuring something closer to coherence. Nobody has built the instrument for that yet.

