The MCP roadmap reads like something you've read before. Stateful sessions fighting load balancers. No standard service discovery. Horizontal scaling requiring workarounds. The roadmap's own language: evolve the transport to "run statelessly across multiple server instances," define how "sessions are created, resumed, and migrated so that server restarts and scale-out events are transparent to connected clients."
If you lived through the microservices migration, this vocabulary is muscle memory. And the ecosystem has noticed. Google's developer blog calls multi-agent systems "the AI equivalent of a microservices architecture." Gartner logged a 1,445% surge in multi-agent inquiries between Q1 2024 and Q2 2025. Decompose monolithic agents into specialized services, orchestrate them, observe the traffic, scale horizontally. Familiar playbook.
The analogy holds further than you'd expect. Take MCP's proposed Server Cards: structured metadata exposed via .well-known URLs so registries and crawlers can discover what a server does without connecting to it. That's service discovery. Same problem DNS-based discovery solved for microservices, same reason you needed a way for one service to learn another's capabilities before calling it. The enterprise gaps the roadmap names (audit trails, gateway behavior, SSO-integrated auth) are the same gaps every service mesh had to close. Real pattern transfer, down to the implementation details.
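The discovery flow the Server Cards proposal describes can be sketched in a few lines. The proposal has no finalized schema, so the card shape, field names, and the `discover` helper below are all illustrative assumptions, not the MCP specification:

```python
import json

# Hypothetical server card, as it might be served from a .well-known URL.
# Every field name here is an assumption for illustration; the MCP
# proposal does not yet pin down a schema.
CARD_JSON = """
{
  "name": "example-weather-server",
  "description": "Forecast lookups by city",
  "capabilities": {"tools": ["get_forecast"], "resources": []}
}
"""

def discover(card_text: str) -> dict:
    """Parse a server card and report what the server offers,
    without ever opening a connection to the server itself --
    the same "learn capabilities before calling" move that
    DNS-based discovery enabled for microservices."""
    card = json.loads(card_text)
    return {
        "name": card["name"],
        "tools": card.get("capabilities", {}).get("tools", []),
    }
```

A registry or crawler would fetch the card over HTTP and call something like `discover` on the body; the point is that capability metadata is readable out-of-band, before any session exists.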
The familiar patterns stop working at a specific point, though, and the breaking point is more fundamental than scale.
A circuit breaker wraps a call, monitors for failures, and trips when failures cross a threshold. Five consecutive errors. Thirty percent of calls timing out. The entire mechanism depends on failure being legible: a service returned an error code or it didn't. A timeout was exceeded or it wasn't. The breaker counts enumerable events. It transitions through three clean states (closed, open, half-open), and recovery probes test whether the failing service is back. Every step assumes you can tell the difference between working and broken.
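The mechanism fits in a short sketch. The thresholds and state names below follow the description above; the class itself is a minimal illustration, not any particular library's API:

```python
import time

class CircuitBreaker:
    """Minimal sketch of the pattern: CLOSED (calls pass through),
    OPEN (calls rejected), HALF_OPEN (one probe tests recovery).
    Threshold and timeout values are illustrative."""

    def __init__(self, failure_threshold=5, recovery_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                self.state = "HALF_OPEN"  # allow one recovery probe
            else:
                raise RuntimeError("circuit open: call rejected")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            # Failure is legible: an exception was raised, so count it.
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = time.monotonic()
            raise
        # Success branch: the breaker only ever sees "no exception raised".
        # A plausible-but-wrong payload lands here and resets the count.
        self.failures = 0
        self.state = "CLOSED"
        return result
```

Note where the agent problem lives: the success branch. A well-formed, semantically wrong response raises nothing, resets the failure count, and keeps the circuit closed.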
An agent that produces a plausible, well-formatted, syntactically correct output that is semantically wrong breaks that assumption completely. Downstream agents consume it, reason on top of it, propagate the error. Researchers analyzing 1,642 multi-agent execution traces for a NeurIPS 2025 study put it precisely:
"Unlike in traditional software where failures often have clearly identifiable root causes, failures in MAS are frequently complex," involving "convoluted agent interactions and the compounding effects of individual model behaviors."
Failure rates across the seven systems they studied ranged from 41% to 86.7%.
The circuit breaker never trips because nothing looks broken.
The saga pattern hits the same wall. In distributed transactions, sagas recover through compensating actions. A payment gets reversed. A reservation gets cancelled. The whole model works because each step's effect is known and deterministically reversible. You charged $47.50, so you refund $47.50. The compensating transaction doesn't need to understand why the charge happened, just what happened. When the "transaction" is an agent's reasoning path, that clarity disappears. The output is a conclusion other agents already built on, and you can't un-reason a conclusion or write a compensating transaction for "interpreted the data creatively."
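The saga model's dependence on that clarity is visible in code. This is a generic sketch of forward steps paired with compensations, run in reverse on failure; the step names are invented for illustration:

```python
class Saga:
    """Sketch of saga compensation. The whole design rests on the
    assumption named above: each step's effect is known and
    deterministically reversible."""

    def __init__(self):
        self.steps = []  # list of (action, compensation) pairs

    def add_step(self, action, compensation):
        self.steps.append((action, compensation))

    def run(self):
        completed = []  # compensations for steps that succeeded
        for action, compensation in self.steps:
            try:
                action()
                completed.append(compensation)
            except Exception:
                # Unwind in reverse order. Each compensation undoes a
                # known effect (refund $47.50) without needing to know
                # why the original step ran.
                for comp in reversed(completed):
                    comp()
                raise
```

Every compensation here is a concrete inverse of a concrete effect. There is no slot in this structure for "un-reason a conclusion" because a reasoning step has no deterministic inverse to register.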
Temporal's engineering blog articulates the boundary cleanly:
"The orchestration of the LLM and tool calls is deterministic, the calls, the plan, the tools executed are completely non-deterministic."
The orchestration layer can be fully reliable. The content flowing through it cannot be verified by the same tools.
The reliability patterns that made microservices trustworthy all share a foundational assumption: failure produces a signal you can count.
In agent systems, the most consequential failures produce outputs indistinguishable from success without understanding what the output means. Session management, service discovery, horizontal scaling: those are translation problems, and the distributed systems community solved them already. The agent ecosystem should borrow that work freely. It should also be honest about what all those monitoring dashboards are watching over: systems full of green lights that are quietly, plausibly wrong.
Things to follow up on...
- MCP's enterprise gaps: The 2026 MCP roadmap names four unresolved enterprise challenges — audit trails, SSO-integrated auth, gateway and proxy behavior, and configuration portability — each of which maps to infrastructure problems the microservices ecosystem spent years closing.
- NIST enters agent identity: The National Institute of Standards and Technology launched an AI Agent Standards Initiative in February 2026, with listening sessions on sector-specific barriers beginning in April, signaling that voluntary industry norms for agent behavior may not remain the default.
- The 47% monitoring problem: A Gravitee.io survey of 919 executives found that only 47.1% of deployed AI agents are actively monitored, meaning more than half of agent fleets operate without the security oversight that would even surface the legibility problem this piece describes.
- Determinism is fragile everywhere: ACL 2025 research found that even LLMs configured to be deterministic showed accuracy variations of up to 15% across runs, with a gap of up to 70% between best and worst possible performance — confirming that non-determinism isn't a tuning problem but a structural one.

