A pilot runs smoothly at ten concurrent sessions. Someone asks: "Can we try a hundred?" The team spins up more instances, traffic starts flowing, and within minutes the system stops responding. Not gradually. It just stops.
The failure itself isn't surprising. What's revealing: nothing changed except volume. Same code, same logic, same infrastructure that worked perfectly at ten sessions. You weren't operating a system designed for scale. You were operating in conditions where scale problems simply didn't exist yet.
The Cascade Nobody Saw Coming
Building web agents, we keep encountering the same cascading failure with surprising consistency. At pilot scale, a slow response is just a slow response. Someone notices, maybe restarts something, moves on. At production scale, that same slowness triggers a feedback loop (a toy model after this list shows how it runs away):
- Requests pile up on the slow component
- Timeouts trigger retries
- The retries add more load
- The system can't recover because new requests keep arriving faster than it can process the backlog
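The numbers below are invented, but the arithmetic shows the shape of the problem: once offered load exceeds capacity, every timed-out request re-enters the stream as retries, and the backlog starts feeding itself.

```python
# Toy model of retry amplification. All numbers are illustrative,
# not measurements from a real system.

base_rps = 100       # steady stream of new requests per second
capacity_rps = 80    # what the slowed component can actually serve
max_retries = 3      # naive client policy: retry every timeout, no backoff

rps = float(base_rps)
for step in range(5):
    # The fraction of requests beyond capacity times out...
    timeout_fraction = max(0.0, (rps - capacity_rps) / rps)
    # ...and each timed-out request comes back as retries, stacked on
    # top of the new requests that never stop arriving.
    rps = base_rps + rps * timeout_fraction * max_retries
    print(f"step {step}: offered load ~{rps:.0f} rps "
          f"({timeout_fraction:.0%} timed out)")
```

Within a few iterations the retry traffic dwarfs the real traffic, which is why the system can't recover even if base load never grows.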
Pilot environments succeed partly because they never accumulate enough concurrent load to trigger cascades. The infrastructure gaps aren't visible because the conditions that expose them don't exist yet.
Session management shows this clearly. Pilots often use sticky sessions because it's simpler: the same client connects to the same server repeatedly. That works fine at small scale. But multiply the pattern across thousands of connections and a single overwhelmed server can't shed its load; clients keep trying to reconnect to it even as it fails, and the system can't rebalance traffic effectively. A convenience becomes a constraint.
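A minimal sketch of the trap, with invented server names and hash-based affinity standing in for whatever stickiness mechanism the load balancer actually uses:

```python
import random

SERVERS = ["web-1", "web-2", "web-3"]
healthy = {s: True for s in SERVERS}

def sticky_route(client_id: str) -> str:
    # Affinity by client hash: the same client lands on the same server
    # every time within this process, even after that server fails.
    return SERVERS[hash(client_id) % len(SERVERS)]

def load_aware_route(client_id: str) -> str:
    # The production alternative: skip unhealthy servers, trading
    # session affinity for the ability to rebalance traffic.
    candidates = [s for s in SERVERS if healthy[s]] or SERVERS
    return random.choice(candidates)

healthy["web-2"] = False  # web-2 is overwhelmed
for client in ("alice", "bob", "carol"):
    print(client, "| sticky ->", sticky_route(client),
          "| load-aware ->", load_aware_route(client))
```

Under the sticky scheme, every client hashed to web-2 keeps hammering web-2; the load-aware router simply stops sending anyone there.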
Rate limiting follows similar logic. At pilot scale, you hit an API endpoint fifty times an hour and nothing happens. At production scale, you're hitting it five thousand times, and suddenly you're triggering anti-bot defenses that never surfaced in the pilot environment. The website sees your traffic pattern and starts serving CAPTCHAs or blocking requests entirely. The pilot never revealed this constraint because it never generated enough volume to look suspicious.
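One mitigation is to throttle yourself before the site does. Below is a minimal client-side token bucket, sketched under the assumption that roughly two requests per second stays under the threshold; the real limit is whatever the target site enforces, and you'd have to discover it empirically.

```python
import time

class TokenBucket:
    """Blocks callers so sustained throughput never exceeds rate_per_sec."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def acquire(self) -> None:
        # Refill tokens based on elapsed time, then block until one is free.
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)

bucket = TokenBucket(rate_per_sec=2, burst=5)
for i in range(10):
    bucket.acquire()  # throttles the loop to ~2 requests/sec after the burst
    print(f"request {i} at t={time.monotonic():.2f}")
```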
The Infrastructure That Wasn't Built
Pilots don't use inferior technology. They operate without infrastructure that only becomes necessary at scale.
Monitoring in pilots means checking logs when something seems wrong. In production, monitoring means tracking whether requests that return 200 status codes actually contain the data you expected, because websites can return "success" while serving you a login page instead of product data. You need to verify not just that requests completed, but that they completed correctly. That verification infrastructure doesn't exist in pilots because manual spot-checking is sufficient.
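A sketch of that check; the login markers and `EXPECTED_MARKER` are placeholders you'd replace with strings specific to the pages you actually fetch:

```python
import urllib.request

LOGIN_MARKERS = ('<form id="login"', "Sign in to continue")
EXPECTED_MARKER = "product-price"  # something only the real page contains

def fetch_and_verify(url: str) -> str:
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    # A 200 can still be a login wall, an interstitial, or a CAPTCHA page.
    if any(marker in body for marker in LOGIN_MARKERS):
        raise RuntimeError("200 OK, but the site served a login page")
    if EXPECTED_MARKER not in body:
        raise RuntimeError("200 OK, but the expected content is missing")
    return body
```

Wiring checks like this into metrics (counting semantic failures, not just HTTP errors) is the difference between pilot and production monitoring.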
Monitoring is only the start. The missing layer also includes (two pieces are sketched after this list):
- Orchestration layers that coordinate tasks across distributed components
- Fault tolerance systems that keep applications available when individual pieces fail
- Circuit breakers to prevent cascades
- Retry logic with exponential backoff
- Queue management that prevents resource exhaustion
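Two items from that list, sketched together: a simple circuit breaker that fails fast while a dependency is struggling, and retry logic with exponential backoff and jitter. The thresholds and timings are illustrative.

```python
import random
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0):
        self.failures = 0
        self.threshold = failure_threshold
        self.reset_after = reset_after
        self.opened_at = 0.0

    def allow(self) -> bool:
        # While open, reject calls outright so a struggling dependency
        # isn't hammered; after the cool-off, let a probe through.
        if self.failures >= self.threshold:
            if time.monotonic() - self.opened_at < self.reset_after:
                return False
            self.failures = 0  # simplified half-open state
        return True

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            return
        self.failures += 1
        if self.failures == self.threshold:
            self.opened_at = time.monotonic()

def call_with_backoff(fn, breaker: CircuitBreaker, max_attempts: int = 4):
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
            breaker.record(success=True)
            return result
        except Exception:
            breaker.record(success=False)
            # Exponential backoff with jitter: each retry waits longer,
            # and jitter keeps a fleet of clients from retrying in sync.
            time.sleep((2 ** attempt) * 0.5 * (0.5 + random.random()))
    raise RuntimeError("retries exhausted")
```

The point isn't these particular defaults; it's that retries must get slower, not denser, as failures accumulate.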
None of this exists in pilots because none of it is required. You can manually restart a failed session, check logs by hand, and route around problems because you know exactly what's running and where. Production removes that option.
What Production Actually Demands
Production-ready web automation treats failure as a normal operating condition. What happens when 3% of your sessions hit a CAPTCHA simultaneously? The pilot never forced you to answer. Production demands an answer every day.
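One hypothetical shape of an answer, with invented names and a simulated 3% hit rate: detect the CAPTCHA, pull the session out of rotation rather than retrying it, and let the rest of the fleet keep working.

```python
from collections import deque

active: deque = deque(f"session-{i}" for i in range(100))
quarantined: list = []

def handle_result(session: str, hit_captcha: bool) -> None:
    if hit_captcha:
        # Don't retry: immediate retries confirm to the site that the
        # traffic is automated. Park the session for separate handling.
        quarantined.append(session)
    else:
        active.append(session)  # healthy, back into rotation

# Simulate one sweep where roughly 3% of sessions hit a CAPTCHA.
for _ in range(len(active)):
    session = active.popleft()
    handle_result(session, hit_captcha=(hash(session) % 100 < 3))

print(f"{len(quarantined)} quarantined, {len(active)} back in rotation")
```

Whether quarantined sessions go to a human queue, a solver, or a cooldown pool is a policy decision; the infrastructure requirement is that the answer happens automatically.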
The pilot succeeded precisely by avoiding these questions. You can't just run more copies of what worked at pilot scale; you need the infrastructure that makes reliability possible when you can't manually intervene.

