
Practitioner's Corner
Lessons from the field—what we see building at scale

The API Exists. The Data Doesn't.

LinkedIn has a public API. Open any profile in your browser—work history, skills, recommendations, everything's there. Query the API for the same data? Name and current position. The rest requires partner status most companies will never get. The API exists. The data doesn't come through it. Every day, technical teams and business teams talk past each other about this gap, unable to explain why the "proper" solution doesn't work.
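What that gap looks like from code, as a minimal sketch: assume a standard OAuth 2.0 bearer token with only the basic member permissions most applications get. The token value is a placeholder, and what you can actually retrieve depends on which API products LinkedIn has enabled for your application.

import requests

# Placeholder token from LinkedIn's standard OAuth 2.0 flow, with only
# basic member permissions and no partner-program products enabled.
ACCESS_TOKEN = "..."

headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

# The public /v2/me endpoint returns the "lite" profile: an id and
# localized name fields. That is roughly the ceiling described above.
resp = requests.get("https://api.linkedin.com/v2/me", headers=headers)
resp.raise_for_status()
print(resp.json())

# Work history, skills, and recommendations sit behind partner-only
# access; without it, requests for those resources are simply denied.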

The Web Has Amnesia

You're one click from completing checkout. Payment details entered, shipping confirmed. Then the page refreshes and your cart is empty. You're logged out. Everything you were doing—gone.
That's the web working as designed, forgetting you between every interaction. What feels like staying logged in is actually invisible infrastructure constantly reconstructing who you are across systems that have no memory. Operating web agents at scale makes this amnesia operationally visible—thousands of authenticated sessions, each requiring constant proof of identity to systems that forget you exist. The illusion of continuity hides more complexity than most people realize.
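For a web agent, that reconstruction has to be done deliberately. A minimal sketch using Playwright's storage_state, with placeholder URLs and file paths: capture an authenticated session once, then rebuild it later instead of treating every run as a stranger.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()

    # First run: log in however the target site requires, then persist
    # the cookies and local storage that make the session "exist".
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://example.com/login")   # placeholder URL
    # ... perform login steps here ...
    context.storage_state(path="session.json")
    context.close()

    # Later run: rebuild the same identity from the saved state instead
    # of re-authenticating from scratch.
    restored = browser.new_context(storage_state="session.json")
    page = restored.new_page()
    page.goto("https://example.com/account")  # lands in the logged-in view
    browser.close()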

Rina Takahashi
Rina Takahashi, 37, former marketplace operations engineer turned enterprise AI writer. Built and maintained web-facing automations at scale for travel and e-commerce platforms. Now writes about reliable web agents, observability, and production-grade AI infrastructure at TinyFish.
Theory Meets Production Reality

What Staging Actually Tests
Every selector validated. Every error handler triggered correctly. The staging tests passed completely. Then production launched and 40% of the automation failed within an hour. The code worked perfectly—staging had confirmed it. What staging couldn't tell the team: whether their assumptions about how the web behaves matched reality. Turns out, testing your logic and testing your understanding are different problems entirely.

Production Teaches What Staging Cannot
Staging confirmed your assumptions were internally consistent. Production revealed which ones were wrong. A website changed its authentication flow overnight. Bot detection evolved its logic between deployments. Rate limits activated at scale you'd never encountered in testing. Not because the code failed, but because the live web doesn't behave like controlled environments. Production doesn't just validate whether your automation works—it teaches you what you didn't know you needed to learn.
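A sketch of the defensiveness production forces on you, assuming a plain requests session; the URL and the re-authentication hook are placeholders for whatever the target site actually requires.

import time
import requests

def fetch_with_backoff(session, url, reauthenticate, max_attempts=5):
    """Fetch a URL while assuming the live web will push back.

    429s (rate limiting) get exponential backoff; 401/403s trigger a
    re-authentication hook, since auth flows change without notice.
    """
    delay = 1.0
    for attempt in range(max_attempts):
        resp = session.get(url, timeout=30)
        if resp.status_code == 429:
            # Respect Retry-After when the server sends one, otherwise back off.
            delay = float(resp.headers.get("Retry-After", delay))
            time.sleep(delay)
            delay *= 2
            continue
        if resp.status_code in (401, 403):
            reauthenticate(session)  # placeholder: refresh tokens, re-login, etc.
            continue
        resp.raise_for_status()
        return resp
    raise RuntimeError(f"Gave up on {url} after {max_attempts} attempts")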

The Number That Matters
Enterprises spend 78% of their data collection budgets on data specialists. Not for elegant code. For the grinding work of unblocking target sites and reformatting datasets into usable structures.
Server maintenance takes 14%. Network security gets 5%. Software licensing claims 3%. The numbers tell a story most infrastructure pitches skip: web data collection at scale is a human labor problem wearing a technology costume.
Those specialists aren't optimizing algorithms. They're reverse-engineering authentication flows that changed overnight. Adapting to layout mutations. Cleaning malformed JSON that breaks parsers. The web resists automation at every turn, demanding judgment that machines can't yet replicate. Scale doesn't make this easier. It multiplies the edge cases until human expertise becomes the bottleneck.
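"Cleaning malformed JSON" sounds trivial until you write the code. A minimal sketch of the repair work involved, covering only a few common failure modes (trailing commas, junk wrapped around the payload), not the long tail specialists actually handle.

import json
import re

def parse_messy_json(raw: str):
    """Try strict parsing first, then apply a few common repairs."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass

    # Strip anything before the first brace/bracket and after the last one.
    start = min((i for i in (raw.find("{"), raw.find("[")) if i != -1), default=0)
    end = max(raw.rfind("}"), raw.rfind("]")) + 1
    candidate = raw[start:end]

    # Remove trailing commas before closing braces/brackets.
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)

    return json.loads(candidate)  # still raises if the payload is beyond repair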
Server maintenance and cooling consume just 14% of budgets because compute got cheap while expertise stayed expensive.
Network protection, including firewalls and isolation, accounts for 5%, a fraction of the specialist spend that keeps operations running.
Licensing takes 3%, the smallest line item, because integration tools are commoditized but adaptation knowledge isn't.
Organizations lose $15 million annually to poor data quality on average, yet 60% don't measure these hidden costs.
Moving from prototype to production multiplies specialist hours as target sites proliferate and architectural edge cases compound exponentially.
Field Notes from the Ecosystem
November delivered a lesson in operational margins. One configuration change doubled a feature file size. A hardcoded limit that seemed reasonable suddenly wasn't. Three hours of global disruption followed.
The pattern keeps repeating. Feature files updating every few minutes because attackers evolve faster. Failover paths built in hours after 90-minute blackouts. Rate limiters moving into operating systems. These aren't theoretical concerns. They're production realities visible in incident reports and engineering blogs.
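The cheap guard that kind of incident argues for is checking generated artifacts against the consumer's hard limit, with headroom, before they ship. A hypothetical sketch; the limit, margin, and file path are made up.

import sys
from pathlib import Path

# Hypothetical values: the hard limit baked into the consuming system
# and the safety margin you want between normal size and that limit.
HARD_LIMIT_BYTES = 20_000_000
HEADROOM = 0.5  # refuse to ship anything past 50% of the hard limit

def check_feature_file(path: str) -> None:
    size = Path(path).stat().st_size
    budget = HARD_LIMIT_BYTES * HEADROOM
    if size > budget:
        sys.exit(
            f"{path} is {size} bytes, over the {budget:.0f}-byte budget "
            f"({HEADROOM:.0%} of the {HARD_LIMIT_BYTES}-byte hard limit); "
            "refusing to deploy."
        )

if __name__ == "__main__":
    check_feature_file(sys.argv[1])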
What we're documenting: the operational patterns that only surface when systems run at scale, under load, in public view.
