
Foundations
Conceptual clarity earned from building at scale

When Extraction Succeeds But Data Goes Wrong

Your web agent extracts data successfully. Pipelines flow, dashboards update, everything looks operational. Then someone notices the competitor prices don't match what's actually on the website. Inventory counts show gaps that shouldn't exist. A business decision gets made on data that was technically extracted but fundamentally incorrect.
The hardest failures to catch aren't the ones that break your agent. They're the ones that keep it running while the data slowly becomes wrong. Extraction succeeds. The data just drifts. How do you know whether extracted data actually represents what you think it does when the web keeps shape-shifting beneath you?
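One answer: treat "extraction succeeded" and "the data is plausible" as separate checks. Here's a minimal sketch of the second check, assuming a price-monitoring pipeline; the tolerance, field names, and sample data are illustrative, not pulled from any particular tool:

from statistics import median

def validate_prices(extracted, history, tolerance=0.30):
    """Flag extracted prices that drift beyond a tolerance band
    around the recent median, instead of trusting any value the
    selector happened to match."""
    anomalies = []
    for sku, price in extracted.items():
        past = history.get(sku, [])
        if not past:
            continue  # no baseline yet; nothing to compare against
        baseline = median(past)
        if abs(price - baseline) / baseline > tolerance:
            anomalies.append((sku, price, baseline))
    return anomalies

# Hypothetical pipeline data: extraction "succeeded", but a selector
# started matching a bundle price instead of the unit price.
extracted = {"SKU-1": 19.99, "SKU-2": 249.00}
history = {"SKU-1": [19.99, 18.99, 19.49], "SKU-2": [24.90, 24.50, 25.10]}

for sku, price, baseline in validate_prices(extracted, history):
    print(f"{sku}: extracted {price}, recent median {baseline} -- review before publishing")

The specific threshold matters less than where the check lives: downstream of extraction, where silent drift actually surfaces.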

The Shopping Cart That Made State Management Adversarial

In June 1994, Lou Montulli needed to solve HTTP's statelessness—how do you remember a shopping cart when every page load forgets the last? His Netscape team rejected simpler solutions because they'd enable cross-site tracking. So they designed cookies with privacy protections: information would only flow between users and the specific sites they visited. The architecture had a gap: embedded third-party content could set cookies too, which enabled exactly the cross-site tracking the design meant to prevent.
Today, web agents operate in a landscape where cookies are simultaneously essential for authentication and restricted because of exploitation nobody anticipated in 1994. Every consent banner, every third-party cookie block, every session timeout—infrastructure shaped by a cascade that started with shopping carts.


An Interview With v2.0, The API Version Nobody Wanted to Migrate To

Pattern Recognition from the Field
Here's what I keep seeing: vendors announce dozens of agents while actual deployments stall out. Microsoft Ignite paraded Agent 365 and specialized agents. Google updated Vertex AI Agent Builder. OpenAI shipped agent-optimized models. Meanwhile, 65% of enterprises are stuck running pilots. Only 11% reach production.
The gap isn't model quality. Carnegie Mellon's benchmark shows even the best models complete just 30% of professional tasks autonomously. MIT found 95% of implementations falling short. IBM researchers got it right: most organizations simply aren't agent-ready.
What matters: 42% of enterprises need eight or more data source connections to deploy successfully, but they lack the APIs, governance frameworks, and monitoring tools for non-deterministic behavior. The exciting work isn't improving models. It's exposing enterprise APIs and building infrastructure agents actually need to operate.
Vendors announce agent capabilities at scale while enterprise deployments remain stuck in pilot phase, with only 11% reaching production despite widespread interest.
Companies abandoned 42% of AI initiatives in 2024, up from 17% previously, scrapping 46% of proofs-of-concept on average before deployment.
Infrastructure deficit blocks progress—86% need tech stack upgrades, 53% cite security concerns, and traditional governance frameworks assume deterministic behavior agents don't provide.
Single-agent systems with limited autonomy represent the production sweet spot, simpler to debug than multi-agent setups while maintaining dynamic logic capabilities.
Start with narrow, high-value use cases rather than broad automation, building API exposure and monitoring infrastructure before deploying agents at scale (a minimal version of that monitoring is sketched below).
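What that monitoring infrastructure can look like at its simplest: an append-only audit trail of every action an agent takes, successful or not. A minimal sketch; the class, file format, and field names are assumptions for illustration, not any vendor's API:

import json
import uuid
from datetime import datetime, timezone

class AgentAuditLog:
    """Append-only log of agent actions. Because agent behavior is
    non-deterministic, the record of what was attempted and what came
    back is the monitoring primitive everything else builds on."""

    def __init__(self, path="agent_audit.jsonl"):
        self.path = path

    def record(self, agent_id, action, inputs, outcome, success):
        entry = {
            "id": str(uuid.uuid4()),
            "ts": datetime.now(timezone.utc).isoformat(),
            "agent_id": agent_id,
            "action": action,
            "inputs": inputs,
            "outcome": outcome,
            "success": success,
        }
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")
        return entry

# Hypothetical usage: every tool call gets logged, whether or not it
# succeeded, so drift and failure patterns are reviewable later.
log = AgentAuditLog()
log.record(
    agent_id="pricing-agent-01",
    action="update_competitor_price",
    inputs={"sku": "SKU-1", "source": "https://example.com/product/1"},
    outcome={"price": 19.99},
    success=True,
)

You can't verify a non-deterministic system by replaying its inputs, so the record of what actually happened becomes the foundation for everything else: alerting, review, rollback.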
Questions Worth Asking
The questions you ask before choosing a tool reveal how much production experience you have. Feature matrices and vendor demos tell you what works in controlled environments. They don't tell you what breaks at 3am or what costs you didn't see coming.
These questions come from operating systems at scale and watching what actually matters versus what just sounds good in architecture reviews. They're not in the comparison spreadsheets. They predict whether you'll still be happy with this choice six months into production, when the real costs become visible and the complexity tax comes due.
