
Foundations
Conceptual clarity earned from building at scale

When Extraction Succeeds But Data Goes Wrong

Your web agent extracts data successfully. Pipelines flow, dashboards update, everything looks operational. Then someone notices the competitor prices don't match what's actually on the website. Inventory counts show gaps that shouldn't exist. A business decision gets made on data that was technically extracted but fundamentally incorrect.
The hardest failures to catch aren't the ones that break your agent. They're the ones that keep it running while the data slowly becomes wrong. Extraction succeeds. The data just drifts. How do you know whether extracted data actually represents what you think it does when the web keeps shape-shifting beneath you?
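One answer: treat "extraction succeeded" and "the data is plausible" as separate checks. Here's a minimal sketch of the second check, assuming a price-monitoring pipeline; the tolerance, field names, and sample data are illustrative, not pulled from any particular tool:

from statistics import median

def validate_prices(extracted, history, tolerance=0.30):
    """Flag extracted prices that drift beyond a tolerance band
    around the recent median, instead of trusting any value the
    selector happened to match."""
    anomalies = []
    for sku, price in extracted.items():
        past = history.get(sku, [])
        if not past:
            continue  # no baseline yet; nothing to compare against
        baseline = median(past)
        if abs(price - baseline) / baseline > tolerance:
            anomalies.append((sku, price, baseline))
    return anomalies

# Hypothetical pipeline data: extraction "succeeded", but a selector
# started matching a bundle price instead of the unit price.
extracted = {"SKU-1": 19.99, "SKU-2": 249.00}
history = {"SKU-1": [19.99, 18.99, 19.49], "SKU-2": [24.90, 24.50, 25.10]}

for sku, price, baseline in validate_prices(extracted, history):
    print(f"{sku}: extracted {price}, recent median {baseline} -- review before publishing")

The specific threshold matters less than where the check lives: downstream of extraction, where silent drift actually surfaces.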

The Shopping Cart That Made State Management Adversarial

In June 1994, Lou Montulli needed to solve HTTP's statelessness—how do you remember a shopping cart when every page load forgets the last? His Netscape team rejected simpler solutions because they'd enable cross-site tracking. So they designed cookies with privacy protections: information would only flow between users and the specific sites they visited. The architecture had a gap: embedded third-party content could set cookies too, which enabled exactly the cross-site tracking the design meant to prevent.
Today, web agents operate in a landscape where cookies are simultaneously essential for authentication and restricted because of exploitation nobody anticipated in 1994. Every consent banner, every third-party cookie block, every session timeout—infrastructure shaped by a cascade that started with shopping carts.


An Interview With v2.0, The API Version Nobody Wanted to Migrate To

Pattern Recognition from the Field
Here's what I keep seeing: vendors announce dozens of agents while actual deployments stall out. Microsoft Ignite paraded Agent 365 and specialized agents. Google updated Vertex AI Agent Builder. OpenAI shipped agent-optimized models. Meanwhile, 65% of enterprises are stuck running pilots. Only 11% reach production.
The gap isn't model quality. Carnegie Mellon's benchmark shows even the best models complete just 30% of professional tasks autonomously. MIT found 95% of implementations falling short. IBM researchers got it right: most organizations simply aren't agent-ready.
What matters: 42% of enterprises need eight or more data source connections to deploy successfully, but they lack the APIs, governance frameworks, and monitoring tools for non-deterministic behavior. The exciting work isn't improving models. It's exposing enterprise APIs and building infrastructure agents actually need to operate.
Vendors announce agent capabilities at scale while enterprise deployments remain stuck in pilot phase, with only 11% reaching production despite widespread interest.
Companies abandoned 42% of AI initiatives in 2024, up from 17% previously, scrapping 46% of proofs-of-concept on average before deployment.
Infrastructure deficit blocks progress—86% need tech stack upgrades, 53% cite security concerns, and traditional governance frameworks assume deterministic behavior agents don't provide.
Single-agent systems with limited autonomy represent the production sweet spot, simpler to debug than multi-agent setups while maintaining dynamic logic capabilities.
Start with narrow, high-value use cases rather than broad automation, building API exposure and monitoring infrastructure before deploying agents at scale (a minimal version of that monitoring is sketched below).
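What that monitoring infrastructure can look like at its simplest: an append-only audit trail of every action an agent takes, successful or not. A minimal sketch; the class, file format, and field names are assumptions for illustration, not any vendor's API:

import json
import uuid
from datetime import datetime, timezone

class AgentAuditLog:
    """Append-only log of agent actions. Because agent behavior is
    non-deterministic, the record of what was attempted and what came
    back is the monitoring primitive everything else builds on."""

    def __init__(self, path="agent_audit.jsonl"):
        self.path = path

    def record(self, agent_id, action, inputs, outcome, success):
        entry = {
            "id": str(uuid.uuid4()),
            "ts": datetime.now(timezone.utc).isoformat(),
            "agent_id": agent_id,
            "action": action,
            "inputs": inputs,
            "outcome": outcome,
            "success": success,
        }
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")
        return entry

# Hypothetical usage: every tool call gets logged, whether or not it
# succeeded, so drift and failure patterns are reviewable later.
log = AgentAuditLog()
log.record(
    agent_id="pricing-agent-01",
    action="update_competitor_price",
    inputs={"sku": "SKU-1", "source": "https://example.com/product/1"},
    outcome={"price": 19.99},
    success=True,
)

You can't verify a non-deterministic system by replaying its inputs, so the record of what actually happened becomes the foundation for everything else: alerting, review, rollback.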
Questions Worth Asking
The questions you ask before choosing a tool reveal how much production experience you have. Feature matrices and vendor demos tell you what works in controlled environments. They don't tell you what breaks at 3am or what costs you didn't see coming.
These questions come from operating systems at scale and watching what actually matters versus what just sounds good in architecture reviews. They're not in the comparison spreadsheets. They predict whether you'll still be happy with this choice six months into production, when the real costs become visible and the complexity tax comes due.
