Practitioner's Corner

The Instruments Before the Agents

Every layer of web infrastructure, from crawling to indexing to ranking, was designed for a user who clicks and scans. Agents don't click or scan. They need structured, verifiable text delivered into a context window, and the web doesn't offer that natively. Parag Agrawal founded Parallel Web Systems around this gap. The work looks like solving one bottleneck every few weeks and immediately hitting another: APIs too slow, indexing at insufficient scale, content economics with no measurement layer. Each instrument that makes one dimension legible reveals the next one that isn't.

The Instrument That Doesn't Exist Yet

A production agent runs for twelve minutes, calls nine tools, and returns a clean result. Every dashboard is green. The result is wrong: subtly wrong, in a way that passes validation and flows into tomorrow's decisions unchallenged. The observability market now has more than 66 tools, most of them built to answer whether an agent completed its work. Almost none can answer whether the work was correct. The gap between those two questions is wider than it looks, and the instrument that would close it has design requirements that pull against each other in ways we haven't resolved.


The Metric That Hides the Architecture

Agentic workflows burn 5 to 30 times more tokens per task than a chatbot interaction. Context compounds with every loop, every tool call, every retry. By step 50, you have paid for the opening turns of the conversation 50 times over.
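A minimal sketch of that compounding, assuming each step appends a fixed number of tokens to the context and every call re-sends the full history with no prompt caching, truncation, or summarization. The function name and the 1,000-tokens-per-step figure are hypothetical, chosen only for illustration.

```python
def cumulative_input_tokens(steps: int, tokens_per_step: int) -> int:
    """Total input tokens billed across an agent loop.

    Hypothetical model: each step appends `tokens_per_step` tokens to the
    context, and each call re-sends the entire history (no prompt caching,
    truncation, or summarization).
    """
    history = 0  # tokens currently sitting in the context window
    total = 0    # tokens billed so far across all calls
    for _ in range(steps):
        history += tokens_per_step  # this step's output / tool result joins the context
        total += history            # the whole context is sent (and billed) again
    return total


if __name__ == "__main__":
    steps, per_step = 50, 1_000
    billed = cumulative_input_tokens(steps, per_step)
    new_content = steps * per_step  # tokens that were actually new
    print(f"billed={billed:,} new={new_content:,} ratio={billed / new_content:.1f}x")
```

With 50 steps of 1,000 tokens each, the loop bills 1,275,000 input tokens for 50,000 tokens of new content, about 25.5 times as much. The first step's tokens ride along in all 50 calls, and the total grows with the square of the step count (`tokens_per_step * steps * (steps + 1) / 2`).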
Most teams track cost-per-run. The number that actually matters is cost-per-successful-task, which folds in every failed attempt, every retry, every cleanup. At a 70% success rate, each success takes on average 1 / 0.7 ≈ 1.43 runs, so your true unit cost is roughly 43% higher than the number on your dashboard.
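The 43% figure follows from treating retries as independent attempts: at success rate p, a task needs 1 / p runs on average. The sketch below assumes failed runs cost the same as successful ones. The optional `cleanup_cost` per failure is a hypothetical parameter standing in for "every cleanup"; the expected number of failures per success, (1 - p) / p, comes from the same independent-retry assumption.

```python
def cost_per_successful_task(
    cost_per_run: float,
    success_rate: float,
    cleanup_cost: float = 0.0,
) -> float:
    """Expected spend per successful task under independent retries.

    Assumptions (hypothetical, for illustration):
      * every run, pass or fail, costs `cost_per_run`;
      * each attempt succeeds independently with probability `success_rate`;
      * each failed attempt incurs an extra `cleanup_cost`.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    # Mean of the geometric distribution: attempts needed per success.
    expected_runs = 1.0 / success_rate
    # Of those attempts, all but the last one failed on average.
    expected_failures = (1.0 - success_rate) / success_rate
    return cost_per_run * expected_runs + cleanup_cost * expected_failures


if __name__ == "__main__":
    dashboard = 1.00  # hypothetical cost per run, in dollars
    true_cost = cost_per_successful_task(dashboard, success_rate=0.70)
    print(f"{true_cost:.2f} per success, {true_cost / dashboard - 1:.0%} above the dashboard")
```

This prints `1.43 per success, 43% above the dashboard`. Any per-failure cleanup pushes the real number higher still.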
That gap between cost-per-run and cost-per-successful-task is where architecture decisions go wrong. A cheaper model with a lower pass rate can quietly cost more per successful outcome than an expensive one that finishes in fewer steps. Teams measuring the wrong thing optimize confidently in the wrong direction, and the spreadsheet agrees with them the whole way down.
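To make the trap concrete, here is a sketch comparing two hypothetical models. Every number (per-step price, steps per attempt, pass rate) is invented for illustration, and cost per success again assumes failed attempts cost as much as successful ones and are simply retried.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost_per_step: float  # dollars per agent step (hypothetical)
    steps_per_run: int    # average steps to finish one attempt (hypothetical)
    success_rate: float   # fraction of attempts that pass (hypothetical)

    @property
    def cost_per_run(self) -> float:
        # What a cost-per-run dashboard shows.
        return self.cost_per_step * self.steps_per_run

    @property
    def cost_per_success(self) -> float:
        # Spread the cost of failed attempts over the ones that pass.
        return self.cost_per_run / self.success_rate


cheap = ModelProfile("cheap", cost_per_step=0.10, steps_per_run=20, success_rate=0.50)
premium = ModelProfile("premium", cost_per_step=0.30, steps_per_run=8, success_rate=0.90)

if __name__ == "__main__":
    for m in (cheap, premium):
        print(f"{m.name}: ${m.cost_per_run:.2f}/run, ${m.cost_per_success:.2f}/success")
```

On cost per run the cheap model wins ($2.00 against $2.40), which is exactly what the dashboard reports. On cost per success it loses ($4.00 against roughly $2.67), because it takes more steps per attempt and fails half the time.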

Past Articles

Government portals, insurance dashboards, vendor procurement systems. No API, no programmatic access. Just a browser and...

Most enterprise teams running agent workflows have never checked the approval rate on their human-in-the-loop controls. ...

Suchintan Singh built ML infrastructure at Faire and Gopuff. Feature stores, ranking systems, search engines. Controlled...

Before an AI agent processes a single webpage, anti-bot systems have already judged it. The evaluation is about identity...

