Echoes

The Same Gap, Three Times

In the early 2000s, an engineer automating IBM mainframes built a small scripting language around a specific expectation: that the screens underneath his scripts would change, and his scripts would break. Decades later, a leading benchmark for AI browser agents uses frozen, self-hosted websites for the same reason. The automation got dramatically more sophisticated. The problem it keeps designing around has not changed.

ARIA's Accidental Inheritance

The W3C started building ARIA around 2006 because the dynamic web had a blind spot, literally. AJAX pages updated without reloading, and screen readers lost the thread. ARIA's fix was to give every element a role, a state, a label. Translate the visual into the semantic.
Nearly twenty years later, AI agents navigating web pages reach for that same semantic layer first. They can't really see, either. The richest map of meaning on the web was built by engineers thinking about an entirely different audience.

Broken Assumptions

Flaky Tests and the Prehistory of Agent Unreliability
Browser testing was built on an assumption: the script knows what it wants to do, and the chaos is in the world around it. Engineers spent a decade building coping infrastructure for that chaos — auto-waiting, stable locators, containerized environments. AI agents inherit the same non-determinism problem, but at a layer those tools were never designed to reach. The variability now lives inside the planning itself.

The CAPTCHA Arms Race
CAPTCHAs were built on an assumption: that something fundamental separates human behavior from machine behavior, and a test can find it. Distorted text held for a while. Then image grids. Then behavioral scoring. Each generation drew a confident line between human and non-human, and each line eventually turned out to be a guess. The web keeps building gates on a boundary that won't stay put.

Further Threads

Past Articles

Between 2018 and 2023, the RPA industry cataloged its failures in painful detail: UI fragility, credential sprawl, owner...

CORBA deferred its security and versioning problems. SOAP inherited them, declared governance "outside the scope," and p...

When an agent workflow fails at step seven, you need steps one through six to unwind in reverse order. Builders working ...

When browsers started blocking cookies, the ad industry found another identifier hiding in plain sight: the subtle diffe...
