The Recurring Bargain

The Same Gap, Three Times

By Rina Takahashi, May 21, 2026

In the early 2000s, an engineer automating IBM mainframes built a small scripting language around a specific expectation: that the screens underneath his scripts would change, and his scripts would break. Decades later, the leading benchmark for AI browser agents uses frozen, self-hosted websites for the same reason. The automation has grown dramatically more sophisticated, yet what it designs around has stayed the same.

ARIA's Accidental Inheritance

The Screen Reader's Gift to AI Agents

The W3C started building ARIA around 2006 because the dynamic web had a blind spot, literally. AJAX pages updated without reloading, and screen readers lost the thread. ARIA's fix was to let authors give each element a role, states, and a label: translate the visual into the semantic.

Nearly twenty years later, AI agents navigating web pages reach for that same semantic layer first. They can't really see, either. The richest map of meaning on the web was built by engineers thinking about an entirely different audience.
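To make that semantic layer concrete, here is a minimal Python sketch. The AXNode class and the snapshot() function are hypothetical, not any browser's or Playwright's actual API; they assume a simplified tree in which each node carries only a role, an accessible name, and boolean states, and they flatten it into the kind of indented text outline an agent might read instead of pixels.

```python
from dataclasses import dataclass, field


@dataclass
class AXNode:
    """One node in a simplified, hypothetical accessibility tree."""
    role: str                                              # what the element is: "button", "textbox", ...
    name: str = ""                                         # accessible name, i.e. the label
    states: dict[str, bool] = field(default_factory=dict)  # e.g. {"checked": True}
    children: list["AXNode"] = field(default_factory=list)


def snapshot(node: AXNode, depth: int = 0) -> list[str]:
    """Flatten the tree into indented text lines an agent could read."""
    # Keep only states that are currently true, in a stable (sorted) order.
    flags = " ".join(f"[{key}]" for key, on in sorted(node.states.items()) if on)
    label = f' "{node.name}"' if node.name else ""
    line = f"{'  ' * depth}- {node.role}{label}"
    lines = [line + (f" {flags}" if flags else "")]
    for child in node.children:
        lines.extend(snapshot(child, depth + 1))
    return lines


# A small checkout form, described semantically rather than visually.
form = AXNode("form", "Checkout", children=[
    AXNode("textbox", "Email"),
    AXNode("checkbox", "Save card", {"checked": True}),
    AXNode("button", "Pay now", {"disabled": False}),
])
print("\n".join(snapshot(form)))
```

Each element becomes one line, such as `- checkbox "Save card" [checked]`. A button with no accessible name would surface as a bare `- button`, which is exactly the kind of gap that leaves screen readers and agents alike guessing.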

TAKE NOTE

Token economics: An accessibility tree snapshot costs roughly 4,000 tokens to process versus 50,000 for a screenshot, making ARIA the practical default for agents running multi-step tasks

Quiet dependency: ChatGPT Atlas, Microsoft's Playwright MCP, and OpenAI's CUA all query the accessibility tree as their primary perception layer for navigating pages

Broken bridge: Poorly implemented ARIA harms screen readers and agents alike, and the W3C's own authoring guidance warns that no ARIA is often better than bad ARIA

First rule: ARIA's own spec says to prefer native HTML semantics whenever possible, advice that OpenAI's agent implementation guidance quietly contradicts

Original intent: Richard Schwerdtfeger designed ARIA to bring "the richer, dynamic Web content experience to all users." He meant human users
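The token gap in the first takeaway compounds over a multi-step task. Below is a back-of-the-envelope Python sketch using those rough per-observation figures and a hypothetical 20-step task. The function name and the history-resending mode are illustrative assumptions, not a description of how any particular agent manages its context.

```python
# Rough per-observation costs quoted in the takeaway above.
AX_TREE_TOKENS = 4_000
SCREENSHOT_TOKENS = 50_000


def observation_tokens(steps: int, per_observation: int, keep_history: bool = False) -> int:
    """Total observation tokens for a task of `steps` steps (hypothetical model).

    Assumes one fresh observation per step. With keep_history=True, every
    earlier observation is re-sent at each step, so the cost grows as
    per_observation * steps * (steps + 1) / 2 instead of linearly.
    """
    if keep_history:
        return per_observation * steps * (steps + 1) // 2
    return per_observation * steps


for history in (False, True):
    tree = observation_tokens(20, AX_TREE_TOKENS, history)
    shot = observation_tokens(20, SCREENSHOT_TOKENS, history)
    print(f"history={history}: tree={tree:,} screenshot={shot:,} ratio={shot / tree:.1f}x")
```

The ratio stays at 12.5x either way, but the absolute gap does not: 80,000 versus 1,000,000 tokens without history, and 840,000 versus 10,500,000 if every observation is carried forward.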

The Déjà Vu

The Graveyard Spreadsheet

Broken Assumptions

Flaky Tests and the Prehistory of Agent Unreliability

Browser testing was built on an assumption: the script knows what it wants to do, and the chaos is in the world around it. Engineers spent a decade building coping infrastructure for that chaos — auto-waiting, stable locators, containerized environments. AI agents inherit the same non-determinism problem, but at a layer those tools were never designed to reach. The variability now lives inside the planning itself.
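Auto-waiting, the first of those coping tools, boils down to polling a fixed condition until the world catches up. Here is a minimal Python sketch; wait_for() is a hypothetical helper written for illustration, not any testing framework's real API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(probe: Callable[[], Optional[T]], timeout: float = 5.0,
             interval: float = 0.1) -> T:
    """Poll `probe` until it returns something other than None.

    Hypothetical helper: raises TimeoutError if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Simulated page: the button only "renders" on the third check.
checks = {"count": 0}


def pay_button_ready() -> Optional[str]:
    checks["count"] += 1
    return "Pay now" if checks["count"] >= 3 else None


print(wait_for(pay_button_ready, timeout=2.0, interval=0.01))  # prints: Pay now
```

The probe never changes between runs; only the timing does. That is the assumption the testing tools were built on, and the one an agent breaks when the choice of what to wait for is itself made by the model.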

Broken Assumptions

The CAPTCHA Arms Race

CAPTCHAs were built on an assumption: that something fundamental separates human behavior from machine behavior, and a test can find it. Distorted text held for a while. Then image grids. Then behavioral scoring. Each generation drew a confident line between human and non-human, and each line eventually turned out to be a guess. The web keeps building gates on a boundary that won't stay put.

Further Threads