The Thread That Runs Through the Browser

In the spring of 2004, Jason Huggins had a billing app problem. ThoughtWorks was expanding globally, and consultants in London and India were testing a Time and Expenses system built on Python and Plone. The manual clicking was eating everyone alive. So Huggins wrote a JavaScript tool called JavaScriptTestRunner that lived inside the browser, manipulating the page from within the same sandbox as the application. He renamed it Selenium, a joke about curing mercury poisoning (the dominant testing vendor was Mercury Interactive).

The tool worked because the problem was small enough to hold in your hands. Huggins owned the application. He knew what correct looked like. Every test was a comparison: does the browser state match what we expected? Automation and verification were the same act.

Running JavaScript inside the browser meant obeying the same-origin policy, though. You couldn't reach across domains. Selenium RC solved this with a proxy server by 2005, but the underlying logic stayed the same: scripted replay, measured against known outcomes. Then in January 2007, Simon Stewart pushed the first commit of WebDriver, which bound natively to each browser and controlled it from outside the sandbox entirely. More development work per browser, but you escaped the walls. Stewart and Huggins staged what attendees at the Google Test Automation Conference called the Steel Cage Knife Fight. In 2009, the two projects agreed to merge, and the combined codebase eventually shipped as Selenium 2.0. For the people who'd spent years writing tests in one framework or the other, the merger meant relearning habits. But the destination was clear.
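The wall Huggins' tool ran into is easy to state in code. What follows is a minimal sketch of the origin comparison, assuming the usual scheme, host, and port triple; the function names and example URLs are hypothetical, and real browsers handle more edge cases than this does.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Default ports, so that an explicit :443 and an implied one compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Reduce a URL to its (scheme, host, port) triple."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(a: str, b: str) -> bool:
    """True only when scheme, host, and port all match."""
    return origin(a) == origin(b)


# Hypothetical URLs: a test script served from one host
# cannot script a page that lives on another.
print(same_origin("https://expenses.example.com/timesheet",
                  "https://expenses.example.com:443/login"))
print(same_origin("https://expenses.example.com/timesheet",
                  "https://billing.example.org/invoice"))
```

A script loaded from the first URL could drive the login page but not the invoice page. That is the limit a proxy server, and later native bindings, were built to get around.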

Stewart's design principle for the combined project: emulate the user. A perfectly reasonable directive for test automation. You want tests to exercise the application the way a person would. But that phrase quietly encoded something specific. The user being emulated already knew what the next click should be and what the page should look like afterward. Emulation meant replay.

When the W3C began standardizing WebDriver in 2012, the working draft was explicit: the protocol was "primarily intended to allow developers to write tests." By the time the Recommendation shipped in June 2018, the language had softened to automated testing of web applications "and more." That quiet "and more" was carrying a lot of weight. Plenty of people were already using the protocol for screen scraping, price monitoring, competitive intelligence. The specification's center of gravity held, though: interfaces for controlling a browser "in a way that emulates the actions of a real person."
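On the wire, that emulation is plain HTTP with JSON bodies. The sketch below builds three requests using endpoint paths from the W3C WebDriver Recommendation; the helper names, session ID, and element ID are placeholders, and nothing is actually sent to a browser.

```python
import json


def navigate_to(session_id, url):
    """Navigate To: POST /session/{session id}/url"""
    return ("POST", f"/session/{session_id}/url", json.dumps({"url": url}))


def find_element(session_id, css):
    """Find Element: POST /session/{session id}/element"""
    body = {"using": "css selector", "value": css}
    return ("POST", f"/session/{session_id}/element", json.dumps(body))


def click(session_id, element_id):
    """Element Click: POST /session/{session id}/element/{element id}/click"""
    path = f"/session/{session_id}/element/{element_id}/click"
    return ("POST", path, json.dumps({}))


# A scripted replay: each step is known before the page has even loaded.
# "abc123" and "el-1" are placeholder IDs a real driver would hand back.
script = [
    navigate_to("abc123", "https://expenses.example.com/login"),
    find_element("abc123", "button[type=submit]"),
    click("abc123", "el-1"),
]
for method, path, body in script:
    print(method, path, body)
```

Every command names a target the author already decided on. The protocol has no verb for working out what to do next.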

Today, when an AI agent opens a browser, it pulls an accessibility tree snapshot, a structured representation of every element on the page, and decides what to do next. The snapshot costs a few hundred tokens. The agent studies it, picks an action, watches the page change, pulls another snapshot. It is reasoning about a site it doesn't own, toward a goal that might require improvisation, on a page it has never encountered before. The protocol underneath faithfully emulates the actions of a real person. The agent is trying to figure out what a person would have done if a person had been there, using infrastructure that assumed one always was.
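The loop described above can be sketched in a few lines. Everything here is hypothetical: FakeBrowser stands in for a real browser, a list of labels stands in for an accessibility tree snapshot, and a trivial policy stands in for the model. Only the shape is the point: observe, decide, act, observe again.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class FakeBrowser:
    """Toy stand-in for a browser: each page is a list of visible labels."""
    pages: Dict[str, List[str]]
    current: str

    def snapshot(self) -> List[str]:
        # Stand-in for pulling an accessibility tree snapshot.
        return list(self.pages[self.current])

    def click(self, label: str) -> None:
        # In this toy, clicking a label opens the page of the same name.
        if label in self.pages:
            self.current = label


def run_agent(
    browser: FakeBrowser,
    choose: Callable[[List[str], List[str]], Optional[str]],
    goal: str,
    max_steps: int = 10,
) -> Tuple[bool, List[str]]:
    """Observe, decide, act, repeat until the goal text is visible."""
    history: List[str] = []
    for _ in range(max_steps):
        snap = browser.snapshot()
        if goal in snap:
            return True, history
        action = choose(snap, history)
        if action is None:
            break
        browser.click(action)
        history.append(action)
    return False, history


def first_untried(snap: List[str], history: List[str]) -> Optional[str]:
    """Trivial policy: click the first label not clicked before."""
    for label in snap:
        if label not in history:
            return label
    return None


site = {
    "home": ["Products", "About"],
    "Products": ["Pricing"],
    "Pricing": ["Total: $42"],
    "About": ["home"],
}
result = run_agent(FakeBrowser(site, "home"), first_untried, "Total: $42")
print(result)
```

Unlike a scripted replay, the sequence of clicks is not written down anywhere in advance; it falls out of what each snapshot happens to contain.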

The accessibility tree it reads was built for screen readers. The auto-wait mechanisms it relies on were designed for test stability. And the wire protocol carrying its commands was built so developers could verify that a billing app in Chicago loaded correctly for someone in London. Each layer, a reasonable response to the problem directly in front of it. Twenty-two years of browser automation, and the thread that runs through all of it is a specific, reasonable, increasingly strained assumption: that someone already knew the answer.

Things to follow up on...

Stewart's architectural account: Simon Stewart's chapter in The Architecture of Open Source Applications remains the most detailed technical history of how WebDriver's design decisions were made and why the fallback to JavaScript injection persisted even after native bindings arrived.
When agents outgrow Playwright: The open-source browser agent framework Browser Use moved from Playwright to direct Chrome DevTools Protocol communication in early 2026, and the technical reasoning illustrates exactly where test infrastructure starts straining under agent workloads.
Building the web for agents: A June 2025 preprint from McGill and Mila researchers argues for redesigning web interfaces for agent access rather than forcing agents to navigate surfaces designed for humans, inverting the assumption that has held since Huggins' original tool.
The W3C's next move: The WebDriver BiDi working draft extends the protocol with bidirectional communication between browser and controller, a shift that matters more for agents reacting to live page changes than for test scripts replaying known sequences.