Robots.txt works by matching a single text field. When software requests a web page, it sends a User-Agent HTTP header: a short label identifying what's making the request. The robots.txt file at a site's root lists rules keyed to those labels: allow this crawler here, block that one there. If the visitor's label doesn't match anything in the file, only the wildcard rules apply, and if there are none, no rules apply at all.
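The whole mechanism fits in a few lines. A sketch using Python's standard-library `urllib.robotparser` (the rules and the "GPTBot" label here are illustrative, not taken from any real site's file):

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt: rules are keyed to user-agent labels.
# "GPTBot" is blocked everywhere; any unlisted label falls
# through to the wildcard ("*") group.
RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# A crawler that declares itself matches its own rule group.
print(parser.can_fetch("GPTBot", "https://example.com/article"))        # False
# A generic browser label matches nothing specific, so only "*" applies.
print(parser.can_fetch("Mozilla/5.0", "https://example.com/article"))   # True
print(parser.can_fetch("Mozilla/5.0", "https://example.com/private/x")) # False
```

Note what the last two calls demonstrate: the file can only govern labels it knows about. A visitor presenting a generic browser string is, as far as robots.txt is concerned, a person.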
For thirty years, this was enough. Crawlers were a distinct category of software. Googlebot, Bingbot, various SEO tools. They ran on servers, not in browsers. They fetched raw HTML without rendering JavaScript. They identified themselves because they had to: a crawler that didn't announce itself couldn't be granted the access it needed to do its job. The convention worked because the interests of the governed aligned with the act of self-identification.
That alignment has likely collapsed. And the clearest signal comes from Google itself. Project Mariner identifies as "Google-Agent" and runs a full Chrome instance that renders JavaScript and interacts with dynamic content. But Google classifies it as a "user-triggered fetcher," not a crawler. Because a human initiated the request, robots.txt rules generally don't apply. The category the protocol was built to govern has been redefined out from under it.
The same reclassification is happening across the industry, mostly without the courtesy of a label. Microsoft's Copilot Actions browses the web inside a real Edge session, clicking and scrolling and typing. HUMAN Security found no signed HTTP identity for this traffic. Perplexity's Comet runs as Chromium, appearing indistinguishable from standard Chrome. Anthropic's Computer Use operates inside a sandboxed environment running Firefox and desktop applications, carrying Firefox's standard user-agent string. These are browsers. Full stop. The agent is the user, and the browser is the browser.
Even among agents that still operate as traditional crawlers, the convention is fraying. A 2025 study published at ACM IMC found that AI search crawlers check robots.txt less frequently than virtually any other bot category.
Cloudflare documented something more pointed: Perplexity using undeclared crawlers with generic browser user-agents to circumvent blocks, rotating IPs across different networks. The behavior spanned tens of thousands of domains and millions of requests per day.
In December 2025, a federal judge in Ziff Davis v. OpenAI addressed this directly:
"More akin to a sign than a barrier."
Ignoring robots.txt, Judge Stein wrote, doesn't constitute circumvention under the DMCA because there is nothing to circumvent.
The ruling confirmed what the architecture already made plain. Robots.txt governs a world where automated visitors are a separate species, identifiable by a label they voluntarily wear. The agents arriving now wear no label. They look, to every system designed to see them, exactly like the people they work for.
The sign is still posted. The visitors it addresses no longer exist.

