In 1994, Charles Stross — yes, the science fiction novelist — was running a web crawler that kept hammering Martijn Koster's server. Koster responded with a plain text file placed at the root of his site, listing which pages automated visitors should avoid.
That file became robots.txt. For nearly three decades it wasn't even a formal standard (the IETF only codified it as RFC 9309 in 2022), and nothing has ever enforced it. A federal judge in December 2025 compared it to a "keep off the grass" sign. And for thirty years, people mostly stayed off the grass. The deal was too good to walk away from.
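The file's format is essentially the whole protocol, and it has barely changed since. A minimal robots.txt looks like the sketch below; the paths are invented for illustration, and GPTBot is OpenAI's declared crawler token:

```txt
# robots.txt, served as plain text from the site root (/robots.txt)

User-agent: *        # applies to any crawler not matched more specifically
Disallow: /private/  # hypothetical path: please stay out of this tree

User-agent: GPTBot   # OpenAI's declared crawler token
Disallow: /          # asked, not forced, to stay off the whole site
```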
Search engines crawled your site, indexed your pages, and sent visitors back. Cloudflare's analysis puts the ratio for traditional search at roughly 14 crawls per referral returned. Imperfect, but real. Publishers built businesses on that traffic. Search engines built businesses on that content. The text file held because ignoring it meant losing access to the exchange. Compliance and reward were coupled.
That coupling is what made enforcement work. When Chrome started flagging HTTP pages as "Not Secure" in 2017, HTTPS adoption went from 18.5% of websites to 87% in under a decade. Let's Encrypt removed the cost barrier with free certificates. The browser was the chokepoint, and the browser's incentives aligned with the outcome. Nobody lost. Users got security. Site operators got trust signals. The actor with the power to enforce wanted the same thing the norm wanted.
AI crawling has none of that alignment.
Cloudflare's June 2025 data on crawl-to-referral ratios:
| Crawler type | Crawls per referral returned |
|---|---|
| Traditional search | ~14 |
| OpenAI | ~1,700 |
| Anthropic | ~73,000 |
Total AI crawler volume now runs up to eight times higher than traditional search crawling. The content goes in. The traffic doesn't come back. ChatGPT referrals account for roughly 0.02% of publisher traffic. Cloudflare has started building enforcement tools and content signal policies, which is meaningful. But Cloudflare sits between the publisher and the crawler. The actors consuming the content are the ones who would need to respect a successor protocol. They are, at the moment, the ones with the least reason to.
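For anyone who wants to reproduce the arithmetic against their own traffic, a rough version of the measurement fits in a few lines. This is a minimal Python sketch, not Cloudflare's methodology: the user-agent tokens and referrer domains are a small illustrative subset, and a real pipeline would parse full access-log lines.

```python
import re
from collections import Counter

# Illustrative subsets only; real crawler and referrer inventories are
# longer and change over time.
CRAWLER_TOKENS = {
    "search": re.compile(r"Googlebot|bingbot", re.I),
    "openai": re.compile(r"GPTBot|OAI-SearchBot", re.I),
    "anthropic": re.compile(r"ClaudeBot", re.I),
}
REFERRER_DOMAINS = {
    "search": "google.com",
    "openai": "chatgpt.com",
    "anthropic": "claude.ai",
}

def crawl_to_referral(requests):
    """requests: iterable of (user_agent, referrer) pairs from access logs.
    Returns crawls per referral for each source (the table's metric)."""
    crawls, referrals = Counter(), Counter()
    for user_agent, referrer in requests:
        for source, token in CRAWLER_TOKENS.items():
            if token.search(user_agent):
                crawls[source] += 1
        for source, domain in REFERRER_DOMAINS.items():
            if domain in (referrer or ""):
                referrals[source] += 1
    # max(..., 1) avoids division by zero when a source sends no referrals
    return {s: crawls[s] / max(referrals[s], 1) for s in CRAWLER_TOKENS}

# Toy data reproducing the Anthropic-scale ratio: 73,000 crawls, 1 referral.
log = [("ClaudeBot/1.0", "")] * 73_000 + [("Mozilla/5.0", "https://claude.ai/")]
print(crawl_to_referral(log))  # {'search': 0.0, 'openai': 0.0, 'anthropic': 73000.0}
```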
Proposals keep arriving. Each one assumes the problem is specification: if permissions were more granular, compliance would follow. The HTTPS transition tells a different story, though. That worked because the chokepoint actor gained something from enforcement. With AI crawling, the companies with the infrastructure to build and enforce a replacement are the same companies whose models depend on broad, unrestricted access to the web. And as long as robots.txt remains the reference document for every proposed successor, the problem keeps looking like a coordination failure. Better signage. Clearer instructions. That framing absorbs energy that might otherwise go toward asking a harder question.
Can protocol design solve a problem when the actor who would need to enforce it is the same one benefiting from the current free-for-all? The HTTPS transition suggests enforcement works when incentives align. Seventy-three thousand pages crawled for every referral returned, and the proposals keep describing better signs for the lawn.
Things to follow up on...
- Cloudflare's Content Signals Policy: A new machine-readable extension to robots.txt that lets publishers distinguish between search indexing, AI-assisted answers, and model training as separate permission categories (a sketch of the directive follows this list).
- Invisible agentic crawlers: Every major AI browser agent, from Perplexity Comet to ChatGPT Atlas, uses standard Chromium user agents with no bot signal, making them undetectable to the 1994-era permission system (see the user-agent sketch after this list).
- Reddit's contract law approach: Rather than relying on robots.txt or copyright claims, Reddit is suing Anthropic for breach of its terms of service, testing whether contract law can do what a voluntary protocol cannot.
- Publisher traffic in freefall: Chartbeat data shows small publishers experienced a 60% decline in search referral traffic over two years, with AI referrals replacing almost none of what was lost.
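On the first item: the policy works by adding a new directive to the same plain-text file. The sketch below follows Cloudflare's published examples as I understand them (the directive spelling and signal names should be verified against the current spec); the three signals map to the three permission categories above:

```txt
# Content Signals Policy block, appended to an ordinary robots.txt.
# yes = consent, no = no consent; omitting a signal states no preference.
Content-Signal: search=yes, ai-input=no, ai-train=no

User-agent: *
Allow: /
```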
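And on the second item, the detection gap is easy to demonstrate. A server-side filter keyed on declared bot tokens (a common pattern; the blocklist here is a hypothetical subset) catches well-behaved crawlers and passes agentic browsers straight through, because they present a stock Chromium string:

```python
import re

# Hypothetical blocklist keyed on declared bot User-Agent tokens.
BOT_UA = re.compile(r"GPTBot|ClaudeBot|PerplexityBot|Googlebot", re.I)

def is_declared_bot(user_agent: str) -> bool:
    return bool(BOT_UA.search(user_agent))

# A stock Chromium user-agent string, as sent by agentic browsers that
# present themselves as ordinary Chrome sessions.
chromium_ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
               "AppleWebKit/537.36 (KHTML, like Gecko) "
               "Chrome/131.0.0.0 Safari/537.36")

print(is_declared_bot("GPTBot/1.2"))  # True: the declared crawler is caught
print(is_declared_bot(chromium_ua))   # False: the agent sails through
```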

