In February 1994, a software engineer named Martijn Koster posted a suggestion to a mailing list at CERN. Someone's web crawler had crashed his server. The web was small enough that he could call the person responsible and sort it out. But he wanted something more durable than a phone call, so he proposed a convention: place a plain text file called robots.txt at the root of your website, listing which crawlers you'd rather not have visiting and which directories they should skip.
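The convention is as simple as it sounds: a crawler fetches the file from the site root, reads the rules addressed to its user-agent name, and voluntarily stays out of the listed paths. A minimal example in the format Koster proposed (the paths and the bot name are illustrative):

```
# robots.txt — lives at https://example.com/robots.txt
User-agent: *          # rules for every crawler
Disallow: /private/    # please skip this directory

User-agent: BadBot     # rules for one crawler, addressed by name
Disallow: /            # asked to stay out entirely
```

Nothing in the file is enforced by the server; the "Disallow" lines only mean something if the crawler chooses to read and obey them.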
A file, and an expectation that people would honor it.
"The major robot writers are in favour of this idea, so I don't see any fundamental problems."
He could say that because he knew who the major robot writers were. Personally. The web had few enough participants that a social contract could do the work of a technical one.
And it did. WebCrawler, Lycos, AltaVista, and later Google all honored the file. When eBay sued Bidder's Edge in 2000 for scraping its auction listings to power a rival search service, the court noted that major search engines respected the convention as a matter of course. Compliance was common enough to serve as evidence of what reasonable actors do.
For nearly three decades, robots.txt was never a formal internet standard. It became one only in 2022, twenty-eight years after Koster's post, when Google helped shepherd it through the IETF as RFC 9309. It hadn't needed formalization because the convention held, and it held because the actors involved had roughly aligned incentives. Crawlers indexed content, then sent traffic back. Publishers tolerated the crawling because it came with a return address. But the file's design was, from the beginning, permissive by default. It assumed, as Koster did, that most robots are good and are made by good people. That assumption was appropriate to a web where you could maintain a complete list of every crawler in operation. It was load-bearing in ways nobody had reason to test.
Then the return address disappeared. A search engine crawls your site and sends visitors back. A training crawler ingests your content and sends nothing. By one analysis, a major AI company's crawler was hitting sites roughly 1,700 times for every single visitor it referred. The cooperative math that sustained the handshake quietly inverted, and millions of websites updated their robots.txt to block AI crawlers in response.
Ziff Davis, publisher of PCMag, Mashable, and IGN, followed OpenAI's own published instructions for opting out of GPTBot scraping. The blocking directive was in place. According to their amended complaint, scraping didn't stop. It allegedly increased. In a single month, GPTBot reportedly hit IGN 9.7 million times, including 1.1 million in one day, the sign sitting in its designated place the entire time.
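The opt-out OpenAI published, and which the complaint says Ziff Davis had in place, is a two-line robots.txt entry:

```
User-agent: GPTBot
Disallow: /
```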
The publisher posted the sign. The crawler walked past it faster.
When the case reached Judge Sidney Stein in the Southern District of New York, he ruled that robots.txt does not qualify as a technological measure under the DMCA. His language was precise:
"More akin to a sign than a barrier."
He was describing what had always been structurally true. It was always a sign. For thirty years, that was enough, because the community reading it was small enough and cooperative enough that courtesy was cheaper than conflict. Koster could call the person whose crawler crashed his server. The text file gave that courtesy a name.
The community grew. The incentives shifted. And the file, it turned out, had been protecting something it couldn't enforce: the assumption that whoever found it would choose to read it.
Things to follow up on...
- Bots that never ask: A Duke University study found that several categories of AI crawlers never request robots.txt at all, bypassing the convention entirely rather than merely ignoring it.
- Non-compliance is accelerating: TollBit's quarterly tracking shows the share of bots ignoring robots.txt directives jumped from 3.3 percent to nearly 13 percent between late 2024 and mid-2025.
- Enforcement becomes a product: Cloudflare launched a "pay per crawl" service in 2025, turning what was once a cooperative norm into a commercial toll booth managed by infrastructure providers.
- New formats, same assumption: The RSL Collective's Really Simple Licensing standard lets publishers set AI terms inside robots.txt, but early evidence suggests AI companies haven't universally honored these newer file formats either.
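The first follow-up item is checkable against one's own server logs: a crawler that never fetches /robots.txt shows up as a user agent with page requests but no robots.txt request. A minimal sketch, assuming the common Apache/Nginx "combined" log format (the function name and the sample bots in the usage note are hypothetical):

```python
import re
from collections import defaultdict

# Pattern for the Apache/Nginx "combined" access-log format.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d+ \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def agents_skipping_robots(log_lines):
    """Map each user agent that requested pages but never /robots.txt
    to its page-request count."""
    fetched_robots = set()
    page_requests = defaultdict(int)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines in other formats
        agent = match.group("agent")
        if match.group("path") == "/robots.txt":
            fetched_robots.add(agent)
        else:
            page_requests[agent] += 1
    return {a: n for a, n in page_requests.items() if a not in fetched_robots}
```

Run over a day of logs, any agent left in the result crawled content without ever asking for the file, which is a different posture from fetching the rules and ignoring them.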

