The proposals arrived quickly. As robots.txt's limitations became harder to ignore, new standards emerged, and nearly all of them began by citing it.
The llms.txt proposal places a file at the domain root, "following the approach of /robots.txt and /sitemap.xml." The ai.txt proposal, published as an academic preprint in May 2025, opens by noting that "regulatory measures currently employed, such as the widely adopted robots.txt standard, prove insufficient," and extends robots.txt's syntax to cover AI-specific actions. Google's WebMCP addresses a different layer, defining structured tool contracts rather than access rules, but it shares the same foundational assumption: that the relationship between websites and AI agents is a coordination challenge, solvable if both sides signal their intentions more precisely.
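For concreteness, here is a minimal llms.txt of the kind the proposal describes: a Markdown file with an H1 title, a blockquote summary, and sections of annotated links. The site name and URLs below are placeholders.

```
# Example Publisher

> A short plain-language summary of the site, written for language models.

## Docs
- [Getting started](https://example.com/start.md): orientation for new readers
- [API reference](https://example.com/api.md): endpoint documentation

## Optional
- [Changelog](https://example.com/changelog.md): lower-priority, safe to skip
```

The file is pure signal: it tells an agent what the site considers important, and nothing more.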
Better signals. Finer-grained permissions. Reasonable enough, if the problem is one of vocabulary.
The companies building agents that browse the web like ordinary browsers have strong commercial incentives to access as much content as possible. The companies most motivated to restrict that access are publishers whose content trains and feeds those same agents. That's a conflict of interest. By inheriting robots.txt as their conceptual ancestor, these proposals frame that conflict as a vocabulary gap, a matter of insufficient signaling. That framing shapes which remedies look plausible. Energy flows toward better signaling mechanisms, toward clearer syntax. Legal and economic remedies stay peripheral.
The compliance evidence so far is consistent with this reading. llms.txt has no enforcement mechanism, and Google's John Mueller has noted that no AI crawler has even claimed to read the file. The ai.txt authors' own experiments found that even GPT-4o struggled to comply in non-standard scenarios. Eight out of nine sites in one analysis saw no measurable change in AI traffic after implementation.
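The lack of enforcement is visible in the mechanics. A sketch of a hypothetical cooperative agent, in Python (the domain and the honoring policy are invented for illustration), shows where compliance actually lives:

```python
import urllib.request
from urllib.error import HTTPError, URLError

def fetch_llms_txt(domain: str) -> str | None:
    """Return the site's /llms.txt if one is published, else None.
    Hypothetical helper: no major AI crawler is known to do this."""
    try:
        url = f"https://{domain}/llms.txt"
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (HTTPError, URLError):
        return None

guide = fetch_llms_txt("example.com")
if guide is not None:
    # A cooperative agent might restrict itself to the curated links.
    pages_to_read = [line for line in guide.splitlines()
                     if line.startswith("- [")]
else:
    pages_to_read = []  # fall back to ordinary crawling

# Nothing above binds the agent. The entire "enforcement mechanism"
# is a conditional that the agent's operator writes and controls.
```

Honoring the file is a design decision made entirely on the agent's side; the publisher never learns whether it was honored.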
Schema.org offers an instructive precedent for what voluntary structured-data standards actually produce. According to Web Data Commons, over 51% of crawled pages now contain structured data markup. But that markup lives on roughly 45 million domains, out of more than 360 million registered worldwide: about 12% of domains accounting for half of crawled pages. Adoption was driven heavily by CMS platforms like WordPress and large vertical platforms like Ticketmaster, as ACM documented. Sites with resources and platform support adopted it. Sites without those advantages largely didn't. The two-tier web already existed. Schema.org made the tiers load-bearing.
The gap between participants and non-participants never shrank over time. It became more consequential.
The new AI standards may follow the same trajectory. Sites with technical teams will implement llms.txt and ai.txt. Those implementations will matter only to the degree that AI providers choose to honor them. Sites without those resources will remain ungoverned. And the gap will widen along an adoption curve that never converges. Perhaps that's the most durable consequence of inheriting the coordination frame: even the outcome it produces looks like something more coordination could fix.

