Here is what your browser tells every server it contacts:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36
This is a Chrome user-agent string. It contains six identity claims. One of them is accurate. Read it slowly and you'll find something like a geological core sample, each token a sediment layer from a different era of the web, each one a record of the moment someone decided that honesty was less useful than compatibility.
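The sediment layers are easy to expose: HTTP defines this header as a sequence of product/version tokens with parenthesized comments. A minimal sketch in Python of what a server actually receives:

```python
import re

# The Chrome user-agent string quoted above.
ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
      "AppleWebKit/537.36 (KHTML, like Gecko) "
      "Chrome/125.0.0.0 Safari/537.36")

# Pull out the product/version pairs, skipping the parenthesized comments.
tokens = re.findall(r"([A-Za-z]+)/([\d.]+)", ua)
for name, version in tokens:
    print(f"{name}: {version}")

# Of the products named, only Chrome describes the actual browser;
# Mozilla, AppleWebKit, and Safari are inherited compatibility claims.
```

Four product tokens, plus the claims buried in the comments ("like Gecko", the platform details), and only one names the real browser.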
In 1993, NCSA Mosaic sent a clean identifier: name and version number. Netscape arrived the following year under the codename "Mozilla," and web servers began checking for that name before serving richer content. When Microsoft shipped Internet Explorer in 1995, IE supported the same features but wasn't Mozilla. So Microsoft declared IE "Mozilla compatible" and started receiving the pages it wanted. A reasonable workaround. Also the founding act of a thirty-year tradition: if you want the good content, claim to be whoever gets it.
The tradition compounded. When sites began sniffing for Gecko (Firefox's engine), the developers of Konqueror added "KHTML, like Gecko" to their string. Apple forked Konqueror's engine into WebKit and carried the claim forward into Safari, so now two layers of false identity traveled together. Google built Chrome on WebKit, kept every prior token, and appended its own Chrome/version; when Chrome later forked WebKit into Blink, the WebKit token stayed behind. Each fork preserved the lies of the thing it forked from, then added its own. Already at launch in 2008, Chrome's user-agent string claimed to be Mozilla, WebKit, KHTML, Gecko, and Safari. Today every one of those claims is false, and every one was necessary.
Opera, for a time, offered users a dropdown menu to choose which browser they'd like to impersonate. That this was a product feature tells you everything about what the user-agent string had become.
Google recognized the absurdity. Starting in 2020, Chrome began reducing the user-agent string, freezing minor version numbers and generalizing platform details; servers needing precise information could request it through a structured replacement, User-Agent Client Hints. But look at the reduced string. Mozilla/5.0 is still there. AppleWebKit/537.36, version frozen forever. KHTML, like Gecko. Safari/537.36. The reduction removed specificity without removing the impersonation. Even a deliberate cleanup, led by the company with the most browser market share, couldn't strip out the compatibility theater. Firefox and Safari declined to adopt the replacement at all. The fossil record, it seems, is permanent. No single actor can unilaterally clean up what thirty years of collective dishonesty deposited.
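The structured replacement at least sends identity as parseable fields rather than a legacy-encrusted string. A sketch of parsing a Sec-CH-UA brand list; the header value here is illustrative, not captured from a real browser:

```python
# Illustrative Sec-CH-UA value. Real browsers also include a randomized
# "GREASE" brand (like the "Not.A/Brand" entry) to discourage exact-match sniffing.
sec_ch_ua = '"Chromium";v="125", "Google Chrome";v="125", "Not.A/Brand";v="24"'

brands = []
for entry in sec_ch_ua.split(", "):
    name, _, version = entry.partition(";v=")
    brands.append((name.strip('"'), version.strip('"')))

print(brands)
```

Note what survives even here: the brand list is still self-reported, and it still ships deliberate noise, so the "structured" future carries the same trust model as the string it replaces.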
So when Amazon sued Perplexity in early 2026, one allegation stood out: Perplexity's agent transmitted the same user-agent string as Google Chrome, making an AI system indistinguishable from a human browsing session. Cloudflare documented the same pattern: once Perplexity's declared crawler was blocked, crawling continued under a generic Chrome user-agent string. Even OpenAI's self-identifying bots embed a full Chrome user-agent string before appending their own name. The honest ones start by lying.
Amazon called this deceptive, and fairly so. Read another way, though, it's the string's own logic followed to completion. For thirty years, the user-agent string taught every new entrant the same lesson. AI crawlers learned it fluently.
The web's access-control infrastructure, including robots.txt enforcement, still depends on this self-reported identifier. A courtesy, extended in an era when courtesy was enough. What remains is a name tag that taught everyone who wore it to write someone else's name. Billions of HTTP requests carry it forward daily, read by systems that have likely known for years it means very little. What can be built on that foundation is a question the string itself can't answer.
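That dependence is easy to demonstrate. Python's standard-library robots.txt parser, like the servers it models, keys every decision off whatever name the client chooses to report. The rules below are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks one declared AI crawler.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

url = "https://example.com/article"

# Blocked when the crawler honestly declares itself...
declared = rp.can_fetch("GPTBot", url)

# ...allowed the moment it claims to be Chrome instead.
impersonating = rp.can_fetch(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/125.0.0.0 Safari/537.36", url)

print(declared, impersonating)
```

The enforcement layer and the evasion are the same two lines of code with a different string argument; nothing in the protocol distinguishes them.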
Things to follow up on...

- Cloudflare's stealth crawler data: Cloudflare documented Perplexity repeatedly modifying its user agent and switching source networks to circumvent blocks after its declared crawler was identified, leading Cloudflare to de-list it as a verified bot.
- Cryptographic identity signals: OpenAI's ChatGPT Agent reportedly uses HTTP Message Signatures (RFC 9421), including a verifiable Signature-Agent value, which HUMAN Security describes as a post-UA identity layer that could eventually supplement what user-agent strings can no longer provide.
- Robots.txt's parallel collapse: A 2025 arXiv study found that nearly 60% of reputable sites now block at least one AI agent via robots.txt, while a Duke University study found several categories of AI crawlers never request the file at all.
- The AI crawler volume shift: Cloudflare's network data shows GPTBot surging from 5% to 30% of AI crawler traffic between May 2024 and May 2025, with AI bots generating roughly 50 billion crawler requests per day by late 2025.
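For reference, the RFC 9421 mechanism in the identity-signals note works by signing a canonical "signature base" assembled from selected message components, so identity becomes something a server can verify rather than merely read. A rough, dependency-free sketch; the agent URL, key id, timestamp, and HMAC secret are all invented, and real deployments would use asymmetric keys rather than a shared secret:

```python
import base64
import hashlib
import hmac

# Components covered by the signature, in RFC 9421's signature-base form.
# The signature-agent header value is hypothetical.
components = [
    ('"@method"', "GET"),
    ('"@authority"', "example.com"),
    ('"@path"', "/article"),
    ('"signature-agent"', '"https://agent.example"'),
]
params = ('("@method" "@authority" "@path" "signature-agent")'
          ';created=1767225600;keyid="bot-key-1"')

# The signature base: one line per component, then the signature params.
base = "\n".join(f"{name}: {value}" for name, value in components)
base += f'\n"@signature-params": {params}'

# hmac-sha256 is one of the RFC's registered algorithms; a shared demo
# secret stands in here for a real key.
secret = b"demo-secret"
tag = base64.b64encode(hmac.new(secret, base.encode(), hashlib.sha256).digest()).decode()
print(f"Signature: sig1=:{tag}:")
```

The point of the design is that a forged Chrome user-agent string costs nothing, while a forged signature fails verification: the first identity layer on the web whose claims don't depend on being believed.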

