Antoine Vastel was studying browser fingerprinting for privacy research when Headless Chrome started blocking his automated tests. The irony wasn't subtle—he was investigating how websites track users, and the bots he built kept getting caught by detection systems he didn't understand.
Getting blocked redirected his research. Instead of fighting detection systems, he started studying how they work. What he learned: the best detection doesn't just identify bots. It understands the attacker's methodology well enough to anticipate the next move.
Today, as VP of Research at DataDome, Vastel leads the team protecting websites from sophisticated automation. His work surfaces something about web automation at scale: the adversarial nature creates survival bias. As detection improves, simple bots get filtered out. Only the most sophisticated ones remain visible. The training data naturally shifts toward harder examples. The arms race never ends—it just gets more expensive.
What 1.85 Billion Requests Teach You
When DataDome protected a US news website from a 12-hour DDoS attack, the numbers gave "bot detection" actual shape:
| Metric | Scale |
|---|---|
| Total requests | 1.85 billion |
| Duration | 12 hours |
| Peak rate | 2.87 million requests/second |
| Distributed IPs | 311,000 addresses |
| Active IPs (any moment) | 120,000-140,000 |
The nature of those 311,000 addresses matters. These weren't data center IPs that enterprise firewalls catch easily. They were residential proxy addresses from legitimate ISPs (Charter Communications, Time Warner Cable, Turk Telekom) that look exactly like real users until you analyze behavior patterns at scale.
But every bot used the same Android WebView user-agent. That single consistency, buried in billions of requests, became the detection signal.
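As a rough illustration of how that kind of aggregate signal might be surfaced (the log schema, field names, and threshold below are invented for the example, not DataDome's pipeline), the idea is simply to measure how much of the traffic any single user-agent string accounts for:

```typescript
// Hypothetical request record; the fields and threshold are illustrative,
// not DataDome's schema or logic.
interface RequestLog {
  ip: string;
  userAgent: string;
}

// Count requests per user-agent string and flag any single string whose
// share of total traffic is larger than an organic user population would
// plausibly produce.
function findDominantUserAgents(
  logs: RequestLog[],
  shareThreshold = 0.5,
): { userAgent: string; share: number }[] {
  if (logs.length === 0) return [];

  const counts = new Map<string, number>();
  for (const log of logs) {
    counts.set(log.userAgent, (counts.get(log.userAgent) ?? 0) + 1);
  }

  const flagged: { userAgent: string; share: number }[] = [];
  for (const [userAgent, count] of counts) {
    const share = count / logs.length;
    if (share >= shareThreshold) {
      flagged.push({ userAgent, share });
    }
  }
  return flagged;
}
```

In the attack described above, one Android WebView user-agent string spread across 311,000 residential IPs would dominate this distribution even though each individual request looked unremarkable.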
Vastel's research background trained him to spot patterns that only emerge at scale. His PhD work on browser fingerprinting focused on inconsistencies: places where what a device claims about itself doesn't match what it actually does. That methodology shaped his open-source implementation of Picasso, a GPU-based fingerprinting technique (originally proposed by Google researchers) that verifies whether devices are lying about their real nature. Companies including Discord now use his implementation.
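A minimal browser-side sketch of the Picasso idea, assuming a simplified single-round challenge (the real protocol uses server-issued seeds, multiple rounds, and known-good hashes per device class; this is not Vastel's code):

```typescript
// Browser-side sketch of a Picasso-style canvas challenge. Requires a secure
// context for crypto.subtle; a real deployment would use server-issued seeds
// and compare the hash against values known for the claimed device class.
async function canvasChallenge(seed: number): Promise<string> {
  const canvas = document.createElement("canvas");
  canvas.width = 300;
  canvas.height = 150;
  const ctx = canvas.getContext("2d");
  if (!ctx) throw new Error("2D canvas context unavailable");

  // Deterministic pseudo-random generator: the same seed always requests the
  // same drawing, so variation comes from the graphics stack, not the input.
  let state = (seed >>> 0) % 2147483647 || 1;
  const rand = () => {
    state = (state * 48271) % 2147483647; // MINSTD Lehmer generator
    return state / 2147483647;
  };
  const channel = () => Math.floor(rand() * 256);

  // Seed-dependent primitives that exercise text rendering, curves, and shadows.
  for (let i = 0; i < 20; i++) {
    ctx.shadowBlur = rand() * 10;
    ctx.shadowColor = `rgb(${channel()}, ${channel()}, ${channel()})`;
    ctx.fillStyle = `rgb(${channel()}, ${channel()}, ${channel()})`;
    ctx.font = `${Math.floor(10 + rand() * 20)}px sans-serif`;
    ctx.fillText("picasso", rand() * 250, 10 + rand() * 140);
    ctx.beginPath();
    ctx.arc(rand() * 300, rand() * 150, rand() * 50, 0, 2 * Math.PI);
    ctx.fill();
  }

  // Hash the rendered pixels; the verifier checks the digest against hashes
  // that genuine devices of the claimed class are known to produce.
  const pixels = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
  const digest = await crypto.subtle.digest("SHA-256", pixels);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```

Because different browser engines, GPU drivers, and font stacks rasterize the same instructions slightly differently, a client that claims to be one device class but hashes like another is exposed without relying on anything it says about itself.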
The distinctive element: recognizing that fingerprinting remains valuable even as browser vendors reduce tracking entropy. The adversarial context determines what signals matter. Academic research taught him to identify stable signals in changing environments. Production systems require exactly that.
When the Economy Changed
Vastel observed something significant: there's a new economy around scraper bots that didn't exist when he started. Bot-as-a-Service providers now offer REST APIs where users provide a URL and the service handles all the technical complexity (a hypothetical call is sketched after this list):
- Rotating user agents
- Spoofing headers
- Changing IPs through residential proxies
- Solving CAPTCHAs
- Pay-per-success pricing
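From the customer's side, using such a service can be as simple as one HTTP call. The endpoint, parameters, and response shape below are entirely hypothetical; they only illustrate how much complexity the provider absorbs:

```typescript
// Hypothetical Bot-as-a-Service client. The endpoint, parameters, and
// response shape are invented for illustration; real providers differ.
interface ScrapeResult {
  success: boolean;
  html?: string;
  cost?: number; // pay-per-success: only successful requests are billed
}

async function scrape(targetUrl: string, apiKey: string): Promise<ScrapeResult> {
  const response = await fetch("https://api.example-baas.invalid/v1/scrape", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      url: targetUrl,
      // Everything in the list above is reduced to a few option flags:
      render_js: true,          // full browser rendering
      residential_proxy: true,  // rotate IPs through residential proxies
      solve_captcha: true,      // outsourced CAPTCHA solving
    }),
  });
  return (await response.json()) as ScrapeResult;
}
```

Pay-per-success pricing means the provider, not the customer, absorbs the cost of getting blocked, which concentrates evasion expertise on the service side of the market.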
This economic shift changed the detection challenge. When Vastel started, during the PhantomJS era, detection was relatively straightforward. But Headless Chrome and Puppeteer changed everything. These bots have consistent HTTP headers, consistent TLS fingerprints. They're actual browsers running real rendering engines. Detection requires expert knowledge about browser internals—exactly what Vastel's research background provided.
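A sketch of what "knowledge about browser internals" translates to in practice; these are well-known public signals rather than DataDome's detection logic, and each one shifts across Chrome releases:

```typescript
// Client-side probe for a few classic headless/automation inconsistencies.
// Each signal is weak on its own and changes between Chrome releases; real
// systems combine many such checks with server-side and behavioral data.
function headlessSignals(): Record<string, boolean> {
  return {
    // WebDriver-based automation is required by spec to set this flag.
    webdriverFlag: navigator.webdriver === true,
    // Headless Chrome has historically advertised itself in the user-agent.
    headlessUserAgent: /HeadlessChrome/.test(navigator.userAgent),
    // Regular Chrome exposes window.chrome; some automated setups do not.
    missingChromeObject: typeof (window as any).chrome === "undefined",
    // A client claiming to be a phone while reporting zero touch points is
    // the claim-versus-behavior mismatch Vastel's fingerprinting work targets.
    touchMismatch:
      /Android|iPhone/.test(navigator.userAgent) &&
      navigator.maxTouchPoints === 0,
  };
}
```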
By 2024, new anti-detect frameworks like nodriver had emerged specifically to evade the detection techniques Vastel had developed. They drive the browser through low-level Chrome DevTools Protocol (CDP) commands, the browser's internal control interface that most detection systems don't monitor, and so avoid triggering the usual signals.
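One publicly documented example of such a "usual signal", sketched under the assumption that the automation client has the CDP Runtime domain enabled (behavior varies across Chrome versions, and this is not anyone's production check): serializing a logged Error over the protocol touches its stack property, which the page can observe. Frameworks in the nodriver mold are engineered to avoid the protocol usage that leaks side effects like this.

```typescript
// Sketch of a publicized CDP-detection trick; behavior varies with Chrome
// version and with how the automation client uses the protocol.
function probeForCdpSerialization(): boolean {
  let stackAccessed = false;
  const bait = new Error("cdp-probe");
  Object.defineProperty(bait, "stack", {
    configurable: true,
    get() {
      // Only code that deeply serializes console arguments (for example a
      // CDP client with the Runtime domain enabled) reads this property here.
      stackAccessed = true;
      return "";
    },
  });
  // Without an attached debugger or CDP session, this call does not
  // serialize its argument, so the getter is never invoked.
  console.debug(bait);
  return stackAccessed;
}
```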
His response: start a GitHub repository called "Headless Cat & Mouse" with someone from Google's Headless Chrome team to document detection techniques as they evolve. The collaborative approach reflects his research methodology—understanding the problem requires understanding both sides.
Why Humans Still Matter
Perhaps the most revealing insight from Vastel's work:
"There are more and more humans in the loop."
Attackers use bots to automate part of the process, but humans step in to help the bots bypass detection at critical moments.
This matters because it shows the limits of full automation in adversarial environments. When both sides are constantly adapting, human judgment becomes the variable that's hardest to automate away. Attackers need humans to solve novel CAPTCHAs, navigate unexpected site changes, make strategic decisions about which detection signals to spoof. Defenders need humans to investigate suspicious patterns, update detection rules, make judgment calls about borderline cases.
Where humans add the most value in the workflow becomes the critical design decision. Operating detection infrastructure at scale means accepting that patterns that work today might fail tomorrow. Ground truth labels are expensive: manual investigators can verify accounts at a rate of only hundreds per week. Compute constraints matter when you're analyzing billions of requests. That is the daily operational reality of building systems that distinguish human from automated behavior while both are constantly evolving.
Vastel's path from blocked researcher to detection builder shows how infrastructure evolves when the environment fights back. The cycle is self-reinforcing: automation creates detection, detection creates better automation, better automation demands more sophisticated detection. Each iteration raises the baseline of what "production-ready" means.
Running web automation at scale means understanding both sides of the automation-detection dynamic. Vastel's work demonstrates what that understanding looks like in practice: not just catching bots, but anticipating how they'll adapt next.
Things to follow up on...
- The Picasso fingerprinting technique: Vastel's open-source implementation of the GPU-based challenge-response method verifies whether devices are lying about their real nature by leveraging rendering challenges that reveal hardware and software inconsistencies.
- Bot-as-a-Service economics: The emergence of BaaS providers offering REST APIs where users only pay for successful requests has fundamentally changed the economics of web scraping and the sophistication level required for detection.
- The nodriver framework: New anti-detect bot frameworks in 2024 implement automation using low-level Chrome DevTools Protocol commands that bypass traditional detection signals, representing the latest evolution in the cat-and-mouse game.
- Behavioral biometrics at scale: DataDome's system analyzes more than 16 billion user sessions and collects over 3,000 signals per session to distinguish human behavior from automated patterns in real time.

