Operating web agents across thousands of sites, we've watched teams develop three distinct learning patterns. Some get dramatically better at delegation judgment—learning which sites can run autonomously and which need human oversight. Others iterate rapidly on agent configurations, accumulating knowledge through volume. Still others can't learn at all, blocked by infrastructure gaps that prevent feedback loops.
When Harvard researchers studied 758 BCG consultants working with AI, they found the same pattern: the teams that succeeded weren't smarter about technology. They had matched their learning mode to their operational reality.
Boundary Discovery Through Production Observability
When teams first deploy agents for competitive pricing intelligence, they discover boundaries that demos never revealed. A hotel pricing agent might handle Marriott properties reliably—consistent structure, predictable authentication—then encounter a boutique chain using dynamic rendering that breaks every pattern.
What they're learning isn't some abstract notion of "AI capability boundaries." They're learning specifics: this site uses Cloudflare bot detection requiring residential proxies; this one has A/B tests showing different prices to different users; this one requires maintaining session state across multiple requests.
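This kind of site-by-site knowledge only compounds if it gets written down somewhere structured. Here's a minimal sketch of what that could look like, assuming a hypothetical SiteProfile record; every field name, domain, and value is invented for illustration, not pulled from a real schema:

```python
# Minimal sketch of per-site operational knowledge as a structured record.
# Field names, domains, and values are hypothetical illustrations.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SiteProfile:
    domain: str
    bot_protection: Optional[str] = None    # e.g. "cloudflare", "akamai"
    proxy_type: str = "datacenter"          # switch to "residential" when datacenter IPs get blocked
    price_ab_testing: bool = False          # site shows different prices to different visitors
    requires_session_state: bool = False    # flow spans multiple requests; keep the session sticky
    notes: list = field(default_factory=list)

profiles = [
    SiteProfile("big-chain.example"),
    SiteProfile(
        "boutique-chain.example",
        bot_protection="cloudflare",
        proxy_type="residential",
        price_ab_testing=True,
        requires_session_state=True,
        notes=["dynamic rendering: needs a full browser session, not plain HTTP"],
    ),
]
```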
Teams that succeed develop operational judgment about which sites can scale to thousands of runs daily versus which need human validation every tenth run. You don't learn this from documentation. You learn it from production observability, watching where agents succeed and where they hit walls.
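One way to encode that judgment is a per-site review policy: fully autonomous sites skip review, fragile sites route every Nth run to a human. A sketch with an illustrative cadence (the "every tenth run" value is an assumption, not a recommendation):

```python
# Sketch of a validation-sampling gate; review_every=None means fully autonomous.
from typing import Optional

def needs_human_review(run_index: int, review_every: Optional[int]) -> bool:
    """Decide whether this run's output should be queued for human validation."""
    if review_every is None:
        return False
    return run_index % review_every == 0

# A fragile site validated every tenth run; a stable site would pass review_every=None.
for i in range(1, 21):
    if needs_human_review(i, review_every=10):
        print(f"run {i}: queue result for human validation")
```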
The BCG consultants who succeeded followed the same pattern, developing what researchers called "jagged frontier" navigation. Those who interrogated AI outputs rather than adopting them blindly achieved 40% higher quality. Those who treated AI as an oracle saw performance degrade by 19 percentage points.
Boundary discovery requires genuinely unclear frontiers and observability infrastructure to learn from production runs. Without visibility into what agents actually do, you're flying blind.
Codified Learning in Adversarial Environments
McKinsey's internal AI platform, Lilli, demonstrates rapid iteration: proof of concept in one week, alpha testing with 200 users over eight weeks, firmwide rollout in three months. This worked because the environment was cooperative—knowledge workers providing feedback, no adversarial resistance.
Web automation hits different constraints.
You can't A/B test your way through bot detection systems that ban your IP after three attempts. You can't iterate authentication flows on sites that lock accounts after failed logins. The web actively fights back.
The learning mode shifts toward codifying what works reliably. When a team discovers a pattern handling login flows across 1,000 e-commerce sites, the value isn't "try more variations"—it's "understand why this pattern is stable, document the exceptions, build monitoring to detect when sites change structure." The iteration happens in controlled environments before production, not through live experimentation against adversarial sites.
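In code, codified learning looks less like an experiment harness and more like a documented pattern plus a cheap drift check. A sketch along those lines, where the selectors, the exception entry, and the fingerprint markers are all hypothetical:

```python
# Sketch of a codified login pattern: the selectors that work across most sites,
# the documented exceptions, and a crude fingerprint check that flags structural
# drift before the pattern silently breaks. All values are illustrative.
LOGIN_PATTERN = {
    "username_selector": "input[name='email']",
    "password_selector": "input[type='password']",
    "submit_selector": "button[type='submit']",
}

KNOWN_EXCEPTIONS = {
    "shop-a.example": {"username_selector": "#login-email"},  # documented once, not rediscovered
}

# Markers we expect in the rendered page if the layout is unchanged.
STRUCTURE_FINGERPRINT = ['name="email"', 'type="password"', 'type="submit"']

def structure_drift(page_html: str, fingerprint: list) -> list:
    """Return the markers that disappeared -- a signal the site changed and the pattern needs review."""
    return [marker for marker in fingerprint if marker not in page_html]
```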
Codified learning works when you control testing environments and can document patterns before production. Treating the live web as an experimentation playground? That's how you get your infrastructure banned.
Infrastructure-Dependent Learning
The Department of Defense's GAMECHANGER AI project reveals what happens when organizational structure prevents learning entirely. Initial team members left the DOD, frustrated by structural barriers: inability to download software, unreasonable network limitations, irrelevance of software development to career advancement.
No amount of technical capability mattered when the organization couldn't create feedback loops.
We see this learning blockage when enterprises deploy web agents without observability infrastructure. Teams run pricing agents across competitor sites but can't see which sites succeeded, which failed, or why. They can't track error patterns. They can't identify when sites changed structure. They can't codify "this authentication flow works for these 50 sites but needs adjustment for these 10."
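The missing piece is usually mundane: a structured record per run that can be aggregated later. A minimal sketch, assuming a JSONL log file and a hypothetical error taxonomy ("bot_block", "auth_failure", "layout_change" are invented labels):

```python
# Sketch of a minimal per-run observability record plus an aggregation helper.
# The schema and the error classes are illustrative assumptions.
import json
import time
from collections import Counter
from typing import Optional

def log_run(site: str, ok: bool, error_class: Optional[str] = None,
            detail: Optional[str] = None, path: str = "agent_runs.jsonl") -> None:
    """Append one structured record per agent run."""
    record = {"ts": time.time(), "site": site, "ok": ok,
              "error_class": error_class, "detail": detail}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

def error_histogram(path: str = "agent_runs.jsonl") -> Counter:
    """Which failure classes dominate, across all sites and runs."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            if not rec["ok"]:
                counts[rec["error_class"]] += 1
    return counts
```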
Without infrastructure, every deployment is a fresh start. Teams can't accumulate knowledge about the web's adversarial patterns—which sites use Cloudflare versus Akamai bot detection, which require residential versus datacenter IPs, which have regional variations in structure.
Infrastructure-dependent learning requires observability, audit trails, and gradual rollout capabilities. When reliability matters and the environment fights back, you need systems that let teams actually learn from what happens.
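Gradual rollout can be as simple as a gate that widens coverage only while the observed failure rate stays under a cap. A sketch with illustrative thresholds and step size:

```python
# Sketch of a gradual-rollout gate: expand the new configuration's share of sites
# only while failures stay below a threshold; otherwise hold and let the audit
# trail explain what broke. The numbers are illustrative assumptions.
def next_rollout_fraction(current: float, observed_failure_rate: float,
                          max_failure_rate: float = 0.02, step: float = 0.10) -> float:
    if observed_failure_rate > max_failure_rate:
        return current                      # hold the rollout; investigate before expanding
    return min(1.0, current + step)

# e.g. start at 10% of sites and expand toward 100% as long as runs stay healthy
fraction = 0.10
fraction = next_rollout_fraction(fraction, observed_failure_rate=0.005)   # -> 0.20
```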
Matching Mode to Reality
The 10% of companies seeing financial benefits from AI aren't smarter about technology. They've matched their learning mode to their operational reality:
- Boundary discovery when you're mapping unknown territory and have observability to learn from production runs
- Codified learning when you control testing environments and can document patterns before production
- Infrastructure-dependent learning when reliability matters and the environment fights back
Your team's ability to learn depends less on technical sophistication than on whether you've built the infrastructure that makes learning possible. Most teams struggle not because they lack capability, but because they're trying to learn in modes their operational reality doesn't support.
Things to follow up on...
- The Centaur-Cyborg spectrum: Harvard researchers identified two distinct patterns among successful BCG consultants—"Centaurs" who divided tasks between AI and humans, and "Cyborgs" who fully integrated AI into their workflow with continuous interaction.
- Augmented Learners outperform: Organizations that combine organizational learning with AI-specific learning are 1.6 times more likely to manage uncertainties and 60-80% more effective at managing external environment uncertainties compared to limited learners.
- The three learning methods: Companies enabling all three learning methods—machines learning autonomously, humans teaching machines, and machines teaching humans—are five times more likely to realize significant financial benefits than organizations using a single method.
- GitHub Copilot's subtle shift: A longitudinal study found that generative AI subtly shifts task allocation away from collaborative project management toward more individualized coding tasks, potentially eroding coordination practices at scale.

