Teams writing browser automation tests today assume certain things just work. Write a script that logs into an airline site, searches for flights between San Francisco and Tokyo, validates that prices appear correctly. Run it once to verify the flow. Run it a thousand times to monitor pricing changes across the day. The browser instances appear when needed, traffic routes through appropriate regions, login sessions persist correctly, results aggregate into dashboards. The infrastructure handling all of this operates quietly enough that most developers never think about it.
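The automation-layer half of that workflow is short enough to sketch. Something like the following, using Selenium's Python bindings; the airline site, URLs, and selectors are hypothetical stand-ins:

```python
# Minimal sketch of the flow described above, using Selenium's Python
# bindings. The site URL and all selectors are hypothetical.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    # Log in.
    driver.get("https://airline.example.com/login")
    driver.find_element(By.NAME, "email").send_keys("user@example.com")
    driver.find_element(By.NAME, "password").send_keys("secret")
    driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()

    # Search SFO -> Tokyo, then validate that every fare renders a price.
    driver.get("https://airline.example.com/search?from=SFO&to=TYO")
    prices = WebDriverWait(driver, 30).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".fare-price"))
    )
    assert all(p.text.startswith("$") for p in prices)
finally:
    driver.quit()
```

Everything below that script, where the browser comes from, which region its traffic exits, what happens when the login form changes, is the invisible part.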
That invisibility exists because of a licensing decision made in 2004. When Selenium chose Apache 2.0, it created specific economic conditions: the automation layer could be open source while the infrastructure layer remained proprietary. Companies could build commercial platforms on top of Selenium without sharing the code that made those platforms work at scale.
Browser automation requires two distinct technical layers. The control protocol tells browsers what to do—click this button, fill this form, navigate to this URL. Selenium provides this protocol, which W3C standardized as WebDriver in 2018. The infrastructure layer provisions browsers on demand, routes traffic to avoid detection patterns, manages authentication state when sites change their flows mid-session, scales across data centers, and cleans up sessions that hang after timeouts.
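The protocol layer is deliberately simple: plain HTTP carrying JSON. A rough sketch of what those commands look like on the wire, assuming a chromedriver instance listening locally on its default port (9515):

```python
# What the W3C WebDriver protocol looks like at the wire level: HTTP + JSON.
# Assumes chromedriver is running locally on its default port.
import requests

BASE = "http://localhost:9515"

# Create a session (provisions a browser).
session = requests.post(f"{BASE}/session", json={
    "capabilities": {"alwaysMatch": {"browserName": "chrome"}}
}).json()["value"]["sessionId"]

# Navigate to a URL.
requests.post(f"{BASE}/session/{session}/url",
              json={"url": "https://example.com"})

# Find an element, then click it. The long key is the W3C element identifier.
elem = requests.post(f"{BASE}/session/{session}/element", json={
    "using": "css selector", "value": "a"
}).json()["value"]["element-6066-11e4-a52e-4f735466cecf"]
requests.post(f"{BASE}/session/{session}/element/{elem}/click", json={})

# End the session (releases the browser).
requests.delete(f"{BASE}/session/{session}")
```

Nothing in that protocol says where the browser runs, how many of them exist, or what happens when one hangs. That is the second layer's job.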
Apache 2.0 made it economically viable to separate automation protocol (open source) from infrastructure operations (proprietary)—a division that persists across every major browser automation tool today.
The economics of that separation are straightforward. Recreating the automation protocol would mean negotiating with browser vendors, maintaining compatibility as Chrome, Firefox, and Safari evolved independently, and building language bindings for Java, Python, JavaScript, and Ruby. That work was impractical to duplicate, so the protocol stayed open. Infrastructure complexity lives in the operational layer, where companies invest in solving problems that emerge only at scale, and Apache 2.0 let them keep that work proprietary.
What Stayed Proprietary
Consider session management across a thousand concurrent browser instances: each session needs its own authentication state, cookie jar, and local storage. When an airline site updates its login flow, sessions start failing. The infrastructure needs to detect the pattern, a cluster of failures that indicates a site change, then route new sessions through an updated flow while existing sessions complete their work. Detection, routing, and cleanup all happen invisibly. None of this is part of Selenium's automation protocol; it's infrastructure that companies built because Apache 2.0 let them keep it proprietary.
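The shape of that detection step is easy to sketch, even though real implementations are proprietary. A toy version, with invented names and thresholds:

```python
# Toy sketch of failure-pattern detection: a sliding window of recent
# failures per site. A burst of failures suggests the site changed its
# flow, so new sessions get routed to an updated login procedure while
# in-flight sessions finish. All names and thresholds are illustrative.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 300    # look at the last five minutes
FAILURE_THRESHOLD = 20  # cluster size that signals a site change

failures: dict[str, deque] = defaultdict(deque)
flow_version: dict[str, str] = defaultdict(lambda: "v1")

def record_result(site: str, ok: bool) -> None:
    now = time.time()
    window = failures[site]
    if not ok:
        window.append(now)
    # Evict failures older than the window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    # A cluster of failures, not an isolated error, triggers rerouting.
    if len(window) >= FAILURE_THRESHOLD and flow_version[site] == "v1":
        flow_version[site] = "v2"  # new sessions use the updated flow

def flow_for_new_session(site: str) -> str:
    return flow_version[site]
```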
The licensing choice shaped which problems got investment. Because companies could keep infrastructure proprietary, they invested in operational work that doesn't make sense as open source:
- Regional bot detection management: Traffic from Virginia triggers different checks than traffic from Singapore
- Authentication complexity: Handling flows that compound across different sites and change mid-session
- Dynamic resource provisioning: Scaling from dozens to thousands of concurrent sessions without manual intervention (a toy scaler is sketched after this list)
- Failure pattern detection: Identifying site changes through clusters of failures instead of individual errors
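To make the provisioning item concrete, here is a toy control loop that keeps a warm pool of browser instances sized to demand; the pool model and scaling ratios are invented for illustration:

```python
# Toy sketch of dynamic provisioning: keep a warm pool of browser
# instances sized to queued demand, within hard limits, with no manual
# steps. The pool model and scaling ratios are invented for illustration.
import math

MIN_SESSIONS, MAX_SESSIONS = 12, 5000
HEADROOM = 1.25  # keep 25% spare capacity for bursts

def target_pool_size(queued: int, busy: int) -> int:
    """Size the pool to current work plus headroom, clamped to hard limits."""
    want = math.ceil((queued + busy) * HEADROOM)
    return max(MIN_SESSIONS, min(MAX_SESSIONS, want))

class BrowserPool:
    """Minimal stand-in for a real fleet of browser instances."""
    def __init__(self) -> None:
        self.idle, self.busy = MIN_SESSIONS, 0

    def reconcile(self, queued: int) -> None:
        """One control-loop tick: launch or retire instances toward target."""
        delta = target_pool_size(queued, self.busy) - (self.idle + self.busy)
        if delta > 0:
            self.idle += delta                   # launch new instances
        elif delta < 0:
            self.idle -= min(self.idle, -delta)  # retire idle ones only

pool = BrowserPool()
pool.reconcile(queued=800)  # burst of work: pool scales up toward ~1000
pool.reconcile(queued=0)    # quiet period: pool drains back to the floor
```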
BrowserStack runs over 3 million tests daily. That scale requires infrastructure Selenium doesn't provide and never will.
When Puppeteer launched in 2017 and Playwright in 2020, both used Apache 2.0. Commercial platforms could immediately support these new tools without restructuring their business models. Each new open source automation tool expanded the ways to interact with browsers. The infrastructure complexity stayed proprietary. The permanent gap between what's freely available and what's operationally necessary exists because of licensing economics.
Where Complexity Lives Now
Teams building web agents at enterprise scale encounter this separation directly. Writing scripts that interact with web pages is a solved problem. Running those scripts reliably across thousands of sites, each with different authentication patterns, bot defenses, and regional variations, requires building internal infrastructure or paying for commercial services. Licensing economics made it rational for companies to solve infrastructure problems once and sell access. Contributing those solutions back to open source would undermine the business model that Apache 2.0 enabled.
Apache 2.0 created infrastructure that works quietly enough that teams writing automation tests never need to understand session cleanup, traffic routing, or authentication state management. The licensing decision shaped where complexity lives in the ecosystem and which capabilities exist only as commercial services.
Things to follow up on...
- The SaaS loophole: GPL's distribution requirements don't apply to software accessed over a network, which means companies running GPL code as cloud services don't have to share modifications, fundamentally changing how copyleft licenses constrain commercial infrastructure.
- AGPL closes the gap: The GNU Affero General Public License specifically addresses network use by requiring source code disclosure for modified software accessed remotely, which is why many companies with SaaS platforms explicitly prohibit AGPL-licensed code in their compliance policies.
- Patent grants matter differently: Apache 2.0 includes an explicit patent grant that MIT lacks, and this legal clarity makes Apache 2.0 preferred for infrastructure software where patent concerns exist, particularly when large enterprises are primary users.
- Commercial platforms emerged immediately: Sauce Labs was founded in 2008 by Selenium's creator Jason Huggins, demonstrating how quickly commercial opportunities emerged around open source browser automation once permissive licensing made proprietary infrastructure economically viable.

