In February 1996, the HTML Editorial Review Board struck a deal. Netscape would remove <blink> from the official specification if Microsoft removed <marquee>. Both companies agreed. Neither tag would appear in HTML standards going forward.
Browsers kept supporting them anyway.
The negotiation revealed something deeper: HTML had already forked into two incompatible paradigms, and there was no way back. Between 1991 and 1996, the web split between structure and presentation—and that split created an ambiguity that still shapes how we build web automation today.
The Original Contract
When Tim Berners-Lee published "HTML Tags" in 1991, he designed HTML as an application of SGML. The foundational principle: markup should describe what content is, not how it looks. The 18 original tags—<h1>, <p>, <ul>, <address>—were semantic descriptors.
Researchers needed to share documents across Unix workstations, Macintoshes, and simple terminals. Separating structure from presentation meant the same HTML could render appropriately everywhere. Clean. Logical. Portable.
The Competitive Break
During the browser wars of 1995-1996, Netscape and Microsoft discovered that proprietary HTML tags were easy competitive weapons. While the IETF worked on HTML 2.0, browser makers added visual effects that only worked in their software. Netscape shipped <blink>. Microsoft countered with <marquee> in Internet Explorer 2.0.
These weren't responses to documented user needs—they were market share tactics. Browser makers prioritized visual control over semantic consistency, a priority that would shape web development for decades. The original contract was broken. HTML could now describe either structure or presentation, depending on who wrote it.
Why It Couldn't Be Fixed
The W3C tried. HTML 4.0 (December 1997) deprecated presentational elements in favor of CSS, attempting to restore the structure-presentation separation. But it shipped in "Strict" and "Transitional" variants (alongside a "Frameset" one), and Transitional still permitted the deprecated elements: an acknowledgment that presentational markup was already everywhere.
The fork couldn't be healed because the web's purpose had fundamentally changed. What began as infrastructure for sharing research documents had become a canvas for visual expression. Businesses needed pixel-perfect control over appearance. The vision of purely semantic markup couldn't compete with "make this look exactly like our brand guidelines."
The compromise wasn't technical weakness—it was acknowledgment that the web served incompatible masters.
What This Means When You Build at Scale
When we build web agent infrastructure that extracts data across thousands of sites, we encounter this 1990s fork constantly. To make text bold, developers can use:
- <strong> (semantic emphasis)
- <b> (presentational boldness)
- <span style="font-weight:bold"> (CSS)
The visual result is identical. The semantic meaning is different.
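To make that concrete, here is a minimal Python sketch of an extractor that treats all three routes as "bold" while still recording which route was used. It relies only on the standard library's html.parser. The names BoldTextCollector, find_bold_runs, and the route labels are hypothetical, chosen for this illustration, and the style check is an assumption that only inspects inline style attributes (it ignores external stylesheets and class-based rules).

```python
import re
from html.parser import HTMLParser

# Inline declarations such as "font-weight:bold" or "font-weight: 700".
BOLD_STYLE = re.compile(r"font-weight\s*:\s*(bold|bolder|[6-9]00)", re.IGNORECASE)

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BoldTextCollector(HTMLParser):
    """Collects text made bold via <strong>, <b>, or an inline font-weight style."""

    def __init__(self):
        super().__init__()
        self._stack = []  # (tag, route) per open element; route is None if not bold
        self.found = []   # (route, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        if tag == "strong":
            route = "semantic"
        elif tag == "b":
            route = "presentational"
        elif BOLD_STYLE.search(style):
            route = "css"
        else:
            route = None
        self._stack.append((tag, route))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating unclosed children.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # The innermost bold ancestor decides which route is reported.
        for _, route in reversed(self._stack):
            if route is not None:
                self.found.append((route, text))
                break


def find_bold_runs(html):
    """Return (route, text) pairs for every bold text run in the HTML string."""
    collector = BoldTextCollector()
    collector.feed(html)
    collector.close()
    return collector.found


SAMPLE_BOLD = ('<p><strong>Sale</strong> <b>Today</b> '
               '<span style="font-weight:bold">Only</span> plain</p>')
```

Calling find_bold_runs(SAMPLE_BOLD) returns three runs, one per route: ('semantic', 'Sale'), ('presentational', 'Today'), and ('css', 'Only'). The word "plain" is skipped. An extractor that only looked for <strong> would have found one of the three.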
You can't assume <h1> marks important content—developers might use <div style="font-size:24px"> instead. Sites mix approaches within the same page: semantic tags in navigation, presentational markup in content, CSS for layout. Different industries adopted different conventions. E-commerce sites often use tables for visual layout, destroying logical structure. News sites might use semantic heading hierarchies, but override them with CSS. Travel sites frequently nest <div> elements dozens deep with inline styles, making structural parsing nearly impossible.
This ambiguity compounds at scale. Reliable extraction requires parsing for semantic structure while simultaneously interpreting visual presentation. We can't choose one paradigm—we have to navigate both, because the web never chose either.
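One way to picture "navigating both" is a heading finder that accepts either signal. The sketch below is a simplified illustration, not a description of any production system: HeadingCandidateFinder and find_heading_candidates are hypothetical names, and the 20px cutoff in MIN_HEADING_PX is an arbitrary assumption. It only reads inline font-size values in px, so headings styled through external CSS would need a rendering step this sketch does not attempt.

```python
import re
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
FONT_SIZE_PX = re.compile(r"font-size\s*:\s*(\d+(?:\.\d+)?)\s*px", re.IGNORECASE)
MIN_HEADING_PX = 20.0  # hypothetical cutoff for "looks like a heading"


def classify(tag, style):
    """Return 'semantic', 'presentational', or None for one start tag."""
    if tag in VOID_TAGS:
        return None
    if tag in HEADING_TAGS:
        return "semantic"
    match = FONT_SIZE_PX.search(style)
    if match and float(match.group(1)) >= MIN_HEADING_PX:
        return "presentational"
    return None


class HeadingCandidateFinder(HTMLParser):
    """Collects text of elements that look like headings by tag or by inline size."""

    def __init__(self):
        super().__init__()
        self._signal = None   # signal of the element being captured, if any
        self._tag = None      # tag name of that element
        self._depth = 0       # nesting depth of same-named tags while capturing
        self._parts = []      # text fragments seen inside the captured element
        self.candidates = []  # (signal, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if self._signal is not None:
            if tag == self._tag:
                self._depth += 1
            return
        signal = classify(tag, dict(attrs).get("style") or "")
        if signal is not None:
            self._signal, self._tag, self._depth, self._parts = signal, tag, 1, []

    def handle_endtag(self, tag):
        if self._signal is None or tag != self._tag:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse internal whitespace so nested inline tags read naturally.
            text = " ".join("".join(self._parts).split())
            if text:
                self.candidates.append((self._signal, text))
            self._signal = None

    def handle_data(self, data):
        if self._signal is not None:
            self._parts.append(data)


def find_heading_candidates(html):
    """Return (signal, text) pairs for semantic and presentational headings."""
    finder = HeadingCandidateFinder()
    finder.feed(html)
    finder.close()
    return finder.candidates


SAMPLE_HEADINGS = ('<h1>Annual Report</h1>'
                   '<div style="font-size:24px">Flight <span>Deals</span></div>'
                   '<div style="font-size:12px">fine print</div><p>body</p>')
```

On SAMPLE_HEADINGS this returns ('semantic', 'Annual Report') and ('presentational', 'Flight Deals'), and ignores the 12px block and the paragraph. Keeping the signal alongside the text lets downstream logic decide how much to trust each candidate, rather than committing to one paradigm up front.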
Error modes multiply: extraction logic that works on semantically marked-up sites breaks on presentationally marked-up ones. The maintenance burden grows as we handle both approaches across thousands of domains. The W3C later acknowledged this reality:
"There is no hard and fast division between what is 'purely semantic content' and what is 'just presentation.'"
We're not just parsing HTML—we're interpreting two incompatible design philosophies that coexist on every page. The 1996 compromise became the web's permanent condition. Anyone building infrastructure on top of it navigates that duality constantly.
Things to follow up on...
- CSS adoption timeline: The gap between CSS specification (1996) and usable browser support reveals why semantic markup advocates couldn't win until 2001, when stylesheet support finally became reliable enough for production sites.
- SGML's original principles: Understanding ISO 8879's foundational rule that "markup should describe a document's structure and other attributes rather than specify the processing" illuminates why Berners-Lee's vision was so different from what the web became.
- Table-based layout epidemic: The widespread practice of using tables and transparent GIFs for page layout destroyed logical document structure so thoroughly that pages became "frequently useless to anyone using a text browser, or a text-to-speech parser."
- Browser market dynamics: By 1999, Internet Explorer's 70% market dominance meant that Microsoft's presentational markup choices became de facto standards regardless of W3C specifications, cementing the fork's permanence.

