Someone working a labeling shift on a platform like Scale AI or Appen sits down to evaluate two AI-generated responses to the same prompt. They follow a rubric. They mark which response is more helpful, more correct, better in tone. The rubric says "quality." It says "appropriateness." These are aesthetic categories in operational clothing, and nothing about the task feels like an aesthetic judgment. The labeler is completing a gig. The Ada Lovelace Institute found that in many cases, fewer than 100 individuals shape a given language model's behavior through this process. A peer-reviewed study in PNAS Nexus confirmed the downstream effect: across five generations of GPT models, outputs exhibit remarkably consistent cultural values resembling those of English-speaking and Protestant European countries.
None of these labelers think of themselves as taste-makers. Neither do the training data curators deciding which examples count as high-quality, nor the optimization engineers tuning for "helpfulness." The word "engagement," as scholars writing in the Journal of Information Policy have observed, lets platforms translate irreducibly complex evaluative judgments into an ostensibly neutral metric. It launders taste into data. The rubrics shaping agent behavior do the same work. The evaluative judgments are real, consequential, and cumulative. They just don't register as judgments to the people making them, because the work has been scrubbed of anything that would suggest otherwise.
These patterns now operate at a scale where their consistency matters commercially and culturally. AI agents facilitated an estimated $22 billion in sales during Black Friday 2025, and Gartner projects that by the end of 2026, around 40% of enterprise applications will integrate agent capabilities. When millions of purchasing and research decisions pass through systems that share evaluative DNA, the aggregate starts to function like something we'd recognize in another context: institutional taste.
A precedent helps here. In 1900, the Michelin brothers published a free handbook for French motorists. The purpose was explicitly instrumental: get people driving, wear out tires, sell replacements. Their traveling salesmen happened to know which restaurants were worth stopping for, so they included listings. It took a quarter century for anyone to notice that a tire company had been quietly shaping French culinary standards. The judgment preceded the framework that rationalized it. Eventually the Michelin brothers recognized what they'd built and formalized it with inspectors, stars, and published criteria.
What's forming now has no comparable moment of recognition on the horizon, and the reason is structural. The Michelin brothers ran a single company; they could look at the thing they'd made and see it whole. The evaluative frameworks shaping what agents surface and select are distributed across labelers who've moved on to other gigs, training datasets assembled by teams that have since disbanded, and optimization targets set by engineers who've changed companies. No vantage point exists from which to have the recognition moment. Nobody occupies the position the Michelin brothers occupied when they looked at their handbook and realized it had become a cultural institution. Consistency in the output is an emergent property of the system, and the fragmentation across actors is precisely what keeps it invisible. Each person involved sees only their task. From outside, the aggregate looks like the neutral operation of technology.
Ghost institutions. Stable evaluative frameworks with real cultural weight, operating at commercial scale, with no board of directors, no editorial page, no address. The people who built them were completing tasks. The people deploying them are optimizing metrics. The taste just accumulates in the gap between.
Things to follow up on...
- Who labels the labelers: The Brookings Institution documents how Western-centric annotation guidelines constrain data workers in the Global South, imposing rigid taxonomies that disregard the cultural backgrounds of the people doing the evaluative work.
- Convergence in narrow categories: SparkToro's January 2026 research found that AI recommendation consistency varies dramatically by category breadth, with agents converging on the same brands in concentrated markets while diverging widely in expansive ones.
- Algorithmic curation reshapes editorial choice: A systematic review of 78 studies in Frontiers in Communication found that feed ranking and personalization narrow the editorial window, biasing selection toward items expected to perform under algorithmic logics.
- The identity crisis underneath: Eighty percent of AI agents don't properly identify themselves when visiting websites, according to DataDome's threat research team, exposing a broken trust model on both sides of agent-driven commerce.

