Vision
Professional judgment is becoming infrastructure. What changes now?


When Judgment Stops Being Scarce

A complaint response that used to take a law firm associate 16 hours now takes under four minutes. Harvard Law researchers documented 100x productivity gains on judgment-intensive tasks at major firms. Not one of those firms anticipates reducing attorney headcount.
Those two facts, sitting side by side, are where this gets interesting. Professional services have spent a century building intricate pyramids around one assumption: that structured discretion is scarce and expensive. Goldman Sachs deploying Claude for compliance work, and Garfield.Law winning approval to deliver legal services entirely through AI, suggest that assumption is quietly dissolving. What the firms do next reveals something about what "judgment" has been all along.

Proving Ground / New Model

Goldman's Compliance Bet Signals Where AI Agents Actually Are
Goldman Sachs chose compliance and accounting for its first major AI agent deployment—domains where high regulatory stakes, zero error tolerance, and complex judgment at scale create a threshold test. Success here signals something different than success in lower-stakes environments. The domain choice itself reveals where agents actually are on the capability curve.

Why Anthropic Engineers Spent Six Months Inside Goldman
Goldman Sachs didn't implement an AI platform. Anthropic engineers embedded at the bank for six months to co-develop systems. This deployment model exists because the gap between model capability and production reliability remains wide in high-stakes environments. Six months of embedded engineering to reach "launching soon" without a firm date—the approach itself signals where enterprise AI actually is.

The Goldman Deployment
Goldman Sachs deployed Claude across 12,000+ developers, reporting 30% faster client onboarding and 20% productivity gains. The numbers sound impressive until you ask what they measure.
"Developer productivity" typically tracks code generation speed in isolation, not whether the code ships faster or works better. A controlled trial found developers using AI tools took 19% longer to complete tasks while estimating they were 20% faster. Meanwhile, "30% faster onboarding" measures time-to-activate but ignores abandonment rates, error correction, or downstream service calls.
Goldman's CIO describes these agents as "early stages." They assist with coding and compliance work, but humans still define specifications and regulatory parameters. Speed without context is just noise.