An enterprise legal team issues an RFP for AI agents to improve contract analysis. The evaluation criteria focus on speed metrics and accuracy benchmarks against their existing team of three analysts. The winning vendor promises 40% faster processing.
Six months later, the project stalls. The agent handles the 200 contracts the team reviews annually just fine. But the real opportunity sits untouched: 50,000 historical contracts that could reveal upsell patterns, compliance risks, and revenue opportunities. No human team ever had budget to analyze them. The deterministic system built for known workflows can't handle exploratory pattern-finding across massive unstructured datasets.
The enterprise evaluated a work discovery problem using labor replacement criteria. They just spent six months building the wrong thing.
Two Categories, One Expensive Mistake
Box CEO Aaron Levie recently observed something striking:
"some of the most interesting use-cases that keep coming up for AI agents are on bringing automated work to areas that the companies would not have been able to apply labor to before."
Real estate companies analyzing every lease agreement for patterns they never knew existed. Financial services reviewing decades of past deals for monetization opportunities. Legal teams executing work for previously unprofitable segments. These companies never said "let's have 50 people read all the contracts again." It just never happened. But if it costs $5,000 for an agent? "They would do that all day long."
Levie is talking about work discovery: making previously uneconomical analysis viable. Labor replacement, by contrast, automates workflows a company already runs. The two look similar on the surface, and the confusion is expensive: 42% of AI initiatives are now abandoned before reaching production, many because teams built for the wrong category.
Why the Confusion Costs Real Money
The architectural requirements diverge sharply. Labor replacement needs deterministic systems: predefined rules, predictable processes, reproducible outputs. Work discovery requires exploratory systems that can reason probabilistically, recognize patterns, adapt to unstructured data.
Building web agent infrastructure at scale makes these differences visible fast. Deterministic execution for known workflows requires fundamentally different infrastructure than exploratory systems navigating the messy reality of the live web. Deploy an exploratory agent behind safeguards designed for deterministic workflows and you risk data leaks or damaged production databases. The architectural mismatch creates real operational risk.
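To make the containment idea concrete, here's a minimal sketch of one way to gate tool access by agent mode. All the names (`AgentMode`, `Tool`, `allowed_tools`) are illustrative assumptions, not any vendor's API:

```python
from dataclasses import dataclass
from enum import Enum, auto


class AgentMode(Enum):
    DETERMINISTIC = auto()  # known workflow, reproducible steps
    EXPLORATORY = auto()    # probabilistic reasoning over unstructured data


@dataclass(frozen=True)
class Tool:
    name: str
    writes_production_data: bool


def allowed_tools(mode: AgentMode, tools: list[Tool]) -> list[Tool]:
    """Gate tool access by agent mode.

    Deterministic agents may use vetted write tools because their behavior
    is reproducible and testable. Exploratory agents are confined to
    read-only tools so a surprising action can't damage production systems.
    """
    if mode is AgentMode.DETERMINISTIC:
        return tools
    return [t for t in tools if not t.writes_production_data]


tools = [
    Tool("search_contracts", writes_production_data=False),
    Tool("read_document", writes_production_data=False),
    Tool("update_crm_record", writes_production_data=True),
]

# The exploratory pattern-finder never sees the CRM write tool.
print([t.name for t in allowed_tools(AgentMode.EXPLORATORY, tools)])
# ['search_contracts', 'read_document']
```

This is the "church and state" separation in miniature: the exploratory agent can roam the contract archive, but the only components allowed to touch production data are deterministic ones you can test and replay.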
The evaluation criteria shift completely. Labor replacement gets measured on efficiency: How much faster? How much cheaper? Work discovery makes those questions meaningless. The ROI on the 50,000 unread contracts wasn't negative; it was undefined. The work literally didn't happen, and you can't calculate cost savings against labor that was never deployed. MIT's NANDA initiative found that only 5% of AI pilots achieve rapid revenue acceleration. Most stall because enterprises evaluate work discovery agents using labor replacement criteria.
Then there's cost structure. Deterministic systems have predictable compute costs. Exploratory systems require more investment in infrastructure, monitoring, and governance. Gartner reports that CIOs frequently underestimate AI costs by up to 1,000%, often because they budget for deterministic execution but deploy exploratory systems.
The Question That Actually Matters
Before your next AI agent evaluation: Is this work we currently do, or work we've never been able to afford?
If it's work you currently do (documented processes, existing headcount), you need deterministic infrastructure. Evaluate on efficiency metrics. Architect for predictable behavior.
If it's work you've never done because it was economically impossible (analyzing thousands of contracts, monitoring fragmented web surfaces, extracting patterns from massive datasets), you need exploratory infrastructure. Evaluate on capability unlock. Architect for non-deterministic outputs with appropriate containment.
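As a rough triage aid, that decision can be written down as code. This is a sketch under my own assumptions about the inputs (documented process, existing headcount), not a formal framework:

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    name: str
    has_documented_process: bool  # do written runbooks exist today?
    has_existing_headcount: bool  # are people currently paid to do this?


def classify(initiative: Initiative) -> str:
    """Map an initiative to a category and the criteria it should be judged on."""
    if initiative.has_documented_process and initiative.has_existing_headcount:
        # Work you already do: build deterministic, measure efficiency.
        return "labor replacement -> evaluate on speed and cost vs. the current process"
    # Work that was never economically viable: build exploratory, measure capability unlock.
    return "work discovery -> evaluate on newly possible outcomes, with containment"


print(classify(Initiative("annual contract review", True, True)))
print(classify(Initiative("mine 50,000 historical contracts", False, False)))
```

If an initiative has no runbook and no headcount, efficiency benchmarks have nothing to benchmark against; that's the signal you're in work discovery territory.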
The category distinction matters. It's the difference between replacing three contract analysts and discovering opportunities in 50,000 agreements no one ever read. Between automating existing workflows and unlocking capabilities that were never economically viable.
Levie predicts 95% of agent work will be on tasks humans never did before. That only happens if enterprises stop evaluating work discovery problems with labor replacement criteria. The confusion isn't just conceptual. It's costing millions in failed deployments and missed opportunities.
Things to follow up on...
- Aaron Levie's enterprise meetings: Levie's prediction emerged from meetings with about 30 enterprises across two days, revealing patterns in how companies think about impossible-to-afford work versus existing workflows.
- The "church and state" separation: At TechCrunch Disrupt 2025, Levie explained why mission-critical processes require separation between deterministic and non-deterministic systems to prevent agents from unexpectedly damaging production databases.
- McKinsey's scaling gap: While 79% of organizations report adopting AI agents, only 23% are actually scaling agentic AI systems across their enterprises, revealing the difficulty of moving from experimentation to production deployment.
- Integration platform readiness: Nearly half of enterprises report their existing integration platforms are only "somewhat ready" for AI's data demands, with 42% needing access to eight or more data sources to deploy agents successfully.

