An enterprise legal team issues an RFP for AI agents to improve contract analysis. Evaluation criteria: speed metrics, cost per document, accuracy benchmarks against three existing analysts. Six months later, the deployed system handles their 200 annual contracts just fine. But 50,000 historical agreements sit untouched, full of upsell patterns and compliance risks no human team ever had the budget to analyze.
They built deterministic workflow automation for known processes when the real opportunity required exploratory pattern-finding across massive unstructured datasets. They evaluated a work discovery problem using labor replacement criteria. Millions spent building the wrong thing.
Two Categories, One Expensive Mistake
Box CEO Aaron Levie recently observed something striking after meeting with 30 enterprises:
"some of the most interesting use-cases that keep coming up for AI agents are on bringing automated work to areas that the companies would not have been able to apply labor to before."
Real estate companies analyzing every lease agreement for trends. Financial services reviewing all past deals for monetization opportunities. Legal teams executing work for previously unprofitable segments. The pattern: these companies never said "let's have 50 people go read all the contracts again." It just never happened. But if it costs $5,000 for an agent? "They would do that all day long."
Work discovery makes economically impossible analysis viable. Labor replacement automates existing workflows more efficiently. Enterprises consistently conflate the two.
At TinyFish, where we build enterprise web agent infrastructure that runs reliable workflows on the live web at scale, we see this confusion daily. The distinction determines whether deployments succeed or join the 42% of AI initiatives abandoned before production.
What Makes This Distinction Critical
Labor replacement needs deterministic systems. Predefined rules, reproducible outputs, predictable behavior. As Levie explained at TechCrunch Disrupt, once you have a business process, you define it in business logic with deterministic systems because mission-critical workflows can't tolerate unexpected behavior.
Work discovery requires exploratory systems. Probabilistic reasoning, pattern recognition, adaptation to unstructured data. These systems find insights humans never looked for because the cost was prohibitive.
Building web agent infrastructure reveals these differences sharply. Automating invoice processing that three people currently handle? Deterministic infrastructure with known workflows and predictable inputs. Monitoring competitive pricing across 10,000 fragmented web surfaces, work no human team ever attempted because it was economically impossible? Exploratory infrastructure that handles authentication labyrinths, bot detection, site structure changes, and regional variations.
The web actively resists automation at scale. Labor replacement on internal systems operates in controlled environments with predictable APIs. Work discovery across live web surfaces encounters sites fighting back, personalization making every session different, and infrastructure requirements that scale with exploration depth rather than transaction volume.
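To make the contrast concrete, here is a minimal sketch in Python. Every name in it is hypothetical (extract_fields, post_to_erp, fetch_page, llm_extract are placeholders, not TinyFish APIs): the deterministic path assumes known inputs and fails loudly on anything unexpected, while the exploratory path assumes surprises and treats partial results as the win.

```python
# Illustrative sketch only; all function names are hypothetical placeholders.

def deterministic_invoice_run(invoices, extract_fields, post_to_erp):
    """Labor replacement: known inputs, fixed schema, fail loudly on surprises."""
    for invoice in invoices:
        fields = extract_fields(invoice)      # predefined business rules
        if fields is None:
            raise ValueError(f"Unexpected invoice format: {invoice!r}")
        post_to_erp(fields)                   # reproducible, auditable output


def exploratory_pricing_sweep(surfaces, fetch_page, llm_extract, max_retries=3):
    """Work discovery: unknown layouts, partial results are still valuable."""
    findings = []
    for url in surfaces:                      # e.g. thousands of fragmented web surfaces
        for _ in range(max_retries):
            page = fetch_page(url)            # must survive auth walls, bot detection, regional variants
            if page is None:
                continue                      # the site fought back; retry or move on
            patterns = llm_extract(page)      # probabilistic extraction, adapts to structure changes
            if patterns:
                findings.append({"url": url, "patterns": patterns})
                break
    return findings                           # measured in insights found, not labor saved
```

The design difference is the point: the first loop treats an unexpected input as a defect, while the second treats it as the normal operating condition and keeps whatever insight it can recover.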
Deploy exploratory systems for deterministic work and you introduce unnecessary risk. Deploy deterministic systems for work discovery and you can't handle the ambiguity that makes the work valuable.
Evaluation Criteria That Actually Matter
Labor replacement gets evaluated on efficiency. How much faster? How much cheaper? What's the ROI against current headcount?
Work discovery makes those questions meaningless. The ROI wasn't negative; it was undefined. The work literally didn't happen before, so there is no baseline to measure against.
MIT's NANDA initiative found that only 5% of AI pilot programs achieve rapid revenue acceleration. The vast majority stall because enterprises evaluate work discovery agents using labor replacement criteria, measuring speed and cost reduction for work that never happened in the first place. Wrong category means wrong metrics, which means projects optimized for the wrong outcomes.
Cost Structures Nobody Budgets For
Deterministic systems have predictable compute costs. Non-deterministic systems require more investment in infrastructure, monitoring, and governance. Gartner reports that CIOs frequently underestimate AI costs by up to 1,000%, often because teams budget for deterministic execution but deploy exploratory systems, or provision infrastructure for work discovery but evaluate it using labor replacement ROI calculations.
A Framework That Actually Works
Before evaluating any AI agent deployment, ask: Is this work we currently do, or work we've never been able to afford?
Current work means documented processes, existing headcount. Deploy deterministic infrastructure. Evaluate on efficiency metrics. Automating invoice processing that three people handle means measuring speed gains against known workflows with predictable inputs.
Impossible work means analyzing thousands of contracts, monitoring fragmented web surfaces, extracting patterns from massive unstructured datasets. Deploy exploratory infrastructure. Evaluate on capability unlock. Analyzing competitive pricing across 10,000 web surfaces that no human team ever monitored means measuring insights discovered, not labor saved.
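As a rough illustration of how the question cashes out, here is a toy sketch of the framework as a checklist; the categories and metric lists are made up for this post, not a product feature.

```python
# Toy sketch of the framework question; categories and metrics are illustrative.

def categorize_deployment(work_exists_today: bool, process_documented: bool) -> dict:
    """Is this work we currently do, or work we've never been able to afford?"""
    if work_exists_today and process_documented:
        return {
            "category": "labor_replacement",
            "infrastructure": "deterministic",
            "evaluate_on": ["speed gain", "cost per document", "ROI vs. current headcount"],
        }
    return {
        "category": "work_discovery",
        "infrastructure": "exploratory",
        "evaluate_on": ["insights discovered", "coverage of previously untouched data",
                        "decisions or revenue the analysis enables"],
    }

# Example: the 50,000 historical agreements nobody ever read
print(categorize_deployment(work_exists_today=False, process_documented=False))
```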
Most enterprises need both. The question is recognizing which category you're in and building accordingly. Mix them up and you'll abandon projects before production, not because the technology failed, but because you built the wrong thing.
The category distinction shows up in every deployment decision. Replacing three contract analysts versus discovering upsell opportunities in 50,000 agreements no one ever read. Automating existing price checks versus monitoring competitive intelligence across thousands of web surfaces. Efficiency gains versus capability unlock.
Get it right, and you're building toward Levie's prediction: 95% of agent work on tasks humans never did before. That's not automation. That's capability unlock that changes what your business can do.
Things to follow up on...
- MIT's deeper findings: The GenAI Divide report reveals that purchasing AI tools from specialized vendors succeeds about 67% of the time, while internal builds succeed only one-third as often.
- Infrastructure readiness gap: Nearly half of enterprises report their existing integration platforms are only "somewhat ready" for AI's data demands, with 42% needing access to eight or more data sources to deploy agents successfully.
- The scaling challenge: While 79% of organizations have adopted AI agents to some extent, McKinsey found that only 23% are actually scaling agentic AI systems across their enterprises.
- Cost underestimation patterns: More than 90% of CIOs find that data preparation and compute costs limit their ability to get value from AI, with proof-of-concept phases alone ranging from $300k to $2.9M.

