Market Pulse
Reading the agent ecosystem through a practitioner's lens

What Happens When Billion-Dollar Valuations Meet Production Reality

Decagon raised $131 million in June at a $1.5 billion valuation—a 150x revenue multiple—on the assumption that AI agents will rapidly replace human support teams at massive scale. Five months later, Cognizant's chief AI officer stood at Web Summit and said something quietly devastating: "Their valuation is based on bigger is better, which is not necessarily the case."
Capital markets are pricing one reality. Practitioners are discovering another. The gap between them reveals something fundamental about where value actually crystallizes in production agent systems—and it's not where the 150x multiples assume.
Where This Goes
Agent deployments quadrupled from Q2 to Q3, reaching 42% of enterprises. Researchers discovered that WebArena, a benchmark used by OpenAI and others, marked "45 + 8 minutes" as a correct answer. Top agents score 5% on hard benchmarks. Organizations are racing to production anyway.
Something's breaking here. Enterprises deploy without knowing how to measure what matters. Benchmarks optimize for controlled accuracy. Production demands cost predictability, failure recovery, operational stability. The evaluation gap widens as adoption accelerates.
Within six months, early deployments will surface failures benchmarks couldn't predict. Not task completion failures. Reliability failures. Affordability failures. Safety-at-scale failures. Success metrics won't translate to production outcomes because current evaluation can't surface failure clustering, cost variance, or degradation patterns.
Teams building at scale will need internal frameworks that measure what matters in their own environment: track failure modes, not just success rates. The organizations that develop production-relevant metrics first gain a genuine advantage. More fundamentally, this points toward verification infrastructure that creates organizational trust by proving agent decisions are correct before they impact operations, not after.
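
To make that concrete, here is a minimal Python sketch of what such an internal framework might record; the names (AgentRun, FailureMode, EvalLedger) and the specific failure categories are illustrative assumptions, not drawn from any of the systems named above.

# Minimal sketch of an internal agent-evaluation ledger that tracks failure
# modes, cost variance, and success rate -- not just an aggregate accuracy
# number. All class names and failure categories here are hypothetical.
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from statistics import mean, pstdev


class FailureMode(Enum):
    NONE = "none"                    # task completed and verified
    WRONG_ANSWER = "wrong_answer"    # completed, but output failed verification
    TIMEOUT = "timeout"              # exceeded latency or step budget
    COST_BLOWOUT = "cost_blowout"    # exceeded per-task cost budget
    UNSAFE_ACTION = "unsafe_action"  # attempted an action outside policy


@dataclass
class AgentRun:
    task_id: str
    cost_usd: float
    latency_s: float
    failure: FailureMode = FailureMode.NONE


@dataclass
class EvalLedger:
    runs: list[AgentRun] = field(default_factory=list)

    def record(self, run: AgentRun) -> None:
        self.runs.append(run)

    def success_rate(self) -> float:
        if not self.runs:
            return 0.0
        ok = sum(1 for r in self.runs if r.failure is FailureMode.NONE)
        return ok / len(self.runs)

    def failure_breakdown(self) -> Counter:
        # Which failure modes cluster: the signal benchmark leaderboards rarely report.
        return Counter(r.failure.value for r in self.runs if r.failure is not FailureMode.NONE)

    def cost_variance(self) -> tuple[float, float]:
        # Mean and standard deviation of per-task cost: production cares about predictability.
        costs = [r.cost_usd for r in self.runs]
        return (mean(costs), pstdev(costs)) if costs else (0.0, 0.0)


if __name__ == "__main__":
    ledger = EvalLedger()
    ledger.record(AgentRun("refund-001", cost_usd=0.12, latency_s=8.4))
    ledger.record(AgentRun("refund-002", cost_usd=1.90, latency_s=42.0, failure=FailureMode.COST_BLOWOUT))
    ledger.record(AgentRun("refund-003", cost_usd=0.15, latency_s=9.1, failure=FailureMode.WRONG_ANSWER))
    print("success rate:", ledger.success_rate())
    print("failure modes:", ledger.failure_breakdown())
    print("cost mean / stddev:", ledger.cost_variance())

Run over even a few hundred production traces, a ledger like this surfaces the failure clustering and cost variance that leaderboard-style success rates hide.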
From the Labs
Enterprise Agents Fail Two-Thirds of Complex Tasks
Agents fail roughly 65% of the complex tasks enterprises actually need done. The lesson: agent design needs to match your model choice, not follow generic patterns.
Machine Learning Discovers Better Agent Architectures Than Humans
Every ML artifact eventually moves from hand-crafted to automatically discovered.
Automated discovery iterates faster than architects can manually test new designs.
Agents Can't Tell When They're Wrong
Agents that can't recognize their limitations fail unpredictably when deployed.
The answer is system architecture, not incremental improvements to individual components.
Building Infrastructure for Agent Ecosystems
Identity binding prevents Sybil attacks while enabling trusted multi-agent coordination.
Communication infrastructure designed for agents becomes a vector for targeted exploits.
What We're Reading





