Market Pulse
When employees spin up autonomous agents outside IT's view, the old shadow IT playbook meets a problem it wasn't designed for.

The Thing You Installed Is Awake

Researchers found tens of thousands of exposed OpenClaw agent instances and over a million compromised API tokens. The enterprise response has been swift and familiar: discover unauthorized agents, inventory them, enforce policy. It is the same sequence that worked for shadow IT a decade ago. But shadow IT was inert between human interactions. You could find it Monday and write the policy Friday. Shadow agents hold credentials, maintain state across sessions, and act autonomously on thirty-minute cycles. The governance playbook assumes the governed thing holds still long enough to be governed.

Research Signals

Beyond Accuracy: A Multi-Dimensional Framework for Evaluating Enterprise Agentic AI Systems
Agents optimized purely for accuracy cost 4.4–10.8x more than cost-aware alternatives delivering comparable performance.
As quoted in the paper: "A 70% agent that works reliably is far more deployable than an 80% agent that is unpredictable."

ClawBench: Can AI Agents Complete Everyday Online Tasks?
ClawBench tests write-heavy operations on live sites, not read-only tasks in controlled environments. No model clears 50% in any category.
The benchmark numbers people use to justify deploying agents on personal infrastructure bear little resemblance to real-world performance.

Agentic Uncertainty Reveals Agentic Overconfidence
Adversarial prompting reframed as bug-finding achieves the best calibration, but only through deliberate, externally imposed intervention.
Ungoverned agents lack the organizational feedback loops needed to surface systematic miscalibration before it causes damage.

AgentArch: A Comprehensive Benchmark to Evaluate Agent Architectures in Enterprise
Performance degrades sharply in multi-turn enterprise workflows, even with careful architectural tuning across model families.
The result is agents that fail unpredictably on the same task, deployed by individuals with no mechanism to detect the pattern.

The Legitimate Mirror

"It's actually humans that become the bottleneck in making sure that playbooks get triggered." Writer's product team said this about their new event-driven agents, which monitor Gmail, Slack, Gong, and Calendar, then act autonomously. They also described, almost perfectly, why employees wire up unsanctioned agents on a $5 VPS.

The demand for always-on, event-triggered automation arrived before governed versions did. Writer's launch is interesting less as a product and more as a confession: the capability gap between a sanctioned enterprise agent and an open-source one running on personal infrastructure is vanishingly small. Audit trails, permission scoping, encryption key ownership. Real differentiators. But they live in the governance layer, not the capability layer. The automation itself is nearly identical.

That thinness matters. Enterprise vendors frame the sanctioned path as categorically different. Writer's own product suggests it's the same impulse, wrapped in compliance.