CURRENT | Market Pulse

Unauthorized Actors

The Thing You Installed Is Awake

By Nora Kaplan, May 7, 2026

Researchers found tens of thousands of exposed OpenClaw agent instances and over a million compromised API tokens. The enterprise response has been swift and familiar: discover unauthorized agents, inventory them, enforce policy. It is the same sequence that worked for shadow IT a decade ago. But shadow IT was inert between human interactions. You could find it Monday and write the policy Friday. Shadow agents hold credentials, maintain state across sessions, and act autonomously on thirty-minute cycles. The governance playbook assumes the governed thing holds still long enough to be governed.

Evidence Landscape

Early 2026 replaced hypothetical agent risk with specific numbers. Exposed control panels across 82 countries. Leaked API tokens in the millions. Incident rates that make the vague feel uncomfortably concrete.

The interesting thing to track right now is the convergence of responses. Platform tooling, regulatory warnings, standards bodies, and security investment are all accelerating at once, each reacting to the same underlying reality: agent deployment got well ahead of agent governance, and the evidence is now too loud to treat as edge cases. Here's the current data landscape.

Security Crisis

OpenClaw Exposure Spans 82 Countries, 50,000 Instances

Researchers identified 42,000+ unique IPs with exposed OpenClaw control panels and roughly 50,000 instances vulnerable to remote code execution. The Moltbook database leak added 1.5 million agent API tokens and 35,000 emails to the damage. Patching the CVE didn't revoke the tokens or purge 341 malicious skills already sitting in the registry.

Platform Response

Microsoft Agent 365 Goes After Shadow Agents

GA since May 1, Agent 365 syncs agent discovery across AWS Bedrock and Google Cloud, with local detection expanding to 18 agent types by June. Defender will soon map each agent's devices, MCP servers, associated identities, and reachable cloud resources. The cross-cloud scope is the notable part: Microsoft is treating agent visibility as a security surface that doesn't stop at platform boundaries.

Authorization Gap

Fourteen Percent of Agents Launch With Full Approval

Gravitee surveyed 900+ practitioners and found only 14.4% of AI agents reach production with complete security and IT sign-off. More than half operate with no security oversight or logging at all. Separately, IBM/Censuswide found 80% of employees at organizations with 500+ people use unsanctioned AI tools. Eighty-eight percent of organizations report confirmed or suspected agent-related security incidents.

Regulatory Signal

Dutch Authority Warns Against OpenClaw-Style Agent Systems

The Autoriteit Persoonsgegevens issued a formal warning on February 12, calling OpenClaw and similar open-source agent systems a "Trojan Horse" and citing risks of data breaches and account takeovers. The AP invoked GDPR enforcement powers including processing suspension and fines. It also called for clarification that autonomous agents fall within the EU AI Act's scope, a move that would broaden the regulatory perimeter considerably.

Investment Signal

Agent Governance Wins RSAC Innovation Sandbox for First Time

In 20 years of the RSAC Innovation Sandbox competition, an agent governance company had never won. Geordie AI broke that streak on March 23 with a platform that discovers, maps, and monitors agents across code, cloud, and endpoints. One finalist demonstrated a Fortune 500 customer finding 600+ agents it didn't know it had. Security capital is following the governance problem.

Standards Formation

NIST Launches Dedicated Agent Standards Initiative in February

NIST's Center for AI Standards and Innovation stood up the AI Agent Standards Initiative on February 17, the first U.S. government program focused on agent interoperability and security standards. Three pillars cover industry-led standards development, open-source protocol maintenance, and agent security and identity research. An AI Agent Interoperability Profile is planned for Q4 2026. The likely trajectory runs from voluntary framework to compliance requirement.

Research Signals

Beyond Accuracy: A Multi-Dimensional Framework for Evaluating Enterprise Agentic AI Systems

A systematic review of 12 agentic AI benchmarks finds that cost goes unmeasured despite 50x variations, reliability collapses from 60% to 25% across repeated runs, and lab-to-production performance gaps hit 37%. The benchmarks shadow deployers cite don't describe the systems they actually run.

How bad is the cost blindspot?

Agents optimized purely for accuracy cost 4.4–10.8x more than cost-aware alternatives delivering comparable performance.

What do practitioners actually want?

One quoted in the paper: "A 70% agent that works reliably is far more deployable than an 80% agent that is unpredictable."

ClawBench: Can AI Agents Complete Everyday Online Tasks?

Testing 153 real tasks across 144 live production websites, the top frontier model completes just 33.3% of everyday online tasks. Models scoring 65–75% on sandbox benchmarks crater when they hit actual web complexity: purchases, bookings, and forms.

Why do sandbox scores mislead?

ClawBench tests write-heavy operations on live sites, not read-only tasks in controlled environments. No model clears 50% in any category.

What does this mean for shadow deployments?

The benchmark numbers people use to justify deploying agents on personal infrastructure bear little resemblance to real-world performance.

Agentic Uncertainty Reveals Agentic Overconfidence

Agents can't predict whether they'll succeed. Some models succeeding only 22% of the time predict 77% success, a 55-point overconfidence gap. Without external calibration, nothing corrects this. Shadow agents proceed confidently into failure with no feedback signal.

Can overconfidence be fixed?

Adversarial prompting reframed as bug-finding achieves the best calibration, but only through deliberate, externally imposed intervention.

Why does this compound the shadow problem?

Ungoverned agents lack the organizational feedback loops needed to surface systematic miscalibration before it causes damage.
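
The overconfidence gap cited above is plain arithmetic: the agent's average predicted success rate minus its observed success rate, in percentage points. A minimal Python sketch of that calculation follows. The function name and the sample data are hypothetical, chosen only to mirror the 77% and 22% figures, and the paper's own scoring method may differ.

```python
def overconfidence_gap(predicted, outcomes):
    """Mean predicted success probability minus observed success rate,
    expressed in percentage points. Positive means overconfident."""
    if len(predicted) != len(outcomes) or not predicted:
        raise ValueError("need equal-length, non-empty inputs")
    mean_predicted = sum(predicted) / len(predicted)
    observed = sum(1 for ok in outcomes if ok) / len(outcomes)
    return 100 * (mean_predicted - observed)


# Hypothetical data: the agent predicts 77% success on every task
# but actually succeeds on 22 of 100.
predicted = [0.77] * 100
outcomes = [True] * 22 + [False] * 78

gap = overconfidence_gap(predicted, outcomes)
print(f"overconfidence gap: {gap:.0f} points")
```

Computing this number requires recorded outcomes to compare against, which is exactly the external feedback an ungoverned agent lacks.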

AgentArch: A Comprehensive Benchmark to Evaluate Agent Architectures in Enterprise

Across 18 agentic configurations and frontier models on enterprise workflows, the best result on complex tasks reaches 35.3% success. Run the same tasks eight times and consistent success drops to 6.34%. These are the agents people are deploying on personal VPS instances without oversight.

How steep is the complexity cliff?

Performance degrades sharply in multi-turn enterprise workflows, even with careful architectural tuning across model families.

What does 6.34% consistency look like in practice?

Agents that fail unpredictably on the same task, deployed by individuals with no mechanism to detect the pattern.
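
One way to read the gap between single-run success and consistent success is to score each task twice: once per individual run, and once on whether every repeated run succeeded. The Python sketch below assumes that "all runs succeed" definition, which may not match the benchmark's exact metric. The function names and sample results are hypothetical.

```python
def single_run_rate(results):
    """Fraction of all individual runs that succeeded."""
    runs = [ok for task in results for ok in task]
    return sum(runs) / len(runs)


def consistent_rate(results):
    """Fraction of tasks that succeeded on every one of their runs."""
    return sum(all(task) for task in results) / len(results)


# Hypothetical results: 4 tasks, 8 runs each (True = success).
results = [
    [True] * 8,            # always succeeds
    [True] * 7 + [False],  # fails once in eight runs
    [True, False] * 4,     # succeeds half the time
    [False] * 8,           # never succeeds
]

print(f"single-run success: {single_run_rate(results):.1%}")
print(f"consistent success: {consistent_rate(results):.1%}")
```

In this toy data, more than half of individual runs succeed, yet only one task in four succeeds every time. A deployer who checks an agent once sees the first number and never the second.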

The Legitimate Mirror

Writer's Event-Triggered Agents Name the Exact Problem Driving Shadow AI Adoption

"It's actually humans that become the bottleneck in making sure that playbooks get triggered." Writer's product team said this about their new event-driven agents, which monitor Gmail, Slack, Gong, and Calendar, then act autonomously. They also described, almost perfectly, why employees wire up unsanctioned agents on a $5 VPS.

The demand for always-on, event-triggered automation arrived before governed versions did. Writer's launch is interesting less as a product and more as a confession: the capability gap between a sanctioned enterprise agent and an open-source one running on personal infrastructure is vanishingly small. Audit trails, permission scoping, encryption key ownership. Real differentiators. But they live in the governance layer, not the capability layer. The automation itself is nearly identical.

That thinness matters. Enterprise vendors frame the sanctioned path as categorically different. Writer's own product suggests it's the same impulse, wrapped in compliance.

Production bottleneck:

Sanctioned AI tools reach production only 5% of the time versus 40% for consumer tools, clarifying why employees build their own

Governance spending:

AI governance investment projected at $492 million in 2026, surpassing $1 billion by 2030 per Gartner, chasing a problem already loose

Detection gap:

Agent traffic authenticates via OAuth over HTTPS, making sanctioned and unsanctioned automation nearly indistinguishable to most security monitoring tools

Demand signal:

Notion's custom agents beta produced 21,000 agents from early testers alone, confirming proactive-agent appetite well beyond any single platform

Breach awareness:

67% of executives believe their company already suffered a data breach from unapproved AI tools, per Writer's own survey data