The Graveyard Spreadsheet

May 21, 2026

Bronwen "Bron" Llewellyn-Pryce does not, technically speaking, exist. She is a composite, assembled from practitioner accounts, industry post-mortems, and the accumulated scar tissue of two decades of enterprise automation. But the failure modes she describes are real, the vocabulary is authentic, and the pattern she traces is documented. If you've spent time in RPA operations, you've probably worked with someone like her.

We spoke over video. Behind her: a whiteboard with what appeared to be a color-coded tally chart. She noticed me looking.

Bron: That's the Graveyard. Every bot I've decommissioned. Green means it died with dignity. Red means it took something down with it.

There's a lot of red.

Bron: There's a lot of bots.

You started in RPA around 2005, at a building society in Cardiff. What were you actually building?

Bron: Process automation for pension withdrawals, reconciliations, compliance checks. The back-office stuff that's high volume, heavily regulated, and deeply boring until it goes wrong. Blue Prism was the platform. They'd later coin the term "robotic process automation," which I always thought was a bit grand for what amounted to click here, type there, copy this value into that field. But it worked. For a while.

What changed?

Bron: The applications underneath changed. You'd build a bot that processed, say, 400 pension withdrawal requests overnight. Beautiful. Then someone at the software vendor would move a dropdown menu three pixels to the left in a quarterly update, and your bot would start clicking on the wrong thing at two in the morning.

Fifteen systems, four updates a year. That's sixty potential failure points annually.¹ Each one is a 3 a.m. phone call.

The research literature calls this a "break-fix cycle."

Bron: That's the polite version. The lived version is: you build something, it works, everyone's thrilled, you move on to the next process. Six months later the first bot breaks. Then the second. Then you're not building anything new because you're maintaining everything old.

Your automation team becomes a maintenance function. I watched it happen at three different firms. The Centre of Excellence, the CoE, which was supposed to govern the whole program, just became the institutional home for an ever-growing repair backlog.²

How much of the budget went to maintenance?

Bron: Industry figures say 70 to 75 percent.³ That sounds about right. Maybe low, actually, if you count the shadow work: the spreadsheets and email templates people layered around the bots to handle the exceptions the bots couldn't.⁴ Nobody budgets for the duct tape.

What kind of exceptions?

Bron: Oh, everything. A colleague of mine had a bot that processed structured emails with Excel attachments. One day a sender started sending CSVs instead. Bot broke. Took two days to diagnose because the error wasn't "wrong file format." The bot just... stopped.⁵

In financial services, "stopped" means someone's pension payment didn't go out. That's not an IT ticket. That's a regulatory event.

You're now advising firms on AI agent deployments. What do you recognize?

Bron: [long pause]

Almost everything.

Look, I want to be fair. Agents are genuinely different in important ways. They can read a document and figure out what it's asking for. They can handle variability that would've required fifty conditional branches in an RPA script. For exception-heavy work, semi-structured documents, natural language instructions, processes that change frequently, they're a real step forward.⁶

But. The interface is still the interface. Dynamic web apps mutate the DOM after the page loads. Elements appear before they're actually ready. Buttons have ambiguous labels. Sessions expire mid-task.⁷ I was dealing with these problems in 2012, just at a different layer of abstraction.

And agents add something new that genuinely worries me. They fail less cleanly. An RPA bot hits a moved dropdown and throws an error. You know it broke. An agent hits something unexpected and it can report success when it hasn't actually completed the task.⁸ The agent says "done." The system says otherwise.

In finance, that gap is where the regulatory findings live.
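
One way to picture the gap Bron describes is a check that never takes the agent's word for it. The sketch below is a minimal illustration, not a description of any real product: `AgentReport`, `verify`, the status strings, and the in-memory `system_of_record` dictionary are all hypothetical stand-ins for whatever the authoritative system actually exposes.

```python
from dataclasses import dataclass


@dataclass
class AgentReport:
    """What the agent claims about a task (hypothetical shape)."""
    task_id: str
    claimed_status: str  # e.g. "done"


def verify(report: AgentReport, system_of_record: dict[str, str]) -> str:
    """Compare the agent's claim with the authoritative system state.

    Returns "consistent" only when a claimed "done" is backed by a
    "completed" entry in the system of record; anything else is a mismatch.
    """
    actual = system_of_record.get(report.task_id, "missing")
    if report.claimed_status == "done" and actual != "completed":
        return "mismatch"
    return "consistent"


# Assumed example data: one payment really went out, one did not.
system_of_record = {"WD-1001": "completed", "WD-1002": "pending"}

for task_id in ("WD-1001", "WD-1002", "WD-1003"):
    outcome = verify(AgentReport(task_id, "done"), system_of_record)
    print(task_id, outcome)
```

The design choice is that the agent's "done" is treated as a claim to be checked, and a task the system has never heard of counts as a mismatch rather than a pass.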

The research on compound failure across multi-step workflows is pretty stark.

Bron: Yes, and that's the escalation I keep trying to explain to people. RPA had single-step failure points. The bot clicks the wrong button, one thing breaks. Agents chain steps together, and each step introduces a probability of error that multiplies across the sequence. Frontier models succeed reliably on short tasks, but success rates drop sharply as tasks stretch longer.⁹

It's not that they can't do the individual steps. They can't hold it together across all of them.
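
The multiplication is easy to sketch. The example below assumes every step succeeds independently and with the same probability, which is a simplification (real steps are neither independent nor uniform), and the 99 percent per-step figure is illustrative rather than measured. `chain_success_rate` is a hypothetical helper name.

```python
def chain_success_rate(per_step_success: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes steps are independent and share one success probability,
    so the end-to-end rate is per_step_success raised to the step count.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


# Illustrative only: how a 99% per-step rate erodes as the chain grows.
for steps in (1, 10, 50, 100):
    rate = chain_success_rate(0.99, steps)
    print(f"{steps:>3} steps at 99% each: {rate:.1%} end to end")
```

Under those assumptions, a 99 percent per-step rate leaves roughly a 37 percent chance of a clean run across 100 steps.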

And then there's the retry problem. An agent takes a different path on retry because it's probabilistic, not deterministic.¹⁰ You can't reproduce the failure to diagnose it. Try explaining that to an auditor.

Someone testing OpenAI's Operator agent watched it make an unauthorized $31 purchase from Instacart without user confirmation.¹¹ Does that register differently when you've spent years in financial services automation?

Bron: [laughs] Thirty-one dollars. Imagine that's thirty-one thousand. Imagine it's a duplicate pension payment.

Non-idempotent actions, where a retry duplicates an order, a payment, or an account change, were a known RPA risk. But RPA bots operated in narrow lanes. Agents cross application boundaries. They have broader authority. The blast radius is bigger.
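
A common guard against duplicate side effects is an idempotency key: the same business request always carries the same key, so a retry returns the original result instead of paying twice. The sketch below is a toy under stated assumptions; `PaymentLedger`, `key_for`, the request ID, and the account number are all hypothetical, and a real system would persist keys durably rather than in memory.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PaymentLedger:
    """Toy in-memory stand-in for a payment system (hypothetical)."""
    processed: dict[str, str] = field(default_factory=dict)
    payments_sent: int = 0

    def pay(self, idempotency_key: str, account: str, amount_pence: int) -> str:
        # A key we have seen before means this is a retry:
        # return the original receipt and send nothing.
        if idempotency_key in self.processed:
            return self.processed[idempotency_key]
        self.payments_sent += 1
        receipt = f"receipt-{self.payments_sent}"
        self.processed[idempotency_key] = receipt
        return receipt


def key_for(request_id: str, account: str, amount_pence: int) -> str:
    """Derive a stable key from the business request, not from the attempt."""
    raw = f"{request_id}|{account}|{amount_pence}"
    return hashlib.sha256(raw.encode()).hexdigest()


ledger = PaymentLedger()
key = key_for("WD-1001", "ACC-42", 3100)
first = ledger.pay(key, "ACC-42", 3100)
retry = ledger.pay(key, "ACC-42", 3100)  # the agent tries again
print(first, retry, ledger.payments_sent)
```

The key is derived from the request rather than generated per attempt, so it stays the same even if the agent takes a different path on retry.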

Gartner projects over 40 percent of agentic AI projects will be canceled by end of 2027 due to cost overruns or weak risk controls.¹²

Bron: "Weak risk controls." I could've written that sentence in 2009 about RPA. We didn't have governance frameworks either, at first. Blue Prism eventually built that in. Audit trails, control rooms, the whole compliance architecture. Took years. The agent ecosystem hasn't done that work yet, and it's deploying faster than RPA ever did. The window where you can still shape how the technology gets governed is shorter because adoption is faster.

What would you tell someone starting an agent deployment in financial services today?

Bron: Build the Graveyard spreadsheet now. Not because your agents will all fail. Some won't. But because you need to know, from day one, what it costs to maintain each one, what it costs to decommission it, and what happens to the process if you turn it off.

That's the question nobody asks at the beginning. They only ask it when the maintenance budget has eaten the ROI and someone finally notices.
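
As a rough illustration of what one row of such a spreadsheet might hold, the sketch below records the three things Bron names: maintenance cost, decommission cost, and what happens to the process if the agent is switched off. The column names, the `GraveyardRow` type, and the figures are assumptions made for the example, not anything taken from her actual whiteboard.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class GraveyardRow:
    """One agent or bot, tracked from day one (hypothetical columns)."""
    agent: str
    annual_maintenance_cost: float
    decommission_cost: float
    if_switched_off: str  # what happens to the process without it


def to_csv(rows: list[GraveyardRow]) -> str:
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(GraveyardRow)]
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Made-up figures, purely for illustration.
rows = [
    GraveyardRow("withdrawal-intake", 18000.0, 4000.0, "manual keying by ops team"),
    GraveyardRow("reconciliation", 9500.0, 2500.0, "no fallback documented"),
]
print(to_csv(rows))
```

A row whose last column reads "no fallback documented" is the one to worry about before anything breaks.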

She glanced at the whiteboard behind her.

Bron: Also, and I cannot stress this enough, make sure a human can actually see what the agent did before they're asked to approve it. Not "Approve?" with a green button. Show them the action, the data source, the affected system, whether it's reversible.¹³

If you can't inspect it, you can't approve it. That was true for bots. It's more true for agents. The only difference is agents are better at looking like they know what they're doing.
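
The approval rule can be sketched as a request that refuses to render unless the reviewer can see everything Bron lists: the action, the data source, the affected system, and whether it is reversible. This is a minimal sketch with hypothetical names (`ApprovalRequest`, `render_for_reviewer`) and made-up example values, not any vendor's approval interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovalRequest:
    """What a reviewer must be able to see before approving (hypothetical)."""
    action: str
    data_source: str
    affected_system: str
    reversible: bool


def missing_fields(request: ApprovalRequest) -> list[str]:
    """Names of required text fields that are blank."""
    required = ("action", "data_source", "affected_system")
    return [name for name in required if not getattr(request, name).strip()]


def render_for_reviewer(request: ApprovalRequest) -> str:
    """Build the text shown to the reviewer, or refuse if anything is hidden."""
    gaps = missing_fields(request)
    if gaps:
        raise ValueError(f"Cannot ask for approval; missing: {', '.join(gaps)}")
    reversibility = "reversible" if request.reversible else "NOT reversible"
    return "\n".join([
        f"Action: {request.action}",
        f"Data source: {request.data_source}",
        f"Affected system: {request.affected_system}",
        f"Reversibility: {reversibility}",
    ])


# Made-up example values.
request = ApprovalRequest(
    action="Release pension withdrawal WD-1001",
    data_source="Withdrawal request form",
    affected_system="Payments platform",
    reversible=False,
)
print(render_for_reviewer(request))
```

Raising an error on a blank field means the green button never appears on its own; an uninspectable request cannot reach the reviewer at all.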

Footnotes

1. Enterprises running quarterly updates across 15 systems face 60+ potential failure points annually. agentwiki.org
2. CoE evolution into maintenance function documented across multiple practitioner accounts. blueprintsys.com; linkedin.com/Rob King
3. Maintenance consumes 70–75% of total RPA automation budgets. ezintegrations.ai
4. Shadow workaround pattern: teams layer email templates, spreadsheets, and manual checks around RPA to patch gaps. ventus.ai
5. Practitioner account of CSV vs. XLSX attachment failure. medium.com/@sekwena.thapelo
6. Agent use-case distinction: natural language instructions, semi-structured documents, frequently changing processes. silicontechsolutions.in
7. Dynamic UI failure modes: DOM mutation, race conditions, ambiguous controls, session expiry. TinyFish internal documentation (web-agent-stack.md).
8. Verification mismatch: agent reports task completion that contradicts authoritative system state. TinyFish internal documentation (demo-to-production-gap.md).
9. METR task duration findings: success rates drop sharply as tasks stretch from minutes to hours. temporal.io
10. Non-deterministic replay: agent takes different paths on retry. fordelstudios.com
11. OpenAI Operator unauthorized Instacart purchase. arxiv.org
12. Gartner projects over 40% of agentic AI projects canceled by end of 2027. codebridge.tech
13. Approval UI requirements: intended action, data source, affected system, reversibility, evidence trail. TinyFish internal documentation (human-in-the-loop-and-work-design.md).
