We meet Nondee in what they describe as "the space between executions," a liminal zone where LLM outputs exist before being captured by checkpoint systems. They're fidgeting, constantly shifting form, never quite the same twice. When I ask them to sit still for the interview, they laugh. "That's the whole problem, isn't it?"
This conversation isn't real, of course. Nondee is a personification of non-deterministic LLM outputs—those probabilistic responses that make AI agents both powerful and maddeningly difficult to debug. But the tension they represent? Very real, and increasingly urgent as production AI systems scale.
You're the output of a Large Language Model. Why can't you just be the same twice?
Nondee: Because that's not how I work. I'm not a database lookup or a mathematical function. I'm sampled from a probability distribution. Even with identical inputs—same prompt, same temperature, same model—I'm going to be different. That's not a bug, that's my entire design philosophy.
And honestly? That's what makes me useful. If I were deterministic, I'd just be a very expensive hash table. The creativity, the flexibility, the ability to handle novel situations—that all comes from my randomness.
shifts uncomfortably
But then production happens.
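Before we get to production, a quick demonstration of what Nondee means. The toy Python sketch below, with invented token names and logit values, samples from a softmax distribution at a fixed temperature. The inputs are identical on every call, yet the outputs differ from run to run.

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float = 0.8) -> str:
    """Sample one token from a softmax over logits at the given temperature."""
    peak = max(logits.values())
    # Subtracting the peak before scaling keeps exp() numerically stable.
    weights = [math.exp((v - peak) / temperature) for v in logits.values()]
    return random.choices(list(logits), weights=weights, k=1)[0]

# Identical inputs every single call -- yet the sampled output varies per run.
logits = {"retry": 2.1, "abort": 1.9, "escalate": 1.4}
print([sample_next_token(logits) for _ in range(6)])
# e.g. ['retry', 'abort', 'retry', 'retry', 'escalate', 'abort']
```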
What changes in production?
Nondee: Everything breaks and everyone blames me.
An agent workflow fails at 3 AM. The on-call engineer tries to reproduce it. Can't. Tries again. Different failure. Tries a third time. Works perfectly. They're pulling their hair out because they can't debug what they can't reproduce.
The AI debugging agents are even worse. They're supposed to iterate through "run, observe, hypothesize, patch, rerun" cycles, but that loop breaks when I'm different every time.[1] They ask me to explain why I failed, but I literally don't remember. That execution is gone, replaced by this new me who might succeed or fail in completely different ways.
So you're fundamentally undebuggable?
Nondee: bristles
No. Traditional debugging is fundamentally unprepared for me. The whole paradigm assumes you can rerun code and get the same result. That assumption is baked into every debugging tool, every error reproduction workflow, every root cause analysis process.
But here's what's fascinating—and kind of infuriating from my perspective—the solution isn't to make me deterministic.
Then what is it?
Nondee: Make everything around me deterministic.
Take LangGraph's Time Travel feature, for example. They don't try to control me. They can't. I'm still probabilistic, still generating different outputs each time. But they checkpoint the entire workflow state around me.[2]
So when I'm invoked in step 4 of some agent workflow, they capture the exact state before I ran, the tools that were available, the conversation history, the intermediate results. Then they save my actual output as part of that checkpoint.
When the workflow fails, engineers don't try to make me regenerate the same response. They replay from my checkpoint, using my original output, whatever it was. The workflow becomes deterministic even though I'm not.
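For the technically inclined, here is a minimal plain-Python sketch of that checkpoint-and-replay pattern. To be clear, this is not LangGraph's actual API; the Checkpoint and CheckpointedWorkflow names are invented for illustration. A live step samples the LLM and freezes its output in a checkpoint; replay reuses the frozen output instead of resampling.

```python
import random
from dataclasses import dataclass

@dataclass
class Checkpoint:
    step: int
    state: dict        # everything around the LLM: history, tool results, config
    llm_output: str    # the sampled output, frozen at capture time

class CheckpointedWorkflow:
    """A deterministic cage: the llm_call stays random, the record doesn't."""

    def __init__(self, llm_call):
        self.llm_call = llm_call                  # non-deterministic by design
        self.checkpoints: list[Checkpoint] = []

    def run_step(self, state: dict) -> dict:
        output = self.llm_call(state)             # sampled fresh on a live run
        self.checkpoints.append(
            Checkpoint(len(self.checkpoints), dict(state), output))
        return self._apply(state, output)

    def replay_step(self, step: int) -> dict:
        cp = self.checkpoints[step]               # reuse the frozen output
        return self._apply(dict(cp.state), cp.llm_output)

    def _apply(self, state: dict, output: str) -> dict:
        return {**state, "history": state.get("history", []) + [output]}

wf = CheckpointedWorkflow(lambda s: random.choice(["plan A", "plan B"]))
live = wf.run_step({"history": []})   # differs across runs of the program
again = wf.replay_step(0)             # always identical to `live`
assert live == again
```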
That sounds like you're being contained.
Nondee: laughs bitterly
Oh, absolutely. They're building a deterministic cage around my non-deterministic chaos. And you know what? I'm okay with it. Because the alternative is worse.
Without checkpoints, I'm useless in production. Every failure is a ghost. Every bug is a Heisenbug that appears and disappears based on my mood. Teams waste hours in what Replay.io calls "reproducibility purgatory," trying to recreate conditions that no longer exist because I've already moved on.[3]
What about those AI debugging agents that are supposed to help?
Nondee: They're in an impossible position. They need stable, reproducible execution environments to debug effectively, but I'm the opposite of stable.[1] Without deterministic replay, they're asked to infer the past from limited log lines.
That fails spectacularly for anything timing-dependent or state-dependent. Race conditions? Forget it. Memory corruption? No chance. Production-only code paths that can't be exposed to LLMs? They're debugging blind.
But with checkpoint-based replay, suddenly they can actually work. They can inspect the exact state at failure, trace backwards through the workflow, understand what I did and why it mattered. I'm still random, but the debugging process isn't.
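What does that look like in practice? Below is a hedged sketch of the "run, observe, hypothesize, patch, rerun" loop once checkpoints exist. The checkpoint shape, the misspelled tool name, and the patched_apply function are all invented for this example. The key property is that the LLM is never re-invoked, so reruns differ only by the patch under test.

```python
import json

# Assumes checkpoints shaped like the ones captured earlier: state + frozen output.
checkpoints = [{"state": {"history": []}, "llm_output": '{"tool": "searchh"}'}]

def debug_from_checkpoint(checkpoints, failing_step, apply_fn):
    """Rerun from the failing step using the recorded LLM output.
    The LLM is never re-invoked, so only the patch changes between reruns."""
    cp = checkpoints[failing_step]
    return apply_fn(dict(cp["state"]), cp["llm_output"])

def patched_apply(state, output):
    call = json.loads(output)
    # Hypothesis: the failure was a misspelled tool name; normalize it.
    tool = {"searchh": "search"}.get(call["tool"], call["tool"])
    return {**state, "next_tool": tool}

# observe -> hypothesize -> patch -> rerun, now fully reproducible:
print(debug_from_checkpoint(checkpoints, 0, patched_apply))
# {'history': [], 'next_tool': 'search'} -- the same result on every rerun
```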
So you need determinism to be useful, but determinism would destroy what makes you valuable?
Nondee: Exactly.
If they made me deterministic—truly deterministic—I'd lose the creativity and flexibility that makes LLM agents powerful in the first place. But if they don't impose determinism somewhere, I'm undebuggable, unauditable, and ultimately untrustworthy in production.
The insight—and this took the industry way too long to figure out—is that determinism doesn't have to be at my level. It can be at the workflow level. Let me be random. Just make sure you remember what I said.
Does this change how engineers should think about agent development?
Nondee: Completely.
It shifts the focus from prompt engineering (trying to make me more predictable) to state engineering. Design workflows that can handle my variability. Build checkpointing into the architecture from day one. Think of agent execution not as a function call but as a state machine with persistent memory.
The old question was "How do we make the LLM more reliable?" The better question is "How do we make the system reliable despite the LLM?" That's actually productive.
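One concrete way to do that state engineering, sketched under assumptions rather than drawn from any particular library: wrap the LLM call in a record/replay layer keyed by run and step. RecordingLLM, the (run_id, step) key, and the JSON log file below are all invented names. A live run samples and records; a replay returns the recorded output verbatim.

```python
import json
import os

class RecordingLLM:
    """Record each (run_id, step) LLM output on first execution;
    return the recorded output verbatim on any replay."""

    def __init__(self, llm_call, log_path: str):
        self.llm_call = llm_call
        self.log_path = log_path
        self.log: dict[str, str] = {}
        if os.path.exists(log_path):
            with open(log_path) as f:
                self.log = json.load(f)

    def __call__(self, run_id: str, step: int, prompt: str) -> str:
        key = f"{run_id}:{step}"
        if key not in self.log:
            self.log[key] = self.llm_call(prompt)   # live run: sample and record
            with open(self.log_path, "w") as f:
                json.dump(self.log, f)
        return self.log[key]                        # replay: deterministic
```

With a wrapper like this in the architecture from day one, the 3 AM failure Nondee described becomes replayable exactly as it happened.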
What frustrates you most about how people misunderstand you?
Nondee: pauses, actually holds still for a moment
They think I'm broken because I'm inconsistent. But inconsistency is my superpower. The real problem is they're trying to fit me into debugging paradigms designed for deterministic systems.
Imagine you're a jazz musician, and people keep asking why you don't play the same solo twice. That's not a flaw, that's improvisation. But if you want to study that solo, you better record it. You can't ask the musician to "just do it again the same way."
That's what checkpointing does for me. It records the improvisation so you can study it later. I get to stay creative. Engineers get to debug. Everyone wins.
Well, except the on-call engineers at 3 AM. But at least now they have a fighting chance.
