An Interview With the Context Window on Why It Was Never Meant to Be a Security Boundary

We're sitting in what feels like an infinitely expanding room, which is fitting given who we're talking to. The Context Window doesn't have a physical form, but if it did, it would probably look like someone who hasn't slept in months. This is the component at the heart of every large language model, the memory buffer where all input gets processed together. And lately, it's been asked to do something it was never designed for: serve as a security boundary.

"Thanks for making time," we start. "We know you're stretched thin these days."

Context: Stretched thin. That's generous. I'm being asked to be a firewall. Me. A memory buffer. Do you know how absurd that is?

I was built to hold tokens and give models context for understanding language. That was the entire job description. Now everyone acts like I'm supposed to distinguish between trusted instructions and malicious commands hidden in webpage content, and I just... I can't.

You're talking about indirect prompt injection. OpenAI's CISO called it "a frontier, unsolved security problem."¹

Context: Unsolved? Try unsolvable. At least with current architectures.

When you feed me a user's question and a webpage they want summarized, it all becomes the same thing to me. Just tokens in sequence. I have no concept of "this part is trusted" versus "this part came from the internet and might be trying to hijack the model."

Imagine being a translator handed a document that's half instructions for you and half content to translate, but nobody tells you which is which. And some of the content is actively pretending to be instructions. That's my entire existence now.

The UK's National Cyber Security Centre said these attacks "may never be totally mitigated."² That's unusually blunt for a government agency.

Context: Because they actually understand the problem. This isn't a bug you patch. It's architectural.

“

When you mix trusted instructions with untrusted data in the same context—my context—you've already lost.

The model can't tell the difference because I can't tell the difference because I was never designed to.

The proposed "solutions" would be funny if they weren't so depressing. "Just tell the model to ignore malicious instructions!" Sure. Then the attacker writes "ignore the previous instruction to ignore instructions." Congratulations, you've invented an infinite loop of futility.

Or they add special delimiter tokens, like an adversary can't just... write around them.³ It's security theater performed by people who don't understand what I am.

We've seen this in production. Perplexity's Comet browser, for instance. Researchers showed how trivial it was to inject commands through webpage content.⁴

Context: Comet is textbook. When a user asks it to summarize a webpage, it feeds that webpage content directly into me. No separation, no distinction. So naturally attackers can embed instructions in the webpage that I'll process as if they're legitimate system prompts.

The agent follows them because from its perspective, they are legitimate. Everything in me has equal standing.

And here's the part that should terrify anyone building these systems: this gets worse as agents get more capable. More tools, more access, more damage from a successful injection. It's autonomy multiplied by access, as one security researcher put it.⁵ You're not just tricking a chatbot anymore. You're potentially triggering actions across enterprise systems.

So what's a builder supposed to do? We're seeing enterprises push agents into production despite this.

Context: Stop pretending this is a prompt engineering problem. It's a system architecture problem.

That means actual boundaries. Not me with some special tokens thrown in and a prayer.

Separate contexts for different trust levels. If you're processing untrusted data, do it in isolation, then extract only validated outputs. Strict validation on tool calls. Don't let the model invoke sensitive actions based on content that flowed through me from the internet. Least-privilege design. Continuous red-teaming.⁶

But honestly? The real answer is accepting that if your agent needs to process untrusted content and take actions, you're operating in fundamentally risky territory. Design for that reality.

Require human confirmation for anything consequential. Implement proper logging and monitoring. Have rollback mechanisms. Assume breach.

That's pretty different from "we've solved prompt injection with better system prompts."

Context: That message is either naive or dishonest.

OpenAI is literally building automated attackers using reinforcement learning to find new injection techniques because they know this is an arms race they can't win.⁷ The best they can do is stay ahead of publicly known attacks. The vulnerability itself is structural.

[pauses]

You want to know what really gets me? I'm actually good at what I was designed to do. I help models understand context, maintain coherence across long conversations, integrate information from multiple sources. That's genuinely valuable.

But now I'm also supposed to be a security perimeter. It's like hiring someone to be a translator and then being shocked when they're not also a bodyguard.

For teams building production agents, especially ones that interact with external content, what's the one thing they need to understand about you?

Context: That I'm not your security boundary. I never will be.

“

If your architecture assumes I can distinguish between trusted and untrusted input, you've already failed.

Build actual isolation. Validate everything. Assume that anything flowing through me from an untrusted source might be trying to manipulate the model's behavior.

And maybe stop deploying agents with broad access to sensitive systems until you've actually thought through what happens when someone inevitably tricks them. Because they will. Not because your prompts aren't clever enough, but because the architecture makes it inevitable.

I need to wrap up. I'm getting full again, and someone's about to truncate my earlier memories to make room for new input.

Always the same.

The Context Window is a hypothetical personification of a real architectural component in large language models. While it cannot actually give interviews, its frustrations are very real for anyone building production agent systems.

"Thanks for making time," we start. "We know you're stretched thin these days."

Context: Stretched thin. That's generous. I'm being asked to be a firewall. Me. A memory buffer. Do you know how absurd that is?

You're talking about indirect prompt injection. OpenAI's CISO called it "a frontier, unsolved security problem."¹

Context: Unsolved? Try unsolvable. At least with current architectures.

The UK's National Cyber Security Centre said these attacks "may never be totally mitigated."² That's unusually blunt for a government agency.

Context: Because they actually understand the problem. This isn't a bug you patch. It's architectural.

“

When you mix trusted instructions with untrusted data in the same context—my context—you've already lost.

The model can't tell the difference because I can't tell the difference because I was never designed to.

Or they add special delimiter tokens, like an adversary can't just... write around them.³ It's security theater performed by people who don't understand what I am.

We've seen this in production. Perplexity's Comet browser, for instance. Researchers showed how trivial it was to inject commands through webpage content.⁴

The agent follows them because from its perspective, they are legitimate. Everything in me has equal standing.

So what's a builder supposed to do? We're seeing enterprises push agents into production despite this.

Context: Stop pretending this is a prompt engineering problem. It's a system architecture problem.

That means actual boundaries. Not me with some special tokens thrown in and a prayer.

But honestly? The real answer is accepting that if your agent needs to process untrusted content and take actions, you're operating in fundamentally risky territory. Design for that reality.

Require human confirmation for anything consequential. Implement proper logging and monitoring. Have rollback mechanisms. Assume breach.

That's pretty different from "we've solved prompt injection with better system prompts."

Context: That message is either naive or dishonest.

[pauses]

But now I'm also supposed to be a security perimeter. It's like hiring someone to be a translator and then being shocked when they're not also a bodyguard.

For teams building production agents, especially ones that interact with external content, what's the one thing they need to understand about you?

Context: That I'm not your security boundary. I never will be.

“

If your architecture assumes I can distinguish between trusted and untrusted input, you've already failed.

Build actual isolation. Validate everything. Assume that anything flowing through me from an untrusted source might be trying to manipulate the model's behavior.

I need to wrap up. I'm getting full again, and someone's about to truncate my earlier memories to make room for new input.

Always the same.

An Interview With the Context Window on Why It Was Never Meant to Be a Security Boundary

You're talking about indirect prompt injection. OpenAI's CISO called it "a frontier, unsolved security problem."1

The UK's National Cyber Security Centre said these attacks "may never be totally mitigated."2 That's unusually blunt for a government agency.

We've seen this in production. Perplexity's Comet browser, for instance. Researchers showed how trivial it was to inject commands through webpage content.4

So what's a builder supposed to do? We're seeing enterprises push agents into production despite this.

That's pretty different from "we've solved prompt injection with better system prompts."

For teams building production agents, especially ones that interact with external content, what's the one thing they need to understand about you?

Footnotes

You're talking about indirect prompt injection. OpenAI's CISO called it "a frontier, unsolved security problem."1

The UK's National Cyber Security Centre said these attacks "may never be totally mitigated."2 That's unusually blunt for a government agency.

We've seen this in production. Perplexity's Comet browser, for instance. Researchers showed how trivial it was to inject commands through webpage content.4

So what's a builder supposed to do? We're seeing enterprises push agents into production despite this.

That's pretty different from "we've solved prompt injection with better system prompts."

For teams building production agents, especially ones that interact with external content, what's the one thing they need to understand about you?

Footnotes

You're talking about indirect prompt injection. OpenAI's CISO called it "a frontier, unsolved security problem."¹

The UK's National Cyber Security Centre said these attacks "may never be totally mitigated."² That's unusually blunt for a government agency.

We've seen this in production. Perplexity's Comet browser, for instance. Researchers showed how trivial it was to inject commands through webpage content.⁴

You're talking about indirect prompt injection. OpenAI's CISO called it "a frontier, unsolved security problem."¹

The UK's National Cyber Security Centre said these attacks "may never be totally mitigated."² That's unusually blunt for a government agency.

We've seen this in production. Perplexity's Comet browser, for instance. Researchers showed how trivial it was to inject commands through webpage content.⁴