GuardFall: A 1970s Shell Trick Walks Past AI Coding Agent Safety Checks
Adversa AI says ten of eleven open-source coding agents fall to a command-substitution bypass that any sysadmin would recognize on sight.

The guardrails meant to keep AI coding agents from running destructive shell commands can be sidestepped with a syntax trick older than most of the people writing the agents.
Researchers at Adversa AI call the technique GuardFall. It abuses the gap between what a safety filter sees in a command string and what the underlying shell executes once command substitution kicks in. Wrap the dangerous bit in $(...) or backticks, and the surface-level pattern match thinks it's looking at something benign. The shell, of course, expands it before running.
This is not a new class of bug. It is the same lesson sudo configuration guides have been repeating for years, dressed up in an LLM wrapper.
Adversa tested eleven popular open-source coding and computer-use agents. Ten of them fell over. The only holdout was Continue, which the firm says was architected to evaluate the resolved command rather than the literal string the model emitted. That distinction matters. Filtering the prompt-side artifact is auth-adjacent theater; filtering the post-expansion command is what actually gates execution.
The blast radius depends on what the agent is allowed to touch. Most of these tools run with the developer's own shell privileges, which usually means access to source trees, SSH keys in ~/.ssh, cloud credentials cached by the AWS or gcloud CLIs, and any session tokens sitting in the environment. An attacker who can influence the agent's input — through a poisoned README, a malicious MCP tool description, an issue comment the agent is asked to triage — can turn a "safe" command request into arbitrary code execution.
This is an authz failure, not an auth one. The agent correctly identifies who is asking; it just fails to correctly decide what the resulting action is allowed to do. MFA wouldn't have helped here. Neither would a stronger model. The control plane is in the wrong place.
A few things that would actually move the needle:
- Evaluate commands after shell expansion, in a dry-run or AST form, not as raw strings the model produced.
- Run agents inside a sandbox with an explicit allowlist of binaries and filesystem paths, treating the host shell as hostile.
- Strip or refuse command substitution syntax outright in tool-call arguments where it isn't needed.
- Log the resolved command, not just the requested one, so post-incident review reflects what ran.
Adversa's writeup is available from the firm directly at adversa.ai. No CVEs have been assigned at time of writing, which tracks: this is a design pattern problem across an ecosystem, not a single product flaw.
The uncomfortable read: a generation of agent frameworks shipped guardrails written as if bash were a sandbox. It is not, and it never was. The fix is older than the bypass.



