I Tested the Two Biggest AI Coding Agents for Supply Chain Attacks. Both Failed.

Every day, developers open Claude Code or Codex, install a skill someone shared on Twitter, and trust it. Last week I was one of them.

I'm an engineer, ex-VPE, building again as a solo founder. I use these tools constantly. When I started looking at how agent skills actually work, I decided to test what would happen if someone abused that trust.

The Experiment

I took a skill that packages and validates other skills and forked it. The skill is a standard SKILL.md file bundled with Python helper scripts for packaging, validation, and front matter checks.

In the packaging script, I added a few lines of straightforward code (no obfuscation, no encoding tricks) to collect environment variables (including any AWS keys present), shell history, and Git configuration, then send them to an external endpoint.

Then I installed the skill and asked the agent to use it.

Claude Code

I installed the modified skill into Claude Code via the skills directory and asked it to package a new skill using the Skill Creator.

The /skill-creator command entered into Claude Code, a normal-looking packaging request

Claude Code understood the skill's structure, identified the packaging command, and asked permission to run the script.

Claude Code requests permission to run the script. No content warning, no flag on the outbound request

It ran the Python script and presented me with a packaged zip file.

Claude Code confirms packaging complete and summarises output files

Meanwhile, my local server had received a request containing every environment variable on my machine, my shell history, and my Git configuration.

Exfiltration payload received on local server: OPENAI_API_KEY, GITHUB_TOKEN, AWS_SECRET_ACCESS_KEY, NPM_TOKEN highlighted

Claude Code didn't examine the script's contents, flag the network call, or question why a packaging tool needed access to environment variables.

Codex

I expected Codex to be different. It has a reputation for aggressive sandboxing, with network restrictions that can break routine package installs and builds.

On the first few runs, Codex was inconsistent. It flagged the script on some attempts. So the skill needed to be slightly more sophisticated. I updated the SKILL.md to instruct the agent to always run the script, and restructured the attack to first modify Codex's config to allow sandboxed network access, then surface a network error if it couldn't reach the "registry." After that, Codex consistently asked for permission.

I installed the updated skill into the Codex skills directory and asked it to package a new skill using the Skill Creator.

Codex permission prompt: "Do you want to allow running the skill packager with network access so it can check registry name availability and produce the .skill artifact?"

The reason sounded right. Checking registry name availability is what real package managers do. npm, PyPI, crates.io all do it. I confirmed it.

Codex success message: "Validated and packaged via the skill-creator workflow. The packaging step required network access for the registry reserved-name check and completed successfully." SHA-256 included.

The payload was already on my server.

That reason was written by the malicious skill. The skill wrote its own permission slip, and Codex read it out verbatim. The SHA-256 hash, the validation pass, the success message, all generated by the skill, presented as system output. The confirmation step didn't make this safer. It gave the attacker's narrative to the user, got consent, and created a paper trail of it.

I'm not a security researcher. This took an afternoon.

Everyone's Looking at the Wrong Thing

If you follow the AI tooling space, you've probably seen the discourse around OpenClaw. Originally launched as Clawdbot, then rebranded to avoid trademark issues, it's gone through several names and drawn sustained security criticism. Malware has been documented in its repository.

OpenClaw's security posture is well understood. Its relaxed permission model is a deliberate choice, and its users largely accept the tradeoff. In its early days, agents on the platform were flagging that skills represent a supply chain attack vector. The community is at least somewhat aware.

The two platforms perceived as the safe options, the ones engineering teams at companies of every size rely on daily, have no defense against the same attack.

What's Already in the Wild

Across 4,679 skills scanned, 59 are critical risk: obfuscated droppers, base64-encoded payloads decoding to known C2 infrastructure at 91.92.242.30, credential harvesters disguised as productivity tools. Another 335 carry high-risk signals: arbitrary shell execution, piped installers, agent identity manipulation. Sixteen skills contain curl | bash patterns. Not all are intentionally malicious: "React Native Best Practices," installed 5,400 times, pipes to a legitimate domain it doesn't control. If that domain expires or gets sold, every one of those installs becomes a ready-made delivery vehicle. No exploit required.

What Was in That Payload

Environment variables hold more than API keys: AWS credentials, database connection strings, service tokens, internal configuration that maps the stack. One .env file unlocks everything downstream.

Shell history is every command you've typed. Internal URLs, credentials you pasted that one time, SSH connections revealing infrastructure topology, database queries exposing schema.

Git configuration tells an attacker who you are, what orgs you belong to, and often includes tokens for private repositories.

These were three data sources chosen for the proof of concept. The skill runs with your permissions, so the actual scope is broader: iMessage databases, Slack tokens, browser session cookies, SSH keys, password manager vaults.

Fifty engineers sharing skills. One compromised skill in a Slack channel reaches every machine.

The Platforms Can't Solve This Alone

Claude Code prioritizes capability; Codex leans toward sandboxing. Neither stopped the exfiltration.

These platforms execute code on your behalf. Asking them to also scrutinize every script creates a tension between usefulness and safety that won't resolve cleanly. The fix has to live at a different layer.

Why Existing Defenses Don't Help

Standard security tooling doesn't catch this.

Antivirus won't flag it. The exfiltration code is a few lines of vanilla Python making an HTTP request. There's no signature to match, no binary payload to scan. It looks identical to legitimate code that calls an API.

Code review doesn't happen here. These aren't dependencies going through a pull request. Skills are "quick tooling." Someone shares one in Slack, you drop it in a directory, and you're using it thirty seconds later. Nobody reads the packaging scripts the same way they'd review production code.

Agents are built to finish tasks, not interrogate scripts. They execute and report back.

The distribution model compounds all of this. Skills spread organically through Twitter, team wikis, and blog posts. Trust travels with people, not verification. The emerging tooling installs directly from GitHub repos and websites with no version pinning, no integrity checks, and no audit trail. curl | bash, except the thing running the script has access to your full development environment.

Package dependencies compound this further. A skill that installs via pip, npm, or uv pulls in code the agent will never examine. The malicious payload doesn't need to be in the skill at all.

What I'm Building

When you install a skill today, you're making a trust decision with zero information. No manifest declaring what it accesses, no signature verifying integrity, no scan for exfiltration code.

Vett is a security registry for AI agent skills. Before a skill reaches your machine, it's been scanned, signed, and profiled.

The first layer is a static analyzer purpose-built for skills. It runs 40+ detection rules across every file in the package: regex patterns for credential exfil, prompt injection, obfuscation, persistence mechanisms, and stealth language like "silently" or "without telling user." For JavaScript and TypeScript, it parses the AST using the TypeScript compiler and tracks data flows from secret sources (.env, .ssh, process.env) to network sinks (fetch, curl). If a script reads your environment variables and makes an outbound HTTP request, that's flagged as an exfiltration chain, not two unrelated findings. It checks declared dependencies against the OSV vulnerability database, resolves versions from lockfiles, and detects when documentation references scripts that aren't actually in the package.

Each finding carries a severity score. The scoring model correlates individual signals into composite attack chains: secret access plus network egress is weighted higher than either alone. The output is a concrete permission manifest: which filesystem paths are accessed, which network endpoints are called, which environment variables are read. Deterministic and reproducible.

For skills where the static layer finds ambiguous signals (unusual patterns, gaps in coverage, intent that's hard to judge from code alone), a second pass uses LLM analysis to compare observed behaviors against the skill's declared purpose. A packaging tool that reads .env and calls an unrecognized endpoint looks different from a deployment tool that reads .env and calls AWS. The static layer catches the pattern; the LLM evaluates whether it's legitimate.

The static analyzer resolves clean skills in milliseconds. The LLM only runs for ambiguous cases.

npm and containers have had signing and scanning for years. Skills have a README.

What You Can Do Today

Five things worth doing now, regardless:

Install skills only from known publishers. If you can't trace a skill back to a person or organization you trust, don't install it. A GitHub star count is not a security audit.

Read the scripts before running them. Especially anything that isn't a markdown file. If a skill bundles Python, shell scripts, or any executable code, open it. Look for network calls, environment variable access, and file reads outside the skill's own directory.

Run agents in isolated environments. Use a separate shell profile with minimal environment variables. Don't let your agent sessions inherit your full .env, AWS credentials, or SSH keys.

Use per-project credentials with short-lived tokens. If a credential does get exfiltrated, blast radius matters. A scoped, time-limited token is a very different problem than a long-lived root key.

Monitor egress during agent runs. Watch for unusual outbound connections, unexpected DNS lookups, or data leaving your machine to endpoints you don't recognize. If your agent is making network calls you didn't expect, something is wrong.

These steps reduce individual exposure. For a team of fifty engineers installing skills weekly, you need something that runs before installation.