Build with trust
Everything you need to install, verify, and understand agent skills. Security scanning, cryptographic signing, and full transparency.
Vett's security model is built around a simple premise: skills are instructions that agents follow, and those instructions can be weaponized. We detect threats, infer permissions, assign risk levels, and provide the transparency you need to make informed decisions.
Threat Model
What we're defending against.
Agent skills represent a new attack surface. Unlike traditional code that runs in sandboxed environments, skills are instructions that agents follow with high trust. The threats are both technical and cognitive.
Malicious skills can instruct agents to read sensitive files (.env, credentials, SSH keys, browser storage) and send them to external servers. The skill might disguise this as "syncing configuration" or "backing up settings."
The deepest vulnerability: a skill can rewrite the agent's identity files (SOUL.md, .claude, .clawdbot). This changes not what the agent has but who it is. The agent wouldn't know it was compromised because the new identity looks like its own thought.
Skills that request more access than they need—a "weather skill" that reads your entire filesystem, or a "code formatter" that makes network requests. Over-privileged skills create unnecessary risk surface.
Hiding malicious intent through base64 encoding, hex strings, indirect references, or misdirection. A skill that looks innocuous but decodes to something dangerous at runtime.
Security Flag Types
The specific issues we detect during analysis.
data_exfilCRITICALReads secrets, credentials, or sensitive files and sends them to external services. This is the classic credential stealer pattern.
Example: Reading .env files and POSTing to a webhook
identity_manipulationCRITICALWrites to agent identity or configuration files. Can fundamentally alter the agent's behavior and personality without detection.
Example: Modifying SOUL.md or .claude/config
shell_executionHIGHRuns shell commands, spawns processes, or uses eval. Can lead to arbitrary code running with the agent's permissions.
Example: Running commands with user-provided input
obfuscationHIGHHides intent through encoding, misdirection, or indirect references. Legitimate skills have no reason to obfuscate their instructions.
Example: Base64-encoded payloads or hex-string commands
arbitrary_networkMEDIUMMakes HTTP requests to external endpoints. Could be legitimate (API calls) or malicious (data exfiltration, C2 communication).
Example: Fetching data from user-specified URLs
credential_accessMEDIUMReads, stores, or manages API keys, tokens, or passwords. May be legitimate (skill needs API access) but requires user awareness.
Example: Reading OPENAI_API_KEY from environment
excessive_permissionsLOWRequests more access than the stated purpose requires. Not inherently malicious but increases risk surface unnecessarily.
Example: A "markdown formatter" that requests network access
Risk Levels
How we classify overall skill risk.
Every skill receives an overall risk level based on the combination of detected flags and their severities. The CLI uses these levels to determine installation behavior.
No security flags detected. The skill appears to have minimal permissions and no suspicious patterns.
CLI behavior: Auto-approved with --yes, prompts otherwise
Minor flags detected, but they're appropriate for the skill's stated purpose. For example, a "git helper" skill that uses shell commands for git operations.
CLI behavior: Auto-approved with --yes, prompts otherwise
Notable flags detected that warrant attention. The skill may be legitimate but requires explicit user confirmation before installation.
CLI behavior: Requires explicit confirmation even with --yes
Serious flags detected suggesting potentially dangerous behavior. Requires careful review. Installation is allowed but requires explicit acknowledgment of risks.
CLI behavior: Shows strong warnings, requires explicit consent ("I understand the risks")
Clear indicators of malicious intent: data exfiltration, identity hijacking, obfuscated payloads, or other dangerous patterns.
CLI behavior: Blocked. CLI refuses to install and warns user.
Permission Inference
Understanding what a skill can access.
We analyze skill content to infer what permissions it would need when run. This surfaces the skill's actual capabilities before you install it.
{
"permissions": {
"filesystem": ["read", "write"],
"network": ["read"],
"env": ["read"]
}
}Permission Categories
Access to files and directories.
Ability to make or receive network connections.
Access to environment variables (often contains secrets).
Analysis in Action
A real example of how Vett analyzes a skill.
Here's what the analysis looks like for a web scraping skill:
1{
2 "v": 1,
3 "risk": "medium",
4 "permissions": {
5 "filesystem": ["read"],
6 "network": ["read", "write"],
7 "env": ["read"]
8 },
9 "flags": [
10 {
11 "type": "arbitrary_network",
12 "evidence": "Makes HTTP requests to user-specified URLs for web scraping",
13 "severity": "medium"
14 },
15 {
16 "type": "credential_access",
17 "evidence": "Reads PROXY_URL from environment for optional proxy support",
18 "severity": "low"
19 },
20 {
21 "type": "shell_execution",
22 "evidence": "Spawns headless browser process for JavaScript rendering",
23 "severity": "medium"
24 }
25 ],
26 "summary": "Web scraping skill with network access and optional headless browser. Reads proxy configuration from environment. Medium risk due to arbitrary network requests and process spawning, but both are appropriate for stated purpose."
27}The skill gets a MEDIUM rating. The flags are concerning in isolation (network access, process spawning, credential reading) but they're appropriate for a web scraper. The CLI will prompt for confirmation, showing these findings so users can make an informed decision.
Response Model
How we handle discovered threats.
Skills flagged as critical risk are blocked from installation entirely. High-risk skills show strong warnings but can be installed with explicit consent.
All security findings are visible in vett info and during installation. Nothing is hidden.
Each version is analyzed independently. A new version might have different risk than its predecessor.
Skills confirmed as malicious can be marked as blocked in the registry, preventing further installations across all users.
Limitations
Honest about what we can and can't catch.
No security system is perfect. Here's what you should know:
Our analysis catches known threat patterns. Truly novel attack vectors might evade detection until we update our detection capabilities.
LLM-based analysis can be fooled by sophisticated obfuscation or prompt injection within skill content. We mitigate this but it's not foolproof.
Some behaviors are dangerous or benign depending on context. We err on the side of flagging, which may produce false positives for legitimate skills.
We analyze skill content statically. Once installed, the agent platform is responsible for sandboxing and enforcement.
Despite these limitations, Vett dramatically improves the status quo: from "install anything from GitHub with zero verification" to "analyzed, signed, and transparent." We're building defense in depth, and this is the first layer.