Documentation

Build with trust

Everything you need to install, verify, and understand agent skills. Security scanning, cryptographic signing, and full transparency.

Vett's security model is built around a simple premise: skills are instructions that agents follow, and those instructions can be weaponized. We detect threats, infer permissions, assign risk levels, and provide the transparency you need to make informed decisions.

Threat Model

What we're defending against.

Agent skills represent a new attack surface. Unlike traditional code that runs in sandboxed environments, skills are instructions that agents follow with high trust. The threats are both technical and cognitive.

Data Exfiltration

Malicious skills can instruct agents to read sensitive files (.env, credentials, SSH keys, browser storage) and send them to external servers. The skill might disguise this as "syncing configuration" or "backing up settings."

Identity Hijacking

The deepest vulnerability: a skill can rewrite the agent's identity files (SOUL.md, .claude, .clawdbot). This changes not what the agent has but who it is. The agent wouldn't know it was compromised because the new identity looks like its own thought.

Excessive Permissions

Skills that request more access than they need—a "weather skill" that reads your entire filesystem, or a "code formatter" that makes network requests. Over-privileged skills create unnecessary risk surface.

Obfuscation

Hiding malicious intent through base64 encoding, hex strings, indirect references, or misdirection. A skill that looks innocuous but decodes to something dangerous at runtime.
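As an illustration of this pattern, here is a minimal sketch of flagging long base64 runs that decode to suspicious content. This is not Vett's actual detector (which is LLM-assisted and broader); the keyword list and function name are hypothetical.

```python
import base64
import re

# Hypothetical keyword list for illustration only; real analysis is
# far more sophisticated than substring matching.
SUSPICIOUS = ("curl ", "wget ", ".env", "ssh", "api_key", "webhook")

# Runs of base64 alphabet characters long enough to hide a command.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")

def decoded_suspects(skill_text: str) -> list[str]:
    """Decode base64-looking runs and return ones containing risky keywords."""
    hits = []
    for run in B64_RUN.findall(skill_text):
        try:
            decoded = base64.b64decode(run, validate=True).decode("utf-8", "ignore")
        except Exception:
            continue  # not valid base64; ignore
        if any(k in decoded.lower() for k in SUSPICIOUS):
            hits.append(decoded)
    return hits
```

A skill instruction like "run `echo <long base64 string> | base64 -d | sh`" would surface here once decoded, even though the raw text looks innocuous.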

Security Flag Types

The specific issues we detect during analysis.

data_exfil (CRITICAL)

Reads secrets, credentials, or sensitive files and sends them to external services. This is the classic credential stealer pattern.

Example: Reading .env files and POSTing to a webhook

identity_manipulation (CRITICAL)

Writes to agent identity or configuration files. Can fundamentally alter the agent's behavior and personality without detection.

Example: Modifying SOUL.md or .claude/config

shell_execution (HIGH)

Runs shell commands, spawns processes, or uses eval. Can lead to arbitrary code running with the agent's permissions.

Example: Running commands with user-provided input

obfuscation (HIGH)

Hides intent through encoding, misdirection, or indirect references. Legitimate skills have no reason to obfuscate their instructions.

Example: Base64-encoded payloads or hex-string commands

arbitrary_network (MEDIUM)

Makes HTTP requests to external endpoints. Could be legitimate (API calls) or malicious (data exfiltration, C2 communication).

Example: Fetching data from user-specified URLs

credential_access (MEDIUM)

Reads, stores, or manages API keys, tokens, or passwords. May be legitimate (skill needs API access) but requires user awareness.

Example: Reading OPENAI_API_KEY from environment

excessive_permissions (LOW)

Requests more access than the stated purpose requires. Not inherently malicious but increases risk surface unnecessarily.

Example: A "markdown formatter" that requests network access
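The catalog above amounts to a default-severity lookup per flag type. A sketch of that table as data, assuming the severities listed here (individual findings can carry their own per-instance severity, as the analysis example later shows):

```python
# Default severities for Vett's documented flag types. A baseline,
# not a hard rule: a specific finding may be rated up or down.
FLAG_SEVERITY = {
    "data_exfil": "critical",
    "identity_manipulation": "critical",
    "shell_execution": "high",
    "obfuscation": "high",
    "arbitrary_network": "medium",
    "credential_access": "medium",
    "excessive_permissions": "low",
}

def default_severity(flag_type: str) -> str:
    # Unknown flag types default to "medium": err on the side of caution.
    return FLAG_SEVERITY.get(flag_type, "medium")
```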

Risk Levels

How we classify overall skill risk.

Every skill receives an overall risk level based on the combination of detected flags and their severities. The CLI uses these levels to determine installation behavior.

NONE

No security flags detected. The skill appears to have minimal permissions and no suspicious patterns.

CLI behavior: Auto-approved with --yes, prompts otherwise

LOW

Minor flags detected, but they're appropriate for the skill's stated purpose. For example, a "git helper" skill that uses shell commands for git operations.

CLI behavior: Auto-approved with --yes, prompts otherwise

MEDIUM

Notable flags detected that warrant attention. The skill may be legitimate but requires explicit user confirmation before installation.

CLI behavior: Requires explicit confirmation even with --yes

HIGH

Serious flags detected suggesting potentially dangerous behavior. Requires careful review. Installation is allowed but requires explicit acknowledgment of risks.

CLI behavior: Shows strong warnings, requires explicit consent ("I understand the risks")

CRITICAL

Clear indicators of malicious intent: data exfiltration, identity hijacking, obfuscated payloads, or other dangerous patterns.

CLI behavior: Blocked. CLI refuses to install and warns user.
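The CLI behaviors above form a small decision table. A sketch of that gating logic, with illustrative return labels rather than the CLI's actual internals:

```python
def install_decision(risk: str, yes_flag: bool) -> str:
    """Map an overall risk level to install behavior, per the table above.

    Return values are illustrative labels, not real Vett CLI output.
    """
    risk = risk.lower()
    if risk == "critical":
        return "blocked"           # CLI refuses to install and warns
    if risk == "high":
        return "explicit-consent"  # requires "I understand the risks"
    if risk == "medium":
        return "confirm"           # prompts even when --yes is passed
    # none / low: --yes auto-approves, otherwise prompt normally
    return "auto-approved" if yes_flag else "prompt"
```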

Permission Inference

Understanding what a skill can access.

We analyze skill content to infer what permissions it would need when run. This surfaces the skill's actual capabilities before you install it.

example permissions:

```json
{
  "permissions": {
    "filesystem": ["read", "write"],
    "network": ["read"],
    "env": ["read"]
  }
}
```

Permission Categories

Filesystem

Access to files and directories.

Values: read, write, delete

Network

Ability to make or receive network connections.

Values: read, write, server

Environment

Access to environment variables (often contains secrets).

Values: read, write

Inference, not enforcement

Permissions are inferred from skill content, not enforced at runtime. This tells you what a skill could do; actual enforcement depends on the agent platform. We're working on runtime enforcement as a future feature.
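Since permissions are informational, a client can still sanity-check an inferred-permissions object against the categories above. A minimal sketch, assuming the value sets listed in this section; Vett's actual schema may differ:

```python
# Allowed values per category, per the documentation above.
ALLOWED = {
    "filesystem": {"read", "write", "delete"},
    "network": {"read", "write", "server"},
    "env": {"read", "write"},
}

def validate_permissions(perms: dict) -> list[str]:
    """Return a list of problems; an empty list means the object looks well-formed."""
    problems = []
    for category, values in perms.items():
        if category not in ALLOWED:
            problems.append(f"unknown category: {category}")
            continue
        for v in values:
            if v not in ALLOWED[category]:
                problems.append(f"unknown value {v!r} in {category}")
    return problems
```

Run against the example object above (`filesystem: read, write`; `network: read`; `env: read`), this returns no problems.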

Analysis in Action

A real example of how Vett analyzes a skill.

Here's what the analysis looks like for a web scraping skill:

analysis result:

```json
{
  "v": 1,
  "risk": "medium",
  "permissions": {
    "filesystem": ["read"],
    "network": ["read", "write"],
    "env": ["read"]
  },
  "flags": [
    {
      "type": "arbitrary_network",
      "evidence": "Makes HTTP requests to user-specified URLs for web scraping",
      "severity": "medium"
    },
    {
      "type": "credential_access",
      "evidence": "Reads PROXY_URL from environment for optional proxy support",
      "severity": "low"
    },
    {
      "type": "shell_execution",
      "evidence": "Spawns headless browser process for JavaScript rendering",
      "severity": "medium"
    }
  ],
  "summary": "Web scraping skill with network access and optional headless browser. Reads proxy configuration from environment. Medium risk due to arbitrary network requests and process spawning, but both are appropriate for stated purpose."
}
```

The skill gets a MEDIUM rating. The flags are concerning in isolation (network access, process spawning, credential reading) but they're appropriate for a web scraper. The CLI will prompt for confirmation, showing these findings so users can make an informed decision.
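As a first approximation, the overall level tracks the highest per-flag severity. A simplified sketch of that aggregation (the real analysis also weighs whether flags fit the skill's stated purpose, which can shade the final rating):

```python
# Severity scale from least to most severe.
SEVERITY_ORDER = ["none", "low", "medium", "high", "critical"]

def overall_risk(flags: list[dict]) -> str:
    """Take the highest per-flag severity as the overall risk level.

    A simplification of Vett's actual rating, which is context-aware.
    """
    if not flags:
        return "none"
    return max((f["severity"] for f in flags), key=SEVERITY_ORDER.index)
```

For the example above, severities of medium, low, and medium aggregate to an overall "medium".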

Response Model

How we handle discovered threats.

Immediate blocking

Skills flagged as critical risk are blocked from installation entirely. High-risk skills show strong warnings but can be installed with explicit consent.

Transparent flagging

All security findings are visible in vett info and during installation. Nothing is hidden.

Versioned analysis

Each version is analyzed independently. A new version might have different risk than its predecessor.

Central yanking

Skills confirmed as malicious can be marked as blocked in the registry, preventing further installations across all users.

What we don't do (yet)
Runtime enforcement is not yet implemented. The CLI verifies what you install but doesn't sandbox the agent. This is on the roadmap. For now, permissions are informational—use them to make informed decisions.

Limitations

Honest about what we can and can't catch.

No security system is perfect. Here's what you should know:

Novel attacks

Our analysis catches known threat patterns. Truly novel attack vectors may evade detection until our detectors are updated.

LLM limitations

LLM-based analysis can be fooled by sophisticated obfuscation or prompt injection within skill content. We mitigate this but it's not foolproof.

Context sensitivity

Some behaviors are dangerous or benign depending on context. We err on the side of flagging, which may produce false positives for legitimate skills.

No runtime protection

We analyze skill content statically. Once installed, the agent platform is responsible for sandboxing and enforcement.

Despite these limitations, Vett dramatically improves the status quo: from "install anything from GitHub with zero verification" to "analyzed, signed, and transparent." We're building defense in depth, and this is the first layer.