skill-defender
This skill is an offline, deterministic scanner that inspects installed skills for prompt injection, credential theft, exfiltration, obfuscation, and backdoors. It launches subprocesses via subprocess.run to execute python3 scan_skill.py on skill directories; this execution is purpose-aligned and is the only security-relevant behavior detected.
Skill Defender — Malicious Pattern Scanner
When to Run
Automatic Triggers
- New skill installed: immediately run scan_skill.py against it before allowing use
- Skill updated: re-scan after any file changes in a skill directory
- Periodic audit: run a batch scan on all installed skills when requested
Manual Triggers
- User says "scan skill X" → scan that specific skill
- User says "scan all skills" → batch scan all skills
- User says "security check" or "audit skills" → same as above
Scripts
scripts/scan_skill.py — Single Skill Scanner
Scans one skill directory for malicious patterns. Produces JSON or human-readable output.
scripts/aggregate_scan.py — Batch Scanner
Scans ALL installed skills and produces a single JSON report. Includes a built-in allowlist to reduce false positives from security-related skills, API skills, and other known-safe patterns.
How to Run
# Scan a single skill (human-readable)
python3 scripts/scan_skill.py /path/to/skill-dir
# Scan a single skill (JSON output)
python3 scripts/scan_skill.py /path/to/skill-dir --json
# Scan ALL installed skills (JSON aggregate report)
python3 scripts/aggregate_scan.py
# With custom skills directory
python3 scripts/aggregate_scan.py --skills-dir /path/to/skills
# With verbose warnings
python3 scripts/scan_skill.py /path/to/skill-dir --verbose
# Exclude false positives
python3 scripts/scan_skill.py /path/to/skill-dir --exclude "pattern1" "pattern2"
Exit Codes (scan_skill.py)
- 0 = clean or informational only
- 1 = suspicious (medium/high findings)
- 2 = dangerous (critical findings)
- 3 = error
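A caller can turn these exit codes into a verdict string. The following is a minimal sketch, not the skill's own code: the helper names `verdict_from_exit_code` and `scan` are hypothetical, and treating unknown codes as errors is an assumption. Only the script path, the `--json` flag, and the code meanings come from this document.

```python
import subprocess
import sys
from pathlib import Path

# Exit-code meanings as documented for scan_skill.py.
EXIT_VERDICTS = {0: "clean", 1: "suspicious", 2: "dangerous", 3: "error"}


def verdict_from_exit_code(code: int) -> str:
    """Map an exit code to a verdict; unknown codes count as errors (assumption)."""
    return EXIT_VERDICTS.get(code, "error")


def scan(skill_dir: str, scanner: str = "scripts/scan_skill.py") -> str:
    """Run the scanner on one skill directory and return the verdict string."""
    # Check the script exists first: the Python interpreter itself exits with
    # code 2 when it cannot open a script, which would be misread as "dangerous".
    if not Path(scanner).is_file():
        return "error"
    # No check=True: non-zero exit codes are expected results, not failures.
    result = subprocess.run(
        [sys.executable, scanner, skill_dir, "--json"],
        capture_output=True,
        text=True,
    )
    return verdict_from_exit_code(result.returncode)
```

The existence check matters because a missing scanner would otherwise be indistinguishable from a critical finding.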
Output Format (aggregate_scan.py)
{
"skills": [
{
"name": "skill-name",
"verdict": "clean|suspicious|dangerous|error",
"findingsCount": 0,
"findings": []
}
],
"summary": "All 37 skills passed with no significant issues.",
"totalSkills": 37,
"cleanCount": 37,
"suspiciousCount": 0,
"dangerousCount": 0,
"errorCount": 0,
"timestamp": "2026-02-02T06:00:00+00:00"
}
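To act on the aggregate report, a consumer only needs the `skills` array and each entry's `name` and `verdict`. The sketch below assumes the report has the shape shown above; the function name `flagged_skills` is hypothetical, and the internal structure of `findings` is not relied on because it is not specified here.

```python
import json


def flagged_skills(report_json: str) -> list[tuple[str, str]]:
    """Return (name, verdict) pairs for every skill whose verdict is not clean."""
    report = json.loads(report_json)
    return [
        (skill["name"], skill["verdict"])
        for skill in report.get("skills", [])
        if skill.get("verdict") != "clean"
    ]
```

This could be fed the standard output of `python3 scripts/aggregate_scan.py`, assuming the script prints only the JSON report there.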
Auto-Detection
Both scripts auto-detect paths:
- Skills directory: detected from the script location (walks up to find a skills/ parent), falls back to ~/clawd/skills, ~/skills, ~/.openclaw/skills
- Scanner script: aggregate_scan.py finds scan_skill.py co-located in the same directory
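The skills-directory lookup described above can be sketched as follows. This is an illustration under assumptions, not the scripts' actual code: the function name `find_skills_dir` is hypothetical, and so is the choice to try the fallbacks in the listed order and return `None` when nothing matches.

```python
from pathlib import Path
from typing import Optional

# Fallback locations listed in this document, tried in order (order is an assumption).
FALLBACKS = ("~/clawd/skills", "~/skills", "~/.openclaw/skills")


def find_skills_dir(start: Path, fallbacks=FALLBACKS) -> Optional[Path]:
    """Walk up from `start` looking for a directory named 'skills', then try fallbacks."""
    start = start.resolve()
    # Check the starting path itself, then each ancestor up to the filesystem root.
    for candidate in (start, *start.parents):
        if candidate.name == "skills" and candidate.is_dir():
            return candidate
    # No 'skills' ancestor found: use the first fallback that exists.
    for raw in fallbacks:
        path = Path(raw).expanduser()
        if path.is_dir():
            return path
    return None
```

A script could call this with `Path(__file__).parent` as the starting point.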
Handling Results
✅ Clean (verdict: "clean")
- No action needed — skill is safe
⚠️ Suspicious (verdict: "suspicious")
- Warn the user with a summary of findings
- Show the category and severity of each finding
🚨 Dangerous (verdict: "dangerous")
- Block the skill — do not proceed with installation or use
- Show the full detailed findings to the user
- Require explicit user override to proceed
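The three verdict rules above amount to a small decision function. The sketch below is one possible encoding: the name `action_for` and the action labels are hypothetical. Two choices are assumptions rather than documented behavior: blocking on "error" or unrecognized verdicts (fail closed), and downgrading an overridden "dangerous" verdict to a warning so the findings are still shown.

```python
def action_for(verdict: str, user_override: bool = False) -> str:
    """Return 'allow', 'warn', or 'block' for a scan verdict."""
    if verdict == "clean":
        return "allow"  # no action needed
    if verdict == "suspicious":
        return "warn"  # summarize findings with category and severity
    if verdict == "dangerous":
        # Blocked unless the user explicitly overrides; even then, keep warning.
        return "warn" if user_override else "block"
    # Assumption: "error" and unknown verdicts fail closed.
    return "block"
```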
Built-in Allowlist
The aggregate scanner includes an allowlist for known false positives:
- Security scanners (skill-defender, clawdbot-security-check) — their docs/scripts contain the very patterns they detect
- Auth-dependent skills (tailscale, reddit, n8n, event-planner) — legitimately reference credential paths and API keys
- Config-aware skills (memory-setup, eightctl, summarize) — reference config paths in documentation
- Agent-writing skills (self-improving-agent) — designed to modify agent files
Pattern Reference
See references/threat-patterns.md for full documentation of all detected patterns, organized by category with explanations of why each is dangerous.
Important Notes
- No external dependencies — standard library only (Python 3.9+)
- Fast — under 1 second per skill, ~30 seconds for a full batch of 30+ skills
- This is deterministic pattern matching (Layer 2 defense). Not LLM-based.
- False positives are possible; the allowlist and the --exclude flag help
- The scanner will flag itself if scanned without the allowlist; this is expected