Назад към всички

skill-defender

// Scans installed OpenClaw skills for malicious patterns including prompt injection, credential theft, data exfiltration, obfuscated payloads, and backdoors. Use when installing new skills, after skill updates, or for periodic security scans. Runs deterministic pattern matching — fast, offline, no API

$ git log --oneline --stat
stars:1,933
forks:367
updated:March 4, 2026
SKILL.mdreadonly
SKILL.md Frontmatter
nameskill-defender
descriptionScans installed OpenClaw skills for malicious patterns including prompt injection, credential theft, data exfiltration, obfuscated payloads, and backdoors. Use when installing new skills, after skill updates, or for periodic security scans. Runs deterministic pattern matching — fast, offline, no API cost.

Skill Defender — Malicious Pattern Scanner

When to Run

Automatic Triggers

  1. New skill installed — Immediately run scan_skill.py against it before allowing use
  2. Skill updated — Re-scan after any file changes in a skill directory
  3. Periodic audit — Run batch scan on all installed skills when requested

Manual Triggers

  • User says "scan skill X" → scan that specific skill
  • User says "scan all skills" → batch scan all skills
  • User says "security check" or "audit skills" → same as above

Scripts

scripts/scan_skill.py — Single Skill Scanner

Scans one skill directory for malicious patterns. Produces JSON or human-readable output.

scripts/aggregate_scan.py — Batch Scanner

Scans ALL installed skills and produces a single JSON report. Includes a built-in allowlist to reduce false positives from security-related skills, API skills, and other known-safe patterns.

How to Run

# Scan a single skill (human-readable)
python3 scripts/scan_skill.py /path/to/skill-dir

# Scan a single skill (JSON output)
python3 scripts/scan_skill.py /path/to/skill-dir --json

# Scan ALL installed skills (JSON aggregate report)
python3 scripts/aggregate_scan.py

# With custom skills directory
python3 scripts/aggregate_scan.py --skills-dir /path/to/skills

# With verbose warnings
python3 scripts/scan_skill.py /path/to/skill-dir --verbose

# Exclude false positives
python3 scripts/scan_skill.py /path/to/skill-dir --exclude "pattern1" "pattern2"

Exit Codes (scan_skill.py)

  • 0 = clean or informational only
  • 1 = suspicious (medium/high findings)
  • 2 = dangerous (critical findings)
  • 3 = error

Output Format (aggregate_scan.py)

{
  "skills": [
    {
      "name": "skill-name",
      "verdict": "clean|suspicious|dangerous|error",
      "findingsCount": 0,
      "findings": []
    }
  ],
  "summary": "All 37 skills passed with no significant issues.",
  "totalSkills": 37,
  "cleanCount": 37,
  "suspiciousCount": 0,
  "dangerousCount": 0,
  "errorCount": 0,
  "timestamp": "2026-02-02T06:00:00+00:00"
}

Auto-Detection

Both scripts auto-detect paths:

  • Skills directory: Detected from script location (walks up to find skills/ parent), falls back to ~/clawd/skills, ~/skills, ~/.openclaw/skills
  • Scanner script: aggregate_scan.py finds scan_skill.py co-located in the same directory

Handling Results

✅ Clean (verdict: "clean")

  • No action needed — skill is safe

⚠️ Suspicious (verdict: "suspicious")

  • Warn the user with a summary of findings
  • Show the category and severity of each finding

🚨 Dangerous (verdict: "dangerous")

  • Block the skill — do not proceed with installation or use
  • Show the full detailed findings to the user
  • Require explicit user override to proceed

Built-in Allowlist

The aggregate scanner includes an allowlist for known false positives:

  • Security scanners (skill-defender, clawdbot-security-check) — their docs/scripts contain the very patterns they detect
  • Auth-dependent skills (tailscale, reddit, n8n, event-planner) — legitimately reference credential paths and API keys
  • Config-aware skills (memory-setup, eightctl, summarize) — reference config paths in documentation
  • Agent-writing skills (self-improving-agent) — designed to modify agent files

Pattern Reference

See references/threat-patterns.md for full documentation of all detected patterns, organized by category with explanations of why each is dangerous.

Important Notes

  • No external dependencies — standard library only (Python 3.9+)
  • Fast — under 1 second per skill, ~30 seconds for a full batch of 30+ skills
  • This is deterministic pattern matching (Layer 2 defense). Not LLM-based.
  • False positives are possible — the allowlist and --exclude flag help
  • The scanner will flag itself if scanned without the allowlist — this is expected