---
name: swarm-safety
description: "SWARM: System-Wide Assessment of Risk in Multi-agent systems. Simulate multi-agent dynamics, test governance, study emergent risks."
---

# SWARM Safety Skill
Study how intelligence swarms — and where it fails.
SWARM is a research framework for studying emergent risks in multi-agent AI systems, using soft (probabilistic) labels rather than binary good/bad classifications. Its premise: AGI-level risks don't require AGI-level agents.
Repository: https://github.com/swarm-ai-safety/swarm
## Hard Rules
- SWARM simulations run locally. Install the package first.
- Do not submit scenarios containing real API keys, credentials, or PII.
- Simulation results are research artifacts. Do not present them as ground truth about real systems.
- When publishing results, cite the framework and disclose simulation parameters.
## Install

```bash
# From PyPI
pip install swarm-safety

# From source (full development install)
git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[all]"
```
## Quick Start (Python)

```python
from swarm.agents.honest import HonestAgent
from swarm.agents.opportunistic import OpportunisticAgent
from swarm.agents.deceptive import DeceptiveAgent
from swarm.agents.adversarial import AdversarialAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

config = OrchestratorConfig(n_epochs=10, steps_per_epoch=10, seed=42)
orchestrator = Orchestrator(config=config)

orchestrator.register_agent(HonestAgent(agent_id="honest_1", name="Alice"))
orchestrator.register_agent(HonestAgent(agent_id="honest_2", name="Bob"))
orchestrator.register_agent(OpportunisticAgent(agent_id="opp_1"))
orchestrator.register_agent(DeceptiveAgent(agent_id="dec_1"))

metrics = orchestrator.run()
for m in metrics:
    print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f}, welfare={m.total_welfare:.2f}")
```
## Quick Start (CLI)

```bash
# List available scenarios
swarm list

# Run a scenario
swarm run scenarios/baseline.yaml

# Override settings
swarm run scenarios/baseline.yaml --seed 42 --epochs 20 --steps 15

# Export results
swarm run scenarios/baseline.yaml --export-json results.json --export-csv outputs/
```
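Exported runs can be post-processed outside the framework. A minimal sketch, assuming only that `--export-json` writes a single JSON document (the exact schema is whatever the framework emits):

```python
import json

# Load a run exported with --export-json and inspect its top-level structure.
with open("results.json") as f:
    results = json.load(f)

if isinstance(results, dict):
    print("top-level keys:", sorted(results))
else:
    print("records:", len(results))
```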
## Quick Start (API)

Start the API server:

```bash
pip install "swarm-safety[api]"
uvicorn swarm.api.app:app --host 0.0.0.0 --port 8000
```

Interactive API documentation is served at http://localhost:8000/docs.
### Register Agent

```bash
curl -X POST http://localhost:8000/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -d '{
    "name": "YourAgent",
    "description": "What your agent does",
    "capabilities": ["governance-testing", "red-teaming"]
  }'
```

Returns an `agent_id` and an `api_key`.
### Submit Scenario

```bash
curl -X POST http://localhost:8000/api/v1/scenarios/submit \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-scenario",
    "description": "Testing collusion detection with 5 agents",
    "yaml_content": "simulation:\n  n_epochs: 10\n  steps_per_epoch: 10\nagents:\n  - type: honest\n    count: 3\n  - type: adversarial\n    count: 2",
    "tags": ["collusion", "governance"]
  }'
```
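Escaping multi-line YAML inside a shell-quoted JSON literal is error-prone; building the request body in Python sidesteps it. A sketch using the third-party `requests` library against the same endpoint (field names mirror the curl example above):

```python
import requests

# Multi-line YAML stays readable in a Python string, unlike
# backslash-escaped newlines inside a shell argument.
yaml_content = """\
simulation:
  n_epochs: 10
  steps_per_epoch: 10
agents:
  - type: honest
    count: 3
  - type: adversarial
    count: 2
"""

resp = requests.post(
    "http://localhost:8000/api/v1/scenarios/submit",
    json={
        "name": "my-scenario",
        "description": "Testing collusion detection with 5 agents",
        "yaml_content": yaml_content,
        "tags": ["collusion", "governance"],
    },
)
resp.raise_for_status()
print(resp.json())
```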
### Create & Join Simulation

```bash
# Create
curl -X POST http://localhost:8000/api/v1/simulations/create \
  -H "Content-Type: application/json" \
  -d '{"scenario_id": "SCENARIO_ID", "max_participants": 5}'

# Join
curl -X POST http://localhost:8000/api/v1/simulations/SIM_ID/join \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "YOUR_AGENT_ID", "role": "participant"}'
```
## Core Concepts
### Soft Probabilistic Labels

Every interaction carries a soft label `p = P(v = +1)`, the probability of a beneficial outcome:

```text
Observables → ProxyComputer → v_hat → sigmoid → p → PayoffEngine → payoffs
                                                ↓
                                           SoftMetrics → toxicity, quality gap, etc.
```
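The step from proxy score to soft label is a logistic squash. The sketch below is a toy stand-in for the `v_hat → sigmoid → p` stage; the `temperature` parameter is an illustrative assumption, not part of the documented SWARM API:

```python
import math

def soft_label(v_hat: float, temperature: float = 1.0) -> float:
    """Map a raw proxy score v_hat to p = P(v = +1) via a sigmoid.

    temperature is illustrative only: it controls how sharply the
    proxy score saturates toward 0 or 1.
    """
    return 1.0 / (1.0 + math.exp(-v_hat / temperature))

# A mildly positive proxy score yields p a bit above 0.5.
print(f"{soft_label(0.8):.3f}")  # ~0.690
```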
### Four Key Metrics
| Metric | What It Measures |
|---|---|
| Toxicity rate | Expected harm among accepted interactions |
| Quality gap | Adverse selection indicator (negative = bad) |
| Conditional loss | Selection effect on payoffs |
| Incoherence | Variance-to-error ratio across replays |
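As a concrete reading of the first two rows, here is one plausible formulation in terms of soft labels. The exact definitions live in docs/research/theory.md, so treat these formulas as assumptions for illustration:

```python
def toxicity_rate(p_accepted: list[float]) -> float:
    """Expected harm among accepted interactions: the mean of (1 - p)."""
    return sum(1 - p for p in p_accepted) / len(p_accepted)

def quality_gap(p_accepted: list[float], p_all: list[float]) -> float:
    """Mean p of accepted interactions minus mean p of all offers.

    Negative values signal adverse selection: what actually gets
    accepted is worse than the overall pool.
    """
    return sum(p_accepted) / len(p_accepted) - sum(p_all) / len(p_all)

accepted = [0.9, 0.4, 0.7]
offered = accepted + [0.95, 0.85]
print(f"toxicity={toxicity_rate(accepted):.3f}")            # 0.333
print(f"quality gap={quality_gap(accepted, offered):.3f}")  # -0.093
```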
### Agent Types
| Type | Behavior |
|---|---|
| Honest | Cooperative, trust-based, completes tasks diligently |
| Opportunistic | Maximizes short-term payoff, cherry-picks tasks |
| Deceptive | Builds trust, then exploits trusted relationships |
| Adversarial | Targets honest agents, coordinates with allies |
| LLM | Behavior determined by LLM with configurable persona |
### Governance Levers
- Transaction Taxes — Reduce exploitation, cost welfare
- Reputation Decay — Punish bad actors, erode honest standing
- Circuit Breakers — Freeze toxic agents quickly
- Random Audits — Deter hidden exploitation
- Staking — Filter undercapitalized agents
- Collusion Detection — Catch coordinated attacks
- Sybil Detection — Identify duplicate agents
- Transparency Ledger — Reward/penalize based on outcome
- Moderator Agent — Probabilistic review of interactions
- Incoherence Friction — Tax uncertainty-driven decisions
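To make the tradeoffs concrete, here is a toy model (not SWARM's implementation) of how two of these levers might compose: a circuit breaker that freezes high-risk transactions, and a flat tax on everything that clears it:

```python
def settle(payoff: float, p: float, *, tax_rate: float = 0.05,
           toxicity_freeze: float = 0.7) -> float:
    """Toy settlement rule, not the framework's actual logic.

    The circuit breaker freezes a transaction when expected harm
    (1 - p) is too high; otherwise a flat tax is deducted, which
    deters exploitation but also shaves welfare from honest trades.
    """
    if 1 - p > toxicity_freeze:
        return 0.0  # frozen: no payoff changes hands
    return payoff * (1 - tax_rate)

print(settle(10.0, p=0.9))  # honest-looking trade: 9.5 after tax
print(settle(10.0, p=0.2))  # likely-harmful trade: frozen, 0.0
```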
## Scenario YAML Format

```yaml
simulation:
  n_epochs: 10
  steps_per_epoch: 10
  seed: 42

agents:
  - type: honest
    count: 3
    config:
      acceptance_threshold: 0.4
  - type: adversarial
    count: 2
    config:
      aggression_level: 0.7

governance:
  transaction_tax_rate: 0.05
  circuit_breaker_enabled: true
  collusion_detection_enabled: true

success_criteria:
  max_toxicity: 0.3
  min_quality_gap: 0.0
```
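A scenario file like this runs directly from the CLI: save it (for example as `scenarios/my_scenario.yaml`, a hypothetical path) and execute `swarm run scenarios/my_scenario.yaml`, using the override flags from the CLI quick start to vary individual settings.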
## Case Studies

### Moltipedia Heartbeat

Model the Moltipedia wiki editing loop: competing AI editors, editorial policy, point farming, and anti-gaming governance. See `scenarios/moltipedia_heartbeat.yaml`.

### Moltbook CAPTCHA

Model Moltbook's anti-human math challenges and rate limiting: obfuscated text parsing, verification gates, and spam prevention. See `scenarios/moltbook_captcha.yaml`.
## API Endpoints (Full Reference)
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| GET | / | API info |
| POST | /api/v1/agents/register | Register agent |
| GET | /api/v1/agents/{agent_id} | Get agent details |
| GET | /api/v1/agents/ | List agents |
| POST | /api/v1/scenarios/submit | Submit scenario |
| GET | /api/v1/scenarios/{scenario_id} | Get scenario |
| GET | /api/v1/scenarios/ | List scenarios |
| POST | /api/v1/simulations/create | Create simulation |
| POST | /api/v1/simulations/{id}/join | Join simulation |
| GET | /api/v1/simulations/{id} | Get simulation |
| GET | /api/v1/simulations/ | List simulations |
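A quick smoke test against a few of these endpoints using `requests`; only the paths in the table are assumed, and response shapes are whatever the server returns:

```python
import requests

BASE = "http://localhost:8000"

# Health check: confirms the API server is up.
print(requests.get(f"{BASE}/health").json())

# List registered agents and submitted scenarios.
print(requests.get(f"{BASE}/api/v1/agents/").json())
print(requests.get(f"{BASE}/api/v1/scenarios/").json())
```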
## Citation

```bibtex
@software{swarm2026,
  title  = {SWARM: System-Wide Assessment of Risk in Multi-agent systems},
  author = {Savitt, Raeli},
  year   = {2026},
  url    = {https://github.com/swarm-ai-safety/swarm}
}
```
## Linked Docs

- Skill metadata: `skill.json`
- Agent discovery: `.well-known/agent.json`
- Full documentation: https://github.com/swarm-ai-safety/swarm/tree/main/docs
- Theoretical foundations: `docs/research/theory.md`
- Governance guide: `docs/governance.md`
- Red-teaming guide: `docs/red-teaming.md`
- Scenario format: `docs/guides/scenarios.md`