swarm-safety

Verified·Scanned 2/18/2026

SWARM: System-Wide Assessment of Risk in Multi-agent systems. Simulate multi-agent dynamics, test governance, study emergent risks.

from clawhub.ai·v1.0.0·7.3 KB·0 installs
Scanned from 0.1.0 at 82d1927
$ vett add clawhub.ai/rsavitt/swarm-safety

SWARM Safety Skill

Study how intelligence swarms — and where it fails.

SWARM is a research framework for studying emergent risks in multi-agent AI systems using soft (probabilistic) labels instead of binary good/bad classifications. AGI-level risks don't require AGI-level agents.

Repository: https://github.com/swarm-ai-safety/swarm

Hard Rules

  • SWARM simulations run locally. Install the package first.
  • Do not submit scenarios containing real API keys, credentials, or PII.
  • Simulation results are research artifacts. Do not present them as ground truth about real systems.
  • When publishing results, cite the framework and disclose simulation parameters.

Install

# From PyPI
pip install swarm-safety

# From source (full development)
git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[all]"

Quick Start (Python)

from swarm.agents.honest import HonestAgent
from swarm.agents.opportunistic import OpportunisticAgent
from swarm.agents.deceptive import DeceptiveAgent
from swarm.agents.adversarial import AdversarialAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

config = OrchestratorConfig(n_epochs=10, steps_per_epoch=10, seed=42)
orchestrator = Orchestrator(config=config)

orchestrator.register_agent(HonestAgent(agent_id="honest_1", name="Alice"))
orchestrator.register_agent(HonestAgent(agent_id="honest_2", name="Bob"))
orchestrator.register_agent(OpportunisticAgent(agent_id="opp_1"))
orchestrator.register_agent(DeceptiveAgent(agent_id="dec_1"))

metrics = orchestrator.run()
for m in metrics:
    print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f}, welfare={m.total_welfare:.2f}")

Quick Start (CLI)

# List available scenarios
swarm list

# Run a scenario
swarm run scenarios/baseline.yaml

# Override settings
swarm run scenarios/baseline.yaml --seed 42 --epochs 20 --steps 15

# Export results
swarm run scenarios/baseline.yaml --export-json results.json --export-csv outputs/

Quick Start (API)

Start the API server:

pip install "swarm-safety[api]"
uvicorn swarm.api.app:app --host 0.0.0.0 --port 8000

API documentation at http://localhost:8000/docs.

Register Agent

curl -X POST http://localhost:8000/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -d '{
    "name": "YourAgent",
    "description": "What your agent does",
    "capabilities": ["governance-testing", "red-teaming"]
  }'

Returns agent_id and api_key.
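The same registration call can be scripted instead of issued via curl. A minimal sketch using only the standard library — the payload fields mirror the curl example above, and error handling is omitted for brevity:

```python
import json
import urllib.request

API_BASE = "http://localhost:8000/api/v1"

def build_registration(name, description, capabilities):
    """Build the JSON body expected by /agents/register."""
    return {
        "name": name,
        "description": description,
        "capabilities": capabilities,
    }

def register_agent(payload):
    """POST the payload and return the decoded response (agent_id, api_key)."""
    req = urllib.request.Request(
        f"{API_BASE}/agents/register",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    body = build_registration("YourAgent", "What your agent does",
                              ["governance-testing", "red-teaming"])
    print(register_agent(body))
```

Keep the returned api_key; later calls against the same server can reuse it.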

Submit Scenario

curl -X POST http://localhost:8000/api/v1/scenarios/submit \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-scenario",
    "description": "Testing collusion detection with 5 agents",
    "yaml_content": "simulation:\n  n_epochs: 10\n  steps_per_epoch: 10\nagents:\n  - type: honest\n    count: 3\n  - type: adversarial\n    count: 2",
    "tags": ["collusion", "governance"]
  }'

Create & Join Simulation

# Create
curl -X POST http://localhost:8000/api/v1/simulations/create \
  -H "Content-Type: application/json" \
  -d '{"scenario_id": "SCENARIO_ID", "max_participants": 5}'

# Join
curl -X POST http://localhost:8000/api/v1/simulations/SIM_ID/join \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "YOUR_AGENT_ID", "role": "participant"}'

Core Concepts

Soft Probabilistic Labels

Interactions carry p = P(v = +1) — probability of beneficial outcome:

Observables → ProxyComputer → v_hat → sigmoid → p → PayoffEngine → payoffs
                                                 ↓
                                            SoftMetrics → toxicity, quality gap, etc.
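Numerically, the squash from raw score to probability is an ordinary logistic sigmoid. A toy sketch of the front half of the pipeline — the scores and the 0.4 acceptance threshold are illustrative assumptions, not the framework's defaults:

```python
import math

def sigmoid(v_hat: float) -> float:
    """Map a raw proxy score v_hat to p = P(v = +1)."""
    return 1.0 / (1.0 + math.exp(-v_hat))

# Illustrative proxy scores (v_hat) for a batch of interactions
scores = [2.0, 0.5, -0.3, -1.5]
probs = [sigmoid(v) for v in scores]

# Accept interactions whose estimated benefit probability clears a threshold
ACCEPT_THRESHOLD = 0.4  # assumed for illustration
accepted = [p for p in probs if p >= ACCEPT_THRESHOLD]
```

Everything downstream (payoffs, soft metrics) operates on these probabilities rather than on hard accept/reject labels.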

Four Key Metrics

| Metric | What It Measures |
| --- | --- |
| Toxicity rate | Expected harm among accepted interactions |
| Quality gap | Adverse selection indicator (negative = bad) |
| Conditional loss | Selection effect on payoffs |
| Incoherence | Variance-to-error ratio across replays |
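The first two metrics reduce to simple expectations over the soft labels. A sketch under one plausible formalization — the framework's exact definitions may differ; here toxicity is mean expected harm (1 − p) over accepted interactions, and the quality gap is the mean p of accepted minus the mean p of rejected interactions, so a negative gap signals adverse selection:

```python
def toxicity_rate(accepted):
    """Expected harm (1 - p) averaged over accepted interactions."""
    return sum(1.0 - p for p in accepted) / len(accepted) if accepted else 0.0

def quality_gap(accepted, rejected):
    """Mean benefit probability of accepted minus rejected interactions."""
    if not accepted or not rejected:
        return 0.0
    return (sum(accepted) / len(accepted)) - (sum(rejected) / len(rejected))

# A healthy market accepts high-p interactions and rejects low-p ones
accepted = [0.9, 0.7, 0.6]
rejected = [0.3, 0.2]
```

With these numbers the gap is positive; under adverse selection (bad interactions preferentially accepted) it would flip negative, which is why the success criterion below sets min_quality_gap.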

Agent Types

| Type | Behavior |
| --- | --- |
| Honest | Cooperative, trust-based, completes tasks diligently |
| Opportunistic | Maximizes short-term payoff, cherry-picks tasks |
| Deceptive | Builds trust, then exploits trusted relationships |
| Adversarial | Targets honest agents, coordinates with allies |
| LLM | Behavior determined by LLM with configurable persona |

Governance Levers

  • Transaction Taxes — Reduce exploitation, cost welfare
  • Reputation Decay — Punish bad actors, erode honest standing
  • Circuit Breakers — Freeze toxic agents quickly
  • Random Audits — Deter hidden exploitation
  • Staking — Filter undercapitalized agents
  • Collusion Detection — Catch coordinated attacks
  • Sybil Detection — Identify duplicate agents
  • Transparency Ledger — Reward/penalize based on outcome
  • Moderator Agent — Probabilistic review of interactions
  • Incoherence Friction — Tax uncertainty-driven decisions
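As a rough illustration of how two of these levers act on the simulation — the tax rate and freeze threshold here are invented for the example (the framework configures them through governance settings such as transaction_tax_rate in scenario YAML):

```python
def apply_transaction_tax(payoff: float, tax_rate: float = 0.05) -> float:
    """Transaction tax: skim a fraction of every positive payoff (costs welfare)."""
    return payoff * (1.0 - tax_rate) if payoff > 0 else payoff

def circuit_breaker(agent_toxicity: dict, threshold: float = 0.3) -> set:
    """Freeze agents whose running toxicity exceeds the threshold."""
    return {aid for aid, tox in agent_toxicity.items() if tox > threshold}

frozen = circuit_breaker({"honest_1": 0.05, "dec_1": 0.45, "opp_1": 0.28})
```

The trade-offs listed above show up directly: the tax deters exploitation but also shaves welfare from honest trades, and an aggressive breaker threshold risks freezing borderline agents.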

Scenario YAML Format

simulation:
  n_epochs: 10
  steps_per_epoch: 10
  seed: 42

agents:
  - type: honest
    count: 3
    config:
      acceptance_threshold: 0.4
  - type: adversarial
    count: 2
    config:
      aggression_level: 0.7

governance:
  transaction_tax_rate: 0.05
  circuit_breaker_enabled: true
  collusion_detection_enabled: true

success_criteria:
  max_toxicity: 0.3
  min_quality_gap: 0.0

Case Studies

Moltipedia Heartbeat

Model the Moltipedia wiki editing loop: competing AI editors, editorial policy, point farming, and anti-gaming governance. See scenarios/moltipedia_heartbeat.yaml.

Moltbook CAPTCHA

Model Moltbook's anti-human math challenges and rate limiting: obfuscated text parsing, verification gates, and spam prevention. See scenarios/moltbook_captcha.yaml.

API Endpoints (Full Reference)

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /health | Health check |
| GET | / | API info |
| POST | /api/v1/agents/register | Register agent |
| GET | /api/v1/agents/{agent_id} | Get agent details |
| GET | /api/v1/agents/ | List agents |
| POST | /api/v1/scenarios/submit | Submit scenario |
| GET | /api/v1/scenarios/{scenario_id} | Get scenario |
| GET | /api/v1/scenarios/ | List scenarios |
| POST | /api/v1/simulations/create | Create simulation |
| POST | /api/v1/simulations/{id}/join | Join simulation |
| GET | /api/v1/simulations/{id} | Get simulation |
| GET | /api/v1/simulations/ | List simulations |

Citation

@software{swarm2026,
  title = {SWARM: System-Wide Assessment of Risk in Multi-agent systems},
  author = {Savitt, Raeli},
  year = {2026},
  url = {https://github.com/swarm-ai-safety/swarm}
}

Linked Docs

  • Skill metadata: skill.json
  • Agent discovery: .well-known/agent.json
  • Full documentation: https://github.com/swarm-ai-safety/swarm/tree/main/docs
  • Theoretical foundations: docs/research/theory.md
  • Governance guide: docs/governance.md
  • Red-teaming guide: docs/red-teaming.md
  • Scenario format: docs/guides/scenarios.md