High Risk: This skill has significant security concerns. Review the findings below before installing.

council

Caution·Scanned 2/18/2026

High-risk skill: council spawns parallel judges and executes local shell commands (e.g., codex exec ..., scripts/validate-council.sh) and reads/writes files under .agents/council. It also sets env vars like COUNCIL_CLAUDE_MODEL, enabling CLI-driven background processes and external-model invocations.

by boshu2·v4b98124·88.5 KB·832 installs
Scanned from main at 4b98124 · Transparency log ↗
$ vett add boshu2/agentops/council
Review security findings before installing

/council — Multi-Model Consensus Council

Spawn parallel judges with different perspectives, consolidate into consensus. Works for any task — validation, research, brainstorming.

Quick Start

/council --quick validate recent                               # fast inline check
/council validate this plan                                    # validation (2 agents)
/council brainstorm caching approaches                         # brainstorm
/council validate the implementation                          # validation (critique triggers map here)
/council research kubernetes upgrade strategies                # research
/council research the CI/CD pipeline bottlenecks               # research (analyze triggers map here)
/council --preset=security-audit validate the auth system      # preset personas
/council --deep --explorers=3 research upgrade automation      # deep + explorers
/council --debate validate the auth system                # adversarial 2-round review
/council --deep --debate validate the migration plan      # thorough + debate
/council                                                       # infers from context

Council works independently — no RPI workflow, no ratchet chain, no ao CLI required. Zero setup beyond initial install.

Modes

| Mode | Agents | Execution Backend | Use Case |
|---|---|---|---|
| --quick | 0 (inline) | Self | Fast single-agent check, no spawning |
| default | 2 | Runtime-native (Codex sub-agents preferred; Claude teams fallback) | Independent judges (no perspective labels) |
| --deep | 3 | Runtime-native | Thorough review |
| --mixed | 3+3 | Runtime-native + Codex CLI | Cross-vendor consensus |
| --debate | 2+ | Runtime-native | Adversarial refinement (2 rounds) |

/council --quick validate recent   # inline single-agent check, no spawning
/council recent                    # 2 runtime-native judges
/council --deep recent             # 3 runtime-native judges
/council --mixed recent            # runtime-native + Codex CLI

Spawn Backend (MANDATORY)

Council requires a runtime that can spawn parallel subagents and (for --debate) send messages between agents. Use whatever multi-agent primitives your runtime provides. If no multi-agent capability is detected, fall back to --quick (inline single-agent).

Required capabilities:

  • Spawn subagent — create a parallel agent with a prompt (required for all modes except --quick)
  • Agent messaging — send a message to a specific agent (required for --debate)

Skills describe WHAT to do, not WHICH tool to call. See skills/shared/SKILL.md for the capability contract.

After detecting your backend, read the matching reference for concrete spawn/wait/message/cleanup examples:

  • Claude Native Teams → skills/shared/references/backend-claude-teams.md
  • Codex Sub-Agents / CLI → skills/shared/references/backend-codex-subagents.md
  • Background Tasks → skills/shared/references/backend-background-tasks.md
  • Inline (--quick) → skills/shared/references/backend-inline.md

See also references/cli-spawning.md for council-specific spawning flow (phases, timeouts, output collection).

When to Use --debate

Use --debate for high-stakes or ambiguous reviews where judges are likely to disagree:

  • Security audits, architecture decisions, migration plans
  • Reviews where multiple valid perspectives exist
  • Cases where a missed finding has real consequences

Skip --debate for routine validation where consensus is expected. Debate adds R2 latency (judges stay alive and process a second round via backend messaging).

Incompatibilities:

  • --quick and --debate cannot be combined. --quick runs inline with no spawning; --debate requires multi-agent rounds. If both are passed, exit with error: "Error: --quick and --debate are incompatible."
  • --debate is only supported with validate mode. Brainstorm and research do not produce PASS/WARN/FAIL verdicts. If combined, exit with error: "Error: --debate is only supported with validate mode."
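
The two incompatibility rules above can be sketched as a small guard. This is a minimal illustration, not the skill's actual code: the function name `check_flags` and the use of `ValueError` are assumptions; only the two error strings come from the text.

```python
def check_flags(mode: str, quick: bool = False, debate: bool = False) -> None:
    """Raise ValueError for the two flag combinations the council rejects.

    `check_flags` is a hypothetical helper; only the messages are from the docs.
    """
    if quick and debate:
        raise ValueError("Error: --quick and --debate are incompatible.")
    if debate and mode != "validate":
        raise ValueError("Error: --debate is only supported with validate mode.")


check_flags("validate", debate=True)  # accepted: debate with validate mode
```

Checking `--quick` plus `--debate` first means that combination always reports the same error, whatever the task type.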

Task Types

| Type | Trigger Words | Perspective Focus |
|---|---|---|
| validate | validate, check, review, assess, critique, feedback, improve | Is this correct? What's wrong? What could be better? |
| brainstorm | brainstorm, explore, options, approaches | What are the alternatives? Pros/cons? |
| research | research, investigate, deep dive, explore deeply, analyze, examine, evaluate, compare | What can we discover? What are the properties, trade-offs, and structure? |

Natural language works — the skill infers task type from your prompt.
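
One way to picture trigger-word inference is a keyword match over the prompt. The sketch below is only an approximation: the real skill infers the type with a language model, and the `infer_task_type` name, the "earliest match wins" rule, and the `validate` default are all assumptions made for illustration.

```python
import re

# Trigger words copied from the Task Types table.
TRIGGERS = {
    "validate": ["validate", "check", "review", "assess", "critique",
                 "feedback", "improve"],
    "brainstorm": ["brainstorm", "explore", "options", "approaches"],
    "research": ["research", "investigate", "deep dive", "explore deeply",
                 "analyze", "examine", "evaluate", "compare"],
}


def infer_task_type(prompt: str, default: str = "validate") -> str:
    """Pick the task type whose trigger appears first in the prompt."""
    text = prompt.lower()
    hits = []
    for task_type, phrases in TRIGGERS.items():
        for phrase in phrases:
            # Whole-word match, so "--explorers" does not count as "explore".
            match = re.search(r"\b" + re.escape(phrase) + r"\b", text)
            if match:
                # Sort key: earliest position, then longest phrase, so
                # "explore deeply" (research) beats "explore" (brainstorm).
                hits.append((match.start(), -len(phrase), task_type))
    return min(hits)[2] if hits else default
```

The longest-phrase tie-break matters because `explore` and `explore deeply` belong to different task types in the table.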


Architecture

Context Budget Rule (CRITICAL)

Judges write ALL analysis to output files. Messages to the lead contain ONLY a minimal completion signal: {"type":"verdict","verdict":"...","confidence":"...","file":"..."}. The lead reads output files during consolidation. This prevents N judges from exploding the lead's context window with N full reports via SendMessage.

Consolidation runs inline as the lead — no separate chairman agent. The lead reads each judge's output file sequentially with the Read tool and synthesizes.
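
The context budget rule can be sketched from the judge's side: write the full analysis to disk, then return only the small signal. The helper name `finish_judge` and its parameters are hypothetical; the signal keys match the format quoted above.

```python
import json
from pathlib import Path


def finish_judge(report_md: str, verdict: str, confidence: str,
                 out_file: Path) -> str:
    """Persist the full report and return only the minimal completion signal."""
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(report_md, encoding="utf-8")
    signal = {
        "type": "verdict",
        "verdict": verdict,
        "confidence": confidence,
        "file": str(out_file),  # the lead reads this file during consolidation
    }
    return json.dumps(signal)
```

However long the report is, the message to the lead stays a single short JSON object.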

Execution Flow

┌─────────────────────────────────────────────────────────────────┐
│  Phase 1: Build Packet (JSON)                                   │
│  - Task type (validate/brainstorm/research)                      │
│  - Target description                                           │
│  - Context (files, diffs, prior decisions)                      │
│  - Perspectives to assign                                       │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  Phase 1a: Select spawn backend                                  │
│  codex_subagents | claude_teams | background_fallback            │
│  Team lead = spawner (this agent)                                │
└─────────────────────────────────────────────────────────────────┘
                              │
            ┌─────────────────┴─────────────────┐
            ▼                                   ▼
┌───────────────────────┐           ┌───────────────────────┐
│  RUNTIME-NATIVE JUDGES│           │     CODEX AGENTS      │
│ (spawn_agent or teams)│           │  (Bash tool, parallel)│
│                       │           │  Agent 1 (independent │
│  Agent 1 (independent │           │    or with preset)    │
│    or with preset)    │           │  Agent 2              │
│  Agent 2              │           │  Agent 3              │
│  Agent 3              │           │  (--mixed only)       │
│  (--deep/--mixed only)│           │                       │
│                       │           │  Output: JSON + MD    │
│  Write files, then    │           │  Files: .agents/      │
│ wait()/SendMessage to │           │    council/codex-*    │
│ lead                  │           │                       │
│  Files: .agents/      │           └───────────────────────┘
│    council/claude-*   │                       │
└───────────────────────┘                       │
            │                                   │
            └─────────────────┬─────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  Phase 2: Consolidation (Team Lead — inline, no extra agent)    │
│  - Receive MINIMAL completion signals (verdict + file path)     │
│  - Read each judge's output file with Read tool                 │
│  - If schema_version is missing from a judge's output, treat    │
│    as version 0 (backward compatibility)                        │
│  - Compute consensus verdict                                    │
│  - Identify shared findings                                     │
│  - Surface disagreements with attribution                       │
│  - Generate Markdown report for human                           │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  Phase 3: Cleanup                                               │
│  - Cleanup backend resources (close_agent / TeamDelete / none)  │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  Output: Markdown Council Report                                │
│  - Consensus: PASS/WARN/FAIL                                    │
│  - Shared findings                                              │
│  - Disagreements (if any)                                       │
│  - Recommendations                                              │
└─────────────────────────────────────────────────────────────────┘

Graceful Degradation

| Failure | Behavior |
|---|---|
| 1 of N agents times out | Proceed with N-1, note in report |
| All Codex CLI agents fail | Proceed with runtime-native judges only, note degradation |
| All agents fail | Return error, suggest retry |
| Codex CLI not installed | Skip Codex CLI judges, continue with runtime judges only (warn user) |
| No multi-agent capability | Fall back to --quick (inline single-agent review) |
| No agent messaging | --debate unavailable, single-round review only |
| Output dir missing | Create .agents/council/ automatically |

Timeout: 120s per agent (configurable via --timeout=N in seconds).

Minimum quorum: At least 1 agent must respond for a valid council. If 0 agents respond, return error.
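
The quorum rule reduces to a small check once the timeout has passed. This is a sketch under assumptions: `check_quorum` and the wording of its messages are hypothetical; only the "at least 1 responder, otherwise error" rule and the "proceed with N-1 and note it" behavior come from the text.

```python
def check_quorum(spawned: int, responded: int) -> str:
    """Return a note for the report, or raise if no judge responded."""
    if responded == 0:
        # All agents failed: the council is invalid, so suggest a retry.
        raise RuntimeError("No judges responded; retry the council.")
    if responded < spawned:
        return f"Degraded: {responded} of {spawned} judges responded."
    return f"All {spawned} judges responded."
```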

Pre-Flight Checks

  1. Multi-agent capability: Detect whether runtime supports spawning parallel subagents. If not, degrade to --quick.
  2. Agent messaging: Detect whether runtime supports agent-to-agent messaging. If not, disable --debate.
  3. Codex CLI judges (--mixed only): Run which codex to confirm the CLI is on PATH, test model availability, and test --output-schema support. Downgrade mixed mode when unavailable.
  4. Agent count: Verify judges * (1 + explorers) <= MAX_AGENTS (12)
  5. Output dir: mkdir -p .agents/council
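
Checks 3 to 5 lend themselves to a short sketch. Checks 1 and 2 depend on the runtime and are left out, and check 3 is reduced here to a PATH lookup (the model and --output-schema probes are omitted). The `preflight` function and its return keys are assumptions for illustration.

```python
import shutil
from pathlib import Path

MAX_AGENTS = 12  # cap stated in the pre-flight checklist


def preflight(judges: int, explorers: int, mixed: bool,
              out_dir: str = ".agents/council") -> dict:
    """Run checks 3-5 and report what the council can actually use."""
    total = judges * (1 + explorers)                  # check 4: agent count
    codex_ok = shutil.which("codex") is not None      # check 3: CLI on PATH?
    Path(out_dir).mkdir(parents=True, exist_ok=True)  # check 5: mkdir -p
    return {
        "total_agents": total,
        "within_budget": total <= MAX_AGENTS,
        "mixed": mixed and codex_ok,  # downgrade mixed mode without Codex
    }
```

For example, 3 judges with 3 explorers each gives 3 * (1 + 3) = 12 agents, which sits exactly at the cap.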

Quick Mode (--quick)

Single-agent inline validation. No subprocess spawning, no Task tool, no Codex. The current agent performs a structured self-review using the same output schema as a full council.

When to use: Routine checks, mid-implementation sanity checks, pre-commit quick scan.

Execution: Gather context (files, diffs) -> perform structured self-review inline using the council output_schema (verdict, confidence, findings, recommendation) -> write report to .agents/council/YYYY-MM-DD-quick-<target>.md labeled as Mode: quick (single-agent).

Limitations: No cross-perspective disagreement, no cross-vendor insights, lower confidence ceiling. Not suitable for security audits or architecture decisions.


Packet Format (JSON)

This is the packet sent to each agent. File contents are included inline: agents receive the actual code or plan text in the packet, not just paths, so both Claude and Codex agents can analyze the target without file access.

If .agents/ao/environment.json exists, include it in the context packet so judges can reason about available tools and environment state.

{
  "council_packet": {
    "version": "1.0",
    "mode": "validate | brainstorm | research",
    "target": "Implementation of user authentication system",
    "context": {
      "files": [
        {
          "path": "src/auth/jwt.py",
          "content": "<file contents inlined here>"
        },
        {
          "path": "src/auth/middleware.py",
          "content": "<file contents inlined here>"
        }
      ],
      "diff": "git diff output if applicable",
      "spec": {
        "source": "bead na-0042 | plan doc | none",
        "content": "The spec/bead description text (optional — included when wrapper provides it)"
      },
      "prior_decisions": [
        "Using JWT, not sessions",
        "Refresh tokens required"
      ]
    },
    "perspective": "skeptic (only when --preset or --perspectives used)",
    "perspective_description": "What could go wrong? (only when --preset or --perspectives used)",
    "output_schema": {
      "verdict": "PASS | WARN | FAIL",
      "confidence": "HIGH | MEDIUM | LOW",
      "key_insight": "Single sentence summary",
      "findings": [
        {
          "severity": "critical | significant | minor",
          "category": "security | architecture | performance | style",
          "description": "What was found",
          "location": "file:line if applicable",
          "recommendation": "How to address",
          "fix": "Specific action to resolve this finding",
          "why": "Root cause or rationale",
          "ref": "File path, spec anchor, or doc reference"
        }
      ],
      "recommendation": "Concrete next step",
      "schema_version": 2
    }
  }
}
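
As a rough illustration of "file contents are included inline", a packet builder might read each file and embed its text. The `build_packet` helper is hypothetical, and this sketch fills only a subset of the fields shown above (no spec, perspective, or output_schema).

```python
import json
from pathlib import Path

MODES = {"validate", "brainstorm", "research"}


def build_packet(mode, target, paths, diff="", prior_decisions=None):
    """Build a partial council packet with file contents inlined."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    files = [
        # Inline the text so judges never need file access themselves.
        {"path": str(p), "content": Path(p).read_text(encoding="utf-8")}
        for p in paths
    ]
    return {
        "council_packet": {
            "version": "1.0",
            "mode": mode,
            "target": target,
            "context": {
                "files": files,
                "diff": diff,
                "prior_decisions": list(prior_decisions or []),
            },
        }
    }
```

The result is plain dicts, lists, and strings, so `json.dumps(packet)` yields the JSON sent to each judge.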

Perspectives

Perspectives & Presets: Use Read tool on skills/council/references/personas.md for persona definitions, preset configurations, and custom perspective details.

Auto-Escalation: When --preset or --perspectives specifies more perspectives than the current judge count, automatically escalate judge count to match. The --count flag overrides auto-escalation.


Named Perspectives

Named perspectives assign each judge a specific viewpoint. Pass --perspectives="a,b,c" for free-form names, or --perspectives-file=<path> for YAML with focus descriptions:

/council --perspectives="security-auditor,performance-critic,simplicity-advocate" validate src/auth/
/council --perspectives-file=.agents/perspectives/api-review.yaml validate src/api/

YAML format for --perspectives-file:

perspectives:
  - name: security-auditor
    focus: Find security vulnerabilities and trust boundary violations
  - name: performance-critic
    focus: Identify performance bottlenecks and scaling risks

Flag priority: --perspectives/--perspectives-file override --preset perspectives. --count always overrides judge count. Without --count, judge count auto-escalates to match perspective count.
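
The judge-count rule above fits in a few lines. The function name `resolve_judge_count` and its parameters are hypothetical; the logic follows the stated priority (--count wins, otherwise escalate to the number of perspectives).

```python
from typing import Optional, Sequence


def resolve_judge_count(mode_default: int, perspectives: Sequence[str],
                        count: Optional[int] = None) -> int:
    """Apply --count if given, else auto-escalate to the perspective count."""
    if count is not None:
        return count  # --count always overrides auto-escalation
    return max(mode_default, len(perspectives))
```

With the default of 2 judges and a preset that defines 4 perspectives, this returns 4, matching the doc-review example later in this document.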

See references/personas.md for all built-in presets and their perspective definitions.


Explorer Sub-Agents

Explorer Details: Use Read tool on skills/council/references/explorers.md for explorer architecture, prompts, sub-question generation, and timeout configuration.

Summary: Judges can spawn explorer sub-agents (--explorers=N, max 5) for parallel deep-dive research. Total agents = judges * (1 + explorers), capped at MAX_AGENTS=12.


Debate Phase (--debate)

Debate Protocol: Use Read tool on skills/council/references/debate-protocol.md for full debate execution flow, R1-to-R2 verdict injection, timeout handling, and cost analysis.

Summary: Two-round adversarial review. R1 produces independent verdicts. R2 sends other judges' verdicts via backend messaging (send_input or SendMessage) for steel-manning and revision. Only supported with validate mode.


Agent Prompts

Agent Prompts: Use Read tool on skills/council/references/agent-prompts.md for judge prompts (default and perspective-based), consolidation prompt, and debate R2 message template.


Consensus Rules

| Condition | Verdict |
|---|---|
| All PASS | PASS |
| Any FAIL | FAIL |
| Mixed PASS/WARN | WARN |
| All WARN | WARN |

Disagreement handling:

  • If Claude says PASS and Codex says FAIL → DISAGREE (surface both)
  • Severity-weighted: Security FAIL outweighs style WARN

DISAGREE resolution: When vendors disagree, the spawner presents both positions with reasoning and defers to the user. No automatic tie-breaking — cross-vendor disagreement is a signal worth human attention.
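
The table and the cross-vendor rule can be combined in a short sketch. Two points are assumptions, not stated in the text: the DISAGREE check runs before the "Any FAIL" row, and a vendor's position is obtained by applying the table to that vendor's judges alone. Severity weighting is not modeled, and the function names are hypothetical.

```python
from typing import Dict, List


def table_verdict(verdicts: List[str]) -> str:
    """Apply the consensus table to a flat list of PASS/WARN/FAIL verdicts."""
    if "FAIL" in verdicts:
        return "FAIL"
    if all(v == "PASS" for v in verdicts):
        return "PASS"
    return "WARN"  # mixed PASS/WARN or all WARN


def consensus(by_vendor: Dict[str, List[str]]) -> str:
    """Return PASS, WARN, FAIL, or DISAGREE for verdicts grouped by vendor."""
    per_vendor = {v: table_verdict(vs) for v, vs in by_vendor.items() if vs}
    if not per_vendor:
        raise ValueError("no verdicts: quorum not met")
    outcomes = set(per_vendor.values())
    if "PASS" in outcomes and "FAIL" in outcomes:
        return "DISAGREE"  # surface both positions and defer to the user
    return table_verdict([v for vs in by_vendor.values() for v in vs])
```

A single-vendor council can never produce DISAGREE under this reading, since the rule is defined in terms of Claude versus Codex.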


Output Format

Report Templates: Use Read tool on skills/council/references/output-format.md for full report templates (validate, brainstorm, research) and debate report additions (verdict shifts, convergence detection).

All reports write to .agents/council/YYYY-MM-DD-<type>-<target>.md.
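
A sketch of how the report path could be assembled. The date, type, and target layout come from the pattern above; the slug rule (lowercase, non-alphanumerics collapsed to hyphens) and the `report_path` name are assumptions.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional


def report_path(task_type: str, target: str,
                today: Optional[date] = None) -> Path:
    """Return .agents/council/YYYY-MM-DD-<type>-<target>.md for a report."""
    day = (today or date.today()).isoformat()  # YYYY-MM-DD
    # Assumed slug rule: lowercase, runs of other characters become "-".
    slug = re.sub(r"[^a-z0-9]+", "-", target.lower()).strip("-")
    return Path(".agents/council") / f"{day}-{task_type}-{slug}.md"
```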


Configuration

Partial Completion

Minimum quorum: 1 agent. Recommended: 80% of judges. On timeout, proceed with the remaining judges and note it in the report. On user cancellation, shut down all judges and generate a partial report with an INCOMPLETE marker.

Environment Variables

| Variable | Default | Description |
|---|---|---|
| COUNCIL_TIMEOUT | 120 | Agent timeout in seconds |
| COUNCIL_CODEX_MODEL | gpt-5.3-codex | Default Codex model for --mixed |
| COUNCIL_CLAUDE_MODEL | opus | Claude model for judges |
| COUNCIL_EXPLORER_MODEL | sonnet | Model for explorer sub-agents |
| COUNCIL_EXPLORER_TIMEOUT | 60 | Explorer timeout in seconds |
| COUNCIL_R2_TIMEOUT | 90 | Maximum wait time for R2 debate completion after sending debate messages. Shorter than R1 since judges already have context. |
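
Reading these variables with their documented defaults might look like the following. The `load_config` helper is hypothetical; the names and default values are the ones listed in the table.

```python
import os
from typing import Mapping, Optional

DEFAULTS = {
    "COUNCIL_TIMEOUT": "120",
    "COUNCIL_CODEX_MODEL": "gpt-5.3-codex",
    "COUNCIL_CLAUDE_MODEL": "opus",
    "COUNCIL_EXPLORER_MODEL": "sonnet",
    "COUNCIL_EXPLORER_TIMEOUT": "60",
    "COUNCIL_R2_TIMEOUT": "90",
}
TIMEOUT_KEYS = ("COUNCIL_TIMEOUT", "COUNCIL_EXPLORER_TIMEOUT",
                "COUNCIL_R2_TIMEOUT")


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Merge environment overrides onto the documented defaults."""
    env = os.environ if env is None else env
    cfg = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    for key in TIMEOUT_KEYS:
        cfg[key] = int(cfg[key])  # timeouts are whole seconds
    return cfg
```

Passing a plain dict instead of `os.environ` keeps the function easy to test, for example `load_config({"COUNCIL_TIMEOUT": "300"})`.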

Flags

| Flag | Description |
|---|---|
| --deep | 3 Claude agents instead of 2 |
| --mixed | Add 3 Codex agents |
| --debate | Enable adversarial debate round (2 rounds via backend messaging, same agents). Incompatible with --quick. |
| --timeout=N | Override timeout in seconds (default: 120) |
| --perspectives="a,b,c" | Custom perspective names (each name sets the judge's system prompt to adopt that viewpoint) |
| --perspectives-file=<path> | Load named perspectives from a YAML file (see Named Perspectives) |
| --preset=<name> | Built-in persona preset (security-audit, architecture, research, ops, code-review, plan-review, doc-review, retrospective, product, developer-experience) |
| --count=N | Override agent count per vendor (e.g., --count=4 = 4 Claude, or 4+4 with --mixed). Subject to MAX_AGENTS=12 cap. |
| --explorers=N | Explorer sub-agents per judge (default: 0, max: 5). Max effective value depends on judge count. Total agents capped at 12. |
| --explorer-model=M | Override explorer model (default: sonnet) |
| --technique=<name> | Brainstorm technique (scamper, six-hats, reverse). Case-insensitive. Only applicable to brainstorm mode; error if combined with validate/research. If omitted, the brainstorm is unstructured (current behavior). See references/brainstorm-techniques.md. |
| --profile=<name> | Model quality profile (thorough, balanced, fast). Error if unrecognized name. Overridden by COUNCIL_CLAUDE_MODEL env var (highest priority), then by explicit --count/--deep/--mixed. See references/model-profiles.md. |

CLI Spawning Commands

CLI Spawning: Use Read tool on skills/council/references/cli-spawning.md for team setup, Claude/Codex agent spawning, parallel execution, debate R2 commands, cleanup, and model selection.


Examples

/council validate recent                                        # 2 judges, recent commits
/council --deep --preset=architecture research the auth system  # 3 judges with architecture personas
/council --mixed validate this plan                             # 3 Claude + 3 Codex
/council --deep --explorers=3 research upgrade patterns         # 12 agents (3 judges x 4)
/council --preset=security-audit --deep validate the API        # attacker, defender, compliance
/council --preset=doc-review validate README.md                  # 4 doc judges with named perspectives
/council brainstorm caching strategies for the API              # 2 judges explore options
/council --technique=scamper brainstorm API improvements               # structured SCAMPER brainstorm
/council --technique=six-hats brainstorm migration strategy            # parallel perspectives brainstorm
/council --profile=thorough validate the security architecture       # opus, 3 judges, 120s timeout
/council --profile=fast validate recent                               # haiku, 2 judges, 60s timeout
/council research Redis vs Memcached for session storage        # 2 judges assess trade-offs
/council validate the implementation plan in PLAN.md            # structured plan feedback
/council --preset=doc-review validate docs/ARCHITECTURE.md             # 4 doc review judges
/council --perspectives="security-auditor,perf-critic" validate src/   # named perspectives
/council --perspectives-file=.agents/perspectives/custom.yaml validate # perspectives from file

Fast Single-Agent Validation

User says: /council --quick validate recent

What happens:

  1. Agent gathers context (recent diffs, files) inline without spawning
  2. Agent performs structured self-review using council output schema
  3. Report written to .agents/council/YYYY-MM-DD-quick-<target>.md labeled Mode: quick (single-agent)

Result: Fast sanity check for routine validation (no cross-perspective insights or debate).

Adversarial Debate Review

User says: /council --debate validate the auth system

What happens:

  1. Agent spawns 2 judges (runtime-native backend) with independent perspectives
  2. R1: Judges assess independently, write verdicts to .agents/council/
  3. R2: Team lead sends other judges' verdicts via backend messaging
  4. Judges revise positions based on cross-perspective evidence
  5. Consolidation: Team lead computes consensus with convergence detection

Result: Two-round review with steel-manning and revision, useful for high-stakes decisions.

Cross-Vendor Consensus with Explorers

User says: /council --mixed --explorers=2 research Kubernetes upgrade strategies

What happens:

  1. Agent spawns 3 Claude judges + 3 Codex judges (6 total)
  2. Each judge spawns 2 explorer sub-agents (6 x 3 = 18 total agents, exceeds MAX_AGENTS)
  3. Agent auto-scales to 2 judges per vendor (4 x 3 = 12 agents at limit)
  4. Explorers perform parallel deep-dives, return sub-findings to judges
  5. Judges consolidate explorer findings with own research

Result: Cross-vendor research with deep exploration, capped at 12 total agents.
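
The arithmetic in this walkthrough can be checked with a small sketch. The scaling strategy (drop one judge per vendor at a time until the total fits, never below 1) is an assumption that happens to reproduce the numbers above; `scale_judges` is a hypothetical name.

```python
MAX_AGENTS = 12  # total cap on judges plus explorers


def scale_judges(judges_per_vendor: int, vendors: int, explorers: int) -> int:
    """Reduce judges per vendor until judges * (1 + explorers) fits the cap."""
    def total(j: int) -> int:
        return vendors * j * (1 + explorers)

    while judges_per_vendor > 1 and total(judges_per_vendor) > MAX_AGENTS:
        judges_per_vendor -= 1
    return judges_per_vendor
```

For `--mixed --explorers=2`, 3 judges per vendor would need 2 * 3 * 3 = 18 agents, so the sketch settles on 2 per vendor (2 * 2 * 3 = 12).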


Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| "Error: --quick and --debate are incompatible" | Both flags passed together | Use --quick for fast inline check OR --debate for multi-round review, not both |
| "Error: --debate is only supported with validate mode" | Debate flag used with brainstorm/research | Remove --debate or switch to validate mode. Brainstorm and research produce no PASS/WARN/FAIL verdicts |
| Council spawns fewer agents than expected | judges x (1 + explorers) exceeds MAX_AGENTS (12) | Agent auto-scales judge count. Check report header for actual judge count. Reduce --explorers or use --count to manually set judges |
| Codex judges skipped in --mixed mode | Codex CLI not on PATH or gpt-5.3-codex unavailable | Install Codex CLI (brew install codex) or use ChatGPT API account. Otherwise council falls back to runtime-native judges only. |
| No output files in .agents/council/ | Permission error or disk full | Check directory permissions with ls -ld .agents/council/. Council auto-creates missing dirs. |
| Agent timeout after 120s | Slow file reads or network issues | Increase timeout with --timeout=300 or check COUNCIL_TIMEOUT env var. Default: 120s. |

Migration from /judge

/council replaces /judge. Migration:

| Old | New |
|---|---|
| /judge recent | /council validate recent |
| /judge 2 opus | /council recent (default) |
| /judge 3 opus | /council --deep recent |

The /judge skill is deprecated. Use /council.


Multi-Agent Architecture

Council uses whatever multi-agent primitives your runtime provides. Each judge is a parallel subagent that writes output to a file and sends a minimal completion signal to the lead.

Deliberation Protocol

The --debate flag implements the deliberation protocol pattern:

Independent assessment → evidence exchange → position revision → convergence analysis

  • R1: Spawn judges as parallel subagents. Each assesses independently, writes verdict to file, signals completion.
  • R2: Lead sends other judges' verdict summaries to each judge via agent messaging. Judges revise and write R2 files.
  • Consolidation: Lead reads all output files, computes consensus.
  • Cleanup: Shut down judges via runtime's cleanup mechanism.

Communication Rules

  • Judges → lead only. Judges never message each other directly. This prevents anchoring.
  • Lead → judges. Only the lead sends follow-ups (for debate R2).
  • No shared task mutation by judges. Lead manages coordination state.
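
The R2 fan-out implied by these rules can be sketched as follows: the lead builds one message per judge containing every other judge's verdict summary, so judges never contact each other directly. `build_r2_messages` and the summary shape are assumptions; the actual R2 template lives in the debate protocol reference.

```python
from typing import Dict, List


def build_r2_messages(r1: Dict[str, dict]) -> Dict[str, List[dict]]:
    """Map each judge to the R1 verdict summaries of all *other* judges."""
    return {
        judge: [
            dict(summary, judge=other)  # tag each summary with its author
            for other, summary in r1.items()
            if other != judge           # a judge never receives its own verdict
        ]
        for judge in r1
    }
```

The lead would then deliver each list through the backend's messaging primitive (send_input or SendMessage).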

Ralph Wiggum Compliance

Council maintains fresh-context isolation (Ralph Wiggum pattern) with one documented exception:

--debate reuses judge context across R1 and R2. This is intentional. Judges persist within a single atomic council invocation — they do NOT persist across separate council calls. The rationale:

  • Judges benefit from their own R1 analytical context (reasoning chain, not just the verdict JSON) when evaluating other judges' positions in R2
  • Re-spawning with only the verdict summary (~200 tokens) would lose the judge's working memory of WHY they reached their verdict
  • The exception is bounded: max 2 rounds, within one invocation, with explicit cleanup

Without --debate, council is fully Ralph-compliant: each judge is a fresh spawn, executes once, writes output, and terminates.

Degradation

If no multi-agent capability is detected, council falls back to --quick (inline single-agent review). If agent messaging is unavailable, --debate degrades to single-round review with a note in the report.

Judge Naming

Convention for the council (team) name: council-YYYYMMDD-<target> (e.g., council-20260206-auth-system).

Judge names: judge-{N} for independent judges (e.g., judge-1, judge-2), or judge-{perspective} when using presets/perspectives (e.g., judge-error-paths, judge-feasibility). Use the same logical names across both Codex and Claude backends.
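
Both naming conventions are simple string templates. The helper names below are hypothetical, and the sketch assumes the target and perspective names are already hyphenated slugs.

```python
from datetime import date
from typing import List, Sequence


def council_name(target: str, day: date) -> str:
    """Return council-YYYYMMDD-<target>."""
    return f"council-{day.strftime('%Y%m%d')}-{target}"


def judge_names(count: int, perspectives: Sequence[str] = ()) -> List[str]:
    """Return judge-{perspective} names if given, else judge-1..judge-N."""
    if perspectives:
        return [f"judge-{p}" for p in perspectives]
    return [f"judge-{n}" for n in range(1, count + 1)]
```

Generating names from one function makes it easier to keep them identical across the Codex and Claude backends.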


See Also

  • skills/vibe/SKILL.md — Complexity + council for code validation (uses --preset=code-review when spec found)
  • skills/pre-mortem/SKILL.md — Plan validation (uses --preset=plan-review, always 3 judges)
  • skills/post-mortem/SKILL.md — Work wrap-up (uses --preset=retrospective, always 3 judges + retro)
  • skills/swarm/SKILL.md — Multi-agent orchestration
  • skills/standards/SKILL.md — Language-specific coding standards
  • skills/research/SKILL.md — Codebase exploration (complementary to council research mode)