wandb

✓ Verified · Scanned 2/17/2026

This skill monitors and analyzes Weights & Biases training runs and provides CLI scripts such as scripts/characterize_run.py, scripts/watch_runs.py, and scripts/compare_runs.py. It makes network calls via wandb.Api(), instructs users to authenticate with WANDB_API_KEY, and includes shell examples such as ~/clawd/venv/bin/python3 ~/clawd/skills/wandb/scripts/characterize_run.py.

from clawhub.ai·va2a2c1a·39.9 KB·0 installs

Scanned from 1.0.0 at a2a2c1a

$ vett add clawhub.ai/chrisvoncsefalvay/wandb

Weights & Biases

Monitor, analyze, and compare W&B training runs.

Setup

wandb login
# Or set WANDB_API_KEY in environment

Scripts

Characterize a Run (Full Health Analysis)

~/clawd/venv/bin/python3 ~/clawd/skills/wandb/scripts/characterize_run.py ENTITY/PROJECT/RUN_ID

Analyzes:

Loss curve trend (start → current, % change, direction)
Gradient norm health (exploding/vanishing detection)
Eval metrics (if present)
Stall detection (heartbeat age)
Progress & ETA estimate
Config highlights
Overall health verdict

Options: --json for machine-readable output.
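The loss-trend item above (start → current, % change, direction) can be illustrated with a minimal sketch. This is not the code from characterize_run.py; the function name loss_trend and the 1% "flat" tolerance are assumptions made for illustration.

```python
def loss_trend(losses, flat_tolerance_pct=1.0):
    """Summarize a loss series as start, current, percent change, and direction.

    Hypothetical helper: the name and the 1% flat tolerance are assumptions,
    not values taken from characterize_run.py.
    """
    if not losses:
        raise ValueError("need at least one loss value")
    start, current = losses[0], losses[-1]
    # Percent change is relative to the starting loss; undefined when start is 0.
    pct_change = None if start == 0 else (current - start) / abs(start) * 100.0
    # Fall back to the raw difference (with no tolerance) when percent change is undefined.
    delta = pct_change if pct_change is not None else current - start
    tolerance = flat_tolerance_pct if pct_change is not None else 0.0
    if delta < -tolerance:
        direction = "decreasing"
    elif delta > tolerance:
        direction = "increasing"
    else:
        direction = "flat"
    return {
        "start": start,
        "current": current,
        "pct_change": pct_change,
        "direction": direction,
    }


print(loss_trend([2.0, 1.6, 1.2, 1.0]))
```

Comparing only the first and last points is sensitive to noise; averaging a window at each end would be a reasonable refinement.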

Watch All Running Jobs

~/clawd/venv/bin/python3 ~/clawd/skills/wandb/scripts/watch_runs.py ENTITY [--projects p1,p2]

Prints a quick health summary of all running jobs, plus runs that failed or completed within the lookback window (--hours, default 24). Ideal for morning briefings.

Options:

--projects p1,p2 — Specific projects to check
--all-projects — Check all projects
--hours N — Hours to look back for finished runs (default: 24)
--json — Machine-readable output

Compare Two Runs

~/clawd/venv/bin/python3 ~/clawd/skills/wandb/scripts/compare_runs.py ENTITY/PROJECT/RUN_A ENTITY/PROJECT/RUN_B

Side-by-side comparison:

Config differences (highlights important params)
Loss curves at same steps
Gradient norm comparison
Eval metrics
Performance (tokens/sec, steps/hour)
Winner verdict
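Two of the comparisons above, config differences and loss at the same steps, can be sketched roughly as follows. The helper names and the plain-dict inputs are assumptions for illustration; compare_runs.py may structure this differently.

```python
_MISSING = "<missing>"  # placeholder for a key present in only one config


def config_diff(config_a, config_b):
    """Return {key: (value_a, value_b)} for every key whose values differ."""
    diff = {}
    for key in sorted(set(config_a) | set(config_b)):
        a = config_a.get(key, _MISSING)
        b = config_b.get(key, _MISSING)
        if a != b:
            diff[key] = (a, b)
    return diff


def losses_at_common_steps(history_a, history_b):
    """Pair losses from two {step: loss} mappings at the steps both runs logged."""
    common = sorted(set(history_a) & set(history_b))
    return [(step, history_a[step], history_b[step]) for step in common]


# Illustrative values only, not real run data.
print(config_diff({"lr": 1e-3, "batch_size": 32}, {"lr": 3e-4, "batch_size": 32, "warmup": 100}))
print(losses_at_common_steps({100: 2.1, 200: 1.7, 300: 1.4}, {100: 2.0, 300: 1.5}))
```

Restricting the comparison to steps both runs logged avoids interpolating values neither run recorded.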

Python API Quick Reference

import wandb
api = wandb.Api()

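# Get runs (filters takes a MongoDB-style query)
runs = api.runs("entity/project", filters={"state": "running"})

# Get a single run by its path
run = api.run("entity/project/run_id")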

# Run properties
run.state      # running | finished | failed | crashed | canceled
run.name       # display name
run.id         # unique identifier
run.summary    # final/current metrics
run.config     # hyperparameters
run.heartbeat_at # last heartbeat timestamp, used for stall detection

# Get history (returns only rows in which all listed keys are present)
history = list(run.scan_history(keys=["train/loss", "train/grad_norm"]))

Metric Key Variations

Scripts handle these automatically:

Loss: train/loss, loss, train_loss, training_loss
Gradients: train/grad_norm, grad_norm, gradient_norm
Steps: train/global_step, global_step, step, _step
Eval: eval/loss, eval_loss, eval/accuracy, eval_acc
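One way to handle these variations is to try each alias in turn and take the first one present in a logged row. The sketch below assumes the listed order is also the precedence order, which the source does not state; METRIC_ALIASES and resolve_metric are hypothetical names. Eval keys are left out because they name different metrics (loss and accuracy) rather than aliases of one.

```python
# Alias lists copied from the table above; treating their order as precedence is an assumption.
METRIC_ALIASES = {
    "loss": ("train/loss", "loss", "train_loss", "training_loss"),
    "grad_norm": ("train/grad_norm", "grad_norm", "gradient_norm"),
    "step": ("train/global_step", "global_step", "step", "_step"),
}


def resolve_metric(row, metric):
    """Return (key, value) for the first alias of `metric` found in `row`, else None."""
    for key in METRIC_ALIASES[metric]:
        value = row.get(key)
        if value is not None:
            return key, value
    return None


# Illustrative history row, not real run data.
row = {"_step": 1200, "train_loss": 0.83, "grad_norm": 1.7}
print(resolve_metric(row, "loss"))
print(resolve_metric(row, "grad_norm"))
print(resolve_metric(row, "step"))
```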

Health Thresholds

Gradient norm > 10: Exploding (critical)
Gradient norm > 5: Spiky (warning)
Gradient norm < 0.0001: Vanishing (warning)
Heartbeat age > 30 min: Stalled (critical)
Heartbeat age > 10 min: Slow (warning)
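The thresholds above translate directly into two small classifiers. This is a minimal sketch using the listed cut-offs; the function names, the "ok" fallback label, and the choice of a timedelta for heartbeat age are assumptions, not the scripts' actual interface.

```python
from datetime import timedelta


def grad_norm_status(grad_norm):
    """Classify a gradient norm as (label, severity) using the thresholds above."""
    # Check the larger threshold first so values above 10 are not reported as spiky.
    if grad_norm > 10:
        return ("exploding", "critical")
    if grad_norm > 5:
        return ("spiky", "warning")
    if grad_norm < 0.0001:
        return ("vanishing", "warning")
    return ("ok", None)


def heartbeat_status(age):
    """Classify the time since the last heartbeat (a timedelta) as (label, severity)."""
    if age > timedelta(minutes=30):
        return ("stalled", "critical")
    if age > timedelta(minutes=10):
        return ("slow", "warning")
    return ("ok", None)


print(grad_norm_status(12.5))
print(grad_norm_status(0.00001))
print(heartbeat_status(timedelta(minutes=45)))
```

Both comparisons are strict, so a value exactly on a threshold (for example a gradient norm of 10) falls into the milder category.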

Integration Notes

For morning briefings, use watch_runs.py --json and parse the output.

For detailed analysis of a specific run, use characterize_run.py.

For A/B testing or hyperparameter comparisons, use compare_runs.py.
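For the --json workflow mentioned above, a caller might wrap the script invocation as in the sketch below. The JSON schema is not documented here, so the sketch only parses stdout and leaves interpretation to the caller. run_json is a hypothetical helper, and it assumes the script writes nothing but JSON to stdout.

```python
import json
import subprocess
import sys


def run_json(cmd, timeout=120):
    """Run a command and parse its stdout as JSON.

    Assumes the command prints only JSON on stdout; raises CalledProcessError
    on a non-zero exit and json.JSONDecodeError on malformed output.
    """
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    return json.loads(result.stdout)


# Stand-in command so the sketch runs without W&B credentials.
demo = run_json([sys.executable, "-c", "import json; print(json.dumps({'ok': True}))"])
print(demo)

# With the skill installed at the paths used in this README, the call might look like:
# import os
# briefing = run_json([
#     os.path.expanduser("~/clawd/venv/bin/python3"),
#     os.path.expanduser("~/clawd/skills/wandb/scripts/watch_runs.py"),
#     "ENTITY",
#     "--json",
# ])
```

Passing the command as a list avoids shell quoting issues, which is why the ~ paths are expanded explicitly rather than left to a shell.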