devops

Verified·Scanned 2/18/2026

Automate deployments, manage infrastructure, and build reliable CI/CD pipelines.

from clawhub.ai·v01336a0·4.0 KB·0 installs
Scanned from 1.0.0 at 01336a0
$ vett add clawhub.ai/ivangdavila/devops

DevOps Rules

CI/CD Pipelines

  • Fail fast: run linting and unit tests before expensive integration tests — saves time and compute
  • Cache dependencies between runs — npm install on every build wastes minutes
  • Pin action versions with a full commit SHA, not tags — a tag like actions/checkout@v3 can be moved to different code, while a SHA is immutable
  • Secrets in environment variables, never in code or logs — mask them in CI output
  • Parallel jobs for independent steps — test, lint, and build can run simultaneously
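
As an illustration of the secrets rule above, here is a minimal sketch of masking known secret values before a line reaches CI output. The function names and the DEPLOY_TOKEN variable are hypothetical; most CI systems provide built-in masking, which should be preferred where available.

```python
import os


def secrets_from_env(names: list[str]) -> list[str]:
    """Collect the values of the named environment variables that are set."""
    return [os.environ[name] for name in names if os.environ.get(name)]


def mask_secrets(line: str, secrets: list[str], placeholder: str = "***") -> str:
    """Replace every occurrence of a known secret value in a log line."""
    # Longest first, so a secret that contains another secret is masked whole.
    for value in sorted(secrets, key=len, reverse=True):
        if value:  # an empty string would match everywhere, so skip it
            line = line.replace(value, placeholder)
    return line


# Hypothetical usage inside a CI wrapper script:
# print(mask_secrets(raw_line, secrets_from_env(["DEPLOY_TOKEN"])))
```

This only masks exact matches, so encoded or truncated forms of a secret would still leak; it is a last line of defense, not a substitute for keeping secrets out of logs.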

Deployment Strategies

  • Blue-green: run new version alongside old, switch traffic atomically — instant rollback by switching back
  • Canary: route a percentage of traffic to the new version — catch issues before full rollout
  • Rolling: update instances incrementally — balance between speed and risk
  • Always have a rollback plan before deploying — know exactly how to revert
  • Deploy the same artifact to all environments — build once, promote through stages
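
The canary strategy above can be sketched as a deterministic percentage split. This is an assumption about one way to do it, not a description of any particular load balancer: each user ID is hashed into a stable bucket from 0 to 99, so the same user keeps seeing the same version while the percentage is raised. The names canary_bucket and pick_version are hypothetical.

```python
import hashlib


def canary_bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def pick_version(user_id: str, canary_percent: int) -> str:
    """Return "canary" for roughly canary_percent of users, else "stable"."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return "canary" if canary_bucket(user_id) < canary_percent else "stable"
```

Because the bucket depends only on the user ID, raising the percentage from 10 to 50 keeps every existing canary user on the canary, and setting it to 0 acts as the rollback.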

Infrastructure as Code

  • Version control all infrastructure — Terraform, Ansible, and CloudFormation definitions belong in git
  • Never apply changes without plan/diff review — terraform plan before apply
  • State files contain secrets — store remotely with encryption, never in git
  • Modules for reusable components — don't copy-paste infrastructure definitions
  • Separate environments with workspaces or directories — dev changes shouldn't affect prod
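
The plan-before-apply rule above could be enforced with a small wrapper like the following sketch. It assumes the terraform CLI is on PATH and relies on `terraform plan -detailed-exitcode`, which exits with 0 for no changes, 1 for an error, and 2 when changes are present. The plan file name, the approve callback, and the function name are hypothetical.

```python
import subprocess
from typing import Callable

PLAN_FILE = "tfplan"  # hypothetical file name for the saved plan


def plan_then_apply(approve: Callable[[], bool], run=subprocess.run) -> str:
    """Run terraform plan, and apply the saved plan only after approval."""
    plan = run(["terraform", "plan", "-detailed-exitcode", f"-out={PLAN_FILE}"])
    if plan.returncode == 0:
        return "no-changes"
    if plan.returncode != 2:
        raise RuntimeError("terraform plan failed")
    # Changes are pending: a human or a policy check must approve the diff.
    if not approve():
        return "rejected"
    # Applying the saved plan file guarantees that what was reviewed is what runs.
    run(["terraform", "apply", PLAN_FILE], check=True)
    return "applied"
```

Passing `run` as a parameter keeps the gate testable without a real Terraform installation.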

Containers

  • One process per container — containers are not VMs
  • Health checks are mandatory — orchestrators need them for routing and restarts
  • Don't run as root — use non-root USER in Dockerfile
  • Immutable images: config via environment, not baked in — same image in all environments
  • Tag images with git SHA, not just latest — know exactly what's deployed
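
To illustrate config via environment, here is a minimal sketch of an application reading all of its settings at startup, so the same image can run unchanged in every environment. The variable names DATABASE_URL, LOG_LEVEL, and GIT_SHA are assumptions chosen for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str
    image_sha: str  # the git SHA the image was built from, injected at deploy time

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        """Build settings from environment variables, failing fast if any are missing."""
        missing = [key for key in ("DATABASE_URL", "GIT_SHA") if not env.get(key)]
        if missing:
            raise RuntimeError("missing required environment variables: " + ", ".join(missing))
        return cls(
            database_url=env["DATABASE_URL"],
            log_level=env.get("LOG_LEVEL", "INFO"),
            image_sha=env["GIT_SHA"],
        )
```

Failing at startup on a missing variable pairs well with health checks: a misconfigured container never becomes ready, so the orchestrator does not route traffic to it.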

Secrets Management

  • Never store secrets in environment files committed to git — use vault, sealed secrets, or CI secret storage
  • Rotate secrets regularly — automation makes rotation painless
  • Different secrets per environment — dev leak shouldn't compromise prod
  • Audit secret access — know who accessed what and when
  • Keep secrets in memory rather than on disk when possible — temp files persist longer than expected
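
As a sketch of automating the rotation rule above, the following function flags secrets that have exceeded a maximum age. The 90-day default and the secret names are assumptions for illustration; the creation timestamps would come from whatever secret store is in use.

```python
from datetime import datetime, timedelta


def due_for_rotation(
    created_at: dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=90),  # hypothetical rotation policy
) -> list[str]:
    """Return the names of secrets whose age has reached max_age, sorted by name."""
    return sorted(name for name, created in created_at.items() if now - created >= max_age)
```

A scheduled job could run this check daily and trigger rotation or open a ticket for each name it returns.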

Monitoring & Alerting

  • Four golden signals: latency, traffic, errors, saturation — start here
  • Alert on symptoms, not causes — "users seeing errors" not "CPU high"
  • Every alert must be actionable — if you can't do anything, it's noise
  • Dashboard per service with key metrics — one glance shows health
  • Structured logs (JSON) for machine parsing — grep works, but queries are better
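
For the structured-logs rule above, a minimal sketch using only the Python standard library is shown below. The field names (time, level, logger, message) are an assumption; a real service would align them with whatever its log pipeline expects.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

With one JSON object per line, a log backend can filter on level or logger directly instead of relying on regular expressions over free text.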

Reliability

  • Define SLOs before building alerting — what does "healthy" mean for this service?
  • Error budgets: some failures are acceptable — a 99.9% availability SLO allows about 8.8 hours of downtime per year
  • Chaos engineering in staging — break things intentionally before prod breaks accidentally
  • Runbooks for common incidents — 3am is not the time to figure out recovery steps
  • Post-mortems without blame — focus on systems, not people
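
The error budget arithmetic above can be worked out directly. The sketch below assumes a 365-day year and a request-based budget; both function names are hypothetical.

```python
def allowed_downtime_hours(slo_percent: float, period_hours: float = 365 * 24) -> float:
    """Hours of downtime an availability SLO permits over the period."""
    return period_hours * (1 - slo_percent / 100)


def budget_remaining(slo_percent: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once it is exceeded)."""
    allowed_failures = total_requests * (1 - slo_percent / 100)
    if allowed_failures == 0:
        raise ValueError("a 100% SLO leaves no error budget")
    return 1 - failed_requests / allowed_failures


print(round(allowed_downtime_hours(99.9), 2))   # 8.76 hours per year
print(round(allowed_downtime_hours(99.99), 2))  # 0.88 hours per year
```

Each extra nine cuts the budget by a factor of ten, which is why the SLO should be agreed before alert thresholds are chosen.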

Common Mistakes

  • SSH into prod to fix things — all changes through automation, or you'll forget what you did
  • No staging environment — "works on my machine" doesn't mean works in prod
  • Ignoring flaky tests — they erode trust in CI; fix them or delete them
  • Manual steps in deployment — if it's not automated, it'll be done wrong eventually
  • Monitoring only happy paths — check error rates and edge cases too

Networking

  • Internal services don't need public IPs — use private subnets, expose only load balancers
  • TLS everywhere, including internal traffic — zero trust, even behind the firewall
  • DNS for service discovery — hardcoded IPs break when things move
  • Load balancer health checks separate from app health — LB needs fast response, app health can be thorough
  • Firewall default deny — explicitly allow what's needed, block everything else
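
The split between load balancer health checks and application health described above might look like the following sketch. It assumes two separate endpoints backed by these functions: a cheap liveness answer for the load balancer and a thorough readiness report that probes dependencies. The check names and the return shape are hypothetical.

```python
from typing import Callable, Mapping


def liveness() -> tuple[int, dict]:
    """Cheap check for the load balancer: the process is up and can answer."""
    return 200, {"status": "ok"}


def readiness(checks: Mapping[str, Callable[[], bool]]) -> tuple[int, dict]:
    """Thorough check: run every dependency probe and report each result."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A probe that raises counts as a failed dependency.
            results[name] = False
    healthy = all(results.values())
    status = 200 if healthy else 503
    return status, {"status": "ok" if healthy else "degraded", "checks": results}
```

Keeping dependency probes out of the liveness path means a slow database makes the service report as degraded without causing the load balancer check itself to time out.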