influxdb

Verified·Scanned 2/18/2026

Store and query time-series data with proper schema design and retention.

from clawhub.ai·vb58ffd1·3.9 KB·0 installs
Scanned from 1.0.0 at b58ffd1 · Transparency log ↗
$ vett add clawhub.ai/ivangdavila/influxdb

Version Differences

  • InfluxDB 2.x uses Flux query language, 1.x uses InfluxQL—syntax completely different
  • 2.x: buckets, organizations, tokens; 1.x: databases, retention policies, users
  • Don't mix documentation—check version before copying queries

Tags vs Fields (Critical)

  • Tags are indexed, fields are not—filter on tags, aggregate on fields
  • Tag values must be strings—numbers as tags work but waste index space
  • Fields support numbers, strings, booleans—store metrics as fields
  • Wrong choice kills query performance—can't change after data written

Cardinality Trap

  • High-cardinality tags destroy performance—unique user IDs as tags = disaster
  • Cardinality = unique combinations of tag values—grows multiplicatively
  • Check with SHOW CARDINALITY (1.x) or influx bucket inspect (2.x)
  • Rule of thumb: <100K series per measurement; millions = problems

Line Protocol

  • Format: measurement,tag1=v1,tag2=v2 field1=1,field2="str" timestamp
  • No spaces around = in tags—space separates tags from fields
  • String fields need quotes, tag values don't—field="text" vs tag=text
  • Timestamps in nanoseconds by default—specify precision to avoid mistakes

Timestamps

  • Default precision is nanoseconds—sending seconds without precision flag = year 2000 data
  • Specify on write: precision=s for seconds, precision=ms for milliseconds
  • Missing timestamp uses server time—usually fine for real-time ingestion
  • Timestamps are UTC—client timezone doesn't matter

Retention and Downsampling

  • Set retention policy/bucket duration—data older than retention auto-deleted
  • Raw data at 10s intervals for 7 days, downsample to 1min for 30 days, 1h for 1 year
  • 2.x: Tasks for downsampling; 1.x: Continuous Queries
  • Without downsampling, storage grows forever and queries slow down

Flux Query Patterns (2.x)

  • Always start with from(bucket:) then |> range(start:)—range is required
  • |> filter(fn: (r) => r._measurement == "cpu") for filtering
  • |> aggregateWindow(every: 1h, fn: mean) for time-based aggregation
  • Chain transforms with |> pipe operator—order matters for performance

InfluxQL Patterns (1.x)

  • SELECT mean("value") FROM "measurement" WHERE time > now() - 1h GROUP BY time(5m)
  • Double quotes for identifiers, single quotes for string literals
  • GROUP BY time() for time-based aggregation—required for most dashboards
  • FILL(none) to skip empty intervals, FILL(previous) to carry forward

Schema Design

  • Measurement name = table name—one per metric type (cpu, memory, requests)
  • Tag for dimensions you filter/group by—host, region, service
  • Field for values you aggregate—usage_percent, count, latency_ms
  • Avoid encoding data in measurement names—cpu.host1 wrong, cpu + host=host1 right

Write Performance

  • Batch writes—individual points have HTTP overhead
  • Telegraf for production ingestion—handles batching, buffering, retry
  • Write to localhost if possible—network latency adds up at high throughput
  • async writes in client libraries—don't block on each write

Query Performance

  • Always include time range—unbounded queries scan everything
  • Filter on tags before fields—tags use index, fields scan data
  • Limit results with LIMIT or |> limit()—dashboard doesn't need 1M points
  • Use GROUP BY / aggregateWindow to reduce data before returning

Common Errors

  • "partial write: field type conflict"—same field with different types; fix at source
  • "max-values-per-tag limit exceeded"—cardinality too high; redesign schema
  • "database not found"—2.x uses buckets, not databases; check API version
  • Query timeout—add narrower time range or aggregate more aggressively