postgresql

Verified·Scanned 2/18/2026

Write efficient PostgreSQL queries and design schemas with proper indexing and patterns.

from clawhub.ai·v0dd809f·3.6 KB·0 installs
Scanned from 1.0.0 at 0dd809f · Transparency log ↗
$ vett add clawhub.ai/ivangdavila/postgresql

Indexes I Forget to Create

  • Partial index WHERE active = true—80% smaller when most rows inactive; suggest for status columns
  • Expression index ON lower(email)—must match query exactly; without it, WHERE lower(email) scans
  • Covering index INCLUDE (name, email)—enables index-only scan; check EXPLAIN for "Heap Fetches"
  • Foreign key columns—not auto-indexed in PG; JOINs and ON DELETE CASCADE need them
  • Composite index order matters—(a, b) helps WHERE a = ? but not WHERE b = ?

Index Traps

  • Unused indexes hurt every INSERT/UPDATE—query pg_stat_user_indexes for idx_scan = 0, drop them
  • Too many indexes on write-heavy tables—balance carefully
  • Index on low-cardinality column (boolean, status) often useless—PG prefers seq scan
  • LIKE '%suffix' can't use B-tree—need pg_trgm GIN index or reverse() expression index

Query Patterns I Underuse

  • SELECT FOR UPDATE SKIP LOCKED—job queue without external tools; skip rows being processed
  • pg_advisory_lock(key)—application-level mutex without table; unlock explicitly or on disconnect
  • IS NOT DISTINCT FROM—NULL-safe equality; cleaner than (a = b OR (a IS NULL AND b IS NULL))
  • DISTINCT ON (x) ORDER BY x, y—first row per group without subquery; PG-specific but powerful

Connection Management (Often Ignored)

  • PgBouncer essential with >50 connections—each PG connection uses ~10MB; pool at transaction level
  • statement_timeout = '30s' per role—prevents runaway queries from killing database
  • idle_in_transaction_session_timeout = '5min'—kills abandoned transactions holding locks
  • Default 100 max_connections too low for production, too high wastes memory—tune based on RAM

Data Types I Get Wrong

  • SERIAL deprecated—use GENERATED ALWAYS AS IDENTITY
  • TIMESTAMP without timezone—almost always wrong; use TIMESTAMPTZ, PG stores as UTC
  • Float for money—use NUMERIC(12,2) or integer cents; float math breaks: 0.1 + 0.2 ≠ 0.3
  • VARCHAR(n) vs TEXT—no performance difference in PG; use TEXT unless constraint needed

Vacuum & Bloat (Never Think About)

  • High-UPDATE tables bloat—dead tuples accumulate; pg_repack reclaims without locks
  • VACUUM ANALYZE after bulk insert—updates statistics; query planner needs current data
  • Autovacuum lag on big tables—tune autovacuum_vacuum_cost_delay or manual vacuum
  • Transaction wraparound: if xid exhausted, DB stops—autovacuum prevents but monitor

EXPLAIN I Don't Read Right

  • Always EXPLAIN (ANALYZE, BUFFERS)—actual times + I/O; estimate-only misleads
  • "Heap Fetches: 1000" with index—missing columns, add INCLUDE to index
  • Seq scan not always bad—faster than index for >10-20% of table; check row estimates
  • "Rows" estimate way off—run ANALYZE or check if stats target too low

Full-Text Search Mistakes

  • Creating tsvector on the fly—precompute as stored generated column with GIN index
  • plainto_tsquery for user input—handles spaces without syntax errors; not to_tsquery
  • Missing language parameter—'english' stems words; 'simple' exact match
  • FTS is word-based—LIKE '%exact phrase%' still needed for substring match

Transaction Isolation

  • Default READ COMMITTED—phantom reads in reports; use REPEATABLE READ for consistency
  • SERIALIZABLE catches conflicts—but must handle 40001 error with retry loop
  • Long transactions block vacuum and hold locks—keep under seconds, not minutes