mongodb

Verified·Scanned 2/18/2026

Design schemas, write queries, and configure MongoDB for consistency and performance.

from clawhub.ai·v7d52098·3.9 KB·0 installs
$ vett add clawhub.ai/ivangdavila/mongodb

Schema Design Decisions

  • Embed when data is queried together and doesn't grow unboundedly
  • Reference when data is large, accessed independently, or many-to-many
  • Arrays that grow infinitely = disaster—document size limit 16MB; use bucketing pattern
  • Denormalize for read performance, accept update complexity—no JOINs means duplicate data
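The embed-versus-reference tradeoff can be sketched as plain document shapes; this is a hypothetical blog-post example, and all collection and field names are illustrative:

```javascript
// Embedded: comments live inside the post. One read fetches everything,
// but an ever-growing comments array risks the 16MB document limit.
const embeddedPost = {
  _id: "post1",
  title: "Schema design",
  comments: [{ author: "ann", text: "Nice" }],
};

// Referenced: comments are their own collection, keyed back to the post.
// Scales to unbounded comment counts at the cost of a second query (or $lookup).
const referencedComment = { _id: "c1", postId: "post1", author: "ann", text: "Nice" };

console.log(embeddedPost.comments.length, referencedComment.postId);
```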

Array Pitfalls

  • Arrays > 1000 elements hurt performance—pagination inside documents is hard
  • $push without $slice = unbounded growth; use $push: {$each: [...], $slice: -100}
  • Multikey indexes on arrays: index entry per element—can explode index size
  • Can't have multikey index on more than one array field in compound index
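The capped-array pattern from the $slice bullet, as a mongosh sketch; the collection, filter variables, and field names are illustrative:

```javascript
// $each + $slice: -100 keeps only the last 100 readings, so repeated
// pushes can never grow the array (or the document) without bound.
db.devices.updateOne(
  { _id: deviceId },
  { $push: { readings: { $each: [newReading], $slice: -100 } } }
);
```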

$lookup Is Not a JOIN

  • $lookup degrades as the foreign collection grows unless the joined field is indexed; the localField/foreignField form can use that index, but $expr matches in the pipeline form could not until 5.0
  • Each $lookup joins one collection per stage; chaining several lookups gets complex and slow
  • Consider embedding or application-side joins for frequent lookups
  • The pipeline form of $lookup (3.6+) can filter the foreign side before joining; combined with 5.0's index support for $expr equality matches, a massive performance improvement
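A pipeline-form $lookup might look like the sketch below; the users/orders collections and field names are assumptions for illustration:

```javascript
// Pipeline-form $lookup: filter and project the foreign side before joining.
// Since 5.0, the $expr equality match below can use an index on orders.userId.
db.users.aggregate([
  { $match: { status: "active" } },
  { $lookup: {
      from: "orders",
      let: { uid: "$_id" },
      pipeline: [
        { $match: { $expr: { $eq: ["$userId", "$$uid"] } } },
        { $match: { total: { $gt: 100 } } },   // shrink the joined set early
        { $project: { _id: 0, total: 1 } },
      ],
      as: "bigOrders",
  } },
]);
```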

Index Strategy

  • ESR rule for compound indexes: Equality fields first, Sort fields next, Range fields last
  • MongoDB can intersect single-field indexes but rarely chooses to, and seldom efficiently; a single compound index usually beats several separate ones
  • Only one text index per collection—plan carefully; use Atlas Search for complex text
  • TTL index for auto-expiration: createIndex({createdAt: 1}, {expireAfterSeconds: 86400}); the TTL setting goes in the second (options) argument
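The ESR rule and TTL syntax in mongosh form; collection names, fields, and the 86400-second TTL are illustrative:

```javascript
// ESR: Equality field (status) first, Sort field (createdAt) next,
// Range field (price) last. This one compound index serves
// find({status: "live", price: {$gt: 10}}).sort({createdAt: -1}).
db.products.createIndex({ status: 1, createdAt: -1, price: 1 });

// TTL: documents are removed roughly 86400s after their createdAt value.
// Note the expiry option is the second (options) argument, not part of the key.
db.events.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 });
```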

Aggregation Pipeline

  • Stage order matters: $match and $project early to reduce documents flowing through
  • $match at start can use indexes; $match after $unwind or $lookup cannot
  • allowDiskUse: true for large aggregations—without it, 100MB memory limit per stage
  • $facet for multiple aggregations in one query—but all facets process same documents
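The stage-ordering advice above, sketched as one aggregation; the orders collection and its fields are assumptions:

```javascript
// $match first (can use an index), $project early to shrink documents,
// allowDiskUse for stages that may exceed the 100MB in-memory limit.
db.orders.aggregate(
  [
    { $match: { createdAt: { $gte: ISODate("2026-01-01") } } }, // index-eligible
    { $project: { userId: 1, total: 1 } },                      // cut doc size early
    { $group: { _id: "$userId", spend: { $sum: "$total" } } },
    { $sort: { spend: -1 } },
  ],
  { allowDiskUse: true }
);
```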

Consistency & Transactions

  • Defaults are not fully consistent: readConcern "local" can return data that is later rolled back; pair {w: "majority"} writes with readConcern "majority" reads for strong consistency
  • Multi-document transactions since 4.0 on replica sets (4.2 for sharded clusters), but they add latency and lock overhead; design schemas to minimize them
  • Single-document operations are atomic—exploit this by embedding related data
  • retryWrites: true in connection string—handles transient failures
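A minimal transaction with majority concerns, as a mongosh sketch; the bank database, accounts collection, and document ids are illustrative:

```javascript
// Majority read/write concern plus an explicit transaction (4.0+ on replica
// sets, 4.2+ on sharded clusters). Keep the transaction body short.
const session = db.getMongo().startSession();
session.startTransaction({
  readConcern: { level: "majority" },
  writeConcern: { w: "majority" },
});
try {
  const accounts = session.getDatabase("bank").accounts;
  accounts.updateOne({ _id: "a" }, { $inc: { balance: -100 } });
  accounts.updateOne({ _id: "b" }, { $inc: { balance: 100 } });
  session.commitTransaction();
} catch (e) {
  session.abortTransaction();
  throw e;
}
```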

ObjectId Behavior

  • Contains timestamp: ObjectId.getTimestamp()—can extract creation time
  • Roughly time-ordered—can sort by _id for creation order
  • Not random—predictable if you know creation time and machine; don't rely on for security
  • 12 bytes: 4 timestamp + 5 random + 3 counter
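The timestamp extraction needs no driver; a minimal sketch in plain JavaScript, with objectIdTimestamp as a hypothetical helper name:

```javascript
// Extract the creation time embedded in a MongoDB ObjectId.
// Layout (12 bytes): 4-byte big-endian Unix timestamp, 5 random bytes, 3-byte counter.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) throw new Error("not a 24-char hex ObjectId");
  return parseInt(hex.slice(0, 8), 16); // seconds since the Unix epoch
}

// The well-known example id from the MongoDB docs decodes to October 2012.
const ts = objectIdTimestamp("507f1f77bcf86cd799439011");
console.log(new Date(ts * 1000).toISOString()); // → 2012-10-17T21:13:27.000Z
```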

Explain & Performance

  • explain("executionStats") shows actual execution—not just plan
  • Look for COLLSCAN—means no index used; add appropriate index
  • totalDocsExamined vs nReturned—ratio should be close to 1; otherwise index missing
  • Covered queries: IXSCAN + "totalDocsExamined": 0—all data from index
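Reading those explain fields in mongosh might look like this; the users collection and query are illustrative, and the top-level stage can also be FETCH wrapping an IXSCAN:

```javascript
// executionStats runs the query and reports what actually happened,
// not just the chosen plan.
const stats = db.users.find({ email: "a@b.com" }).explain("executionStats");

// Healthy: an index stage and totalDocsExamined close to nReturned.
// COLLSCAN or a large examined/returned ratio means a missing index.
printjson({
  stage: stats.queryPlanner.winningPlan.stage,
  examined: stats.executionStats.totalDocsExamined,
  returned: stats.executionStats.nReturned,
});
```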

Document Size

  • 16MB max per document—plan for this; use GridFS for large files
  • BSON overhead: field names repeated per document—short names save space at scale
  • Nested depth limit 100 levels—rarely hit but exists
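A rough feel for the repeated-field-name overhead, using JSON sizes as a proxy (BSON encoding differs, and the field names are illustrative):

```javascript
// Every document repeats its field names, so at millions of documents the
// name length itself becomes storage. JSON length is only a rough stand-in
// for BSON size, but the repeated-name effect is the same.
const verbose = { customerEmailAddress: "a@b.com", purchaseTimestamp: 1700000000 };
const terse = { email: "a@b.com", ts: 1700000000 };

const perDocSaving = JSON.stringify(verbose).length - JSON.stringify(terse).length;
console.log(perDocSaving); // → 30 (bytes saved per document, before compression)
```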

Read Preferences

  • primary for strong consistency; secondaryPreferred for read scaling with eventual consistency
  • Stale reads on secondaries—replication lag can be seconds
  • nearest for lowest latency—but may read stale data
  • Write always goes to primary—read preference doesn't affect writes
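These options are usually set in the connection string; a sketch with placeholder hosts and credentials:

```javascript
// Route reads to secondaries when available, bound their staleness, and keep
// writes durable. Hosts, credentials, and the replica-set name are placeholders.
const uri =
  "mongodb://user:pass@host1,host2,host3/app" +
  "?replicaSet=rs0" +
  "&readPreference=secondaryPreferred" +
  "&maxStalenessSeconds=90" +      // skip secondaries lagging more than ~90s
  "&w=majority&retryWrites=true";  // writes still go to the primary
console.log(uri);
```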

Common Mistakes

  • Treating MongoDB as "schemaless"—still need schema design; just enforced in app
  • Not adding indexes—scans entire collection; every query pattern needs index
  • Giant documents via array pushes—hit 16MB limit or slow BSON parsing
  • Ignoring write concern—data may appear written but not persisted/replicated