mongodb
Design schemas, write queries, and configure MongoDB for consistency and performance.
$ vett add clawhub.ai/ivangdavila/mongodb
Schema Design Decisions
- Embed when data is queried together and doesn't grow unboundedly
- Reference when data is large, accessed independently, or many-to-many
- Arrays that grow without bound are a disaster: the 16MB document size limit; use the bucketing pattern
- Denormalize for read performance, accept update complexity—no JOINs means duplicate data
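The bucketing pattern mentioned above can be sketched as a single upsert: a bucket document accepts pushes until it holds a fixed number of entries, after which the filter no longer matches and the upsert creates a fresh bucket. A minimal sketch in mongosh-style JavaScript, assuming a hypothetical sensor `readings` collection with a `count` field maintained alongside the array:

```javascript
// Bucketing pattern: cap each document at BUCKET_SIZE measurements.
// Once a bucket is full, the filter stops matching and the upsert
// creates a new bucket document instead of growing this one.
const BUCKET_SIZE = 200;

function bucketUpdate(sensorId, measurement) {
  return {
    filter: { sensorId: sensorId, count: { $lt: BUCKET_SIZE } },
    update: {
      $push: { measurements: measurement },
      $inc: { count: 1 },
    },
    options: { upsert: true },
  };
}

// e.g. db.readings.updateOne(op.filter, op.update, op.options)
const op = bucketUpdate("sensor-1", { t: 1700000000, v: 21.5 });
```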
Array Pitfalls
- Arrays > 1000 elements hurt performance—pagination inside documents is hard
- `$push` without `$slice` means unbounded growth; use `$push: {$each: [...], $slice: -100}`
- Multikey indexes on arrays: one index entry per element; index size can explode
- A compound index can include at most one multikey (array) field
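The `$each`/`$slice` combination above keeps an array capped; roughly, the server appends and then trims from the front. A sketch with a hypothetical `events` field, plus a plain-JavaScript simulation of the slice semantics:

```javascript
// Capped $push: append the new event, then keep only the last 100.
function cappedPush(eventDoc) {
  return { $push: { events: { $each: [eventDoc], $slice: -100 } } };
}

// What $slice: -100 does, roughly: concat then keep the tail.
function simulateSlice(arr, items, keep) {
  return arr.concat(items).slice(-keep);
}

const full = Array.from({ length: 100 }, (_, i) => i); // 0..99
const after = simulateSlice(full, [100], 100);         // still 100 long
```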
$lookup Is Not a JOIN
- `$lookup` performance degrades with collection size; no index usage on the foreign collection (until 5.0)
- One `$lookup` per pipeline stage; nested lookups get complex and slow
- Consider embedding or application-side joins for frequent lookups
- `$lookup` with a pipeline (5.0+) can filter before joining, a massive performance improvement
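The pipeline form of `$lookup` lets the foreign side be filtered before the join materializes. A sketch, assuming hypothetical `orders` documents carrying an `itemIds` array joined against an `items` collection:

```javascript
// $lookup with a pipeline (5.0+): filter "items" before joining,
// so only matching, in-stock items flow into the joined array.
const lookupStage = {
  $lookup: {
    from: "items",
    let: { ids: "$itemIds" },
    pipeline: [
      { $match: { $expr: { $in: ["$_id", "$$ids"] }, inStock: true } },
    ],
    as: "stockedItems",
  },
};

// e.g. db.orders.aggregate([lookupStage])
```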
Index Strategy
- ESR rule for compound indexes: Equality fields first, Sort fields next, Range fields last
- MongoDB doesn't do efficient index intersection—single compound index often better than multiple
- Only one text index per collection—plan carefully; use Atlas Search for complex text
- TTL index for auto-expiration: `{createdAt: 1}, {expireAfterSeconds: 86400}`
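The ESR rule above, applied to a concrete (hypothetical) `orders` query: equality on `status`, sort on `createdAt`, range on `amount`:

```javascript
// ESR ordering: Equality field, then Sort field, then Range field.
// In mongosh the key document's property order is the index key order.
const esrIndex = { status: 1, createdAt: -1, amount: 1 };
// db.orders.createIndex(esrIndex)

// A query/sort pair this index serves well:
const query = { status: "paid", amount: { $gte: 100 } };
const sort = { createdAt: -1 };
```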
Aggregation Pipeline
- Stage order matters: `$match` and `$project` early to reduce documents flowing through
- `$match` at the start can use indexes; `$match` after `$unwind` or `$lookup` cannot
- `allowDiskUse: true` for large aggregations; without it, each stage has a 100MB memory limit
- `$facet` for multiple aggregations in one query, but all facets process the same documents
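The ordering rules above in one pipeline, with a hypothetical `users` collection: the leading `$match` can use an index, while the `$match` placed after `$unwind` cannot:

```javascript
// Filter and trim early; later stages see fewer, smaller documents.
const pipeline = [
  { $match: { status: "active" } },             // first stage: index-eligible
  { $project: { _id: 0, userId: 1, tags: 1 } },
  { $unwind: "$tags" },
  { $match: { tags: "mongodb" } },              // after $unwind: no index use
  { $group: { _id: "$userId", n: { $sum: 1 } } },
];

// e.g. db.users.aggregate(pipeline, { allowDiskUse: true })
```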
Consistency & Transactions
- Default read/write concern is not fully consistent; use `{w: "majority", readConcern: "majority"}` for strong consistency
- Multi-document transactions since 4.0, but they add latency and lock overhead; design to minimize them
- Single-document operations are atomic—exploit this by embedding related data
- `retryWrites: true` in the connection string handles transient failures
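The write-concern and retry settings above can live in the connection string; a sketch with hypothetical hostnames (`readConcernLevel` is the URI spelling of the read concern option):

```javascript
// Majority write acknowledgement, majority read concern, retryable writes.
const uri =
  "mongodb://db1.example.com,db2.example.com/app" +
  "?replicaSet=rs0&w=majority&readConcernLevel=majority&retryWrites=true";
```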
ObjectId Behavior
- Contains a timestamp: `ObjectId.getTimestamp()` extracts the creation time
- Roughly time-ordered: can sort by `_id` for creation order
- Not random: predictable if you know the creation time and machine; don't rely on it for security
- 12 bytes: 4 timestamp + 5 random + 3 counter
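Drivers expose `ObjectId.getTimestamp()`, but the timestamp can also be read straight off the hex string, since the first 4 bytes are a big-endian Unix time. A driver-free sketch:

```javascript
// First 8 hex chars of a 24-char ObjectId = creation time in seconds.
function objectIdTimestamp(hexId) {
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Round-trip check: build an id from a known second count.
const secs = 1700000000;
const hexId = secs.toString(16).padStart(8, "0") + "ab".repeat(8);
const ts = objectIdTimestamp(hexId);
```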
Explain & Performance
- `explain("executionStats")` shows actual execution, not just the plan
- Look for `COLLSCAN`: it means no index was used; add an appropriate index
- `totalDocsExamined` vs `nReturned`: the ratio should be close to 1; otherwise an index is missing
- Covered queries: `IXSCAN` with `"totalDocsExamined": 0` means all data came from the index
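The examined-vs-returned check above is easy to automate against the `executionStats` document (field names as in the explain output; the surrounding shape is simplified here):

```javascript
// Flag missing indexes: scanned/returned should be near 1;
// a covered query examines zero documents yet returns results.
function indexHealth(stats) {
  const scanned = stats.totalDocsExamined;
  const returned = stats.nReturned;
  return {
    covered: scanned === 0 && returned > 0,
    ratio: returned === 0 ? Infinity : scanned / returned,
  };
}

const healthy = indexHealth({ totalDocsExamined: 50, nReturned: 50 });
const coveredQ = indexHealth({ totalDocsExamined: 0, nReturned: 10 });
```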
Document Size
- 16MB max per document—plan for this; use GridFS for large files
- BSON overhead: field names repeated per document—short names save space at scale
- Nested depth limit 100 levels—rarely hit but exists
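A rough feel for the field-name overhead: BSON stores each name once per document (plus a type byte and NUL terminator), so shortening names pays off linearly with document count. A back-of-envelope sketch that ignores values and document framing:

```javascript
// Approximate bytes spent on field names alone across nDocs documents:
// each name costs its length + 1 type byte + 1 NUL terminator.
function fieldNameBytes(names, nDocs) {
  const perDoc = names.reduce((sum, n) => sum + n.length + 2, 0);
  return perDoc * nDocs;
}

const bytesLong = fieldNameBytes(["temperature", "humidity"], 1000000);
const bytesShort = fieldNameBytes(["t", "h"], 1000000);
const saved = bytesLong - bytesShort; // ~17 MB across a million docs
```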
Read Preferences
- `primary` for strong consistency; `secondaryPreferred` for read scaling with eventual consistency
- Stale reads on secondaries: replication lag can be seconds
- `nearest` for lowest latency, but it may read stale data
- Writes always go to the primary; read preference doesn't affect writes
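Roughly how `nearest` behaves: the driver measures round-trip times and prefers the lowest (real drivers also apply a latency window around the fastest member; this sketch ignores that):

```javascript
// Pick the member with the smallest measured ping (simplified model).
function nearestMember(members) {
  return members.reduce((a, b) => (a.pingMs <= b.pingMs ? a : b));
}

const pick = nearestMember([
  { host: "db1", pingMs: 40 },
  { host: "db2", pingMs: 5 },
  { host: "db3", pingMs: 12 },
]);
```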
Common Mistakes
- Treating MongoDB as "schemaless": you still need schema design; it's just enforced in the application
- Not adding indexes: an unindexed query scans the entire collection; every query pattern needs an index
- Giant documents via array pushes—hit 16MB limit or slow BSON parsing
- Ignoring write concern—data may appear written but not persisted/replicated