MongoDB performance monitoring in production: a 2026 guide
MongoDB monitoring is split across four surfaces — serverStatus(), db.stats(), currentOp(), and the profiler. Each tells a different story; none alone is enough. This is what we scrape from each in production, and how to reason about replica lag, oplog window, and aggregation pipeline cost.
The four surfaces
| Surface | Cardinality | Cost | Use it for |
|---|---|---|---|
| serverStatus() | 1 doc / call | Cheap | Host-level rollups: connections, opcounters, WiredTiger cache, network |
| db.stats() | 1 doc / database | Cheap | Storage size, index size, collection count |
| currentOp() | N docs / call | Medium | Live in-flight ops, lock waits, slow op detection |
| Profiler | Continuous | Medium-high | Persisted slow-op log per database (system.profile) |
serverStatus — host metrics
Run from admin. The whole document is huge; the fields worth scraping every 15s are bounded.
db.adminCommand({ serverStatus: 1 })
// extract:
opcounters.{insert,query,update,delete,getmore,command}
opcountersRepl.* // for replicas
connections.{current,available,totalCreated}
network.{bytesIn,bytesOut,numRequests}
wiredTiger.cache.{
"bytes currently in the cache",
"tracked dirty bytes in the cache",
"pages evicted by application threads",
"unmodified pages evicted",
"modified pages evicted"
}
locks.Global.acquireCount.{r,w,R,W}
asserts.{regular,warning,msg,user,rollovers}
metrics.queryExecutor.scanned // collection scans
metrics.queryExecutor.scannedObjects
db.stats — per-database storage
db.stats()
// fields worth tracking:
// collections, indexes, dataSize, storageSize, indexSize, totalSize
Track these per-database, daily. Sudden growth in indexSize usually means someone added an index that doesn’t fit in cache; sudden growth in storageSize without document growth means fragmentation.
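Both the opcounters above and these size fields are point-in-time values; alerting needs deltas between scrapes. A minimal sketch of the delta math (plain Node.js; the function names and the 15s interval are our conventions, not anything MongoDB ships):

```javascript
// Turn two successive scrapes of monotonic counters into per-second rates.
// `prev` and `curr` are e.g. the opcounters sub-documents from serverStatus().
function counterRates(prev, curr, intervalSecs) {
  const rates = {};
  for (const key of Object.keys(curr)) {
    const delta = curr[key] - (prev[key] ?? 0);
    // A negative delta means the process restarted and counters reset;
    // fall back to the raw value rather than emit a bogus negative rate.
    rates[key] = (delta >= 0 ? delta : curr[key]) / intervalSecs;
  }
  return rates;
}

// Daily db.stats() deltas work the same way — the interesting signal is
// indexSize growing faster than dataSize (someone added a big index).
function indexGrowthOutpacesData(prevStats, currStats) {
  return (currStats.indexSize - prevStats.indexSize) >
         (currStats.dataSize - prevStats.dataSize);
}
```

Feed `counterRates` the `opcounters` document from two consecutive `serverStatus()` calls; the same helper works for `network.*` and any other monotonic counter.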
currentOp — live activity
The 1 Hz poll for live ops. Filter aggressively or you’ll DoS your own monitoring.
db.currentOp({
active: true,
$or: [
{ secs_running: { $gt: 1 } }, // > 1s
{ "lockStats.acquireWaitCount.r": { $gt: 0 } },
{ "lockStats.acquireWaitCount.w": { $gt: 0 } }
]
})
For each op, scrape:
- opid, op, ns (namespace = db.collection)
- secs_running — wall-clock duration so far
- command — the BSON of the actual operation
- planSummary — index hint + plan stage names
- waitingForLock, lockStats — lock waits per scope (Global / Database / Collection)
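Triaging the scraped ops is mechanical; a sketch of the client-side partition (field names are the ones currentOp returns, the 1s threshold is our convention):

```javascript
// Given the `inprog` array from db.currentOp(...), partition ops into
// "blocked" (waiting on a lock) and "slow" (running longer than slowSecs).
function triageOps(inprog, slowSecs = 1) {
  const slow = [], blocked = [];
  for (const op of inprog) {
    if (op.waitingForLock) blocked.push(op.opid);           // lock wait wins
    else if ((op.secs_running ?? 0) > slowSecs) slow.push(op.opid);
  }
  return { slow, blocked };
}
```

In practice we feed it `db.currentOp({...}).inprog` and emit one gauge per bucket.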
Profiler — slow query log
Set per-database. The profiler writes to system.profile in that database. Level 1 means “log slow ops only.”
db.setProfilingLevel(1, { slowms: 100, sampleRate: 1.0 })
// then read:
db.system.profile
.find({ millis: { $gt: 200 } })
.sort({ ts: -1 }).limit(50)
For each profile entry, the meaningful fields are ts, op, ns, command, planSummary, keysExamined, docsExamined, nreturned, millis, and writeConflicts.
High docsExamined with low nreturned = missing index. High writeConflicts = concurrent writes touching the same documents: WiredTiger's optimistic concurrency retries the losers transparently, which is the closest thing MongoDB has to lock contention, and the retries inflate millis on write-heavy collections.
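The docsExamined/nreturned heuristic is simple enough to automate over system.profile entries; a sketch (the 10× threshold is our convention, not a MongoDB default):

```javascript
// Flag profile entries that examined far more documents than they returned —
// the classic missing-index signature. `entries` is an array of
// system.profile documents; returns the offending namespaces.
function missingIndexSuspects(entries, ratioThreshold = 10) {
  return entries
    // max(nreturned, 1) so a query returning 0 docs still gets a finite ratio
    .filter(e => (e.docsExamined ?? 0) >
                 ratioThreshold * Math.max(e.nreturned ?? 0, 1))
    .map(e => e.ns);
}
```

Pair each hit with its planSummary: a COLLSCAN here is an immediate index candidate.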
Replica set metrics
rs.status()
// derive:
members[i].health, members[i].state, members[i].uptime
members[i].optimeDate // for replication lag
PRIMARY_optime - SECONDARY_optime // = lag, in seconds
- Track lag as maxLag across all secondaries — alert on it.
- Oplog window: db.getReplicationInfo(), tFirst to tLast. If a secondary falls further behind than the window, it drops off the oplog and needs a full resync.
- Election count from serverStatus.electionMetrics — repeated elections mean instability.
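The lag and window arithmetic above is just date math; a sketch using the field names rs.status() and db.getReplicationInfo() return (the helper names are ours):

```javascript
// Lag of each secondary behind the primary, in seconds, from rs.status().members.
// Assumes exactly one member is currently PRIMARY.
function replicationLagSecs(members) {
  const primary = members.find(m => m.stateStr === "PRIMARY");
  return members
    .filter(m => m.stateStr === "SECONDARY")
    .map(m => ({
      host: m.name,
      lagSecs: (primary.optimeDate - m.optimeDate) / 1000, // Date subtraction → ms
    }));
}

// Headroom before the worst secondary falls off the oplog:
// window (tLast - tFirst) minus the max observed lag. Alert when this shrinks.
function oplogHeadroomSecs(tFirst, tLast, maxLagSecs) {
  return (tLast - tFirst) / 1000 - maxLagSecs;
}
```

Alerting on headroom rather than raw lag catches the case where a bulk load shrinks the window even though lag itself looks stable.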
Sharded cluster metrics
From mongos:
sh.status()
// extract per shard:
// chunk count
// moveChunk activity (from the changelog)
// balancer state
// config.collections (sharded collections, shard keys)
The most common sharded-cluster pathology is a hot shard — one shard handling most of the writes. Look at per-shard opcounters and chunk counts.
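Detecting a hot shard can be as simple as comparing each shard's share of total ops against a uniform split; a sketch (the 2× skew threshold is our convention):

```javascript
// opsPerShard: { shardName: opsPerSecond }, built from per-shard opcounter
// rates (scraped from each shard's primary). Returns shards doing more than
// `skew` times their fair (uniform) share of the total load.
function hotShards(opsPerShard, skew = 2) {
  const shards = Object.keys(opsPerShard);
  const total = shards.reduce((sum, s) => sum + opsPerShard[s], 0);
  const fairShare = total / shards.length;
  return shards.filter(s => opsPerShard[s] > skew * fairShare);
}
```

Cross-check any hit against its chunk count: a hot shard with a proportional chunk count is a shard-key problem, not a balancer problem.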
Aggregation pipeline performance
$group, $lookup, and $unwind stages are where aggregations become expensive. Use $indexStats and explain():
db.orders.explain("executionStats").aggregate([
{ $match: { customerId: ObjectId(...) } },
{ $group: { _id: "$status", n: { $sum: 1 } } }
])
// look for:
executionStats.totalDocsExamined / nReturned // ratio < 10 ideal
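That ratio check is easy to automate on explain() output; a sketch (the 10× cutoff is our convention, not a MongoDB default):

```javascript
// Docs-examined-per-doc-returned ratio from an explain("executionStats")
// result. Guard against division by zero when nothing matched.
function aggScanRatio(executionStats) {
  return executionStats.totalDocsExamined /
         Math.max(executionStats.nReturned, 1);
}
// ratio > 10 → the $match is not covered by an index; check $indexStats.
```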
stages[i].executionTimeMillisEstimate // hot stages
FAQ
Profiler vs scraping currentOp — which is the source of truth?
Profiler overhead?
MongoDB Atlas — does monitoring change?
How does Obsfly MongoDB integration work?