Blog
Field notes from the DB.
Benchmarks, plan diffs, lock-chain post-mortems, and opinions on what real DB observability should look like in 2026.
- Redis·2026-04-26·12 min read
Redis monitoring in production: the 2026 guide
INFO, slowlog, latency monitor, keyspace notifications, big-key sampling — what to scrape from each, and the eight metrics that predict every Redis incident before it pages.
- Postgres·2026-04-25·13 min read
Postgres bloat and autovacuum: a 2026 tuning guide
What table and index bloat actually costs you, how autovacuum works in Postgres 16+, the parameters that matter, and the queries to find your worst offenders before they trigger an OOM.
- MySQL·2026-04-23·10 min read
MySQL replica lag: 9 causes, ranked by how often they bite
Seconds_Behind_Master is a lying integer. Here's a real diagnostic order — single-threaded apply, long transactions, schema migrations, network — with the SQL to confirm each.
- Postgres·2026-04-22·14 min read
pg_stat_statements: the complete 2026 guide
Every column, every gotcha, the queries you should run today, and why pg_stat_statements is still the most useful 80 lines of telemetry in Postgres — even with five new alternatives in 2026.
- ClickHouse·2026-04-20·11 min read
ClickHouse in production: monitoring without becoming a query hot-spot yourself
system.query_log is huge. system.parts is huger. Here's what to actually scrape, what to throw away, and how to monitor a ClickHouse cluster without spending half its CPU on system queries.
- Postgres·2026-04-10·11 min read
Why your Postgres p99 latency lies — and what to track instead
p99 over 1m windows is the most-displayed and most-misleading number on every DBM dashboard. Here's the histogram math, the seasonality math, and a saner default.
- Pricing·2026-03-22·9 min read
We added up Datadog DBM at 50 databases. Here's the bill.
A line-by-line walkthrough of what 50 Postgres + 12 MySQL + 8 Mongo databases actually cost on Datadog DBM in 2026, with ways to reduce it that don't involve switching tools.
- BYOC·2026-03-04·14 min read
Why regulated SaaS can't use Datadog DBM — and the BYOC fix
Walking through the architecture of a BYOC observability deployment: where data lives, what crosses the boundary, and how to satisfy SOC 2 / HIPAA / GDPR without giving up the UX.
- Postgres·2026-04-18·17 min read
Postgres slow queries: 12 causes and how to find each one
A field-tested playbook for diagnosing a slow Postgres query in production — from missing indexes to plan flips to bloated tables — with the SQL to find each cause and the fix.
- Postgres·2026-04-17·14 min read
Postgres connection pooling: PgBouncer, RDS Proxy, and the math you skipped
Why max_connections is the wrong knob, how PgBouncer pool modes really differ, and the back-of-envelope formula that tells you the right pool size for your workload.
- Postgres·2026-04-15·11 min read
Postgres lock chains: how to find the session blocking yours
A practical walkthrough of pg_locks, pg_blocking_pids, and the recursive CTE that gives you the full chain — including the AccessExclusiveLocks that quietly take your DB down.
- SRE·2026-04-14·9 min read
Database SLOs that aren't useless: a working definition
Most DB SLOs are 'CPU under 80%.' That's a budget alert, not a service-level objective. Here's how to define an SLO an executive can sign off on and an engineer can act on.
- MySQL·2026-04-12·13 min read
MySQL Performance Schema vs sys schema: a 2026 monitoring guide
Performance Schema is unreadable. sys schema is friendly but lossy. Here's exactly which to use for which production question, with the eight queries every MySQL DBA should know by heart.
- Postgres·2026-04-11·16 min read
EXPLAIN ANALYZE for Postgres: read every line in 2026
The vocabulary that turns a query plan from a wall of text into a story. Costs, rows, loops, buffers, timing: what each means in 2026 (Postgres 16+), and the four anti-patterns to spot in five seconds.
- MongoDB·2026-04-08·14 min read
MongoDB performance monitoring in production: a 2026 guide
Four surfaces (serverStatus, db.stats, currentOp, profiler), a sane default for what to scrape from each, and how to reason about replica lag, oplog window, and aggregation pipeline cost.
- MongoDB·2026-04-06·12 min read
Sharded MongoDB monitoring: the metrics that predict an imbalance
Chunk distribution, jumbo chunks, balancer round time, hot shards. The handful of metrics that distinguish a healthy sharded cluster from one that's about to need a rebalance party.
- AI·2026-04-04·12 min read
Anomaly detection on database metrics: why thresholds fail and what works
A walk through forecast bands, change-point detection, multivariate anomaly detection, and the seasonality math that makes 'p99 over 200ms' the wrong alert by default, with the Postgres example that broke our last threshold.
- DevOps·2026-04-02·10 min read
Monitoring schema migrations: how to ship without taking the database down
ALTER TABLE on a billion-row table is the most-feared 30-line PR in any backend repo. Here's the monitoring you need before, during, and after — for Postgres, MySQL, and MongoDB.
- AI·2026-03-28·11 min read
Database capacity forecasting that actually catches breaches 30 days out
Linear regression isn't enough. ARIMA is overkill. Prophet works but you need to know which exogenous variables to feed it. A practical recipe for capacity forecasts that page you 30 days before the cliff.
- Redis·2026-03-24·8 min read
Redis SLOWLOG: the misunderstood telemetry that catches half your incidents
Most teams ship Redis with default SLOWLOG settings and never look at it. Here's how to tune it, what to scrape from it, and the three Redis incident classes that only show up in SLOWLOG.
- Postgres·2026-03-20·10 min read
Postgres transaction-ID wraparound: 4 hours from your worst Saturday
When pg_stat_activity shows 'autovacuum (to prevent wraparound)' and your write rate stops, you have 4 hours of work to do correctly or your DB goes read-only. Here's the real runbook.