Blog

Field notes from the DB.

Benchmarks, plan diffs, lock-chain post-mortems, and opinions on what real DB observability should look like in 2026.

Pricing·2026-05-26·12 min read

Datadog DBM pricing in 2026: a real-world calculator (and what teams actually pay)

Datadog DBM list price says $35/DB-instance. The bill you get says $9,400/month. We break down host + DB-instance + APM + Infra + retention overages so you can model your actual number — and compare it line-by-line.

Read post →

Build vs Buy·13 min
Grafana DBM build-vs-buy: what the 'we'll just use Prometheus' plan actually costs
postgres_exporter ships in an afternoon. Per-query digests, plan-flip detection, lock-chain graphs, anomaly bands — each of those costs 1–3 engineer-weeks. We measure the real build cost vs Obsfly's $39/DB and tell you when each side wins.
Read →
AWS·11 min
RDS Performance Insights: where it stops and what you actually need next
PI's free tier keeps 7 days of history, it ships with RDS, and it surfaces top SQL by wait class. It also stops short on plan history, multi-host correlation, multi-engine fleets, alerting, and AI suggestions. Here's where the line is and what to bolt on.
Read →
Comparison·11 min
pganalyze vs Obsfly: which Postgres monitoring tool is right for you
pganalyze is the gold standard for Postgres-only depth. Obsfly covers 9 databases at lower per-DB pricing with BYOC. The honest, dimension-by-dimension comparison — including what pganalyze does better.
Read →
Comparison·12 min
Datadog DBM vs Obsfly: side-by-side feature and pricing breakdown
The honest comparison — pricing, feature parity, deployment models, and what Datadog does better. Includes the 3 places Datadog DBM is the right answer in 2026.
Read →
Oracle·11 min
Oracle ASH and AWR: a field guide for the rest of us
ASH and AWR are two of the best performance views any database has shipped — but they sit behind the Diagnostics Pack license. This is how to get 90% of the value, with or without it.
Read →
Redis·12 min
Redis monitoring in production: the 2026 guide
INFO, SLOWLOG, the latency monitor, keyspace notifications, and big-key sampling: what to scrape from each, and the eight metrics that flag most Redis incidents before they page.
Read →
Postgres·13 min
Postgres bloat and autovacuum: a 2026 tuning guide
What table and index bloat actually costs you, how autovacuum works in Postgres 16+, the parameters that matter, and the queries to find your worst offenders before they turn into an incident.
Read →
SQL Server·12 min
SQL Server Query Store: the field guide most teams skip
Query Store is the single biggest reason upgrading to SQL Server 2016 or later was worth the weekend. The settings that matter, the views you actually query, and how to catch a plan regression in two queries.
Read →
MySQL·10 min
MySQL replica lag: 9 causes, ranked by how often they bite
Seconds_Behind_Master (Seconds_Behind_Source since MySQL 8.0.22) is a lying integer. Here's a real diagnostic order (single-threaded apply, long transactions, schema migrations, network) with the SQL to confirm each.
Read →
Elasticsearch·10 min
Elasticsearch slow log: the cheapest performance tool you're misconfiguring
The slow log ships disabled, and coarse thresholds like 10s warn and 1s info never catch the queries actually hurting your cluster. Here's how to tune the slow log per index, what the query, fetch, and indexing split actually means, and 3 incident patterns only the slow log surfaces cleanly.
Read →
Postgres·14 min
pg_stat_statements: the complete 2026 guide
Every column, every gotcha, the queries you should run today, and why pg_stat_statements is still the most useful 80 lines of telemetry in Postgres — even with five new alternatives in 2026.
Read →
ClickHouse·11 min
ClickHouse in production: monitoring without becoming a query hot-spot yourself
system.query_log is huge. system.parts is huger. Here's what to actually scrape, what to throw away, and how to monitor a ClickHouse cluster without spending half its CPU on system queries.
Read →
AI·9 min
AI for database query optimization: what's real in 2026 (and what's not)
Two years of shipping LLM-grounded query analysis to production databases. What AI is genuinely good at, what it's bad at, why grounding beats model size, and how BYO LLM works in regulated deployments.
Read →
Postgres·11 min
Why your Postgres p99 latency lies — and what to track instead
p99 over 1m windows is the most-displayed and most-misleading number on every DBM dashboard. Here's the histogram math, the seasonality math, and a saner default.
Read →
Pricing·9 min
We added up Datadog DBM at 50 databases. Here's the bill.
A line-by-line walkthrough of what 50 Postgres + 12 MySQL + 8 Mongo databases actually cost on Datadog DBM in 2026, with ways to reduce it that don't involve switching tools.
Read →
BYOC·14 min
Why regulated SaaS can't use Datadog DBM — and the BYOC fix
Walking through the architecture of a BYOC observability deployment: where data lives, what crosses the boundary, and how to satisfy SOC 2, HIPAA, and GDPR without giving up the UX.
Read →
Postgres·17 min
Postgres slow queries: 12 causes and how to find each one
A field-tested playbook for diagnosing a slow Postgres query in production — from missing indexes to plan flips to bloated tables — with the SQL to find each cause and the fix.
Read →
Postgres·14 min
Postgres connection pooling: PgBouncer, RDS Proxy, and the math you skipped
Why max_connections is the wrong knob, how PgBouncer pool modes really differ, and the back-of-envelope formula that tells you the right pool size for your workload.
Read →
Postgres·11 min
Postgres lock chains: how to find the session blocking yours
A practical walkthrough of pg_locks, pg_blocking_pids, and the recursive CTE that gives you the full chain — including the AccessExclusiveLocks that quietly take your DB down.
Read →
SRE·9 min
Database SLOs that aren't useless: a working definition
Most DB SLOs are 'CPU under 80%.' That's a budget alert, not a service-level objective. Here's how to define an SLO an executive can sign off on and an engineer can act on.
Read →
MySQL·13 min
MySQL Performance Schema vs sys schema: a 2026 monitoring guide
Performance Schema is unreadable. sys schema is friendly but lossy. Here's exactly which to use for which production question, with the eight queries every MySQL DBA should know by heart.
Read →
Postgres·16 min
EXPLAIN ANALYZE for Postgres: read every line in 2026
The vocabulary that turns a query plan from a wall of text into a story. Costs, rows, loops, buffers, timing — what each means in 2026 (Pg 16+), and the four anti-patterns to spot in five seconds.
Read →
MongoDB·14 min
MongoDB performance monitoring in production: a 2026 guide
Four surfaces (serverStatus, db.stats, currentOp, profiler), a sane default for what to scrape from each, and how to reason about replica lag, oplog window, and aggregation pipeline cost.
Read →
MongoDB·12 min
Sharded MongoDB monitoring: the metrics that predict an imbalance
Chunk distribution, jumbo chunks, balancer round time, hot shards. The handful of metrics that distinguish a healthy sharded cluster from one that's about to need a rebalance party.
Read →
AI·12 min
Anomaly detection on database metrics: why thresholds fail and what works
A walk through forecast bands, change-point detection, multivariate anomaly detection, and the seasonality math that makes 'p99 over 200ms' the wrong alert by default, with the Postgres example that broke our last threshold.
Read →
DevOps·10 min
Monitoring schema migrations: how to ship without taking the database down
ALTER TABLE on a billion-row table is the most-feared 30-line PR in any backend repo. Here's the monitoring you need before, during, and after — for Postgres, MySQL, and MongoDB.
Read →
AI·11 min
Database capacity forecasting that actually catches breaches 30 days out
Linear regression isn't enough. ARIMA is overkill. Prophet works but you need to know which exogenous variables to feed it. A practical recipe for capacity forecasts that page you 30 days before the cliff.
Read →
Redis·8 min
Redis SLOWLOG: the misunderstood telemetry that catches half your incidents
Most teams ship Redis with default SLOWLOG settings and never look at it. Here's how to tune it, what to scrape from it, and the three Redis incident classes that only show up in SLOWLOG.
Read →
Postgres·10 min
Postgres transaction-ID wraparound: 4 hours from your worst Saturday
When pg_stat_activity shows 'autovacuum (to prevent wraparound)' and your write rate starts to drop, you have about 4 hours of work to do correctly or your DB goes read-only. Here's the real runbook.
Read →

· · ·

Watch your databases the way you watch your services.

Book a 30-minute demo. We'll spec your fleet together and quote your first 30-day deal.

Book a demo Read the docs
