Obsfly

Grafana DBM build-vs-buy: what the 'we'll just use Prometheus' plan actually costs

postgres_exporter ships in an afternoon. Per-query digests, plan-flip detection, lock-chain graphs, anomaly bands — each of those costs 1–3 engineer-weeks. We measure the real build cost vs Obsfly's $39/DB and tell you when each side wins.

Published · 13 min read

The conversation goes like this. At standup someone says “we’ll just stand up Grafana with postgres_exporter and Prometheus, we already have the stack.” Nobody objects. Six months later you have a homemade DBM, you missed two plan flips that caused outages, and the platform engineer who built it is now its full-time owner.

This isn’t an attack on Grafana — Grafana is a great visualization tool and we run it ourselves. It’s a measurement of what “just” actually means when the goal is DBM-grade tooling.

On this page
  1. What Grafana + postgres_exporter actually gives you
  2. The 11 things you have to build
  3. Build cost in engineer-weeks
  4. Side-by-side with Obsfly
  5. When DIY actually wins
  6. When DIY is a trap
  7. FAQ

What Grafana + postgres_exporter actually gives you

  • Cluster-level counters: connections, transactions, buffer hit ratio, replication lag, dead tuple ratio, WAL position. Genuinely useful.
  • A dashboard editor that is the best in the industry. Free to use, easy to share, plays with multiple data sources.
  • Prometheus alerting on threshold rules. Adequate for the “disk full” class of alerts.
  • A starter dashboard you can ship in an afternoon. This is the honeypot — it’s real, and it’s the cheapest part of the journey.

The 11 things you have to build

Here is what postgres_exporter + Prometheus + Grafana do not ship:

Capability | Build effort | Why it matters
1. Per-query top-N with percentiles | 1–2 engineer-weeks | pg_stat_statements polling, digest normalization, p50/p95/p99 computation, dashboard.
2. Plan capture | 1–2 weeks | auto_explain configuration, log scraping or pg_stat_plans extension, storage backend.
3. Plan-flip / regression detection | 1 week | Structure-hash diff per signature over a window, alert when hash flips.
4. Lock-chain / blocking-session graph | 1 week | pg_blocking_pids() traversal, graph layout, surfacing in dashboard.
5. Anomaly detection per metric | 2–3 weeks | Pick algorithm (BOCPD / Prophet / STL), per-metric training, persistent state.
6. Forecast bands (30/90/365 day) | 1–2 weeks | Seasonal decomposition, percentile bands, breach detection.
7. Multi-database fan-out | 2 weeks per engine | MySQL Performance Schema, MongoDB profiler, Redis slowlog have different shapes.
8. Alerting beyond thresholds | 1 week | Multi-variate, change-point, derived signals (cache hit + lock wait + qps).
9. AI rewrite / index suggestion | 2–4 weeks | LLM integration, prompt engineering, plan-aware context assembly.
10. Retention strategy | 1 week ongoing | Prometheus is not for 15-month retention. You’ll add Thanos or VictoriaMetrics.
11. Maintenance + upgrade cycles | 0.25–0.5 FTE forever | Exporter upgrades, breaking schema changes, dashboard drift, alert tuning.
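
To make capability 3 concrete, here is a minimal sketch of a structure-hash diff. It assumes plans arrive as EXPLAIN (FORMAT JSON) style dictionaries and that each query already has a signature. The function names are hypothetical, and the sketch compares consecutive observations rather than a full window, so treat it as an illustration of the idea rather than a production detector.

```python
import hashlib
import json


def plan_structure_hash(plan_node):
    """Hash only the shape of a plan: node types, relations, indexes, child order.

    Costs, row estimates and timings are left out so that they do not
    register as a flip when only the numbers move.
    """
    def shape(node):
        return {
            "type": node.get("Node Type"),
            "relation": node.get("Relation Name"),
            "index": node.get("Index Name"),
            "children": [shape(child) for child in node.get("Plans", [])],
        }

    canonical = json.dumps(shape(plan_node), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def detect_plan_flips(observations):
    """observations: iterable of (query_signature, plan_node) in time order.

    Returns (query_signature, old_hash, new_hash) for every change of shape.
    """
    last_seen = {}
    flips = []
    for signature, plan_node in observations:
        current = plan_structure_hash(plan_node)
        previous = last_seen.get(signature)
        if previous is not None and previous != current:
            flips.append((signature, previous, current))
        last_seen[signature] = current
    return flips


# Hypothetical sample plans: same query, index scan twice, then a seq scan.
index_scan = {"Node Type": "Index Scan", "Relation Name": "orders",
              "Index Name": "orders_customer_idx", "Total Cost": 8.3}
index_scan_again = {**index_scan, "Total Cost": 9.1}
seq_scan = {"Node Type": "Seq Scan", "Relation Name": "orders", "Total Cost": 1834.0}

flips = detect_plan_flips([("q1", index_scan), ("q1", index_scan_again), ("q1", seq_scan)])
print(len(flips))  # 1
```

The cost change between the first two observations is ignored; only the switch from index scan to sequential scan counts. The week of effort in the table goes into what this sketch leaves out: capturing plans reliably, storing history, and deciding which flips deserve an alert.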

Build cost in engineer-weeks

Total greenfield build for a Postgres-only DBM stack: 16–28 engineer-weeks. Multi-database (add MySQL + MongoDB) doubles it. Steady-state maintenance: 0.25–0.5 FTE, ongoing.

At a $180k loaded engineer cost | Annual
Build (one-time, amortized over 2 years) | $36k–$63k / yr
Maintenance (steady-state) | $45k–$90k / yr
Total (DIY, Postgres-only) | $81k–$153k / yr
Total (DIY, +MySQL +MongoDB) | $150k–$280k / yr
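
The Postgres-only rows can be reproduced with a few lines of arithmetic. One caveat: the article does not say how an engineer-week is priced. The figures line up if a $180k loaded year is spread over roughly 40 productive weeks ($4,500 per week), so that value is an inferred assumption here, not a stated one.

```python
LOADED_COST_PER_YEAR = 180_000   # USD, from the table header
PRODUCTIVE_WEEKS_PER_YEAR = 40   # assumption: inferred from the figures, not stated
AMORTIZATION_YEARS = 2           # from the "Build" row

cost_per_week = LOADED_COST_PER_YEAR / PRODUCTIVE_WEEKS_PER_YEAR  # 4500.0


def annual_diy_cost(build_weeks, maintenance_fte):
    """Return (build per year, maintenance per year, total per year) in USD."""
    build_per_year = build_weeks * cost_per_week / AMORTIZATION_YEARS
    maintenance_per_year = maintenance_fte * LOADED_COST_PER_YEAR
    return build_per_year, maintenance_per_year, build_per_year + maintenance_per_year


low = annual_diy_cost(build_weeks=16, maintenance_fte=0.25)
high = annual_diy_cost(build_weeks=28, maintenance_fte=0.5)
print(low)   # (36000.0, 45000.0, 81000.0)
print(high)  # (63000.0, 90000.0, 153000.0)
```

Swapping in your own loaded cost, productive weeks, and amortization period is the quickest way to see how sensitive the total is to each input.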

Side-by-side with Obsfly

Capability | Grafana + Prometheus + exporters | Obsfly
Top-N queries with p99 | Build (1–2 wk) | Out of box
Plan history / diff | Build (2–3 wk) | Out of box
Lock chains | Build (1 wk) | Out of box
Multi-variate anomaly | Build (2–3 wk) | Out of box
Forecast bands | Build (1–2 wk) | Out of box
Multi-DB fan-out | Build per engine | 9 engines covered
AI query rewrite | Build (2–4 wk) or skip | Built in (Claude)
BYOC / Sovereign | Self-host the OSS stack | First-class
Time to first useful insight | Weekend → quarter | 5 minutes
Cost (50-DB fleet, 2 yr) | $240k–$420k engineer time | $1,950/mo × 24 = $46,800
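
The Obsfly side of the last row is plain multiplication, sketched below so you can rerun it for your own fleet. The break-even helper is an illustrative addition: it assumes flat $39/DB pricing with no volume discounts or minimums, and it compares against the Postgres-only DIY totals from the build-cost table.

```python
PRICE_PER_DB_PER_MONTH = 39  # USD, the per-database price quoted in the article


def subscription_cost(databases, months):
    """Total subscription cost in USD, assuming flat per-database pricing."""
    return PRICE_PER_DB_PER_MONTH * databases * months


def break_even_databases(diy_annual_cost):
    """Fleet size at which one year of subscription equals the DIY annual cost."""
    return diy_annual_cost / (PRICE_PER_DB_PER_MONTH * 12)


two_year_total = subscription_cost(databases=50, months=24)
print(two_year_total)                        # 46800
print(round(break_even_databases(81_000)))   # 173
print(round(break_even_databases(153_000)))  # 327
```

Under those assumptions the subscription only reaches the DIY range at fleets of a few hundred databases, which matches the article's point that the build cost is mostly fixed while the subscription scales with fleet size.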

When DIY actually wins

  • Single Postgres database, no plan history needed, no anomaly detection needed. The afternoon dashboard is genuinely enough.
  • Strong infra team with slack capacity and an existing observability org that builds this kind of tooling as a core competence. The maintenance cost is already absorbed.
  • Specific compliance requirements that no commercial vendor can meet — though check if BYOC or Sovereign solves it first, since we built both for exactly this case.

When DIY is a trap

  • More than two database engines in the fleet.
  • Series A–C team where engineering time is the constraint, not budget.
  • You’ve already had one production incident that better tooling would have caught (plan flip, hot lock chain, slow regression).
  • The platform engineer who’d build it is also your only Kubernetes person, only CI/CD person, and only secrets-management person. They will not maintain four things well at once.

FAQ

Can I keep Grafana and use Obsfly as the data source?
Yes. Obsfly exposes a Prometheus-compatible /metrics endpoint. Your existing Grafana dashboards keep working; you get richer series (per-query digests, plan-flip events, forecast bands) added to them.
What about Grafana Cloud Database Observability?
It's a thin layer on top of the same exporter approach plus k6 traces. Useful for AWS-native shops already deep in Grafana Cloud, but the depth gap to a real DBM (plan history, lock chains, AI) is similar.
Doesn't postgres_exporter already capture pg_stat_statements?
Some forks expose subsets of pg_stat_statements as Prometheus metrics. None do it well — the dimensionality blows up Prometheus storage, and you lose the actual query text in the digest. You end up writing a separate digest collector either way.
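
A minimal sketch of what such a separate digest collector does: normalize the SQL, hash it into a short digest, use the digest as the low-cardinality label, and keep the query text in a side store. The regexes and the DigestCollector class are hypothetical and deliberately simplified; a real collector would more likely build on the queryid that pg_stat_statements already provides.

```python
import hashlib
import re

_PLACEHOLDER = re.compile(r"\$\d+")                      # $1, $2 from pg_stat_statements
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")          # 'paid', 'it''s'
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")       # 42, 3.14 (not digits inside names)
_IN_LIST = re.compile(r"\(\s*\?(?:\s*,\s*\?)+\s*\)")     # (?, ?, ?) -> (?)
_WHITESPACE = re.compile(r"\s+")


def normalize_query(sql):
    """Replace literals with '?' and collapse whitespace so equal shapes compare equal."""
    text = _PLACEHOLDER.sub("?", sql)
    text = _STRING_LITERAL.sub("?", text)
    text = _NUMBER_LITERAL.sub("?", text)
    text = _IN_LIST.sub("(?)", text)
    return _WHITESPACE.sub(" ", text).strip().lower()


class DigestCollector:
    """Keeps query text outside the metrics store, keyed by a short digest."""

    def __init__(self):
        self.texts = {}   # digest -> normalized query text
        self.calls = {}   # digest -> accumulated call count

    def record(self, sql, calls=1):
        normalized = normalize_query(sql)
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]
        self.texts.setdefault(digest, normalized)
        self.calls[digest] = self.calls.get(digest, 0) + calls
        return digest


collector = DigestCollector()
first = collector.record("SELECT * FROM orders WHERE id = 42 AND status = 'paid'")
second = collector.record("SELECT *  FROM orders WHERE id = 7 AND status = 'open'")
print(first == second)          # True
print(collector.texts[first])   # select * from orders where id = ? and status = ?
```

Only the 12-character digest would go into a Prometheus label; the text lookup lives elsewhere, which is how a collector avoids both the cardinality problem and the loss of query text described above.
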
What's the cheapest way to validate the build estimate?
Ask your most senior platform engineer to scope a one-pager for capability #5 only (anomaly detection per metric). If they come back with less than 2 weeks, their scope is probably too optimistic; if they come back with more than 4 weeks, it is probably too pessimistic. The honest scope is somewhere in between.

Watch your databases the way you watch your services.

Book a 30-minute demo. We'll spec your fleet together and quote your first 30-day deal.