Build vs Buy
Grafana DBM build-vs-buy: what the 'we'll just use Prometheus' plan actually costs
postgres_exporter ships in an afternoon. Per-query digests, plan-flip detection, lock-chain graphs, and anomaly bands each cost 1–3 engineer-weeks. We measure the real build cost against Obsfly's $39 per database per month and tell you when each side wins.
The conversation goes like this. At standup someone says “we’ll just stand up Grafana with postgres_exporter and Prometheus, we already have the stack.” Nobody objects. Six months later you have a homemade DBM, you missed two plan flips that caused outages, and the platform engineer who built it is now its full-time owner.
This isn’t an attack on Grafana — Grafana is a great visualization tool and we run it ourselves. It’s a measurement of what “just” actually means when the goal is DBM-grade tooling.
What Grafana + postgres_exporter actually gives you
- Cluster-level counters: connections, transactions, buffer hit ratio, replication lag, dead tuple ratio, WAL position. Genuinely useful.
- A dashboard editor that is the best in the industry. Free to use, easy to share, plays with multiple data sources.
- Prometheus alerting on threshold rules. Adequate for the “disk full” class of alerts.
- A starter dashboard you can ship in an afternoon. This is the bait: it’s real, and it’s the cheapest part of the journey.
The 11 things you have to build
This is what postgres_exporter + Prometheus + Grafana does not ship:
| Capability | Build effort | What it involves |
|---|---|---|
| 1. Per-query top-N with percentiles | 1–2 engineer-weeks | pg_stat_statements polling, digest normalization, p50/p95/p99 computation, dashboard. |
| 2. Plan capture | 1–2 weeks | auto_explain configuration, log scraping or pg_stat_plans extension, storage backend. |
| 3. Plan-flip / regression detection | 1 week | structure-hash diff per signature over a window, alert when hash flips. |
| 4. Lock-chain / blocking-session graph | 1 week | pg_blocking_pids() traversal, graph layout, surfacing in dashboard. |
| 5. Anomaly detection per metric | 2–3 weeks | Pick algorithm (BOCPD / Prophet / STL), per-metric training, persistent state. |
| 6. Forecast bands (30/90/365 day) | 1–2 weeks | Seasonal decomposition, percentile bands, breach detection. |
| 7. Multi-database fan-out | 2 weeks per engine | MySQL Performance Schema, MongoDB profiler, Redis slowlog have different shapes. |
| 8. Alerting beyond thresholds | 1 week | Multi-variate, change-point, derived signals (cache hit + lock wait + qps). |
| 9. AI rewrite / index suggestion | 2–4 weeks | LLM integration, prompt engineering, plan-aware context assembly. |
| 10. Retention strategy | 1 week, then ongoing | Prometheus is not designed for 15-month retention. You’ll add Thanos or VictoriaMetrics. |
| 11. Maintenance + upgrade cycles | 0.25–0.5 FTE, ongoing | Exporter upgrades, breaking schema changes, dashboard drift, alert tuning. |
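To make item 3 concrete, here is a minimal sketch of a structure-hash diff in Python. It assumes plans arrive as parsed `EXPLAIN (FORMAT JSON)` nodes and that each query already has a signature (for example a `pg_stat_statements` query ID). The names `plan_shape`, `plan_hash`, and `FlipDetector` are hypothetical, and the sketch compares consecutive observations rather than a full time window, so treat it as an illustration of the idea, not a production detector.

```python
import hashlib
import json

# Keys that describe plan *structure*. Costs, row estimates and timings are
# deliberately ignored so that only a change of shape changes the hash.
STRUCTURAL_KEYS = ("Node Type", "Join Type", "Relation Name", "Index Name")


def plan_shape(node):
    """Reduce one EXPLAIN (FORMAT JSON) plan node to its structural skeleton."""
    shape = {key: node[key] for key in STRUCTURAL_KEYS if key in node}
    shape["Plans"] = [plan_shape(child) for child in node.get("Plans", [])]
    return shape


def plan_hash(plan):
    """Stable short hash of the plan skeleton (same shape gives same hash)."""
    canonical = json.dumps(plan_shape(plan), sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


class FlipDetector:
    """Remembers the last plan hash seen per query signature."""

    def __init__(self):
        self.last_hash = {}

    def observe(self, signature, plan):
        """Return (old_hash, new_hash) if the plan shape changed, else None."""
        new = plan_hash(plan)
        old = self.last_hash.get(signature)
        self.last_hash[signature] = new
        if old is not None and old != new:
            return (old, new)
        return None


# Illustrative plans (made-up values): same query, index scan vs. sequential scan.
index_scan = {"Node Type": "Index Scan", "Relation Name": "orders",
              "Index Name": "orders_pkey", "Total Cost": 8.3}
seq_scan = {"Node Type": "Seq Scan", "Relation Name": "orders",
            "Total Cost": 1540.0}

detector = FlipDetector()
first = detector.observe("q1", index_scan)                            # baseline
same = detector.observe("q1", {**index_scan, "Total Cost": 9.1})      # cost drift only
flip = detector.observe("q1", seq_scan)                               # shape changed
```

Hashing only structural keys is the design choice that matters: cost and row estimates move on every `ANALYZE`, so including them would turn every observation into a false flip. The capture, storage, and alert routing around this are where most of the quoted week goes.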
Build cost in engineer-weeks
Total greenfield build for a Postgres-only, single-engine DBM stack: 16–28 engineer-weeks. Multi-database (add MySQL + MongoDB) doubles it. Steady-state maintenance: 0.25–0.5 FTE, ongoing.
| Cost item (at a $180k loaded engineer cost) | Annual cost |
|---|---|
| Build (one-time, amortized over 2 years) | $36k–$63k / yr |
| Maintenance (steady-state) | $45k–$90k / yr |
| Total (DIY, Postgres-only) | $81k–$153k / yr |
| Total (DIY, +MySQL +MongoDB) | $150k–$280k / yr |
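The Postgres-only rows can be reproduced with a few lines of arithmetic. The sketch below assumes an engineer-week is priced at $180k spread over 40 productive weeks ($4,500 per week). That divisor is an assumption inferred from the build row, not a figure stated in the table. The function name `diy_annual_cost` is hypothetical.

```python
LOADED_COST = 180_000      # USD per engineer-year, from the table header
PRODUCTIVE_WEEKS = 40      # ASSUMPTION: inferred; reproduces the table's build row
AMORTIZE_YEARS = 2         # build cost is amortized over 2 years, per the table


def diy_annual_cost(build_weeks, maintenance_fte):
    """Annual DIY cost: amortized one-time build plus steady-state maintenance."""
    weekly_rate = LOADED_COST / PRODUCTIVE_WEEKS
    build = build_weeks * weekly_rate / AMORTIZE_YEARS
    maintenance = maintenance_fte * LOADED_COST
    return {"build": build, "maintenance": maintenance, "total": build + maintenance}


# Low and high ends of the Postgres-only estimate (16–28 weeks, 0.25–0.5 FTE).
low = diy_annual_cost(build_weeks=16, maintenance_fte=0.25)
high = diy_annual_cost(build_weeks=28, maintenance_fte=0.5)

print(low)   # build 36000.0, maintenance 45000.0, total 81000.0
print(high)  # build 63000.0, maintenance 90000.0, total 153000.0
```

Swapping in your own loaded cost, amortization period, and maintenance fraction is the quickest way to see how sensitive the total is. Maintenance, not the build, dominates at both ends of the range.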
Side-by-side with Obsfly
| Capability | Grafana + Prometheus + exporters | Obsfly |
|---|---|---|
| Top-N queries with p99 | Build (1–2 wk) | Out of box |
| Plan history / diff | Build (2–3 wk) | Out of box |
| Lock chains | Build (1 wk) | Out of box |
| Multi-variate anomaly | Build (2–3 wk) | Out of box |
| Forecast bands | Build (1–2 wk) | Out of box |
| Multi-DB fan-out | Build per engine | 9 engines covered |
| AI query rewrite | Build (2–4 wk) or skip | Built in (Claude) |
| BYOC / Sovereign | Self-host the OSS stack | First-class |
| Time to first useful insight | Weekend → quarter | 5 minutes |
| Cost (50-DB fleet, 2 yr) | $240k–$420k engineer time | $1,950/mo × 24 = $46,800 |
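The last row is simple enough to check and extend. The sketch below assumes a flat $39 per database per month with no tiers or discounts, and treats the DIY engineer-time figure as independent of fleet size. Both are simplifying assumptions, and the function names are hypothetical.

```python
import math

PRICE_PER_DB_MONTH = 39  # USD, the price quoted in this article; ASSUMPTION: flat, no tiers


def obsfly_cost(databases, months):
    """Subscription cost for a fleet over a period, at the flat per-database price."""
    return PRICE_PER_DB_MONTH * databases * months


def break_even_databases(diy_cost, months):
    """Smallest fleet size at which the subscription costs at least as much as DIY."""
    return math.ceil(diy_cost / (PRICE_PER_DB_MONTH * months))


two_year_50_dbs = obsfly_cost(databases=50, months=24)      # 46800, matches the table
low_break_even = break_even_databases(240_000, months=24)   # against the low DIY figure
high_break_even = break_even_databases(420_000, months=24)  # against the high DIY figure
```

Under these assumptions the subscription only reaches the table's DIY range at a fleet of several hundred databases. In practice DIY maintenance also grows with fleet size, so read the break-even as a rough upper bound on where the comparison tips.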
When DIY actually wins
- Single Postgres database, no plan history needed, no anomaly detection needed. The afternoon dashboard is genuinely enough.
- Strong infra team with slack capacity and an existing observability org that builds these things as a competence. The maintenance cost is internalized.
- Specific compliance requirements that no commercial vendor can meet — though check if BYOC or Sovereign solves it first, since we built both for exactly this case.
When DIY is a trap
- More than two database engines in the fleet.
- Series A–C team where engineering time is the constraint, not budget.
- You’ve already had one production incident that better tooling would have caught (plan flip, hot lock chain, slow regression).
- The platform engineer who’d build it is also your only Kubernetes person, only CI/CD person, and only secrets-management person. They will not maintain four things well at once.
FAQ
Can I keep Grafana and use Obsfly as the data source?
What about Grafana Cloud Database Observability?
Doesn't postgres_exporter already capture pg_stat_statements?
What's the cheapest way to validate the build estimate?