AI for database query optimization: what's real in 2026 (and what's not)
Two years of shipping LLM-grounded query analysis to production databases. What AI is genuinely good at, what it's bad at, why grounding beats model size, and how BYO LLM works in regulated deployments.
Every observability vendor has “AI” on the slide deck in 2026. Most of it is marketing. Some of it is genuinely useful. This post is about telling the difference, and about what we’ve learned shipping LLM-grounded query analysis to production databases.
What AI is good at, in DB observability
After two years of shipping AI features against real Postgres / MySQL workloads, the useful applications cluster into three categories:
- Plan narration. Given an EXPLAIN ANALYZE plan and a schema, an LLM can explain why a particular plan is slow, in plain language a senior engineer would write. The signal-to-noise on this is high because the plan is structured input — there’s little room to hallucinate.
- Index recommendations grounded in the workload. Given the actual queries from pg_stat_statements and current indexes, an LLM can propose new indexes (and warn about ones it’d shadow). With grounding, the suggestions are usually right; without it, they’re generic textbook answers.
- Query rewrites. Specifically: pulling out scalar subqueries, replacing NOT IN with NOT EXISTS, suggesting SELECT column lists instead of SELECT *. These rewrites are mechanical, the LLM gets them right ~95% of the time, and the rare wrong ones are obvious in a diff.
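To make the third category concrete, here is the before/after shape of one of those mechanical rewrites (NOT IN → NOT EXISTS), shown as SQL strings in Python. The table and column names are hypothetical, chosen only to illustrate the pattern.

```python
# Illustrative only: one mechanical rewrite class. Table/column names
# (orders, churned_customers) are hypothetical.

before = """
SELECT o.id
FROM orders o
WHERE o.customer_id NOT IN (SELECT c.id FROM churned_customers c)
"""

# NOT IN returns no rows if the subquery yields a NULL; NOT EXISTS has
# the intended semantics and typically plans as an anti-join.
after = """
SELECT o.id
FROM orders o
WHERE NOT EXISTS (
    SELECT 1 FROM churned_customers c WHERE c.id = o.customer_id
)
"""

print(before.strip())
print(after.strip())
```

This is also why the wrong ones are obvious in a diff: the rewrite is local, and the intent of the predicate is unchanged.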
What it’s bad at
The list is shorter but more important:
- Predicting whether a recommended index will help. Predicting query performance from a plan + index proposal requires the optimizer’s cost model. LLMs don’t have it. They’ll happily say “this will reduce execution time by 40%” when the actual answer is “maybe”.
- Anomaly detection. An LLM cannot beat a seasonal forecast plus change-point detection on time-series metrics. We tried; it didn’t work; we shipped Prophet + BOCPD instead.
- Prioritization in incidents. “Which of these 14 alerts should I look at first?” is a question about your team, your SLOs, and your business. The LLM doesn’t know any of those.
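The forecast-plus-change-point approach mentioned above is worth a sketch, because it shows why an LLM loses here. The real pipeline uses Prophet forecasts and Bayesian change-point detection; this stdlib toy only conveys the shape of the idea — a seasonal baseline per hour of day, flagging points outside a band — with made-up latency numbers.

```python
from statistics import mean, stdev

# Toy stand-in for a seasonal forecast band. Production uses Prophet +
# BOCPD; this only shows why a static "p99 over X ms" threshold loses:
# the same value can be normal at noon and anomalous at 3am.

def anomalies(history, current, k=3.0):
    """history: {hour: [p99 samples]}, current: {hour: p99}. Returns flagged hours."""
    flagged = []
    for hour, value in current.items():
        samples = history[hour]
        mu, sigma = mean(samples), stdev(samples)
        if abs(value - mu) > k * sigma:
            flagged.append(hour)
    return flagged

# Synthetic history: ~130ms p99 during business hours, ~100ms overnight.
history = {h: [100 + (30 if 9 <= h <= 17 else 0) + d for d in (-5, 0, 5, -3, 3)]
           for h in range(24)}
current = dict.fromkeys(range(24), 130)  # 130ms everywhere right now

print(anomalies(history, current))  # flags only the off-peak hours
```

A single fixed threshold would either fire all night or miss the daytime regressions; the seasonal band fires only where 130ms is actually abnormal.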
Why grounding matters more than the model
The biggest determinant of output quality is not which model you use — it’s what you feed it. A small model with the right context (schema, top queries, current plan, recent wait events) outperforms a frontier model with no context. We see this consistently in our own benchmarks.
The minimum useful context for query analysis:
- The query itself (normalized and original).
- The current EXPLAIN plan with rows, costs, and actual times.
- The schema of every table the query touches: columns, indexes, row counts.
- Recent runtime stats from pg_stat_statements: calls, mean, p99, plan flips.
- Cluster context: Postgres version, settings that affect the planner (work_mem, random_page_cost).
Skip any of those and the suggestions degrade. Send all of them and the suggestions are usually actionable on the first read.
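The five items above can be pictured as one payload handed to the model per analysis. A minimal sketch of that assembly follows; the field names and sample values are hypothetical, not our actual request schema.

```python
import json

# Hypothetical shape of the grounding payload for one query analysis.
# Field names and values are illustrative, not a real API.

def build_context(query, plan, tables, stats, cluster):
    return json.dumps({
        "query": query,            # normalized + original text
        "explain": plan,           # EXPLAIN (ANALYZE, BUFFERS) output
        "schema": tables,          # columns, indexes, row counts per table
        "runtime_stats": stats,    # from pg_stat_statements
        "cluster": cluster,        # version + planner-relevant settings
    }, indent=2)

payload = build_context(
    query={"normalized": "SELECT * FROM orders WHERE customer_id = $1",
           "original": "SELECT * FROM orders WHERE customer_id = 42"},
    plan="Index Scan using orders_customer_id_idx on orders ...",
    tables=[{"name": "orders", "rows": 12_000_000,
             "indexes": ["orders_pkey", "orders_customer_id_idx"]}],
    stats={"calls": 48210, "mean_ms": 3.1, "p99_ms": 240.0, "plan_flips": 2},
    cluster={"version": "16.4", "work_mem": "64MB", "random_page_cost": 1.1},
)
print(payload)
```

Each field maps to one bullet in the list; dropping any of them is what turns grounded suggestions back into textbook answers.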
BYO LLM in regulated deployments
For regulated customers — fintech, healthcare, gov — sending raw query text to a public LLM is a non-starter: it would mean account numbers and PII flowing to a third party. Two patterns work:
- Customer-hosted LLM endpoint. They run a model in their VPC (Bedrock, Vertex, a self-hosted Llama). The DBM tool calls their endpoint with the query text, gets back a response, and never persists the request. This is the dominant pattern in BYOC deployments.
- Aggressive on-prem normalization. Strip every literal value before sending. WHERE email = 'alice@example.com' becomes WHERE email = $1. Combined with column-name redaction, this works for a long tail of customers who can’t bring their own model but still need analysis.
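The normalization in the second pattern can be sketched with a couple of regexes. This is deliberately naive — a production normalizer parses the SQL rather than pattern-matching it — but it conveys what leaves the customer’s network.

```python
import re

# Naive literal stripping before anything leaves the customer's network.
# A production normalizer parses the SQL; this regex sketch is illustrative only.

def normalize(sql: str) -> str:
    counter = 0
    def repl(_match):
        nonlocal counter
        counter += 1
        return f"${counter}"
    # Single-quoted string literals first (handles '' escapes),
    # then bare numeric literals not already part of a $n placeholder.
    sql = re.sub(r"'(?:[^']|'')*'", repl, sql)
    sql = re.sub(r"(?<!\$)\b\d+(?:\.\d+)?\b", repl, sql)
    return sql

print(normalize(
    "SELECT * FROM users WHERE email = 'alice@example.com' AND plan_id = 7"
))
# -> SELECT * FROM users WHERE email = $1 AND plan_id = $2
```

The real pipeline also redacts identifier names against an allowlist, since column names themselves can leak domain information.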