SQL Server
SQL Server Query Store: the field guide most teams skip
Query Store is the single biggest reason a SQL Server upgrade past 2016 was worth the weekend. The settings that matter, the DMVs you actually use, and how to catch a plan regression in two queries.
Query Store is the single biggest reason a SQL Server upgrade past 2016 is worth the weekend. It records every plan, every execution, every regression — and it’s built in. Most teams either don’t turn it on, or turn it on and never look at it. Here’s the field guide.
On this page
What Query Store actually captures
Think of Query Store as a black box flight recorder for the optimizer. For every query signature it sees, it stores: each plan it’s ever compiled, runtime stats per plan (CPU, IO, duration, memory grant, dop), and wait stats. The killer feature is that this survives restarts — when a plan flips at 4am because the optimizer recompiled with a different parameter, Query Store still has both plans and the runtime stats for each.
Turn it on (the right settings)
Defaults are conservative. Two changes pay for themselves immediately:
ALTER DATABASE [OrdersDB] SET QUERY_STORE = ON
(
OPERATION_MODE = READ_WRITE,
QUERY_CAPTURE_MODE = AUTO, -- ignore one-shot ad-hocs
MAX_STORAGE_SIZE_MB = 2048, -- default 100 MB is far too small
INTERVAL_LENGTH_MINUTES = 15, -- the runtime stats bucket
STALE_QUERY_THRESHOLD_DAYS = 30,
SIZE_BASED_CLEANUP_MODE = AUTO
);QUERY_CAPTURE_MODE = AUTOfilters out ad-hoc single-execution queries that fill Query Store with noise.ALLcaptures everything; only use it when actively debugging.MAX_STORAGE_SIZE_MB = 2048at minimum. The default 100 MB will fill in hours on a busy database, after which Query Store flips to read-only and stops capturing.INTERVAL_LENGTH_MINUTES = 15gives you tight resolution on regressions without exploding storage. 60 is the default; 5 is overkill for most shops.
Catching plan regressions
The classic SQL Server pain pattern: a query that ran in 80 ms yesterday is taking 8 s today. Same SQL, same data shape — different plan. Query Store makes this trivial to diagnose:
-- Find queries with multiple plans where the new plan is dramatically slower
SELECT TOP 20
q.query_id,
qt.query_sql_text,
p1.plan_id AS old_plan,
p2.plan_id AS new_plan,
rs1.avg_duration / 1000.0 AS old_avg_ms,
rs2.avg_duration / 1000.0 AS new_avg_ms,
(rs2.avg_duration - rs1.avg_duration) * 1.0 / rs1.avg_duration AS regression_ratio
FROM sys.query_store_query q
INNER JOIN sys.query_store_query_text qt ON qt.query_text_id = q.query_text_id
INNER JOIN sys.query_store_plan p1 ON p1.query_id = q.query_id
INNER JOIN sys.query_store_plan p2 ON p2.query_id = q.query_id AND p2.plan_id <> p1.plan_id
INNER JOIN sys.query_store_runtime_stats rs1 ON rs1.plan_id = p1.plan_id
INNER JOIN sys.query_store_runtime_stats rs2 ON rs2.plan_id = p2.plan_id
WHERE rs2.last_execution_time > DATEADD(HOUR, -2, GETUTCDATE())
AND rs1.last_execution_time < DATEADD(HOUR, -2, GETUTCDATE())
AND rs2.avg_duration > rs1.avg_duration * 3
ORDER BY regression_ratio DESC;Forcing and unforcing plans
When you find a regression, you have two options: fix the underlying cause (stale statistics, parameter sniffing, schema change), or pin the good plan. Pinning is a one-statement operation:
-- Force the old (good) plan EXEC sp_query_store_force_plan @query_id = 4218, @plan_id = 9991; -- Unforce later when the underlying issue is resolved EXEC sp_query_store_unforce_plan @query_id = 4218, @plan_id = 9991;
DMVs you should know
The four-table mental model
sys.query_store_query_text— the actual SQL text.sys.query_store_query— the query signature plus context (database, schema, query_hash).sys.query_store_plan— every plan ever compiled for that signature.sys.query_store_runtime_stats— execution stats per plan, bucketed by interval.
FAQ
Does Query Store work on Azure SQL?+
How much storage will Query Store actually use?+
Should we capture wait stats too?+
What about Extended Events?+
Keep reading
Postgres
EXPLAIN ANALYZE for Postgres: read every line in 2026
The vocabulary that turns a query plan from a wall of text into a story. Costs, rows, loops, buffers, timing — what each means in 2026 (Pg 16+), and the four anti-patterns to spot in five seconds.
Oracle
Oracle ASH and AWR: a field guide for the rest of us
ASH and AWR are two of the best performance views any database has shipped — but they sit behind the Diagnostics Pack license. This is how to get 90% of the value, with or without it.
AI
Anomaly detection on database metrics: why thresholds fail and what works
A walk through forecast bands, change-point detection, multi-variate anomaly, and the seasonality math that makes 'p99 over 200ms' the wrong alert by default — with the Postgres example that broke our last threshold.