Anomaly detection for database metrics: why thresholds fail and what actually works
Forecast bands, change-point detection, multivariate anomalies, and the seasonality math that makes 'p99 > 200ms' a wrong alert by default.
Threshold alerts are a 2010 idea that observability teams keep paying for in 2026. p99 over 200ms, connections over 80% of max, error rate over 1%. They generate noise when nothing is wrong, and they’re silent when something is. This post is about why thresholds fail on database metrics specifically, and what to use instead.
The threshold problem
Pick any DB metric: query latency, IOPS, connection count, replication lag. Plot it for a week. You see daily and weekly seasonality, regime changes when releases ship, and tail spikes that aren’t actually problems. A threshold draws one horizontal line through all of that.
Two failure modes:
- False positives. Latency at 14:00 daily is naturally 2× the 03:00 baseline. Your 200ms threshold pages every weekday at lunch.
- False negatives. A regression doubles latency from 30ms to 60ms — both still under 200ms. The threshold says everything is fine.
Why naive forecasting also fails
The first thing observability teams reach for is “a moving average ± 3σ.” That fails too. Three reasons:
- Seasonality. A 1-hour moving average can’t tell you that 14:00 is normally high. It just averages and gets confused.
- Regime changes. Push a release that doubles QPS — the moving average catches up in an hour and stops alerting on the new regime, even if the new regime is broken.
- Long-tail metrics. Latency has a power-law tail; σ from a Gaussian model doesn’t fit. You get spurious 5σ alerts when reality was just ordinary heavy-tail behavior.
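You can see all three failure modes in a few lines of numpy. The sketch below (synthetic data, illustrative numbers) simulates a week of minute-level latency with a daily cycle and lognormal noise, then runs the naive "60-minute moving average ± 3σ" detector over it. Everything in the series is normal, and the detector still fires over and over:

```python
import numpy as np

rng = np.random.default_rng(0)

# One week of minute-level latency: 30ms base, roughly doubling mid-day,
# plus heavy-tailed (lognormal) noise -- all of it "normal" behavior.
t = np.arange(7 * 1440)
daily = 30 * (1 + np.clip(np.sin(2 * np.pi * (t % 1440 - 480) / 1440), 0, None))
latency = daily + rng.lognormal(mean=0.0, sigma=1.0, size=t.size)

# Naive detector: 60-minute moving average ± 3 standard deviations.
window = 60
alerts = 0
for i in range(window, t.size):
    hist = latency[i - window:i]
    mu, sigma = hist.mean(), hist.std()
    if abs(latency[i] - mu) > 3 * sigma:
        alerts += 1

print(alerts)  # fires repeatedly on data that is entirely normal
```

The Gaussian ±3σ band simply doesn't describe a heavy-tailed, seasonal series, so ordinary points keep landing outside it.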
What actually works
Three modeling families, used together:
1. Forecasts with seasonality
Prophet (Facebook), STL decomposition, ETS, or ETSformer. They fit a model that’s aware of daily / weekly / business-day seasonality. The output isn’t a number — it’s an interval [lower, upper] at a given quantile. Reality outside the interval is the anomaly.
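The simplest seasonality-aware "forecast" needs no library at all: slice history into whole periods and take per-slot quantiles. The sketch below is a stand-in for Prophet/STL, not their API; the function names `seasonal_band` and `is_anomalous` are my own:

```python
import numpy as np

def seasonal_band(history, period, lo_q=0.01, hi_q=0.99):
    """Per-slot quantile band: the simplest seasonality-aware interval.

    history: 1-D array covering several whole periods (e.g. weeks of
    hourly points with period=168). Returns (lower, upper) arrays of
    length `period` -- the expected envelope for each slot of the cycle.
    """
    slots = history[: len(history) // period * period].reshape(-1, period)
    lower = np.quantile(slots, lo_q, axis=0)
    upper = np.quantile(slots, hi_q, axis=0)
    return lower, upper

def is_anomalous(value, slot, lower, upper):
    """A point is anomalous when it leaves the band *for its slot*."""
    return value < lower[slot] or value > upper[slot]
```

A value that is high in absolute terms but normal for its hour of day stays inside the band; the same value at 03:00 falls outside it. That is exactly the behavior a flat threshold can't express.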
2. Change-point detection (BOCPD)
Bayesian Online Change-Point Detection finds the moments your time series shifted regimes. Useful because anomalies often start at a change point. CPD answers “when did this start being weird?” rather than “is it weird right now?”
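Full BOCPD maintains a posterior over run lengths; a minimal online cousin that captures the same "when did this start?" idea is a two-sided CUSUM. The sketch below is a CUSUM detector, not BOCPD itself, and `cusum_changepoints` is an illustrative name:

```python
import numpy as np

def cusum_changepoints(x, mu0, sigma, drift=0.5, threshold=5.0):
    """Two-sided CUSUM: flag indices where accumulated evidence of a
    mean shift (in units of sigma) exceeds `threshold`.

    mu0/sigma describe the pre-change regime; `drift` is the shift size
    (in sigmas) you want to stay insensitive to.
    """
    hi = lo = 0.0
    points = []
    for i, v in enumerate(x):
        z = (v - mu0) / sigma
        hi = max(0.0, hi + z - drift)   # evidence of an upward shift
        lo = max(0.0, lo - z - drift)   # evidence of a downward shift
        if hi > threshold or lo > threshold:
            points.append(i)
            hi = lo = 0.0               # reset after a detection
    return points
```

On a series that jumps from mean 0 to mean 3, this flags an index just after the jump and stays quiet before it, which is the answer you want from change-point detection: a timestamp, not a per-point verdict.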
3. Robust EWMA fallback
For low-volume metrics where you don’t have weeks of training data, an EWMA + MAD (median absolute deviation) is robust to outliers and works from day one. Use it as the floor while the seasonal model is still warming up.
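A minimal version of that fallback, with illustrative names and parameters: an EWMA tracks the level, a trailing MAD provides an outlier-resistant scale, and anomalous points are excluded from the level update so a spike can't drag the baseline with it.

```python
import numpy as np

def robust_ewma_detector(x, alpha=0.1, k=6.0, warmup=10):
    """EWMA level + trailing MAD scale: works from the first few points.

    MAD (median absolute deviation) is robust: one spike in the window
    barely moves the median, so the scale doesn't inflate the way a
    standard deviation would.
    """
    level = x[0]
    anomalies = []
    for i in range(1, len(x)):
        window = x[max(0, i - 60):i]
        mad = np.median(np.abs(window - np.median(window)))
        scale = 1.4826 * mad + 1e-9   # 1.4826 ~ MAD-to-sigma for a Gaussian
        if i >= warmup and abs(x[i] - level) > k * scale:
            anomalies.append(i)       # flag, and don't learn from the outlier
        else:
            level = alpha * x[i] + (1 - alpha) * level
    return anomalies
```

Because both the level and the scale are robust, this is usable from day one and degrades gracefully; swap it out once the seasonal model has enough history.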
Forecast bands vs threshold lines
A threshold is a flat line. A forecast band is a tube around the predicted future, computed at every timestamp from the model’s posterior. When reality leaves the tube, you alert. When the tube widens (the model is uncertain), you alert less aggressively.
```python
# pseudo-code made concrete: the math you actually want
def band_alerts(forecast_model, observed, horizon, z_alpha=2.58):
    """Alert when reality leaves the forecast band (z_alpha=2.58 ~ 99%)."""
    anomalies = []
    for t in range(horizon):
        mu, sigma = forecast_model.predict(t)      # per-timestamp posterior
        lower = mu - z_alpha * sigma
        upper = mu + z_alpha * sigma
        if observed[t] < lower or observed[t] > upper:
            severity = abs(observed[t] - mu) / sigma   # z-score as severity
            anomalies.append((t, severity))
    return anomalies
```
This trivially handles the "14:00 is naturally high" case — the band rises with it.
Multi-variate anomalies
Some incidents are visible only in the joint distribution. CPU is fine. IOPS is fine. The ratio of the two — IOPS / CPU — is broken because the workload mix changed.
Modeling per-metric and OR-ing the alerts misses these. You need a model that ingests multiple metrics together — even a simple linear regression of one on the others is enough to flag joint shifts.
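Even that simple linear regression catches the IOPS/CPU example. The sketch below (the function name is illustrative) fits IOPS as a linear function of CPU, then flags points whose residual is far outside the robust spread of residuals — i.e. points where the *relationship* broke even though each metric looks fine on its own:

```python
import numpy as np

def joint_residual_anomalies(cpu, iops, k=4.0):
    """Regress iops on cpu; flag points whose residual exceeds k robust sigmas."""
    X = np.column_stack([np.ones_like(cpu), cpu])
    beta, *_ = np.linalg.lstsq(X, iops, rcond=None)   # ordinary least squares
    resid = iops - X @ beta
    mad = np.median(np.abs(resid - np.median(resid)))
    scale = 1.4826 * mad + 1e-9
    return np.flatnonzero(np.abs(resid) > k * scale)
```

A point where CPU is mid-range but IOPS is half of what the fitted relationship predicts gets flagged, even though both values are individually inside their own bands.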
Seasonality is most of the data
On any DB metric you care about, daily and weekly cycles explain > 80% of variance. A model that ignores them is mostly modeling noise. The math is well-understood — Fourier basis, STL, dummy variables for weekend/holidays — and there’s no excuse to ship without it in 2026.
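The Fourier-basis approach mentioned above fits in a dozen lines: build sin/cos columns at the seasonal period, solve with least squares, and check how much variance the fit explains. The sketch uses synthetic hourly data with a daily cycle; the exact variance fraction depends on the noise level:

```python
import numpy as np

def fourier_design(t, period, order=3):
    """Design matrix [1, sin(kwt), cos(kwt), ...] for one seasonal cycle."""
    cols = [np.ones_like(t, dtype=float)]
    for k in range(1, order + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

# Fit a daily cycle to 4 weeks of hourly data with plain least squares.
rng = np.random.default_rng(5)
t = np.arange(24 * 28)
y = 50 + 20 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 2, t.size)
X = fourier_design(t, period=24)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
explained = 1 - np.var(y - fitted) / np.var(y)
print(round(explained, 3))  # the daily cycle dominates the variance
```

Weekly seasonality is the same trick with period=168, and weekend/holiday effects are one dummy column each — none of it requires a heavyweight model.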
The feedback loop nobody builds
Every anomaly alert should have a thumbs-up / thumbs-down button. Operators know whether a given alert was real or noise. Feed that back into a per-tenant fine-tuning step and your model gets sharper every week.
It’s the difference between a generic anomaly model and a model that’s right about your fleet. We do this in Obsfly Anomaly Detection: every alert ships with a feedback control, and the per-tenant model retrains nightly.
FAQ
How long until a forecast model is useful?
What about deploys — they look like anomalies but aren't.
Can I bring my own anomaly model?
What about pure ML models — do you use them?