DB capacity forecasts that page 30 days ahead
Linear regression isn't enough. ARIMA is overkill. Prophet works if you know which exogenous variables to feed it. A practical recipe for 30-day forecasts.
Linear extrapolation pages you 12 hours before the disk fills. That is useful for the on-call and useless for provisioning: you can't get an EBS volume resized cleanly with 12 hours' notice on a Saturday. The forecast you actually want pages 30 days out, with math solid enough to trust. Here's what works.
Why linear extrapolation fails
- Seasonality. Disk grows faster on weekdays. A linear fit on 7 days extrapolates the wrong slope.
- Step changes. Last week’s deploy doubled write rate. Linear fit smooths it; the real curve has a kink.
- Non-stationarity. Growth rate itself changes over time (acquisition spike, seasonal product launch).
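To make the seasonality point concrete, here is a small sketch with made-up numbers (10 GB/day on weekdays, 2 GB/day on weekends; these figures are assumptions, not measurements). It fits an ordinary least-squares line to 7-day windows ending on different weekdays and compares each slope with the true weekly average.

```python
# Illustrative only: the growth figures are invented for the example.
DAILY_GROWTH_GB = [10, 10, 10, 10, 10, 2, 2]  # Mon..Sun


def usage_series(days, start_gb=0.0):
    """End-of-day disk usage in GB; day 0 is a Monday."""
    used, series = start_gb, []
    for day in range(days):
        used += DAILY_GROWTH_GB[day % 7]
        series.append(used)
    return series


def ols_slope(ys):
    """Least-squares slope of ys against x = 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def window_ending(series, end_index, length=7):
    """The `length` points up to and including end_index."""
    return series[end_index - length + 1 : end_index + 1]


series = usage_series(28)
true_rate = sum(DAILY_GROWTH_GB) / 7               # weekly average, GB/day
slope_fri = ols_slope(window_ending(series, 25))   # window runs Sat..Fri
slope_tue = ols_slope(window_ending(series, 22))   # window runs Wed..Tue

print(f"true {true_rate:.2f}, fit ending Fri {slope_fri:.2f}, "
      f"fit ending Tue {slope_tue:.2f} GB/day")
```

With these numbers the Friday-ending fit overestimates the weekly average and the Tuesday-ending fit underestimates it, so the projected fill date shifts depending on which day the rule happens to evaluate.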
The model stack that works
Three models, ensemble:
- Prophet for the seasonality + holiday + changepoint backbone. Fast to fit per-series, robust on noisy data.
- ETSformer / N-BEATS for high-cardinality scenarios where you fit thousands of series. Deep models like these (ETSformer is transformer-based, N-BEATS is built from fully connected blocks) handle long histories better than Prophet.
- Linear baseline as a safety floor. If the ensemble disagrees with linear by > 3×, alert the operator that something needs review.
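The safety-floor check might look like the sketch below. It rests on assumptions the text does not spell out: the models are compared on forecast growth over the horizon (GB added in 30 days), the ensemble is a plain mean, and the function names and sample values are hypothetical.

```python
from statistics import mean


def ensemble_growth(model_growths):
    """Combine per-model forecasts of growth over the horizon (e.g. GB in 30 days).

    Assumption: a plain mean; the combiner is a free choice.
    """
    return mean(model_growths)


def needs_review(ensemble, linear, max_ratio=3.0):
    """True when ensemble and linear baseline disagree by more than max_ratio."""
    if ensemble <= 0 or linear <= 0:
        # Flat or shrinking forecasts make the ratio meaningless; escalate.
        return True
    ratio = max(ensemble, linear) / min(ensemble, linear)
    return ratio > max_ratio


# Hypothetical 30-day growth forecasts in GB.
prophet_gb, deep_gb, linear_gb = 120.0, 140.0, 40.0
combined = ensemble_growth([prophet_gb, deep_gb])
if needs_review(combined, linear_gb):
    print(f"review: ensemble {combined:.0f} GB vs linear {linear_gb:.0f} GB")
```

The ratio is taken in both directions, so an ensemble far below the linear baseline gets flagged as well as one far above it.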
```python
# Prophet recipe: minimum viable forecast
from prophet import Prophet

# history_df needs columns ds (timestamp), y (metric) and deploy_count
m = Prophet(
    daily_seasonality=True,   # only meaningful with sub-daily history
    weekly_seasonality=True,
    yearly_seasonality='auto',
    changepoint_prior_scale=0.05,  # tune up for spiky workloads
    seasonality_prior_scale=10,
)
m.add_country_holidays(country_name='US')
m.add_regressor('deploy_count')  # exogenous
m.fit(history_df)

future = m.make_future_dataframe(periods=30, freq='D')
# predicted_deploys is your own helper: one value per row of `future`
future['deploy_count'] = predicted_deploys(future)
fcst = m.predict(future)
# Use yhat_lower / yhat_upper bounds, not yhat alone; the band is what alerts.
```

Exogenous variables that move the needle
- Day-of-week / business-day flag — the single biggest accuracy gain.
- Holidays — country-specific. Black Friday, Lunar New Year, regional holidays for B2C.
- Deploy events. Regimes change after deploys. Inject them as event markers, either as a regressor like deploy_count in the recipe above or as custom “holidays” in Prophet.
- Marketing campaign flags — if the team can post events to the metrics pipeline, you get free correlation.
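The flags above can be built with a few lines of standard-library Python. This is a sketch under assumptions: the column names `is_business_day` and `campaign_active`, the two-day deploy window, and the dates are all hypothetical, and the business-day flag here ignores public holidays.

```python
from datetime import date, timedelta


def feature_rows(start, days, campaign_days=frozenset()):
    """One dict per day with regressor columns; `ds` matches Prophet's date column."""
    rows = []
    for offset in range(days):
        ds = start + timedelta(days=offset)
        rows.append({
            "ds": ds,
            "is_business_day": int(ds.weekday() < 5),   # Mon=0 .. Fri=4
            "campaign_active": int(ds in campaign_days),
        })
    return rows


def deploy_holiday_rows(deploy_dates, effect_days=2):
    """Rows in the shape Prophet expects for a custom holidays frame."""
    return [
        {"holiday": "deploy", "ds": d, "lower_window": 0, "upper_window": effect_days}
        for d in sorted(deploy_dates)
    ]


# Hypothetical dates for illustration.
rows = feature_rows(date(2024, 3, 1), 7, campaign_days={date(2024, 3, 4)})
deploys = deploy_holiday_rows([date(2024, 3, 5)])
```

Wrapped in `pandas.DataFrame(...)`, the first list supplies regressor columns for both the history and future frames (each registered with `m.add_regressor`), and the second can be passed as `Prophet(holidays=...)`. A holiday window models a temporary effect around the event; a lasting shift in growth rate is left to Prophet's changepoints.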
Evaluating forecasts honestly
- Backtest with a sliding window. Fit on weeks 1-4, predict week 5, score. Slide forward a week and repeat.
- Score with MAPE for level metrics (disk, connections), SMAPE for noisier metrics (QPS).
- Track calibration of bands: if your 90% interval contains the actual value 70% of the time, your bands are too narrow.
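The three checks above fit in a few small functions. This is a minimal sketch: the sMAPE formula is one common definition among several, and the fold sizes in the example (28 days of training, 7 days of horizon) are assumptions chosen to mirror the weeks 1-4 / week 5 split.

```python
def mape(actual, forecast):
    """Mean absolute percentage error; undefined when an actual is zero."""
    return sum(abs(f - a) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual, forecast):
    """Symmetric MAPE, one common definition: 2|f - a| / (|a| + |f|)."""
    return sum(
        2 * abs(f - a) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)
    ) / len(actual)


def coverage(actual, lower, upper):
    """Fraction of actuals inside [lower, upper]; compare with the nominal level."""
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper))
    return hits / len(actual)


def sliding_folds(n_points, train, horizon, step):
    """Yield (train_slice, test_slice) pairs that slide forward by `step`."""
    start = 0
    while start + train + horizon <= n_points:
        yield slice(start, start + train), slice(start + train, start + train + horizon)
        start += step


# 8 weeks of daily points: fit on 4 weeks, score the next week, slide by a week.
folds = list(sliding_folds(n_points=56, train=28, horizon=7, step=7))
```

For each fold, fit on `series[train_slice]`, predict the next `horizon` points, and score against `series[test_slice]`. If `coverage` for a nominal 90% band comes back well below 0.9, the bands are too narrow.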
Alerting on forecast breaches
The alert isn’t “disk is full.” It’s “disk will be full in N days.”
```python
# Pseudo-rule: forecast, page, host and disk_total are placeholders
days = forecast.days_until_threshold(metric='disk_used',
                                     threshold=0.85 * disk_total)
if days is not None and days <= 30:
    page(severity='warning',
         message=f"Disk on {host} forecast to breach 85% in {days} days")
```

Useful: tier severity by lead time. 30-day = warning (planning), 7-day = high (provision now), 24h = critical (page on-call).
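The lead-time tiers can be sketched as two small functions. The assumptions: the forecast is a list of daily upper-band values (such as Prophet's `yhat_upper`) expressed as a fraction of disk capacity, and the function names and sample values are hypothetical.

```python
def days_until_breach(upper_band, threshold):
    """1-based day on which the upper band first reaches threshold, else None."""
    for day, value in enumerate(upper_band, start=1):
        if value >= threshold:
            return day
    return None


def severity_for(days):
    """Map lead time to the tiers from the text; None means no alert."""
    if days is None or days > 30:
        return None
    if days <= 1:
        return "critical"   # page on-call
    if days <= 7:
        return "high"       # provision now
    return "warning"        # planning


# Hypothetical 5-day upper band, as a fraction of disk_total.
upper = [0.70, 0.75, 0.80, 0.86, 0.90]
lead = days_until_breach(upper, threshold=0.85)
print(lead, severity_for(lead))
```

Alerting on the upper band rather than the point forecast is the conservative choice: it fires earlier when the forecast is uncertain, which suits a planning alert.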