Database capacity forecasts that page 30 days ahead

Linear regression isn't enough. ARIMA is overkill. Prophet works if you know which exogenous variables to feed it. A practical recipe for 30-day forecasts.

Published 2026-03-28 · 11 min read

Linear extrapolation pages you 12 hours before the disk fills. Useful for the on-call, useless for provisioning: you can’t resize an EBS volume on a Saturday with 12 hours’ notice and have it land cleanly. The useful forecast is one that pages 30 days out, with the math right enough to trust. Here’s what works.

On this page

Why linear extrapolation fails
The model stack that works
Exogenous variables that move the needle
Evaluating forecasts honestly
Alerting on forecast breaches
FAQ

Why linear extrapolation fails

Seasonality. Disk grows faster on weekdays. A linear fit over a 7-day window returns a slope that depends on which weekday the window starts on, so it extrapolates the wrong rate.
Step changes. Last week’s deploy doubled write rate. Linear fit smooths it; the real curve has a kink.
Non-stationarity. Growth rate itself changes over time (acquisition spike, seasonal product launch).
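To make the seasonality point concrete, here is a minimal sketch on synthetic data. The growth figures (10 GB on weekdays, 2 GB on weekends) and the helper names are made up for illustration. It fits an ordinary least-squares line to every possible 7-day window and compares the slope with the true weekly average.

```python
def usage_series(n_days, start_gb=500.0, weekday_gb=10.0, weekend_gb=2.0):
    """Cumulative disk usage at the end of each day; day 0 is a Monday."""
    usage, total = [], start_gb
    for day in range(n_days):
        total += weekday_gb if day % 7 < 5 else weekend_gb
        usage.append(total)
    return usage


def ols_slope(ys):
    """Least-squares slope of ys against x = 0, 1, ..., n-1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def window_slope(series, start, length=7):
    """Slope of a linear fit over `length` days beginning at index `start`."""
    return ols_slope(series[start:start + length])


series = usage_series(28)
weekly_average = (5 * 10.0 + 2 * 2.0) / 7  # true long-run growth in GB/day

for start, label in enumerate(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]):
    print(f"window starting {label}: {window_slope(series, start):.2f} GB/day "
          f"(true average {weekly_average:.2f})")
```

On this toy series the fitted slope ranges from roughly 6.6 to 9.1 GB/day depending on the starting weekday, against a true average of about 7.7. Over a 30-day horizon that spread is tens of gigabytes, before any step change or trend shift is added.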

The model stack that works

Three models, ensemble:

Prophet for the seasonality + holiday + changepoint backbone. Fast to fit per-series, robust on noisy data.
ETSformer / N-BEATS for high-cardinality scenarios where you fit thousands of series. These deep models (ETSformer is a transformer, N-BEATS is a stack of fully connected blocks) handle long histories better than Prophet.
Linear baseline as a safety floor. If the ensemble disagrees with linear by > 3×, alert the operator that something needs review.
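One way the safety floor might look in code. The post does not say how the models are combined or what exactly is compared against the linear baseline, so two assumptions are made here: the ensemble is a point-wise median, and "disagrees by > 3×" is read as the ratio of forecast growth over the horizon. All function names are hypothetical.

```python
from statistics import median


def ensemble_forecast(model_forecasts):
    """Combine per-model forecasts point by point with the median (assumed combiner)."""
    return [median(points) for points in zip(*model_forecasts)]


def disagreement_ratio(ensemble_growth, linear_growth, eps=1e-9):
    """Ratio of the larger growth to the smaller one, always >= 1."""
    a, b = abs(ensemble_growth), abs(linear_growth)
    return max(a, b) / max(min(a, b), eps)


def needs_review(current, ensemble_end, linear_end, max_ratio=3.0):
    """True when the ensemble and the linear baseline differ by more than max_ratio."""
    ratio = disagreement_ratio(ensemble_end - current, linear_end - current)
    return ratio > max_ratio


# Disk is at 500 GB today; the ensemble says 900 GB in 30 days, linear says 600 GB.
if needs_review(current=500.0, ensemble_end=900.0, linear_end=600.0):
    print("ensemble and linear baseline disagree by more than 3x: flag for review")
```

Comparing growth rather than absolute level keeps the check sensitive on large volumes, where a 3× difference in level would essentially never occur.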

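# Prophet recipe: minimum viable forecast
# history_df needs the columns ds (timestamp), y (the metric) and deploy_count.
from prophet import Prophet

m = Prophet(
    daily_seasonality=True,         # only meaningful with sub-daily samples
    weekly_seasonality=True,
    yearly_seasonality='auto',
    changepoint_prior_scale=0.05,   # tune up for spiky workloads
    seasonality_prior_scale=10,
)
m.add_country_holidays(country_name='US')
m.add_regressor('deploy_count')   # exogenous
m.fit(history_df)

# make_future_dataframe returns the history rows plus 30 new days.
future = m.make_future_dataframe(periods=30, freq='D')
# predicted_deploys is your own helper, not part of Prophet. It must return
# a value for every row of `future`, history included.
future['deploy_count'] = predicted_deploys(future)
fcst = m.predict(future)

# Use yhat_lower / yhat_upper bounds, not yhat alone: the band is what alerts.
# Prophet's default band is 80% (interval_width=0.8).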
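The recipe leaves `predicted_deploys` undefined. A minimal sketch of one way to fill it, assuming deploys follow a weekly rhythm: use the historical mean deploy count for each weekday. The function name and the list-of-tuples input are illustrative, not the author's implementation. In the recipe you would apply the same idea to the `ds` column of `future`.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekday_mean_deploys(history, future_dates):
    """Expected deploys per future date, taken as the historical mean for that weekday.

    history: iterable of (date, deploy_count) pairs.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for day, n in history:
        totals[day.weekday()] += n
        counts[day.weekday()] += 1
    return [
        totals[d.weekday()] / counts[d.weekday()] if counts[d.weekday()] else 0.0
        for d in future_dates
    ]


# Two weeks of made-up history starting on a Monday.
start = date(2026, 3, 2)
pattern = [3, 2, 2, 2, 1, 0, 0, 1, 2, 2, 2, 1, 0, 0]
history = [(start + timedelta(days=i), n) for i, n in enumerate(pattern)]

next_week = [start + timedelta(days=14 + i) for i in range(7)]
print(weekday_mean_deploys(history, next_week))
```

If the release calendar is known in advance, use the planned counts instead. The weekday mean is only a fallback for when nothing better is available.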

Exogenous variables that move the needle

Day-of-week / business-day flag — the single biggest accuracy gain.
Holidays — country-specific. Black Friday, Lunar New Year, regional holidays for B2C.
Deploy events — regimes change after deploys. Inject as event markers; Prophet handles them as “holidays”.
Marketing campaign flags — if the team can post events to the metrics pipeline, you get free correlation.
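A sketch of how these inputs could be prepared. Prophet accepts custom events through its `holidays` argument, a DataFrame with the columns `holiday` and `ds` and optional `lower_window` / `upper_window`. The helpers below are hypothetical and build those rows as plain dicts, alongside a simple business-day flag. The dates are examples.

```python
from datetime import date


def event_rows(name, days, lower_window=0, upper_window=0):
    """Rows shaped like Prophet's `holidays` frame; windows extend the effect in days."""
    return [
        {
            "holiday": name,
            "ds": d.isoformat(),
            "lower_window": lower_window,
            "upper_window": upper_window,
        }
        for d in days
    ]


def is_business_day(d, holidays=frozenset()):
    """True for Monday to Friday, excluding any date listed in `holidays`."""
    return d.weekday() < 5 and d not in holidays


# A deploy whose effect on write rate is assumed to last two extra days.
rows = event_rows("deploy", [date(2026, 3, 10)], upper_window=2)
rows += event_rows("campaign", [date(2026, 3, 20), date(2026, 3, 21)])

for row in rows:
    print(row)
print(is_business_day(date(2026, 3, 14)))  # a Saturday
```

With pandas available, `Prophet(holidays=pd.DataFrame(rows))` would pick these up. The business-day flag would go in as a column added with `add_regressor`.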

Evaluating forecasts honestly

Backtest with sliding window. Fit on weeks 1-4, predict week 5, score. Slide.
Score with MAPE for level metrics (disk, connections), which stay well above zero, and SMAPE for noisier metrics (QPS), where near-zero actuals make MAPE explode.
Track calibration of bands: if your 90% interval contains the actual value 70% of the time, your bands are too narrow.
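The three evaluation steps above can be sketched in plain Python. The function names are illustrative and the model fit itself is left out. `sliding_windows` only yields the train/test index ranges, so any forecaster can be plugged in.

```python
def sliding_windows(n_points, train_len, test_len, step=None):
    """Yield (train_slice, test_slice) pairs that slide forward in time."""
    step = step or test_len
    start = 0
    while start + train_len + test_len <= n_points:
        split = start + train_len
        yield slice(start, split), slice(split, split + test_len)
        start += step


def mape(actual, predicted):
    """Mean absolute percentage error; undefined when an actual value is zero."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE; a point where both values are zero contributes zero error."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        total += 0.0 if denom == 0 else 2 * abs(a - p) / denom
    return 100.0 * total / len(actual)


def interval_coverage(actual, lower, upper):
    """Fraction of actual values that fall inside [lower, upper]."""
    hits = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return hits / len(actual)


# Six weeks of daily points: fit on 4 weeks, score the next week, slide by a week.
for train, test in sliding_windows(n_points=42, train_len=28, test_len=7):
    print(f"train days {train.start}-{train.stop - 1}, test days {test.start}-{test.stop - 1}")
```

Compare `interval_coverage` with the nominal width of the band you asked the model for. A result well below it means the bands are too narrow, and one well above it means they are wider than they need to be.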

Alerting on forecast breaches

The alert isn’t “disk is full.” It’s “disk will be full in N days.”

# Pseudo-rule (forecast.days_until_breach and page are placeholders, not a real API)
days = forecast.days_until_breach(metric='disk_used',
                                  threshold=0.85 * disk_total)
if days is not None and days <= 30:
    page(severity='warning',
         message=f"Disk on {host} forecast to breach 85% in {days} days")

Useful: tier severity by lead time. 30-day = warning (planning), 7-day = high (provision now), 24h = critical (page on-call).
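A minimal sketch of the tiering, assuming the forecast arrives as one upper-band value per future day (for example Prophet's `yhat_upper` for the next 30 days). The helper names and the sample numbers are hypothetical.

```python
def days_until_breach(upper_band, threshold):
    """Days from now until the upper band first reaches the threshold, else None."""
    for day, value in enumerate(upper_band, start=1):
        if value >= threshold:
            return day
    return None


def severity_for(days):
    """Map lead time to the tiers above: 24h critical, 7-day high, 30-day warning."""
    if days is None or days > 30:
        return None
    if days <= 1:
        return "critical"
    if days <= 7:
        return "high"
    return "warning"


disk_total = 1000.0                                 # GB, made-up volume size
upper_band = [800.0 + 5.0 * day for day in range(1, 31)]  # made-up forecast band

days = days_until_breach(upper_band, threshold=0.85 * disk_total)
level = severity_for(days)
if level is not None:
    print(f"[{level}] disk forecast to breach 85% in {days} days")
```

Alerting on the upper band rather than the point forecast fires earlier. That is the safer trade when the cost of a miss is a full disk.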

FAQ

Why not just use ARIMA?

ARIMA needs careful (p, d, q) tuning per series and doesn't handle holidays out of the box. Prophet is opinionated and works well at default settings on DB-shaped data.

How often should I refit?

Daily is the typical cadence. More often is overhead with diminishing returns; less often misses recent regime shifts.

Can the operator override the forecast?

Yes — annotate the metric with a 'planned event' marker (deploy, launch, holiday). Prophet treats it as a regressor on the next refit.
