Predictions

Forecasting workload through summer 2026

A monthly forecast of modification requests, with a walk-forward backtest and a track record of how predictions have changed as new data lands. The migration scenario is an interactive overlay — adjust it inline below.

catalog adebc82cc9c1 · commit 614eb0e806e4 · forecast generated just now · upstream synced just now · log: 3 runs

How this was built — for the technical reader

Model type
linear regression with monthly seasonality

Ordinary least squares linear regression with a linear time trend (months since first observation) and one-hot dummies for month-of-year. January is dropped as the baseline reference category.

Library
scikit-learn 1.8.0

Computed by scripts/predict.py and refreshed every nightly sync. pandas handles monthly aggregation; scikit-learn fits the OLS regression. The script also runs a walk-forward backtest and appends to data/raw/_predictions_log.jsonl so historical predictions are versioned alongside the catalog.

Features
12 columns (2 feature groups)

t — months since the first observation (linear trend) · m_2 … m_12 — one-hot dummies for Feb–Dec (Jan = baseline)
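
The feature layout above could be built and fitted roughly as follows. This is a minimal sketch, not the contents of scripts/predict.py: the helper name build_features, the zero-based definition of t, and the synthetic series are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def build_features(months: pd.DatetimeIndex) -> pd.DataFrame:
    """Return the trend column t plus month dummies m_2..m_12 (Jan = baseline)."""
    first = months[0]
    # t: whole months elapsed since the first observation (assumed zero-based).
    t = ((months.year - first.year) * 12 + (months.month - first.month)).to_numpy()
    X = pd.DataFrame({"t": t}, index=months)
    # One indicator column per month Feb..Dec; January rows are all zeros.
    for m in range(2, 13):
        X[f"m_{m}"] = (months.month == m).astype(int)
    return X


# Synthetic example data, not the catalog's real counts.
months = pd.date_range("2019-03-01", "2026-04-01", freq="MS")
X = build_features(months)
y = 3.0 + 0.05 * X["t"] + 5.0 * X["m_7"]  # made-up series with a July bump

model = LinearRegression().fit(X, y)
coefs = dict(zip(X.columns, model.coef_))
```

Because January has no dummy of its own, its seasonal effect is absorbed by the intercept, and every m_k coefficient is measured against it.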

Training data
86 months

Mar 1, 2019 → Apr 1, 2026. Partial current month (2026-05) dropped to avoid biasing the model with incomplete counts.
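
Dropping the partial month might look like the sketch below. The input schema is not shown on this page, so the function name monthly_counts, the as_of parameter, and the example dates are hypothetical.

```python
import pandas as pd


def monthly_counts(timestamps: pd.Series, as_of: pd.Timestamp) -> pd.Series:
    """Count events per calendar month, excluding the month that contains as_of."""
    stamps = pd.DatetimeIndex(pd.to_datetime(timestamps))
    # One row per event, summed into month-start buckets; empty months become 0.
    counts = pd.Series(1, index=stamps).sort_index().resample("MS").sum()
    # The current month is still filling up, so keep only months before it.
    current_month_start = as_of.to_period("M").to_timestamp()
    return counts[counts.index < current_month_start]


# Hypothetical event dates, for illustration only.
events = pd.Series(["2026-03-05", "2026-03-20", "2026-04-11", "2026-05-02"])
counts = monthly_counts(events, as_of=pd.Timestamp("2026-05-14"))
```

In this example the single May event is discarded, leaving March and April as the last two complete months.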

Fitted coefficients

Read each month coefficient as that month’s expected difference from January, in modifications per month, with the time trend held fixed. A positive month dummy means that month typically sees more modifications than January; a negative one means fewer.

intercept   +2.996
t           +0.053
m_2         -1.339
m_3         -2.092
m_4         -1.395
m_5         -3.720
m_6         -0.344
m_7         +5.603
m_8         -2.164
m_9         +3.497
m_10        -2.984
m_11        -0.037
m_12        -1.518
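
As a worked example, a point forecast can be reproduced by hand from the coefficients listed above. The sketch assumes t is zero-based (Mar 2019 has t = 0), which the page does not state explicitly, and it uses the rounded coefficients, so results may differ slightly from the dashboard.

```python
# Fitted coefficients as listed above; January is the baseline (implicit 0.0).
INTERCEPT = 2.996
TREND = 0.053
MONTH_EFFECT = {
    1: 0.0, 2: -1.339, 3: -2.092, 4: -1.395, 5: -3.720, 6: -0.344,
    7: 5.603, 8: -2.164, 9: 3.497, 10: -2.984, 11: -0.037, 12: -1.518,
}
FIRST_YEAR, FIRST_MONTH = 2019, 3  # first observation: Mar 2019


def predict(year: int, month: int) -> float:
    """Point forecast in modifications per month for one calendar month."""
    # Assumption: t is zero-based, so Mar 2019 has t = 0.
    t = (year - FIRST_YEAR) * 12 + (month - FIRST_MONTH)
    return INTERCEPT + TREND * t + MONTH_EFFECT[month]


july_2026 = predict(2026, 7)     # t = 88
january_2027 = predict(2027, 1)  # t = 94
```

Under that assumption, July 2026 works out to 2.996 + 0.053 × 88 + 5.603 ≈ 13.3 modifications, and January 2027 to 2.996 + 0.053 × 94 ≈ 8.0.
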
Validation
Backtest MAE: 9.74 mods/mo (walk-forward, last 12 months)
Holdout MAE: 9.96 mods/mo (tail-holdout, single split)
Residual σ: 8.25 (powers the 95% CI band)

Walk-forward backtest: for each of the last 12 months, the model was retrained on every month strictly before it and asked to predict that month. The result is an honest estimate of one-step-ahead accuracy under the current data regime.

Holdout: the model was trained on all but the last 12 months and scored on the held-out tail, giving MAE = 9.96 and R² = -0.87. A negative R² means the model did worse on that tail than simply predicting the tail’s mean would have.

The walk-forward backtest is the more useful accuracy signal. It asks: if we had been running this nightly, what would we have predicted each month, and what actually happened?
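
The walk-forward procedure described above can be sketched as follows. The function names and the synthetic series are illustrative assumptions, not code from scripts/predict.py; only the refit-then-predict-one-month loop comes from the description.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def design_matrix(t: np.ndarray, month: np.ndarray) -> np.ndarray:
    """Columns: t, then indicators for Feb..Dec (January is the baseline)."""
    dummies = [(month == m).astype(float) for m in range(2, 13)]
    return np.column_stack([t.astype(float)] + dummies)


def walk_forward_mae(y: np.ndarray, month: np.ndarray, n_test: int = 12) -> float:
    """Refit on all months strictly before each test month, predict it, average |error|."""
    t = np.arange(len(y))
    X = design_matrix(t, month)
    errors = []
    for i in range(len(y) - n_test, len(y)):
        model = LinearRegression().fit(X[:i], y[:i])  # data before month i only
        pred = model.predict(X[i : i + 1])[0]
        errors.append(abs(y[i] - pred))
    return float(np.mean(errors))


# Synthetic 86-month series starting in March; not the catalog's data.
month = (np.arange(86) + 2) % 12 + 1  # 3, 4, ..., 12, 1, 2, ...
y = 3.0 + 0.05 * np.arange(86) + np.where(month == 7, 5.0, 0.0)
mae = walk_forward_mae(y, month)
```

On this noise-free synthetic series the error is essentially zero, which is a quick sanity check that no future data leaks into each fit. On real counts the same loop yields the backtest MAE.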

What this assumes
  • Seasonality is stable. The model learns one coefficient per month of year and assumes the pattern repeats every year. The walk-forward backtest shows where this assumption broke down: late-2025 actuals far exceeded the model’s predictions, which is a real signal of a regime change.
  • Trend is linear. If the catalog’s growth rate changes (it has been volatile), the model will lag the new regime by definition.
  • The scenario is judgement, not inference. The bump is applied multiplicatively to forecasts inside the window. The model does not learn it — it’s a planning overlay you can dial in from the slider above. Defaults travel with the JSON so a non-interactive viewer still sees a useful range.
  • CIs are residual-based, not Bayesian. The 95% band uses ±1.96·σ from training residuals. Read it as plausible monthly variation, not a coverage guarantee.
  • Predictions are versioned. Every nightly run appends a record to data/raw/_predictions_log.jsonl keyed on the catalog SHA. The drift panel above reads this log; git history is the deeper audit trail.
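
The scenario overlay and the residual-based band from the list above can be sketched together. The window, the 25% bump, and the point forecasts below are hypothetical values chosen for illustration, and applying the bump before attaching the band is an assumption, since the page does not say whether the band is rescaled.

```python
RESIDUAL_SIGMA = 8.25  # residual sigma reported in the Validation panel
Z_95 = 1.96


def apply_scenario(month: str, point: float, start: str, end: str, bump: float) -> float:
    """Multiply the point forecast by (1 + bump) if month lies in [start, end]."""
    in_window = start <= month <= end  # "YYYY-MM" strings sort chronologically
    return point * (1.0 + bump) if in_window else point


def ci_band(point: float, sigma: float = RESIDUAL_SIGMA) -> tuple[float, float]:
    """Residual-based 95% band: point +/- 1.96 * sigma."""
    half = Z_95 * sigma
    return point - half, point + half


# Hypothetical scenario: +25% from 2026-06 through 2026-08.
base = {"2026-05": 10.0, "2026-07": 10.0}  # illustrative point forecasts
adjusted = {m: apply_scenario(m, p, "2026-06", "2026-08", 0.25) for m, p in base.items()}
low, high = ci_band(adjusted["2026-07"])
```

With σ = 8.25 the half-width is about 16.2 modifications, so for small forecasts the lower bound falls below zero. That is one more reason to read the band as plausible variation and not as a coverage guarantee.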