Predictions

Forecasting workload through summer 2026

A monthly forecast of modification requests, with a walk-forward backtest and a track record of how predictions have changed as new data lands. The migration scenario is an interactive overlay — adjust it inline below.

catalog 809ecc0a877e · commit 407e588b6a5b · forecast generated just now · upstream synced 24h ago · log: 45 runs


How this was built — for the technical reader

Model type

linear regression with monthly seasonality

Ordinary least squares linear regression with a linear time trend (months since first observation) and one-hot dummies for month-of-year. January is dropped as the baseline reference category.
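Written out as an equation, the specification is as follows. The symbols β_0, β_1 and γ_k are introduced here for illustration: β_0 is the intercept, β_1 is the coefficient on t, and γ_k is the dummy coefficient m_k in the fitted table.

```latex
\hat{y}_t = \beta_0 + \beta_1 t + \sum_{k=2}^{12} \gamma_k \, m_{k,t},
\qquad
m_{k,t} =
\begin{cases}
1 & \text{if month } t \text{ falls in calendar month } k,\\
0 & \text{otherwise.}
\end{cases}
```

For a January month every m_{k,t} is 0, so the prediction reduces to β_0 + β_1 t. Each γ_k is therefore the shift for calendar month k relative to January at the same t. OLS chooses all coefficients to minimise the sum of squared residuals Σ (y_t − ŷ_t)².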

Library

scikit-learn 1.9.0

Computed by scripts/predict.py and refreshed on every nightly sync. pandas handles monthly aggregation; scikit-learn fits the OLS regression. The script also runs a walk-forward backtest and appends a record to data/raw/_predictions_log.jsonl, so historical predictions are versioned alongside the catalog.
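The append-only log can be sketched with the standard library alone. This is a minimal illustration, not the script's code. The helper names and the record fields (`catalog_sha`, `generated_at`, `forecast`) are hypothetical, since the real schema written by scripts/predict.py is not documented here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_prediction_record(log_path, catalog_sha, forecast):
    """Append one JSON object as a single line (JSONL) for a nightly run.

    Field names are illustrative assumptions, not the script's actual schema.
    """
    record = {
        "catalog_sha": catalog_sha,  # key a drift view could group runs by
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "forecast": forecast,  # e.g. {"2026-07": 13.0}
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode never rewrites earlier lines, so every past run is kept.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_prediction_log(log_path):
    """Return every record in the log, oldest first, skipping blank lines."""
    with open(log_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

One JSON object per line keeps appends cheap and keeps diffs in git history line-oriented.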

Features

12 columns (1 trend + 11 month dummies)

t — months since the first observation (linear trend) · m_2 … m_12 — one-hot dummies for Feb–Dec (Jan = baseline)
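A minimal sketch of building these 12 columns with pandas follows. It assumes the monthly counts sit on a chronological month-start DatetimeIndex. `build_features` is a hypothetical helper name, not necessarily what scripts/predict.py uses.

```python
import numpy as np
import pandas as pd


def build_features(index):
    """Return the design matrix: trend column t plus dummies m_2 .. m_12.

    `index` holds month starts in chronological order. January gets no
    column, so it is the baseline every month dummy is measured against.
    """
    index = pd.DatetimeIndex(index)
    first = index[0]
    # t = whole months elapsed since the first observation (0, 1, 2, ...)
    t = (index.year - first.year) * 12 + (index.month - first.month)
    features = pd.DataFrame({"t": np.asarray(t)}, index=index)
    for k in range(2, 13):
        # 1.0 when the row falls in calendar month k, else 0.0
        features[f"m_{k}"] = np.asarray(index.month == k, dtype=float)
    return features
```

For the training window (March 2019 to June 2026) this yields an 88 × 12 matrix, with t running from 0 to 87. scikit-learn's `LinearRegression` adds the intercept itself by default, so no constant column is needed.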

Training data

88 months

Mar 1, 2019 → Jun 1, 2026. Partial current month (2026-07) dropped to avoid biasing the model with incomplete counts.
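The aggregation and the partial-month rule might look like the sketch below. It assumes the raw input is one timestamp per modification. `monthly_counts` and its `as_of` argument (the time the data was pulled) are hypothetical names.

```python
import pandas as pd


def monthly_counts(timestamps, as_of):
    """Count modifications per calendar month and drop the unfinished month.

    `timestamps`: one datetime per modification.
    `as_of`: when the data was pulled; its calendar month is still in progress.
    """
    events = pd.Series(1, index=pd.DatetimeIndex(timestamps)).sort_index()
    # "MS" labels each bucket by its month start; months with no events sum to 0.
    counts = events.resample("MS").sum()
    current_month = pd.Timestamp(as_of).to_period("M").to_timestamp()
    # Keep only months that closed before `as_of`, so a half-finished month
    # cannot pull the latest point (and the fitted trend) downward.
    return counts[counts.index < current_month]
```

Filling empty months with 0 matters for the model: a missing row would silently shift which months the trend and dummies see.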

Fitted coefficients

Each month dummy is the expected difference in monthly modifications between that month and January at the same point on the time trend. A positive dummy means that month typically sees more modifications than January; a negative one means fewer.

intercept: +2.384

t: +0.057

m_2 (Feb): -1.200

m_3 (Mar): -1.772

m_4 (Apr): -0.954

m_5 (May): -3.261

m_6 (Jun): +1.432

m_7 (Jul): +5.627

m_8 (Aug): -1.716

m_9 (Sep): +3.942

m_10 (Oct): -2.544

m_11 (Nov): +0.257

m_12 (Dec): -1.372
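As a worked example of reading these coefficients, the July 2026 point forecast can be assembled by hand. July 2026 is 88 months after March 2019, so t = 88, with t = 0 at the first observation. The printed coefficients are rounded to three decimals, so this only approximates the fitted model's own output.

```python
# Coefficients as printed above (rounded to 3 decimals).
INTERCEPT = 2.384
SLOPE_T = 0.057
MONTH_EFFECT = {
    1: 0.0,  # January is the baseline, so it has no dummy
    2: -1.200, 3: -1.772, 4: -0.954, 5: -3.261, 6: 1.432, 7: 5.627,
    8: -1.716, 9: 3.942, 10: -2.544, 11: 0.257, 12: -1.372,
}


def point_forecast(t, month):
    """intercept + slope * t + the dummy for `month` (0 for January)."""
    return INTERCEPT + SLOPE_T * t + MONTH_EFFECT[month]


# July 2026: t = 88 months after March 2019.
print(round(point_forecast(88, 7), 3))  # 13.027
```

The arithmetic is 2.384 + 0.057 × 88 + 5.627 = 2.384 + 5.016 + 5.627 = 13.027 mods/mo. At a fixed t, the spread between the strongest and weakest months (July vs. May) is 5.627 − (−3.261) = 8.888 mods/mo.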

Validation

Backtest MAE: 9.78 mods/mo (walk-forward, last 12 months)

Holdout MAE: 10.46 mods/mo (tail holdout, single split)

Residual σ: 7.91 mods/mo (sets the width of the 95% CI band)

Walk-forward backtest: for each of the last 12 months, the model was retrained on every month strictly before it and asked to predict that month. The result is an honest estimate of one-step-ahead accuracy under the current data regime. Holdout: the model was trained once on all but the last 12 months, and its predictions on the held-out tail had MAE = 10.5 and R² = -1.23. A negative R² means those predictions did worse than simply predicting the tail's own mean. The walk-forward backtest is the more useful accuracy signal, because it asks: “if we had been running this nightly, what would we have predicted each month vs. what actually happened?”
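The two validation schemes can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the code in scripts/predict.py. It assumes `y` is a pandas Series of monthly counts and `X` is a feature DataFrame on the same index (for example t plus m_2 through m_12). The function names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score


def walk_forward_mae(X, y, n_test=12):
    """Refit on every month strictly before each of the last `n_test` months,
    predict that single month, and average the absolute errors."""
    errors = []
    for i in range(len(y) - n_test, len(y)):
        model = LinearRegression().fit(X.iloc[:i], y.iloc[:i])
        pred = model.predict(X.iloc[[i]])[0]  # one-step-ahead prediction
        errors.append(abs(y.iloc[i] - pred))
    return float(np.mean(errors))


def tail_holdout(X, y, n_test=12):
    """Single split: fit once on all but the last `n_test` months, score the tail."""
    model = LinearRegression().fit(X.iloc[:-n_test], y.iloc[:-n_test])
    pred = model.predict(X.iloc[-n_test:])
    y_tail = y.iloc[-n_test:]
    return mean_absolute_error(y_tail, pred), r2_score(y_tail, pred)
```

The design difference is where the training cutoff sits. Walk-forward refits 12 times and always sees data up to the month before the target, which mirrors a nightly run. The holdout freezes the model 12 months back, so any shift inside the tail goes unseen for up to a year. `r2_score` compares against the mean of the held-out values, which is why it can go negative.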

What this assumes

Seasonality is stable. The model learns one coefficient per calendar month and assumes the pattern repeats every year. The walk-forward backtest above shows where this assumption broke down: late-2025 actuals far exceeded the model's predictions, a real signal of a regime change.
Trend is linear. The model fits a single slope to the whole history, so if the catalog’s growth rate changes (it has been volatile), the model will lag the new regime by construction.
The scenario is judgement, not inference. The migration bump is applied multiplicatively to forecasts inside the scenario window. The model does not learn it; it’s a planning overlay you can dial in from the slider above. The scenario's default settings are stored in the JSON output, so a non-interactive viewer still sees a useful range.
CIs are residual-based, not Bayesian. The 95% band uses ±1.96·σ from training residuals. Read it as plausible monthly variation, not a coverage guarantee.
Predictions are versioned. Every nightly run appends a record to data/raw/_predictions_log.jsonl keyed on the catalog SHA. The drift panel above reads this log; git history is the deeper audit trail.
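A stdlib-only sketch of two post-processing steps from the list above: the ±1.96·σ band from training residuals, and the multiplicative scenario bump inside a window. Several details are assumptions of this sketch. The function names and the `bump` fraction are hypothetical. σ is computed here as the sample standard deviation (n − 1 denominator), and the actual script may use a different correction. Scaling the band bounds along with the point forecast is also assumed, not stated in the text.

```python
import statistics


def residual_sigma(actuals, fitted):
    """Sample standard deviation of in-sample residuals (actual minus fitted)."""
    residuals = [a - f for a, f in zip(actuals, fitted)]
    return statistics.stdev(residuals)


def with_band(point, sigma, z=1.96):
    """Residual-based band: point +/- z * sigma. Not a Bayesian interval."""
    return point - z * sigma, point, point + z * sigma


def apply_scenario(forecast, window, bump):
    """Multiply every (lower, point, upper) row inside `window` by (1 + bump).

    `forecast` maps "YYYY-MM" -> (lower, point, upper); `window` is a set of
    month keys; `bump` is a fraction such as 0.25 for +25%. Months outside the
    window are returned unchanged.
    """
    factor = 1.0 + bump
    return {
        month: tuple(v * factor for v in row) if month in window else row
        for month, row in forecast.items()
    }
```

With σ = 7.91 the band is roughly ±15.5 mods/mo around each point forecast, since 1.96 × 7.91 ≈ 15.50.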