WC 2026 · Forecasting Oxford Football Forecasting

Shrinkage · partial pooling for newcomers

Bayesian hierarchical — newcomers borrow strength

The same Poisson goal core, but each team’s attack and defence is latent, with a prior centred on what its history and squad value imply. A short-history qualifier is pulled toward that informed prior with calibrated uncertainty instead of an overconfident guess.

0.1927

OOS RPS · expanding

the headline skill (realism protocol)

0.1905

OOS RPS · LOTO

optimistic ceiling (leaks future folds)

#5

Leaderboard rank

of 7 · CI 0.1805–0.2050

Same Poisson goal core, but each team’s attack/defence is a latent quantity with a prior centred on what its Elo (history) and squad value imply. A short-history newcomer with a near-flat likelihood is pulled toward that covariate-informed prior with calibrated uncertainty (James–Stein shrinkage) instead of an overconfident guess. It also emits the decoupling g as a posterior quantity with a credible interval.

The hierarchical prior on latent strength, over the Poisson match likelihood

01 The match likelihood

The emission is the Dixon–Coles kernel — two log-linear Poisson rates with the low-score correction — fitted at match resolution on the 49,445-match international record. What changes is where the strengths come from.

The Poisson match likelihood under the latent strengths
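
As an illustration of the kernel described above, here is a minimal Python sketch of a Dixon–Coles score matrix. The rate parameterisation (attack minus opposing defence, no home advantage) and the function names `dixon_coles_matrix` and `outcome_probs` are assumptions for this example; the source does not state the exact linear predictor.

```python
import math


def dixon_coles_matrix(att_i, def_i, att_j, def_j, rho=0.0, max_goals=10):
    """Joint probability of (goals_i, goals_j) under a Dixon-Coles kernel.

    Assumption: log-linear rates, neutral venue, higher `def` = better defence.
    """
    lam = math.exp(att_i - def_j)  # expected goals for team i
    mu = math.exp(att_j - def_i)   # expected goals for team j

    def pois(k, rate):
        return math.exp(-rate) * rate ** k / math.factorial(k)

    def tau(x, y):
        # Low-score correction: only the four cells with x, y <= 1 change.
        if x == 0 and y == 0:
            return 1.0 - lam * mu * rho
        if x == 0 and y == 1:
            return 1.0 + lam * rho
        if x == 1 and y == 0:
            return 1.0 + mu * rho
        if x == 1 and y == 1:
            return 1.0 - rho
        return 1.0

    return [[tau(x, y) * pois(x, lam) * pois(y, mu)
             for y in range(max_goals + 1)]
            for x in range(max_goals + 1)]


def outcome_probs(matrix):
    """Collapse the score matrix to (win, draw, loss) for team i."""
    win = sum(p for x, row in enumerate(matrix)
              for y, p in enumerate(row) if x > y)
    draw = sum(matrix[k][k] for k in range(len(matrix)))
    loss = sum(p for x, row in enumerate(matrix)
               for y, p in enumerate(row) if x < y)
    return win, draw, loss
```

The four corrected cells cancel in total probability, so the matrix still sums to one up to the truncation at `max_goals`. A negative `rho` moves mass toward 0–0 and 1–1, which raises the draw probability.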

02 The hierarchical prior

History is the prior mean; squad covariates shift strength off that mean; context must earn its way in. A team’s attack is drawn from a Normal centred on what its Elo and squad imply, with a learned pooling scale — the structural reason newcomers are handled by construction.

Latent attack and defence under the covariate-informed prior
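
A possible way to write the attack prior, using the prior mean shown in the schematic (a₀ + κ·h̃ + βᵀs̃ + δᵀc̃). The pooling-scale symbol σ_att is assumed notation, and the defence prior is assumed to take the same form with its own intercept and scale:

```latex
\mathrm{att}_i \sim \mathcal{N}\!\left(
  a_0 + \kappa\,\tilde h_i + \beta^{\top}\tilde s_i + \delta^{\top}\tilde c_i,\;
  \sigma_{\mathrm{att}}^{2}
\right)
```

Here \tilde h_i is the standardized history index, \tilde s_i the squad covariates and \tilde c_i the context covariates for team i. A small σ_att keeps teams close to the covariate-implied mean; a large one lets their own matches dominate.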

03 The stated priors

Every hyper-prior is declared. The history loading κ has a ridge (Normal) prior: history is the prior mean and is expected to matter. The context loading δ has a tight ridge prior: off unless the data insist. The pooling scales are half-Normal, and the dependence parameter ρ matches the Dixon–Coles clip.

The declared hyper-priors of the hierarchy

04 The horseshoe on squad covariates

The prior expectation is that only a handful of the squad covariates genuinely matter. The regularised horseshoe encodes exactly that: local scales let a few loadings escape to full size while the rest are pinned near zero, and the slab bounds the survivors. The global scale is set small (τ₀ = 0.05) to match a prior guess of three to five active features.

The regularised horseshoe prior on the squad loadings β
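
To make the "few escape, the rest are pinned" behaviour concrete, here is a small NumPy sketch that draws loadings from a regularised horseshoe prior. It assumes half-Cauchy global and local scales and, as a simplification, a fixed slab scale rather than a prior on it; the function name `sample_regularised_horseshoe` and the default `slab_scale=1.0` are hypothetical and not taken from the source.

```python
import numpy as np


def sample_regularised_horseshoe(n_features, tau0=0.05, slab_scale=1.0,
                                 n_draws=1000, seed=0):
    """Draw loadings beta from a regularised horseshoe prior (sketch)."""
    rng = np.random.default_rng(seed)
    # Global scale: half-Cauchy with scale tau0 (one value per draw).
    tau = np.abs(rng.standard_cauchy(size=(n_draws, 1))) * tau0
    # Local scales: half-Cauchy(0, 1), one per feature.
    lam = np.abs(rng.standard_cauchy(size=(n_draws, n_features)))
    c2 = slab_scale ** 2
    # Regularised local scale: close to lam when tau*lam is small,
    # capped so that tau*lam_tilde never exceeds the slab scale.
    lam_tilde = np.sqrt(c2 * lam ** 2 / (c2 + tau ** 2 * lam ** 2))
    effective_scale = tau * lam_tilde
    beta = rng.standard_normal(size=(n_draws, n_features)) * effective_scale
    return beta, effective_scale
```

With `tau0 = 0.05` most draws of `beta` sit near zero, while the heavy-tailed local scales occasionally let a loading grow; the slab then bounds how large those survivors can get.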

05 Coverage-aware measurement error

Club-form coverage is itself a strength signal — a naive model reads “little data” as “weak team”, which is circular. The fix is a measurement-error layer: observed squad form is the true signal plus noise whose variance grows as coverage falls, so thinly-observed teams shrink automatically toward their value-implied prior.

Squad form enters with coverage-scaled error
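
One simple form such a layer could take is sketched below. The source says only that the noise variance grows as coverage falls; the inverse-coverage scaling and the symbols σ_m and cov_i are assumptions made for illustration:

```latex
s^{\mathrm{obs}}_i = s^{\mathrm{true}}_i + \varepsilon_i,
\qquad
\varepsilon_i \sim \mathcal{N}\!\left(0,\; \frac{\sigma_m^{2}}{\mathrm{cov}_i}\right),
\qquad
\mathrm{cov}_i \in (0, 1]
```

As cov_i approaches zero the observation carries almost no information, so the posterior for the true squad form falls back on its prior instead of treating missing data as evidence of weakness.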

06 What shrinkage does

In the conjugate Normal approximation the posterior is a precision-weighted compromise between a team’s own record and its prior mean. Data-rich powerhouses keep their maximum-likelihood estimate — the model nests Dixon–Coles — while a short-history qualifier, whose likelihood is nearly flat, is pulled to what its Elo and squad value imply. That is the James–Stein argument: shrinkage strictly dominates the noisy per-team estimate in total risk.

Partial pooling — the posterior as a precision-weighted average
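
The precision-weighted compromise can be worked through numerically. The sketch below uses the conjugate Normal approximation named above; the noise scale `sigma_obs`, the pooling scale `sigma_pool` and the function name `partial_pool` are illustrative assumptions, not values from the model.

```python
def partial_pool(mle, n_matches, prior_mean, sigma_obs=1.0, sigma_pool=0.3):
    """Conjugate-Normal posterior as a precision-weighted average.

    Returns (pooling weight w, posterior mean, posterior sd).
    w near 1: the team's own record dominates; w near 0: the prior does.
    """
    data_precision = n_matches / sigma_obs ** 2
    prior_precision = 1.0 / sigma_pool ** 2
    w = data_precision / (data_precision + prior_precision)
    posterior_mean = w * mle + (1.0 - w) * prior_mean
    posterior_sd = (data_precision + prior_precision) ** -0.5
    return w, posterior_mean, posterior_sd


# A powerhouse with 500 matches versus a newcomer with 4, same raw estimate.
w_big, mean_big, _ = partial_pool(mle=0.5, n_matches=500, prior_mean=0.0)
w_new, mean_new, _ = partial_pool(mle=0.5, n_matches=4, prior_mean=0.0)
```

Under these assumed scales the powerhouse keeps a weight of roughly 0.98 on its own record, while the newcomer's weight is about 0.26, so its estimate moves most of the way to the prior mean.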

07 The decoupling residual

History and squad value correlate at 0.82, so a structural “history X%, squad Y%” split is not identified. The decoupling g is therefore defined as a projection residual — squad strength minus what history predicts — which stays well-conditioned even when the individual loadings are not, and ships with a posterior standard deviation.

g — squad quality above or below what history implies
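
A point-estimate version of the projection residual can be sketched with ordinary least squares: regress squad strength on history and keep what is left over. This is only an illustration of the definition; the model reports g as a posterior quantity with its own standard deviation, which this sketch does not reproduce. The function name `decoupling_residual` is hypothetical.

```python
import numpy as np


def decoupling_residual(history, squad):
    """g = squad strength minus its least-squares projection on history."""
    h = np.asarray(history, dtype=float)
    s = np.asarray(squad, dtype=float)
    X = np.column_stack([np.ones_like(h), h])  # intercept + history index
    coef, *_ = np.linalg.lstsq(X, s, rcond=None)
    return s - X @ coef
```

By construction the residual is uncorrelated with the history index, which is why it stays well-conditioned even when the two inputs are strongly collinear.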

Symbol key

attᵢ, defᵢ: team i’s latent attack and defence, the quantities with a prior
the global attack and defence intercepts
κ·h̃: history loading × the standardized history index (Elo + national-team form)
βᵀs̃: squad covariates (value, club form, age structure) under the horseshoe prior
δᵀc̃: context covariates (travel, heat, fitness) under a tight ridge
the pooling scales — how far teams may stray from the prior (learned)
horseshoe local and global shrinkage scales — a few covariates escape, the rest pin to zero
the horseshoe slab variance — bounds the covariates that do escape
measurement error on observed club form, inflated where data coverage is thin
the pooling weight — how much team i’s posterior trusts its own matches vs the prior
g: the decoupling residual, squad strength above or below what history predicts
  • 49,445 international results (the match likelihood)
  • History index h (Elo + national-team form, one combined component)
  • Squad covariates s (value, club form, age) under a horseshoe prior
  • Coverage-scaled measurement error on squad form

Fig. M·Bayesian Conceptual schematic

Bayesian hierarchical — wired end to end

[Schematic] Prior mean: a₀ + κ·h̃ + βᵀs̃ + δᵀc̃ (history · squad · context) → Likelihood: Y ~ Poisson(λ) (49,445 internationals) → Posterior: attᵢ, defᵢ + decoupling g (CI). Inset: James–Stein shrinkage between prior and MLE (labels: power; newcomer → prior).
Source · Oxford Football Forecasting model · structural diagram, not a data plot

Fig. V11 Lower is better · floor = Elo-only · ceiling = de-vigged market

OOS RPS — expanding (headline) and LOTO (optimistic)

On the headline expanding window this model scores 0.1927: 0.0010 below (better than) the Elo floor (0.1938) and 0.0022 above (worse than) the market ceiling (0.1905).

Expanding 0.1927
LOTO 0.1905

Bar fills to the model’s RPS on the floor–ceiling axis; the whisker on the expanding bar is the conservative 95% CI (0.1805–0.2050). Lower (left) is better.

It clears the Elo floor; the gap to the market is small and — at n = 3 — inside the bootstrap interval.

Source · Oxford Football Forecasting model · Bookmaker consensus (de-vigged closing odds) · 152 matches · 3 tournaments
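
For readers unfamiliar with the metric, the ranked probability score for a single match can be computed as below. This is the standard definition for ordered outcomes (here win, draw, loss); the function name `rps` is illustrative, and the leaderboard figures above are presumably averages of this quantity over matches.

```python
def rps(probs, outcome_index):
    """Ranked probability score for one match with ordered outcomes.

    probs: forecast probabilities in outcome order, e.g. (win, draw, loss).
    outcome_index: index of the outcome that actually occurred.
    Lower is better; 0 is a perfect forecast.
    """
    k = len(probs)
    cum_p = 0.0
    cum_o = 0.0
    total = 0.0
    for idx in range(k - 1):
        cum_p += probs[idx]
        cum_o += 1.0 if idx == outcome_index else 0.0
        total += (cum_p - cum_o) ** 2
    return total / (k - 1)
```

Because the score compares cumulative probabilities, a forecast that puts its mass on "draw" when the result is a win is penalised less than one that puts it on "loss".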

A World Cup with 48 teams includes nations the international record has barely seen. A per-team maximum-likelihood estimate of their attack and defence is unbiased but wildly noisy, because the likelihood over a handful of matches is nearly flat. The hierarchical prior fixes this the way James–Stein shrinkage does: it pulls each team’s strength toward what its Elo and squad value imply (the prior mean), by an amount learned from the data. On data-rich powerhouses the posterior all but equals the MLE, so the model nests Dixon–Coles and cannot do meaningfully worse; across the field, newcomers included, the shrunk estimates dominate the noisy per-team ones in total risk.

The same machinery emits the project’s scientific target, the decoupling g: how far a team’s squad quality sits above or below what its history predicts. Because history and squad value are badly collinear (correlation 0.82), g is reported as a well-conditioned projection residual with a credible interval, not as a fragile “history X%, squad Y%” split, which the data cannot identify.

+0.031

Decoupling slope b — squad-above-record g on stage reached

tournament-clustered SE 0.173 · n = 118 team-tournaments

includes 0

95% CI on the slope

[−0.31, +0.55] — not significant at n = 3

0.23

g vs pre-baked gap (sanity corr)

the model-based g correlates positively, though only weakly, with the engineered residual

Reading g

The point estimate of the slope is positive (+0.031), the direction expected if teams whose squad value runs ahead of their history over-perform their Elo-implied stage, but the 95% interval [−0.31, +0.55] includes zero. With only 3 tournaments of out-of-sample history, the direction is suggestive and the effect is not statistically resolved. It is surfaced as a measured posterior quantity with its uncertainty, never as a confident structural decomposition.

Strengths

  • Handles 48-team newcomers by construction (partial pooling)
  • Quantifies the decoupling g with a credible interval
  • Pooling strength is learned, not hand-tuned

Limits

  • Loadings on history h and squad value s are weakly identified (the two correlate at 0.82)
  • Posterior corr(κ, β) reported, not a structural % split
  • Heavier to fit (NUTS / MAP fallback)