Shrinkage · partial pooling for newcomers
The same Poisson goal core, but each team’s attack and defence is latent, with a prior centred on what its history and squad value imply. A short-history qualifier is pulled toward that informed prior with calibrated uncertainty instead of an overconfident guess.
OOS RPS · expanding
the headline skill (realism protocol)
OOS RPS · LOTO
optimistic ceiling (leaks future folds)
Leaderboard rank
of 7 · CI 0.1805–0.2050
§ 01
In plain English, before any mathematics.
The goal model is the same Poisson core, but each team's attack and defence is a latent quantity with a prior centred on what its Elo (history) and squad value imply. A short-history newcomer, whose likelihood is nearly flat, is pulled toward that covariate-informed prior with calibrated uncertainty (James–Stein-style shrinkage) instead of receiving an overconfident guess. The model also emits the decoupling g as a posterior quantity with a credible interval.
§ 02
The same Poisson goal kernel, with attack and defence promoted to latent quantities under a covariate-informed hierarchical prior. Every prior is stated; the amount of pooling is learned, not hand-tuned.
01The match likelihood
The emission is the Dixon–Coles kernel — two log-linear Poisson rates with the low-score correction — fitted at match resolution on the 49,445-match international record. What changes is where the strengths come from.
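As a rough illustration of the kernel described above, the sketch below evaluates the Dixon–Coles log-likelihood of a single scoreline. The sign convention (attack minus opposing defence), the `home_adv` term, and all function names are assumptions made for the example; the page does not state the exact parameterisation.

```python
import math

def dc_tau(x, y, lam, mu, rho):
    """Dixon-Coles low-score correction for scoreline (x, y)."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0  # scores above 1-1 are left untouched

def poisson_logpmf(k, rate):
    """Log of the Poisson probability mass at k."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def dc_match_loglik(x, y, att_h, def_h, att_a, def_a, home_adv, rho):
    """Log-likelihood of home goals x and away goals y.

    Assumed log-linear rates (hypothetical sign convention):
      log(lam) = home_adv + att_h - def_a   (home scoring rate)
      log(mu)  = att_a - def_h              (away scoring rate)
    """
    lam = math.exp(home_adv + att_h - def_a)
    mu = math.exp(att_a - def_h)
    return (math.log(dc_tau(x, y, lam, mu, rho))
            + poisson_logpmf(x, lam)
            + poisson_logpmf(y, mu))
```

The correction only reweights the four lowest scorelines and leaves the total probability at one, which is why ρ has to be clipped to a range that keeps every correction factor positive.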
02The hierarchical prior
History is the prior mean; squad covariates shift strength off that mean; context must earn its way in. A team’s attack is drawn from a Normal centred on what its Elo and squad imply, with a learned pooling scale — the structural reason newcomers are handled by construction.
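One way to write this prior down, offered as a sketch rather than the model's exact specification: κ and δ are the history and context loadings named in this section, while β (squad loadings), x_i (squad covariates), c_i (context), σ_att (pooling scale) and s_σ (its hyper-scale) are notation assumed here for illustration.

```latex
\[
\begin{aligned}
\mathrm{att}_i &\sim \mathcal{N}\!\left(m_i,\ \sigma_{\mathrm{att}}^{2}\right),\\
m_i &= \kappa\,\mathrm{Elo}_i + \boldsymbol{\beta}^{\top}\mathbf{x}_i + \delta\,c_i,\\
\sigma_{\mathrm{att}} &\sim \mathrm{HalfNormal}(s_\sigma).
\end{aligned}
\]
```

A small σ_att pools every team tightly onto its covariate-implied mean m_i; a large one lets each team's own record dominate. Because σ_att is inferred, the data choose the degree of pooling. Defence would follow the same pattern with its own scale.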
03The stated priors
Every hyper-prior is declared. The history loading κ carries a ridge prior, since history is the prior mean and is expected to matter. The context loading δ carries a tight ridge: off unless the data insist. The pooling scales are half-Normal, and the prior on the dependence parameter ρ matches the Dixon–Coles clip.
04The horseshoe on squad covariates
The prior expectation is that only a handful of the squad covariates genuinely matter. The regularised horseshoe encodes exactly that: local scales let a few loadings escape to full size while the rest are pinned near zero, and the slab bounds the survivors. The global scale is set small (τ₀ = 0.05) to match a prior guess of three to five active features.
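To make the shape of this prior concrete, the sketch below draws loadings from a regularised horseshoe in the standard form (half-Cauchy local and global scales, inverse-gamma slab). Only τ₀ = 0.05 comes from the text; the number of features, the slab scale and the slab degrees of freedom are hypothetical values chosen for illustration.

```python
import numpy as np

def sample_regularised_horseshoe(n_features, tau0, slab_scale, slab_df,
                                 rng, n_draws=1000):
    """Prior draws of covariate loadings (no data involved)."""
    # Global scale: half-Cauchy(0, tau0). A small tau0 pins most loadings near zero.
    tau = tau0 * np.abs(rng.standard_cauchy((n_draws, 1)))
    # Local scales: half-Cauchy(0, 1). Heavy tails let a few loadings escape.
    lam = np.abs(rng.standard_cauchy((n_draws, n_features)))
    # Slab variance c^2 ~ Inv-Gamma(df/2, df*s^2/2), drawn as rate / Gamma(shape, 1).
    c2 = (slab_df * slab_scale**2 / 2.0) / rng.gamma(slab_df / 2.0, 1.0, (n_draws, 1))
    # Regularised local scale: close to lam when tau*lam << c, capped near c/tau otherwise.
    lam_tilde = np.sqrt(c2 * lam**2 / (c2 + tau**2 * lam**2))
    return rng.standard_normal((n_draws, n_features)) * tau * lam_tilde

# Hypothetical settings: 12 squad covariates, slab scale 2, slab df 4.
draws = sample_regularised_horseshoe(12, 0.05, 2.0, 4.0, np.random.default_rng(0))
share_small = float(np.mean(np.abs(draws) < 0.1))  # fraction of loadings pinned near zero
```

Inspecting `share_small` alongside the largest absolute draws shows the intended behaviour: most of the prior mass sits near zero, with an occasional large loading whose size is limited by the slab.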
05Coverage-aware measurement error
Club-form coverage is itself a strength signal — a naive model reads “little data” as “weak team”, which is circular. The fix is a measurement-error layer: observed squad form is the true signal plus noise whose variance grows as coverage falls, so thinly-observed teams shrink automatically toward their value-implied prior.
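A minimal sketch of the idea, assuming a Normal prior on true form and a noise variance of the hypothetical form `base_var / coverage`; the page does not give the actual variance function, so both the functional form and the names are illustrative.

```python
def coverage_noise_var(coverage, base_var=1.0, floor=1e-3):
    """Noise variance that grows as coverage in (0, 1] falls (assumed form)."""
    return base_var / max(coverage, floor)

def posterior_form(observed, coverage, prior_mean, prior_var, base_var=1.0):
    """Posterior mean and variance of true squad form under Normal-Normal updating."""
    noise_var = coverage_noise_var(coverage, base_var)
    weight = prior_var / (prior_var + noise_var)  # trust placed in the observation
    mean = prior_mean + weight * (observed - prior_mean)
    var = prior_var * noise_var / (prior_var + noise_var)
    return mean, var

# Same observed form, different coverage; prior mean 0 stands in for the value-implied prior.
full_mean, full_var = posterior_form(1.0, coverage=1.0, prior_mean=0.0, prior_var=1.0)
thin_mean, thin_var = posterior_form(1.0, coverage=0.1, prior_mean=0.0, prior_var=1.0)
```

With full coverage the observation gets half the weight here; at 10% coverage it gets one eleventh, so the thinly observed team sits close to its prior and keeps a wider posterior variance, without low coverage itself being read as weakness.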
06What shrinkage does
In the conjugate Normal approximation the posterior is a precision-weighted compromise between a team’s own record and its prior mean. Data-rich powerhouses keep essentially their maximum-likelihood estimate, which is the sense in which the model nests Dixon–Coles, while a short-history qualifier, whose likelihood is nearly flat, is pulled to what its Elo and squad value imply. That is the James–Stein argument: when three or more strengths are estimated jointly, shrinkage dominates the noisy per-team estimate in total squared-error risk.
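Written out under the conjugate Normal approximation, with notation assumed for illustration (n_i matches for team i, per-match noise variance σ², pooling variance σ_pool², prior mean m_i), the compromise is:

```latex
\[
\hat{\theta}_i \;=\; w_i\,\hat{\theta}_i^{\mathrm{MLE}} + (1 - w_i)\,m_i,
\qquad
w_i \;=\; \frac{n_i/\sigma^{2}}{\,n_i/\sigma^{2} + 1/\sigma_{\mathrm{pool}}^{2}\,},
\qquad
\operatorname{Var}(\theta_i \mid \text{data}) \;=\; \frac{1}{\,n_i/\sigma^{2} + 1/\sigma_{\mathrm{pool}}^{2}\,}.
\]
```

As n_i grows, w_i tends to 1 and the estimate approaches the MLE; with only a handful of matches w_i is near 0 and the estimate stays close to m_i, with the posterior variance reporting how little the data have added.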
07The decoupling residual
History and squad value correlate at 0.82, so a structural “history X%, squad Y%” split is not identified. The decoupling g is therefore defined as a projection residual — squad strength minus what history predicts — which stays well-conditioned even when the individual loadings are not, and ships with a posterior standard deviation.
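The projection can be sketched with ordinary least squares: regress squad strength on history and keep the residual. This is an illustration on synthetic data with hypothetical variable names, not the model's own code; in the Bayesian setting the same projection would be applied to each posterior draw, which is one way to obtain a standard deviation for g.

```python
import numpy as np

def decoupling_residual(history, squad):
    """Squad strength minus its least-squares prediction from history."""
    history = np.asarray(history, dtype=float)
    squad = np.asarray(squad, dtype=float)
    X = np.column_stack([np.ones_like(history), history])  # intercept + history
    coef, *_ = np.linalg.lstsq(X, squad, rcond=None)
    return squad - X @ coef

# Synthetic, strongly collinear inputs (illustrative only).
rng = np.random.default_rng(1)
history = rng.normal(size=200)
squad = 0.8 * history + 0.6 * rng.normal(size=200)
g = decoupling_residual(history, squad)
```

By construction g is orthogonal to history and has zero mean, so it stays well-conditioned however strongly the two inputs correlate.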
Symbol key
§ 03
The inputs this model reads — and only these.
§ 04
A schematic of the model wired end to end.
Fig. M·Bayesian Conceptual schematic
§ 05
Where this model lands between the Elo floor and the market ceiling, on both backtest protocols.
Fig. V11 Lower is better · floor = Elo-only · ceiling = de-vigged market
On the headline expanding window this model scores 0.1927: 0.0010 below the Elo floor (0.1938) and 0.0022 above the market ceiling (0.1905).
Bar fills to the model’s RPS on the floor–ceiling axis; the whisker on the expanding bar is the conservative 95% CI (0.1805–0.2050). Lower (left) is better.
It clears the Elo floor; the gap to the market is small and — at n = 3 — inside the bootstrap interval.
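For readers unfamiliar with the metric, the ranked probability score for a single match can be computed as below. This is the standard definition for ordered outcomes; the outcome ordering (home win, draw, away win) and the function name are assumptions, and the page's backtest figures are averages over many matches.

```python
import numpy as np

def rps(probs, outcome):
    """Ranked probability score for one match; lower is better.

    probs   -- forecast probabilities over ordered outcomes, e.g. (home, draw, away)
    outcome -- index of the outcome that occurred
    """
    probs = np.asarray(probs, dtype=float)
    observed = np.zeros_like(probs)
    observed[outcome] = 1.0
    # Compare cumulative forecast and cumulative outcome; the last term is always 0.
    diff = np.cumsum(probs)[:-1] - np.cumsum(observed)[:-1]
    return float(np.sum(diff**2) / (len(probs) - 1))

uniform_on_draw = rps([1 / 3, 1 / 3, 1 / 3], 1)  # uninformed forecast, match drawn
```

Because the score works on cumulative probabilities, a forecast that put its mass on a draw is penalised less for a home win than one that put it on an away win.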
§ 06
Why a 48-team field with debutants needs shrinkage — and what the model says about history vs squad.
A World Cup with 48 teams includes nations the international record has barely seen. A per-team maximum-likelihood estimate of their attack and defence is unbiased but wildly noisy, because the likelihood over a handful of matches is nearly flat. The hierarchical prior fixes this the way James–Stein shrinkage does: it pulls each team’s strength toward what its Elo and squad value imply (the prior mean), by an amount learned from the data. On data-rich powerhouses the posterior all but equals the MLE, so the model nests Dixon–Coles and cannot do meaningfully worse; on newcomers the shrunk estimate dominates the noisy one in total risk.
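The risk claim can be checked with a small simulation. The sketch below uses the positive-part James–Stein estimator shrinking toward a known prior mean, with equal noise variance for all 48 teams; that is a deliberate simplification (the hierarchical model learns its pooling and lets noise differ by team), and every number in it is synthetic.

```python
import numpy as np

def james_stein(y, prior_mean, noise_var):
    """Positive-part James-Stein estimate shrinking y toward prior_mean."""
    resid = y - prior_mean
    shrink = max(0.0, 1.0 - (y.size - 2) * noise_var / float(resid @ resid))
    return prior_mean + shrink * resid

rng = np.random.default_rng(0)
p, noise_var, reps = 48, 1.0, 2000
prior_mean = rng.normal(0.0, 1.0, p)            # stand-in for Elo/squad-implied strength
theta = prior_mean + rng.normal(0.0, 0.5, p)    # synthetic "true" strengths

loss_mle = 0.0
loss_js = 0.0
for _ in range(reps):
    y = theta + rng.normal(0.0, np.sqrt(noise_var), p)  # noisy per-team estimates
    loss_mle += float(np.sum((y - theta) ** 2))
    loss_js += float(np.sum((james_stein(y, prior_mean, noise_var) - theta) ** 2))
loss_mle /= reps
loss_js /= reps
```

Comparing `loss_js` with `loss_mle` shows the total squared error of the shrunk estimates falling below that of the raw ones. The gain is largest when the prior mean is informative, which is the role the Elo and squad covariates play here.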
The same machinery emits the project’s scientific target, the decoupling g: how far a team’s squad quality sits above or below what its history predicts. Because history and squad value are badly collinear (the records correlate strongly), g is reported as a well-conditioned projection residual with a credible interval — not a fragile “history X%, squad Y%” split, which the data cannot identify.
Decoupling slope b — squad-above-record g on stage reached
tournament-clustered SE 0.173 · n = 118 team-tournaments
95% CI on the slope
[−0.31, +0.55] — not significant at n = 3
g vs pre-baked gap (sanity corr)
the model-based g aligns with the engineered residual
The slope is positive: teams whose squad value runs ahead of their history do tend to over-perform their Elo-implied stage. But the 95% interval [−0.31, +0.55] includes zero. With only 3 tournaments of out-of-sample history, the direction is suggestive and the effect is not statistically resolved. It is surfaced as a measured posterior quantity with its uncertainty, never as a confident structural decomposition.
§ 07
What this model is good for — and where it is weak.
Strengths
Limits