§ Research · the long reads

Three findings, written out in full

Most of this site reports what the forecast says. These three essays are about how it was built — and about the limits of what it can claim. Each is a real result: a squad layer extended far beyond Europe’s top five leagues, a way of reading a squad against its own history, and the quiet role of luck in a bracket. Each carries the same ceiling the whole project carries — three tournaments of out-of-sample evidence, which is enough to find an effect and rarely enough to prove it.

Oxford Football Forecasting · figures reconcile to the locked forecast

Method · global squad coverage

Seeing the whole field

Public football analytics runs on club data from Europe’s top five leagues — a lens that under-represents most of the world. This project extends the squad layer to a global panel of 68 countries and 107 leagues, and the gain lands exactly where top-five-only work is blind. A placebo test confirms the coverage itself is doing the work.

Coverage rose from 48% to 85% of the average squad. The lift was largest where the top-five feed was blind, and it moved the gradient-boosted model from the worst learner to the best-ranked learner behind the blend and the market.

+37 pt mean squad-coverage uplift
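As a rough illustration of what a coverage figure like this measures, the sketch below treats coverage as the share of a squad whose club league is in the data feed, then averages the gain across squads. The league codes, the two squads and the function names are hypothetical, and the project's own definition may weight players differently.

```python
from statistics import fmean


def squad_coverage(player_leagues, covered_leagues):
    """Share of a squad whose club league is in the covered set."""
    hits = sum(1 for league in player_leagues if league in covered_leagues)
    return hits / len(player_leagues)


def mean_uplift_pt(squads, before, after):
    """Mean coverage gain across squads, in percentage points."""
    gains = [
        squad_coverage(leagues, after) - squad_coverage(leagues, before)
        for leagues in squads.values()
    ]
    return 100 * fmean(gains)


# Hypothetical league codes and squads, for illustration only (not project data).
TOP_FIVE = {"ENG1", "ESP1", "GER1", "ITA1", "FRA1"}
GLOBAL_PANEL = TOP_FIVE | {"MEX1", "USA1", "JPN1"}

squads = {
    "Team A": ["ENG1", "ESP1", "MEX1", "USA1"],
    "Team B": ["GER1", "JPN1", "JPN1", "OTHER"],
}

uplift = mean_uplift_pt(squads, TOP_FIVE, GLOBAL_PANEL)
print(f"mean uplift: {uplift:+.0f} pt")
```

Because the mean of per-squad gains equals the gain in the mean, the headline figure follows from the same subtraction: 85 − 48 = 37 points.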

Figures: Coverage choropleth · confederation uplift bars · before/after RPS

Read the essay

Finding · the projection residual g

History versus the squad

Some teams are worth more than their results; others win more than their price tag says they should. The model puts a number on that gap — the decoupling g — and reads it as a description, not a law.

Portugal sit highest (g +0.75: a squad valued above its record); Australia lowest (g −0.72: a strong record on a comparatively cheap squad). Across five tournaments the slope is positive but its interval spans zero.
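One way to picture a projection residual is sketched below: standardise squad value and historical strength, project value onto history by least squares, and keep what is left over. This is a plain least-squares stand-in, not the project's estimator; the inputs are hypothetical and the function names are ours. The sign convention follows the text: positive g means a squad valued above its record.

```python
import statistics


def zscores(xs):
    """Standardise to mean 0 and (population) standard deviation 1."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]


def projection_residuals(history, value):
    """Residual of standardised squad value after projecting it on history.

    With both series standardised, the least-squares slope equals their
    correlation and the intercept is zero.
    """
    h = zscores(history)
    v = zscores(value)
    slope = sum(a * b for a, b in zip(h, v)) / len(h)
    return [b - slope * a for a, b in zip(h, v)]


# Hypothetical ratings and squad values, for illustration only.
history = [1800, 1700, 1600, 1500]   # e.g. a results-based rating
value = [500, 900, 200, 300]         # e.g. squad market value

g = projection_residuals(history, value)
for i, gi in enumerate(g):
    print(f"team {i}: g = {gi:+.2f}")
```

In this toy input the second team is worth far more than its rating predicts and gets the largest positive residual, while the top-rated team with a modest squad value comes out negative.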

+0.03 regression slope b · CI [−0.31, +0.55]

Figures: The g scatter (history vs value) · the field ranked by g · the regression

Read the essay

Finding · strength meets the bracket

Power versus the draw

A team can be strong and still be unlucky. Separate raw strength from the bracket it was handed, and a second ranking appears — one where a softer draw pulls England level with Brazil, and the two title favourites can meet before the competition has properly begun.

Belgium drew the softest path (+0.41 pp), Brazil among the toughest (−0.41 pp). But almost every team sits inside the noise — the draw is close to fair.

+0.64 pp England’s draw-luck swing over Brazil

Figures: Draw-luck tornado · the strength-vs-luck scatter · the top-8 collision matrix

Read the essay


One thread runs through all three

They are different findings, but they share a discipline: state the effect, then state its uncertainty in the same breath, and never let a small sample masquerade as a large one.

The effect is real and directional

The coverage gain lands outside UEFA, exactly where it was predicted to; high-g teams really do over-run their valuations; the softest and toughest draws are the ones the bracket maths implies. None of this is noise-mining: the signs were set in advance.

The significance is not

With three out-of-sample tournaments, every confidence interval is wide by construction. The de-biasing gain, the decoupling slope and the draw-luck advantages all have intervals that include zero. We report that as a result, not a footnote.
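To see why three observations force a wide interval, consider a textbook Student-t interval for a mean. This is a sketch under the assumption of a simple t-interval; the project's own intervals may be built differently, and the per-tournament gains below are made-up numbers. With n = 3 the 95% multiplier is 4.303 rather than the large-sample 1.96.

```python
import math
import statistics

# Two-sided 95% Student-t critical values by degrees of freedom (standard table).
T_CRIT_95 = {2: 4.303, 4: 2.776, 9: 2.262, 29: 2.045}


def mean_ci(values):
    """95% t-interval for the mean of a small sample."""
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)   # standard error of the mean
    half_width = T_CRIT_95[n - 1] * sem
    return mean - half_width, mean + half_width


# Hypothetical per-tournament gains (positive = model better). Not project data.
gains = [0.004, -0.001, 0.003]

low, high = mean_ci(gains)
print(f"mean gain {statistics.fmean(gains):+.4f}, 95% CI [{low:+.4f}, {high:+.4f}]")
```

Even though two of the three hypothetical gains are positive, the interval straddles zero. Fewer tournaments mean both a larger multiplier and a larger standard error.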

So the verb is “matches”, not “beats”

One summary sits underneath all three pieces: the ensemble matches the de-vigged market on out-of-sample RPS (0.1891 vs 0.1905). It does not significantly beat it. These essays explain where the model earns its keep anyway.
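For readers unfamiliar with the two ingredients of that comparison, here is a minimal sketch of the ranked probability score for a single match, plus the simplest way to strip a bookmaker's margin from decimal odds. Proportional normalisation is an assumption here; the de-vig method behind the figures above may differ, and the function names are ours.

```python
from typing import Sequence


def devig(decimal_odds: Sequence[float]) -> list[float]:
    """Turn decimal odds into probabilities by proportional normalisation.

    Assumption: the simplest de-vig method, not necessarily the project's.
    """
    implied = [1.0 / o for o in decimal_odds]
    total = sum(implied)          # exceeds 1 by the bookmaker's margin
    return [p / total for p in implied]


def rps(probs: Sequence[float], outcome: int) -> float:
    """Ranked probability score for one match with ordered outcomes.

    probs   -- forecast probabilities in order, e.g. (home win, draw, away win)
    outcome -- index of the outcome that occurred
    Lower is better; 0 is a perfect forecast.
    """
    cum_forecast = 0.0
    cum_observed = 0.0
    score = 0.0
    # Compare cumulative distributions; the last category is always 1 vs 1.
    for k in range(len(probs) - 1):
        cum_forecast += probs[k]
        cum_observed += 1.0 if k == outcome else 0.0
        score += (cum_forecast - cum_observed) ** 2
    return score / (len(probs) - 1)


market = devig([1.9, 3.8, 3.8])   # hypothetical odds for one match
print(market, rps(market, outcome=0))
```

A tournament-level RPS is the average of this per-match score, which is why a gap as small as the one quoted above needs many matches, and many tournaments, before it can be called significant.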

Where the findings feed back in

Data → The coverage choropleth and the 60-league strength table, in full, with a live SQL console.

Models → The model ladder where the de-biasing lands and the Bayesian g is estimated.

Rankings → Power versus Reality for all 48, with the draw-luck column and significance flags.

Validation → The backtest behind the n = 3 ceiling: calibration, conformal coverage, the subgroup audit.