Every forecast on this site rests on a single idea: a national team's chances come
from two things that do not always agree. The first is history —
the long record of who has beaten whom, distilled into a self-updating strength
rating that stretches back to the nineteenth century. The second is the
squad on the plane today — the clubs its players turn out for, the
minutes they are getting, the market that prices their talent. A team can be richer
than its results, or be winning more than its price tag says it should. The model's
job is to hold both pictures at once.
Those two pictures are turned into match odds by a ladder of models,
each a little more ambitious than the last. At the bottom is Elo — cheap,
transparent, reproducible from the results file alone, and the floor everything else
must clear. Above it, a goals model (Dixon–Coles) that predicts scorelines, a
Bayesian model that deliberately shrinks its opinion of teams it has barely seen, and
a gradient-boosting learner that reads 36 engineered features per
team. No single one of these is trusted on its own. They are pooled — a weighted
geometric average of their probabilities — into one ensemble, which
out of sample was both more accurate than any individual learner and steadier than a
cleverer combination rule we tried and rejected.
The ensemble only gives the odds for a single match. To get a champion, the real
48-team bracket is simulated 1,100,000 times —
every group, every tie-breaker (down to the head-to-head rule we had to correct by
hand), the eight best third-placed teams, extra time and shoot-outs calibrated to
hundreds of real ones — and the share of simulations a team wins becomes its title
probability. Because that share is an average over simulations, it carries a small,
reported margin of its own.
The headline: the model matches the market. It does not beat it.
That last point matters most. Measured the hard way — trained only on
matches before each test tournament, never on the answers — the ensemble's accuracy
(0.1891) edges the de-vigged bookmaker consensus (0.1905) and clears
the Elo floor (0.1938). But the margin over the market is a fraction of its
own uncertainty, and the test set is only three out-of-sample
tournaments. So the claim we stand behind is parity, not conquest: a model that has
earned a seat next to the market, carries genuinely independent information
(a 66%-weight blend improves on either alone), and is transparent enough
that you can check every step below.