WC 2026 · Forecasting Oxford Football Forecasting

Floor · the reproducible baseline

Elo — strength you can rebuild from results alone

A self-updating rating that needs nothing but match results. It is the transparent floor every other model on this site must clear, and the history state the richer models build on.

OOS RPS · expanding: 0.1938 (the headline skill, realism protocol)

OOS RPS · LOTO: 0.1937 (optimistic ceiling; leaks future folds)

Leaderboard rank: #7 of 7 · CI 0.1546–0.2281

A self-updating strength rating: a team gains points when it does better than the ratings expected and loses them when it does worse, with the size of the move scaled by margin of victory and match importance. It is the transparent, reproducible floor every other model must clear.

Expected result (logistic on the rating gap) and the rating update

01 The expected score

Two ratings give an expected result through a logistic curve on the rating gap. The 400-point scale fixes the units: a 400-point gap makes the stronger side a 10-to-1 favourite on expected score. The 100-point home-field bonus is added only when a side genuinely plays in its own country — neutral-venue matches, the bulk of tournament football, get none.

Expected score — logistic in the rating gap, base 10, scale 400
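
As a minimal sketch of the expected-score curve described above (logistic, base 10, scale 400), the function below may help. The function and parameter names are illustrative and not taken from the site's code; the signed `home_bonus` argument is an assumption about how the 100-point bonus is passed in.

```python
def expected_score(r_team: float, r_opp: float, home_bonus: float = 0.0) -> float:
    """Logistic expected score: E = 1 / (1 + 10 ** (-(r_team - r_opp + H) / 400)).

    home_bonus (H) is assumed to be +100 when r_team genuinely plays in its
    own country, -100 when the opponent does, and 0 at a neutral venue.
    """
    gap = r_team - r_opp + home_bonus
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


# A 400-point gap at a neutral venue gives 10/11, i.e. 10-to-1 on expected score.
print(expected_score(1900, 1500))
# Equal ratings, one side at home: the bonus alone tilts the expectation.
print(expected_score(1500, 1500, home_bonus=100))
```

The two sides' expected scores always sum to 1, so only one of them needs to be computed per match.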

02 The update

After every match the rating moves in proportion to the surprise — the gap between what happened and what the ratings expected. The step size K scales with match importance; the multiplier G lets the margin of victory speak.

The rating update after every match
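
The update can be sketched as follows. This assumes the multiplicative form R′ = R + K·G·(S − E) suggested by the schematic, and it additionally assumes the update is zero-sum (the opponent moves by the opposite amount), which the page does not state explicitly. All names are illustrative.

```python
def expected_score(r_team: float, r_opp: float, home_bonus: float = 0.0) -> float:
    """Logistic expected score, base 10, scale 400."""
    return 1.0 / (1.0 + 10.0 ** (-(r_team - r_opp + home_bonus) / 400.0))


def update_ratings(r_a: float, r_b: float, score_a: float,
                   k: float, g: float = 1.0, home_bonus: float = 0.0):
    """Return (r_a', r_b') after one match.

    score_a is team A's actual result: 1 win, 0.5 draw, 0 loss.
    k is the importance step size, g the margin-of-victory multiplier.
    Assumption: team B receives the opposite adjustment (zero-sum).
    """
    surprise = score_a - expected_score(r_a, r_b, home_bonus)
    delta = k * g * surprise
    return r_a + delta, r_b - delta


# Two equally rated sides, neutral venue, A wins with k = 20 and g = 1:
# the surprise is 0.5, so A gains 10 points and B loses 10.
print(update_ratings(1500, 1500, score_a=1.0, k=20))
```

A draw between equally rated sides at a neutral venue carries no surprise, so neither rating moves.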

03 Margin of victory

G is flat for one-goal results, jumps for two, then grows gently with the rout — a big win moves the rating more than a narrow one, but not without limit.

The margin-of-victory multiplier on the goal difference δ
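
The exact formula is not recoverable from this page. One multiplier that matches the description (flat up to one goal, a jump at two, gentle growth after) is the published eloratings.net convention, sketched below as an assumption. Note that this form grows linearly with the margin and has no hard cap, so if the site's G is bounded it differs from this sketch.

```python
def mov_multiplier(goal_diff: int) -> float:
    """Margin-of-victory multiplier G on the goal difference.

    Assumed form (eloratings.net convention):
      |d| <= 1 -> 1.0, |d| == 2 -> 1.5, |d| >= 3 -> (11 + |d|) / 8.
    """
    d = abs(goal_diff)
    if d <= 1:
        return 1.0
    if d == 2:
        return 1.5
    return (11 + d) / 8


for d in range(0, 6):
    print(d, mov_multiplier(d))
```

Under this form a three-goal win gets 1.75 and each further goal adds 0.125.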

04 Match importance

K follows the eloratings.net importance tiers: a World Cup match moves ratings three times as far as a friendly, with qualifiers and continental finals in between.

The importance-weighted step size
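
A lookup table is enough to sketch the step size. The values below are the tiers published by eloratings.net and are assumed here, since this page only confirms the 3:1 ratio between a World Cup match and a friendly. The tier keys are hypothetical labels.

```python
# Assumed eloratings.net tiers; keys are illustrative names, not the site's.
K_BY_TIER = {
    "world_cup_finals": 60,
    "continental_finals": 50,
    "qualifier": 40,
    "other_tournament": 30,
    "friendly": 20,
}


def step_size(tier: str) -> int:
    """Return the importance-weighted step size K for a match tier."""
    try:
        return K_BY_TIER[tier]
    except KeyError:
        raise ValueError(f"unknown match tier: {tier!r}") from None


print(step_size("world_cup_finals") / step_size("friendly"))
```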

05 From rating to forecast

A rating gap alone gives an expected score, not a draw probability. The Elo-only forecast — the floor on the leaderboard — adds a two-parameter draw curve: the draw probability decays as the gap grows, and the remaining mass is split by the logistic expected score. Both parameters are calibrated by minimising RPS on recent internationals.

The Elo-only anchor — a calibrated draw curve over the logistic split
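
The functional form of the draw curve is not given on this page, so the sketch below uses a hypothetical one: a base rate multiplied by a Gaussian-style decay in the rating gap. The parameter names `base_rate` and `decay` and their default values are placeholders for illustration, not the calibrated values. Only the second step, splitting the remaining mass by the logistic expected score, follows the text directly.

```python
import math


def expected_score(gap: float) -> float:
    """Logistic expected score on a host-adjusted rating gap, base 10, scale 400."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


def elo_only_forecast(gap: float, base_rate: float = 0.28, decay: float = 1.0):
    """Return (p_home, p_draw, p_away) for a host-adjusted rating gap.

    Hypothetical draw curve: p_draw = base_rate * exp(-decay * (gap / 400) ** 2).
    base_rate and decay are placeholder values, not calibrated ones.
    """
    p_draw = base_rate * math.exp(-decay * (gap / 400.0) ** 2)
    e = expected_score(gap)
    p_home = (1.0 - p_draw) * e          # remaining mass split by expected score
    p_away = (1.0 - p_draw) * (1.0 - e)
    return p_home, p_draw, p_away


print(elo_only_forecast(0.0))
print(elo_only_forecast(200.0))
```

In practice the two parameters would be chosen by minimising RPS over a set of past matches, as the text describes; that fitting step is omitted here.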

Symbol key

Rᵢ, Rⱼ: current Elo ratings of the two teams
H: home-field bonus (100 points), applied only when a side genuinely plays at home
E: expected score from the logistic map, scaled by 400
S: actual result of the match (1 win, ½ draw, 0 loss)
K: the update step size, set by match importance
G: the margin-of-victory multiplier; bigger wins move the rating more
δ: the goal difference of the match
ΔR: the rating gap between the two teams (host-adjusted)
draw-curve base rate and decay, calibrated by minimising RPS on recent internationals
the forecast home-win / draw / away-win probabilities of the Elo-only anchor
  • 49,445 international results (1872-2026)
  • Host / neutral-venue flag per match

Fig. M·Elo Conceptual schematic

Elo — wired end to end

[Schematic: Rᵢ, Rⱼ (prior ratings) → E = σ(ΔR + H) (expected result; logistic, /400) → S − E (surprise × MoV) → Rᵢ′ = Rᵢ + K·g(MoV)·(S − E) (self-updating; the next match uses the new rating)]
Source · Oxford Football Forecasting model · structural diagram, not a data plot

Fig. V11 Lower is better · floor = Elo-only · ceiling = de-vigged market

OOS RPS — expanding (headline) and LOTO (optimistic)

On the headline expanding window this model scores 0.1938. That is 0.0000 versus the Elo floor (0.1938), because this model is the floor, and +0.0032 versus the market ceiling (0.1905).

Expanding 0.1938
LOTO 0.1937

Bar fills to the model’s RPS on the floor–ceiling axis; the whisker on the expanding bar is the conservative 95% CI (0.1546–0.2281). Lower (left) is better.

Its match-RPS sits at the floor; the gap to the market is small and, with only n = 3 tournaments, falls inside the bootstrap interval.

Source · Oxford Football Forecasting model · Bookmaker consensus (de-vigged closing odds) · 152 matches · 3 tournaments
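
For readers unfamiliar with the metric, the ranked probability score for a single three-outcome forecast can be sketched as below. This is the standard textbook definition, not necessarily the site's exact evaluation code; the expanding-window and LOTO protocols are not reproduced, and the function name and outcome ordering are assumptions.

```python
def rps(probs, outcome_index: int) -> float:
    """Ranked probability score for one match (lower is better).

    probs: forecast probabilities in ordered form, assumed (home, draw, away).
    outcome_index: index of the outcome that occurred (0, 1 or 2).
    RPS is the mean squared difference between the cumulative forecast and
    the cumulative outcome over the first r - 1 categories.
    """
    cum_forecast = 0.0
    cum_outcome = 0.0
    total = 0.0
    for k in range(len(probs) - 1):
        cum_forecast += probs[k]
        cum_outcome += 1.0 if k == outcome_index else 0.0
        total += (cum_forecast - cum_outcome) ** 2
    return total / (len(probs) - 1)


# A uniform forecast scored against a draw gives 1/9.
print(rps((1 / 3, 1 / 3, 1 / 3), 1))
```

A model's headline figure would then be the mean of this quantity over all out-of-sample matches.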

Strengths

  • Fully reproducible from results alone
  • No squad/market data needed — works for every nation
  • Margin-of-victory + importance weighted

Limits

  • Knows nothing about the current squad
  • Slow to react for teams with few recent matches
  • Cannot price newcomers it has barely seen