Floor · the reproducible baseline

Elo — strength you can rebuild from results alone

A self-updating rating that needs nothing but match results. It is the transparent floor every other model on this site must clear, and the rating history that the richer models build on.

0.1938 · OOS RPS, expanding window · the headline skill (realism protocol)

0.1937 · OOS RPS, LOTO · optimistic ceiling (leaks future folds)

Leaderboard rank · of 7 · CI 0.1546–0.2281

§ 01

The intuition

In plain English, before any mathematics.

A self-updating strength rating: a team gains points when it does better than its rating predicted (most of all for beating a stronger opponent) and loses points when it does worse, with each move scaled by margin of victory and match importance. It is the transparent, reproducible floor every other model must clear.

§ 02

Mathematical specification

The exact scheme, from the expected score to a W/D/A forecast. Ratings are recomputed from all 49,445 internationals, so a rating exists for any team on any date — built strictly from matches played before that date.
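A point-in-time lookup is what keeps these ratings leak-free. The sketch below assumes each team's history is stored as a date-sorted list of (match date, rating after that match) pairs and that a team with no prior matches starts at 1500; the storage layout, the function name and the 1500 default are all assumptions, not details given on this page.

```python
from bisect import bisect_left
from datetime import date

# Hypothetical layout: one team's (match_date, rating_after_match) pairs, sorted by date.
History = list[tuple[date, float]]


def rating_as_of(history: History, on: date, initial: float = 1500.0) -> float:
    """Return the rating built only from matches played strictly before `on`.

    `initial` (1500) is an assumed starting value for a team with no prior matches.
    """
    dates = [d for d, _ in history]
    idx = bisect_left(dates, on)  # index of the first match on or after `on`
    return history[idx - 1][1] if idx > 0 else initial


# Illustrative numbers only, not real ratings.
history = [(date(2022, 11, 22), 1800.0), (date(2022, 11, 26), 1815.0)]
print(rating_as_of(history, date(2022, 11, 26)))  # 1800.0: the match on the 26th is excluded
```

Using `bisect_left` rather than `bisect_right` is the design choice that enforces "strictly before": a match played on the query date itself is never visible.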

E_{home} = \frac{1}{1 + 10^{-(R_{i} - R_{j} + H\,\mathbb{1}_{home})/400}}, \qquad R_{i}' = R_{i} + K\,G(\delta)\,(W - E_{home})

Expected result (logistic on the rating gap) and the rating update

01 · The expected score

Two ratings give an expected result through a logistic curve on the rating gap. The 400-point scale fixes the units: a 400-point gap makes the stronger side a 10-to-1 favourite on expected score. The 100-point home-field bonus is added only when a side genuinely plays in its own country — neutral-venue matches, the bulk of tournament football, get none.

E_{home} = \frac{1}{1 + 10^{-(R_{i} - R_{j} + H\,\mathbb{1}_{home})/400}}, \qquad H = 100

Expected score — logistic in the rating gap, base 10, scale 400
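A minimal Python sketch of the expected score, with the home bonus switched on only for genuine home matches. The function name and signature are illustrative; the check at the end reproduces the 10-to-1 odds for a 400-point gap.

```python
def expected_home(r_i: float, r_j: float, home: bool, h: float = 100.0) -> float:
    """Expected score of side i: logistic in the rating gap, base 10, scale 400.

    The bonus h is added only when side i genuinely plays in its own country;
    neutral-venue matches pass home=False.
    """
    gap = r_i - r_j + (h if home else 0.0)
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


e = expected_home(1900.0, 1500.0, home=False)  # 400-point gap, neutral venue
print(round(e, 4), round(e / (1.0 - e), 2))    # 0.9091 10.0
```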

02 · The update

After every match the rating moves in proportion to the surprise, the gap between what happened and what the ratings expected. The step size K scales with match importance; the multiplier G enlarges the step when the margin of victory is wider.

R_{i}' = R_{i} + K\,G(\delta)\,(W - E_{home}), \qquad W \in \{1, \tfrac{1}{2}, 0\}

The rating update after every match
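The update itself is one line. The sketch assumes the opposing side moves by the same amount in the opposite direction (a zero-sum exchange, the usual Elo convention); the formula above only states the move for side i, so treat that symmetry as an assumption. The function name is hypothetical.

```python
def elo_update(r_i: float, r_j: float, w: float, e_home: float,
               k: float, g: float) -> tuple[float, float]:
    """Apply R_i' = R_i + K * G(delta) * (W - E_home) and return (R_i', R_j').

    Assumption: side j loses exactly what side i gains (zero-sum exchange).
    """
    shift = k * g * (w - e_home)
    return r_i + shift, r_j - shift


# Equal ratings at a neutral venue (E_home = 0.5), 1-0 friendly win: K = 20, G = 1.
print(elo_update(1500.0, 1500.0, w=1.0, e_home=0.5, k=20.0, g=1.0))  # (1510.0, 1490.0)
```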

03 · Margin of victory

G is 1 for draws and one-goal results, jumps to 1.5 for two-goal wins, then grows by only an eighth per extra goal: a big win moves the rating more than a narrow one, but a 7–0 (G = 2.25) counts just 1.5 times as much as a 2–0 (G = 1.5).

G(\delta) = \begin{cases} 1 & |\delta| \le 1 \\ \tfrac{3}{2} & |\delta| = 2 \\ \tfrac{11 + |\delta|}{8} & |\delta| \ge 3 \end{cases}

The margin-of-victory multiplier on the goal difference δ
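The piecewise multiplier transcribes directly; `mov_multiplier` is a hypothetical name. Printing the first few values shows the flat start, the jump at two goals and the eighth-per-goal growth after that.

```python
def mov_multiplier(goal_diff: int) -> float:
    """G(delta): margin-of-victory multiplier on the absolute goal difference."""
    d = abs(goal_diff)
    if d <= 1:
        return 1.0            # draws and one-goal results
    if d == 2:
        return 1.5            # two-goal wins
    return (11 + d) / 8       # three or more: +1/8 per extra goal


print([mov_multiplier(d) for d in range(8)])
# [1.0, 1.0, 1.5, 1.75, 1.875, 2.0, 2.125, 2.25]
```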

04 · Match importance

K follows the eloratings.net importance tiers: a World Cup finals match moves ratings three times as far as a friendly (60 versus 20), with continental finals, qualifiers and Nations League matches, and other tournaments in between.

K = \begin{cases} 60 & \text{World Cup finals} \\ 50 & \text{continental finals} \\ 40 & \text{qualifiers, Nations League} \\ 30 & \text{other tournaments} \\ 20 & \text{friendlies} \end{cases}

The importance-weighted step size
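The tiers reduce to a lookup table. The string keys below are hypothetical labels for the competition types; only the K values come from the scheme above.

```python
# K values from the importance tiers above; the dictionary keys are hypothetical labels.
K_BY_TIER = {
    "world_cup_finals": 60,
    "continental_finals": 50,
    "qualifier_or_nations_league": 40,
    "other_tournament": 30,
    "friendly": 20,
}


def step_size(tier: str) -> int:
    """Return K for a competition tier; an unknown label raises KeyError."""
    return K_BY_TIER[tier]


# The same surprise moves a World Cup finals rating three times as far as a friendly's.
print(step_size("world_cup_finals") / step_size("friendly"))  # 3.0
```

Raising on an unknown label, rather than defaulting to the friendly tier, is a deliberate choice in this sketch: a mislabelled competition should fail loudly instead of silently getting the smallest step.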

05 · From rating to forecast

A rating gap alone gives an expected score, not a draw probability. The Elo-only forecast, the floor on the leaderboard, adds a two-parameter draw curve: the draw probability decays as the gap grows, and the remaining mass is split in proportion to the logistic expected score. Both parameters are calibrated by minimising the ranked probability score (RPS) on recent internationals.

p_{D} = \operatorname{clip}\left(b\,e^{-c\,|\Delta R|},\ 0.05,\ 0.40\right), \qquad p_{H} = (1 - p_{D})\,E_{home}, \qquad p_{A} = (1 - p_{D})(1 - E_{home})

The Elo-only anchor — a calibrated draw curve over the logistic split
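The draw curve and the logistic split combine into a three-way forecast. In this sketch `b` and `c` are placeholders (0.30 and 0.002 are made-up values, not the calibrated ones), `delta_r` is expected to be the host-adjusted gap, and the function name is hypothetical.

```python
import math


def elo_forecast(delta_r: float, e_home: float, b: float, c: float) -> tuple[float, float, float]:
    """Home/draw/away probabilities of the Elo-only anchor.

    delta_r: host-adjusted rating gap; e_home: logistic expected score.
    b, c: draw-curve base rate and decay (calibrated on the site; placeholders here).
    """
    p_d = min(max(b * math.exp(-c * abs(delta_r)), 0.05), 0.40)  # clip to [0.05, 0.40]
    p_h = (1.0 - p_d) * e_home
    p_a = (1.0 - p_d) * (1.0 - e_home)
    return p_h, p_d, p_a


# Made-up parameters (b = 0.30, c = 0.002) for an even neutral-venue match.
p_h, p_d, p_a = elo_forecast(delta_r=0.0, e_home=0.5, b=0.30, c=0.002)
print(round(p_h, 3), round(p_d, 3), round(p_a, 3))  # 0.35 0.3 0.35
```

The clip keeps the draw probability inside a plausible band: a very large gap cannot drive it to zero, and a tiny gap cannot push it past 40%.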

Symbol key

$R_{i}, R_{j}$: current Elo ratings of the two teams
$H$: home-field bonus (100 points), applied only when a side genuinely plays at home
$E_{home}$: expected score of the home side (win probability plus half the draw probability), from the logistic map with scale 400
$W$: actual result of the match (1 win, ½ draw, 0 loss)
$K$: the update step size, set by match importance
$G (δ)$: the margin-of-victory multiplier — bigger wins move the rating more
$δ$: the goal difference of the match
$Δ R$: the rating gap between the two teams (host-adjusted)
$b, c$: draw-curve base rate and decay — calibrated by minimising RPS on recent internationals
$p_{H}, p_{D}, p_{A}$: the forecast home-win / draw / away-win probabilities of the Elo-only anchor
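Calibration of b and c can be sketched as a search for the pair with the lowest mean RPS. A plain grid search is an assumption here (the page does not say which optimiser is used), and the input format of (ΔR, E_home, outcome) triples and all function names are hypothetical.

```python
import itertools
import math


def forecast(delta_r, e_home, b, c):
    """Draw curve plus logistic split, as in the Elo-only anchor."""
    p_d = min(max(b * math.exp(-c * abs(delta_r)), 0.05), 0.40)
    return (1 - p_d) * e_home, p_d, (1 - p_d) * (1 - e_home)


def rps3(p, outcome):
    """Three-outcome RPS (0 = home win, 1 = draw, 2 = away win), normalised by 2."""
    o = [1.0 if k == outcome else 0.0 for k in range(3)]
    return ((p[0] - o[0]) ** 2 + (p[0] + p[1] - o[0] - o[1]) ** 2) / 2


def calibrate(matches, b_grid, c_grid):
    """Return the (b, c) pair with the lowest mean RPS over (delta_r, e_home, outcome) triples."""
    def mean_rps(b, c):
        return sum(rps3(forecast(d, e, b, c), y) for d, e, y in matches) / len(matches)
    return min(itertools.product(b_grid, c_grid), key=lambda bc: mean_rps(*bc))
```

On ties, `min` keeps the first pair in grid order, so the grids should be listed in the order of preference.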

§ 03

What data it uses

The inputs this model reads — and only these.

49,445 international results (1872-2026)
Host / neutral-venue flag per match

§ 04

How it works

A schematic of the model wired end to end.

Fig. M·Elo Conceptual schematic

Elo — wired end to end

Source · Oxford Football Forecasting model · structural diagram, not a data plot
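End to end, the pieces from § 02 chain into a single chronological pass over the results. A minimal sketch, assuming a hypothetical `Match` record, input already sorted by date, a 1500 starting rating and a zero-sum exchange between the two sides; none of these details are specified on this page.

```python
from collections import defaultdict
from dataclasses import dataclass

# K by importance tier; the keys are hypothetical labels.
K_BY_TIER = {"world_cup_finals": 60, "continental_finals": 50,
             "qualifier_or_nations_league": 40, "other_tournament": 30, "friendly": 20}


@dataclass
class Match:                 # hypothetical record layout, one row per international
    home: str
    away: str
    home_goals: int
    away_goals: int
    tier: str
    home_is_host: bool       # True only when the listed home side plays in its own country


def mov(goal_diff: int) -> float:
    d = abs(goal_diff)
    return 1.0 if d <= 1 else 1.5 if d == 2 else (11 + d) / 8


def replay(matches: list[Match], initial: float = 1500.0, h: float = 100.0) -> dict[str, float]:
    """Fold results through the Elo update in order; `matches` must be sorted by date."""
    ratings: dict[str, float] = defaultdict(lambda: initial)
    for m in matches:
        r_i, r_j = ratings[m.home], ratings[m.away]
        gap = r_i - r_j + (h if m.home_is_host else 0.0)
        e_home = 1.0 / (1.0 + 10.0 ** (-gap / 400.0))
        diff = m.home_goals - m.away_goals
        w = 1.0 if diff > 0 else 0.5 if diff == 0 else 0.0
        shift = K_BY_TIER[m.tier] * mov(diff) * (w - e_home)
        ratings[m.home], ratings[m.away] = r_i + shift, r_j - shift  # zero-sum assumption
    return dict(ratings)


print(replay([Match("A", "B", 2, 0, "friendly", False)]))  # {'A': 1515.0, 'B': 1485.0}
```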

§ 05

Out-of-sample skill

Where this model lands between the Elo floor and the market ceiling, on both backtest protocols.

Fig. V11 Lower is better · floor = Elo-only · ceiling = de-vigged market

OOS RPS — expanding (headline) and LOTO (optimistic)

On the headline expanding window this model scores 0.1938, identical to the Elo floor (0.1938) because it is the floor, and +0.0032 versus the market ceiling (0.1905).

Expanding 0.1938

LOTO 0.1937

Bar fills to the model’s RPS on the floor–ceiling axis; the whisker on the expanding bar is the conservative 95% CI (0.1546–0.2281). Lower (left) is better.

Its match-level RPS sits exactly at the floor; the gap to the market is small and, with only n = 3 tournaments, lies inside the bootstrap interval.

Source · Oxford Football Forecasting model · Bookmaker consensus (de-vigged closing odds) · 152 matches · 3 tournaments
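Every score on this page is an RPS of an ordered home/draw/away forecast. A minimal sketch of the standard three-outcome form, assuming the usual normalisation by r − 1 = 2 (the page does not spell out its exact convention); the function names are hypothetical.

```python
def rps(probs: tuple[float, float, float], outcome: int) -> float:
    """Ranked probability score of an ordered home/draw/away forecast (lower is better).

    outcome: 0 = home win, 1 = draw, 2 = away win. Normalised by r - 1 = 2.
    """
    observed = [1.0 if k == outcome else 0.0 for k in range(3)]
    cum_p = cum_o = total = 0.0
    for k in range(2):                       # the two cumulative thresholds
        cum_p += probs[k]
        cum_o += observed[k]
        total += (cum_p - cum_o) ** 2
    return total / 2


def mean_rps(forecasts, outcomes) -> float:
    """Average RPS over a set of matches."""
    return sum(rps(p, o) for p, o in zip(forecasts, outcomes)) / len(outcomes)


uniform = (1 / 3, 1 / 3, 1 / 3)
print(round(rps(uniform, 0), 4), round(rps(uniform, 1), 4))  # 0.2778 0.1111
```

Because the score works on cumulative probabilities, a forecast that puts its mass on the draw is penalised less for a home win than one that puts it on an away win, which is why RPS suits the ordered W/D/A outcome.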

§ 06

Strengths & limits

What this model is good for — and where it is weak.

Strengths

Fully reproducible from results alone
No squad/market data needed — works for every nation
Margin-of-victory + importance weighted

Limits

Knows nothing about the current squad
Slow to react for teams with few recent matches
Cannot price newcomers it has barely seen