WC 2026 · Forecasting Oxford Football Forecasting

§ Data · the transparency layer

The data, in the open — and yours to query

Every probability on this site is drawn from one locked, fully traceable dataset. This page opens the whole store: the 20 raw sources behind the forecast, the coverage map that changed the result, a dictionary for all 35 engineered features, the league-strength normalisation, the pipeline lineage — and a live SQL console that runs in your browser over all 12 tables.

20

Raw data sources

807.9 MB across 583 files

61

Pipeline scripts

raw → processed → models → the locked forecast

35

Engineered features

per team, grouped into five families

12

Queryable tables

74,722+ cells, live in-browser SQL

Source · project data inventory — sizes and file counts measured from the raw data store; features and table cells counted from the published dataset.

History 2

International results

martj42/international_results (GitHub)

Every men's A-international — results, goalscorers and penalty shootouts.

Span
49,445 matches · 1872 → 2026
Size
6.7 MB · 4 files
Role
History spine

World Football Elo

eloratings.net

Current world Elo for 244 teams plus year-end ratings back to 1950 — the published anchor we also recompute reproducibly from results.

Span
15,383 team-years · 1950 → 2026
Size
433.8 KB · 4 files
Role
Team-rating prior
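
As a rough illustration of what recomputing Elo "reproducibly from results" involves, here is a minimal sketch of a single-match update in the style publicly described for eloratings.net. The function names, the default K of 40 and the 100-point home advantage are assumptions for illustration; the project's own recomputation may differ in its weights and edge cases.

```python
def expected_result(rating_diff: float) -> float:
    """Expected score (0..1) for a team leading by rating_diff points."""
    return 1.0 / (10.0 ** (-rating_diff / 400.0) + 1.0)


def goal_multiplier(goal_diff: int) -> float:
    """Margin-of-victory multiplier: 1, 1.5, 1.75, then +1/8 per extra goal."""
    n = abs(goal_diff)
    if n <= 1:
        return 1.0
    if n == 2:
        return 1.5
    return 1.75 + (n - 3) / 8.0


def elo_update(home, away, home_goals, away_goals, k=40.0, home_adv=100.0):
    """Return (new_home, new_away) after one match.

    k and home_adv are illustrative defaults (assumed); pass home_adv=0.0
    for a neutral venue.
    """
    expected = expected_result(home - away + home_adv)
    if home_goals > away_goals:
        actual = 1.0
    elif home_goals == away_goals:
        actual = 0.5
    else:
        actual = 0.0
    delta = k * goal_multiplier(home_goals - away_goals) * (actual - expected)
    return home + delta, away - delta


# Hypothetical ratings: a 3-0 win for the lower-rated side at a neutral venue.
new_home, new_away = elo_update(1700.0, 1850.0, 3, 0, home_adv=0.0)
```

The update is zero-sum: whatever one side gains, the other loses, which is what lets the whole history be replayed deterministically from the results file.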

Tournament & squad 1

WC2026 structure & squads

fixturedownload.com + Wikipedia

The official 104-fixture / 12-group / 48-team structure with venues and dates, and the 26-man squads (number, position, dob, caps, goals, club) for all 1,245 selected players.

Span
104 fixtures · 48 squads · 1,245 players
Size
2.0 MB · 6 files
Role
Tournament & squad

Squad quality 3

Transfermarkt squad values

Transfermarkt

National-team squad market values for all 48 teams (2,278 players × value, age, caps, goals) — England €1.88bn down to Jordan €16m — plus the 2016→2024 value history.

Span
48/48 teams · 2,278 players · value history 2016 → 2024
Size
105.6 KB · 3 files
Role
Squad quality (primary)

Coaches

Wikipedia / federation pages

Each nation's coach, their nationality, the foreign-coach flag and appointment date — feeds the coaches page and every dossier header.

Span
48/48 coaches
Size
Role
Bench

SoFIFA / EA FC ratings

partial

SoFIFA / EA FC (via soccerdata)

Club EA-FC ratings (96 clubs) as a per-player quality reference. Player-level ratings proved redundant — fully superseded by Transfermarkt value.

Span
96 clubs (club reference)
Size
41.6 KB · 2 files
Role
Club reference

Club form & events 4

Understat club xG

Understat (via soccerdata)

Club-level expected goals (xg, np_xg, xg_chain, xg_buildup) for the top-5 European leagues, extended back to 2016-17 so each backtest tournament has its club season.

Span
8,908 matches · 27,177 player-seasons (top-5 only)
Size
11.2 MB · 3 files
Role
Player / club form

API-Football global club stats

API-Football

Global club appearances, minutes and goals across 107 leagues in 68 countries — the genuinely worldwide club panel that de-biased the squad layer away from a Europe-only feed.

Span
107 leagues · 68 countries · 4,471 club-seasons
Size
1.3 MB · 6 files
Role
Player / club form (global)

StatsBomb open data

statsbomb/open-data (GitHub)

Full event streams and lineups for recent major tournaments (WC2018/2022, Euro2020/2024, Copa2024) — the public alternative to licensed Opta/Wyscout event data.

Span
262 matches · ~530 event/lineup files
Size
771.3 MB · 530 files
Role
Event / player

FBref season stats

FBref (via soccerdata)

Tournament schedules and player/team season stats (goals, assists, shots). Season tables carry no xG — that is derived from StatsBomb instead.

Span
schedule + season stats · 34/48 nations resolve
Size
556.9 KB · 4 files
Role
Player performance

Validation & backtest 2

Bookmaker odds

the-odds-api (Pro)

WC2026 outright and full match markets (h2h, totals, spreads), plus the out-of-time closing-odds consensus for the 152 backtest matches — the de-vigged market benchmark.

Span
WC2026 markets + 52,028 historical quotes · 3 backtest tournaments
Size
7.3 MB · 9 files
Role
Validation anchor

Historical squads (backtest)

Wikipedia (past tournaments)

As-of squads for five past tournaments (WC2018/2022, Euro2020/2024, Copa2024) so every engineered feature can be rebuilt leakage-safe for the out-of-time backtest.

Span
3,226 players · 128 team-tournaments
Size
250.7 KB · 1 file
Role
Backtest enablement
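
The "de-vigged market benchmark" in the Bookmaker odds entry can be sketched as follows. This is a minimal example assuming proportional normalisation, one of several ways to remove the bookmaker margin; the page does not say which method is used, and the odds below are hypothetical.

```python
def devig(decimal_odds):
    """Turn decimal odds into probabilities that sum to 1.

    Assumes proportional (multiplicative) margin removal; other methods exist.
    """
    implied = [1.0 / o for o in decimal_odds]
    overround = sum(implied)  # exceeds 1 by the bookmaker's margin
    return [p / overround for p in implied]


# Hypothetical home / draw / away quotes, not taken from the dataset.
probs = devig([2.10, 3.40, 3.60])
```

A consensus across bookmakers could then be an average of these per-book probabilities, though that aggregation step is also an assumption here.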

Context & environment 6

World Bank macro

World Bank API

GDP, GDP-per-capita and population for 261 countries — the macro prior and the GDP-growth context features.

Span
261 countries · 1990 → 2024
Size
565.2 KB · 1 file
Role
Country prior

Venues & environment

curated venues + open-elevation + open-meteo

The 16 host venues with coordinates, elevation, June–July heat index and a full travel-distance matrix — hottest Houston 47°C, highest Mexico City 2,287m.

Span
16 venues · per-team travel / heat / altitude
Size
6.3 KB · 3 files
Role
Environment (§7)

US Census — diaspora & language

US Census (ACS B05006 / B16001)

Foreign-born population by origin and home-language speakers in the United States — the novel quasi-home-support and host-language-familiarity signals.

Span
foreign-born 38/48 · home-language 25/48
Size
2.7 KB · 2 files
Role
Diaspora / language

UN migrant-stock

UN DESA migration

Bilateral international migrant-stock — the broader diaspora mechanism behind quasi-home support, complementing the US-specific census layer.

Span
bilateral migrant-stock matrix
Size
6.1 MB · 1 file
Role
Diaspora (global)

Government sport spend (COFOG)

partial

Eurostat COFOG

Government recreation-and-sport spending as a share of GDP and its trend — an investment-context control, available for EU/EEA nations only.

Span
12/48 teams (EU/EEA only)
Size
15.8 KB · 1 file
Role
Investment context

Player-club climate

open-meteo (per club city)

The home-climate each player is acclimatised to, aggregated to a squad acclimatisation temperature and differenced against host-venue heat.

Span
per-player club climate → squad acclimatisation
Size
54.9 KB · 1 file
Role
Acclimatisation

Attempted · deferred 2

Injuries / availability

empty

Transfermarkt injury pages

All 48 national-team injury pages were scraped, but returned 0 parseable entries three days before kickoff. Availability is already encoded in the finalised 26-man squads (injured players omitted).

Span
0 parseable rows — re-runnable mid-tournament
Size
230 B · 2 files
Role
Attempted · empty

FIFA ranking history

deferred

inside.fifa.com

Deferred — the page loads its ranking history client-side, and Elo is the stronger, fully-reproducible state variable used as the primary prior.

Span
deferred — Elo substitutes
Size
Role
Deferred

How the catalog is built (and why two sources are deferred)

Sizes, file counts and spans are measured directly from the data store at build time, so the catalog cannot drift from the data it describes.

Two sources are shown as attempted-but-deferred: the FIFA-ranking history, because Elo is the stronger, fully-reproducible rating prior; and the injury feed, which returned no usable rows three days before kick-off — availability is instead encoded in the finalised 26-man squads (injured players were simply omitted). Both are re-runnable mid-tournament.

Fig. V10 World choropleth · share of each squad with matched club data

Squad club-data coverage, before and after the global panel

Toggle between BEFORE (Understat top-5-Europe) and AFTER (global API-Football). Before, almost everything outside Western Europe is pale — not because those squads are weak, but because their players play in leagues the feed never saw. After, the map fills in. The map is interactive (drag, zoom); the same numbers are in the table below.

England and Scotland share one shape on the map (both are GBR in ISO-3166); it is tinted with England’s coverage. Cabo Verde and Curaçao are small but present. The table below lists every nation individually.

Overall coverage rises from 48% to 85% — and the lift is largest exactly where the old feed was blind: OFC 15%→92%, AFC 17%→71%. UEFA, already well-covered, barely moves.

Source · Understat (top-5 Europe) vs API-Football (global club coverage) · world boundaries: Natural Earth

Fig. V10b Before → after, weakest-covered confederation first

Coverage uplift by confederation — biggest gains off-UEFA

Confederation  Teams  Before (top-5 Europe)  After (global panel)  Uplift
OFC            1      15%                    92%                   +77pt
AFC            9      17%                    71%                   +54pt
CONCACAF       6      30%                    81%                   +50pt
CAF            10     45%                    88%                   +44pt
CONMEBOL       6      54%                    85%                   +30pt
UEFA           16     73%                    92%                   +19pt
All            48     48%                    85%

The gain is not a uniform shift — it lands precisely where top-5 European feeds see least. The confederations they under-cover (OFC, AFC, CONCACAF) gain the most; UEFA, already near-complete, gains least.

Source · Understat (top-5 Europe) vs API-Football (global club coverage) — mean squad coverage per confederation
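
The uplift figures can be reproduced from the rounded before/after percentages published here. A small sketch, using only the numbers on this page: because the inputs are already rounded, the recomputed uplift can differ by a point from the printed one (for example CONCACAF comes out at 51 rather than +50), presumably because the page differences unrounded values.

```python
# (teams, before %, after %) per confederation, as printed on this page.
coverage = {
    "OFC": (1, 15, 92),
    "AFC": (9, 17, 71),
    "CONCACAF": (6, 30, 81),
    "CAF": (10, 45, 88),
    "CONMEBOL": (6, 54, 85),
    "UEFA": (16, 73, 92),
}

# Uplift in percentage points, from the rounded figures.
uplift = {c: after - before for c, (_, before, after) in coverage.items()}
ranked = sorted(uplift, key=uplift.get, reverse=True)

# Team-weighted means across all 48 squads.
n_total = sum(n for n, _, _ in coverage.values())
overall_before = sum(n * b for n, b, _ in coverage.values()) / n_total
overall_after = sum(n * a for n, _, a in coverage.values()) / n_total
```

Weighting each confederation by its team count recovers the headline 48% and 85% after rounding, which suggests the overall figure is a per-squad mean rather than a per-player one.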

Every nation, before → after

The full per-team coverage, biggest gainer first. South Africa, Qatar and Saudi Arabia were nearly invisible to a top-5-only feed (one player each); the global panel sees almost their entire squads. The two UEFA teams at the bottom hold level or dip slightly because the global match window is stricter than Understat’s season; that is coverage, not invention.

Nation                  In top-5  Before  After  Uplift
Qatar                   1         4%      100%   +96pt
South Africa            1         4%      96%    +92pt
Saudi Arabia            1         4%      92%    +88pt
New Zealand             4         15%     92%    +77pt
Korea Republic          4         15%     85%    +70pt
Egypt                   4         15%     85%    +70pt
Panama                  2         8%      77%    +69pt
Cabo Verde              4         15%     81%    +66pt
Mexico                  8         31%     92%    +61pt
Czechia                 10        38%     96%    +58pt
Canada                  6         24%     76%    +52pt
Türkiye                 8         31%     81%    +50pt
Uzbekistan              0         0%      50%    +50pt
Morocco                 12        46%     96%    +50pt
Curaçao                 5         19%     65%    +46pt
IR Iran                 4         15%     58%    +43pt
Haiti                   9         35%     77%    +42pt
Paraguay                9         35%     77%    +42pt
Jordan                  0         0%      42%    +42pt
Bosnia and Herzegovina  12        46%     85%    +39pt
Australia               11        42%     81%    +39pt
Ecuador                 8         31%     69%    +38pt
Algeria                 13        50%     88%    +38pt
Uruguay                 13        50%     85%    +35pt
Iraq                    3         12%     46%    +34pt
Brazil                  16        62%     96%    +34pt
USA                     17        65%     96%    +31pt
Norway                  17        65%     96%    +31pt
Tunisia                 9         35%     65%    +30pt
Colombia                15        58%     85%    +27pt
Côte d'Ivoire           18        69%     96%    +27pt
Japan                   16        62%     88%    +26pt
Scotland                16        62%     88%    +26pt
Sweden                  16        62%     85%    +23pt
Senegal                 20        77%     100%   +23pt
Ghana                   17        65%     88%    +23pt
Congo DR                18        69%     88%    +19pt
Croatia                 21        81%     96%    +15pt
Belgium                 20        77%     92%    +15pt
Austria                 21        84%     96%    +12pt
Spain                   23        88%     100%   +12pt
Portugal                21        81%     92%    +11pt
England                 23        88%     96%    +8pt
France                  22        85%     92%    +7pt
Argentina               23        92%     96%    +4pt
Switzerland             23        88%     92%    +4pt
Germany                 25        96%     96%    −0pt
Netherlands             25        96%     88%    −8pt

Source · Understat (top-5 Europe) vs API-Football (global club coverage). Before and After are the share of each squad with matched club data.

History (h) 8 features

Current Elo rating

Current Elo rating (eloratings.net method, end-2026).

min 1,421 · med 1,779.50 · max 2,155

src Elo ratings (eloratings.net method) · elo_now

Elo change over 5 years

Elo change over last 5 years.

min -242 · med 32 · max 194

src Elo ratings (eloratings.net method) · elo_change_5y

Elo change over 10 years

Elo change over last 10 years.

min -160 · med 53.50 · max 338

src Elo ratings (eloratings.net method) · elo_change_10y

Linear Elo slope per year (10y)

Linear Elo slope per year over 10y.

min -25 · med 5.35 · max 31.90

src Elo ratings (eloratings.net method) · elo_slope_10y

Points per game, last 15 NT matches

National-team points-per-game, last 15 matches.

min 0.933 · med 1.87 · max 2.47

src International match results, 1872–2026 · nt_last15_ppg

Goal difference per game, last 15

National-team goal-difference per game, last 15.

min -0.530 · med 0.835 · max 2.60

src International match results, 1872–2026 · nt_last15_gd_pg

Win rate, last 15 NT matches

National-team win rate, last 15.

min 0.200 · med 0.533 · max 0.800

src International match results, 1872–2026 · nt_last15_winrate

NT recent-form trend

Recent NT form trend (slope).

min -1.70 · med -0.350 · max 1.30

src International match results, 1872–2026 · nt_form_trend
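
The three last-15 form features above can be sketched from a plain list of results. This is a minimal example assuming 3 points per win and 1 per draw, which is consistent with the published ranges (0.933 is 14/15, a win rate of 0.200 is 3/15) but not confirmed by the page; how shootouts or friendlies are treated is unknown, and the results below are hypothetical.

```python
def last_n_form(results, n=15):
    """Form over the most recent n matches.

    results: chronological list of (goals_for, goals_against) tuples.
    Assumes 3 points per win and 1 per draw.
    """
    recent = results[-n:]
    games = len(recent)
    wins = sum(1 for gf, ga in recent if gf > ga)
    draws = sum(1 for gf, ga in recent if gf == ga)
    goal_diff = sum(gf - ga for gf, ga in recent)
    return {
        "ppg": (3 * wins + draws) / games,
        "gd_pg": goal_diff / games,
        "winrate": wins / games,
    }


# Hypothetical run: 8 wins, 4 draws, 3 defeats.
results = [(2, 0)] * 8 + [(1, 1)] * 4 + [(0, 1)] * 3
form = last_n_form(results)
```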

Squad quality (s) 9 features

Mean squad age

Mean squad age.

min 23.70 · med 25.60 · max 27.60

src Official squad announcements · mean_age

Share of squad aged 25-29 (peak)

Share of squad aged 25–29 (peak).

min 0.200 · med 0.387 · max 0.526

src Official squad announcements · pct_peak_25_29

# players over 30

Players over 30.

min 1 · med 6 · max 15

src Official squad announcements · n_over30

# players under 23

Players under 23.

min 1 · med 12 · max 31

src Official squad announcements · n_under23

Mean caps (experience)

Mean international caps.

min 12 · med 24.75 · max 43.60

src Official squad announcements · mean_caps

Largest single-club bloc

Largest same-club bloc size.

min 1 · med 3 · max 10

src Official squad announcements · largest_club_bloc

# distinct clubs in squad

Distinct clubs in squad.

min 8 · med 22 · max 26

src Official squad announcements · n_distinct_clubs

Total squad market value (EUR)

Total squad market value (Transfermarkt, EUR).

min €16m · med €0.29bn · max €1.88bn

src Transfermarkt · squad_value_eur

Squad-value growth (8y CAGR)

Squad value growth, 8y.

min -0.011 · med 0.091 · max 0.324

src Transfermarkt · value_cagr_8y
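
For readers unfamiliar with CAGR, the squad-value growth feature follows the standard compound-annual-growth formula; the GDP growth features presumably share the same shape over 5 and 10 years. A minimal sketch with hypothetical values (the function name and inputs are illustrative only):

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two positive values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical squad whose total value doubles over the 8-year window.
growth = cagr(100.0, 200.0, 8)
```

A squad that doubles in value over eight years grows at roughly 9% a year, which gives a feel for the scale of the published median.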

Form & fitness 10 features

Squad xG+xA per 90 (Understat top-5)

4 n/a

Squad club xG+xA per 90 (Understat top-5).

min 0.030 · med 0.246 · max 0.476

src Understat · squad_form_xgxa_per90

# squad players with Understat top-5 data

Number of squad players with top-5-league form data (a count, not a fraction).

min 0 · med 12.50 · max 25

src Understat · squad_form_coverage

Squad fitness readiness (top-5)

4 n/a

Top-5-league minutes-based readiness.

min 0.020 · med 0.513 · max 0.814

src Oxford Football Forecasting model · squad_fitness_readiness

Mean club minutes per game

4 n/a

Mean club minutes per game.

min 6.50 · med 59.80 · max 75.50

src API-Football (global club coverage) · mean_minutes_per_game

Squad fitness (global)

GLOBAL (de-biased) fitness readiness.

min 0.158 · med 0.707 · max 0.886

src API-Football (global club coverage) · squad_fitness_global

Global fitness coverage (fraction)

Coverage of global fitness data (0–1).

min 0.420 · med 0.880 · max 1

src API-Football (global club coverage) · fitness_coverage_global

Squad club form (global, league-weighted)

GLOBAL (de-biased) squad club form.

min 0.052 · med 0.211 · max 0.410

src API-Football (global club coverage) · squad_form_global

Squad club form (global, unweighted)

Global form before league normalisation.

min 0.072 · med 0.148 · max 0.250

src API-Football (global club coverage) · squad_form_global_raw

Mean league-strength coefficient of squad clubs

Mean league-strength coefficient of squad's clubs.

min 0.502 · med 1.41 · max 1.80

src Oxford Football Forecasting model · mean_league_coef

Global club-form coverage (fraction)

Coverage of global form data (0–1).

min 0.420 · med 0.880 · max 1

src API-Football (global club coverage) · form_coverage_global

Context (c) 6 features

GDP compound annual growth, 5y

GDP compound annual growth, 5y (World Bank).

min -0.047 · med 0.046 · max 0.119

src World Bank · gdp_cagr_5y

GDP compound annual growth, 10y

GDP compound annual growth, 10y.

min -0.019 · med 0.022 · max 0.064

src World Bank · gdp_cagr_10y

Squad familiarity / chemistry index

Shared-club chemistry index.

min 0 · med 0.015 · max 0.172

src Oxford Football Forecasting model · familiarity_index

# shared-club player pairs

Count of same-club player pairs.

min 0 · med 5 · max 56

src Oxford Football Forecasting model · shared_club_pairs

Minute-weighted familiarity index

Co-tenure-weighted chemistry.

min 0 · med 0.026 · max 0.434

src Oxford Football Forecasting model · familiarity_weighted_index

Climate-adaptation gap to host venues

1 n/a

Squad acclimatisation vs host max-temp (°C).

min -18 · med 1.70 · max 9.60

src Open-Meteo & venue records · climate_gap
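
The shared-club pair count and the unweighted familiarity index can be sketched as below. The normalisation by all possible pairs is an assumption: it is consistent with the published ranges (56 pairs out of the 325 possible in a 26-man squad is about 0.172, and 5 out of 325 is about 0.015), but the page does not state the formula. The squad is hypothetical.

```python
from collections import Counter
from math import comb


def shared_club_pairs(clubs):
    """Number of player pairs who share a club; clubs has one entry per player."""
    return sum(comb(n, 2) for n in Counter(clubs).values())


def familiarity_index(clubs):
    """Shared-club pairs as a share of all possible pairs (assumed definition)."""
    return shared_club_pairs(clubs) / comb(len(clubs), 2)


# Hypothetical 26-man squad: blocs of 10, 4 and 3, plus 9 players at unique clubs.
squad = ["Club A"] * 10 + ["Club B"] * 4 + ["Club C"] * 3 + [f"Solo {i}" for i in range(9)]
pairs = shared_club_pairs(squad)
index = familiarity_index(squad)
```

Because pairs grow quadratically with bloc size, one large single-club bloc dominates the index, which is why the largest-bloc feature and the familiarity features tend to move together.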

Decoupling (g) 2 features

z(NT form) - z(squad value): over/under-performance

Form residual vs market value.

min -2.79 · med 0.115 · max 1.75

src Oxford Football Forecasting model · form_vs_value_gap

z(Elo) - z(squad value): history vs market

Elo residual vs market value.

min -1.71 · med 0.113 · max 1.23

src Oxford Football Forecasting model · elo_vs_value_gap

Source · Oxford Football Forecasting model — the engineered feature panel (48 teams × 35 numeric features). Sparkline = a 12-bin histogram across the 48 nations.
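
The two decoupling features are differences of z-scores taken across the 48 teams. A minimal sketch of that idea follows; which form metric enters, whether squad value is log-transformed first, and whether a population or sample standard deviation is used are all assumptions here, and the four teams are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardise values across teams (population standard deviation assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def decoupling_gap(signal, squad_value):
    """z(signal) - z(squad value): positive means the team outruns its price tag."""
    return [zs - zv for zs, zv in zip(zscores(signal), zscores(squad_value))]


# Hypothetical panel: points per game and squad value in EUR millions.
ppg = [2.4, 1.9, 1.2, 2.0]
value = [1500.0, 300.0, 40.0, 90.0]
gaps = decoupling_gap(ppg, value)
```

In this toy panel the most expensive squad has a negative gap despite the best form, because its value is even further above average than its results.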

League-strength normalisation

A player’s club form means more in a stronger league. Each of the 60 club leagues gets a relative-strength coefficient, estimated from market-implied club quality and shrunk toward a confederation prior for thin leagues. England tops it; the coefficient is what re-weights every player’s minutes and goals before they roll up into the squad-form signal. The top 12 are shown — query the explorer below for all 60.

League (country)  Clubs priced  Strength (z)  Form coef.  Example clubs
England           236           +2.21         1.8         Fulham, Brentford, Manchester City
Spain             111           +2.13         1.8         Rayo Vallecano, Real Betis, Sevilla
Germany           141           +1.84         1.8         SV Darmstadt 98, Eintracht Frankfurt, Bayer Leverkusen
Italy             82            +1.70         1.8         Empoli, Atalanta, Cittadella
France            95            +1.70         1.8         Marseille, Lille, Nice
Portugal          44            +1.14         1.49        Famalicao, Benfica, Sporting CP
Brazil            25            +1.03         1.433       Palmeiras, Santos, Atletico Paranaense
Netherlands       55            +0.74         1.297       Feyenoord, Twente, Ajax
Turkey            44            +0.49         1.186       Beşiktaş, Sivasspor, Bursaspor
Croatia           14            +0.34         1.124       Dinamo Zagreb, HNK Rijeka, HNK Hajduk Split
Russia            21            +0.27         1.098       Lokomotiv, Rubin, Baltika
Austria           10            +0.26         1.096       Grazer AK, Lask Linz, Red Bull Salzburg

Source · Oxford Football Forecasting model — league-strength estimates for all 60 leagues. Strength (z) is standardised; the form coefficient re-weights club form in the squad roll-up.
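
To illustrate what "shrunk toward a confederation prior for thin leagues" can mean in practice, here is a minimal sketch of count-weighted shrinkage. The weighting rule and the pseudo-count `k` are assumptions, not the site's estimator; the argument names echo the `raw_log_q`, `shrunk_log_q` and `n_priced` columns of the `league_strength` table, and the numbers are hypothetical.

```python
def shrink(raw_log_q: float, n_priced: int, prior_log_q: float, k: float = 10.0) -> float:
    """Pull a league's raw estimate toward its confederation prior.

    k is a hypothetical pseudo-count: with n_priced == k the raw estimate
    and the prior get equal weight; with no priced clubs the prior is returned.
    """
    weight = n_priced / (n_priced + k)
    return weight * raw_log_q + (1.0 - weight) * prior_log_q


# Hypothetical leagues: one thinly priced, one richly priced.
thin = shrink(raw_log_q=1.0, n_priced=2, prior_log_q=0.2)
rich = shrink(raw_log_q=1.0, n_priced=200, prior_log_q=0.2)
```

The design intent is that a league with only a handful of priced clubs cannot swing far from its confederation on noisy evidence, while a league with hundreds of priced clubs is left almost untouched.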

How the data becomes a forecast

The lineage, end to end: 20 raw sources are harmonised and engineered by 61 scripts into the processed panels; these feed the model ladder, whose output the simulator plays 1.1M times to produce the locked forecast that the whole site reads.

Fig. V20 One direction, one path

Data lineage — raw to locked forecast

  1. Raw (20 sources): results · Elo · squads · API-Football · Transfermarkt · odds · World Bank · environment …
  2. Engineer (61 scripts): crosswalk · reproducible Elo · feature build · de-biasing · backtest panel
  3. Processed (~35 tables): derived_features · league_strength · squad_form_global · country_context …
  4. Models (7 + ensemble): Elo · Dixon-Coles · Bayesian · LightGBM → log-pool ensemble
  5. Simulate (1.1M draws): real 48-team bracket · full FIFA tiebreakers · MC-SE
  6. Locked forecast (this site): one locked forecast → every page; SHA-256 stamped in the footer

There is exactly one path from data to page: every figure on this site is produced from the same locked forecast — which is why every number reconciles.
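
The "log-pool ensemble" in step 4 refers to a logarithmic opinion pool: a weighted geometric mean of the models' probabilities, renormalised to sum to 1. The sketch below shows the general technique only; the equal default weights and the example forecasts are assumptions, not the site's fitted weights.

```python
import math


def log_pool(forecasts, weights=None):
    """Logarithmic opinion pool over several probability vectors.

    forecasts: list of per-model vectors (e.g. home/draw/away), all
    strictly positive. weights default to equal (an assumption).
    """
    if weights is None:
        weights = [1.0 / len(forecasts)] * len(forecasts)
    n_outcomes = len(forecasts[0])
    log_scores = [
        sum(w * math.log(f[j]) for w, f in zip(weights, forecasts))
        for j in range(n_outcomes)
    ]
    peak = max(log_scores)  # subtract the max for numerical stability
    unnorm = [math.exp(s - peak) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical W/D/L forecasts from three models.
pooled = log_pool([[0.50, 0.30, 0.20], [0.60, 0.25, 0.15], [0.45, 0.30, 0.25]])
```

Compared with a plain average, a log pool penalises outcomes that any one model considers very unlikely, so a single confident dissenter carries more weight.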

Source · Oxford Football Forecasting model
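
The Monte-Carlo standard errors (MC-SE) attached to each probability can be approximated with the usual binomial formula. A minimal sketch, assuming independent draws and taking "1.1M" as exactly 1,100,000; the probability used is hypothetical.

```python
import math


def mc_standard_error(p: float, n_draws: int) -> float:
    """Standard error of a simulated probability, assuming independent draws."""
    return math.sqrt(p * (1.0 - p) / n_draws)


N_DRAWS = 1_100_000  # "1.1M" on this page; the exact count is an assumption
se = mc_standard_error(0.15, N_DRAWS)  # hypothetical champion probability
```

At this number of draws the simulation noise on any single probability is well under a tenth of a percentage point, so differences between teams reflect the model rather than sampling error.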

What the explorer is (and what it is not)

The console runs a full SQL engine inside your browser — it loads on your first query, and nothing you type or compute ever leaves your machine. It is read-only: the published tables cannot be altered, and a stray UPDATE simply returns an error. Result tables are capped at 1,000 rows in the page for speed — the CSV export carries the full set. If the engine cannot load, the schema below still documents every table, column and type.
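
For a sense of the kind of query the console accepts, and of the read-only and row-cap behaviour described above, here is a small offline sketch. It uses Python's built-in SQLite as a stand-in for the in-browser engine, which is an assumption (the page does not name its engine); the column names come from the `forecast` schema below, and the three rows are hypothetical, not published values.

```python
import sqlite3

# SQLite stands in for the page's in-browser SQL engine (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE forecast (team TEXT, reality_champ REAL, power_champ REAL)"
)
# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO forecast VALUES (?, ?, ?)",
    [("Team A", 0.15, 0.13), ("Team B", 0.10, 0.12), ("Team C", 0.02, 0.02)],
)
conn.commit()

# Mimic the read-only console: from here on, writes are rejected.
conn.execute("PRAGMA query_only = ON")

query = """
    SELECT team,
           reality_champ,
           power_champ,
           reality_champ - power_champ AS draw_luck
    FROM forecast
    ORDER BY reality_champ DESC
"""
rows = conn.execute(query).fetchmany(1000)  # page-style 1,000-row display cap

try:
    conn.execute("UPDATE forecast SET reality_champ = 1.0")
    write_blocked = False
except sqlite3.Error:
    write_blocked = True  # the stray UPDATE fails, as on the page
```

The `SELECT` itself should carry over to the live console largely unchanged; the table setup and the `PRAGMA` exist only to make the sketch self-contained.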

The schema — 12 tables

Every queryable table, its row count, what it holds, and (where documented) each column with its type and meaning. This is the no-JavaScript-complete reference for the explorer above.

forecast 48 rows · 46 cols The locked champion / stage probabilities + group-stage odds (the single source of truth).
Column Type Meaning
rank int64
code str
cca3 str
team str
p_R32 float64 P(reach Round of 32 = clear group stage).
se_R32 float64
p_R16 float64 P(reach Round of 16).
se_R16 float64
p_QF float64 P(reach quarter-final).
se_QF float64
p_SF float64 P(reach semi-final).
se_SF float64
p_Final float64 P(reach Final).
se_Final float64
reality_champ float64 P(champion), fixture-aware; this is the locked forecast (the locked p_Champion).
reality_champ_se float64 Monte-Carlo standard error of champion prob.
conformal_level str
conformal_coverage_0p90 float64
conformal_mean_set_size float64
group str
confederation str
power_rank int64
power_champ float64 Fixture-FREE 'Power' champion prob (mean over constrained re-draws).
power_champ_se float64
power_se_mc float64
draw_luck float64 actual_champ − power_champ; >0 = soft draw, <0 = tough draw.
draw_luck_se float64
bracket_half int64
bracket_quadrant int64
confed_color str
is_host bool
gs_p_pos1 float64
gs_p_pos2 float64
gs_p_pos3 float64
gs_p_pos4 float64
gs_p_win_group float64
gs_p_top2 float64
gs_p_best_third float64
gs_p_advance float64
gs_exp_points float64
gs_mean_gd float64
gs_mean_gf float64
gs_se_win_group float64
gs_se_top2 float64
gs_se_best_third float64
gs_se_advance float64
rankings 48 rows · 35 cols Power vs Reality ranks, champion odds and draw-luck per team.
Column Type Meaning
power_rank int64
team str
group str
confederation str
actual_champ float64
actual_champ_se float64
power_champ float64
power_se float64
power_se_mc float64
draw_luck float64
draw_luck_se float64
bracket_half int64
bracket_quadrant int64
bracket_half_if_runnerup int64
bracket_quadrant_if_runnerup int64
power_p_R32 float64
power_se_R32 float64
actual_p_R32 float64
power_p_R16 float64
power_se_R16 float64
actual_p_R16 float64
power_p_QF float64
power_se_QF float64
actual_p_QF float64
power_p_SF float64
power_se_SF float64
actual_p_SF float64
power_p_Final float64
power_se_Final float64
actual_p_Final float64
power_p_Champion float64
power_se_Champion float64
actual_p_Champion float64
code str
reality_rank int64
teams 48 rows · 127 cols The full 48-team panel — forecast + every engineered feature + context, one wide row per nation.
Column Type Meaning
rank int64
code str
cca3 str
team str
p_R32 float64
se_R32 float64
p_R16 float64
se_R16 float64
p_QF float64
se_QF float64
p_SF float64
se_SF float64
p_Final float64
se_Final float64
reality_champ float64
reality_champ_se float64
conformal_level str
conformal_coverage_0p90 float64
conformal_mean_set_size float64
group str
confederation str
power_rank int64
power_champ float64
power_champ_se float64
power_se_mc float64
draw_luck float64
draw_luck_se float64
bracket_half int64
bracket_quadrant int64
confed_color str
is_host bool
gs_p_pos1 float64
gs_p_pos2 float64
gs_p_pos3 float64
gs_p_pos4 float64
gs_p_win_group float64
gs_p_top2 float64
gs_p_best_third float64
gs_p_advance float64
gs_exp_points float64
gs_mean_gd float64
gs_mean_gf float64
gs_se_win_group float64
gs_se_top2 float64
gs_se_best_third float64
gs_se_advance float64
elo_now int64
elo_change_5y int64
elo_change_10y int64
elo_slope_10y float64
gdp_cagr_5y float64
gdp_cagr_10y float64
nt_last15_ppg float64
nt_last15_gd_pg float64
nt_last15_winrate float64
nt_form_trend float64
squad_form_xgxa_per90 float64
squad_form_coverage int64
mean_age float64
pct_peak_25_29 float64
n_over30 int64
n_under23 int64
mean_caps float64
largest_club_bloc int64
n_distinct_clubs int64
squad_value_eur float64
form_vs_value_gap float64
elo_vs_value_gap float64
value_cagr_8y float64
familiarity_index float64
shared_club_pairs int64
familiarity_weighted_index float64
climate_gap float64
squad_fitness_readiness float64
mean_minutes_per_game float64
squad_fitness_global float64
fitness_coverage_global float64
squad_form_global float64
squad_form_global_raw float64
mean_league_coef float64
form_coverage_global float64
g_s_hat float64
g_h_hat float64
g_mean float64
g_sd float64
primary_language str
n_languages int64
shares_lang_with_host bool
region str
subregion str
capital_lat float64
capital_lon float64
home_utc_offset int64
gdp_usd float64
gdp_per_capita float64
population int64
dist_nearest_venue_km int64
mean_dist_played_km int64
max_timezone_shift_h int64
max_venue_altitude_m int64
max_venue_heat_index_c float64
coach str
coach_nationality str
appointed float64
foreign_coach bool
n_players int64
mean_value_eur float64
value_top3_share float64
n_in_top5_league int64
share_in_top5_league float64
n_abroad_in_top5 int64
top5_npxg_plus_xa float64
top5_minutes int64
top5_league_breakdown str
tm_value_coverage float64
n_clustered_players int64
n_club_blocs int64
top_blocs str
co_tenure_seasons int64
pairs_with_shared_history int64
diaspora_usa float64
diaspora_can float64
diaspora_mex float64
diaspora_hosts_total float64
diaspora_per_1000_pop float64
squad_acclim_tmax float64
host_tmax float64
players 1,245 rows · 22 cols All 1,245 squad players with their global club season (minutes, goals, league strength).
Column Type Meaning
player_id str
number int64
position str
player str
dob str
caps int64
nt_goals int64
club_squad str
team str
group str
club_apifootball str
league str
league_country str
club_minutes float64
club_apps float64
club_goals_season float64
stats_matched bool
league_strength_z float64
league_log_q float64
league_form_coef float64
code str
confederation str
coaches 48 rows · 8 cols Each nation’s coach, nationality and the foreign-coach flag.
Column Type Meaning
team str
coach str
coach_nationality str
appointed float64
foreign_coach bool
code str
confederation str
group str
matches 104 rows · 13 cols All 104 fixtures with venue, round and (where played) score.
Column Type Meaning
match_no int64
round int64
date_utc str
venue str
home str
away str
group str
home_score float64
away_score float64
winner float64
stage str
home_code str
away_code str
features 48 rows · 39 cols The 36-column 2026 feature panel the models read (team key + 35 numeric).
Column Type Meaning
team str
elo_now int64 Current Elo rating (eloratings.net method, end-2026).
elo_change_5y int64 Elo change over last 5 years.
elo_change_10y int64 Elo change over last 10 years.
elo_slope_10y float64 Linear Elo slope per year over 10y.
gdp_cagr_5y float64 GDP compound annual growth, 5y (World Bank).
gdp_cagr_10y float64 GDP compound annual growth, 10y.
nt_last15_ppg float64 National-team points-per-game, last 15 matches.
nt_last15_gd_pg float64 National-team goal-difference per game, last 15.
nt_last15_winrate float64 National-team win rate, last 15.
nt_form_trend float64 Recent NT form trend (slope).
squad_form_xgxa_per90 float64 Squad club xG+xA per 90 (Understat top-5).
squad_form_coverage int64 Number of squad players with top-5-league form data (a count, not a fraction).
mean_age float64 Mean squad age.
pct_peak_25_29 float64 Share of squad aged 25–29 (peak).
n_over30 int64 Players over 30.
n_under23 int64 Players under 23.
mean_caps float64 Mean international caps.
largest_club_bloc int64 Largest same-club bloc size.
n_distinct_clubs int64 Distinct clubs in squad.
squad_value_eur float64 Total squad market value (Transfermarkt, EUR).
form_vs_value_gap float64 Form residual vs market value.
elo_vs_value_gap float64 Elo residual vs market value.
value_cagr_8y float64 Squad value growth, 8y.
familiarity_index float64 Shared-club chemistry index.
shared_club_pairs int64 Count of same-club player pairs.
familiarity_weighted_index float64 Co-tenure-weighted chemistry.
climate_gap float64 Squad acclimatisation vs host max-temp (°C).
squad_fitness_readiness float64 Top-5-league minutes-based readiness.
mean_minutes_per_game float64 Mean club minutes per game.
squad_fitness_global float64 GLOBAL (de-biased) fitness readiness.
fitness_coverage_global float64 Coverage of global fitness data (0–1).
squad_form_global float64 GLOBAL (de-biased) squad club form.
squad_form_global_raw float64 Global form before league normalisation.
mean_league_coef float64 Mean league-strength coefficient of squad's clubs.
form_coverage_global float64 Coverage of global form data (0–1).
code str
confederation str
group str
features_historical 128 rows · 17 cols The same features rebuilt leakage-safe for 128 past team-tournaments.
Column Type Meaning
tournament str
team str
elo_asof float64
elo_change_5y float64
nt_last15_ppg float64
nt_last15_gd_pg float64
squad_form_xgxa_per90 float64
squad_fitness_readiness float64
familiarity_index float64
familiarity_weighted_index float64
understat_coverage float64
squad_fitness_global float64
fitness_coverage_global float64
squad_form_global float64
squad_form_global_raw float64
mean_league_coef float64
form_coverage_global float64
model_results 2,128 rows · 14 cols Every out-of-sample match prediction (7 models × W/D/L) — the backtest evidence.
Column Type Meaning
protocol str
fold str
model str
match_id str
home str
away str
pH float64
pD float64
pA float64
actual str
understat_tier str
subgroup str
p_over float64
total_actual int64
collisions 28 rows · 8 cols The “earliest possible meeting round” for every pair among the top-8 teams (28 pairs), from the bracket draw.
Column Type Meaning
teamA str
teamB str
groupA str
groupB str
earliest_round str
detail str
codeA str
codeB str
league_strength 60 rows · 9 cols The 60-league relative-strength normalisation (England top → Qatar bottom).
Column Type Meaning
league_country str
n_priced int64
raw_log_q float64
shrunk_log_q float64
strength_z float64
prior_backed bool
form_coef float64
example_clubs str
confederation_hint str
elo_current 336 rows · 3 cols Current world Elo for 336 teams — the reproducible rating spine.
Column Type Meaning
team str
elo float64
rank int64
