The dataset

Data

The dataset is a long run of results with richer modern data layered on top. It covers five leagues and their top two divisions from 1993/94 to 2025/26. Every source is public, and each is used only over the years it genuinely covers.

01 Sources

| Source | Provides | Window |
|---|---|---|
| Football-Data.co.uk | results, shots, bookmaker odds (5 leagues, both tiers) | 1993/94– |
| ClubElo | preseason strength ratings (20,585 snapshots) | 1994–2026 |
| Understat | expected goals (xG), top flights | 2014/15– |
| FBref | advanced team/player stats, squad age, minutes | 2010– |
| Transfermarkt | squad age, managers, transfer fees | 2014– |
| SoFIFA | FIFA ratings, transfer budget, club worth | 2014–26 |
| Wyscout | raw event data for playing style (3M events) | 2017/18 |

02 Coverage

A row of data is one club's season in the top flight. The long run of results, Elo and odds covers the whole history. The richer layers, such as expected goals, squad make-up and transfers, only start in 2014 and only cover the top divisions.

Which seasons the study uses

The study uses every season from 1993/94 to 2025/26, all 33 of them, rather than any single season, and pools all 3,204 club-seasons together. 2025/26 comes up often only because it is the most recent finished season: it brings the data up to the present and gives the model a fresh test, since we now know how the clubs promoted for 2025/26 fared. The 2026/27 forecast is trained on the whole history through 2025/26, latest finished season included, and then applied to the clubs promoted for 2026/27, using their 2025/26 form in the division below. The only place 2025/26 is held back is the back-test, which trains on earlier seasons so that 2025/26's known outcomes can serve as a clean check. In short: every season feeds the history; 2025/26 is the newest training season for the forecast and the held-out target for the back-test; and 2026/27 is the season the forecast looks ahead to.
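The back-test split described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `season` key and its `"2025/26"` string format are hypothetical, and the toy rows are made up rather than taken from the panel.

```python
def season_start_year(season):
    """Turn a season label such as '2025/26' into its starting year, 2025."""
    return int(season.split("/")[0])


def backtest_split(rows, holdout="2025/26"):
    """Train on seasons strictly before the holdout, test on the holdout itself."""
    cutoff = season_start_year(holdout)
    train = [r for r in rows if season_start_year(r["season"]) < cutoff]
    test = [r for r in rows if season_start_year(r["season"]) == cutoff]
    return train, test


# Made-up rows; 'season' is an assumed column name.
rows = [
    {"club": "Club A", "season": "2023/24", "target_survived": 1},
    {"club": "Club B", "season": "2024/25", "target_survived": 0},
    {"club": "Club C", "season": "2025/26", "target_survived": 1},
]

train, test = backtest_split(rows)
forecast_training_rows = rows  # the 2026/27 forecast trains on everything
```

The forecast, by contrast, keeps every row, which is why `forecast_training_rows` is simply the full list.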

Leagues: 5 (England, Spain, Germany, Italy, France)
Club-seasons: 3,204 (top flight, both eras)
Promotions: 439 (the subgroup of interest)
Matches: 122k (results + odds backbone)

03 Feature layers

The features come in four layers. The first two make up the lean, balanced set that every model uses. The other two are a robustness layer, available from 2014 onward.

L1 How dominant they were below. Points, goal and shot difference, and finishing position in the division below. (Football-Data)
L2 Strength and setting. Pre-season Elo, the gap to the rest of the top flight, and bookmaker odds. (ClubElo)
L3 The shape of the club. Squad age, how much of the squad stayed on, how long the manager has been there, and transfer spend. (Transfermarkt, SoFIFA, FBref)
L4 Playing style. A picture learned straight from on-the-ball event data. (Wyscout, Understat)

04 The panel

Each row is one club's season in the top flight, and every feature describes what the club brought into that season. For a promoted club, the form we use is its last season in the division below; for an established club, it is its previous top-flight season. Pre-season Elo is the rating as it stood around 1 August. Nothing from the season being predicted goes into the features. The strongest link between any single feature and the outcome is about 0.36, which is reassuringly modest and suggests that no feature is quietly leaking the answer.
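A leakage check of this kind can be sketched as follows. The text does not say which measure of "link" was used, so Pearson correlation here is an assumption, as are the helper names `pearson` and `strongest_link`; the toy rows are made up and will not reproduce the 0.36 figure.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_link(rows, features, target):
    """Return the feature with the largest absolute correlation to the target."""
    ys = [r[target] for r in rows]
    scores = {f: abs(pearson([r[f] for r in rows], ys)) for f in features}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up rows using column names from the data dictionary.
rows = [
    {"prior_ppg": 2.0, "elo_preseason": 1500, "target_points": 35},
    {"prior_ppg": 1.8, "elo_preseason": 1600, "target_points": 40},
    {"prior_ppg": 2.2, "elo_preseason": 1550, "target_points": 38},
    {"prior_ppg": 1.9, "elo_preseason": 1650, "target_points": 45},
]

best, score = strongest_link(rows, ["prior_ppg", "elo_preseason"], "target_points")
```

A feature that scored close to 1.0 on the real panel would be a warning sign that something from the predicted season had slipped into the inputs.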

We work out whether a club is promoted from its record in the tier below, rather than trusting a separate flag. That alone fixed 55 wrongly labelled early Spanish seasons, and it leaves 439 promoted and 2,632 established club-seasons.
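Deriving the label from the prior-tier record might look roughly like the sketch below. The record layout `(club, season_start_year, tier)`, the function name `label_entry`, and the `"unknown"` fallback for clubs with no record in the previous season are all assumptions made for illustration, not the study's actual pipeline.

```python
def label_entry(records):
    """Label each top-flight club-season by where the club played a year earlier.

    records: iterable of (club, season_start_year, tier), tier 1 = top flight.
    """
    tier_by_key = {(club, year): tier for club, year, tier in records}
    labels = {}
    for (club, year), tier in tier_by_key.items():
        if tier != 1:
            continue  # only top-flight seasons become panel rows
        prior_tier = tier_by_key.get((club, year - 1))
        if prior_tier == 2:
            labels[(club, year)] = "promoted"
        elif prior_tier == 1:
            labels[(club, year)] = "incumbent"
        else:
            labels[(club, year)] = "unknown"  # no prior-season record available
    return labels


# Made-up records for three hypothetical clubs.
records = [
    ("Club A", 2023, 2), ("Club A", 2024, 1),
    ("Club B", 2023, 1), ("Club B", 2024, 1),
    ("Club C", 2024, 1),
]
labels = label_entry(records)
```

Because the label comes from the club's own record, a mislabelled flag in a source file cannot propagate into the panel.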

05 Data dictionary

| Column | Meaning |
|---|---|
| promoted / incumbent | entered from the second / first tier (prior-tier record) |
| prior_ppg, prior_gd_pg | prior-division points and goal difference per game |
| prior_shot_diff_pg | prior-division shot difference per game |
| elo_preseason | ClubElo rating on ~1 August (pre-season) |
| elo_gap_to_median | club Elo − median Elo of that season's top flight |
| PromotionRoute | Automatic · Play-off/Other · Incumbent |
| target_survived | 1 = remained in the top flight · 0 = relegated |
| target_points / target_band | final points · finishing band (1 top … 5 bottom) |
| transfer_spend / squad_age | summer fees in € · mean squad age (2014+) |
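As an illustration of how a derived column such as `elo_gap_to_median` follows from its definition, here is a minimal sketch. It assumes every row passed in is a top-flight club-season and that a `season` key exists; both are assumptions, and the Elo values are made up.

```python
from collections import defaultdict
from statistics import median


def add_elo_gap(rows):
    """Add elo_gap_to_median: club Elo minus the median Elo of its season."""
    elos_by_season = defaultdict(list)
    for r in rows:
        elos_by_season[r["season"]].append(r["elo_preseason"])
    season_median = {s: median(v) for s, v in elos_by_season.items()}
    return [
        {**r, "elo_gap_to_median": r["elo_preseason"] - season_median[r["season"]]}
        for r in rows
    ]


# Made-up ratings for two toy seasons.
rows = [
    {"club": "Club A", "season": "2024/25", "elo_preseason": 1500},
    {"club": "Club B", "season": "2024/25", "elo_preseason": 1600},
    {"club": "Club C", "season": "2024/25", "elo_preseason": 1700},
    {"club": "Club D", "season": "2025/26", "elo_preseason": 1400},
    {"club": "Club E", "season": "2025/26", "elo_preseason": 1800},
]
with_gap = add_elo_gap(rows)
```

Taking the median per season keeps the gap comparable across eras even if overall rating levels drift over time.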

06 Downloads

| File | Contents | Get |
|---|---|---|
| panel_primary.csv | 3,204 club-seasons · primary features + targets | CSV ↓ |
| panel_enriched.csv | + squad age, continuity, spend, prior xG | CSV ↓ |
| club_season_targets.csv | survival / points / finishing-band labels | CSV ↓ |
| style_features.csv | per-club style vectors and components | CSV ↓ |
| forecast_2627.json | 2026-27 forecast, bands, back-test | JSON ↓ |
| panel.parquet | the panel as Parquet | Parquet ↓ |

Derived from public sources; licences and attribution respected.
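Once downloaded, a panel file could be read with nothing beyond the standard library. The sketch below assumes that `promoted` is stored as a 0/1 column alongside `target_survived`; the exact file layout is not documented here, so treat the column encoding as a guess, and note that the inline sample is made up.

```python
import csv
import io


def survival_rate(lines, promoted_only=True):
    """Share of club-seasons with target_survived == 1, optionally promoted only."""
    reader = csv.DictReader(lines)
    kept = [row for row in reader if not promoted_only or row["promoted"] == "1"]
    if not kept:
        return None
    return sum(int(row["target_survived"]) for row in kept) / len(kept)


# Made-up stand-in for a few lines of panel_primary.csv.
sample = io.StringIO(
    "club,season,promoted,target_survived\n"
    "Club A,2024/25,1,1\n"
    "Club B,2024/25,1,0\n"
    "Club C,2024/25,0,1\n"
)
rate = survival_rate(sample)
```

With the real file, the same function would be called on an open file handle, for example `with open("panel_primary.csv", newline="") as f: survival_rate(f)`.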