Interactive · out-of-time evaluations

Adding features does not close the gap

Each dot below is a real test of the survival model with a given set of features, scored only on seasons it had not seen. The score is AUROC, which you can read as how well the model tells the clubs that stayed up from the clubs that went down: 1.0 is perfect, and 0.50 is a coin toss. Watch what happens as features are added. The established line climbs and stays up, while the promoted line stays flat, barely above a coin toss.
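One way to make the AUROC reading above concrete: it is the share of (survivor, relegated) pairs in which the model gives the survivor the higher score. The sketch below is a minimal pure-Python version of that definition, not the code behind the chart; the function name `auroc` and the toy inputs are assumptions made for illustration.

```python
from itertools import product


def auroc(labels, scores):
    """Share of (stayed-up, went-down) pairs ranked correctly; ties count half.

    labels: 1 if the club stayed up, 0 if it went down (hypothetical encoding).
    scores: the model's predicted survival score for each club.
    """
    stayed_up = [s for y, s in zip(labels, scores) if y == 1]
    went_down = [s for y, s in zip(labels, scores) if y == 0]
    if not stayed_up or not went_down:
        raise ValueError("AUROC needs at least one club from each outcome")
    wins = sum(
        1.0 if up > down else 0.5 if up == down else 0.0
        for up, down in product(stayed_up, went_down)
    )
    return wins / (len(stayed_up) * len(went_down))


# Toy inputs, invented for illustration only.
perfect = auroc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])   # every pair ranked correctly
no_signal = auroc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])  # identical scores: all ties
partial = auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])    # 3 of 4 pairs ranked correctly
```

A model that gives every club the same score lands at exactly 0.50, which is why that value serves as the coin-toss reference.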

01 Survival AUROC by feature set

[Chart: survival AUROC by feature set, with an evaluation era selector. Legend: Established, Promoted, coin toss reference line (0.50).]

02 Our model

Each card is a feature set added to the model, with its out-of-time AUROC for both groups. Selecting one highlights it.
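"Out-of-time" means the split is made by season rather than at random, and the held-out rows are then scored separately for each group. The sketch below shows one way such a split could look; the field names (`season`, `group`, `survived`), the cutoff parameter and the rows themselves are hypothetical and are not taken from the model or data behind this page.

```python
from collections import defaultdict


def out_of_time_split(rows, first_test_season):
    """Train on seasons strictly before the cutoff; hold out the rest by group."""
    train = [r for r in rows if r["season"] < first_test_season]
    test_by_group = defaultdict(list)
    for r in rows:
        if r["season"] >= first_test_season:
            test_by_group[r["group"]].append(r)
    return train, dict(test_by_group)


# Made-up rows purely for illustration; not results from the article.
rows = [
    {"season": 2018, "group": "established", "survived": 1},
    {"season": 2018, "group": "promoted", "survived": 0},
    {"season": 2019, "group": "established", "survived": 1},
    {"season": 2020, "group": "promoted", "survived": 1},
    {"season": 2020, "group": "established", "survived": 0},
]

train, test_by_group = out_of_time_split(rows, first_test_season=2020)
```

Keeping the two groups apart at scoring time is what allows one AUROC to be reported for established clubs and another for promoted clubs from the same fitted model.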

03 Result

Normally, giving a model more to work with makes it better. That holds for established clubs, but not for promoted ones. The fullest version, with transfer spend, squad structure and expected goals, does no better on promoted clubs than the strength rating on its own, and sometimes it does worse, because a heavy model overfits such a small group. So the gap is not about a missing feature. It is built into the problem.