Interactive · out-of-time evaluations
Each dot below is a real test of the survival model with a given set of features, scored only on seasons it had not seen. The score is AUROC, which you can read as how reliably the model ranks a club that stayed up above a club that went down: 1.0 is perfect, and 0.5 is a coin toss. Watch what happens as features are added. The line for established clubs climbs and stays up, while the line for promoted clubs stays flat, barely above a coin toss.
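To make the AUROC reading concrete, here is a minimal sketch that computes it directly as a ranking score: the share of (stayed up, went down) pairs in which the club that stayed up got the higher predicted score, with ties counting as half. The function name `auroc`, the label coding (1 = stayed up, 0 = went down) and the toy numbers are assumptions for illustration, not the model's actual code or results.

```python
def auroc(scores, labels):
    """Share of (stayed-up, went-down) pairs ranked correctly; ties count half.

    scores: predicted survival scores, higher = more likely to stay up.
    labels: 1 if the club stayed up, 0 if it went down (assumed coding).
    """
    stayed_up = [s for s, y in zip(scores, labels) if y == 1]
    went_down = [s for s, y in zip(scores, labels) if y == 0]
    if not stayed_up or not went_down:
        raise ValueError("AUROC needs at least one club from each class")

    wins = 0.0
    for up in stayed_up:
        for down in went_down:
            if up > down:
                wins += 1.0   # pair ranked correctly
            elif up == down:
                wins += 0.5   # tie: no better than a coin toss
    return wins / (len(stayed_up) * len(went_down))


# Toy inputs, made up for illustration only.
perfect = auroc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])   # every pair ranked correctly
no_signal = auroc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])  # same score for every club
mixed = auroc([0.9, 0.4, 0.8, 0.3], [1, 1, 0, 0])      # one of four pairs is wrong
```

A model that gives every club the same score lands on 0.5, which is why a line hovering just above that level carries almost no ranking information.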
Each card is a feature set added to the model, with its out-of-time AUROC for both groups. Selecting one highlights it.
Normally, giving a model more to work with makes it better. Not here, at least not for promoted clubs. The fullest version, with transfer spend, squad structure and expected goals, does no better on promoted clubs than the strength rating on its own, and sometimes it does worse, because a model with that many features overfits such a small group. So the gap is not about a missing feature. It is built into the problem.
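The evaluation setup behind these numbers, scoring only on unseen seasons and reporting promoted and established clubs separately, can be sketched as follows. This is an illustration under stated assumptions, not the actual pipeline: the field names (`season`, `club`, `promoted`), the helper names, the cutoff season and the placeholder rows are all hypothetical.

```python
from collections import defaultdict


def out_of_time_split(rows, first_test_season):
    """Train on seasons before the cutoff, test on the cutoff season onward."""
    train = [r for r in rows if r["season"] < first_test_season]
    test = [r for r in rows if r["season"] >= first_test_season]
    return train, test


def by_group(rows):
    """Separate promoted from established clubs so each is scored on its own."""
    groups = defaultdict(list)
    for r in rows:
        key = "promoted" if r["promoted"] else "established"
        groups[key].append(r)
    return dict(groups)


# Placeholder rows, not real data: four seasons, four clubs each,
# one of which is marked as promoted.
rows = [
    {"season": season, "club": f"club_{i}", "promoted": i == 0}
    for season in (2019, 2020, 2021, 2022)
    for i in range(4)
]

train, test = out_of_time_split(rows, first_test_season=2021)
groups = by_group(test)
group_sizes = {name: len(members) for name, members in groups.items()}
```

Even in this toy layout the promoted group in the test set is a fraction of the established one, which is the condition under which a feature-heavy model has the least data to learn from and the noisiest score to be judged by.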