Earlier this week we published our probabilistic UEFA Euro 2020 forecast, which combines the expertise of football modelers from four different research teams with the flexibility of machine learning. To explain exactly which data and methods were used, we have also written a working paper, now published in the arXiv.org e-Print archive.
Moreover, we take the opportunity to provide further insights from our forecast for the results of the group stage, which starts at the end of this week with the opening match between Italy and Turkey in Rome in Group A. More precisely, predicted probabilities for a win, draw, or loss in each of the 36 group stage matches are provided in interactive heatmaps for all groups.
Groll A, Hvattum LM, Ley C, Popp F, Schauberger G, Van Eetvelde H, Zeileis A (2021). “Hybrid Machine Learning Forecasts for the UEFA EURO 2020.” arXiv:2106.05799, arXiv.org e-Print archive. https://arxiv.org/abs/2106.05799
Three state-of-the-art statistical ranking methods for forecasting football matches are combined with several other predictors in a hybrid machine learning model. Namely an ability estimate for every team based on historic matches; an ability estimate for every team based on bookmaker consensus; average plus-minus player ratings based on their individual performances in their home clubs and national teams; and further team covariates (e.g., market value, team structure) and country-specific socio-economic factors (population, GDP). The proposed combined approach is used for learning the number of goals scored in the matches from the four previous UEFA EUROs 2004-2016 and then applied to current information to forecast the upcoming UEFA EURO 2020. Based on the resulting estimates, the tournament is simulated repeatedly and winning probabilities are obtained for all teams. A random forest model favors the current World Champion France with a winning probability of 14.8% before England (13.5%) and Spain (12.3%). Additionally, we provide survival probabilities for all teams and at all tournament stages.
Predicted match probabilities for the group stage
Using the hybrid random forest, an expected number of goals is obtained for both teams in each possible match in the group stage. As there are typically more goals in the group stage than in the knockout stage, a different expected number of goals is fitted for the two stages by including a corresponding binary dummy variable in the regression model. While the heatmap shown in our previous blog post contained the probabilities for all possible matches in the knockout stage, we complement this information here by showing a separate heatmap for each group.
The color scheme visualizes the winning probability of the team in the row over the team in the column. Light red or orange signals a low winning probability, while dark green or blue signals a high one. The tooltips for each match in the interactive version of the graphic also show the probabilities for the match to end in a win, draw, or loss.
[Interactive heatmaps: Group A, Group B, Group C, Group D, Group E, Group F]
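To illustrate how a pair of expected goals can be turned into the win, draw, and loss probabilities shown in the heatmaps, here is a minimal Python sketch. It assumes that the two teams' goals follow independent Poisson distributions, which is a simplifying assumption made for this example and not necessarily the exact distribution used in the working paper. The function name `match_probabilities`, the truncation parameter `max_goals`, and the expected goals in the usage line are all hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)


def match_probabilities(lambda_a, lambda_b, max_goals=15):
    """Return (win, draw, loss) probabilities from team A's point of view.

    Assumption for this sketch: goals of both teams are independent
    Poisson variables with means lambda_a and lambda_b. Scorelines are
    truncated at max_goals per team and the result is renormalized.
    """
    pmf_a = [poisson_pmf(k, lambda_a) for k in range(max_goals + 1)]
    pmf_b = [poisson_pmf(k, lambda_b) for k in range(max_goals + 1)]

    win = draw = loss = 0.0
    for goals_a, p_a in enumerate(pmf_a):
        for goals_b, p_b in enumerate(pmf_b):
            p = p_a * p_b  # joint probability of this scoreline
            if goals_a > goals_b:
                win += p
            elif goals_a == goals_b:
                draw += p
            else:
                loss += p

    total = win + draw + loss  # slightly below 1 because of the truncation
    return win / total, draw / total, loss / total


# Hypothetical expected goals, chosen only for illustration.
win, draw, loss = match_probabilities(1.6, 0.9)
print(f"win={win:.3f} draw={draw:.3f} loss={loss:.3f}")
```

Applying such a function to every pairing within a group would yield the row-over-column winning probabilities that the color scheme encodes, with the draw and loss probabilities corresponding to the additional values in the tooltips.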