Sunday, 24 May 2015

Backtesting Bundesliga 2014/2015

This year's Bundesliga season was a special one. Not in the sense that we saw a surprise winner, but it had some features that made it feel unique. Most of all, perhaps, Dortmund's roller-coaster ride: from preseason favorite for rank 2, to rock bottom of the league on rank 17 after the first half of the season, and then back up to rank seven, which earns a place in the qualification round of the UEFA Europa League.

Werder Bremen took a similar ride. After replacing its coach Robin Dutt with Victor Skripnik, the team transformed from a sure relegation candidate into one that nearly qualified for the Europa League. They actually played Dortmund on the last day and still had some chance of playing in Europe next season, but lost it.

The season will also be remembered for its close relegation battle, the closest in years. Going into the last match day, no team was relegated for sure, and even the complete outsider SC Paderborn was only a victory away from staying in the league.

How does all this look in numbers? Let's start with the preseason probabilities of the teams. (These numbers come from the new version of the algorithm, which didn't exist when the season started. Hence they are back-calculated and not actual preseason predictions.)


The table shows the probability for each team to end on each rank. If a cell is empty, the team never finished on that rank in any of the 50,000 simulated seasons that we used to generate this table. While the chance of ending on that rank is not strictly zero in theory, for practical purposes it is. The black squares indicate the rank on which the team actually finished the season.
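A table like this can be produced by simulating the season many times and counting how often each team lands on each rank. The post does not describe the match model behind the simulation, so the following is only a minimal sketch under stated assumptions: the team names, the `strengths` values, the fixed `draw_prob` and the win-probability formula are all hypothetical stand-ins, not the actual Goalimpact algorithm.

```python
import random
from collections import Counter


def simulate_rank_probabilities(strengths, n_seasons=2000, draw_prob=0.25, seed=0):
    """Estimate P(team finishes on rank r) by Monte Carlo simulation.

    strengths: dict team -> positive number (hypothetical strength scale).
    The match model (fixed draw probability, wins split by relative
    strength) is an assumption made only for this illustration.
    """
    rng = random.Random(seed)
    teams = list(strengths)
    rank_counts = {team: Counter() for team in teams}

    for _ in range(n_seasons):
        points = dict.fromkeys(teams, 0)
        # Double round robin: every team hosts every other team once.
        for home in teams:
            for away in teams:
                if home == away:
                    continue
                share = strengths[home] / (strengths[home] + strengths[away])
                p_home = (1 - draw_prob) * share
                u = rng.random()
                if u < p_home:
                    points[home] += 3
                elif u < p_home + draw_prob:
                    points[home] += 1
                    points[away] += 1
                else:
                    points[away] += 3
        # Sort by points; ties are broken randomly instead of by goal difference.
        table = sorted(teams, key=lambda t: (points[t], rng.random()), reverse=True)
        for rank, team in enumerate(table, start=1):
            rank_counts[team][rank] += 1

    return {
        team: {rank: n / n_seasons for rank, n in sorted(counts.items())}
        for team, counts in rank_counts.items()
    }


# Hypothetical four-team league with one dominant side.
example = simulate_rank_probabilities({"A": 5.0, "B": 1.5, "C": 1.0, "D": 1.0})
for team, probs in example.items():
    print(team, {rank: round(p, 3) for rank, p in probs.items()})
```

Ranks that never occur in any simulated season are simply missing from a team's dictionary, which corresponds to the empty cells in the table.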

From the start, Bayern Munich was set to win the league. They are so much better than the other teams that they never finished below rank 11 in any of the 50,000 simulated seasons.

More interesting is the rank of Borussia Dortmund. They ended up on rank 7. Preseason, we gave this only a 2.7% chance of happening. But this was only the second most surprising outcome: FC Augsburg ending up on rank 5 had only a 1.5% chance according to Goalimpact. In fact, they were expected to be fighting relegation rather than battling for Europe.

The bad performance of HSV was the next biggest surprise. Preseason, we gave only a 3.3% chance that it would be that bad. Despite changing their coach so often that this HSV season's data alone could support a study showing that changing coaches has little impact, they ended on rank 16 and will fight against relegation in the play-offs, just like last season.

Werder Bremen provided the opposite evidence. After the change to Victor Skripnik, Werder rose from acute relegation risk (as predicted preseason) to a final rank 10. This was the fourth most surprising outcome given the preseason estimates. But given that most of the first half of the season was really bad, just how surprising was the rescue by Skripnik? The following table shows the predicted outcome halfway through the season.


Given the performance in the first half, ending up on rank 10 had a probability of only 2.4%. This Bremen miracle wasn't a small one, albeit still not on the scale of Augsburg's qualification for the Europa League. Borussia Dortmund's race to Europe wasn't that unexpected. In fact, despite their being at the bottom of the table after half the season, ranks seven and eight were the most probable ones for Dortmund to end on. Apart from Bremen's winning streak and Hanover falling apart, few surprising things happened in the second half of the season. Nearly all teams ended close to their likely ranks.

Let's move away from ranks and look at the predicted points.


Goalimpact explained the actual points this Bundesliga season with an R² of 60%. Deviations are randomly distributed above and below the prediction. The overall calibration seems good, as indicated by a regression slope close to 1. After half of the games were played, things became more settled. At that time, 82% of the variance in the final results was already explained.
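For readers who want to run this kind of check on their own predictions, here is a minimal sketch of how the slope and R² of actual points regressed on predicted points can be computed. The function name `calibration` and the sample numbers are assumptions made for illustration; the numbers are invented and are not the season's data.

```python
def calibration(predicted, actual):
    """Return (slope, r_squared) of the least-squares line
    actual = intercept + slope * predicted."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_aa = sum((a - mean_a) ** 2 for a in actual)
    s_pa = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    slope = s_pa / s_pp
    # For a simple linear regression, R² is the squared correlation.
    r_squared = s_pa ** 2 / (s_pp * s_aa)
    return slope, r_squared


# Invented numbers purely for illustration; NOT the Bundesliga data.
predicted_points = [70.0, 55.0, 48.0, 42.0, 38.0, 33.0]
actual_points = [75, 50, 52, 40, 35, 36]

slope, r2 = calibration(predicted_points, actual_points)
print(f"slope = {slope:.2f}, R² = {r2:.0%}")
```

A slope near 1 means that a predicted difference of one point corresponds to roughly one actual point on average, while R² measures how much of the spread between teams the prediction accounts for.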


But this includes actual results from the first half of the season, and hence part of the correlation stems from there. How well was the second half explained on its own?


The R² for the second half was 46%. This is well above the naive approach of assuming the same number of points as in the first half, which leads to an R² of only 27%. The dot at (31; 31) is Dortmund. They earned exactly as many points in the second half as you would expect given their strong players. Hence the qualification for Europe is hardly a surprise. The extraordinarily few points in the first half of the season were the real surprise. And there they were very unlucky.

If there is a team that seems to constantly outperform Goalimpact's predictions, it is FC Augsburg. As shown above, their entering the Europa League was the most unexpected event of this Bundesliga season. However, all of the over-performance came in the first half of the season. The 22 points they earned in the second half were close to the low expectation of 19.7. In the first half of the season they were expected to earn only 18.5 points, but earned 27. So we are still undecided. Is Augsburg really that good at forming a team stronger than its parts, or were they as lucky in the first half of the season as Dortmund was unlucky? Maybe a bit of both.

Thank you for bearing with us this long; only one more chart. If we look at the expected points per rank as predicted preseason, there are nearly no surprises whatsoever.


Predicting how many points one would need to stay in the league turns out to be very easy, even preseason. We predicted rank 16 to have 34.5 points and it turned out to be 35. We predicted rank one to finish with 74.6 points and Bayern earned 79.

This is important when considering whether a draw might be sufficient in a particular game or whether a team should play for a win. If the point distribution is this predictable, it might matter even early in the season.

Summary

There were quite a few surprises in the Bundesliga, especially in the first half of the season. Dortmund having the fewest points after 17 matches was very unlikely, as was Augsburg having the 4th most points. Both teams' subsequent qualifications for Europe were, in contrast, expected. The resurrection of Werder Bremen was the biggest surprise of the second half.

Tuesday, 5 May 2015

Reader's Notice: Publication of Goalimpact

We are happy to announce that the results of the new Goalimpact algorithm are published for the Premier League and for the Bundesliga on our partner sites.

Especially on PremierInsider there might still be some missing charts. We are working on it.

Have fun while browsing!

Monday, 4 May 2015

How good is Red Bull's Stefan Ilsanker?

One of the most surprising results of the latest Top-50 list of football players was the high ranking of Stefan Ilsanker. He was rated highly despite playing in the mediocre Austrian Bundesliga, albeit at league-dominating Red Bull Salzburg. So the question of this post is: How much Red Bull is in Ilsanker's Goalimpact?

One way to look at Goalimpact is to think of it as how well a team plays with a player compared to the team without him. In this view, Goalimpact is calculated from the difference in goals scored, the difference in goals conceded and the average Goalimpact of the replacements. This is a simplification, because Goalimpact also corrects for other factors such as home field advantage, but it is a good starting point that is reasonably accurate if calculated over many games.

The following chart shows Red Bull Salzburg's goal difference with and without Ilsanker starting from July 2012 until today.


As you can see, with Ilsanker Salzburg had an average goal difference of 1.96 per game, without him only 1.56. So results improve strongly, by 0.4 goals of goal difference per game, when Ilsanker plays. To put this in perspective: if Red Bull were to play all league matches with Ilsanker, it would be expected to end up with a total goal difference of +71. A season without him would be less dominant, ending with a goal difference of 'only' +56.
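The season totals follow from multiplying the per-game goal difference by the number of league matches. The text does not state the season length; the sketch below assumes 36 league matches (ten teams meeting four times each), which reproduces the quoted figures. The names `MATCHES_PER_SEASON` and `project_season` are made up for this illustration.

```python
# Assumption: a league season of 36 matches (not stated in the post).
MATCHES_PER_SEASON = 36

GD_WITH_ILSANKER = 1.96     # average goal difference per game, from the text
GD_WITHOUT_ILSANKER = 1.56  # average goal difference per game, from the text


def project_season(gd_per_game, matches=MATCHES_PER_SEASON):
    """Scale an average per-game goal difference to a full season."""
    return gd_per_game * matches


improvement = GD_WITH_ILSANKER - GD_WITHOUT_ILSANKER
print(f"improvement per game: {improvement:.2f}")
print(f"season with him:      {project_season(GD_WITH_ILSANKER):+.0f}")
print(f"season without him:   {project_season(GD_WITHOUT_ILSANKER):+.0f}")
```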

This is only an indication that Ilsanker improves the team significantly. As argued before, other factors that correlate with Ilsanker playing might create this improvement in goal difference. One example would be the quality of the opposition. If Ilsanker only played against weak opposition, Salzburg's goal difference would be good because of that rather than because of Ilsanker's brilliance. However, Goalimpact corrects for this, so it may not be the case here.

Another caveat of our analysis is that we showed that Ilsanker adds goal difference to the team, but maybe the team as such is overvalued? Let's perform another test that tries to address both points. If we redo the analysis on Salzburg's UEFA games only, we can see whether they are strong there, too. Additionally, we can assume that Salzburg fields its best players in European matches.

In total, we have 28 Salzburg matches at European level since July 2012. In the 2036 minutes with Ilsanker, Salzburg had a goal difference of +25 (51 to 26). In the 568 minutes without him it was +5 (12:7). Scaled to a full match (the 2604 minutes spread over 28 matches average 93 minutes per match), this makes a goal difference of 1.14 with him and 0.82 without him. Even in this subsample he added 0.32 goals of goal difference per match. Slightly less than in the whole sample, but still a handsome addition.

Since Salzburg's goal difference is positive even in the European matches, they do not look overvalued per se. However, they rarely met very big teams, so it is difficult to tell for sure. They did meet:
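The quoted rates can be reproduced from the minute and goal counts if goal difference per minute is scaled by the average match length in the sample, 2604 minutes over 28 matches, or 93 minutes. The following sketch does exactly that; the function name `gd_per_match` is our own, and using the sample's average match length as the scaling factor is an inference from the numbers rather than something the text states.

```python
def gd_per_match(goals_for, goals_against, minutes, minutes_per_match):
    """Goal difference per minute, scaled to one full match."""
    return (goals_for - goals_against) / minutes * minutes_per_match


MATCHES = 28
MINUTES_WITH, MINUTES_WITHOUT = 2036, 568

# Average match length in the sample, including stoppage time.
minutes_per_match = (MINUTES_WITH + MINUTES_WITHOUT) / MATCHES

gd_with = gd_per_match(51, 26, MINUTES_WITH, minutes_per_match)
gd_without = gd_per_match(12, 7, MINUTES_WITHOUT, minutes_per_match)

print(f"minutes per match: {minutes_per_match:.0f}")
print(f"with Ilsanker:     {gd_with:.2f}")
print(f"without Ilsanker:  {gd_without:.2f}")
print(f"difference:        {gd_with - gd_without:.2f}")
```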

  • Fenerbahce: one 1:1 draw and a 1:3 defeat. Both with Ilsanker.
  • AFC Ajax: two victories, 3:0 and 3:1. Both with Ilsanker.
  • FC Basel: one 0:0 draw and a 1:2 defeat. Both with Ilsanker.
  • Dinamo Zagreb: two victories, 4:2 and 5:1. Both with Ilsanker.
  • Celtic FC: one 2:2 draw and a 3:1 win. Both with Ilsanker.
  • Villarreal CF: two defeats, a 1:2 away with Ilsanker and a 1:3 at home without him.

All defeats came against teams that were ranked lower than Salzburg. This indicates that Red Bull Salzburg might be overvalued, but we are talking about a very small N now. Other team ratings rank Salzburg considerably lower. So there is an indication that Salzburg is overvalued and that, in turn, Ilsanker is, but the uncertainty is significant. However, there is strong evidence that Ilsanker is pivotal to Salzburg's performance and hence stands out in the team.

Let's look forward to the next European season and Salzburg's next attempt to reach the Champions League. We will get a clearer view then of where they stand. Hopefully they'll play with Ilsanker.

Saturday, 2 May 2015

How fast does Goalimpact converge?

If you have seen only a few games of a player, it is hard to tell whether he is good or not. If you have seen every game of his career by the time he retires, you will have a pretty clear picture of whether he was any good. In this article we test after how many games Goalimpact gives a good estimate of a player's ability.

Before we can test the algorithm, we need an estimate of the true ability of the player. We do this by restricting the sample to players who have already finished their careers. For those, we proxy the true skill by their career-end Peak Goalimpact. To eliminate players for whom this isn't a good proxy for true skill, we further restrict the sample to players who had at least 20,000 minutes of playing time at the end of their career. The average player remaining in the sample had 32,000 minutes of playing time at career end.

Now we compare the PeakGI predicted after a limited number of minutes early in the career with the career-end PeakGI. We quantify the quality of the prediction by R² in the following table.

Minutes | Field Player | Goalkeeper
1000 | 8.30% | 5.05%
2000 | 15.20% | 8.87%
4000 | 28.70% | 19.99%
8000 | 50.87% | 40.88%

So after 1000 minutes for a field player, slightly more than ten games, the PeakGI estimated at that point explains 8.3% of the variance of the career-end PeakGI. That is still a pretty uncertain prediction, but given that it is based on only 1000 minutes, the information content is surprisingly high. Goalimpact actually does separate good and less good players to some extent after just 10 games.

After twice as many minutes, the explained variance is already more than 15%. This is a very good result because 2000 minutes is just a bit more than half a season of input. So very early in a player's career Goalimpact shows its discriminatory power.

Another doubling of the number of observed minutes and the R² rises to nearly 30%. It exceeds 50% after just 8000 minutes, or about two seasons' worth of observations. Many players will not even be 22 by then. In fact, if we further restrict the sample to players who reached 8000 minutes of observation before turning 22, the R² is still an outstanding 34%.


For goalkeepers the results are consistently lower, but they stay in the same order of magnitude. The prediction quality for goalkeepers with 8000 minutes of playing time is still a very good 40%.
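Why does R² climb as minutes accumulate? A toy model can illustrate the mechanism, although it is not the Goalimpact algorithm: assume each player has a fixed true skill, every match yields a noisy observation of it, and the rating is simply the running average. All parameters below (`n_players`, `noise_sd`, the checkpoints counted in matches) are hypothetical choices for this sketch, so the resulting percentages should not be compared with the table above.

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def simulate_convergence(n_players=2000, checkpoints=(10, 20, 40, 80),
                         noise_sd=8.0, seed=1):
    """R² between the running-average rating and the true skill.

    checkpoints must be ascending numbers of matches played.
    """
    rng = random.Random(seed)
    skills = [rng.gauss(0.0, 1.0) for _ in range(n_players)]
    totals = [0.0] * n_players
    played = 0
    results = {}
    for checkpoint in checkpoints:
        # Add the matches played since the previous checkpoint.
        for i, skill in enumerate(skills):
            for _ in range(checkpoint - played):
                totals[i] += skill + rng.gauss(0.0, noise_sd)
        played = checkpoint
        estimates = [total / played for total in totals]
        results[checkpoint] = r_squared(estimates, skills)
    return results


for matches, r2 in simulate_convergence().items():
    print(f"{matches:3d} matches: R² = {r2:.1%}")
```

In this toy setting the noise in the average shrinks with every additional match while the spread of true skills stays the same, so the explained variance rises steadily, which is the same qualitative pattern as in the table.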

Summary

We showed that the PeakGI early in the career is a predictor of the player's future career path. After as few as 10 games, we already found some predictive power. After 8000 minutes, a large part of the true skill difference between players has been identified, even for very young players. For goalkeepers results are consistently lower, but in the same order of magnitude.

Saturday, 25 April 2015

Young Talent Challenge

"Predictions are difficult, especially about the future," as Niels Bohr once famously put it. The equivalent in football is the scouting of future top players. Goalimpact is designed to predict future performance, as opposed to describing past performance. Hence we would assume that players identified as future top players by Goalimpact have a higher than average chance of indeed becoming football stars.

The holy grail of identifying talents is identifying them before everybody else does and hence before they become expensive. In this post, we want to start a series tracking young players on their quest to become top players. Some of the players are known talents, some are yet to be uncovered. As a proxy for whether a player is known or not, we use the market values published by Transfermarkt. Any player with a value above 1.5M€ is declared to be known (although maybe still very cheap); any player at or below that number is defined as unknown. We will not publish the names of unknown players, but we assign numbers to them so that we can track them in future posts.

OK, here is the list of the most talented players according to Goalimpact who have yet to turn 20. If a player is shown as 20, this is a rounded number.

Player | Transfermarkt | Goalimpact | PeakGI | Age
Julian Brandt, Bayer Leverkusen | 5,000,000 € | 132.5 | 171.3 | 19
Max Meyer, FC Schalke 04 | 15,000,000 € | 122.9 | 154.8 | 20
Anonymous Talent 1 | 300,000 € | 99.6 | 145.9 | 18
Anonymous Talent 2 | 1,000,000 € | 96.9 | 145.3 | 18
Anonymous Talent 3 | 1,000,000 € | 98.4 | 144.8 | 18
Richairo Živković, Jong Ajax | 4,000,000 € | 101.7 | 144.3 | 19
Rubén Neves, FC Porto | 5,000,000 € | 95.7 | 144.0 | 18
Anonymous Talent 4 | 500,000 € | 96.2 | 143.5 | 18
Yoeri Tielemans, RSC Anderlecht | 12,000,000 € | 95.4 | 143.3 | 18
Andrija Živković, Partizan | 3,000,000 € | 101.9 | 142.8 | 19
Anonymous Talent 5 | 350,000 € | 102.2 | 142.2 | 19
Anonymous Talent 6 | 50,000 € | 111.5 | 141.8 | 20
Anonymous Talent 7 | 500,000 € | 107.0 | 141.5 | 19
Breel-Donald Embolo, FC Basel | 8,000,000 € | 93.9 | 141.4 | 18
Anonymous Talent 8 | 500,000 € | 97.8 | 141.3 | 19
Anonymous Talent 9 | 750,000 € | 103.0 | 141.1 | 19
Anonymous Talent 10 | 200,000 € | 107.3 | 140.7 | 19
Anonymous Talent 11 | 50,000 € | 92.0 | 140.4 | 18
Anonymous Talent 12 | 125,000 € | 96.9 | 140.1 | 19
Anonymous Talent 13 | 125,000 € | 101.5 | 139.7 | 19
Anonymous Talent 14 | 250,000 € | 105.1 | 139.4 | 19
Teddy Bishop, Ipswich Town | 2,500,000 € | 98.3 | 139.3 | 19
Pierre-Emile Højbjerg, FC Augsburg | 4,000,000 € | 108.5 | 139.0 | 20
Anonymous Talent 15 | 1,000,000 € | 91.6 | 138.8 | 18
Anonymous Talent 16 | 0 € | 95.7 | 138.8 | 19

The most staggering result is Julian Brandt, who is rated as a future football god. Between him and Max Meyer at number two there is already a huge gap. Given that talent, I wonder how long he will be playing at Leverkusen. Big teams must already be queuing to get him on board. Rank three already goes to an unknown player. He plays in the second division of a non-Top5 country.

We will post an update of this list, perhaps once per quarter, to track the players. Let's see how far we get. Buying all unknown players would cost us 6.7M€. Let's see how this value changes over time.
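The 6.7M€ figure can be checked directly against the list: apply the 1.5M€ threshold to the Transfermarkt values and add up everything at or below it. The sketch below copies the values from the table; the variable names and the `KNOWN_THRESHOLD` constant are our own labels.

```python
KNOWN_THRESHOLD = 1_500_000  # players valued above this count as "known"

# (player, Transfermarkt value in EUR), copied from the table in list order.
market_values = [
    ("Julian Brandt", 5_000_000), ("Max Meyer", 15_000_000),
    ("Anonymous Talent 1", 300_000), ("Anonymous Talent 2", 1_000_000),
    ("Anonymous Talent 3", 1_000_000), ("Richairo Živković", 4_000_000),
    ("Rubén Neves", 5_000_000), ("Anonymous Talent 4", 500_000),
    ("Yoeri Tielemans", 12_000_000), ("Andrija Živković", 3_000_000),
    ("Anonymous Talent 5", 350_000), ("Anonymous Talent 6", 50_000),
    ("Anonymous Talent 7", 500_000), ("Breel-Donald Embolo", 8_000_000),
    ("Anonymous Talent 8", 500_000), ("Anonymous Talent 9", 750_000),
    ("Anonymous Talent 10", 200_000), ("Anonymous Talent 11", 50_000),
    ("Anonymous Talent 12", 125_000), ("Anonymous Talent 13", 125_000),
    ("Anonymous Talent 14", 250_000), ("Teddy Bishop", 2_500_000),
    ("Pierre-Emile Højbjerg", 4_000_000), ("Anonymous Talent 15", 1_000_000),
    ("Anonymous Talent 16", 0),
]

known = [(name, value) for name, value in market_values if value > KNOWN_THRESHOLD]
unknown = [(name, value) for name, value in market_values if value <= KNOWN_THRESHOLD]
portfolio_cost = sum(value for _, value in unknown)

print(f"known players:   {len(known)}")
print(f"unknown players: {len(unknown)}")
print(f"cost of all unknown players: {portfolio_cost / 1e6:.1f}M€")
```

Re-running the same sum with updated market values in a later post would show how the value of this hypothetical portfolio develops.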


Julian Brandt still just a demigod, but soon will be grown up.

Sunday, 12 April 2015

How to read a Goalimpact Chart?

Following the updated algorithm and the latest Top-50 players list, we received a lot of questions about individual players and about how to read the chart. Let's answer both in one post.

Zlatan Ibrahimovic

Easily the most requested player following the Top50 post, probably because he was not on the list. That doesn't mean that Goalimpact isn't rating him; he is just not in the Top50. The main reason is that he dropped out of the list due to aging. Here is the chart.


The thick line shows the Goalimpact at that time. This is the original estimate, not using any future games. Clearly, with hindsight, we might give him a different rating, because his team outperformed or underperformed the original expectations. The expectations of how good Zlatan will be in the future are derived from the Peak Goalimpact (thin dashed line) and the aging curve of field players.

If the team results in Zlatan's games are better than expected given his Goalimpact, the PeakGI line will rise. It does so whenever the player overachieves the original expectations, regardless of whether he has already passed his peak. In recent years, for example, his PeakGI rose because his performance dropped less than we would have expected given the typical aging effect in football. Since 2011, the peak rose nearly 15 points. Without that rise, his Goalimpact would be nearly 15 points lower than it is today, so about 140 instead of 154 points.

To summarize, Zlatan is a world-class player that is gradually weakening due to aging. But this aging process is much slower with him than with the average football player. His current performance still is outstanding at 154.
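One way to see the aging curve at work is to compare PeakGI with current Goalimpact for field players of different ages. The sketch below takes a handful of rows from the Top50 table further down in this archive and computes the gap. Reading this gap as the aging adjustment is our interpretation of the description above, not a formula given in the post, and the selection of players is arbitrary.

```python
# (player, age, Goalimpact, PeakGI) copied from the Top50 table; field players only.
players = [
    ("Mario Götze", 22.83, 183.74, 193.00),
    ("Toni Kroos", 25.25, 174.34, 176.33),
    ("Theo Walcott", 26.08, 192.75, 192.83),
    ("Karim Benzema", 27.25, 199.45, 201.99),
    ("Antonio Valencia", 29.67, 178.31, 184.63),
    ("Gaël Clichy", 29.67, 180.69, 187.05),
    ("Cristiano Ronaldo", 30.17, 180.74, 188.42),
    ("Arjen Robben", 31.17, 176.72, 190.43),
]

# Gap between the career peak estimate and the current rating.
gaps = {name: round(peak - current, 2) for name, _, current, peak in players}

for name, age, _, _ in sorted(players, key=lambda row: row[1]):
    print(f"{age:5.2f}  {name:<18} PeakGI - Goalimpact = {gaps[name]:5.2f}")
```

In this selection the gap is smallest around age 26 and widens for both younger and older players, and players of the same age show nearly the same gap. That is what one would expect if the current rating is the peak rating shifted by an age-dependent amount, so a rise in PeakGI lifts the current Goalimpact by about the same number of points.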

Theo Walcott

The high rating of Theo Walcott raised some virtual eyebrows on Twitter. So this is what his chart looks like:

Theo has been rated "world-class one day" since he was very young. At that young age, he was still rated as "ok for Premier League, but not outstanding". But this was his actual performance and not his "potential performance" or "talent", both of which were rated as very high. Walcott delivered as expected until 2012. Therefore his PeakGI was more or less stable and the Goalimpact rose in line with expectations given the aging curve. From 2012 on, the team strongly outperformed expectations when he played, and hence his PeakGI rose consistently until it reached a level of about 190, where match outcomes and Goalimpact were in agreement.

Back to the Twitter question of why Walcott is rated that highly: because the team consistently outperformed prior (already very high) expectations with him on the field.

Cristiano Ronaldo

"Cristiano Ronaldo is only 21st? Lol, he would walk into any team of the world" was a typical comment. We understand the criticism because he is one of the best players in the world (also according to Goalimpact) and many see him as the best (as did Goalimpact before the change of the algorithm). So why did he drop in the new algorithm?


He actually didn't drop in the new algorithm; at Real Madrid he seamlessly maintained the world-class level he reached during his time at Manchester United. Real's performance with him on the field was fully in line with these very high expectations and thus there was no need to raise or lower the rating.

"Yes, ok, but he was rated higher in the old version". True. The reason is that Goalimpact adjusts expectations for all players on the field. The new version rates Karim Benzema higher than the old one did. This, in turn, raises the outcome that all his team mates need in order to increase their respective scores. Hence, with Benzema rated higher, his team mates, including Ronaldo, were rated lower. That said, Ronaldo is rated as absolute world-class and would, also according to Goalimpact, walk easily into any team of the world.

Karim Benzema

So? And why is Benzema that good?


We can't answer that. Goalimpact just measures how well a team plays with the player on the field. It can't tell why a player is adding to the team's success, because it does not even look at what he is doing. It just relates the team's success to the player being present on the field. If we define a "good player" as a player who makes his team have a very good goal difference, then Benzema is extraordinarily good.


Top50 Football Players - April 2015 Edition

It is nearly a year since we published the last "Top50" list, and hence we should expect many changes. But as we also just updated the algorithm, there are even more changes and the list is a bit difficult to digest. Here are the main findings so far:

  • Most players on this list were already rated high one year ago in the previous version of the algorithm. On rank 17, Georginio Wijnaldum is the first player who wasn't in the Top100 before.
  • Compared to the old algorithm, the players now seem to be spread more widely over leagues and teams. Previously, the list was very much focused on the big teams.
  • In the old list, there were a few players with exceptionally high Goalimpact. The new list doesn't have skill gaps apart from the one that separates Thomas Müller from the rest of the world. This results in many players having a higher Goalimpact in this list and only a few having a lower one. Overall, the rating scale didn't change. The average is still 100.
  • Despite the Austrian leagues having suffered a lot from the new league adjustment in the algorithm, two Red Bull Salzburg players made it onto this list. Three, if you count Kampl, who came from Salzburg.
  • Can somebody please buy Vladimir Stojković for 1M€ and let him play in a league where we can watch him to assess whether he is still really that good?
  • Former Top50 players that dropped out of the Top100. That is partly due to the change in algorithm, partly due to age or lower performance:
    • Philipp Lahm (156.94)
    • Xabi Alonso (156.52)
    • Vincent Kompany (154.17)
    • Gianluigi Buffon (153.86)
    • Scott Brown (152.23)
    • John Terry (150.84)
    • Patrice Evra (145.12)
    • Wesley Sneijder (142.09)
    • Ashley Cole (141.81)

We're curious to hear your thoughts.

Rank | Player | Team | Goalimpact | Age | PeakGI | Nationality | Previous Rank | GI Diff
1 | Thomas Müller | Bayern München | 217.30 | 25.58 | 218.49 | Germany | 8 | +44
2 | Mesut Özil | Arsenal FC | 201.95 | 26.50 | 202.86 | Germany | 15 | +38
3 | Karim Benzema | Real Madrid | 199.45 | 27.25 | 201.99 | France | 18 | +41
4 | Lionel Messi | FC Barcelona | 199.33 | 27.75 | 202.83 | Argentina | 4 | +17
5 | Robert Lewandowski | Bayern München | 198.36 | 26.58 | 199.57 | Poland | 37 | +53
6 | Cesc Fàbregas | Chelsea FC | 196.06 | 27.92 | 199.83 | Spain | 5 | +17
7 | Theo Walcott | Arsenal FC | 192.75 | 26.08 | 192.83 | England | 57 | +52
8 | Manuel Neuer | Bayern München | 191.69 | 29.00 | 192.79 | Germany | 11 | +22
9 | Luiz Gustavo | VfL Wolfsburg | 190.94 | 27.67 | 194.28 | Brazil | 55 | +49
10 | Marcelo | Real Madrid | 189.24 | 26.92 | 190.99 | Brazil | 29 | +39
11 | Busquets | FC Barcelona | 187.84 | 26.67 | 189.24 | Spain | 26 | +35
12 | Fraser Forster | Southampton FC | 185.76 | 27.00 | 187.97 | England | 65 | +46
13 | Pedro | FC Barcelona | 185.10 | 27.67 | 188.41 | Spain | 49 | +42
14 | Mario Götze | Bayern München | 183.74 | 22.83 | 193.00 | Germany | 36 | +37
15 | Toby Alderweireld | Southampton FC | 183.06 | 26.08 | 183.22 | Belgium | 54 | +41
16 | Piqué | FC Barcelona | 182.54 | 28.17 | 186.72 | Spain | 22 | +27
17 | Georginio Wijnaldum | PSV Eindhoven | 182.11 | 24.42 | 186.33 | Netherlands | 185 | +53
18 | Kevin Kampl | Borussia Dortmund | 182.06 | 24.50 | 186.04 | Slovenia | 70 | +43
19 | Stefan Ilsanker | RB Salzburg | 181.49 | 25.83 | 181.83 | Austria | 106 | +47
20 | Jeremain Lens | Dinamo Kiev | 181.21 | 27.33 | 183.88 | Netherlands | 66 | +41
21 | Cristiano Ronaldo | Real Madrid | 180.74 | 30.17 | 188.42 | Portugal | 1 | -16
22 | Gaël Clichy | Manchester City | 180.69 | 29.67 | 187.05 | France | 21 | +25
23 | Andriy Pyatov | Shakhtar Donetsk | 179.64 | 30.75 | 179.77 | Ukraine | 69 | +40
24 | Antonio Valencia | Manchester United | 178.31 | 29.67 | 184.63 | Ecuador | 310 | +55
25 | Wayne Rooney | Manchester United | 178.19 | 29.42 | 184.20 | England | 9 | +6
26 | Mats Hummels | Borussia Dortmund | 178.15 | 26.25 | 178.73 | Germany | 23 | +23
27 | Andreas Ulmer | RB Salzburg | 178.01 | 29.42 | 183.99 | Austria | 61 | +37
28 | Ángel Di María | Manchester United | 177.48 | 27.17 | 179.70 | Argentina | 35 | +30
29 | Bastian Schweinsteiger | Bayern München | 177.41 | 30.67 | 188.08 | Germany | 3 | -6
30 | Eden Hazard | Chelsea FC | 177.36 | 24.25 | 181.99 | Belgium | 122 | +45
31 | Arturo Vidal | Juventus | 177.05 | 27.83 | 180.73 | Chile | 48 | +34
32 | Nani | Sporting CP | 176.99 | 28.33 | 181.47 | Portugal | 117 | +44
33 | Arjen Robben | Bayern München | 176.72 | 31.17 | 190.43 | Netherlands | 19 | +19
34 | Petr Čech | Chelsea FC | 175.97 | 32.83 | 179.86 | Czech Republic | 17 | +17
35 | Daniel Agger | Bröndby IF | 175.88 | 30.33 | 184.44 | Denmark | 182 | +47
36 | Gregory van der Wiel | Paris Saint-Germain | 175.85 | 27.17 | 178.14 | Netherlands | 27 | +24
37 | Jonny Evans | Manchester United | 175.51 | 27.25 | 177.97 | Northern Ireland | 213 | +48
38 | Douglas | Dinamo Moskva | 175.01 | 27.25 | 177.42 | Netherlands | 125 | +43
39 | Toni Kroos | Real Madrid | 174.34 | 25.25 | 176.33 | Germany | 68 | +35
40 | Salomon Kalou | Hertha BSC | 172.43 | 29.67 | 178.75 | Ivory Coast | 50 | +30
41 | Joe Hart | Manchester City | 172.10 | 27.92 | 173.80 | England | 132 | +40
42 | Vladimir Stojković | Maccabi Haifa | 171.99 | 31.67 | 173.41 | Serbia | 283 | +48
43 | Willian | Chelsea FC | 171.32 | 26.67 | 172.58 | Brazil | 58 | +30
44 | Kwadwo Asamoah | Juventus | 170.90 | 26.33 | 171.51 | Ghana | 139 | +40
45 | Ezequiel Garay | Zenit St. Petersburg | 170.88 | 28.50 | 175.51 | Argentina | 124 | +39
46 | Alexis Sánchez | Arsenal FC | 170.69 | 26.25 | 171.24 | Chile | 75 | +33
47 | Alex Song | West Ham United | 170.69 | 27.58 | 173.77 | Cameroon | 64 | +30
48 | Jérôme Boateng | Bayern München | 169.29 | 26.58 | 170.43 | Germany | 33 | +20
49 | Danijel Subašić | AS Monaco | 169.06 | 30.42 | 169.38 | Croatia | 340 | +47
50 | Javier Mascherano | FC Barcelona | 168.94 | 30.83 | 180.47 | Argentina | 14 | +4


Georginio Wijnaldum picked up in summer 2011 when he moved to PSV. Since then, it has been a continuous increase to world-class level. Not only in Goalimpact terms, but also shown by a third place in the World Cup and this year's victory in the Dutch league.