Leading on from “Expected Goals, Part One”, I would like to touch on the pros and cons of xG and its use in FPL. Hopefully it prompts some discussion within the community and helps give a better perspective in some areas.
xG as a predictive model
Taking all of the information from Part One, what we get is a method of showing the sum total of shots and their quality in a given period (one match, series of matches, etc). For example, if a team had an xG rating of 1.89 from all of their shots in a particular football match but only scored one goal, we can conclude that they underperformed in terms of their xG.
If the xG showed 1.02 instead, then we can conclude that the one goal was a fair reflection of the chances they were presented with. If they scored one goal from a value of 0.25, then we can conclude that they overperformed in relation to their chance(s). So can we take these findings from the xG calculator and use them to try and predict future outcomes? In essence, no. We cannot use an xG calculator as a predictive tool on its own, and here is why.
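The three cases above come down to simple subtraction. Here is a minimal sketch in Python; the function name `xg_difference` is my own and not part of any xG tool, and the inputs are the example figures from the text.

```python
def xg_difference(goals: int, xg: float) -> float:
    """Goals scored minus expected goals.

    A negative result means the team underperformed its xG,
    a positive result means it overperformed.
    """
    return goals - xg


# One goal scored against each of the three xG values from the text.
for xg in (1.89, 1.02, 0.25):
    diff = xg_difference(1, xg)
    print(f"xG {xg:.2f}, 1 goal -> difference {diff:+.2f}")
```

How close to zero counts as a "fair representation" is a judgement call; the calculator itself does not set that threshold.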
Many have tried to treat xG as a predictive model by showing correlations between Expected Goals per 90 minutes (xG/90) and Actual Goals per 90 minutes (aG/90), often using small sample sizes in their findings. They typically fit a linear regression and judge how well it fits using a statistic called R-squared, which runs from 0 (xG explains none of the variation in actual goals) to 1 (xG explains all of it).
The samples used in many xG regression models are so small (maybe 10-60 data points) that, once again, the results can be highly inaccurate, from a statistical point of view anyway. So what does this mean? Can we use xG as a predictive model at all? Well, not without first looking at factors outside of the realms of xG.
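To see why small samples are a problem, here is a rough simulation sketch in Python. Everything in it is an assumption for illustration: the data is randomly generated rather than real player data, the relationship between xG/90 and aG/90 is invented, and the helper names `r_squared` and `r_squared_spread` are my own. It repeatedly draws samples of a given size and reports how much R-squared moves around from one sample to the next.

```python
import random


def r_squared(xs, ys):
    """R-squared of a simple linear fit of ys on xs (the squared correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def r_squared_spread(sample_size, trials=200, seed=1):
    """Lowest and highest R-squared across repeated simulated samples."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        # Hypothetical players: xG/90 somewhere between 0.1 and 0.8.
        xg90 = [rng.uniform(0.1, 0.8) for _ in range(sample_size)]
        # Assumed relationship: actual goals track xG plus random noise.
        ag90 = [x + rng.gauss(0, 0.15) for x in xg90]
        results.append(r_squared(xg90, ag90))
    return min(results), max(results)


for size in (10, 60, 500):
    low, high = r_squared_spread(size)
    print(f"n={size}: R-squared ranged from {low:.2f} to {high:.2f}")
```

The underlying relationship is identical in every run, so any difference in R-squared between samples of the same size is down to sample size and chance alone.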
The xG calculator can only portray what is happening. In isolation, it is not a predictive tool. The calculator is not saying “Liverpool scored one goal from an xG of 4.5, hence they underperformed and will correct this next week by scoring four goals”. These kinds of assumptions are made by FPL managers, not by xG calculators. If we really wish to try and use xG as a predictive tool, then we need to look beyond the xG calculator at a wider range of factors in play.
One particular factor to examine is the nature of the shots. How many shots were there in a 1-0 game with a rating of 0.90 xG? Perhaps there were 10 mediocre shots with an xG of 0.09 each and one happened to go in. Perhaps there were two close-range chances of 0.45 each and one went in. Context and clarity are essential when trying to work out a predictive model, otherwise it becomes very misleading and problematic for those of us who wish to apply it to FPL (particularly when picking players).
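The two scenarios above add up to the same 0.90 xG, yet they are not equivalent. As a rough sketch in Python, and assuming each shot is independent of the others (a simplification, and the function name `prob_at_least_one_goal` is my own), we can compare the chance of scoring at least once in each case.

```python
def prob_at_least_one_goal(shot_xgs):
    """Chance of scoring one or more goals, treating every shot as independent."""
    prob_none = 1.0
    for xg in shot_xgs:
        prob_none *= (1.0 - xg)  # chance that this shot is also missed
    return 1.0 - prob_none


ten_mediocre = [0.09] * 10     # ten shots worth 0.09 xG each
two_close_range = [0.45] * 2   # two chances worth 0.45 xG each

for label, shots in (("10 x 0.09", ten_mediocre), ("2 x 0.45", two_close_range)):
    total = sum(shots)
    chance = prob_at_least_one_goal(shots)
    print(f"{label}: total xG {total:.2f}, chance of 1+ goals {chance:.2f}")
# 10 x 0.09 gives roughly 0.61, 2 x 0.45 gives roughly 0.70
```

Same total xG, different likelihood of actually getting on the scoresheet, which is exactly the sort of detail a single match total hides.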
There are many other factors at play if we perform some meta-analysis of an xG graph. A player may be overperforming in terms of xG. For example, Harry Kane may have 30 goals in a season when his xG says he should have 25 goals based on the quality of his shots. Is this good or bad? Well, it may be a case of a high conversion rate that didn’t regress to an expected mean. This is where the metric of ‘shots on target’ comes into play to help us get a better understanding of individual players’ performance and conversion rates. We can look at a particular player’s historical conversion rates and whether he tends to overperform, underperform or return close to the mean. Using that data in conjunction with what the xG calculator is telling us gives us a better idea of whether his current output is sustainable.
Think of it like this. We mentioned earlier that a free kick has a conversion rate between 0.05 and 0.06 (5-6%) on an xG calculator. Let us assume a 30-yard free kick yields a 5% conversion rate, or a 1 in 20 chance. Yet, over the long term, I would expect Lionel Messi to perform better on 100 of these free kicks than “John” from Sunday morning 5-a-side. So player quality has a huge impact, both for the person taking the shot and for the person between the posts attempting to save it. The xG calculator can only do its job and give us the figures based on the information it receives. If this computer cannot be used as a predictive model, then we must consult another computer to do so: the human brain.
Again, going past the realms of xG, other factors need to be taken into account, including injuries, changes in personnel, additional competitions, changes in morale and rotation. An xG calculator tells us what has happened in a match in terms of shot quality and the chance of converting those shots. If Man City produce an xG of 4.2 in a given game, and a key player like Kevin de Bruyne picks up an injury midweek, history suggests it will likely have a huge impact on the number of chances created in the next game. If we wish to attempt to use xG as part of a predictive model in FPL, we need to go beyond isolated xG and look at external variables that the calculator cannot (and should not have to) take into account.
For example, if we look at a team over their last six games and notice they had a high xG but low conversion rate during a tough run of fixtures, can we expect more goals in the future when their schedule improves? This is not a question that the xG calculator can answer, hence we cannot use it to predict what might happen. The xG calculator tells us what was happening without the context of 1) the playing style during this period and 2) the opposition. What we need to do as FPL managers is to go beyond this and look at the reasons why something happened or did not happen as depicted by the xG graph.
Perhaps the team played a counter-attacking game against bigger sides and suffered when playing more openly against teams of a similar stature, e.g. Wolverhampton last season. Or vice versa. As mentioned in the first paragraph of this section, we need to know a variety of other factors that make up the context of the game. Did a defender slip and allow a huge chance that was saved? Is it likely an error of this calibre happens again? Was there a goal that was wrongly given due to an undetected offside? Were there countless shots from outside the box due to a stubborn, deep-lying defence? Perhaps one side were winning 3-0 and sat back for the end of the game, allowing the trailing team a run of chances for a period? Context is vital.
The problem with rebound goals
Somewhat linked to the point above, a fellow FPL-enthusiast (Joe Greenwood, aka Scoredlario) raised an issue that is a stumbling block for the xG calculator: rebound goals.
As a side note, I discussed this next section with Will Timbers, a.k.a. TopMarx, and had a great conversation that revealed a lot about xG in general to me. What I would like to say is that this next part depends on what you define xG as. If you view “Expected Goals” as a means of telling the tale of a football match and how many goals could have realistically been scored, then you might agree with me. If, however, you view it as a means of mapping the shot quality of all of the shots in a football match, then you will probably side with TopMarx.
So, let us take a scenario where a player is one on one with the goalkeeper. The attacker shoots from a few yards out and the shot is saved by the keeper and parried away. Another attacker runs onto the rebound and scores. In essence, the calculator is measuring the xG from the first attempt (that was saved) and is combining it with the xG of the second attempt (that resulted in the goal). It treats both actions of that phase as separate shots, but does not take into account that the second shot would not exist if the first attempt had been converted.
Basically, the problem here is that we have a combined xG of greater than 1.0 from a single phase of play. The one on one attempt might be for example 0.6, and the rebound might be 0.8 (sum 1.4 xG). This is impossible in reality, because we can’t score more than one goal from a single phase of play. The rebounded shot fails to exist if the first attempt is converted in the first place. In this situation, perhaps calculating the odds of missing both chances would be a better way of looking at it, rather than adding the probability of scoring both chances onto an xG graph.
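To put numbers on that suggestion, here is a small Python sketch. It assumes each xG value is the chance of scoring that shot given that the shot is actually taken, so a rebound only counts if everything before it was missed. The function name `phase_goal_probability` is my own, and this is one possible way of handling a phase of play, not how any particular xG provider does it.

```python
def phase_goal_probability(shot_xgs):
    """Chance that a single phase of play ends in a goal.

    Each later shot in the list only exists if every earlier shot failed,
    so we work out the chance of missing them all and subtract it from 1.
    """
    prob_all_missed = 1.0
    for xg in shot_xgs:
        prob_all_missed *= (1.0 - xg)
    return 1.0 - prob_all_missed


phase = [0.6, 0.8]  # the one on one, then the rebound

print(f"Summed xG for the phase: {sum(phase):.2f}")                       # 1.40
print(f"Chance of a goal from the phase: {phase_goal_probability(phase):.2f}")  # 0.92
```

The chance of missing both is 0.4 x 0.2 = 0.08, so the phase is worth 0.92 of a goal rather than 1.4, and the figure can never exceed 1.0 however many rebounds there are.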
Let us expand a little more on the previous example. Imagine a scenario where a big chance is saved by a goalkeeper, and two rebounded saves occur. Or a defender manages to make a goal-line clearance after the initial save. On an xG chart, from that single phase, a team might be represented as having an xG of 2.0 or higher from those chances when the latter was dependent on the former. If the first shot is scored, the second attempt doesn’t exist, ergo the third attempt doesn’t exist. From the xG graph, the team’s expected goals is misrepresented by the data in this instance. It follows that we can’t say more than one goal should have been scored from that phase of play, only that the defending team were extremely lucky not to concede. Measuring rebounds as part of a metric like shots in the box is perfectly fine, but as part of an xG graph it is extremely problematic in my opinion. This is why we should always scrutinise and analyse the data further.
As we can see, xG is a tool by which we can see the quality of shots being taken in a match, and their probability of being scored. But as mentioned, we must first have a basic understanding of it and how it is calculated if we are to use it effectively. A very detailed picture of a car depicts the car, but it doesn’t explain to us how the car works. Similarly, xG shows us the detailed picture of what is happening in terms of shots and shot quality, but not necessarily the context of the match.
We as FPL managers have to go beyond this if we are to ever try and use xG as part of a predictive model. By incorporating the external variables into what we find from the xG tool, we might stand a chance of doing this.
This ‘meta-analysis’ ultimately allows us to make assumptions that we cannot otherwise make from using the xG tool in isolation. In doing so, it may give us an edge in FPL over the long term.