What are Expected Goals (xG)?
Expected Goals, or xG, is the number of goals a player or team would be expected to score given the number and type of chances they had in a match. It is a way of using statistics to give an objective view on common commentaries such as: “He shouldn't miss that!”, “He's got to score those chances!” or “He should have had a hat-trick!”
Goals in football are rare events, with just over 2.5 goals scored on average per game. Therefore, the historical number of goals does not provide a large enough sample to predict the outcome of a match. This means that shots on target and total number of shots are now being used as the next closest stats to predict number of goals. However, not all shots have the same likelihood of ending up in the back of the net.
This is where xG comes into play. Expected Goals uses various characteristics of the shots being taken together with historical data of such types of shots to predict the likelihood of a specific shot being scored. Since xG is simply an averaged probability of a shot being scored, a team or player may outperform or underperform their xG value. This means that they could be scoring chances that the average player would miss or that they could be missing chances that are often scored.
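The idea of an "averaged probability" can be sketched with a toy calculation. The example below assumes the simplest possible model, in which a shot's xG is just the share of past shots of the same type that were scored. The shot types and the historical records are made up for illustration and are not Opta data.

```python
from collections import defaultdict

# Hypothetical historical shots as (shot_type, was_goal) pairs.
# These records are invented for illustration only.
HISTORICAL_SHOTS = (
    [("penalty", True)] * 3 + [("penalty", False)] * 1
    + [("inside_box", True)] * 1 + [("inside_box", False)] * 4
    + [("outside_box", True)] * 1 + [("outside_box", False)] * 9
)


def xg_by_shot_type(shots):
    """Return, for each shot type, the fraction of past shots that were scored."""
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for shot_type, was_goal in shots:
        attempts[shot_type] += 1
        goals[shot_type] += int(was_goal)
    return {shot_type: goals[shot_type] / attempts[shot_type] for shot_type in attempts}


xg_table = xg_by_shot_type(HISTORICAL_SHOTS)
for shot_type, xg in xg_table.items():
    print(f"{shot_type}: {xg:.2f} xG per shot")
```

A real model would use far more shots and far more characteristics per shot, but the principle is the same: a new shot is given the scoring rate of similar shots from the past.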
xG is often used to analyse various scenarios:
Predict the score of an upcoming match using historical data of the teams involved.
Assess a team’s or player’s “true” performance over a match or season, regardless of their short-term form or one-off actions on the pitch. It provides a data point on the number and quality of chances being created regardless of the final result.
Identify performing players in underperforming teams, or those who receive fewer playing minutes, by assessing which ones are more effective than the quality of the chances they receive would suggest.
Understand the defensive performance of a team by assessing how effectively they are preventing the opposing team from scoring their chances.
Origin of the Expected Goals Model
In April 2012, Sam Green, an advanced data analyst at sports statistics company Opta, first explained his innovative approach to assessing the performance of Premier League goalscorers, inspired by similar models used in American sports. However, it was not until the beginning of the 2017/18 season, when the BBC’s Match of the Day pundits began using xG in their analysis, that it became a focal topic of conversation for many football fans.
Over the years, Opta has collected numerous data points of in-game actions in all of the top football leagues. When creating the xG model, Sam Green and the Opta team used Opta’s on-ball event data to analyse more than 300,000 shots and a number of different variables, such as the angle of the shot, assist type, shot location, the in-game situation, the proximity of opposition defenders and distance from goal. They were then able to assign an xG value, usually as a percentage, to every goal attempt and determine how good a particular type of chance is. As new matches are played, new data is collected to continuously refine the xG model.
There is no single model for calculating xG. When looking at xG, it is important to remember that the value depends on the factors that the analyst building the model chooses to incorporate in the calculations. Since its release to the public, the xG concept has attracted considerable attention in the analytics community, with many enthusiasts adjusting the model in their own ways in an attempt to perfect it. This means there are now several different xG models out there, each of them considering different factors. Some consider whether a shot was taken with the foot or the head, others consider the situation that led to the shot, and so on, but the final predictions have been shown to vary only slightly across different models.
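To make the idea of "factors chosen by the analyst" concrete, here is a minimal sketch of one way shot features could be turned into a probability. It assumes a logistic function over three features mentioned above (distance from goal, shot angle, foot or head). The functional form and every coefficient are hypothetical, chosen only for illustration, and are not taken from Opta's or any published model.

```python
import math

# Hypothetical coefficients, NOT fitted to real data.
INTERCEPT = 0.5
COEF_DISTANCE_M = -0.15  # each extra metre from goal lowers the chance
COEF_ANGLE_DEG = 0.02    # a wider view of the goal raises the chance
COEF_HEADER = -0.8       # headers are assumed harder to score than shots with the foot


def shot_xg(distance_m, angle_deg, is_header):
    """Return an illustrative scoring probability between 0 and 1 for one shot."""
    z = (
        INTERCEPT
        + COEF_DISTANCE_M * distance_m
        + COEF_ANGLE_DEG * angle_deg
        + COEF_HEADER * int(is_header)
    )
    return 1.0 / (1.0 + math.exp(-z))


close_range = shot_xg(distance_m=8, angle_deg=40, is_header=False)
long_range = shot_xg(distance_m=25, angle_deg=15, is_header=False)
close_header = shot_xg(distance_m=8, angle_deg=40, is_header=True)
print(f"close range: {close_range:.3f}")
print(f"long range:  {long_range:.3f}")
print(f"header:      {close_header:.3f}")
```

An analyst who adds or removes a feature, or fits the coefficients on a different set of shots, ends up with a different xG model, which is why several variants coexist.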
How is xG calculated?
Opta’s xG model is based on the fact that the most basic requirement to score goals is to take shots. However, not all strikers need the same number of shots to score. As Sam Green identified, in the 2011/12 season Van Persie needed only 5.4 shots to score a goal, while Luis Suarez took 13.8 shots for each goal he scored. Yet both took the same number of shots per game played.
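The shots-per-goal figures above can be turned into conversion rates with one line of arithmetic. The short sketch below does only that; the variable names are illustrative.

```python
# Shots needed per goal in 2011/12, as quoted in the text.
SHOTS_PER_GOAL = {"Van Persie": 5.4, "Luis Suarez": 13.8}

# Conversion rate is the inverse of shots per goal.
conversion_rate = {player: 1 / shots for player, shots in SHOTS_PER_GOAL.items()}

for player, rate in conversion_rate.items():
    print(f"{player}: {rate:.1%} of shots scored")
```

This works out at roughly 18.5% for Van Persie and 7.2% for Suarez, a gap that shot volume alone cannot explain.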
This is why Opta decided to look deeper into the quality of chances each striker received by adding the average location from which each shot was taken. However, they soon realized that location on its own was not enough. A chance from the penalty spot could be a penalty kick, a header from a corner or a 1 on 1 against the goalkeeper, each with a very different likelihood of ending up as a goal. That is why Opta decided to incorporate additional data points into the model. Unfortunately, the exact model with all the factors considered by Opta has not been made public, but a number of analysts have attempted to replicate or improve the model since its first release.
The xG model was designed to return an xG value for each player, team or chance, depending on the dimension in which the data is being analysed: a full season, a particular match, a specific half in a game or a group of goal attempts. Let’s say a player like Harry Kane takes 100 shots from chances that, based on historical Premier League data, have an average probability of being scored of 0.202 (or 20.2%). Kane's xG value would be about 20 expected goals (100 shots x 0.202 = 20.2). This xG number would be made up of some ‘big scoring chances’ Kane took, such as penalties with 0.783xG, other non-penalty shots inside the box with varying xG values such as 0.387xG, and maybe even shots from outside the box with a 0.036xG value. The model attempts to balance the number of shots a player takes with the quality of these chances. For example, a player may get himself into very dangerous attacking positions inside the box on 23 occasions with a high xG value and score the same number of goals as a player who continuously tries his luck from outside the box with 81 shot attempts that have a lower xG value.
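The aggregation described above is simply a sum of per-shot probabilities. The sketch below reproduces the 100-shot example and then sums a small mixed set of shots. The per-shot values come from the text, but the particular mix (one penalty, three shots inside the box, ten from outside) is a made-up illustration.

```python
import math


def total_xg(shot_xg_values):
    """Sum per-shot scoring probabilities into an expected goals total."""
    return math.fsum(shot_xg_values)


# 100 shots with an average scoring probability of 0.202, as in the text.
season_xg = total_xg([0.202] * 100)

# Hypothetical mix of chances: 1 penalty, 3 shots inside the box, 10 from outside.
mixed_shots = [0.783] + [0.387] * 3 + [0.036] * 10
mixed_xg = total_xg(mixed_shots)

print(f"season xG: {season_xg:.1f}")
print(f"mixed xG:  {mixed_xg:.3f}")
```

In the mixed example, the single penalty contributes more expected goals than the ten long-range efforts combined, which is the balance between volume and quality that the model is meant to capture.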
Once an xG value has been calculated, a player’s or team’s performance can be evaluated on whether they are over- or under-performing that value. In the above example, Harry Kane may actually score 25 goals during the full season, 5 goals above his 20 xG value, suggesting that his ability to convert chances is above average and that he can find the net in difficult scoring situations. Similarly, a player with a 20 xG value who has scored 15 goals is probably missing chances that he should have scored.
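The comparison between actual goals and xG is a plain subtraction. A minimal sketch, using the two cases from the paragraph above (the function names are illustrative):

```python
def goals_minus_xg(goals, xg):
    """Positive means more goals than expected, negative means fewer."""
    return goals - xg


def describe(goals, xg):
    """Label a player or team relative to their xG value."""
    difference = goals_minus_xg(goals, xg)
    if difference > 0:
        return "over-performing"
    if difference < 0:
        return "under-performing"
    return "in line with xG"


print(goals_minus_xg(25, 20), describe(25, 20))
print(goals_minus_xg(15, 20), describe(15, 20))
```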
Opta took xG a step further and assessed the impact a player had on a specific chance through the quality of his shot. They did so by factoring into the xG calculation how likely the player’s shots were to hit the target, and then comparing the original xG(Overall) value against this new xG(On Target) one. Their analysis showed that, at the time, Van der Vaart’s shooting saw his value increase from 6.9xG to 10.3xG(On Target), suggesting that the shots he took were of higher quality than the average expected from those chances before the shot was taken. xG(OT), when compared to actual goals, may also indicate how much a player was affected by the quality of goalkeeping he had to face. In the same season, Mikel Arteta scored 7 goals from just 3.5xG(OT), suggesting he got ‘luckier’ in front of goal, as his shooting quality should have given him only just over 3 goals.
xG(OT) can be used to assess goalkeeping quality when used in reverse. Since it only takes into consideration shots on target, a keeper’s involvement in these sorts of chances is crucial to the final outcome of the play. De Gea conceding 22 goals from a 27xG(OT) suggests that he has made saves in situations where goals are normally conceded.
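The three comparisons involving xG(OT) can be written as simple differences. The sketch below uses the figures quoted in the text. The function names are illustrative labels, not Opta terminology.

```python
def shooting_uplift(xg_overall, xg_on_target):
    """How much a player's shooting added to the pre-shot quality of his chances."""
    return xg_on_target - xg_overall


def goals_above_xgot(goals, xg_on_target):
    """Goals scored beyond what the quality of the on-target shots suggested."""
    return goals - xg_on_target


def goals_prevented(xgot_faced, goals_conceded):
    """Goalkeeper view: expected goals from shots on target minus goals let in."""
    return xgot_faced - goals_conceded


van_der_vaart = round(shooting_uplift(6.9, 10.3), 1)
arteta = round(goals_above_xgot(7, 3.5), 1)
de_gea = goals_prevented(27, 22)

print(f"Van der Vaart shooting uplift: {van_der_vaart}")
print(f"Arteta goals above xG(OT): {arteta}")
print(f"De Gea goals prevented: {de_gea}")
```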
Why are Expected Goals important in today's football?
Luck and randomness influence results in football more than in any other sport. We have all seen teams be dominated throughout a match and still manage to score a last-minute winning goal despite having fewer chances than their opposition. But how sustainable is that? We have also seen world-class strikers lose form and go a few games without finding the back of the net. Is the player not taking advantage of the chances being provided by his teammates? xG allows us to assess the process rather than the results of a match, or the performance of a player or team, by rating the quality of chances instead of the actual outcome.
The most commonly used example to explain xG’s usefulness is Juventus’ 2015/16 season. Juventus won only 3 of their first 10 games, but the difference between their actual goals and their xG was considerable. This meant that they had the chances but were not converting them, suggesting that their negative run of results might not last if they got a bit luckier in front of goal. Sacking manager Massimiliano Allegri could have been a mistake, since after match day 12 their luck changed and they ended up winning the league title with 9 games to spare.
xG gives us a more accurate way of predicting match outcomes than simply using individual stats. In the Premier League, only 71.6% of teams that had the most shots won the fixture, while close to 81% of teams that obtained a higher xG won their games. It challenges historical assumptions that popular tradition in football has created and provides a statistically grounded argument as to whether the performance of a player or team is above or below the average, given a number of historical data points.
When using expected goals to see which players are scoring more or less than the numbers suggest they should, teams can scout promising goalscorers who consistently score more goals than the quality of their chances would suggest. On the other hand, if a player surpasses his expected goals for a few games but has no history of doing so in the past, it might come down to form and luck rather than goalscoring talent, and he might struggle to sustain that over a long period of time.
Limitations of the Expected Goals model
The xG model is only as good as the factors being input into its calculations. These data inputs are limited by the data available today from companies such as Opta. Other factors, such as shot power, curl or dip on the shot, or whether the goalkeeper is unsighted or off balance, might not be considered in most xG models out there. Because the model is based on averages, the random nature of a football match and the rarity of goals in the sport make it almost impossible to account, with enough statistical significance, for all the factors that can cause a goal to be scored. xG should be used as indicative and supportive information for decision-making and forming opinions rather than as a definitive answer on the performance of a team or player.
As the model’s creator Sam Green puts it: “a system like this will also fail to predict a high scoring game. Since it is based on averages and with around half of matches featuring fewer than 2.5 goals, this is to be expected”. We also need to consider that a shot taken by a Manchester United striker should have a higher xG than one taken by a Stoke City player, suggesting that on average Man Utd would outperform their xG on a chance by chance basis while Stoke City would underperform it if the xG is calculated using averages from all English teams' shot history.
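Sam Green's point about averages and low-scoring games can be checked with a rough calculation. The sketch below assumes that the total number of goals in a match follows a Poisson distribution with a mean of 2.5, close to the average quoted earlier. The Poisson assumption is introduced here for illustration and is not part of Green's statement.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def prob_fewer_than(threshold_goals, mean):
    """Probability that total goals fall below the threshold (e.g. 2.5)."""
    return sum(poisson_pmf(k, mean) for k in range(math.ceil(threshold_goals)))


MEAN_GOALS_PER_MATCH = 2.5  # assumption based on the average quoted in the text

p_under = prob_fewer_than(2.5, MEAN_GOALS_PER_MATCH)
p_five_or_more = 1 - prob_fewer_than(5, MEAN_GOALS_PER_MATCH)

print(f"P(fewer than 2.5 goals): {p_under:.1%}")
print(f"P(five or more goals):   {p_five_or_more:.1%}")
```

Under these assumptions, a little over half of matches finish with two goals or fewer, which is consistent with the "around half" in the quote, and high-scoring games sit in the thin tail that an average-based model will rarely predict.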
Criticism and the Future of xG models
The recent misuse of Expected Goals as an analysis metric during pundit commentary has attracted considerable criticism. A team may score one or two difficult chances early in a game and sit back for the remainder of the 90 minutes, allowing their opponents to take many shots from different positions, thus increasing the opponents’ xG. One could then claim that the losing team achieved a higher xG and therefore deserved the win. This is why xG should always be considered alongside additional context of the game before reaching a verdict. Statistics can only tell us what happened in a game, but a wider view is necessary to show how it happened and give a clearer idea of what is yet to come. Certain in-game actions by players cannot be measured with a statistical model today, such as the ability of a defender to get in front of a shot attempt despite never touching the ball.
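Why a higher xG does not guarantee a win can be shown with a small calculation. The sketch below assumes that each shot is an independent event that is scored with probability equal to its xG, and then computes the exact chance of each result. The independence assumption and the two shot lists are hypothetical simplifications for illustration.

```python
def goal_distribution(shot_xgs):
    """Return [P(0 goals), P(1 goal), ...] for a list of independent shots."""
    distribution = [1.0]
    for p in shot_xgs:
        updated = [0.0] * (len(distribution) + 1)
        for goals, prob in enumerate(distribution):
            updated[goals] += prob * (1 - p)   # this shot is missed
            updated[goals + 1] += prob * p     # this shot is scored
        distribution = updated
    return distribution


def outcome_probabilities(team_a_xgs, team_b_xgs):
    """Return (P(A wins), P(draw), P(B wins)) from the two goal distributions."""
    dist_a = goal_distribution(team_a_xgs)
    dist_b = goal_distribution(team_b_xgs)
    a_wins = draw = b_wins = 0.0
    for goals_a, prob_a in enumerate(dist_a):
        for goals_b, prob_b in enumerate(dist_b):
            joint = prob_a * prob_b
            if goals_a > goals_b:
                a_wins += joint
            elif goals_a == goals_b:
                draw += joint
            else:
                b_wins += joint
    return a_wins, draw, b_wins


# Hypothetical match: team A has two good chances (0.6 xG in total),
# team B has twelve low-quality shots (1.2 xG in total).
team_a = [0.3, 0.3]
team_b = [0.1] * 12
a_wins, draw, b_wins = outcome_probabilities(team_a, team_b)
print(f"A wins: {a_wins:.1%}, draw: {draw:.1%}, B wins: {b_wins:.1%}")
```

In this made-up match, team A wins roughly 18% of the time despite having half the xG of team B, so a single result that goes against the xG totals is not evidence that the metric is wrong.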
There is also strong resistance from the football community to the use of data. Football is a traditional and emotional sport by nature, with experience and accepted wisdom dominating people’s opinions. Most fans see the use of statistics as intrusive and as a challenge to their popular and historic knowledge of “the beautiful game”. After watching their team lose, most of them are not interested in listening to television pundits discuss how their team performed against its expected goals. Despite analytics having plenty to offer football performance analysis, there are still doubters. xG’s debut on Match of the Day shook social media, with instant mentions of “stat nerds” and claims that the numbers in football are “pointless” and “bollocks”. However, Opta has made it clear that xG is not intended to ever replace scouts and pundits but simply to aid them in their analysis of a game.
Despite the reluctance of some pundits and football fans to accept this new era of football analysis, Opta and various sports analysts continue to evolve the use of statistics to analyse performance in numerous areas of football. Models such as xG are the first round of statistical systems and will soon be followed by others such as Defensive Coverage, which will assess tackles, blocks, interceptions, man-marking and clearances. Football’s data revolution has started and will continue to see developments every season.