If you’re reading this article, you’ve probably already seen Jeff Stelling, host of Sky Sports’ Soccer Saturday, launching an attack on the new en vogue stat in football, ‘expected goals’.
Stelling, much to the pleasure of his panel guests, criticised Arsène Wenger’s reference to the stat after a match: “He’s the first person I’ve ever heard to take any notice of expected goals. It has to be the most useless stat in the history of football.”
If you haven’t watched the video, it’s a real treat:
Jeff Stelling shared his thoughts over the 'Expected Goals' statistic at the weekend pic.twitter.com/SP2MgMuuoQ
— Soccer AM (@SoccerAM) November 21, 2017
To be fair to Stelling, he’s done his job. The clip has been widely shared, derided, and applauded since, getting the show more attention than a balanced take might have. We can ignore the hypocrisy of Sky Sports’ own blog using xG, or their transfer window coverage using stats from Football Manager. Stelling, like Barry Glendenning in his support of him or Andy Dawson in his dismissal of all stats, is hardly trying to make a coherent argument.
In the aftermath, though, some legitimately interesting questions about xG have been raised in a less polarised fashion, by journalists and fans alike who have started to see some of the uses of xG while also having fair questions about its limitations.
It’s often claimed by people outside the analytics community that those within it are xG evangelists who can see no issues with it. The truth is that all of its proponents have problems with it, and can offer far more damaging criticisms than the banal devil’s advocacy of Old Men Shouting at Clouds.
To draw a more mainstream football analogue, who do you think knows more about the strengths and weaknesses of zonal marking: someone who vaguely understands it, or someone who has coached it?
So, let’s attempt to answer some of the most common questions about xG.
What’s the point?
The purpose of an expected goals model is to estimate the quality of a chance that a team or player gets, that is, the probability that a shot results in a goal. In its most basic form, it uses only the location of the shot. This lets us compare chances across players and teams, giving us a baseline expectation for how difficult each one was to score. That is nothing new to people who have heard of xG, and it isn’t a particularly satisfying answer on its own.
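To make the ‘most basic form’ concrete, here is a minimal sketch of a location-only model in Python. The coefficients are invented for illustration and are not fitted to any real shot data. The function names, and the choice of distance and goal angle as the two inputs, are my assumptions rather than how any particular provider builds its model.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not fitted to data).
INTERCEPT = -0.5
DISTANCE_COEF = -0.12  # effect per metre from the centre of the goal
ANGLE_COEF = 1.2       # effect per radian of goal mouth visible to the shooter

GOAL_WIDTH = 7.32  # metres, the width of a standard goal


def goal_angle(x, y):
    """Angle in radians that the goal mouth subtends from a shot at (x, y).

    x is the distance from the goal line and y the sideways offset from
    the centre of the goal, both in metres.
    """
    far_post = math.atan2(y + GOAL_WIDTH / 2, x)
    near_post = math.atan2(y - GOAL_WIDTH / 2, x)
    return far_post - near_post


def chance_quality(x, y):
    """Estimated probability of scoring, using shot location only."""
    distance = math.hypot(x, y)
    z = INTERCEPT + DISTANCE_COEF * distance + ANGLE_COEF * goal_angle(x, y)
    return 1 / (1 + math.exp(-z))  # logistic function maps z to a probability


central = chance_quality(11, 0)   # roughly the penalty spot
wide = chance_quality(25, 10)     # further out and off to one side
print(central > wide)
```

A real model would be fitted to thousands of historical shots, but the shape is the same: the closer and more central the shot, the higher the number that comes out.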
It’s what this allows us to do that is far more convincing: xG is better at predicting future goals than shots or actual goal numbers, which is especially useful when we are trying to figure out how good teams and players are at scoring goals.
So, xG isn’t there to tell us what should have happened in a match?
The truth is, the best use case for xG isn’t single matches. The beauty of football is often in its randomness — the team who has the better chances will in theory win the match more often, but not always.
Sure, it is useful to look at for a match, as it gives you more insight than shots, possession or other stats may do. No one who has created an xG model, though, will tell you that it perfectly summarises the story of a match.
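One way to see why a single match is a poor test of xG is to replay it many times. The sketch below treats every shot as an independent weighted coin flip, which is a simplifying assumption, and the shot values for both teams are made up.

```python
import random


def simulate_match(home_shots, away_shots, n_sims=20000, seed=0):
    """Replay a match n_sims times from each side's list of shot xG values.

    Returns the share of replays ending in a home win, a draw and an away win.
    Shots are assumed to be independent of one another.
    """
    rng = random.Random(seed)
    home_wins = draws = away_wins = 0
    for _ in range(n_sims):
        home_goals = sum(rng.random() < p for p in home_shots)
        away_goals = sum(rng.random() < p for p in away_shots)
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals == away_goals:
            draws += 1
        else:
            away_wins += 1
    return home_wins / n_sims, draws / n_sims, away_wins / n_sims


# Hypothetical match: the home side creates much the better chances.
home = [0.4, 0.3, 0.2, 0.1, 0.1, 0.05]  # 1.15 xG in total
away = [0.1, 0.08, 0.05, 0.05]          # 0.28 xG in total

home_win, draw, away_win = simulate_match(home, away)
print(home_win, draw, away_win)
```

The home side wins most of the replays, but the draw and away-win shares are not zero. That is the gap between ‘had the better chances’ and ‘won the match’.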
Okay, so why doesn’t xG account for who takes the shot?
This is a really good question, and one that leads us into some of the genuine insights about football that xG can provide.
The truth is, it’s not impossible to introduce the player, or their historical conversion rate, as a variable into an expected goals model. It does actually improve predictiveness, but only by a small amount.
In other words, shot location tells you a lot about whether a chance will be a goal, whereas the identity of the player taking it tells you a little bit.
Elite finishers nonetheless exist, and can most easily be identified as those who tend to over-achieve their xG for a sustained period of time. Introducing player identity makes us a little bit better at predicting a goal, but makes the chances less comparable and so is probably not worth it unless you are betting and need the extra predictiveness.
Wait, so expected goals isn’t about predicting as perfectly as possible? Doesn’t that make the name stupid?
A quirky paradox of expected goals models is that if they were perfect at predicting goals, they would be entirely useless: a model that called every shot correctly would simply reproduce the scoreline. We want comparability and repeatability, which is why it’s so important that xG is a better predictor of future goals than past goals are.
I personally think that, like Churchill on democracy, ‘expected goals’ is the worst name, apart from all the alternatives.
I’ve written before about how I prefer the term ‘chance quality’, but the issue is that this doesn’t scale from the singular to the plural easily. Which of these sounds more intuitive?
‘Ten shots of 0.5 expected goals equals a total of 5 expected goals.’
‘Ten shots of 0.5 chance quality equals a total of 5 chance quality.’
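The arithmetic behind the first sentence is just addition, but it is worth remembering that the total is an average, not a promise. A quick sketch, assuming the ten shots are independent of one another:

```python
from math import comb

shots = [0.5] * 10
total_xg = sum(shots)  # adding up the individual chances


def prob_goals(k, n=10, p=0.5):
    """Probability of scoring exactly k goals from n independent shots of quality p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


print(total_xg)                 # 5.0
print(round(prob_goals(5), 3))  # 0.246
```

So even with chances worth exactly 5 expected goals, a team would score exactly five only about a quarter of the time.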
So teams or players who underachieve their xG are just ‘unlucky’?
This question causes some disagreement in the analytics community. The errors of an expected goals model are not due to luck alone, by which we mean the streakiness of finishing, but also to factors the model does not account for. One example is the number of defenders between the shot and the goal, which is coded by some data providers such as Stratagem, but not by Opta.
In general, it is best to think of this in terms of likelihood. If a player or team is going through a five-game rough patch in terms of goals scored, but is getting good chances, they are more likely to be ‘unlucky’ than if this continues over a two-season period.
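A rough way to put numbers on that intuition is to ask how likely a given shortfall would be if finishing were purely down to chance. The sketch below builds the exact distribution of goals from a list of shot probabilities, assuming independent shots. The per-game chances and the goal tallies are hypothetical.

```python
def goal_distribution(shots):
    """Exact probability of each possible goal total from independent shots.

    shots is a list of per-shot scoring probabilities; the result is a list
    where entry k is the probability of scoring exactly k goals.
    """
    dist = [1.0]  # before any shot, zero goals is certain
    for p in shots:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)   # this shot is missed
            new[goals + 1] += prob * p     # this shot is scored
        dist = new
    return dist


def prob_at_most(shots, goals):
    """Probability of scoring no more than `goals` from these shots."""
    return sum(goal_distribution(shots)[: goals + 1])


# Hypothetical side creating 1.0 xG per game from the same six chances.
per_game = [0.35, 0.25, 0.15, 0.1, 0.1, 0.05]

five_games = per_game * 5     # 5 xG, and suppose they scored 3
two_seasons = per_game * 76   # 76 xG, and suppose they scored 46

print(prob_at_most(five_games, 3))
print(prob_at_most(two_seasons, 46))
```

Both runs convert at about 60% of xG, yet the first probability is far larger than the second. A five-game shortfall like this is entirely plausible as bad luck, while one sustained over two seasons suggests something the model is missing.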
Other factors, such as missing variables, or player and team psychology — ‘confidence’ is the pundits’ descriptor of choice, and something I have tried to test before — are harder to nail down.
Okay, I think I get it. Is xG as good as it gets?
While xG is the poster child of football analytics for the moment, there is, in my opinion, much more exciting stuff out there and in development, especially when it comes to tactics. For a taster, read this from Euan Dewar and Mark Thompson on Manchester City’s tactics under Pep Guardiola.
‘Expected goals’ can provide lots of insight, and is quite obviously the opposite of the ‘most useless stat in football’ (that title belongs to ‘total distance run’, by the way). It is strongest in the hands of writers, coaches, and pundits who understand its strengths and weaknesses. The reality is that no single stat can encapsulate football, and that is what makes it the best sport in the world.