Saturday, April 21st - It's the eighty-fourth minute of the sixth Clasico of the season, and Barcelona know they're dead. They're down 2-1 against arch-rivals Real Madrid at the Camp Nou, and need a win to get back into the title race. A win is not forthcoming, and they're getting desperate.
From a corner, Pedro passes back to centre back Javier Mascherano, who's standing just outside the centre circle. The former Liverpool man rumbles forwards, taking a couple of touches before spanking a shot over Iker Casillas' crossbar from all of forty yards. It would have been a spectacular goal had it gone in, but it was never, ever going in.
Wednesday, April 4th - Stamford Bridge is rocking. Chelsea hold a 2-0 aggregate lead against Benfica and look like advancing to the Champions League semifinal. The visitors are down to ten men and for a fun spell early in the second half, the Blues look like they're going to run riot. Then, the moment comes. Salomon Kalou squeezes free of his man on the left side of the area, bears down on Artur's goal, and slips in a pass to Ramires, lurking at the far post.
And then Ramires comes up with a once-in-a-lifetime miss. Whoops.
If I asked all y'all which constituted a better goalscoring chance, I'm betting that ninety percent of you would say it was the Ramires miss, and ten percent of you would be trolls. In fact, I'd argue that the Mascherano shot was not a chance at all - historical data from the Premier League suggests that shots from that range have such a low chance of going in that they might as well never be taken.
By my reckoning, that's one chance created for Salomon Kalou and Chelsea, and zero for Pedro and Barcelona. What says Opta about these chances?
Assists plus Key passes.
Ok, we know what assists mean. Key passes?
The final pass or pass-cum-shot leading to the recipient of the ball having an attempt at goal without scoring.
Since Ramires' weird little effort never got counted as a shot, his chance was never recognised, despite being such a good goalscoring opportunity that you'd expect it to go in virtually every time. Pedro, meanwhile, recorded a chance created for a pass that went to his own centre circle.
Want another example of weirdness? Let's go back to the Benfica match again, for a video I'm sure you'll recognise:
That is, of course, a great solo goal by Raul Meireles. He nicked the ball off Pablo Aimar's toes, ran half the length of the field, and buried a brilliant shot past Artur to secure Chelsea's passage to the next round. Here's Opta's take, via WhoScored:
Goal! Chelsea 2, Benfica 1. Raul Meireles (Chelsea) right footed shot from outside the box to the top right corner. Assisted by John Obi Mikel following a fast break.
John Obi Mikel got an assist (and therefore a chance created) from a header designed to clear a free kick away from his own penalty area. Meireles's goal had basically nothing to do with him, and it wasn't even a pass. However, he gets credited with a chance created anyway. Remember that Kalou did not.
Remember how I laid into Opta for mis-labelling passing volume as 'possession'? This is another example of that sort of behaviour - their definition for 'chance', which I would argue most of us would recognise as some variation of 'a situation from which a goal is significantly more likely than usual to be scored', is actually a shot. That's not right, because not all shots are created equal, and not all chances even result in shots, as we've shown above.
The fundamental flaw here is that whoever designed the chances created statistic completely failed to recognise that a chance in football is a situational event rather than a result-based one. The assumption that shots equal chances is a half-baked shortcut, and, like the error in the way possession is calculated, it builds a design flaw into the statistic.
This makes it incredibly frustrating to see chances created thrown around as though it's representative of a player's creativity. Yeah, it's great that Juan Mata leads the Premier League and all, but all it's actually saying is that Mata's passes lead to shots a lot. Considering Chelsea have made a habit of taking low-percentage long-range shots when they're frustrated and that Daniel Sturridge is on the team, it's not hard to figure out why Mata might be doing well by this metric.
When we're reporting a figure as 'chances created', it had better actually mean that, because creating chances (actual chances, not Opta chances) is one of the fundamental goals of football teams. We want to know which individual players are good at it, and teams use it as an aid to making transfer decisions. It's not completely crazy to suggest that, if the rumours about Liverpool heavily targeting chance creators last summer are true, the structural failures of the statistic are partially to blame for the failure of Damien Comolli's recruits this year. This isn't merely an oddity, like the possession error - this is big bucks.
And once again, Opta could be doing much better, by doing a little bit of thinking and then a little bit of work rather than taking stupendously lazy shortcuts. When you're designing a statistic, the first thing you should be doing is figuring out a platonic version of it. Don't start with limitations on what you can and can't actually achieve - figure out what you want to see.
For chances created, here's my definition, which I'd guess most would agree with:
A chance is said to be created at the point in a passage of play where the likelihood of a goal being scored during that passage of play surpasses a pre-defined threshold (ζ).
Seems reasonable enough to me. It's based on the likelihood of scoring rather than a binary outcome of shot/no shot, and that's a far better definition of what a chance actually entails. It also comes with the added bonus of not requiring a pass - you can create a chance here with an interception, dribbling past a defender, or playing a through ball. Each is perfectly valid in actual football, and each should be recognised as a chance created by any statistic worthy of the name.
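To make that definition concrete, here's a rough sketch of how it might be applied to a single passage of play. Everything in it is my own illustration: the `Event` record, the `chance_creator` function, and above all the probabilities, which are invented for the two moves described at the top of this piece rather than measured from anything.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    player: str    # who performed the action
    action: str    # e.g. "interception", "dribble", "through ball"
    p_goal: float  # ASSUMED estimate of the likelihood of a goal after this action


def chance_creator(passage: List[Event], zeta: float) -> Optional[Event]:
    """Return the first event that lifts the goal likelihood above zeta.

    Returns None if the passage of play never surpasses the threshold,
    i.e. no chance was created at all.
    """
    for event in passage:
        if event.p_goal > zeta:
            return event
    return None


# Invented numbers, purely for illustration.
kalou_move = [
    Event("Kalou", "dribble", 0.10),
    Event("Kalou", "pass", 0.85),
    Event("Ramires", "miss", 0.00),
]
pedro_move = [
    Event("Pedro", "pass", 0.01),
    Event("Mascherano", "shot", 0.01),
]

created = chance_creator(kalou_move, zeta=0.3)
print(created.player, created.action)
print(chance_creator(pedro_move, zeta=0.3))
```

Note that nothing here cares whether a shot was ever taken, or whether the action was a pass. The credit goes to whoever pushed the likelihood over the line.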
Now comes the tricky part - making it work. There are two major questions that arise from the above definition:
- How do we measure the likelihood of a goal being scored at any point during a given passage of play?
- What value do we use for ζ?
The first question fascinates me, because at its heart it's about game state. Although football is clearly a flowing series of events, it's not particularly difficult to abstract it as a Markov chain that takes ball location and defenders behind the ball as inputs. Unfortunately, as best as I can tell, nobody's tracking defensive shape adequately, so we're going to have to improvise.
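For the curious, the Markov abstraction might look something like the sketch below, if the defensive data existed. The states pair a pitch zone with the number of defenders behind the ball, and the passage of play ends in one of two absorbing states, a goal or a loss of possession. The zones, the transition probabilities and the `goal_probability` function are all hypothetical; I've made the numbers up to show the mechanics.

```python
def goal_probability(transitions, sweeps=200):
    """Estimate the likelihood of a goal from each state of a Markov chain.

    transitions maps state -> {next_state: probability}. "GOAL" and "LOST"
    are absorbing states. The values are found by repeatedly sweeping the
    equations p[s] = sum(prob * p[next]) until they settle.
    """
    p = {state: 0.0 for state in transitions}
    p["GOAL"] = 1.0
    p["LOST"] = 0.0
    for _ in range(sweeps):
        for state, nxt in transitions.items():
            p[state] = sum(prob * p[target] for target, prob in nxt.items())
    return p


# State = (zone, defenders behind the ball). All probabilities are invented.
transitions = {
    ("midfield", 8): {("midfield", 8): 0.5, ("final third", 5): 0.3, "LOST": 0.2},
    ("final third", 5): {("box", 2): 0.3, ("midfield", 8): 0.3, "LOST": 0.4},
    ("box", 2): {"GOAL": 0.15, "LOST": 0.85},
}

for state, value in goal_probability(transitions).items():
    print(state, round(value, 3))
```

With real transition counts in place of my guesses, the value attached to each state would be exactly the 'likelihood of a goal being scored during that passage of play' that the definition asks for.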
had a fabulous piece a few weeks back in which they used three factors to define the 'quality' of a chance. Those three factors were the distance from the goal, the angle from centre and the number of defenders pressuring the ball. Something similar is perfectly viable with the statistics on hand, although I would use a simple goals per shot gridded heat map, adjusted by the number of defenders pressuring, rather than try to come up with an angle/distance relationship. I'd also add a fourth factor - whether or not the goalkeeper is in position to make a save - because Fernando Torres.
Using the Chance Quality Index (or our slightly modified version) means that we really can get a measure, if a slightly flawed one, of how likely a team is to score in a given possession. We can use an equivalent to the principle of virtual work here - if a hypothetical shot is taken at any moment in a possession, we can run the Chance Quality Index to figure out how likely it is that that player will score.
Setting ζ is an interesting problem. The common definition of 'chance' is pretty fuzzy and arbitrary, which makes ζ essentially impossible to set analytically. However, since we're trying to come up with a statistic that's faithful to the vernacular definition rather than a shortcut, we can use traditional wisdom to point us in the right direction. If you asked enough experts (and I'm thinking managers and the like here) how many chances there are per game, you could pretty easily set ζ to match their median answer. Yes, it's subjective, but it's uniformly subjective, and introducing some subjectivity when you're defining a nebulous term like 'chance' is inevitable*.
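Here's a minimal sketch of what my modified version could look like: a goals per shot grid built from historical shots, then scaled for pressure and for the goalkeeper. This is not the Chance Quality Index as published. The cell size, the `pressure_discount` and the `open_goal_boost` are placeholder assumptions of mine that would need fitting against real data, and the function names are made up.

```python
from collections import defaultdict

CELL = 6.0  # ASSUMED grid cell size, in metres


def cell_of(x, y):
    """Map a pitch position to its grid cell."""
    return (int(x // CELL), int(y // CELL))


def build_grid(shots):
    """shots: iterable of (x, y, scored). Returns {cell: goals per shot}."""
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for x, y, scored in shots:
        cell = cell_of(x, y)
        attempts[cell] += 1
        goals[cell] += int(scored)
    return {cell: goals[cell] / attempts[cell] for cell in attempts}


def chance_quality(grid, x, y, defenders_pressuring, keeper_in_position,
                   pressure_discount=0.7, open_goal_boost=2.0):
    """Likelihood of scoring from a hypothetical shot taken at (x, y).

    The two adjustment factors are placeholders, not fitted values.
    """
    base = grid.get(cell_of(x, y), 0.0)  # no historical shots -> treat as zero
    p = base * pressure_discount ** defenders_pressuring
    if not keeper_in_position:
        p *= open_goal_boost
    return min(p, 1.0)


# Four invented shots from the same cell, two of them scored.
grid = build_grid([(3, 3, True), (4, 2, False), (2, 5, True), (1, 1, False)])
print(chance_quality(grid, 3, 3, defenders_pressuring=0, keeper_in_position=True))
print(chance_quality(grid, 3, 3, defenders_pressuring=1, keeper_in_position=True))
```

Because the function takes a position rather than an actual shot, it can be run at every moment of a possession, which is the 'hypothetical shot' idea in practice.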
*If this answer doesn't satisfy you, remember that the key figures you'd be examining would be 'chances created above league average' and 'chances converted above league average', in which the ζ term would cancel anyway.
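As a sketch of how that calibration could work: take the peak likelihood reached in every passage of play across a set of matches, then pick the ζ that leaves the expert-approved number of chances per game above the line. The `calibrate_zeta` function, the survey answers and the peak values below are all invented for illustration, and placing ζ halfway between two neighbouring peaks is just one convenient choice.

```python
from statistics import median


def calibrate_zeta(peaks_by_match, expert_answers):
    """Choose zeta so the average number of chances per match matches
    the median expert answer.

    peaks_by_match: one list per match, holding the highest goal likelihood
    reached in each passage of play.
    expert_answers: each expert's estimate of chances per game.
    """
    target_per_match = median(expert_answers)
    all_peaks = sorted(
        (peak for match in peaks_by_match for peak in match), reverse=True
    )
    k = round(target_per_match * len(peaks_by_match))  # total chances wanted
    if k <= 0:
        return 1.0  # nothing can exceed this, so nothing counts as a chance
    if k >= len(all_peaks):
        return 0.0  # every passage with a non-zero likelihood counts
    # Sit halfway between the k-th and (k+1)-th largest peaks.
    return (all_peaks[k - 1] + all_peaks[k]) / 2


# Two invented matches and three invented expert answers.
peaks_by_match = [[0.05, 0.4, 0.2], [0.6, 0.1, 0.02]]
zeta = calibrate_zeta(peaks_by_match, expert_answers=[1, 2, 1])
chances = sum(peak > zeta for match in peaks_by_match for peak in match)
print(zeta, chances / len(peaks_by_match))
```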
An aside: You'll note that I haven't adjusted for team or personnel here. That's not an oversight but a basic feature of sports analysis - if you adjust perfectly for a player and his teammates, you end up with zero information. When Lionel Messi's goalscoring figures are adjusted for him being Lionel Messi, you end up with an average player; the same goes for Fernando Torres. We're trying to find the difference between Messi and Torres here.
We can obviously tweak our platonic definition of chances created to take all of the above into account. Sure, we lose a little bit by having to adapt to the data we have, but we've got the shell of a viable statistic in place, and the entire framework is extensible for when more data becomes available. It's also far more faithful to the common definition of 'chances created' than Opta's.
That doesn't mean it's a straightforward project - fixing this is significantly harder than fixing the problems with possession. But it's almost certainly worthwhile despite that. We're looking for insight from the likes of Opta, and this sort of thinking would make their product infinitely more valuable (and quite a lot less misleading). It's a huge shame that given their place in the market they're doing such a poor job in advancing the state of football analytics.
Many are talking about the current fetish for statistics in football as equivalent to the Jamesian revolution in baseball. It isn't. The sabermetric revolution was powered by members of the public breaking away from the statistics fed to them by companies holding proprietary data and failing to use it in a sensible way. The search for objective data in football needs scientists, but it also needs data, and right now, we don't have it.
Until we do, there is virtually zero reason to take the numbers we're given at face value. We're presented with possession that isn't and chances that aren't*. Until that changes, the statistics we see cited so often aren't helpful in understanding the game and may even be actively hindering that understanding. In thirty years' time we'll remember 'chances created' and its kin as ill-conceived dead ends on the road to proper analysis. In the interim, please stop pretending it has any special meaning.
*That doesn't stop us buying them hook, line and sinker, of course - people want to understand this stuff, and they gravitate to the perceived authorities.