Football Analytics: Finding Bill James' Cipher - Part I

I didn't care about the statistics in anything else. I didn't, and don't pay any attention to the statistics of the stock market, the weather, the crime rate, the gross national product, the circulation of magazines, the ebb and flow of literacy among football fans and how many people are going to starve to death before the year 2050 if I don't start adopting them for $3.69 a month; just baseball. Now why is that? It is because baseball statistics, unlike the statistics in any other area, have acquired the powers of language.

-Bill James, 1985 Baseball Abstract.

It happens like clockwork. Whenever a match is 0-0 and one team is dominating possession or shots, the halftime commentary will invariably include a remark about how the only statistic that matters is goals. It's easy to take this as an attack on the number-crunchers, and in many cases, it probably is. But it's also entirely true, and to shrug it off as a cliche misses the point.

There seems to be a perception that football statistics are entering some sort of golden age. With the proliferation of statistics sites such as WhoScored and EPL Index, not to mention the massive popularity of Opta's Twitter feeds and the Guardian's sadly discontinued Chalkboard service, it's not hard to see why. Information is available where previously there was none.

But anyone claiming that the Moneyball revolution is underway in football is sadly mistaken. The current statistics fail (and fail utterly) to pass Bill James' language test. If a player makes two fewer tackles than average but one more interception and completes more passes, for example, we have no way of putting those statistics into context. What we currently have are numbers, not meaning.

The basic unit of football's natural language is, of course, the goal. Goals are, as it were, the point. Teams want wins, and therefore they look to maximise goals scored and minimise goals conceded. Taking more shots, winning more challenges and completing more passes are only relevant statistics for teams if they can be converted into goals.

And, right now, we're missing the bridge between those statistics and goals. We need meaning. Instead, we have numbers. It doesn't matter how many dribbles Gokhan Tore completes per game; it matters how he contributes to his team's goal differential in each match. If there's no way of telling how those dribbles affect Hamburg's results, there's no way to put Tore's statistics into their proper context.

To progress to the point where football statistics are actually helpful to those looking to really understand the game, we'll need to find the language of football, not just the letters. Finding that bridge should be the main focus of every analytically inclined football researcher.

That leads us to a couple of interesting questions:

  1. Is it possible?
  2. If so, what is the best method of attack?

For many, the answer to the first question will be 'no'. Some might argue that football is too difficult to analyse, that it is computationally intractable. Some might claim that putting numbers on human endeavour is inherently futile. With respect to those who hold such views (and with apologies for over-simplifying a long-running argument), I disagree.

While it's clear that football is significantly more difficult to analyse than, say, chess or baseball, that's not enough to drive it into 'impossible' territory. Every single time we watch a play develop, we're running some sort of subconscious mental algorithm to estimate the likelihood of success. As we watch more games, we refine our ability to predict what will happen, with some people carrying around better mental models than others.

Football is a complicated game, certainly, but unless it's truly chaotic (which it obviously is not), we can and will analyse it, whether that be through numbers or in our collective gut. After all, we know that on any given day, we should expect Barcelona to beat, say, Preston North End, or that Cristiano Ronaldo is more likely to score a goal than Tim Howard. There's obviously some structure in the sport, and that alone is proof that we're not looking at an impossible problem.

But we are looking at what seems to be a very difficult one. Researchers aren't going to stumble upon some sort of footballing Rosetta Stone - the language metaphor starts to fall apart when you consider that football is a sport designed to entertain rather than to communicate - and it's far from clear how to decode it. Perhaps another sport can help us. What would Bill James do?

