Everyone's writing their previews right now, so I had the bright idea to go down a completely different path that I may or may not be able to finish before the season actually arrives. These are the perils of having just a week to plan things out!
Did you know that you can predict a team's goals scored and conceded reasonably well from the previous year's results alone? It should be obvious, really - strong attacking teams don't go hurtling back towards league average like they're Everton incarnate, and defensive powerhouses don't lose their backbone overnight. I grabbed data from the past few Premier League seasons (2004/2005 up to last year's) and decided to try my hand at a quick correlation analysis: I put the data into pairs (e.g. a team's goals scored above average in one season and the same team's goals scored above average the next) and told the computer to do my work for me.
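The pairing step can be sketched in a few lines of Python. This is a minimal illustration, not the code actually used for this post: the function name pair_seasons and the dict-of-teams input format are assumptions, and the numbers in the usage lines are placeholders rather than real Premier League data.

```python
def pair_seasons(season_y, season_y1):
    """Pair each team's value in season Y with its value in season Y+1.

    Both arguments map team name -> a per-season number (e.g. goals scored
    above average). Hypothetical helper for illustration only. Teams that
    appear in just one of the two seasons (relegated or promoted sides)
    are dropped automatically.
    """
    common = sorted(set(season_y) & set(season_y1))
    return [(team, season_y[team], season_y1[team]) for team in common]


# Placeholder values, not real results.
year_one = {"Team A": 25.0, "Team B": -8.0, "Team C": -14.0}
year_two = {"Team A": 19.0, "Team B": -3.0, "Team D": -10.0}
print(pair_seasons(year_one, year_two))
# → [('Team A', 25.0, 19.0), ('Team B', -8.0, -3.0)]
```

Only teams present in both seasons produce a pair, which matches the idea of comparing the same club from one year to the next.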
A correlation coefficient r describes the linear dependence between two variables. In our case we're trying to see how strongly one season carries over into the next in terms of goal scoring, defence, and ultimately goal differential. I measured goals scored/conceded relative to the average of the 17 teams that remained in the league, rather than including the relegated sides or using raw goal totals; this was to take into account environmental changes - a higher-scoring league, etc. - without having to do a bunch more work on my part. Let's take a look at what we come up with.
Figure 1: Goals scored above average (GSAA) year Y vs. year Y+1; r = 0.77.
Figure 2: Goals conceded below average (GCBA) year Y vs. year Y+1; r = 0.72.
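To reproduce coefficients like these, one approach is to express each team's goals relative to the surviving teams' average and then compute Pearson's r over the paired values. The sketch below assumes the caller passes in only the 17 non-relegated teams; the function names are hypothetical and the inputs are placeholders. On Python 3.10 and later, statistics.correlation computes the same Pearson coefficient.

```python
import math


def relative_to_average(goals):
    """Return each team's goals minus the mean of the teams supplied.

    Assumes `goals` holds only the 17 teams that stayed up, so relegated
    sides do not affect the average.
    """
    avg = sum(goals.values()) / len(goals)
    return {team: g - avg for team, g in goals.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder numbers, not real data.
print(relative_to_average({"Team A": 70, "Team B": 50, "Team C": 30}))
# → {'Team A': 20.0, 'Team B': 0.0, 'Team C': -20.0}
# A perfectly linear relationship gives r = 1.
print(pearson_r([20.0, 0.0, -20.0], [15.0, 1.0, -13.0]))
# → 1.0
```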
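Those are pretty good results, and from them we can build a quick, basic sketch of what the league table might look like. Our predicted goals scored in 2010 (I denote this as GS') would be (GS - avg2009)*0.77 + avg2009, where GS is a team's 2009 goals scored and avg2009 is the 2009 average of the 17 teams that stayed up; equivalently, GS' = avg2009 + 0.77*GSAA. There's a similar formula for predicted goals conceded, using r = 0.72.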
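The prediction formula itself is one line. This is a minimal sketch with a hypothetical function name and made-up inputs: it shrinks last season's distance from the average by a factor of r. Using r directly as the multiplier is the shortcut described above; it matches the least-squares regression slope only when both seasons have a similar spread, which is assumed here.

```python
def predict_goals(last_goals, last_avg, r):
    """Regress last season's total toward last season's average.

    GS' = (GS - avg) * r + avg. The same form works for goals conceded,
    with r = 0.72 instead of 0.77.
    """
    return (last_goals - last_avg) * r + last_avg


# Made-up example: 86 goals scored against a 17-team average of 50.
print(predict_goals(86, 50, 0.77))  # roughly 77.7 goals
```

A team sitting exactly at the average is predicted to stay there, and every other team is pulled 23% of the way back towards it.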
It's always good to get a feel for how accurate your results are, as well, so I back-checked these predictions against the entire data set, which gave a standard deviation of around ±10 goals for scoring and ±9 for conceding. By taking the square root of the sum of the squared standard deviations (i.e. treating the two errors as independent), we can also get an expected standard deviation for goal differential: sqrt(10^2 + 9^2) ≈ ±13.
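The back-check and the goal-differential error can be sketched as follows. Assumptions: prediction error is measured as root-mean-square error (equal to the standard deviation when the errors average out to zero), and the scoring and conceding errors are treated as independent, so their spreads add in quadrature. The function names are hypothetical.

```python
import math


def rms_error(actual, predicted):
    """Root-mean-square difference between actual and predicted totals."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def goal_diff_sd(sd_scored, sd_conceded):
    """Spread of GD' = GS' - GC', assuming the two errors are independent."""
    return math.hypot(sd_scored, sd_conceded)  # sqrt(a**2 + b**2)


print(round(goal_diff_sd(10, 9), 1))  # sqrt(181)
# → 13.5
```

If scoring and conceding errors were correlated, the true spread for goal differential would differ from this figure, so the independence assumption matters.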
What does our league table look like now? Well, it looks very much like the previous year's table sorted by goal differential, which is hardly a surprise considering the methodology:
Figure 3: Predicted 2010 goal differential for last season's top 17 Premier League teams.
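Building a predicted table is then just a subtraction and a sort. This is a minimal sketch with a hypothetical function name and placeholder inputs; it assumes predicted goals scored and conceded are already available for the same set of teams.

```python
def predicted_table(gs_pred, gc_pred):
    """Rank teams by predicted goal differential, GD' = GS' - GC'."""
    gd = {team: gs_pred[team] - gc_pred[team] for team in gs_pred}
    return sorted(gd.items(), key=lambda item: item[1], reverse=True)


# Placeholder predictions, not real figures.
gs = {"Team A": 70.0, "Team B": 55.0, "Team C": 41.0}
gc = {"Team A": 32.0, "Team B": 48.0, "Team C": 51.0}
for team, gd in predicted_table(gs, gc):
    print(f"{team}: {gd:+.0f}")
# → Team A: +38
# → Team B: +7
# → Team C: -10
```

Because each prediction is just last season's figure pulled towards the average, the ordering mostly mirrors last season's goal differential ranking.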
We'll leave it here for now. In part two I'll try to introduce a slightly more intelligent way of predicting goal differential, which should smooth out some of the error.