While we don't have a good measurement of most events on the field of play, we do have some information. The most obvious is the timing of the goals scored in a match. What can we do with that?
If you make a few assumptions, getting a measure of win and draw probability isn't so difficult. First, in order for this to actually be a viable statistic, we have to idealise two completely average teams playing each other - this keeps the computation simple and lets us compare any given team to average. Second, and probably more important, is the assumption of a Poisson distribution for goal-scoring. This is statistical-speak for an approximation of how likely a certain number of goals is to be scored in a game, given the average number of goals scored. Third is that home field advantage doesn't matter. As per usual, I'm going to gloss over this one, but it's mainly to do with keeping each hypothetical team on an equal footing.
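For anyone who wants the Poisson assumption spelled out, here is the standard formula. The symbols are my own shorthand rather than anything from the original spreadsheet: $\lambda$ is the average number of goals a team scores over the period in question, and $k$ is the number of goals we're asking about.

```latex
P(\text{team scores exactly } k \text{ goals}) = \frac{\lambda^{k} e^{-\lambda}}{k!},
\qquad k = 0, 1, 2, \dots
```

As a quick sanity check, with $\lambda = 1.39$ over a full game the chance of being held scoreless is $e^{-1.39} \approx 0.249$, or roughly one game in four.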
So, if we know that the average team scores about 1.39 goals per game, we can get the likelihood of the average team scoring any number of goals in a set amount of time. Do some mathematical/Excel tricks (I used brute force, being the awful mathematician that I am), and suddenly you have the probability of each team winning, as well as the probability of a draw. Neat, huh?
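The brute-force approach described above can be sketched in a few lines of Python. This is not the spreadsheet behind the graph, just a minimal illustration under some loudly stated assumptions: both sides are the same average 1.39-goal team, the scoring rate is constant across a 90-minute match (so the expected goals left simply scale with the time remaining), and stoppage time is ignored. The function names and the `max_goals` cutoff are made up for the example.

```python
from math import exp, factorial

AVG_GOALS_PER_GAME = 1.39  # league-average figure quoted in the post
MATCH_MINUTES = 90         # assumption: stoppage time is ignored


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the expected number is `mean`."""
    return exp(-mean) * mean ** k / factorial(k)


def outcome_probabilities(home_goals, away_goals, minute, max_goals=15):
    """Return (home win, draw, away win) probabilities as of `minute`.

    Assumes two identical average teams and a constant scoring rate,
    so the expected goals remaining shrink linearly with time left.
    """
    remaining = max(MATCH_MINUTES - minute, 0)
    mean = AVG_GOALS_PER_GAME * remaining / MATCH_MINUTES
    # Chance of each team adding 0, 1, 2, ... goals in the time left.
    pmf = [poisson_pmf(k, mean) for k in range(max_goals + 1)]

    home_win = draw = away_win = 0.0
    # Brute force: walk every combination of additional goals.
    for h, p_h in enumerate(pmf):
        for a, p_a in enumerate(pmf):
            p = p_h * p_a
            diff = (home_goals + h) - (away_goals + a)
            if diff > 0:
                home_win += p
            elif diff == 0:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


if __name__ == "__main__":
    # Kickoff, then a hypothetical 1-0 home lead at the half-hour mark.
    print(outcome_probabilities(0, 0, 0))
    print(outcome_probabilities(1, 0, 30))
```

Calling the function once per minute with the running score gives the points for a win probability curve. Capping the loop at 15 additional goals per team throws away only a vanishingly small slice of probability at this scoring rate.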
Let me demonstrate this after the jump in graphical form, using last weekend's game against West Bromwich Albion as an example.
Figure 1: Win/Draw Probabilities for the Chelsea vs. West Bromwich Albion game on 8/14/2010
Neat, huh? This is a cool visual representation of the game and can also help put proper weight on goals in key moments - Didier Drogba's third goal and Florent Malouda's second are exposed as completely worthless in terms of winning the match, whereas each of their first goals was critical. I've always loved the Win Probability graphs in baseball, and I'm really excited about deriving something similar for soccer.
Note that while I've worked through the math and am fairly satisfied by the outcome, I'm not completely positive that these graphs are 100% accurate. My values look reasonable across the board, but feel free to grill me with questions about methodology; that will help identify and correct any mistakes.
I hope you guys think this is as cool as I do!