Sunday, February 17, 2013

Belated Post-Bowl Debriefing

A natural question for anyone trying to model college football (or anything for that matter) is, "How well does this model do?" The answer in this case is, "Not as well as I hoped." True, the model correctly picked 24 winners out of the 35 bowl games, which is statistically significantly better than picking randomly and would be enough to beat 89% of those participating in ESPN's College Bowl Mania. However, things don't turn out so well when trying to predict the margin of victory instead:
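The "better than picking randomly" claim can be checked with a one-sided binomial test. The sketch below assumes that "random" means a fair coin flip for each of the 35 games; the post does not say which test was actually used, and the function name `binomial_tail` is just an illustrative choice.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumption: each pick is an independent 50/50 guess under the null hypothesis.
p_value = binomial_tail(24, 35)
print(f"P(at least 24 of 35 correct by coin flip) = {p_value:.4f}")
```

Under that assumption the chance of getting 24 or more right by luck is about 2%, which is consistent with calling the result significant at the usual 5% level.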


Plotted above are all 35 bowl games with the predicted margin of victory on the horizontal axis and the actual margin of victory on the vertical axis. If the model were perfect, all the points would lie on the gray line; instead a handful of points lie close to the line, but others are worryingly far from it. In fact, the coefficient of determination (R²) for this data set was only 0.075. In other words, the model explained only 7.5% of the variation we see in margins of victory in bowl games.
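For readers who want to reproduce this kind of number, here is a minimal sketch of one common way to compute R²: as the square of the Pearson correlation between predicted and actual margins. The post does not say exactly how its figure was calculated, so treat this definition as an assumption. The sample margins are made up purely to show the call and are not the bowl data.

```python
from math import sqrt


def r_squared(predicted, actual):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    r = cov / sqrt(var_p * var_a)
    return r * r


# Hypothetical margins, for illustration only (NOT the real 35 games).
toy_predicted = [3.0, 7.0, 10.0, 14.0]
toy_actual = [-4.0, 10.0, 3.0, 21.0]
print(f"R^2 on toy data: {r_squared(toy_predicted, toy_actual):.3f}")
```

Note that this measures how well the points follow any straight line, not specifically the gray line of perfect prediction, so it can flatter a model whose predictions are systematically too large or too small.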

The natural follow-up for anyone who just asked themselves how well they can model something is to ask how the model can be improved. To this end, I looked at the five games where the model performed worst:

| Bowl Game         | Opponents                | Prediction         | Outcome          |
|-------------------|--------------------------|--------------------|------------------|
| Hawaii Bowl       | Fresno State vs. SMU     | Fresno State by 14 | SMU by 33        |
| Independence Bowl | Ohio vs. LA-Monroe       | LA-Monroe by 13.5  | Ohio by 31       |
| Pinstripe Bowl    | W. Virginia vs. Syracuse | W. Virginia by 7.5 | Syracuse by 24   |
| Sugar Bowl        | Louisville vs. Florida   | Florida by 19.5    | Louisville by 10 |
| Cotton Bowl       | Texas A&M vs. Oklahoma   | Texas A&M by 1     | Texas A&M by 28  |
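To make "performed worst" concrete, the sketch below computes how far off each of these five predictions was. It assumes the ranking is by absolute error in the margin, and it expresses both margins from the predicted winner's point of view (so an upset shows up as a negative actual margin). Both conventions are my assumptions rather than something the model is known to use.

```python
# (bowl, predicted margin, actual margin), both from the predicted winner's side.
# A negative actual margin means the predicted winner lost by that many points.
games = [
    ("Hawaii Bowl", 14.0, -33.0),        # Fresno State by 14 -> SMU by 33
    ("Independence Bowl", 13.5, -31.0),  # LA-Monroe by 13.5 -> Ohio by 31
    ("Pinstripe Bowl", 7.5, -24.0),      # W. Virginia by 7.5 -> Syracuse by 24
    ("Sugar Bowl", 19.5, -10.0),         # Florida by 19.5 -> Louisville by 10
    ("Cotton Bowl", 1.0, 28.0),          # Texas A&M by 1 -> Texas A&M by 28
]


def prediction_error(predicted: float, actual: float) -> float:
    """Absolute miss, in points, between predicted and actual margin."""
    return abs(actual - predicted)


errors = [prediction_error(p, a) for _, p, a in games]
# A pick is correct when predicted and actual margins have the same sign.
correct_picks = sum(1 for _, p, a in games if p * a > 0)

for (name, _, _), err in zip(games, errors):
    print(f"{name:<18} missed by {err:4.1f} points")
print(f"Winners picked correctly: {correct_picks} of {len(games)}")
```

By this measure the misses run from 47 points (Hawaii Bowl) down to 27 points (Cotton Bowl), in the same order the games are listed, and only the Cotton Bowl pick got the winner right.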

Looking at these games, one stat stood out: rushing yards. Except for the Louisville-Florida matchup, the victors of these games out-rushed their opponents, sometimes by margins over 200 yards. But could this have been predicted? Sure, the winners were better rushers during their bowls, but were they better during the season? And was this not captured in the margin of victory already incorporated in the model? To find out, I incorporated each of the teams' average rushing yards per game into the model and redid the analysis. The results:


It may not look much better, but running the numbers shows that it is. The number of winners correctly picked increased from 24 to 25 (out of 35). More importantly, R² roughly doubled, from 0.075 to 0.151. That's still not great, but it's a nice improvement for adding one statistic. So look for rushing yards to be more formally incorporated into the model next year, and look for another round of analysis following the bowls of the 2013 season.