AUTHOR'S NOTE: Special thanks to Zach Rosenfield at AccuScore for providing me with their win projections for 2009, which, although publicly available when I started this analysis, were replaced by their 2010 projections by the time I finished it. Also, thanks to everyone on NN who suggested prognosticators, and thereby helped me populate the data set I used in my analysis.
In the current NFL, there are 3 things that are certain every year: (1) Mike Singletary will talk about a player "working his tail off," (2) Brett Favre will make a mockery of "retirement," and (3) NFL "experts" will make your local weatherperson look like a meteorological Nostradamus. Indeed, expert prognostication seems to be the one, sure-fire aspect of the NFL experience in which we'd all be better off emulating Jerry's advice to George: "If every instinct you've ever had is wrong, then the opposite would have to be right."
Of course, I'm not exactly reinventing the wheel here by having fun at the expense of the so-called experts. Plenty of commentary by writers and bloggers has already been devoted to this end. For instance, here's Gregg Easterbrook of ESPN with his yearly recap of NFL expert predictions gone wrong. On the statistical side of things, Brian Burke of Advanced NFL Stats has made it an annual ritual to point and laugh at Football Outsiders (FO); while Vegas Watch showed just how much of your posterior would have been handed to you in 2009 had you based your NFL futures bets solely on FO's win projections. Finally, even in the realm of fantasy football, Sara Holladay (aka the Fantasy Football Librarian) made it all the way to the New York Times' Fifth Down blog by quantifying the dart-throwing exercise that is preseason player rankings.
And it appears that Burke isn't the only one who's sworn off making win predictions. In perusing the magazine racks of my local grocers and book stores while preparing for my fantasy drafts this season, I came to the realization that predicting NFL team wins is such a fruitless endeavor, and so inviting of unnecessary ridicule, that most of the popular (and heavily promoted) NFL preview publications don't even bother providing them to readers. Instead, most have copped out by offering up much safer "order-of-finish" predictions for each division. Essentially, what these publications have been reduced to saying is, "Extra, extra! The Rams are going to finish last in the NFC West! The Chargers will win the AFC West! The Lions will come in 4th in the NFC North! This hard-hitting analysis can be yours for only $9.99!"
So, let's just say the general consensus among shrewd observers is that NFL "experts" might just manage to inaccurately predict tomorrow's sunrise if given the chance. That much we already know. But just how inaccurate are they? Which NFL experts, if any, are relatively clairvoyant, and which are whatever the polar opposite of "clairvoyant" is? How does the accuracy of NFL expert predictions fare against the accuracy of non-experts? Generally speaking, does knowledge - whether fed into a computer or stored in human memory - equal power for a team-win prognosticator?
My hope is that this post answers some of these questions, and thereby prepares you to be a more discerning consumer of expert predictions as the dawn of the 2010 NFL season arrives this week. To that end, I examined the accuracy of 28 Jimmy-the-Greek wannabes (plus 3 decidedly non-expert baselines) based on their team win predictions for the 2009 NFL season, and compared accuracy rates both within and between 5 groups of prognosticators:
- Stat geeks
- Professional pundits
- Handicappers
- Amateur pundits
- Metaphorical members of the wild kingdom
After the jump, a little bit more detail about my methods, and a lot more detail about my results...
THE PROGNOSTICATORS
In terms of collecting my data, I relied on 4 sources. First, I asked for your help via this post. Second, I scoured the websites of major online sports hubs like ESPN, CBS Sports, FOX Sports, NBC Sports, Sports Illustrated, Yahoo, SB Nation, etc. Third, I googled every possible permutation of the search terms "2009, NFL, wins, win totals, standings, predictions, projections," etc., and clicked through from the 1st to the 50th page of results for each search (Aside: Why 2009 only? Just try finding a meaningful number of pre-2009 team win predictions that are available on the internet, and you'll arrive at the frustrating answer.). These first 3 sources accounted for 27 of the 28 sets of win predictions, with my personal hard copy of The Sporting News' Pro Football '09 accounting for the 28th set.
Once I had my data collected, it became pretty apparent that each set of predictions could be logically grouped into the types of prognosticators I mentioned earlier. Stat Geeks based their predictions on sophisticated statistical analyses (e.g., AccuScore). Professional Pundits were members of the mainstream NFL media or NFL bloggers who write for sites with their own domain names (e.g., Peter King, Walter Football). Handicappers based their predictions on handicapping analyses, which involve both objective and subjective factors, and were explicitly aiming to win NFL futures bets (e.g., Vegas Watch). Amateur Pundits were bloggers who didn't have inside NFL access (e.g., Stampede Blue). Finally, taking a cue from Brian Burke's FO fun, I created a 5th prognosticator group, Metaphorical Members of the Wild Kingdom ("MMWKs" for short; resemblance to Peter King's "MMQB" column was totally unintended...seriously!), which included these 3 sets of predictions made by various anthropomorphized animals:
- Rover - this is my pet dog. I've trained him to tap his paw to indicate how many games he thinks a team is going to win. Problem is that I've only trained him to tap it 8 times, so he predicts 8 wins for every team.
- Polly - this is my pet African Grey Parrot. I trained her to repeat everything I say, and what I told her was the number of games each NFL team won in 2008. So, her predictions for 2009 were that each team would win the same number of games it won in 2008.
- Dim - this is that annoying beetle - whose name is an allusion to A Bug's Life - that randomly dive-bombs me every time I sit out on my balcony. His behavior in a given airspace is seemingly arbitrary, so his 2009 win prediction for each NFL team was a random number from 0 to 16.*
Theoretically, no NFL expert should be worse than any of these MMWKs because, as Burke wryly puts it, such a result would be "literally worse than having no football knowledge at all."
THE ACCURACY MEASURES
I'm not going to bore you with a statistical debate about the advantages and disadvantages of various accuracy/error measures. All you really need to know is that (a) the most basic options are mean absolute error (MAE) and root mean squared error (RMSE), and (b) I chose MAE for two reasons. First, MAE is more forgiving of really bad predictions, and, as you'll see, many of these NFL prognosticators needed all the forgiveness they could get. Their predictions were already so error-prone that I didn't need to go clubbing baby seals by using an accuracy measure that makes things look even worse to the untrained eye than they already are.
Second, and more importantly, MAE is expressed in a number that makes a lot more intuitive sense than the number spit out by RMSE. For example, if I told you that Peter King's 2009 predictions had an MAE of 1.00 - which definitely was not the case - the straightforward interpretation of "1.00" is that King's average prediction was 1 win off. This interpretation is especially convenient if, in the future, you'd like to attach a margin of error to a given expert's prediction. Say Peter King just picked the 49ers to win 9 games. Well, he's usually off by 1 win either way, so I can expect the Niners to win 8-10 games based on his prediction. In contrast, RMSE doesn't easily lend itself to these kinds of painless real-world applications. Although a seemingly elementary exercise, adjusting a prognosticator's current predictions based on his/her historical accuracy actually forms the fundamental core of much more sophisticated projection models (e.g., Nate Silver's various election projection models at fivethirtyeight.com). And, like Shaq once said, "statistical adjustments are fuuuuuuuuuundamental!"
To supplement the broader level of accuracy that's measured by MAE, I also used a couple of simple counting stats that provide more specific information about each prognosticator. First, there's hits, which was the total number of team win predictions a prognosticator nailed exactly on the number. Naturally, the difference between 32 and hits is misses, so I ignored misses to prevent redundancy. Instead, I used near misses and barn misses. A near miss was a win prediction that was off by no more than 2 wins either way. On the other end of the spectrum was a barn miss, which, as its name implies, was a prediction so inaccurate that it would have missed the broadside of a barn if one happened to be in the vicinity of the prognosticator. More precisely, barn misses were 4 or more wins off either way.
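For the code-inclined, here's a minimal sketch of how these measures boil down in practice; the predicted and actual win totals in the example call are placeholders, not any prognosticator's real picks.

```python
# Minimal sketch of the accuracy measures described above.
# preds and actuals are parallel lists of predicted and actual win totals
# for all 32 teams; the example values below are placeholders, not real picks.

def accuracy_stats(preds, actuals):
    errors = [abs(p - a) for p, a in zip(preds, actuals)]
    return {
        "MAE": sum(errors) / len(errors),                     # mean absolute error
        "hits": sum(1 for e in errors if e == 0),             # exactly on the number
        "near_misses": sum(1 for e in errors if 0 < e <= 2),  # off by 1-2 wins
        "barn_misses": sum(1 for e in errors if e >= 4),      # off by 4+ wins
    }

# Placeholder example: predicting 9 wins for a team that won 8 contributes an
# absolute error of 1, i.e., a near miss.
print(accuracy_stats([9, 11, 6], [8, 7, 6]))
# -> {'MAE': 1.67 (approx.), 'hits': 1, 'near_misses': 1, 'barn_misses': 1}
```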
THE RESULTS
Let's cut right to the chase. Below is a table showing accuracy stats for the 31 prognosticators in my sample, whom I've ranked from lowest to highest MAE (because lower error = better accuracy). For your convenience, I've also attached links to all of the publicly available, online sources:
So, the race for 2009's most accurate win prognosticator ended in a tie between a handicapper, Football Locks, and an ESPN pundit, Mike Greenberg. Interestingly enough, not too far behind one Mike was the other Mike of Mike and Mike in the Morning fame. Based on 2009, then, it seems ESPN has quite the prognosticating pair manning their morning-drive microphones.
However, if I were to base these rankings on the overall picture painted by the various stats, I'd have to crown Vegas Watch's Prospective Line Estimate as the king of 2009. Although it was ostensibly Stat Geek work, this set of predictions was more of a way to identify specific outliers in the Vegas win-total futures and individual game lines than it was to produce team win predictions that were maximally accurate in the aggregate; hence, its categorization as a handicapper. Well, a happy coincidence of not focusing so much on the specifics of each team was ending up with the most hits, the most near misses, the fewest barn misses, and the 2nd-best MAE.
The most amazing (and unexpected) thing to me about Vegas Watch's accuracy, however, was the manner in which it arrived at the win total estimates. Specifically, SportsBetting.com put out "prospective lines" for all 256 regular season games 15 days before the regular season even started. Vegas Watch took those prospective lines, assigned win probabilities to each team in each of the 256 games by utilizing some line-to-win-probability conversions that are widely known in the handicapping community, and then simply added up each team's 16 individual-game win probabilities to come up with an expected win total. For instance, here's a table showing how the procedure worked for the 2009 49ers, Vegas Watch's 5th most accurate win prediction:
| Game | Opp | Prospective Line | Win Prob |
|------|---------|------|-------|
| 1 | vs. ARI | 6.5 | 0.294 |
| 2 | vs. SEA | -4 | 0.643 |
| 3 | @ MIN | 7.5 | 0.264 |
| 4 | vs. STL | -8.5 | 0.748 |
| 5 | vs. ATL | 1 | 0.475 |
| 6 | @ HOU | 4 | 0.357 |
| 7 | @ IND | 10 | 0.223 |
| 8 | vs. TEN | 2.5 | 0.438 |
| 9 | vs. CHI | 0 | 0.500 |
| 10 | @ GB | 6 | 0.307 |
| 11 | vs. JAC | -3.5 | 0.619 |
| 12 | @ SEA | 1 | 0.475 |
| 13 | vs. ARI | 0 | 0.500 |
| 14 | @ PHI | 11 | 0.206 |
| 15 | vs. DET | -9.5 | 0.762 |
| 16 | @ STL | -2.5 | 0.562 |
|  |  | Estimated Ws = Sum of W Probs | 7.373 |
|  |  | Actual Ws | 8.000 |
|  |  | Absolute Error | 0.627 |
See? So easy a caveman can do it. ® In fact, Sportsbetting's prospective lines are up right now if you have 15 minutes of free time and access to MS Excel. What's even funnier than GEICO commercial references - and in less danger of copyright infringement - is that, at the time, Vegas Watch noted that the prospective lines for 2009 seemed to have relied too heavily on 2008 win totals. It turns out the only thing "too heavy" was my emotional reaction to its accuracy. Think about it. Win total predictions based only on game odds posted between 1 and 4 months prior to the actual game were the most accurate overall among the 28 non-MMWK prediction sets I evaluated. That's just astonishing, but I'll have more on the impact of up-to-the-minute knowledge - or lack thereof - a little later.
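If you'd rather skip the Excel exercise, here's a minimal Python sketch of the same bookkeeping for the 49ers table above. The per-game win probabilities are copied straight from the table, so the line-to-probability conversion itself (the standard handicapping lookup Vegas Watch relied on) is simply taken as a given here.

```python
# Re-creating the 49ers example above: sum the per-game win probabilities to get
# an expected win total. The probabilities are copied from the table; the
# line-to-probability conversion step is assumed, not implemented.

win_probs = [
    0.294,  # 1  vs. ARI (+6.5)
    0.643,  # 2  vs. SEA (-4)
    0.264,  # 3  @ MIN  (+7.5)
    0.748,  # 4  vs. STL (-8.5)
    0.475,  # 5  vs. ATL (+1)
    0.357,  # 6  @ HOU  (+4)
    0.223,  # 7  @ IND  (+10)
    0.438,  # 8  vs. TEN (+2.5)
    0.500,  # 9  vs. CHI (pick 'em)
    0.307,  # 10 @ GB   (+6)
    0.619,  # 11 vs. JAC (-3.5)
    0.475,  # 12 @ SEA  (+1)
    0.500,  # 13 vs. ARI (pick 'em)
    0.206,  # 14 @ PHI  (+11)
    0.762,  # 15 vs. DET (-9.5)
    0.562,  # 16 @ STL  (-2.5)
]

expected_wins = sum(win_probs)   # 7.373
actual_wins = 8
print(round(expected_wins, 3), round(actual_wins - expected_wins, 3))  # 7.373 0.627
```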
One last thing I'll mention in this section is that, contrary to my dismissive headline and comments thus far, the fact that the Top 10 prognosticators were "experts" in NFL statistics, NFL journalism, or NFL handicapping suggests that - perhaps - they actually know what they're talking about when they make their win predictions. However, before we start falling all over each other praising the experts, we should keep 2 things in mind. First, even the best experts were still about 2 wins off per team with their predictions. A couple of unwitting slip-ups here or there could have easily put them in Polly-and-Rover territory. Second, there were 3 amateur pundits in the top half of the rankings, which suggests that being an NFL expert is not necessary for making relatively accurate win predictions.
THE ELEPHANT IN THE ROOM
Knowing that I'm a dyed-in-the-wool fan of FO, many of you read the last section patiently anticipating a comment about how FO's 2009 projections were worse than those of a metaphorical canine and a metaphorical avian. As long as we're dealing in metaphors, there's really no way to put lipstick on this metaphorical pig, so I'm going to devote this entire section to an attempted extreme makeover.
Despite FO's valiant attempts at investigating - and thereby explaining - the poor performance of their 2009 projections, it still boggles my mind that they did this poorly. Generally speaking, when you have a wealth of statistical information at your disposal, you've spent the better part of a decade refining your prediction model, and you've simulated the NFL season 10,000 times, there's basically a double-lightning-strike chance that you'd be worse at predicting team wins than a video game; a video game that predicted a 5-10-1 record for the NFC Championship game host Minnesota Vikings, I might add. Probably even more disheartening to FO was that they did about half-a-win-per-team worse than their weekly punch line, Peter King.
One of the benefits of having done the analysis I'm presenting here is that it lends itself to pure apples-to-apples comparisons, which render some of the savvier statistical explanations less convincing. Here's what I mean. Back in Part 2 of my interview with Bill Barnwell, he made a persuasive argument - at least it was persuasive at the time - that football prediction is pretty difficult for various reasons; chief among them being the short, 16-game season. Well, armed with the MAEs above, we can see that, even when all of the prognosticators are dealing with the same 16-game-season problem, FO still did relatively poorly. Similarly, Bill pointed out that football predictions are relatively difficult because statisticians are forced to rely on inferior data. Well, even among Stat Geeks who rely on essentially the same inferior raw data, FO still did relatively poorly. Indeed, their competitors at WhatIfSports and AccuScore were about three-quarters-of-a-win-per-team more accurate despite utilizing superficially similar play-by-play-based regression and simulation procedures. So, again, 2009 was unequivocally bad for FO any way you slice the statistical pie.
Based on all I've learned to date, my personal view is that there can be only 2 possible explanations for FO's inaccuracy in 2009: inadequate statistical methods or bad luck. Given that, contrary to popular belief, I'm not one of the few people on Earth who are intimately familiar with the precise methods behind FO's win projections, it would be presumptuous of me to critique them from a methodological standpoint. I have my ideas (Hints: mishandling of clustered data, overreliance on ordinary least squares regression, and potential overdetermination), but I'll just have to lean towards bad luck until they give me the keys to the kingdom.
Going forward, the way I'd approach FO's win projections is to heed the words of Bill Barnwell himself. As he stated in our interview, we should make sure to avoid "confusing two different concepts - DVOA, the play-by-play analysis metric, and (their) projection system, which is based on DVOA." In other words, just because FO's win projection system was essentially unreliable in 2009, you shouldn't throw the baby out with the bathwater by unfairly jettisoning DVOA altogether. Indeed, DVOA based on past performance remains a reliable measure of play-by-play efficiency; and there's no inherent contradiction in lauding a primarily descriptive stat like DVOA on the one hand, and panning a DVOA-based prediction on the other.
IGNORANCE IS BLISS
The discussion above actually provides a nice segue into the final piece of information I'm going to present in this post. Specifically, one curious finding related to FO's 2009 win projections is that their accuracy actually got worse between their initial projections in Football Outsiders Almanac 2009 (FOA09) and their revised projections on the eve of Week 1. Indeed, if you refer back to the table, you see that FO's projections in FOA09 were off by an average of 2.59 wins, whereas the updated projections they published in September - which were based on information gleaned from training camps and preseason games - were off by an average of 2.69 wins. Conjuring up the sentiments of Brian Burke once again, not only were FO's win projections worse than those of someone (or something) having no football knowledge at all; the massive influx of football information that arrives every year from July to September - and did so in 2009 - actually made their projection models even less knowledgeable than that. So what gives?
Well, the very simple answer is that - at least in 2009 - knowledge did not equal (predictive) power. Thankfully for them, this wasn't a phenomenon specific to FO. As the chart below shows, there was no relationship whatsoever between the accuracy of a given set of win predictions and the temporal proximity of those predictions to the start of the regular season:
For those who are statistically inclined, check out that R-squared and the slope of the regression equation! For those who aren't statistically inclined, the essentially flat trendline means there was no relationship between accuracy and information. Furthermore, the R-squared value means that knowing when a given prognosticator made his/her/its picks only gets you about 1/500th of the way towards perfectly predicting that prognosticator's accuracy.
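For anyone who wants to run that kind of check themselves, here's a minimal sketch of how the slope and R-squared would be computed; the numbers in the example call are made-up placeholders, not the actual 2009 data points behind the chart.

```python
# Illustrative sketch: regress each prognosticator's MAE on how many days before
# the season the predictions were made. The values passed in the example call
# are placeholders, NOT the actual 2009 data plotted in the chart.
from scipy.stats import linregress

def timing_vs_accuracy(days_before_season, mae):
    fit = linregress(days_before_season, mae)
    # A slope near 0 and an R-squared near 0.002 is what "no relationship" looks like.
    return fit.slope, fit.rvalue ** 2

# Example call with placeholder values only:
print(timing_vs_accuracy([120, 90, 60, 30, 0], [2.1, 2.6, 1.9, 2.4, 2.7]))
```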
I've highlighted a few specific data points to drive this point home. First, let me focus your attention on what the trend would look like if there actually was the expected, common-sense-driven relationship between accuracy and time-to-season. Basically, the white trendline would connect the data points for FOA09, Walter Football, Mike Greenberg and Football Locks. That is, the trendline would illustrate that predictions made closer to the season were systematically more accurate than predictions made farther out from the season. If all the data points were bunched along such a trendline, we'd end up with a large regression slope in the expected direction and a really high R-squared value; thereby allowing us to objectively conclude that knowledge actually mattered.
Surprisingly, that simply wasn't the case in 2009. For instance, check out the data point for Stampede Blue, an esteemed blog on our own SB Nation network. Despite making its predictions 129 days prior to the start of the regular season, it still managed to end up smack dab in the middle of the accuracy pack (i.e., MAE = 2.13).
Now, let's look at the extreme outliers, 18 to 88, FO's eve-of-the-season projections, and WhatIfSports. It's pretty easy for you to draw a trendline connecting each of these data points, but the kicker is that this trendline would suggest the exact opposite of what we'd expect. In other words, win projections would be systematically less accurate as the season grew near. Indeed, as the earlier table showed, WhatIfSports was in the Top 5 in accuracy despite publishing its projections 83 days before the season, whereas 18 to 88 was dead last in accuracy - even worse than all 3 of my unknowing MMWKs - despite publishing its projections the day the season began.
Taken together, the data points that I've highlighted in the chart above illustrate a quintessential feature of unrelated pairs of phenomena. Namely, you can draw any number of trendlines that you want, and they all seem to fit the data equally well (or poorly). In the current context, we can draw the flat trendline displayed in the chart, we can draw it sloping downward from FOA Pre to WhatIfSports to show that ignorance is bliss, or we can draw it sloping upward from Football Locks to FOA09 to show that knowledge is power. Unfortunately, all three lines, and therefore all 3 hypotheses (no relationship, positive relationship, negative relationship) would appear to be equally possible given the data. We might have had fun drawing, but we wouldn't have discovered anything along the way.
Of course, my overarching aim in this section was not to give you a statistics lesson (although I enjoy that too). Rather, it was to tell you that the accuracy of expert projections in 2009 had seemingly nothing to do with differences between the amounts of preseason information that were available to various experts. And if we add this conclusion to the one earlier about expertise not being a prerequisite for accurate predictions, we've finally arrived at the general takeaway message of this post. Despite all of the kabuki, self-proclamations, and digitally enhanced viewing experiences suggesting the contrary, NFL experts are only minimally more accurate at predicting NFL wins - if at all - than is your average, NFL-informed blogger. Hey, whaddya know? We here at Niners Nation are average, NFL-informed bloggers! Does that mean we're as accurate as the experts? Well, I don't want to give anything away, but stay tuned this upcoming season to find out.
BOTTOM LINE
So, in conclusion, how do I suggest you approach the yearly NFL tradition that is "Experts A, B, C, and D on WXYZ network sit around piece of furniture E, and bestow upon the masses their win predictions for teams 1, 2, and 3"? Based on the accuracy rankings and statistics for 2009 that I presented in this post, here's how:
- They're called "professional handicappers" for a good reason.
- AccuScore was hired by CBS Sports and ESPN to do predictions for a good reason.
- WhatIfSports calls itself "the sports simulation destination" for a good reason.
- Listen to Mike and Mike in the Morning during the months of August and September.
- Along with peace, give FO a chance. They probably were the victims of bad luck last season, and the non-projection side of their operation remains beyond reproach. However, over on the projection side, if their 2010 predictions end up being just as inaccurate as their 2009 projections, it might be time to start thinking about a mad dash for the lifeboats.
- Knowledge ≠ predictive power. Ignorance is bliss. All that glitters isn't gold. Pick your favorite overused idiom to articulate that style does not necessarily mean substance; just not that annoying "lies, damned lies, and statistics" one, OK?
p.s. There are a lot of specific discussion points I couldn't fit into this post. Some of the most interesting to me include, but are not limited to:
- NFL expert Gregg Easterbrook making predictions when, ironically, he's the first one in line to poke fun at the predictions of NFL experts;
- How much you would have profited had you used Vegas Watch's Prospective Line Estimate to make your futures bets prior to the 2009 season;
- How Vegas Watch could be so accurate with one set of predictions and so inaccurate with another (Hint: FO's win projections were involved);
- The NFL teams that ruined or saved a given expert's or type of experts' accuracy in 2009;
- The laughable inaccuracy of expert playoff predictions;
- Explanations for some of the methods behind the Stat Geeks' projections; and
- More about the methods I used in my analysis.
If you want to discuss any of this stuff, hit me up in the comments section.
*Technically, Dim's prediction for each team was a random number between 0 and 16, but constrained by the expected number of teams with a specific win total given the binomial distribution for n = 16 games and p = .50. Specifically, in a 16-game season where all teams have a 50% chance of winning any given game, the random distribution of team win totals according to the binomial distribution is as follows: 1 team is expected to win 12 games, 2 are expected to win 11 games, 4 are expected to win 10 games, 6 are expected to win 9 games, 6 are expected to win 8 games, 6 are expected to win 7 games, 4 are expected to win 6 games, 2 are expected to win 5 games, and 1 is expected to win 4 games. If I had ignored the binomial distribution, and just let Dim select 32 random numbers between 0 and 16, he would have basically had no chance of meeting my strict 256-win requirement, and a not-much-better chance of meeting my lenient 253-to-259-win requirement.
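For what it's worth, here's one way the binomial constraint could be implemented; the exact mechanics aren't spelled out above, so treat this as an illustration rather than the actual procedure, and note that the team labels are placeholders.

```python
# One way to implement Dim's constrained randomness: build the pool of win totals
# dictated by the binomial expectation above (it sums to exactly 256 wins across
# 32 teams), then shuffle and deal one total to each team. This is an illustration;
# the author's exact mechanics aren't specified.
import random

pool = ([12] * 1 + [11] * 2 + [10] * 4 + [9] * 6 +
        [8] * 6 + [7] * 6 + [6] * 4 + [5] * 2 + [4] * 1)
assert len(pool) == 32 and sum(pool) == 256   # meets the strict 256-win requirement

teams = [f"team_{i}" for i in range(1, 33)]   # placeholder team labels
random.shuffle(pool)
dim_predictions = dict(zip(teams, pool))      # random assignment of win totals
```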