AUTHOR'S NOTE: This is Part 1 of a 5-part series on predicting the career performance of NFL QBs. I'll be posting a new installment every day this week at 4 p.m. PDT.
A couple of years ago, I did a post detailing my re-analysis of David Lewin's QB prediction system, known as the Lewin Career Forecast (LCF). The LCF projects the Defense-Adjusted Yards Above Replacement per Game (DYAR/G) for a QB taken in the first 2 rounds of the NFL draft, based on that QB's college completion percentage (Comp%) and the number of games he started in college (GS). To spare you the week of living you'd lose if you went back and read that novel-length post, here's the gist. Although college Comp% and GS do predict NFL performance, the margin of error for the predictions is pretty huge because of the small sample size. There's also the slight user-unfriendliness problem of "What the heck is DYAR/G, and what are the odds anyone's going to know what I'm talking about if I bring it up in an argument about QB draft picks?" To solve these problems, I suggested (a) a heavy dose of statistical humility when making LCF-based predictions, and (b) using fantasy points per game (FFPts/G) as the prediction metric instead of DYAR/G.
Well, it's been 2 years, and there's a decent chance that the 49ers might actually take a QB with one of their first 2 picks next week, so I figured now is as good a time as any to take another look at the LCF, to check whether any variables besides Comp% and GS are predictive, and to step back for a more philosophical discussion about the general idea of trying to predict QB performance in the first place. First up, let's talk about the Wonderlic.
After the jump, I tell you why Wonderlic scores should not have any place in a discussion about whether or not high QB draft picks are going to be successful in the NFL...
I'm going to go ahead and assume you all know what the Wonderlic (full name: Wonderlic Cognitive Ability Test, née Wonderlic Personnel Test) is. If you don't, it's a short, timed test of general cognitive ability that NFL prospects take at the Scouting Combine. When it comes to projecting college QBs into the NFL, the Wonderlic invariably seems to find its way into the conversation, and I don't just mean the conversation between Hyperactive TV Draft Expert Y and Hyperactive TV Draft Expert Z. I'm also talking about conversations in the football stats world.
For instance, setting the LCF aside for a moment, the QB prediction model du jour is the so-called Rule of 26-27-60, developed by Sports Illustrated's John Lopez. In this rule of thumb, the "26" means a college QB should have a Wonderlic score of at least 26, the "27" means he should have started at least 27 games, and the "60" means he should have completed at least 60% of his passes. Those familiar with the LCF will immediately notice that the 26-27-60 rule is basically just LCF + Wonderlic, with the LCF's 37-start threshold changed to 27 (possibly for a convenient reason I'll get into later this week). The 26-27-60 rule has caught fire around NFL draft time the past couple of years, as the 20,000-plus Google results for "Rule of 26-27-60" attest. It's even shown up on Cincy Jungle, SBN's Bengals blog.
Another QB prediction model that incorporates the Wonderlic is geared more towards the fantasy football side of things and was developed by R.C. Fisher of Fantasy Football Metrics. This one is much more thorough than a simple rule of thumb, so I suggest you read Fisher's article if you're interested in the details. For our purposes, though, the point is that Wonderlic scores are mathematically incorporated into the rating the model spits out.
So, given that some stat geeks seem to think the Wonderlic is useful in a QB projection context, my question is, "Is it, really?" The answer, it turns out, is incredibly simple and straightforward, and, though apparently unbeknownst to the vast majority of fans and pundits, it has almost reached conventional-wisdom status among researchers.
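Because the rule is just three cutoffs joined by "and," it fits in a few lines. Here's a minimal Python sketch, assuming each threshold is inclusive ("at least"), as described above. The function name and the `min_starts` parameter are hypothetical names for illustration only; the parameter just makes it easy to swap in the 37-start LCF cutoff for comparison.

```python
def passes_26_27_60(wonderlic: int, games_started: int, comp_pct: float,
                    min_starts: int = 27) -> bool:
    """Return True if a QB prospect clears every threshold of the Rule of
    26-27-60: Wonderlic >= 26, college starts >= min_starts (27 by default),
    and college completion percentage >= 60."""
    # All three conditions must hold; missing any single one fails the rule.
    return wonderlic >= 26 and games_started >= min_starts and comp_pct >= 60.0


# Made-up prospects for illustration only, not real players.
print(passes_26_27_60(28, 35, 62.5))                 # clears all three cutoffs
print(passes_26_27_60(22, 40, 65.0))                 # fails on Wonderlic
print(passes_26_27_60(30, 30, 61.0, min_starts=37))  # fails the 37-start cutoff
```

Note what the rule doesn't do: it's a hard pass/fail filter, so a prospect who misses one cutoff by a hair fails just as badly as one who misses all three.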
CUTTING TO THE CHASE
OK, let me first show you what a statistical relationship looks like in the context of QB performance. Below is a graph that plots college GS (horizontal axis) and NFL FFPts/G (vertical axis) for the 57 NFL QBs who were drafted in the first 2 rounds from 1992 to 2008:
That white line running through the middle of the graph shows the trend in FFPts/G as college GS increases, and "r = .398" is the correlation between the two. For the non-technical folks out there, the trend line and the correlation show a clear positive relationship: the more games a QB starts in college, the better he tends to perform over his NFL career. This relationship is statistically significant at roughly the .001 level, which means that if college starts and NFL production were truly unrelated, a correlation this strong would turn up by chance only about once in a thousand tries. For the poker players out there, think about it this way: Hitting a 1-outer on the river is over 20 times more likely than finding a relationship this strong through dumb luck.
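If you want to check the significance claim yourself, here's a minimal Python sketch (standard library only) that turns r = .398 and n = 57 into a t statistic and approximate p-values. To keep it dependency-free, it uses the Fisher z approximation rather than the exact t distribution; the function names are hypothetical, and the 1-in-46 river odds assume a heads-up hand where you know only your own two hole cards and the four board cards.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson correlation r, computed
    from n pairs, differs from zero (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def fisher_z_p_values(r: float, n: int) -> tuple[float, float]:
    """Approximate (one-tailed, two-tailed) p-values for r. atanh(r) is
    roughly normal with standard error 1/sqrt(n - 3) when there is no
    true correlation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    one_tailed = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of N(0, 1)
    return one_tailed, 2 * one_tailed


r, n = 0.398, 57  # college GS vs. NFL FFPts/G, from the graph above
t = t_statistic(r, n)
p_one, p_two = fisher_z_p_values(r, n)

# Chance of hitting a 1-outer on the river: 1 card out of 46 unseen.
one_outer = 1 / 46

print(f"t = {t:.2f} with {n - 2} df")
print(f"p (one-tailed) ~ {p_one:.4f}, p (two-tailed) ~ {p_two:.4f}")
print(f"1-outer is ~{one_outer / p_one:.0f}x as likely as the one-tailed p")
```

With these inputs, the one-tailed p lands right around .001 and the two-tailed p at about twice that, so which test you use matters a little at the margin; either way, the relationship is far too strong to write off as a fluke.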
Now, let's compare that with the graph below, which plots the Wonderlic scores (horizontal axis) and NFL FFPts/G (vertical axis) for the same group of QBs (except the 6 for which I couldn't find Wonderlic scores):
As you can see, there's just a random blob of dots that doesn't trend in any direction. In contrast to the GS graph, the trend line and correlation here show no statistical relationship whatsoever. Put simply, a better Wonderlic score does not predict a better NFL career for QBs.
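For a sense of scale, here's a small Python sketch of how strong a correlation would need to be before the Wonderlic graph showed a statistically significant relationship. It assumes a two-tailed test at the conventional .05 level (a far more lenient bar than the .001 level above) and uses the Fisher z approximation; `critical_r` is a hypothetical helper name, not part of any library.

```python
import math


def critical_r(n: int, z_crit: float = 1.96) -> float:
    """Smallest |r| that reaches two-tailed significance at about the .05
    level for n pairs, via the Fisher z approximation (the standard error
    of atanh(r) is 1/sqrt(n - 3))."""
    return math.tanh(z_crit / math.sqrt(n - 3))


n_gs = 57             # QBs in the games-started graph
n_wonderlic = 57 - 6  # same group minus the 6 without Wonderlic scores

print(f"GS graph: |r| must exceed ~{critical_r(n_gs):.2f} (observed r = .398)")
print(f"Wonderlic graph: |r| must exceed ~{critical_r(n_wonderlic):.2f}")
```

With 51 QBs, the cutoff comes out to roughly ±.28. Even by this lenient standard, the Wonderlic data show no relationship, while the GS correlation of .398 clears its own bar (about .26 for 57 QBs) comfortably.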
THEN WHY DO WE STILL TALK ABOUT THE WONDERLIC AT DRAFT TIME?
What's interesting to me is that the utter uselessness of Wonderlic scores isn't some groundbreaking discovery I've just introduced to the world here. There's a 2005 study by McDonald Mirabile and a 2009 study by Lyons et al., both of which found no relationship between Wonderlic scores and NFL QB performance. In fact, the Lyons et al. study found no relationship between Wonderlic scores and either performance or salary at any position on the field. If this information has been out there since 2005, then why do we still talk about the Wonderlic at draft time, especially when it comes to QBs? Not surprisingly, I have a couple of hypotheses.
First, I think the idea that above-average intelligence is an important attribute for a QB has been baked into the cake ever since football scouting, and the popular interest in it, emerged from the Stone Age. In a business where the slightest edge is a valuable commodity, people are going to latch onto the latest and greatest thing to appear on the horizon (see: the California Gold Rush of 1848). So it doesn't surprise me that the history of the Wonderlic in NFL player evaluation circles went something like, "Tom Landry looks for edge around 1970. Tom Landry finds Wonderlic a little after 1970. Tom Landry's Cowboys win a lot during the 1970s. Other NFL teams think Tom Landry's discovery of the Wonderlic is what made his teams win. Other NFL teams start copying Tom Landry because they, too, want to win. Intrepid journalist finds out that most NFL teams rely on a little-known intelligence test called the Wonderlic. Intrepid journalist spreads the news. Cake put in oven. Widespread popular use of the internet, and all the readily available information that comes with it, does not emerge until about 15 years later. Cake baked."
This is purely speculation on my part, but evidence of this downward informational spiral can be seen in the 26-27-60 rule I mentioned at the beginning of the post. In essence, what you had here was a writer for (site decorum) SI coming along 5 years (!!!) after that Mirabile study (and after the LCF, I might add), noticing that JaMarcus Russell bombed his Wonderlic and busted out of the NFL, and doing what columnists on deadline do best: developing something in a pinch that sounded good and was timely, sophisticated fact-finding be damned. Again, this is speculation, but I'm guessing a story about the "we should have seen this coming" predictiveness of Russell's Wonderlic score wouldn't have gone over well with SI's editors if, you know, the Wonderlic actually didn't matter. Just saying.
This kind of "but...but, it just has to matter!" incredulity isn't reserved for SI writers. It also shows up in the fantasy-oriented prediction model I mentioned at the top, although what Fisher did there is a little savvier, statistically speaking. Here's just a taste, from his list of model variables:
WONDERLIC/IQ - an unavoidable and a key data point we have access to on basic IQ and problem solving. There is a definite correlation to low Wonderlic scores and QB disappointment.
Did you catch that rose popping out of the magic wand? He doesn't claim a definite correlation between Wonderlic scores and QB performance; he claims a definite correlation between low Wonderlic scores and QB disappointment. Again, a write-up of a system that incorporates Wonderlic scores wouldn't be as interesting if Wonderlic scores didn't matter. So, rather than acknowledging that they don't, which I just demonstrated to you, he says Wonderlic scores matter only at the low end of the scale! Ta-da! Oh, and here's a rabbit!
At this point, almost everyone reading Fisher's post just accepts this as true and moves on, unaware that they've been duped by something masquerading as statistics; not to mention that football statisticians everywhere end up spending additional years of their lives defending their field thanks to magic tricks like these. Thankfully, you have me here at Niners Nation to prove to you that Wonderlic scores don't matter at the low end of the scale, in the middle of the scale, or at the high end of the scale.
If you take another look at the Wonderlic graph above, you'll notice I've highlighted the data points for 7 specific QBs. I chose them deliberately: they represent QBs who had wildly different career performances despite sitting at essentially the same point on the Wonderlic scale. For instance, low-Wonderlic-scorer Donovan McNabb has averaged over 13 FFPts/G thus far in his career, whereas low-Wonderlic-scorer Heath Shuler averaged less than 5 FFPts/G during his. On the other end of the spectrum, Wonderlic wunderkind Aaron Rodgers is doing pretty well for himself, whereas another wunderkind, Kellen Clemens, has been so bad that he forced Brett Favre back into our lives after 4 months of peace.
Of course, then there's this. If you look at the QBs at the very top of the FFPts/G scale and the Wonderlic scale, you'll notice that Peyton Manning has had the best career despite apparently being only moderately intelligent. In addition, and this one is specifically for Niner fans, Alex Smith has had an average NFL career despite scoring 9 points higher than the average chemist.
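Fisher's "low end" claim is at least testable in principle: pick a cutoff in advance, split the QBs into those below it and those at or above it, and check whether the low group really averages fewer FFPts/G. Here's a minimal Python sketch of that comparison using Welch's t statistic. Every number in it is made up for illustration (it is not the real QB sample, and the cutoff of 20 is not a threshold Fisher proposed), and the helper names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def split_by_cutoff(scores, ffpts_per_game, cutoff):
    """Split FFPts/G values into QBs below the Wonderlic cutoff and QBs
    at or above it."""
    low = [f for s, f in zip(scores, ffpts_per_game) if s < cutoff]
    high = [f for s, f in zip(scores, ffpts_per_game) if s >= cutoff]
    return low, high


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two groups that
    may have unequal variances; negative means group a averages lower."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Toy numbers for illustration only; these are NOT the real QB sample.
toy_scores = [14, 16, 18, 24, 28, 32, 35, 38]
toy_ffpts = [13.0, 4.5, 9.0, 11.0, 6.0, 12.5, 5.0, 10.0]

low, high = split_by_cutoff(toy_scores, toy_ffpts, cutoff=20)  # hypothetical cutoff
print(f"below cutoff: n = {len(low)}, mean FFPts/G = {mean(low):.1f}")
print(f"at/above cutoff: n = {len(high)}, mean FFPts/G = {mean(high):.1f}")
print(f"Welch t = {welch_t(low, high):.2f}")
```

The important design choice is fixing the cutoff before looking at the data. Sliding it around until a few busts land on the wrong side of the line is just another way of pulling a rabbit out of a hat. Also note how much each group varies internally; spreads like the McNabb-versus-Shuler gap are what swamp any difference in the means.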
My second hypothesis about why the Wonderlic lingers in draft conversations is loosely related to the first. Namely, take a look at the outlets where the "Wonderlic doesn't mean diddly for QB performance" write-ups were published, and then compare them to the outlets behind the "Wonderlic matters" write-ups. Both the Mirabile study and the Lyons et al. study were published in peer-reviewed journals, the Lopez article was published in SI, and the Fisher model was posted on a fantasy football site. Now, I ask you, "How would you rank these sources by the number of casual football fans who have read them?" OK, that's a rhetorical question. Obviously, SI and fantasy football sites attract just a tad more casual football-watching eyeballs than academic journals, not to mention that secondary sources like ESPN, CBS, etc. rely heavily on primary sources like SI and now feature weekly (and sometimes daily) fantasy football coverage. So, is it unreasonable to suggest that the outlets feeding Wonderlic talk to tens of millions of people have driven the "Wonderlic matters" narrative? I think it's pretty darn reasonable, actually.*
And this brings me back to a comment I made earlier, which I'll expand on in closing. Here on NN, we've had our internecine battles about the value of stats in football. Although I'm obviously partial to their value, I've also been trained to recognize when they're of no value. So I'm sympathetic to the other side of the argument when the anti-stats view is offered as a sincerely held, informed opinion. The problem this whole Wonderlic thing highlights is that, in my mind, the anti-stats view is very often totally misinformed. Media Outlet X talks about Irrelevant Predictor Z as if it's relevant; normal people, who have better things to do with their lives than evaluate statistical claims, soak that up through osmosis, notice the spectacular failures of Irrelevant Predictor Z, and then throw up their hands and dismiss stats altogether.
Here's the thing. Just as not all scouting methods are good, not all statistical methods are good. Just as not all football pundits know what they're talking about, not all armchair football statisticians know what they're talking about. The curious reality I've found, though, is that when we notice Mel Kiper's failures, we just call him an idiot and ignore anything he says after that. When we notice a statistical failure, however, we call the entire field of football statistics idiotic. For that reality, and the years of blogging I'll never get back because of it, I totally blame the Wonderlic. For the love of Peyton, here's hoping you will too.
*This national media trend may be shifting thanks, in part, to SBN. There's now my post, a post by Joel Thorman, and one by Andy Hutchins, which actually cites the Lyons et al. study. Of course, even Thorman seems close to falling victim to the "but...but, it just has to matter" trap.