2011 NFL Draft: An Expanded Model for Predicting QB Career Performance (Cont'd)

AUTHOR'S NOTE: This is the continuation of yesterday's post. I'm just starting it from where I left off. Click here to read the first half. Oh, and again, WARNING: EXPLICIT STATISTICAL CONTENT!

All in all, I tested 11 models:

 

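| Name | Rounds | Sample N | Predictor A | Predictor B | Predictor C | R-squared | Test N | M Error | SD Error |
|---|---|---|---|---|---|---|---|---|---|
| FD1 | 4NR | 84 | GS | Div1A? | Pick | 0.385 | 15 | 3.49 | 1.67 |
| FD2 | 3R | 62 | GS | Comp% | Pick | 0.388 | 14 | 3.49 | 1.95 |
| LCF1 | 4NR | 84 | GS | Comp% | | 0.093 | 15 | 3.67 | 2.22 |
| LCF2 | 3R | 62 | GS | Comp% | | 0.150 | 14 | 3.95 | 2.52 |
| LCF3 | 2NR | 44 | GS | Comp% | | 0.360 | 12 | 4.64 | 2.58 |
| LCF4 | 2R | 44 | GS | Comp% | | 0.359 | 12 | 4.77 | 2.70 |
| LCF1Rev1 | 4NR | 84 | GS | Comp% | Pick | 0.374 | 15 | 3.36 | 1.62 |
| LCF1Rev2 | 4NR | 84 | GS | | Pick | 0.359 | 15 | 3.33 | 1.41 |
| LCF2Rev | 3R | 62 | GS | Comp% | Pick | 0.388 | 14 | 3.49 | 1.95 |
| LCF3Rev | 2NR | 44 | GS | Comp% | Pick | 0.431 | 12 | 3.81 | 2.23 |
| LCF4Rev | 2R | 44 | GS | Comp% | Pick | 0.431 | 12 | 3.94 | 2.38 |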

After the jump, I'll discuss the table...

Here's how you'd read this table. In the "Name" column, FD stands for me, LCF stands for the Lewin Career Forecast, and "Rev" means "revised." In the "Rounds" column, the number is how many rounds of the draft were included in the sample, "R" means "with replacement," and "NR" means "without replacement." The sample "N" is just the number of QBs that were in the sample. For the predictors, "GS" stands for "college games started," "Comp%" stands for "college completion percentage," and "Div1A?" stands for "did QB enter NFL draft from Div1A school?" Finally, if you see a predictor crossed out, it means that it did not significantly predict FFPts/G in that particular model.

So, for instance, Model LCF1Rev1 was a revised version of the LCF wherein I used non-replaced data from the 84 QBs drafted in Rounds 1-4 from 1993-2006 to predict QBs' FFPts/G from their college games started, college completion percentage, and/or pick number. In that model, it turned out that Comp% did not have a significant impact on FFPts/G, so I then tested Model LCF1Rev2, which did not have Comp% in it. Capisce?

Basically, the general idea here was to (a) test a 4-round model and a 3-round model that were native to my analysis, (b) test 2 models that are exact replicas of the LCF, (c) test 4-round and 3-round versions of the LCF, and (d) test various revised LCF models that throw the pick variable into the mix given that it was the most significant predictor of FFPts/G back when I ran simple correlations.

OK, now for the good stuff. Each model test results in a linear equation that relates the predictors to FFPts/G. R-squared measures how well that regression equation fits the data. It goes from 0 to 1, can be expressed as a percentage, and the closer to 1 it is, the better.
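If you want to see what R-squared is actually doing, here's a minimal sketch in Python. It uses the textbook definition (1 minus the ratio of leftover squared error to total squared variation); the function name and the numbers are made up for illustration and are not data from this study.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that `predicted` accounts for."""
    mean_actual = sum(actual) / len(actual)
    # Squared error left over after using the model's predictions
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error you'd have if you just guessed the average every time
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical FFPts/G values, for illustration only
actual = [10.0, 8.0, 6.0, 4.0]
predicted = [9.0, 8.0, 7.0, 4.0]
print(round(r_squared(actual, predicted), 3))
```

A model that nails every QB would score 1, and a model no better than guessing the average would score 0.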

After running all the model tests, I then used the linear equation spit out by each model to test how well it predicted career FFPts/G for QBs drafted from 2007-2009. In the last 3 columns of the above table, "N" is again how many QBs were included in a particular test. The actual test measures here were the mean (M) and standard deviation (SD) of absolute error. Lower M Error means more accurate prediction, and lower SD Error means less varied prediction (i.e., fewer huge misses).
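Here's a rough sketch of how those two error measures can be computed. The function name and the numbers are hypothetical, and I'm assuming the sample (n - 1) version of the standard deviation, which may or may not match the software setting behind the table.

```python
from statistics import mean, stdev

def error_summary(actual, predicted):
    """Return (M Error, SD Error) of the absolute prediction errors."""
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    # stdev() is the sample standard deviation (an assumption, see above)
    return mean(abs_errors), stdev(abs_errors)

# Hypothetical FFPts/G values, for illustration only
actual = [9.0, 5.0, 12.0]
predicted = [8.0, 7.0, 9.0]
m_error, sd_error = error_summary(actual, predicted)
print(m_error, sd_error)
```

In this toy example the misses are 1, 2, and 3 points, so the average miss is 2 and the spread of the misses is 1.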

In essence, R-squared tells you how well the model explains the past, whereas the error values tell you how well the model predicts the future. Ideally, what we want is a model that does both well, but we lean a little bit towards the prediction side of things if the results are close.

MODEL EVALUATIONS

The first thing you probably notice in the table is that the 2 models that simply extend the LCF past 2 rounds (i.e., Models LCF1 and LCF2) were atrocious at both explanation and prediction. Basically, this represents the very reason why you're only supposed to use the LCF for QBs taken in the first 2 rounds. After that, GS and Comp% do a horrible job by themselves. Of course, you might also notice that even the basic LCF 2-round models predict badly despite explaining the past well (i.e., high errors, but high R-squared). Taken together, these results seem to suggest that the LCF, as originally specified, is an inadequate model if we want to (a) predict future FFPts/G, and (b) do so for players drafted after the 2nd round.

The next model I'll bring your attention to is my 4-round model, FD1, which seems to do a pretty good job at both explanation and prediction. One reservation I have about it, though, is that it's the only model to have identified Div1A? as a meaningful predictor, and I think that result was the simple byproduct of its 4-round nature. This is because, if you look at all the non-D1A QBs that were taken in the first 4 rounds from 1993-2006, most were taken in the 4th round, and the only one to have ever really amounted to anything was Steve McNair, who also happened to be the only one taken in the 1st round. Basically, the result is telling us that non-D1A QBs are going to suck. However, it should really say that non-D1A QBs taken in Rounds 2-4 are going to suck, especially because the only such QB taken from 2007-2009 was Joe Flacco, who also happens to have been a non-sucking 1st-rounder. All in all, I don't think Model FD1 is the best one.

So that leaves the revised LCF models (you'll notice my Model FD2 actually ended up being identical to Model LCF2Rev). Looking at these models, you see right off the bat that the 2-round versions (i.e., Models LCF3Rev and LCF4Rev) suffer the same fate as their non-revised counterparts. Namely, they're really good at explaining the past, but horrible at predicting the future (i.e., high errors, but high R-squared). So, out they go.

Of the remaining 3, I'm going to choose LCF1Rev2 as the winner because it's the most predictive model, and it explains the past nearly as well as any of the others. Another reason I prefer it to Model LCF2Rev is that it achieves our goal of extending the LCF as deeply into the draft as possible. This goal was achieved, however, at the expense of Comp%, the statistical importance of which washes out after the 3rd round. You can see this pretty clearly in the table, which shows that Comp%, although important in all of the 3-round models, was not a meaningful predictor in any of the 4-round models.
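To make that comparison concrete, here's a small sketch that takes the 3 remaining models' numbers straight from the table and picks the one with the lowest M Error. The dictionary layout and variable names are just my illustration, not part of the original analysis.

```python
# (R-squared, M Error, SD Error) copied from the table above
remaining = {
    "LCF1Rev1": (0.374, 3.36, 1.62),
    "LCF1Rev2": (0.359, 3.33, 1.41),
    "LCF2Rev": (0.388, 3.49, 1.95),
}

# Most predictive model = lowest mean absolute error on the 2007-2009 QBs
winner = min(remaining, key=lambda name: remaining[name][1])

# How much explanatory power the winner gives up versus the best R-squared
best_r2 = max(r2 for r2, _, _ in remaining.values())
r2_gap = round(best_r2 - remaining[winner][0], 3)

print(winner, r2_gap)
```

The winner also happens to have the lowest SD Error of the three, so it wins on both test measures while giving up only about 0.03 in R-squared.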

So, without further ado, here's the equation for Model LCF1Rev2, which we can now use to predict the career FFPts/G for QBs drafted in the first 4 rounds:

FFPts/G = 7.10 - 0.05*Pick + 0.08*GS

Basically, this equation says that FFPts/G decreases by 0.5 for every 10 additional picks farther into the draft, and increases by about 1 for every 12 additional college games started. For example, Mark Sanchez started only 15 games at USC, but was the 5th pick in the draft. This translates to a career prediction of 8.01 FFPts/G, which happens to be only 1 FFPt/G off from his actual career average thus far (9.01). Incidentally, this prediction is 1.50 FFPts/G better than the original, 2-round LCF prediction, presumably because the LCF heavily penalized him for having started only 15 games. In our model, however, pick is the best predictor, so the fact that he was drafted 5th overall ends up (correctly) being more important.
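If you'd like to plug in your own QBs, here's a minimal sketch of the equation as a Python function. The function and argument names are mine, and the coefficients are the rounded ones printed above, so results can differ by a few hundredths from numbers computed with unrounded coefficients (for Sanchez, the rounded version gives about 8.05 rather than 8.01; I'm assuming rounding explains that gap).

```python
def predict_ffpts_per_game(pick, games_started):
    """Model LCF1Rev2 with the rounded coefficients from the post."""
    return 7.10 - 0.05 * pick + 0.08 * games_started

# Mark Sanchez: 5th overall pick, 15 college games started
sanchez = predict_ffpts_per_game(pick=5, games_started=15)
print(round(sanchez, 2))

# Effect of sliding 10 picks, holding games started fixed
drop_per_10_picks = predict_ffpts_per_game(5, 15) - predict_ffpts_per_game(15, 15)
print(round(drop_per_10_picks, 2))
```

Keep in mind the model was only fit on QBs taken in Rounds 1-4, so it shouldn't be fed pick numbers from later in the draft.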

Well, hope you all enjoyed that. After the draft, I'll come back and use this equation to make predictions about the QBs taken in the first 4 rounds, especially if the 49ers happen to take one.