AUTHOR'S NOTE: This is the continuation of yesterday's post. I'm just starting it from where I left off. Click here to read the first half. Oh, and again, WARNING: EXPLICIT STATISTICAL CONTENT!
All in all, I tested 11 models:

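| Name | Rounds | N (sample) | Predictor A | Predictor B | Predictor C | R-squared | N (test) | M Error | SD Error |
|---|---|---|---|---|---|---|---|---|---|
| FD1 | 4NR | 84 | GS | Div1A? | Pick | 0.385 | 15 | 3.49 | 1.67 |
| FD2 | 3R | 62 | GS | Comp% | Pick | 0.388 | 14 | 3.49 | 1.95 |
| LCF1 | 4NR | 84 | GS | | | 0.093 | 15 | 3.67 | 2.22 |
| LCF2 | 3R | 62 | | Comp% | | 0.150 | 14 | 3.95 | 2.52 |
| LCF3 | 2NR | 44 | GS | Comp% | | 0.360 | 12 | 4.64 | 2.58 |
| LCF4 | 2R | 44 | GS | Comp% | | 0.359 | 12 | 4.77 | 2.70 |
| LCF1Rev1 | 4NR | 84 | GS | | Pick | 0.374 | 15 | 3.36 | 1.62 |
| LCF1Rev2 | 4NR | 84 | GS | | Pick | 0.359 | 15 | 3.33 | 1.41 |
| LCF2Rev | 3R | 62 | GS | Comp% | Pick | 0.388 | 14 | 3.49 | 1.95 |
| LCF3Rev | 2NR | 44 | GS | Comp% | Pick | 0.431 | 12 | 3.81 | 2.23 |
| LCF4Rev | 2R | 44 | GS | Comp% | Pick | 0.431 | 12 | 3.94 | 2.38 |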
After the jump, I'll discuss the table...
Here's how you'd read this table. In the "Name" column, FD stands for me, LCF stands for the Lewin Career Forecast, and "Rev" means "revised." In the "Rounds" column, the number is how many rounds of draft data were in the sample, "R" means "with replacement," and "NR" means "without replacement." The sample "N" column is just the number of QBs that were in the sample. For the predictors, "GS" stands for "college games started," "Comp%" stands for "college completion percentage," and "Div1A?" stands for "did the QB enter the NFL draft from a Div1A school?" Finally, if you see a predictor crossed out, it means that it did not significantly predict FFPts/G in that particular model.
So, for instance, Model LCF1Rev1 was a revised version of the LCF wherein I used non-replaced data from the 84 QBs drafted in Rounds 1-4 from 1993-2006 to predict QBs' FFPts/G from their college games started, college completion percentage, and/or pick number. In that model, it turned out that Comp% did not have a significant impact on FFPts/G, so I then tested Model LCF1Rev2, which did not have Comp% in it. Capisce?
Basically, the general idea here was to (a) test a 4-round model and a 3-round model that were native to my analysis, (b) test 2 models that are exact replicas of the LCF, (c) test 4-round and 3-round versions of the LCF, and (d) test various revised LCF models that throw the pick variable into the mix, given that it was the most significant predictor of FFPts/G back when I ran simple correlations.
OK, now for the good stuff. Each model test results in a linear equation that relates the predictors to FFPts/G. R-squared measures how well that regression equation fits the data. It goes from 0 to 1, can be expressed as a percentage, and the closer to 1 it is, the better.
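For anyone who wants to see what R-squared is actually doing, here is a minimal Python sketch. The function name and the numbers are made up for illustration and are not data from this analysis; it just computes 1 minus the ratio of leftover (residual) variation to total variation.

```python
def r_squared(actual, fitted):
    """Share of the variation in `actual` explained by `fitted`: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))  # variation the model misses
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)        # total variation around the mean
    return 1 - ss_res / ss_tot

# Made-up FFPts/G values, for illustration only.
actual = [2.0, 4.0, 6.0, 8.0]
perfect_fit = [2.0, 4.0, 6.0, 8.0]  # model reproduces every value
mean_only = [5.0, 5.0, 5.0, 5.0]    # model just guesses the average

print(r_squared(actual, perfect_fit))  # 1.0
print(r_squared(actual, mean_only))    # 0.0
```

A model that does no better than guessing the average lands at 0, and a model that nails every QB lands at 1.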
After running all the model tests, I then used the linear equation spit out by each model to test how well it predicted career FFPts/G for QBs drafted from 2007-2009. In the last 3 columns of the above table, "N" is again how many QBs were included in a particular test. The actual test measures here were the mean (M) and standard deviation (SD) of absolute error. Lower M Error means more accurate prediction, and lower SD Error means less varied prediction (i.e., fewer huge misses).
In essence, R-squared tells you how well the model explains the past, whereas the error values tell you how well the model predicts the future. Ideally, what we want is a model that does both well, but we lean a little bit towards the prediction side of things if the results are close.
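Here is a rough Python sketch of those two test measures. The function name and the numbers are hypothetical, and I'm assuming the sample version of the standard deviation, which may not match exactly what the stats package reported.

```python
from statistics import mean, stdev

def abs_error_summary(predicted, actual):
    """Return (mean, sample SD) of the absolute prediction errors."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return mean(errors), stdev(errors)

# Made-up predicted vs. actual FFPts/G for three QBs, for illustration only.
predicted = [8.0, 5.0, 3.0]
actual = [9.0, 2.0, 5.0]

m_error, sd_error = abs_error_summary(predicted, actual)
print(m_error, sd_error)  # 2.0 1.0
```

The misses here are 1, 3, and 2 points per game, so the typical miss (M Error) is 2 and the spread of the misses (SD Error) is 1.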
MODEL EVALUATIONS
The first thing you probably notice in the table is that the 2 models that simply extend the LCF past 2 rounds (i.e., Models LCF1 and LCF2) were atrocious at both explanation and prediction. Basically, this represents the very reason why you're only supposed to use the LCF for QBs taken in the first 2 rounds. After that, GS and Comp% do a horrible job by themselves. Of course, you might also notice that even the basic LCF 2-round models predict badly despite explaining the past well (i.e., high errors, but high R-squared). Taken together, these results seem to suggest that the LCF, as originally specified, is an inadequate model if we want to (a) predict future FFPts/G, and (b) do so for players drafted after the 2nd round.
The next model I'll bring your attention to is my 4-round model, FD1, which seems to do a pretty good job at both explanation and prediction. One reservation I have about it, though, is that it's the only model to have identified Div1A? as a meaningful predictor, and I think that result was the simple byproduct of its 4-round nature. This is because, if you look at all the non-D1A QBs that were taken in the first 4 rounds from 1993-2006, most were taken in the 4th round, and the only one to have ever really amounted to anything was Steve McNair, who also happened to be the only one taken in the 1st round. Basically, the result is telling us that non-D1A QBs are going to suck. However, it should really say that non-D1A QBs taken in Rounds 2-4 are going to suck, especially because the only such QB taken from 2007-2009 was Joe Flacco, who also happens to have been a non-sucking 1st-rounder. All in all, I don't think Model FD1 is the best one.
So that leaves the revised LCF models (you'll notice my Model FD2 actually ended up being identical to Model LCF2Rev). Looking at these models, you see right off the bat that the 2-round versions (i.e., Models LCF3Rev and LCF4Rev) suffer the same fate as their non-revised counterparts. Namely, they're really good at explaining the past, but horrible at predicting the future (i.e., high errors, but high R-squared). So, out they go.
Of the remaining 3, I'm going to choose LCF1Rev2 as the winner because it's the most predictive model, and it explains the past nearly as well as any of the others. Another reason I prefer it to Model LCF2Rev is that it achieves our goal of extending the LCF as deeply into the draft as possible. This goal was achieved, however, at the expense of Comp%, the statistical importance of which washes out after the 3rd round. You can see this pretty clearly in the table, which shows that Comp%, although important in all of the 3-round models, was not a meaningful predictor in any of the 4-round models.
So, without further ado, the equation for Model LCF1Rev2, which we can now use to predict the career FFPts/G for QBs drafted in the first 4 rounds, is
FFPts/G = 7.10 - 0.05*Pick + 0.08*GS
Basically, this equation says that FFPts/G decreases by 0.5 for every 10 additional picks farther into the draft, and it increases by about 1 for every 12 additional college games started. For example, Mark Sanchez started only 15 games at USC, but was the 5th pick in the draft. This translates to a career prediction of 8.01 FFPts/G, which happens to be only 1 FFPt/G off from his actual career average thus far (9.01). Incidentally, this prediction is 1.50 FFPts/G better than the original, 2-round LCF prediction, presumably because the LCF heavily penalized him for having started only 15 games. In our model, however, pick is the best predictor, so the fact that he was drafted 5th overall ends up (correctly) being more important.
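If you want to plug in your own QB, here is a small Python sketch of the equation. The function name is mine, and it assumes the Pick term is subtracted, which is what "decreases by 0.5 for every 10 additional picks" implies. It uses the rounded coefficients shown above, so it lands a few hundredths away from the 8.01 quoted for Sanchez; presumably the original calculation carried more decimal places.

```python
def predict_ffpts_per_game(pick, games_started):
    """Career FFPts/G from Model LCF1Rev2, using the rounded published coefficients."""
    return 7.10 - 0.05 * pick + 0.08 * games_started

# Mark Sanchez: 5th overall pick, 15 college games started.
sanchez = predict_ffpts_per_game(pick=5, games_started=15)
print(round(sanchez, 2))  # 8.05 with the rounded coefficients

# Effect of moving 10 picks deeper, and of 12 more college starts.
pick_effect = predict_ffpts_per_game(15, 15) - sanchez
starts_effect = predict_ffpts_per_game(5, 27) - sanchez
print(round(pick_effect, 2), round(starts_effect, 2))  # -0.5 0.96
```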
Well, hope you all enjoyed that. After the draft, I'll come back and use this equation to make predictions about the QBs taken in the first 4 rounds, especially if the 49ers happen to take one.