What follows is a layman’s tour through STATISTICS PARK, in which I, the layman in question, at first marvel at the DVOAlbertasaurs and later, when the electric fences fail, try not to be eaten by any of the DYARodactyls that somehow evolved in this god-forsaken land. The truth is, this whole adventure sounded fun at first. But if there’s one thing I’ve learned along the way, it’s that Stats Will Find A Way. Stats. Will. Find. A. Way.
Actually, I’m not going to take on any of Football Outsiders’ hallowed ground. I’ll leave that to Florida Danny. But I am inviting you into the land of mildly advanced stats. See, last season I began to develop a simple system for projecting QB productivity for the coming season (lots of detail you may or may not want to deal with in that link). The original system lacked polish, but produced good enough results to leave me encouraged about its usefulness. The principles were simple:
- With 600 or more plays (attempts + sacks) in a full season, quarterback is one of the few positions on the football field that accrues enough attempts in a single season to approach the point at which critical rates become stable.
- The above being the case, a three-year sample should provide enough stability over a short enough period of time to develop reliable projections for the coming season. Confining the sample to three years avoids the problem of using, say, Tom Brady’s rookie season to project his 2013 stats, while also limiting the damage that a clear outlier, such as Josh Freeman’s 2010 season, can do to the projection.
That’s it. Use a QB’s statistical averages from the last three seasons to project the upcoming one. To start, that’s almost all you need to know. For those more interested in the details, I’ve gone ahead and tried to explain things -- and particularly some of the quirks of the system -- more fully after the big picture.
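As a rough sketch of the three-year idea (not the actual spreadsheet), the rates can be pooled over the most recent three seasons. The `Season` fields and the `pooled_rates` name are hypothetical, and pooling by total attempts instead of taking a plain average of yearly rates is an assumption:

```python
from dataclasses import dataclass


@dataclass
class Season:
    # Hypothetical container for one season of raw passing totals.
    attempts: int
    completions: int
    touchdowns: int
    interceptions: int


def pooled_rates(seasons, window=3):
    """Pool the most recent `window` seasons (ordered oldest to newest).

    Assumption: rates are pooled over total attempts, so a 600-attempt
    season counts for more than a 200-attempt one.
    """
    recent = seasons[-window:]
    attempts = sum(s.attempts for s in recent)
    return {
        "comp_pct": sum(s.completions for s in recent) / attempts,
        "td_pct": sum(s.touchdowns for s in recent) / attempts,
        "int_pct": sum(s.interceptions for s in recent) / attempts,
    }
```

Anything older than the window, such as a long-ago rookie year, simply drops out of the calculation.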
Quickly, though: Yes, these projections do show 44 QBs with 500 pass attempts. No, I do not project that 44 QBs will end the year with exactly 500 pass attempts. The pass attempt number is not what I’m interested in projecting here. The trick is that I’m not trying to answer the question, "What will this guy do in 2013?" Rather -- and it’s a subtle distinction -- I’m trying to answer the question, "If this guy plays enough to qualify in 2013, how will he perform?" So, if he does not play enough to qualify, then the projection doesn’t matter. And if he does, then I can simply adjust the pass attempts to match. More on all of that lower down.
What I’m hoping you’ll help me with is figuring out how to adjust the projections for guys who do not have the reliability of a full three-year sample to project from (rookies, first-, second-, or third-year starters, guys coming off of injuries, guys who haven’t started in a while etc.), and to try to help me identify players who may be in line for abrupt changes in productivity due to changes in the coaching or personnel around them (Sam Bradford and Matt Ryan were good examples of this artificial bump from last season).
For some of these low-sample players, I want you to know where I’m coming from, though, because they may seem odd. For rookies, I simply made projections for all QBs drafted in rounds 1-4. I did that to simplify the selection process, not because I actually expect Nassib to supplant Manning. To project rookies, I tried to find similar rookies based on Football Outsiders’ yearly Lewin projections. I also based the projections for rookies on the assumption that if they do play enough to qualify, and especially if they’re not first-round picks, it’s because they’re playing better than a particularly awful rookie floor that would get them benched.
Similarly, I wanted context to help me project second-year starters, and the like, so I went through a while ago and pulled together some historical comps, some of which worked out better than others, as you might see in that link. I used those comps to guide those projections.
So: Comment Starters:
- How do you think I should adjust the projections for the rookies, first-, second-, and third-year starters, based on your opinions of these players? These are inherently my worst projections, and they could use a healthy dose of eyeball-test input to be improved.
- What players do you believe are in line for a coaching- or personnel-related bump to their averages, like Bradford or Ryan saw in 2012? Also, why?
- Am I missing anybody who might contend for starting snaps? Last year, I almost missed Tannehill, so this is a legitimate concern of mine.
Here’s the full chart, complete with all the rates. Feel free to ask me about anything if it looks like I’ve made a mistake -- that is possible. It’s pretty interesting. At least, I think it is. And it can be improved with your help. Please help me improve it.
Now, the meat of this post is out of the way, and I invite you to just jump to the comments and go at it. If, however, you don’t mind reading more, the rest of the post is going to be dedicated to more fully explaining the system, its quirks, its challenges, and the solutions I’ve tried to implement.
There were other questions to be answered, though.
- How will overall productivity be measured?
DVOA and DYAR were both off the table because, for one, they are too difficult for me to calculate in a Google doc, and, for two, they apply weights to situations and, when possible, assuming a large sample of attempts, I prefer to apply weights to events. One solution to both of these problems is passer rating. A better solution is Adjusted Net Yards per Attempt (hereafter ANY/A). ANY/A applies yard values to touchdowns and interceptions (positive and negative values respectively), includes sacks as pass attempts and sack yards as negative passing yards, sums all of it with the QB’s total passing yards, and averages it per adjusted attempt (again, attempts + sacks). Simple. Elegant. Useful.
Now, ANY/A is not perfect. The yard values it uses for touchdowns and interceptions are rounded up and are not adjusted seasonally. This means that the yard value of a touchdown from 2001 is the same as the yard value of a touchdown from 2013, which is the same as the yard value of a touchdown in 1971 (and, worse, that the yard value of a touchdown in college is the same as the yard value of a touchdown in the pros). To me, that seems unnecessarily messy on both counts, all for the sake of convenience. But it remains, so far as I can tell, the best non-situationally weighted measure of production available.
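For reference, the commonly published version of ANY/A (the one Pro Football Reference lists) can be written out as below. The 20-yard and 45-yard weights are not stated anywhere in this post, so treat them as an assumption about which variant is in play:

```latex
\[
\text{ANY/A} =
\frac{\text{Yds} + 20 \cdot \text{TD} - 45 \cdot \text{INT} - \text{SkYds}}
     {\text{Att} + \text{Sk}}
\]
```

Here Yds is gross passing yards, SkYds is yards lost to sacks, Att is pass attempts, and Sk is sacks.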
Specifically, I’m using something approximating the averages from the last three years for completion percentage, touchdown percentage, interception percentage, yards per completion, sack percentage, and yards per sack. With those, I just plug in whatever "pass attempts" number I want, and the rest of the totals calculate automatically, including ANY/A.
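A minimal sketch of that plug-in step is below. It is not the actual Google doc: the function and key names are hypothetical, the 20 and 45 yard weights are the commonly published ANY/A values, and sack percentage is assumed to mean sacks per dropback (attempts + sacks).

```python
# Commonly published ANY/A weights; an assumption, not taken from the post.
TD_YARDS = 20
INT_YARDS = 45


def project(rates, attempts=500):
    """Turn the six rates into season totals for a chosen attempt count."""
    completions = attempts * rates["comp_pct"]
    yards = completions * rates["yards_per_completion"]
    touchdowns = attempts * rates["td_pct"]
    interceptions = attempts * rates["int_pct"]
    # Assumption: sack_pct = sacks / (attempts + sacks), solved for sacks.
    sacks = attempts * rates["sack_pct"] / (1 - rates["sack_pct"])
    sack_yards = sacks * rates["yards_per_sack"]
    any_a = (
        yards + TD_YARDS * touchdowns - INT_YARDS * interceptions - sack_yards
    ) / (attempts + sacks)
    return {
        "attempts": attempts,
        "completions": completions,
        "yards": yards,
        "touchdowns": touchdowns,
        "interceptions": interceptions,
        "sacks": sacks,
        "sack_yards": sack_yards,
        "any_a": any_a,
    }


# Made-up rates for illustration only, not any real quarterback.
example_rates = {
    "comp_pct": 0.60,
    "yards_per_completion": 12.0,
    "td_pct": 0.05,
    "int_pct": 0.02,
    "sack_pct": 0.06,
    "yards_per_sack": 7.0,
}
full_season = project(example_rates, attempts=500)
half_season = project(example_rates, attempts=250)
```

Because every total scales with the attempt count, the two projections differ in their counting stats but share the same ANY/A.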
- What happens when a player does not have three years of data to project from?
Good question. The answer is: I use the most recent two-, one-, or zero-year sample available and do my best with it. For young players with only one or two years of pro starting experience, I’ve begun working on that historical comp system to assist me in projecting development. For players just out of college, I’m trying to use FO’s Lewin Forecasts to produce historical comps for the same purpose.
- Doesn’t this system prohibit you from projecting development in young players?
It would, but I’ve made exceptions. I can project development in a player under the age of 27 if the upward trend is already apparent within the three-year sample (e.g., if the TD rate has increased incrementally each year, I may use a developmental model rather than the straight three-year average).
- Okay, smartypants. What about the old guys?
Same deal. Players older than 30 with a decline trend apparent in the three-year sample can have a decline model applied to their projections as opposed to simply using the average.
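The two age rules can be sketched as a simple check. This only decides which kind of model is eligible; the development and decline models themselves are not specified here, and `choose_model` is a hypothetical name. It also assumes a stat where higher is better (TD rate, say) and reads "trend" as a strict year-over-year change across the whole sample:

```python
def choose_model(age, yearly_values):
    """Pick a projection model from age and a higher-is-better stat.

    yearly_values is ordered oldest to newest, one value per season.
    """
    pairs = list(zip(yearly_values, yearly_values[1:]))
    # Require a full three-year sample (two year-over-year steps).
    rising = len(pairs) >= 2 and all(a < b for a, b in pairs)
    falling = len(pairs) >= 2 and all(a > b for a, b in pairs)
    if age < 27 and rising:
        return "development"
    if age > 30 and falling:
        return "decline"
    return "average"  # default: the straight three-year average
```

A 24-year-old whose TD rate has climbed every year would be flagged for a development model, while a 28-year-old with the same numbers would still get the plain average.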
- And what about coaching or personnel changes?
A coaching or personnel change that appears significant enough to either improve or diminish a QB's overall rates may be factored into the system, but the increases or decreases must remain modest. This came up with Sam Bradford last season. You may note that I did not give Bradford his coaching bump, as I should have, and my projection for him suffered as a direct result.
- That said, how do you account for playing time?
I don’t. I just ... don’t. I assume 500 attempts for the initial projection and change the attempts total in-season to account for actual play time. The rates take care of the rest.
The reason? I don’t care about how the projections turn out for guys who don’t play. I’m projecting what I think will happen if a guy does play. This complicates the concept. I have, for example, over 40 full-season projections. This is not because I believe 40 guys will be qualified passers. Far from it. It’s because if a likely backup does end up taking over a starter’s role and getting significant numbers, I want to have a projection available to compare against him. So, Mark Sanchez might start 16 games, or Geno Smith might start 12. I will be prepared for both of these events, and the ones that don’t happen are irrelevant. I don’t have a projection here for two-start Mark Sanchez. If Mark Sanchez only gets two starts, he’s irrelevant.
But I’m prepared for the possibility that Mark Sanchez will not be irrelevant. That’s how prepared I am!
There’s another way that this complicates the concept, and this is important: I have projections for all the rookies through round 4. Some of these guys will not play. Some of these guys will play a little and be terrible. The projection, though, is interested in players who get to play a lot, and it now officially assumes that a rookie who plays a lot generally has something of a floor to how bad he can be. Because if (guy you hate) is as terrible as you think he’s going to be, he’s not going to play enough to qualify (except in extreme first-overall-pick situations like Alex Smith and JaMarcus Russell). And if he doesn’t play enough to qualify, he’s irrelevant to the system. So the system assumes that if he qualifies, he’s, at a minimum, good enough to qualify. Circular, no?
That level of good doesn’t need to be very good, as Mark Sanchez has proven often enough. But it does need to be reasonably better than total crap.
- So, wait. What happens to the projections you have for guys who don’t end up qualifying?
They disappear. They don’t matter. I’m not projecting play time. I’m projecting performance in the event of play time. Play time is a pre-condition for the projection to be meaningful.
- Doesn’t that seem a little disingenuous?
Maybe. I’ll cop to that readily. But I’ll also say this: The entire concept -- the seed that this system grew out of -- was the idea that quarterbacks are projectable given ample samples. A projection that only plays out over the course of 50 attempts is going to have such a large margin of error as to render it meaningless. And I’m interested in meaning.
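To put a rough number on that margin of error, here is a back-of-the-envelope sketch. It treats every attempt as an independent coin flip with a fixed success probability, which is a simplification and not part of the system itself; `rate_standard_error` is a hypothetical helper:

```python
import math


def rate_standard_error(rate, attempts):
    """Binomial standard error of an observed per-attempt rate."""
    return math.sqrt(rate * (1 - rate) / attempts)


# A 60 percent passer over 50 attempts versus 600 attempts.
small_sample = rate_standard_error(0.60, 50)
large_sample = rate_standard_error(0.60, 600)
```

Under that assumption the uncertainty shrinks with the square root of the attempt count, so the 50-attempt sample is about three and a half times noisier than the 600-attempt one.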
- I guess that’s fair.
Not really a question there, but thanks.
And if you’ve come with me this far, thanks for reading. I’ll leave you as I started you, in Statistics Park.
After a harrowing night in which many cliche-spouting pseudo-analysts were lost to the vicious jaws of the terrible statisaurs that broke loose the evening before, our hero emerges unscathed. That is, unscathed physically, minus a few scratches. The truth is, he’ll never fully recover from the trauma of those events, but at least he made it out alive... for now.