In April of 2008, former football coach turned analyst Terry Bowden wrote a piece for Yahoo Sports entitled "Coaching by the Numbers" that centered on which metrics matter most for college football programs and how best to win.
The article is built around a chart that shows how the Top 10 college football teams in 2007 finished in a variety of statistical categories. For each category (e.g. Rushing Offense, Passing Offense, etc.), the matrix shows how many of those top programs finished in the Top 10, Top 25, and so on.
Bowden also provides a more detailed chart for those interested.
Bowden concludes that Rushing Defense is the single most important statistic in explaining the success of a college football program. His rationale is that five of the overall Top 10 teams in 2007 finished in the Top 10 for Rushing Defense and all ten finished in the Top 25.
According to Bowden, Scoring Defense is second in importance (based on the Top 25 teams).
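To make Bowden's counting approach concrete, here is a minimal sketch of that kind of tally. The function name `tally` and the list of ranks are my own illustration and not Bowden's data; the ranks were simply chosen to match the pattern he reports for Rushing Defense (five teams inside the Top 10, all ten inside the Top 25).

```python
def tally(category_ranks, cutoffs=(10, 25)):
    """Count how many teams rank at or inside each cutoff in one category."""
    return {cutoff: sum(1 for rank in category_ranks if rank <= cutoff)
            for cutoff in cutoffs}

# Hypothetical national ranks in one category for ten top teams.
# Illustrative values only, NOT Bowden's actual figures.
example_ranks = [2, 5, 7, 9, 10, 12, 15, 18, 21, 24]

print(tally(example_ranks))  # {10: 5, 25: 10}
```

A tally like this only says where the best teams landed. It says nothing about how the other teams in the country fared in the same category.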
The chart is an impressive compilation of data organized in an effective visual manner. I applaud Terry Bowden for compiling the list and sharing it with the public.
I even tend to agree with the general conclusion that metrics such as rush defense, scoring defense, and turnover performance tend to be good predictors of success in college football. However, those areas are no guarantee of success, and of course Bowden does not make that claim.
What concerned me about Bowden's analysis, however, was that it only included data for the Top 10 teams and not the entire set of Division I college football teams. The sample seems too small to support firm conclusions, and at the very least it opens up some further questions.
To be more robust, I thought the analysis should also look at the middle and bottom teams and check whether the opposite holds at the other extreme. Out of curiosity, I decided to test Mr. Bowden's hypothesis along two different dimensions.
First, I wanted to see what would occur if the data set were expanded to include all 119 Division I teams in 2007. Second, I wanted to statistically correlate each of the categories above against wins to see what type of quantitative relationship was exhibited.
In other words, instead of just noting that five of the Top 10 teams were in the Top 10 for Rushing Defense, I wanted to see how strongly that metric correlated with wins using simple regression analysis.
For an example of what I mean, here are the resulting scatter plots of rush offense, rush defense, pass offense, and pass defense versus wins for comparison. The other categories I'll summarize in a table below for more convenient viewing.
1. Regression Analysis Plot for Rush Offense (Yards per Game)
As you can see from the above plot, there is a fair amount of variation in the data when rushing yards per game are compared to wins. Teams win with both low rushing totals and high rushing totals. Overall the fit is not particularly good, and the r-squared comes out as 0.15 when calculated. (Note: an r-squared of 0 means the metric explains none of the variation in wins, and 1 means it explains all of it.)
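For readers who want to reproduce this kind of number, the r-squared of a one-variable regression can be sketched in a few lines of plain Python. The function name `r_squared` and the sample numbers are illustrative assumptions of mine; the sample values are made up and are not the 2007 NCAA data.

```python
def r_squared(x, y):
    """R-squared of a one-variable least-squares line fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Least-squares slope and intercept for y = slope * x + intercept.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation the fitted line fails to explain vs. total variation in y.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Made-up values for illustration only, NOT the 2007 NCAA data.
rush_yards_per_game = [95.0, 120.0, 150.0, 175.0, 200.0, 230.0]
wins = [4, 9, 5, 10, 6, 8]

print(round(r_squared(rush_yards_per_game, wins), 2))
```

With a single predictor, this value equals the square of the ordinary correlation coefficient between the metric and wins.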
2. Regression Analysis Plot for Rush Defense (Yards per Game)
What happens when the same analysis is performed for rushing defense? The fit is indeed better than above, for this set of data at least, with an improved r-squared of 0.47. So rushing defense, although still far from a perfect predictor of wins, at least correlates better with wins than rushing offense does.
3. Regression Analysis Plot for Pass Offense (Yards per Game)
Passing offense in terms of yards per game is likewise not a very good predictor of wins for the data set in question. Teams again win overall with low passing yardage or with high passing yardage. This wide spread results in a very low r-squared of 0.04 or almost zero correlation.
4. Regression Analysis Plot for Pass Defense (Yards per Game)
Passing defense is only slightly better in this regard than passing offense (r-squared of 0.09). Many teams won games with either strong or weak passing defense in terms of pass yards allowed per game.
This result makes sense when you realize that if a team has a strong rush defense, opponents may simply opt to pass more against it. Such a team gives up lots of passing yards but may still have a fairly good defense overall.
Of course there are a dozen or so different charts that could be checked and posted (e.g. total offense, total defense, pass yards per attempt, pass yards per completion, pass efficiency offense, pass efficiency defense, etc.). I took the most commonly quoted and available categories from the NCAA database and summarized the contents in a table below.
2007 NCAA Division I Summary Table
| Metric | R-squared | Comments |
|---|---|---|
| Rushing Offense (Yards Per Game) | 0.15 | Low correlation to wins. Using rush yards per carry instead of yards per game only moves the r-squared to 0.17. |
| Rushing Defense (Yards Per Game) | 0.47 | Medium correlation to wins. |
| Passing Offense (Yards Per Game) | 0.04 | Extremely low correlation to wins. Often teams behind have to pass to catch up. The result is a lot of passing yards but not necessarily wins. |
| Passing Defense (Yards Per Game) | 0.09 | Extremely low correlation to wins. Inverse of the above case. Teams ahead put in backups and might surrender lots of passing yards yet still win the game. |
| Passing Offense (Yards Per Attempt) | 0.21 | Low correlation but more relevant than simply looking at passing yards. |
| Passing Offense (Yards Per Completion) | 0.03 | Extremely low correlation. Teams behind throw more passes, especially near the end of the game, which could account for this result. |
| Pass Efficiency Offense (Rating) | 0.30 | Low correlation but higher than other passing statistics. |
| Pass Efficiency Defense (Rating) | 0.43 | Medium correlation to wins. |
| Total Offense (Yards Per Game) | 0.29 | Low correlation to wins. Gaining yards does not necessarily produce wins. |
| Total Defense (Yards Per Game) | 0.44 | Medium correlation to wins. |
| Scoring Offense | 0.51 | Medium correlation to wins. Highest metric on the offensive side of the list. Points of course can be produced by offense, defense, and special teams, and that effect is not accounted for here. |
| Scoring Defense | 0.52 | Medium correlation to wins. Highest metric on the defensive side of the list. Just edges out scoring offense in the 2007 data set for highest r-squared honors. |
| Turnover Margin (Plus / Minus) | 0.28 | Low correlation overall. |
Surprisingly (or not so surprisingly, depending upon your point of view I guess), nothing in this list strongly correlates (.8 or above) with winning games. Medium correlation is the best result obtained, and only on a handful of metrics, meaning that other factors not included here or in Bowden's chart strongly influence winning in football.
Even combining rush yards, pass yards, and scoring offense yields a result of only around .51, still a medium correlation. On the defensive side, combining yards per rush, yards per pass attempt, and scoring defense only pushes the r-squared up to .56.
Overall, as Bowden's matrix indicated, the defensive metrics tend to correlate more with winning. However, these metrics are still medium in terms of correlation at best.
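The post does not spell out how the metrics were combined. One common way, assumed here, is an ordinary least-squares multiple regression of wins on several metrics at once. The sketch below uses plain Python; the helper names `solve` and `multiple_r_squared` and all of the sample numbers are hypothetical and are not the 2007 NCAA data.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def multiple_r_squared(rows, y):
    """R-squared of a least-squares fit of y on several predictors plus an intercept."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Made-up (rush yards, pass yards, points) per game for six teams.
# Illustrative values only, NOT the 2007 NCAA data.
metrics = [(110.0, 260.0, 21.0), (150.0, 210.0, 27.0), (180.0, 190.0, 24.0),
           (130.0, 300.0, 33.0), (210.0, 170.0, 30.0), (160.0, 240.0, 36.0)]
wins = [4, 7, 6, 9, 8, 11]

print(round(multiple_r_squared(metrics, wins), 2))
```

Under this assumption, adding a metric to the fit can never lower the r-squared on the same data, so a combined value should come out at least as high as the best single metric in the combination.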
At the end of the day I suspect a couple of factors are at work here that would take a lot of work to untangle.
First is the problem of strength of schedule. The above data are not adjusted to reflect schedule difficulty. One team might rack up more yards against weaker opponents while another, equally good team plays a tougher set of opponents. One might win more games (or lose more) accordingly, and this could affect the r-squared values observed.
Second is the problem of "garbage time" in games. An excellent defense might look poor in pass defense, for example, if the starters are pulled and backup personnel play the entire fourth quarter. The team playing from behind may throw for a lot of big plays late in the game and narrow the gap. This could make pass defense appear less important in the end result than it might be if measured at the third quarter mark, for example.
Third, as coaches will tell you, these sorts of statistics are flawed in some respects when it comes to the way the game is actually evaluated and in-game decisions are made. The above statistics reflect merely the end results of plays and ignore the specifics of the play call, down and distance, execution, and other factors.
For example, gaining three yards per rush might not sound impressive to fans. However, if a team can gain 3.0 yards 90 percent of the time on third and two against an eight-man front, this keeps the chains moving and eventually helps the team score. That statistic becomes more impressive when all relevant factors are considered.
During the season and in the offseason, coaches break the game data down much further than the simple tables above to analyze specific plays. Specific run plays (e.g. weak side zone run) out of specific formations (single back set or I formation set) versus specific defensive fronts (under, over, even, odd, and the location of the strong safety) are all charted, and their effectiveness is judged.
Logic, film observation, data tracking, analysis, and problem solving are all critical skills for football coaching staffs. Unfortunately, the information made available to the public is too general to appreciate this aspect of the game. Terry Bowden's article is a step in the right direction, though, and I hope he explains more details in the future.