NFL: Examining the In-Season Correlation of Defensive Performance

Zach Fein, Analyst I
October 10, 2009

DENVER - OCTOBER 04: Elvis Dumervil #92 of the Denver Broncos celebrates his sack of quarterback Tony Romo #9 of the Dallas Cowboys with teammates Darrell Reid #95 and Wesley Woodyard #59 during NFL action at Invesco Field at Mile High on October 4, 2009 in Denver, Colorado. The Broncos defeated the Cowboys 17-10. (Photo by Doug Pensinger/Getty Images)

Riddle me this: If many a fantasy analyst proclaims that it’s very difficult to predict how a defense will do before the season, then how much stock should we put into early-season performance?

Think about it. If defenses are so unpredictable and inconsistent, is it at all likely that a defense will keep performing up to its early numbers the rest of the season?

Moreover, during which week of the season is it easiest to predict a defense’s stats for the remainder of the year? In other words, can you better predict the final 12 games using the first four, or the final four games using the first 12?

I looked at every team’s game-by-game stats since 2002 (gathered from to investigate. I compared defensive stats before and after a given point in the season, as described above, and I also split teams into quartiles to see how each group performed the rest of the way.

First up is the relationship between a defense’s performance earlier and later in the same season. I found each team’s stats up to and after each game played (for instance, the average points allowed in the first three games and in the final 13 games) and compared the two using correlation and average absolute error.
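For anyone who wants to try the split at home, here is a minimal Python sketch of the "first N games versus the rest" step. The function name `split_averages` and the 16-game list of points allowed are made up for illustration; they are not the data behind this article.

```python
from statistics import mean


def split_averages(points_allowed, k):
    """Return (average over the first k games, average over the remaining games)."""
    if not 0 < k < len(points_allowed):
        raise ValueError("k must leave at least one game on each side of the split")
    return mean(points_allowed[:k]), mean(points_allowed[k:])


# Hypothetical 16-game season of points allowed (made-up numbers).
season = [10, 17, 24, 20, 13, 27, 21, 16, 30, 14, 23, 19, 17, 28, 20, 21]

# Average of the first three games versus the final 13 games.
early, late = split_averages(season, 3)
print(f"first 3 games: {early:.1f}, final 13 games: {late:.1f}")
```

Repeating the call for every split point from 1 to 15, for every team-season, would give the pairs of numbers that the correlation and average absolute error are then computed on.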

Quick explanation: Correlation is a number between negative one and one that describes how closely two sets of values are related. A negative number, in this case, would mean that the more points a team allowed in the first half of the season, the fewer it allowed in the second half, and vice versa; a positive number means that a team that allows a low number of points in the first half will also tend to allow a low number in the second half.

The further the correlation is from zero, the more strongly the two sets of data are related.
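To make that concrete, here is a short Python sketch of the standard Pearson correlation. I am assuming Pearson is the flavor of correlation in play here, since the article doesn't specify one, and the two lists of per-team averages are made-up numbers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cov / sqrt(var_x * var_y)


# Hypothetical points allowed per game for four teams (made-up numbers).
first_half = [14.0, 18.0, 21.0, 25.0]
second_half = [17.0, 22.0, 19.0, 24.0]

r = pearson(first_half, second_half)
print(f"correlation: {r:.3f}")
```

A result near one would mean the stingy defenses stayed stingy; a result near zero would mean the first-half numbers told you next to nothing.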

Average absolute error (AAE), on the other hand, is much easier to understand; it’s essentially the average size of the difference between each team’s values in the two data sets.
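In code, AAE is only a couple of lines. This Python sketch uses a hypothetical function name and made-up per-team averages, purely to show the arithmetic.

```python
def average_absolute_error(early, late):
    """Average of |early - late| across paired values (one pair per team)."""
    if len(early) != len(late) or not early:
        raise ValueError("need two non-empty lists of the same length")
    return sum(abs(e - l) for e, l in zip(early, late)) / len(early)


# Hypothetical points allowed per game for three teams (made-up numbers).
early = [14.0, 20.0, 25.0]
late = [18.0, 19.0, 22.0]

# The gaps are 4, 1, and 3 points, so the AAE is 8 / 3.
aae = average_absolute_error(early, late)
print(f"AAE: {aae:.2f} points per game")
```

Unlike correlation, AAE is in the stat's own units, so an AAE for points cannot be compared directly with one for passing yards.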

The tables below show this data. The higher the correlation and the lower the AAE, the stronger the relationship is for each stat.

(The first row reads: The correlation between points allowed in a defense’s first game and points allowed per game in their remaining games played is .165.)

A lot to digest there. First of all, take a look at the correlations for passing yards and rushing yards. The highest correlation for passing yards is .211, after five games, whereas the correlation for rushing yards after just one game is .227.

In other words, it’s easier to predict a team’s rushing yards allowed for the rest of the season using just one week of data than it is to predict passing yards allowed at any point in the season.

Although sample size gets larger as the season goes on, that doesn’t mean that the autocorrelation—the correlation with itself—of each stat does too. After 15 games, you have a large amount of data, but there’s only one game left in the season.

The data show that the autocorrelations and AAEs for each stat are at their highest and lowest, respectively, near the middle of the season, when there is an equal number of games in each bucket.
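One way to see why the middle of the season should win is a textbook model that is not part of the analysis here: treat each game as a team's true defensive level plus random noise. Under that assumption, the Spearman-Brown formula gives the reliability of an n-game average, and the expected correlation between the two buckets is the square root of the product of their reliabilities. The single-game reliability `RHO` below is a made-up value, not an estimate from the data.

```python
from math import sqrt


def reliability(n_games, rho):
    """Spearman-Brown: reliability of an n-game average, given single-game reliability rho."""
    return n_games * rho / (1 + (n_games - 1) * rho)


def expected_split_correlation(k, season_length, rho):
    """Expected correlation between the first-k-games average and the rest-of-season average."""
    return sqrt(reliability(k, rho) * reliability(season_length - k, rho))


RHO = 0.05  # hypothetical single-game reliability, chosen only for illustration

# Expected correlation for every split point in a 16-game season.
curve = {k: expected_split_correlation(k, 16, RHO) for k in range(1, 16)}
best_k = max(curve, key=curve.get)

for k in (1, 4, 8, 12, 15):
    print(f"after {k:2d} games: {curve[k]:.3f}")
print("best split point:", best_k)
```

In this toy model the curve is symmetric and peaks at the eight-game split for any value of `RHO` between zero and one, which matches the pattern described above.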

Let’s now move on to our next topic. I split all 192 team-seasons into quartiles based on their defensive performance after four, eight, and 12 weeks (one quartile is eight teams per year).

I then averaged the stats of all teams in each quartile over the first N weeks (four, eight, or 12) and over the remainder of the season. Are the best teams early in the season the best teams for the rest of the season?
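The quartile step can be sketched in Python as follows. This is a simplified version under my own assumptions: it pools all teams into one ranking rather than ranking within each year, the function name is hypothetical, and the eight (early, late) pairs are made-up numbers.

```python
from statistics import mean


def quartile_averages(teams):
    """Rank teams by their early average (lowest first), cut the ranking into four
    equal groups, and return each group's (early average, rest-of-season average)."""
    if not teams or len(teams) % 4 != 0:
        raise ValueError("this sketch needs a team count divisible by four")
    ranked = sorted(teams, key=lambda team: team[0])
    size = len(ranked) // 4
    results = []
    for q in range(4):
        group = ranked[q * size:(q + 1) * size]
        early_avg = mean([early for early, _ in group])
        late_avg = mean([late for _, late in group])
        results.append((early_avg, late_avg))
    return results


# Hypothetical (points allowed per game early, points allowed per game after) pairs.
teams = [
    (21.0, 22.0), (13.0, 20.0), (28.0, 25.0), (18.0, 21.0),
    (15.0, 18.0), (26.0, 23.0), (19.0, 19.0), (22.0, 20.0),
]

for number, (early_avg, late_avg) in enumerate(quartile_averages(teams), start=1):
    print(f"quartile {number}: {early_avg:.1f} early, {late_avg:.1f} rest of season")
```

If the rest-of-season averages come out in the same order as the early ones, the early ranking carried some information; if they are scrambled or bunched together, it did not.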

(The first row reads: After four weeks, teams in the first quartile—those with the lowest in each stat—allowed 13.1 points per game, and they allowed 20.0 per game the rest of the season.)

Through four games and eight games, points allowed have a clear but weak autocorrelation, as the four quartiles finish in the same order the rest of the season.

But after 12 games, there’s no relationship at all—the teams in the fourth quartile allow almost as few points in the last four games as teams in the first quartile.

This is especially key for fantasy owners who look at a player’s schedule in the final weeks of the season during the fantasy playoffs.

As the first table shows, pass yards allowed does not have as strong an autocorrelation as rush yards allowed.

At each checkpoint, the quartiles’ pass yards allowed do not line up for the remainder of the season; after four games, for instance, teams in the third quartile allow five fewer yards per game through the air than teams in the second quartile.

This is not the same for rushing yards. Teams that are at the top or bottom of the league in rush yards allowed through four, eight, or 12 games usually stay at the top or bottom, respectively, the rest of the season; after 12 games, in fact, teams regress only 45 percent to the mean in the final four weeks.
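A regression-to-the-mean percentage can be computed in more than one way, and the article doesn't spell out its formula. One common reading, sketched in Python below as an assumption, is the share of a group's early gap from the league average that disappears the rest of the way. The function name and all three inputs are made up so that the answer lands on 45 percent.

```python
def regression_to_mean(early_avg, late_avg, league_avg):
    """Share of the early gap from the league average that vanishes later.
    0.0 means the gap held completely; 1.0 means the group fell all the way back."""
    early_gap = early_avg - league_avg
    if early_gap == 0:
        raise ValueError("the group starts at the league average, so there is no gap to regress")
    late_gap = late_avg - league_avg
    return 1 - late_gap / early_gap


# Hypothetical rush yards allowed per game (made-up numbers):
# a top group allows 80.0 early and 93.5 afterward, against a league average of 110.0.
share = regression_to_mean(80.0, 93.5, 110.0)
print(f"regressed {share:.0%} of the way to the mean")
```

With these inputs the early gap of 30 yards shrinks to 16.5, so 55 percent of the early edge survives.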

Turnovers face the same problem as passing yards—the third quartile (actually those with the second-most takeaways) has more takeaways than the fourth quartile the rest of the season at each interval.

Though the data isn’t divided into interceptions and fumbles, additional research tells me that both have little autocorrelation: Interceptions have a year-to-year correlation of .09, and for fumbles recovered the correlation is .13.