History of WAR
Baseball Reference defines WAR, or Wins Above Replacement, as the number of wins a player adds to a team in a single season beyond what a readily available replacement would contribute. It is widely accepted as an accurate measure of a player’s value by statisticians, baseball analysts and sportswriters.
WAR was created during the sabermetric movement in the 1980s. The term sabermetric comes from the Society for American Baseball Research (SABR) and was coined by the great Bill James, night-shift security guard turned baseball writer.
The purpose of the creation of sabermetrics was to give an unbiased view of various qualities of baseball players.
The old but still prominent stats used to describe players include runs batted in (RBI), runs (R), batting average (AVG), stolen bases (SB), and home runs (HR) for batters and wins (W), strikeouts (K), saves (SV), earned run average (ERA) and walks plus hits per inning pitched (WHIP) for pitchers.
This system was devised by sportswriter and baseball enthusiast Henry Chadwick during the middle of the 19th century. Since this was around the time baseball became a professional sport, not much was known about the intricacies of the game, and the stats reflected that.
It is now known that none of the aforementioned stats isolates the individual contributions of a player, so they do not show how well a player will truly perform: on any team, in any ballpark, in either league.
Here's what I mean: Dante Bichette of the 1999 Rockies had 133 RBI, 34 HR, 104 R and a .298 AVG. Awesome stats, right? But his WAR value for that season was -2.8. What does that tell us? Basically, he got lucky.
There was usually someone on base when he got a hit, and whenever he was on base, someone hit him in. Bichette couldn’t control either circumstance, but he could reap the benefits and let his stats show for it.
But even if he was lucky, that still counts for something, right? Who cares if it was only good fortune that got him 104 runs and 133 RBI? He must have helped the team win, but his WAR value suggests otherwise. So where does the disjoint come from?
Well, as WAR is supposed to portray the overall value of a player to the team in terms of wins (which are ultimately what make a team successful), fielding must be taken into account, as well as batting. And that’s where Dante struggled.
In 1999, he posted a .951 fielding percentage. (It’s hypocritical for me to use fielding percentage as an indicator, since the scoring of an error, which is what determines fielding percentage, is subjective, and I’m explaining why you can’t trust those stats. However, it gets my point across.) To be such a detriment in the field obviously affects a player's overall value, and thus the wins he will create.
I will not attempt to explain how to calculate WAR, since different sources calculate it in slightly different ways, all of which are complicated. Even so, descriptive, non-predictive comparisons show that it is one of the best estimators of how valuable a player is to any organization.
It was a really innovative idea to describe a player’s value through wins generated—easy for fans to understand and applicable to any team. But how realistic is the number the formula churns out? Well, it’s hard to tell on an individual basis, since players aren’t credited with wins. More interesting would be to look at who the wins are really affecting: the team.
WAR on a Larger Scale
WAR, as I said earlier, stands for Wins Above Replacement, or the number of wins a player will generate above a replacement player from the bench or minors.
There are many different definitions of a replacement player, from different sources, for different positions, for different teams. Obviously, the way I choose to define the replacement player will majorly affect my conclusion.
Initially, I chose a replacement player who performs at 85 percent of the level of an average major leaguer, because it was a convention I saw recurring.
Under that assumption, a team of replacements would play 85 percent as well as an average major league team and win 85 percent as many games. If you give an average team a winning percentage of .500 (a record of 81-81 in a modern season, though percentages let me deal with teams from before the 162-game era), a replacement team would earn about 69 wins in 162 games, good for a .425 winning percentage.
If you added up the WAR values for every player on a team and added that sum to 69, you would theoretically have the total number of wins that that team earned in a given season. So does that actually work?
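As a quick sanity check, that arithmetic can be sketched in a few lines of Python. The roster WAR values below are made up purely for illustration:

```python
# Sketch of the 85-percent replacement baseline described above.
GAMES = 162
avg_wins = GAMES / 2                 # 81.0: an average team plays .500 ball
repl_wins = round(0.85 * avg_wins)   # ~69 wins for a team of replacements

def predicted_wins(war_values):
    """Team wins = replacement baseline + sum of the roster's WAR values."""
    return repl_wins + sum(war_values)

# Hypothetical roster WAR values, purely for illustration:
roster = [6.2, 4.1, 3.0, 1.5, 0.8, -0.4, -2.8]
print(repl_wins)                        # 69
print(round(predicted_wins(roster), 1)) # 81.4
```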
No. Not for any sort of team. Let’s look at each case.
Winning teams have winning players. Those players have high WAR values. As a team earns more wins, the WAR values of the players on that team go up. So in that sense, WAR does correlate to the number of wins a team earns: A higher WAR leads to more wins and a lower WAR leads to fewer wins.
However, those values tend to overestimate the actual number of wins a team earned.
For average teams, take a 162-game season for convenience. If a team produces 81 wins, that’s about 12 more than a replacement team. So the sum of the WAR values for the entire team, or WAR Sum (WARS), should equal about 12. Rarely is that the case.
For losing teams, you’re going to need a pretty negative WARS, since 69 wins would be generated with a WARS of zero. If you look at some of the most losing teams in baseball history, their WARS don’t even come close to the negative values that would accurately predict the number of wins.
I also checked the winning percentages implied by batting WARS alone and by pitching WARS alone, because in nearly all cases, one of those two comes much closer to the actual winning percentage than the cumulative WARS of batting and pitching combined.
I found that, apart from Pythagorean winning percentage (based off runs scored and runs allowed), none of the WARS were particularly accurate, at least with the 85-percent replacement player.
I next began to mess around with the WARS-Affected Winning Percentage (WARSAWP, it’s easier than writing the whole thing out).
It isn’t particularly practical to use just a batter or pitcher WARS because that only gives a snapshot of half the team, even if it’s a seemingly accurate snapshot. And it turns out that neither pitcher nor batter WARS correlates to the team’s actual winning percentage. None of linear, quadratic, logarithmic, power or exponential regressions returned a promising correlation coefficient for either.
So, back to total WARS. As I said earlier, it uniformly overpredicts a team’s actual winning percentage. But it overpredicts consistently. In fact, for my little sample of 15 teams, there’s a constant relationship with a .971 coefficient of correlation. That means that according to my sample, which is insignificant but insightful, you can estimate a team’s winning percentage to within five percent of the actual value. All you have to do is subtract .118 from the WARSAWP.
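That adjustment is mechanical enough to sketch. The 40-WARS team below is hypothetical; the 69-win baseline and .118 offset are the values from my sample:

```python
# WARSAWP: the winning percentage implied by a team's total WARS,
# then adjusted by the constant offset found in the 15-team sample.
GAMES = 162
REPLACEMENT_WINS = 69   # the 85-percent replacement baseline
OFFSET = 0.118          # constant overprediction observed in the sample

def warsawp(team_wars):
    return (REPLACEMENT_WINS + team_wars) / GAMES

def adjusted_warsawp(team_wars):
    return warsawp(team_wars) - OFFSET

# A hypothetical team whose players' WAR values sum to 40:
print(round(warsawp(40), 3))           # 0.673
print(round(adjusted_warsawp(40), 3))  # 0.555
```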
I’m sure that constant will change, and become more accurate, as more winning percentages are analyzed. However, since I included an equal number of big-winning teams, big-losing teams and completely average teams, the constant should only get more accurate.
The next step would be to calculate the WARSAWP of a bunch of other teams, compare the resultant values with the corresponding actual values and do a regression analysis (or just calculate the mean difference between predicted and actual, since it’s a constant difference).
While that is one way to make WAR a team predictor, another fix may be the replacement player definition that I discussed earlier. Obviously, if I used a different percent value for replacement players, the entire set of data would change and maybe it would be more accurate.
For instance, I found out after I had analyzed the above data that FanGraphs claims a team of replacements would win 49 games in a 162-game season. This was a pretty applicable way to express replacements for my experiment, so I did a quick check.
After checking a few teams, I realized that my new winning percentage was coming a lot closer to the actual winning percentage of my sample teams. That makes sense, because starting from 49 wins instead of 69 builds in the constant adjustment I was talking about, except here the adjustment is made ahead of time instead of being fitted to whatever sample I happen to use.
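To see why the two approaches nearly agree, compare the baselines directly. The team WARS of 40 is again hypothetical:

```python
# Comparing the two baselines: 69 wins with a post-hoc subtraction
# versus the FanGraphs-style 49-win baseline, which bakes it in.
GAMES = 162

def warsawp(team_wars, replacement_wins):
    return (replacement_wins + team_wars) / GAMES

wars = 40  # hypothetical team WARS
old = warsawp(wars, 69) - 0.118   # 69-win baseline, then subtract the constant
new = warsawp(wars, 49)           # 49-win baseline, no adjustment needed
print(round(old, 3), round(new, 3))  # 0.555 0.549
print(round((69 - 49) / GAMES, 3))   # 0.123, close to the fitted .118
```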
This newly adjusted WARSAWP was coming within single points of the actual percentage, thus completing my investigation. Now we know that WAR isn’t just good at showing the individual contributions of a player; it also works well on a larger scale.
A flock of fantasy baseball players have jumped on the sabermetric wagon and designed their drafts entirely based off the intricate formulas designed by baseball's "scientists." After all, if you can reduce your draft to a perfect science, you have a better shot at winning, right?
While there is definitely some reward that comes from plugging numbers into an algorithm that always tells you which player is your best pick, I'm going to say that WAR probably isn't one of the numbers you want to use.
I've just shown that it's actually great for predicting wins for a team, but just as it isn't a particularly applicable number for individual players, it probably isn't applicable for your fantasy team.
Take the aforementioned case of Dante Bichette. He had stats that were worthy of a starting spot on your fantasy team, but he had a severely negative WAR.
So, until fantasy baseball moves away from scoring based on those classic ten stats, your best bet is to keep using those ten stats in your algorithms—or just pay ESPN to do it for you.
Value Over Replacement Player (VORP) calculates the number of runs a player contributes above a replacement player: similar to WAR, but in terms of runs. While less applicable to a team in the wins sense, it is very relevant to Pythagorean winning percentage, which displays the number of wins a team should have earned given its runs scored and runs allowed.
Pythagorean isn’t really useful as a predictor though. In other words, it can’t tell you how many wins you will get next season. If combined with VORP, however, and put through a similar method to the one I’ve described above, you have another predictor that can estimate how many wins you’ll get.
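For reference, the Pythagorean formula itself is short. Here is a minimal sketch with made-up run totals, assuming the classic exponent of 2 (some versions use 1.83 instead):

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2):
    """Bill James's Pythagorean expectation (classic exponent of 2)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

# Made-up run totals, purely for illustration:
print(round(pythagorean_pct(900, 1000), 3))  # 0.448
# The combination suggested above would feed VORP-adjusted run
# estimates into this formula instead of actual season totals,
# turning a descriptive stat into a crude predictor.
```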
A comparison with WARSAWP could generate an even more accurate predictor that is subject to much less variation.
I’d love to hear what you think as I’m sure that there are improvements that can be made.
Also, if there's anyone who's willing to continue my method for a much larger sample, let me know, and I can share my data with you.