Measuring the Accuracy of Baseball's Win-Percentage Estimators

There are several formulas out there that can be used to estimate a team's "real" record: Pythagorean Formula, Pythagenport, Pythagenpat, etc. Some use run differential and some use a run-to-runs-allowed ratio.

The question is: Which is the most accurate? The least?

Using the Lahman Database, I tested 13 different methods on every team season since 1921 (the end of the Dead-Ball era) to find the most accurate way to measure a team's expected record. The strike-shortened 1981 and 1994 seasons are excluded.

The RMSE (root-mean-square error) in the table is calculated by squaring each team's error (in this case, the difference between its actual wins and its expected wins), averaging all of those squared errors, then taking the square root of the average. A lower RMSE means a more accurate method.
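The calculation just described can be sketched in a few lines of Python. This is only an illustration, not the script used for the article: the function name `rmse` is my own, and the three sample rows are borrowed from the current-season table further down, so the number it produces has nothing to do with the historical RMSE figures.

```python
import math

def rmse(actual_wins, expected_wins):
    """Square each error, average the squares, take the square root."""
    if len(actual_wins) != len(expected_wins):
        raise ValueError("both sequences must have the same length")
    squared_errors = [(a - e) ** 2 for a, e in zip(actual_wins, expected_wins)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Three rows from the luck table below (Atlanta, Cleveland, Toronto),
# used purely as sample input.
sample_actual = [62, 68, 75]
sample_expected = [68.5, 73.9, 80.0]
sample_rmse = rmse(sample_actual, sample_expected)
print(round(sample_rmse, 3))
```

A perfect estimator would give an RMSE of 0, so lower is better.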

The formulas for each method are at the end of this article, to save space.

Here are the results.

RMSE of each method since 1921
Method RMSE
Pythagenport 3.990
Pythagenpat 3.992
Palmer-RPW 4.015
Tango-RPW 4.021
Pythag-1.83 4.022
Ben V-L 4.024
Pythag-2 4.096
RPW=10 4.104
Soolman 4.111
RPW=RPG 4.156
E.Cook 4.537
Double Edge 4.606
Kross 5.124


What's funny is that Clay Davenport, the inventor of Pythagenport, abandoned his own method in favor of Pythagenpat, yet Pythagenport narrowly comes out as the best method (3.990 to 3.992) when compared to actual records.

Earnshaw Cook may have been the first to create a win-percentage estimator, and the Double Edge method created by Bill James was never actually used, so both can be forgiven for finishing near the bottom.

The Kross method, on the other hand, cannot be, as it was supposedly a precise way to estimate winning percentage.

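Using the Pythagenport formula, we can find out which teams have been lucky and unlucky this season by comparing their actual wins to their expected wins. In the table, Diff. is expected wins minus actual wins, so a positive number marks an unlucky team and a negative number a lucky one.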

Team Wins Exp.Wins Diff.
Atlanta 62 68.5 6.5
Cleveland 68 73.9 5.9
Toronto 75 80.0 5.0
San Diego 54 58.0 4.0
Seattle 55 58.9 3.9
Philadelphia 77 80.3 3.3
Baltimore 63 65.7 2.7
Boston 83 85.7 2.7
Chicago Cubs 85 87.7 2.7
Oakland 64 66.5 2.5
Detroit 67 68.7 1.7
LA Dodgers 71 72.6 1.6
Arizona 71 72.0 1.0
Washington 54 54.7 0.7
Chicago Sox 79 79.6 0.6
St. Louis 75 75.4 0.4
Minnesota 78 78.3 0.3
NY Mets 79 79.1 0.1
NY Yankees 75 74.7 -0.3
Cincinnati 63 61.4 -1.6
Colorado 67 65.4 -1.6
Milwaukee 81 78.5 -2.5
Pittsburgh 60 57.4 -2.6
San Francisco 60 57.2 -2.8
Kansas City 60 57.1 -2.9
Florida 72 67.6 -4.4
Texas 69 64.4 -4.6
Tampa Bay 85 79.1 -5.9
Houston 74 67.7 -6.3
LA Angels 85 75.8 -9.2


The Angels sit at the bottom of the table as the "luckiest" team: they have won so many close games that their closer gets more save opportunities, and they have about nine more wins than expected. Tampa Bay is also near the bottom among the "lucky" teams, and guess what: by expected wins, the Blue Jays (80.0) should be ahead of them (79.1)!
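For anyone who wants to re-create the Diff. column, here is a small sketch. The helper name `luck_diff` is my own, and the five rows are copied from the table rather than recomputed from runs scored and allowed.

```python
# (team, actual wins, Pythagenport expected wins), copied from the table above
rows = [
    ("Atlanta", 62, 68.5),
    ("Toronto", 75, 80.0),
    ("NY Yankees", 75, 74.7),
    ("Tampa Bay", 85, 79.1),
    ("LA Angels", 85, 75.8),
]

def luck_diff(wins, expected_wins):
    """Expected wins minus actual wins; positive = unlucky, negative = lucky."""
    return round(expected_wins - wins, 1)

diffs = {team: luck_diff(wins, exp) for team, wins, exp in rows}
luckiest = min(diffs, key=diffs.get)      # most wins above expectation
unluckiest = max(diffs, key=diffs.get)    # most wins below expectation
print(diffs, luckiest, unluckiest)
```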

—————————————————————————————————————————————

Differential formulas

W% = X * (RS - RA) / G + .5

Where RS = runs scored, RA = runs allowed, G = games played, and X (the reciprocal of runs per win) depends on the method:

Palmer-RPW: 1 / (10 * sqrt(runs per inning)), where runs per inning counts both teams

Tango-RPW: 1 / (RPG / 2 + 5), where RPG = (runs allowed + runs scored)/(games played)

RPW=10 : 0.1

RPW=RPG : 1 / RPG
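As a rough sketch, the four differential methods can be written as one Python function. The function names are my own. For Palmer-RPW I am assuming "runs per inning" means runs by both teams divided by nine innings per game, which the formula above does not spell out. The 800/700/162 team in the usage line is hypothetical.

```python
import math

def runs_per_win(method, rs, ra, games, innings_per_game=9.0):
    """Runs needed to buy one win; X in the formula above is 1 / this value."""
    rpg = (rs + ra) / games  # runs per game, both teams combined
    if method == "Palmer-RPW":
        # Assumption: runs per inning = RPG / 9 (nine-inning games)
        return 10 * math.sqrt(rpg / innings_per_game)
    if method == "Tango-RPW":
        return rpg / 2 + 5
    if method == "RPW=10":
        return 10.0
    if method == "RPW=RPG":
        return rpg
    raise ValueError(f"unknown method: {method}")

def differential_win_pct(method, rs, ra, games):
    """W% = X * (RS - RA) / G + .5"""
    x = 1 / runs_per_win(method, rs, ra, games)
    return x * (rs - ra) / games + 0.5

# Hypothetical team: 800 runs scored, 700 allowed, 162 games
for name in ("Palmer-RPW", "Tango-RPW", "RPW=10", "RPW=RPG"):
    print(name, round(differential_win_pct(name, 800, 700, 162), 3))
```

Multiplying the resulting W% by games played gives expected wins.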

~~~

Ratio formulas:

W% = (RS^x)/(RS^x + RA^x)

Where the exponent x depends on the method:

Pythagenport: 1.5 * log10(RPG) + .45 (log is base 10)

Pythagenpat: RPG^.287

Pythag-1.83: 1.83

Pythag-2: 2
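The ratio methods differ only in the exponent, so a sketch in Python needs just two small functions. The names are my own, RPG is taken as (RS + RA) / G as defined for Tango-RPW, and I am assuming the Pythagenport log is base 10. The sample team is hypothetical.

```python
import math

def pythag_exponent(method, rs, ra, games):
    """Exponent x for each ratio method."""
    rpg = (rs + ra) / games  # runs per game, both teams combined
    if method == "Pythagenport":
        return 1.5 * math.log10(rpg) + 0.45  # assumption: base-10 log
    if method == "Pythagenpat":
        return rpg ** 0.287
    if method == "Pythag-1.83":
        return 1.83
    if method == "Pythag-2":
        return 2.0
    raise ValueError(f"unknown method: {method}")

def ratio_win_pct(method, rs, ra, games):
    """W% = RS^x / (RS^x + RA^x)"""
    x = pythag_exponent(method, rs, ra, games)
    return rs ** x / (rs ** x + ra ** x)

# Hypothetical team: 800 runs scored, 700 allowed, 162 games
for name in ("Pythagenport", "Pythagenpat", "Pythag-1.83", "Pythag-2"):
    print(name, round(ratio_win_pct(name, 800, 700, 162), 3))
```

Note that the two floating-exponent methods raise x in high-scoring environments and lower it in low-scoring ones, which is the whole point of them.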

~~~

Others

Ben V-L: W% = 0.91 * (RS-RA) / (RS+RA) + .5

Soolman: W% = (0.102 * RS - 0.103 * RA) / G + .505

E.Cook: W% = 0.484 * RS / RA

Double Edge: W% = (RS / RA * 2 - 1) / (RS / RA * 2)

Kross: For teams with RS>RA, W% = RS / (2 * RA) , and for teams with RA>RS, W% = 1 - RA / (2 * RS) . I used the first formula for teams with an equal number of runs scored and runs allowed.
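The remaining five formulas translate directly into Python. Again, this is a sketch with function names of my own choosing, and it follows the formulas exactly as listed above, including my choice of the first Kross formula when runs scored equal runs allowed.

```python
def ben_vl(rs, ra):
    return 0.91 * (rs - ra) / (rs + ra) + 0.5

def soolman(rs, ra, games):
    return (0.102 * rs - 0.103 * ra) / games + 0.505

def cook(rs, ra):
    return 0.484 * rs / ra

def double_edge(rs, ra):
    ratio = rs / ra
    return (ratio * 2 - 1) / (ratio * 2)

def kross(rs, ra):
    # Ties (RS == RA) use the first formula, as noted above
    if rs >= ra:
        return rs / (2 * ra)
    return 1 - ra / (2 * rs)

# Hypothetical team: 800 runs scored, 700 allowed, 162 games
print(round(ben_vl(800, 700), 3), round(soolman(800, 700, 162), 3),
      round(cook(800, 700), 3), round(double_edge(800, 700), 3),
      round(kross(800, 700), 3))
```

One thing the code makes easy to see: a team with equal runs scored and allowed comes out at .484 under Cook's formula rather than .500.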
