# Checking Up On Pythagoras' Midseason Predictions

James Hulka, Analyst I, June 30, 2008

The math geniuses and statistical gurus in sports (and academia on occasion) love to see how the numbers stack up in predicting results. When a set of numbers doesn't accurately describe a result, statisticians then look for something else to explain the disparity.

When it comes to wins and losses, we go back a couple thousand years and bring up everyone's friend from high-school geometry: the Pythagorean Theorem.

For those of you who have blocked most of high school and college mathematics from your memory, let's refresh it for a moment. The theorem says that in a right triangle, the square of the longest side equals the sum of the squares of the other two sides, so if you know two sides you can work out the length of the third.

In baseball, comparing a team's offense (runs scored) with its pitching and defense (runs allowed) has been shown to give a pretty accurate assessment of a team's wins and losses. As the standings are now, the Tampa Bay Rays have the best record in baseball at 50-32, one of only three teams (the Chicago Cubs and Boston Red Sox being the other two) with 50 wins, two weeks before the All-Star break.

While at the beginning of the season the experts might've predicted Boston and Chicago being at the top of their leagues in wins, the Rays would not have been choice No. 3. When you look at their run differential (+56), it's not surprising that the Rays are baseball's winningest team right now.

Let's look at how actual wins and losses compare to what is expected, based on the Pythagorean Theorem. MLB.com's standings page has this information, but you can check it yourself by doing the following.

1) Square the number of runs scored

2) Square the number of runs allowed

3) Sum the Squares.

4) Divide the Square of Runs Scored by the Sum of the Squares

5) The answer is the fraction of games the team is expected to win. Multiply it by the total number of games played to get the expected number of wins.

So let's check the standings again.
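Before doing so, here is a minimal Python sketch of the five steps above. The function names and the sample run totals (400 scored, 350 allowed over 82 games) are made up for illustration and are not any team's actual numbers.

```python
def pythagorean_win_fraction(runs_scored, runs_allowed):
    """Steps 1-4: squared runs scored divided by the sum of the squares."""
    scored_sq = runs_scored ** 2
    allowed_sq = runs_allowed ** 2
    return scored_sq / (scored_sq + allowed_sq)


def expected_record(runs_scored, runs_allowed, games_played):
    """Step 5: scale the fraction by games played, rounded to whole wins."""
    fraction = pythagorean_win_fraction(runs_scored, runs_allowed)
    wins = round(fraction * games_played)
    return wins, games_played - wins


# Hypothetical team: 400 runs scored, 350 allowed, 82 games played.
wins, losses = expected_record(400, 350, 82)
print(f"Expected record: {wins}-{losses}")  # Expected record: 46-36
```

A team that scores exactly as many runs as it allows comes out at .500, which is a quick sanity check on the formula.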

| Team (AL) | W-L | Expected W-L | Difference |
|---|---|---|---|
| Tampa Bay | 50-32 | 47-35 | +3 |
| Boston | 50-35 | 50-35 | 0 |
| New York | 44-39 | 44-39 | 0 |
| Baltimore | 41-40 | 40-41 | +1 |
| Toronto | 41-43 | 45-39 | -4 |
| Chicago | 47-35 | 50-32 | -3 |
| Minnesota | 45-38 | 42-41 | +3 |
| Detroit | 42-40 | 41-41 | +1 |
| Kansas City | 38-45 | 37-46 | +1 |
| Cleveland | 37-46 | 42-41 | -5 |
| Los Angeles | 49-34 | 42-41 | +7 |
| Oakland | 45-37 | 48-34 | -3 |
| Texas | 43-41 | 41-43 | +2 |
| Seattle | 31-51 | 35-47 | -4 |

| Team (NL) | W-L | Expected W-L | Difference |
|---|---|---|---|
| Florida | 43-39 | 39-43 | +4 |
| New York | 40-42 | 40-42 | 0 |
| Atlanta | 40-43 | 46-37 | -6 |
| Washington | 33-51 | 30-54 | +3 |
| Chicago | 50-33 | 51-32 | -1 |
| St. Louis | 48-36 | 45-39 | +3 |
| Milwaukee | 44-38 | 40-42 | +4 |
| Houston | 40-43 | 38-45 | +2 |
| Cincinnati | 39-45 | 36-48 | +3 |
| Pittsburgh | 38-44 | 36-46 | +2 |
| Arizona | 42-41 | 43-40 | -1 |
| Los Angeles | 38-44 | 41-41 | -3 |
| San Francisco | 36-47 | 37-46 | -1 |
| San Diego | 33-51 | 33-51 | 0 |

Pythagoras was right on four teams (Red Sox, Yankees, Mets and Padres), and 33 percent of the predictions were within one game of being correct. The root mean square error for this data set is about 3.1. If the misses are roughly normally distributed, about 68 percent of teams, which works out to roughly 21 of 30, should be predicted to within three games. Twenty-two teams satisfy this condition.
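For anyone who wants to check the error figure, here is a rough Python sketch of the root mean square calculation. It uses only the differences printed in this article: the 28 teams in the two tables plus the Phillies' -5 quoted in the over/under list further down. Colorado does not appear anywhere in the article, so it is left out, and the result is therefore an approximation over 29 teams rather than all 30. The variable names are illustrative.

```python
import math

# Actual wins minus expected wins, in the order the teams are listed.
al_diffs = [3, 0, 0, 1, -4, -3, 3, 1, 1, -5, 7, -3, 2, -4]
nl_diffs = [4, 0, -6, 3, -1, 3, 4, 2, 3, 2, -1, -3, -1, 0]
phillies_diff = [-5]  # taken from the over/under list, not the table

diffs = al_diffs + nl_diffs + phillies_diff

# Root mean square error: square each miss, average, then take the root.
rmse = math.sqrt(sum(d ** 2 for d in diffs) / len(diffs))

exact = sum(1 for d in diffs if d == 0)
within_one = sum(1 for d in diffs if abs(d) <= 1)
within_three = sum(1 for d in diffs if abs(d) <= 3)

print(f"teams: {len(diffs)}, RMSE: {rmse:.2f}")
print(f"exact: {exact}, within one: {within_one}, within three: {within_three}")
```

With one team missing, the result lands a shade above the 3.1 quoted above, which is about what you would expect if the omitted team's miss was a small one.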

There are other things to notice from these numbers. First, the Rays have exceeded expectations, but not by as much as the numbers would indicate. The pitching has been solid, especially in the bullpen, while the starters have been given enough offense to win games despite no hitter having a great year.

Second, it should come as no real surprise that the NL West, which many expected to be a competitive division, has been the exact opposite to the midway point of the season. They have no team exceeding its predicted record.

Third, most expected the NL Central to not be very competitive. Chicago was really the only team expected to do much this season. While they only fell one game short of predictions, the rest of the division has done better than expected. The Pirates may be in last place, but they're not the doormats they have been in previous years. Great pitching has helped the Cardinals and Brewers to better-than-expected records.

Let's take a look at those eight teams who didn't fall within the three-game error range of their predictions.

| Over | Under |
|---|---|
| Angels +7 | Blue Jays -4 |
| Marlins +4 | Mariners -4 |
| Brewers +4 | Indians -5 |
| | Phillies -5 |
| | Braves -6 |

The statistics that usually best explain the largest differentials between actual wins and expected wins are records in one-run games and extra-inning games.

The Angels are 16-11 in one-run games and 2-2 in extra innings. However, they have the MLB leader in saves (Francisco Rodriguez) and an excellent record in two-run games.

Florida is 13-10 in one-run games and has split eight extra-inning games. The one advantage they do have is having played six more home games than road games at this point in the season.

Milwaukee is an ML-best 17-7 in one-run games.

On the flip side, Atlanta is an ML-worst 4-21 in one-run games and 1-7 in extra innings. This easily explains why a team with a +40 run differential and the best ERA in the National League is three games under .500.

Philadelphia is 13-16 in one-run games, the worst record in such games of any team currently leading its division.

The Blue Jays and Mariners are 12-20 and 8-15 respectively in one-run games, and at or near the bottom of the league in runs scored.

Cleveland is a so-so 6-8 in one-run games. However, they've won all four of their games this year that were decided by at least ten runs.
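To see how well one-run records line up with the over- and under-performers, here is a small Python sketch that turns the records quoted above into winning percentages. The data structure and names are just one illustrative way to lay it out; the numbers are the ones given in this article.

```python
# team: (actual wins minus expected wins, one-run wins, one-run losses)
one_run = {
    "Angels": (7, 16, 11),
    "Marlins": (4, 13, 10),
    "Brewers": (4, 17, 7),
    "Blue Jays": (-4, 12, 20),
    "Mariners": (-4, 8, 15),
    "Indians": (-5, 6, 8),
    "Phillies": (-5, 13, 16),
    "Braves": (-6, 4, 21),
}

# Winning percentage in one-run games for each team.
one_run_pct = {
    team: wins / (wins + losses)
    for team, (_, wins, losses) in one_run.items()
}

# Print from biggest over-performer to biggest under-performer.
for team, (diff, wins, losses) in sorted(
    one_run.items(), key=lambda item: item[1][0], reverse=True
):
    print(f"{team:<10} {diff:+d}  {wins}-{losses}  {one_run_pct[team]:.3f}")
```

All three teams beating their predictions are above .500 in one-run games, and all five teams trailing their predictions are below it.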

It will be interesting to see how things shape up over the remainder of the season.

Will Cito Gaston's return to Rogers Centre spur the Jays' offense to reduce the strain on their pitching staff?

Will the Braves start winning the close games now that they have closer Mike Gonzalez back from the DL, and a hopefully healthy duo of switch-hitters clicking together in the heart of their lineup?

Will the Marlins come back to Earth after a first half that saw them exceed expectations, mostly because of their team power?

Will the amount of offense coming from the Angels' lineup continue to be just enough?

In three months, we'll find out.

