The Chicago Cubs' Perfect Batting Lineup

David Eisley, May 11, 2009

Another caller on sports radio screams about the Cubs lineup.

“Soriano is not a leadoff hitter!”

“Get Derrek Lee out of the three hole! He’s lost his power!”

And so it goes for Lou Piniella.

But why is everyone so concerned about the batting order in the first place? After all, Billy Martin pulled his out of a hat in 1972 and won the front end of a doubleheader. In the second game, he reverted to his traditional lineup and lost.

While that is just one example, I think it begins to make a point.

Traditional thinking has a speedy contact hitter in the leadoff spot, with a good situational contact hitter in the two hole. Spots three through five are typically your best hitters, since they have to drive in those who have reached base before them. Those in the six to eight spots (or nine in the AL) get progressively worse until you reach the (sometimes) woeful pitcher.

On the surface, it makes sense: Get your best hitters to bat more times than your lesser batsmen.

But how do you define your best hitters? Batting average? Power? On-base percentage? Runs batted in? Runs scored? A combination of all of the above?

Baseball is a simple game. If you score more runs than the other team, you win the game. You are given 27 outs to accomplish this. Every batter that does not reach safely brings your team one-27th closer to the end of the game. The longer you can prolong the contest offensively, the better chance you have to win.

Conventional wisdom tells us that you should bat your most prolific hitter third or fourth in the lineup. But is this true?

Baseball Between the Numbers, written by Baseball Prospectus, shows the major flaw with this theory. They point out that, in 2004, Barry Bonds mainly batted fourth, which would seem to make sense. But the Giants would have scored 10 more runs that season batting Bonds in the leadoff spot.

Why? Because of BLOOP.

BLOOP is the Baseball Lineup Order Optimization Program, designed by Baseball Prospectus. The program uses statistical probability to find the ultimate batting order. Each player is assigned a range of probabilities for different events (hit, strikeout, walk, home run, double, etc.).

Here is their basic example:

Joe Smith: single (23.2 percent), double (3.5 percent), triple (1.2 percent), home run (0.2 percent), and walk (6.9 percent) add up to an on-base percentage of .350 (35 percent); the strikeout (one percent) in his profile counts as an out, not as a time on base. During any simulated plate appearance, if the random number generated by the program is less than .350, he reaches base. If the number falls between 0 and .232, he gets a single; between .232 and .267, a double; and so on.
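The Joe Smith example can be sketched in a few lines of Python. This is only an illustration of the cumulative-cutoff idea described above, not BLOOP's actual code; the names `JOE_SMITH` and `plate_appearance` are made up for this sketch, and every way of making an out (strikeouts included) is lumped into a single "out" result.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Joe Smith's on-base event probabilities, taken from the example above.
JOE_SMITH = {
    "single": 0.232,
    "double": 0.035,
    "triple": 0.012,
    "home run": 0.002,
    "walk": 0.069,
}


def plate_appearance(probs, draw):
    """Map a random draw in [0, 1) to an outcome using cumulative cutoffs."""
    events = list(probs)
    # Running totals: roughly .232, .267, .279, .281, .350
    cutoffs = list(accumulate(probs.values()))
    index = bisect_right(cutoffs, draw)
    # Any draw past the last cutoff (the on-base percentage) is an out.
    return events[index] if index < len(events) else "out"


# One simulated trip to the plate.
print(plate_appearance(JOE_SMITH, random.random()))
```

A draw of .100 lands in the single range, .250 in the double range, and anything at or above .350 is an out.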

Baseball Prospectus used BLOOP to simulate thousands of seasons using different lineups. They arrived at some definitive conclusions.
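To give a feel for what simulating many games looks like, here is a toy simulator in Python. It is not BLOOP. The base-running rules are assumptions made for brevity (on a hit, every runner advances as many bases as the batter; a walk only forces runners), and the three hitter profiles at the bottom are invented for illustration.

```python
import random

# Bases the batter (and, in this toy model, every runner) advances on a hit.
BASES = {"single": 1, "double": 2, "triple": 3, "home run": 4}


def sample_event(probs, rng):
    """Pick one event for a batter; whatever probability is left is an out."""
    draw = rng.random()
    cumulative = 0.0
    for event, p in probs.items():
        cumulative += p
        if draw < cumulative:
            return event
    return "out"


def simulate_game(lineup, rng, innings=9):
    """Return the runs one lineup scores over a full game."""
    runs, batter = 0, 0
    for _ in range(innings):
        outs = 0
        bases = [False, False, False]  # first, second, third
        while outs < 3:
            event = sample_event(lineup[batter % len(lineup)], rng)
            batter += 1
            if event == "out":
                outs += 1
            elif event == "walk":
                # Only forced runners move up.
                if all(bases):
                    runs += 1
                elif bases[0] and bases[1]:
                    bases[2] = True
                elif bases[0]:
                    bases[1] = True
                bases[0] = True
            else:
                advance = BASES[event]
                moved = [False, False, False]
                for base, occupied in enumerate(bases):
                    if occupied and base + advance >= 3:
                        runs += 1
                    elif occupied:
                        moved[base + advance] = True
                if advance == 4:
                    runs += 1  # the batter scores on a home run
                else:
                    moved[advance - 1] = True
                bases = moved
    return runs


def average_runs(lineup, games=1000, seed=0):
    """Average runs per game over many simulated games."""
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(games)) / games


# Hypothetical hitters, invented for illustration only.
patient = {"single": 0.25, "walk": 0.15}
slugger = {"single": 0.15, "double": 0.05, "home run": 0.06, "walk": 0.05}
weak = {"single": 0.15, "walk": 0.03}
lineup = [patient, slugger] + [weak] * 7
print(average_runs(lineup, games=2000))
```

Reordering `lineup` and calling `average_runs` again with the same seed gives a rough comparison between batting orders. Each profile must leave some probability for outs, or an inning would never end.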

Before we get back to the computer, here was the lineup Lou Piniella used on Opening Day, 2009:

1. Alfonso Soriano, LF

2. Kosuke Fukudome, CF

3. Derrek Lee, 1B

4. Milton Bradley, RF

5. Aramis Ramirez, 3B

6. Mike Fontenot, 2B

7. Geovany Soto, C

8. Ryan Theriot, SS

9. Carlos Zambrano, P

There are many flaws with this lineup, so let’s get back to BLOOP. Over thousands of simulated games, the difference between the best and worst lineups (using the same personnel) was 26 runs. Although that doesn’t sound like a lot, it could mean a difference of two-and-a-half games in the standings.

Even if the manager used a lineup somewhere between the best and the worst, the difference from the optimal order was about 10 runs, which could cost a team a crucial win over the long haul.
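The conversion from runs to games can be written out as simple arithmetic. The sketch below assumes the common rule of thumb of roughly 10 runs per win; that figure is an assumption here, not something BLOOP reports, and the helper name `runs_to_wins` is made up.

```python
# Assumption: the common rule of thumb of about 10 runs per win.
RUNS_PER_WIN = 10.0


def runs_to_wins(runs, runs_per_win=RUNS_PER_WIN):
    """Convert a run difference into an approximate win difference."""
    return runs / runs_per_win


print(runs_to_wins(26))  # 2.6, close to the two-and-a-half games cited
print(runs_to_wins(10))  # 1.0, the "crucial win"
```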

The most important variable in the lineup is on-base percentage, not slugging percentage or batting average. It's pretty simple: give the hitters who are least likely to make outs the most plate appearances.

I will note that BLOOP did not take into account stolen bases, ballpark dimensions, or who the opposing pitcher was. However, are those factors that important?

Traditional thinking tells us that we must protect our good hitters with another good hitter so they may see better pitches.

Can that be done? Baseball Prospectus took a look at the 2001-2003 San Francisco Giants and Barry Bonds. How could the Giants force the pitcher to actually throw Barry a strike?

From 2001 to 2002, Jeff Kent protected Barry Bonds. He was a good hitter, and in those two seasons, Bonds was only intentionally walked 34 times, even though he was hitting home runs at a historic pace. In late 2002, with the Giants fading, they swapped Kent and Bonds in the order.

It didn’t work. Before the switch, Bonds walked 93 times in 293 plate appearances and hit a home run every 7.8 at-bats. After the switch, he walked 87 times in 268 plate appearances (about even), but his home run production slipped to one every 11.1 at-bats.
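The "about even" claim is easy to check from the numbers quoted. The short Python calculation below uses only those figures; the helper name `rate` is made up for the sketch.

```python
def rate(events, opportunities):
    """Events per opportunity, as a fraction."""
    return events / opportunities


# Walk rate before and after Bonds and Kent swapped spots.
before = rate(93, 293)
after = rate(87, 268)
print(f"{before:.1%} vs {after:.1%}")  # 31.7% vs 32.5%

# Home runs per at-bat fell from one in 7.8 to one in 11.1.
hr_drop = 1 - rate(1, 11.1) / rate(1, 7.8)
print(f"{hr_drop:.1%}")  # 29.7%
```

So the walk rate barely moved, while the home run rate per at-bat fell by roughly 30 percent.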

In 2003 and 2004, with Kent gone, the hitters behind Bonds worsened. He was now protected by players like Edgardo Alfonzo, Benito Santiago, and Pedro Feliz. In 2003, Bonds was pitched to more than the previous two seasons, and his home run rate remained consistent (around one in 8.5).

So, what should the Cubs lineup look like? Based on career statistics (including 2009 thus far), here is how BLOOP's on-base percentage ranking would order the lineup:

1. Kosuke Fukudome, CF (.374 OBP, .405 SLG)

2. Milton Bradley, RF (.369 OBP, .454 SLG)

3. Derrek Lee, 1B (.365 OBP, .495 SLG)

4. Ryan Theriot, SS (.363 OBP, .376 SLG)

5. Mike Fontenot, 2B (.361 OBP, .452 SLG)

6. Geovany Soto, C (.356 OBP, .470 SLG)

7. Aramis Ramirez, 3B (.342 OBP, .503 SLG)

8. Alfonso Soriano, LF (.329 OBP, .519 SLG)

On days when subs were used:

* Micah Hoffpauir, 1B, would bat second (.374 OBP, .530 SLG)

* Reed Johnson, CF, would bat sixth (.344 OBP, .408 SLG)

* Koyie Hill, C, would bat eighth (.275 OBP, .303 SLG)

* Aaron Miles, 2B, would bat eighth (.327 OBP, .361 SLG)
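The career ranking above is just a sort by on-base percentage, which can be sketched in Python. The `Player` type and the two helper functions are made up for this example; the numbers are the career figures quoted above. Swapping a substitute in by position is an assumption about how the bench players would be used.

```python
from typing import NamedTuple


class Player(NamedTuple):
    name: str
    position: str
    obp: float
    slg: float


# Opening Day order, with the career figures quoted in the article.
OPENING_DAY = [
    Player("Alfonso Soriano", "LF", 0.329, 0.519),
    Player("Kosuke Fukudome", "CF", 0.374, 0.405),
    Player("Derrek Lee", "1B", 0.365, 0.495),
    Player("Milton Bradley", "RF", 0.369, 0.454),
    Player("Aramis Ramirez", "3B", 0.342, 0.503),
    Player("Mike Fontenot", "2B", 0.361, 0.452),
    Player("Geovany Soto", "C", 0.356, 0.470),
    Player("Ryan Theriot", "SS", 0.363, 0.376),
]


def order_by_obp(players):
    """Sort from highest to lowest OBP; ties keep their original order."""
    return sorted(players, key=lambda p: p.obp, reverse=True)


def with_substitute(players, sub):
    """Swap in a substitute for the starter at the same position."""
    return [sub if p.position == sub.position else p for p in players]


for spot, player in enumerate(order_by_obp(OPENING_DAY), start=1):
    print(f"{spot}. {player.name}, {player.position} ({player.obp:.3f} OBP)")
```

Calling `order_by_obp(with_substitute(OPENING_DAY, Player("Micah Hoffpauir", "1B", 0.374, 0.530)))` puts Hoffpauir second, behind Fukudome, because the two are tied at .374 and the sort keeps ties in their original order.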

Since several players' 2009 performances (Derrek Lee, Geovany Soto, and Milton Bradley) have not approached their career statistics, the lineup should be adjusted once there is a larger body of evidence. Using only 2009 performance thus far, this is how the lineup would appear:

1. Kosuke Fukudome, CF (.449 OBP, .543 SLG)

2. Aramis Ramirez, 3B (.417 OBP, .591 SLG)

3. Ryan Theriot, SS (.366 OBP, .444 SLG)

4. Alfonso Soriano, LF (.331 OBP, .555 SLG)

5. Mike Fontenot, 2B (.321 OBP, .427 SLG)

6. Milton Bradley, RF (.321 OBP, .328 SLG)

7. Geovany Soto, C (.297 OBP, .195 SLG)

8. Derrek Lee, 1B (.282 OBP, .363 SLG)

On days when subs were used:

* Micah Hoffpauir, 1B, would bat fourth (.343 OBP, .525 SLG)

* Reed Johnson, CF, would bat sixth (.316 OBP, .265 SLG)

* Koyie Hill, C, would bat third (.390 OBP, .444 SLG)

* Aaron Miles, 2B, would bat eighth (.266 OBP, .274 SLG)

While the raw data show that who is in the lineup is more important than where they bat, we all love to talk about the batting order.

However, the conclusions are pretty clear:

1. Protection is overrated: There is no evidence that a batter sees better pitches to hit based on who is behind him.

2. The conventional lineup most managers use is flawed: Sorting a lineup by descending OBP yields the most runs. Players with the highest OBP at the top of the order see 54 more plate appearances per year.

3. The more runs you score, the more games you should win.
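One back-of-the-envelope way to arrive at a figure like 54 extra plate appearances is sketched below. It is an assumption, not necessarily how Baseball Prospectus derived the number. The idea is that a game's final out is about equally likely to fall on any of the nine lineup spots, so each spot gets roughly one-ninth of a plate appearance per game more than the spot behind it. The function name `extra_plate_appearances` is made up.

```python
GAMES = 162  # regular-season games
SPOTS = 9    # lineup spots


def extra_plate_appearances(higher_spot, lower_spot, games=GAMES):
    """Approximate extra trips to the plate per season for the higher spot."""
    return (lower_spot - higher_spot) * games / SPOTS


print(extra_plate_appearances(1, 2))  # 18.0 per spot
print(extra_plate_appearances(1, 4))  # 54.0, leadoff versus cleanup
```

Under that assumption, moving a hitter from fourth to first, as in the Bonds example, is worth about 54 plate appearances over a season.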