Sabermetrics for Dummies: How-to Guide for MLB Fans to Learn the Ropes

Zachary D. Rymer (@zachrymer), MLB Lead Writer
April 25, 2014

So you want to be a baseball nerd.

Wise choice. Of all the types of nerd I've experienced, baseball nerd has been the most rewarding. It definitely tops that time I was into pogs, anyway.

But to be a baseball nerd, one must know sabermetrics. That's the field of study that Bill James once characterized, via SABR, as "the search for objective knowledge about baseball." It's more accurately characterized as baseball's answer to rocket science. It's complicated stuff.

Our goal is to make it less complicated by taking simple-as-possible looks at three topics: the best ways to evaluate hitting and pitching, and what to make of everyone's favorite saber-stat: WAR.

If you'll follow me this way...


The Best Ways to Evaluate Hitters

You don't just see average, homers and RBI when baseball telecasts introduce hitters. They now tend to include OPS, which is the most basic need-to-know saber-stat in existence.

For those who don't know, OPS is "on-base plus slugging," or a hitter's on-base percentage (OBP) plus his slugging percentage (SLUG). Crude as it is, it is a better reflection of a hitter's talent than the traditional trio of average, homers and RBI.

Hitters exist to score runs. Scoring runs is about getting on base and getting around the bases. The first talent is encapsulated in OBP. Because it measures how hitters use power to round the bases, the second is encapsulated in SLUG.
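For the code-inclined, here's a minimal sketch of how those pieces fit together. The function names and the season line are made up for illustration; only the standard OBP and SLUG definitions are assumed.

```python
def on_base_pct(hits, walks, hbp, at_bats, sac_flies):
    """OBP: times on base divided by the plate appearances that count toward it."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


def slugging_pct(singles, doubles, triples, homers, at_bats):
    """SLUG: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


def ops(obp, slug):
    """OPS is just the two rates added together."""
    return obp + slug


# Hypothetical season line, invented for illustration only.
singles, doubles, triples, homers = 100, 30, 2, 25
hits = singles + doubles + triples + homers
at_bats, walks, hbp, sac_flies = 500, 60, 5, 5

obp = on_base_pct(hits, walks, hbp, at_bats, sac_flies)
slug = slugging_pct(singles, doubles, triples, homers, at_bats)
print(f"OBP {obp:.3f}  SLUG {slug:.3f}  OPS {ops(obp, slug):.3f}")
```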

What OPS lacks, however, is context. It gives a good snapshot of a hitter's talent, but it doesn't tell you how good his talent is compared to that of others.

That's where OPS+ comes in handy. 

Per Baseball-Reference.com, a site to know, OPS+ takes a hitter's OPS and adjusts for two things: league-average OBP and SLUG rates and the hitter's home ballpark.

Everything ends up on a scale of 100, with above 100 constituting "above-average" production and below constituting "below-average" production.
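A rough sketch of the idea, assuming the commonly cited form 100 * (OBP/lgOBP + SLUG/lgSLUG - 1). The park adjustment here is a deliberate simplification (it just scales the league baselines by a hypothetical `park_factor`), and every number is invented, so treat this as a toy and not as the published calculation.

```python
def ops_plus(obp, slug, lg_obp, lg_slug, park_factor=1.0):
    """Index a hitter's OBP and SLUG against league rates.

    park_factor is a simplifying assumption: above 1.0 means a
    hitter-friendly home park, which raises the baseline the hitter
    is measured against. The real adjustment is more involved.
    """
    adj_lg_obp = lg_obp * park_factor
    adj_lg_slug = lg_slug * park_factor
    return 100 * (obp / adj_lg_obp + slug / adj_lg_slug - 1)


# Hypothetical hitter and league rates, for illustration only.
neutral = ops_plus(0.400, 0.600, lg_obp=0.330, lg_slug=0.420)
hitter_park = ops_plus(0.400, 0.600, 0.330, 0.420, park_factor=1.10)
pitcher_park = ops_plus(0.400, 0.600, 0.330, 0.420, park_factor=0.95)

print(round(neutral), round(hitter_park), round(pitcher_park))
```

The same raw line grades out lower in a hitter-friendly park and higher in a pitcher-friendly one, which is the whole point of the adjustment.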

To see it in action, let's recall the 1997 National League MVP race between Mike Piazza and Larry Walker:

Mike Piazza vs. Larry Walker in 1997

By average, OBP, slugging and OPS, Walker was the better hitter. Being the better hitter helped him win the MVP because voters love hitting.

But had the voters been looking at OPS+, maybe they would have chosen Piazza instead.

It's not hard to find the source of OPS+'s conclusion. Whereas Walker played at hitter-friendly Coors Field, Piazza played at pitcher-friendly Dodger Stadium. Neutralize the two home parks using OPS+, and Piazza had the more impressive season.

That's how OPS+ can be used to compare contemporary hitters. But by virtue of OPS+ also adjusting for league-average OBP and SLUG rates, it's also good for neutralizing different run-scoring environments.

Thus, it's useful in comparing players from different eras.

We can use OPS+ to see that Miguel Cabrera's production in 2013, an extremely pitcher-friendly season, was actually better than Jason Giambi's in 2000, an extremely hitter-friendly season:

Miguel Cabrera's 2013 vs. Jason Giambi's 2000

If all you want is a quick snapshot of a hitter's talent, OPS is fine. But if you want to compare hitters, OPS+ is your huckleberry.

But if you want to get really nerdy, there's a stat that does OPS+'s job even better than OPS+. That would be weighted runs created plus (wRC+).

To get to know this one, though, you must first get to know weighted on-base average (wOBA).

While I'm generally fine with it, OPS does have a fundamental flaw. In simply adding them together, it assumes that OBP and SLUG are equals. 

Per FanGraphs, another site to know, OBP is actually about twice as valuable as SLUG. Then there's how SLUG assumes that doubles are twice as valuable as singles and so on. In reality, it's more complicated than that.

What wOBA does is correct for these imperfections by weighting aspects of hitting (unintentional walks, HBPs, singles, doubles, triples, homers) "in proportion to their actual run value."

There's some wizardry that goes into determining "run value," but the basic concept is that each of the above events influences a team's chances of scoring to a specific degree. It's these specific degrees that wOBA incorporates that OPS doesn't.
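Here's a small sketch of what that weighting looks like in practice. The weights below are illustrative values in the general neighborhood of published ones; the real weights are recalculated every season, so these are an assumption and not the official numbers. The stat line is hypothetical too.

```python
# Illustrative linear weights (assumed, not official for any season).
WEIGHTS = {
    "ubb": 0.69,     # unintentional walk
    "hbp": 0.72,     # hit by pitch
    "single": 0.89,
    "double": 1.27,  # note: well short of twice a single
    "triple": 1.62,
    "hr": 2.10,
}


def woba(ubb, hbp, singles, doubles, triples, hr, at_bats, sac_flies,
         weights=WEIGHTS):
    """Weighted on-base average: each event counts for its run value."""
    numerator = (
        weights["ubb"] * ubb
        + weights["hbp"] * hbp
        + weights["single"] * singles
        + weights["double"] * doubles
        + weights["triple"] * triples
        + weights["hr"] * hr
    )
    # Intentional walks are left out of both the numerator and denominator.
    denominator = at_bats + ubb + sac_flies + hbp
    return numerator / denominator


# Hypothetical season line, invented for illustration only.
example = woba(ubb=60, hbp=5, singles=100, doubles=30, triples=2, hr=25,
               at_bats=500, sac_flies=5)
print(f"wOBA {example:.3f}")
```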

It's best to think of wOBA as a more accurate version of OPS, which is related to how wRC+ is more accurate than OPS+.

Like OPS+, wRC+ works on an above-average/below-average scale of 100. The difference is that wRC+ is sort of a cross between a rate stat and a counting stat, one that's designed to get at a player's offensive value by measuring it in runs.

It's complicated stuff, but the basic idea is to take a player's wOBA, throw in some league and park adjustments and more wizardry to convert it into how many total runs the player's wOBA is worth. Then that figure is taken and divided by the league average and multiplied by 100 to create wRC+.
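A simplified sketch of that pipeline follows. It assumes the general shape of the published approach (convert wOBA to runs above average per plate appearance, add the league run rate, nudge for park, then index to league), but it uses league runs per plate appearance as the denominator and a single hypothetical `park_factor`. All constants are invented.

```python
def runs_created(woba, pa, lg_woba, woba_scale, lg_runs_per_pa):
    """Convert wOBA into total runs: runs above average plus the league baseline."""
    runs_above_avg_per_pa = (woba - lg_woba) / woba_scale
    return (runs_above_avg_per_pa + lg_runs_per_pa) * pa


def wrc_plus(woba, lg_woba, woba_scale, lg_runs_per_pa, park_factor=1.0):
    """Index per-PA run creation to league average (100), with a crude park nudge."""
    runs_above_avg_per_pa = (woba - lg_woba) / woba_scale
    # park_factor above 1.0 (hitter-friendly) subtracts credit; below 1.0 adds it.
    park_adj = lg_runs_per_pa - park_factor * lg_runs_per_pa
    return 100 * (runs_above_avg_per_pa + lg_runs_per_pa + park_adj) / lg_runs_per_pa


# Hypothetical league constants and hitter, for illustration only.
LG_WOBA, WOBA_SCALE, LG_R_PER_PA = 0.315, 1.25, 0.11

total_runs = runs_created(0.400, 600, LG_WOBA, WOBA_SCALE, LG_R_PER_PA)
neutral = wrc_plus(0.400, LG_WOBA, WOBA_SCALE, LG_R_PER_PA)
pitcher_park = wrc_plus(0.400, LG_WOBA, WOBA_SCALE, LG_R_PER_PA, park_factor=0.95)
hitter_park = wrc_plus(0.400, LG_WOBA, WOBA_SCALE, LG_R_PER_PA, park_factor=1.05)

print(round(total_runs, 1), round(neutral), round(pitcher_park), round(hitter_park))
```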

To see it in action, let's compare what Josh Donaldson and Robinson Cano did in 2013:

Josh Donaldson vs. Robinson Cano in 2013

If we stopped at OPS, we'd be looking at Cano as the superior hitter.

But wOBA favors the season Donaldson had, and it's not hard to pinpoint why. Since wOBA doesn't care about intentional walks, Donaldson got a boost from drawing 74 unintentional walks to Cano's 49.

What wRC+ does is factor in how Donaldson played at pitcher-friendly Coliseum while Cano played at hitter-friendly Yankee Stadium. The edge for Donaldson helps explain his superior wRC+.

Here at the end, I'll say that you can get away fine with knowing just OPS and OPS+. But if you want to feel at home among baseball super-nerds, you need to know wOBA and wRC+.

Now then, what's say we talk pitching?


The Best Ways to Evaluate Pitchers

We haven't gotten to the point where baseball telecasts are as enlightened with pitching stats as they should be. They're still mainly about wins, losses and ERA.

I'm going to assume you already have a basic understanding of why wins and losses are bogus. ERA, thankfully, is less bogus.

But it's still far from perfect.

Like wins and losses, ERA can be influenced by things outside of a pitcher's control, particularly the talent level of his defense. Even if a defense doesn't make many errors, it can still struggle to convert batted balls into outs, which can hurt a pitcher's ERA.

This is why we have fielding independent pitching (FIP), expected fielding independent pitching (xFIP) and skill-interactive ERA (SIERA), which estimate what a pitcher's ERA should be based on the things he can control.

FIP is the simplest of the bunch, as it focuses on just four controllable outcomes: strikeouts, walks, HBPs and homers. This family-friendly video will help explain why just these four:

One gripe with FIP is that pitchers only have so much control over homers, which is true.

This is where xFIP comes in handy, as it replaces a pitcher's homer total with an estimate for how many homers he should have allowed. That's acquired by multiplying the league-average home run-to-fly-ball rate (HR/FB) by the number of fly balls the pitcher allowed.
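Both estimators fit in a few lines. This sketch assumes the widely published FIP form, (13*HR + 3*(BB+HBP) - 2*K) / IP plus a constant, and gets xFIP by swapping in expected homers (fly balls allowed times the league HR/FB rate). The constant is recalculated each season to line FIP up with league ERA; the 3.10 here is a placeholder assumption, and the pitcher is hypothetical.

```python
FIP_CONSTANT = 3.10  # assumed placeholder; the real value changes by season


def fip(hr, bb, hbp, k, ip, constant=FIP_CONSTANT):
    """Fielding independent pitching from the four controllable outcomes.

    ip must be true decimal innings (200 and one-third is 200.333, not 200.1).
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fly_balls, lg_hr_per_fb, bb, hbp, k, ip, constant=FIP_CONSTANT):
    """Same as FIP, but with homers replaced by an expected total."""
    expected_hr = fly_balls * lg_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


# Hypothetical pitcher, invented for illustration only.
pitcher_fip = fip(hr=15, bb=50, hbp=5, k=200, ip=200.0)
pitcher_xfip = xfip(fly_balls=180, lg_hr_per_fb=0.10, bb=50, hbp=5, k=200, ip=200.0)

print(f"FIP {pitcher_fip:.2f}  xFIP {pitcher_xfip:.2f}")
```

This pitcher allowed 15 homers on a fly-ball total that would normally produce 18, so xFIP comes out a bit higher than FIP.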

The next gripe is that pitchers must have some control over batted balls. Specifically, since ground balls are good, shouldn't pitchers who get a lot of them be rewarded?

Enter SIERA. It focuses on the same things FIP and xFIP focus on, but it actually tries to make something of batted balls.

Notably, Baseball Prospectus, yet another site to know, says it recognizes how "run prevention improves as ground ball rate increases." That makes sense given that ground balls A) rarely go for extra-base hits, B) are easily converted into outs and C) get double plays.

Let's stop and consider last year's two ERA champions: Clayton Kershaw and Anibal Sanchez.

Clayton Kershaw vs. Anibal Sanchez in 2013, Part I

It seems ridiculous, but you have to consider what those stats look at.

With FIP, it's about strikeouts, walks, homers and HBPs. Kershaw had the lower walk rate, but Sanchez had the higher strikeout rate. Elsewhere, there's no big difference in how many batters they hit and no difference in their HR/FB rates.

With xFIP, it's about neutralizing a pitcher's home run total based on his FB%. When you look at the FB% of both pitchers, you see no big difference. Kershaw's was slightly lower, hence the slightly lower xFIP.

When it comes to SIERA, ground-ball percentage (GB%) is the key. Kershaw had only a modest advantage in GB%, which helps explain his merely modest advantage in SIERA.

So if not their pitching, what was the real difference between the two? 

Primarily, it was defense.

Kershaw pitched to a hugely superior defense, as Baseball Prospectus had the Dodgers at No. 9 in defensive efficiency, which simply measures the rate at which balls in play are converted into outs. Sanchez, meanwhile, pitched to the No. 27 defense.

One more question before we move on: What can we use to compare pitchers from different parks and/or different leagues?

At the least, there's ERA+. It does the same thing OPS+ does for OPS in that it takes ERA and adjusts for leagues and parks. Once again, anything above 100 is above average and less than 100 is below average.
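As a minimal sketch, ERA+ boils down to league ERA divided by the pitcher's ERA, times 100. The park handling below (scaling league ERA by a hypothetical `park_factor`) is a simplification, and the league ERAs are invented round numbers.

```python
def era_plus(era, lg_era, park_factor=1.0):
    """Index ERA to league average; higher is better.

    park_factor above 1.0 means a hitter-friendly home park, which
    raises the baseline. This is a simplified assumption, not the
    exact published adjustment.
    """
    return 100 * (lg_era * park_factor) / era


# The same 2.00 ERA in two hypothetical run environments.
low_scoring_era = era_plus(2.00, lg_era=3.50)
high_scoring_era = era_plus(2.00, lg_era=5.00)

print(round(low_scoring_era), round(high_scoring_era))
```

Identical ERAs, very different grades: the pitcher working in the higher-scoring league was further ahead of his peers.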

Using this stat, we can see that Kershaw's 1.83 ERA from 2013 is laughably inferior to Pedro Martinez's 1.74 ERA from 2000:

Clayton Kershaw's 2013 vs. Pedro Martinez's 2000

Regarding the ERA estimators, another great thing about SIERA is that it's league- and park-adjusted by default. FIP and xFIP are not, which is why FIP- and xFIP- exist.

Like ERA+, FIP- and xFIP- work on a scale of 100. The difference is that, as the minus indicates, everything below 100 is above average and anything above is below average. 
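The "minus" scale simply flips the ratio. Here's a sketch under the same simplifying assumption as before (a single hypothetical `park_factor` scaling the league baseline); the real calculation is more detailed, and the inputs are invented.

```python
def fip_minus(fip, lg_fip, park_factor=1.0):
    """Index FIP to league average; lower is better.

    The same function works for xFIP- if you pass xFIP and league xFIP.
    """
    return 100 * fip / (lg_fip * park_factor)


# Hypothetical pitcher with a 2.50 FIP in a 4.00 FIP league.
neutral = fip_minus(2.50, 4.00)
hitter_park = fip_minus(2.50, 4.00, park_factor=1.05)
pitcher_park = fip_minus(2.50, 4.00, park_factor=0.95)

print(round(neutral, 1), round(hitter_park, 1), round(pitcher_park, 1))
```

The pitcher in the tougher pitching environment gets the lower (better) number for the same raw FIP.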

Let's take these two stats and apply them to Kershaw and Sanchez.

Clayton Kershaw vs. Anibal Sanchez in 2013, Part II

Here the advantage is with Sanchez in both categories. This is due to how, per FanGraphs' park factors, Comerica Park was a less friendly pitching environment than Dodger Stadium. 

So yeah. Not only can it be argued that Sanchez was Kershaw's equal in 2013, it can be argued he was actually better. Because sabermetrics!

(Disclaimer: Don't take all this to mean I'm anti-Kershaw. I actually heart him a lot.)

By this point, you should be feeling properly nerdy. The next time you gather with your pals at the ballpark, you're going to be able to impress/befuddle them with talk of not only OPS+, wOBA and wRC+, but of FIP, xFIP, SIERA, ERA+, FIP- and xFIP- as well.

But you also need to be able to talk WAR.

...Which isn't easy.


What to Make of WAR

If you missed the 2012 American League MVP debate, WAR stands for wins above replacement and is a doozy of a concept.

In the words of FanGraphs, WAR attempts to "summarize a player’s total contributions to their team in one statistic." It does that by quantifying how many more wins a team has gained from a player than it would have from a replacement-level player, such as a benchwarmer or a minor league scrub.

For pitchers, finding WAR mainly comes down to innings pitched and runs allowed. The innings part is easy, but figuring out the runs part is a matter of preference. Baseball-Reference.com looks at total runs allowed, while FanGraphs uses FIP as a foundation.

As such, FanGraphs WAR (fWAR) is more about hypothetical runs allowed and, thus, hypothetical value. Baseball-Reference.com WAR (rWAR), by comparison, is more about actual value.

Regardless, I personally think WAR works fine for pitchers. In focusing primarily on innings pitched and runs allowed, it certainly highlights a pitcher's value to his team more than his record does.

I also like how it doesn't downplay the importance of innings. It rightfully values starters over relievers, and it can also give innings-eaters the credit they deserve.

The battle between Justin Verlander and David Price for the 2012 AL Cy Young makes for a good example:

Justin Verlander vs. David Price in 2012
Player    IP       Runs Allowed    ERA     rWAR    FIP     fWAR
Price     211.0    63              2.56    6.9     3.05    4.8
Source: Baseball-Reference.com and FanGraphs

Verlander's edge in innings pitched was notably larger than Price's advantage in runs allowed, and their ERAs and FIPs were roughly equal. Verlander basically did what Price did over more innings. Yet another case of voterscrewedupus.

That's my take on WAR for pitchers: It's as simple as it is effective. You have to be mindful of which WAR you prefer, but you should feel free to use WAR when talking about pitchers.

With position players, though, caution is recommended.

In theory, WAR is even more ideal for position players than it is for pitchers. Whereas hitting, baserunning and fielding are secondary concerns for pitchers, hitters must do all three. Since WAR attempts to encompass how many runs hitters produce with their hitting and baserunning, and how many they take away with their defense, it should be the perfect stat for them.

It's not. Not quite, anyway.

The hitting and baserunning parts of WAR are solid. Both Baseball-Reference.com and FanGraphs use wOBA as a base for hitting value, and a system of credits and debits on specific plays (stolen bases, caught-stealings, first-to-thirds, tag-ups, outs on the bases, etc.) for baserunning value.

But defense? That's the tricky part.

For starters, Baseball-Reference.com and FanGraphs base the defense portions of their WAR calculations on different metrics. Baseball-Reference.com uses defensive runs saved (DRS). FanGraphs uses ultimate zone rating (UZR).

How these two stats work is beyond complicated, but Fox Sports' Gabe Kapler summed up DRS well here:

UZR works in a similar way, and both UZR and DRS ultimately attempt to calculate the same thing: how many "runs" above or below average a player is on defense.

But since you have two different systems, you have the potential for two different outcomes. And that happens a lot. In fact, it's rare that a player's DRS and UZR look exactly alike, which means it's rare that a player's rWAR and fWAR are going to look exactly alike.
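To see how much the defensive input alone can move the needle, here's a deliberately stripped-down sketch. It assumes the common rule of thumb of roughly 10 runs per win and leaves out positional and league adjustments entirely; the player and all run values are hypothetical, and neither site's actual formula is this simple.

```python
RUNS_PER_WIN = 10.0  # rule-of-thumb assumption; the real value varies by season


def position_player_war(batting_runs, baserunning_runs, fielding_runs,
                        replacement_runs, runs_per_win=RUNS_PER_WIN):
    """Toy WAR: add up run components and convert runs to wins.

    Omits positional and league adjustments, so this is a sketch of
    the structure only.
    """
    total_runs = batting_runs + baserunning_runs + fielding_runs + replacement_runs
    return total_runs / runs_per_win


# Hypothetical shortstop: identical offense, but two defensive systems disagree.
war_system_a = position_player_war(2, -1, fielding_runs=-7, replacement_runs=20)
war_system_b = position_player_war(2, -1, fielding_runs=7, replacement_runs=20)

print(war_system_a, war_system_b)
```

A 14-run disagreement on defense is enough, on its own, to turn one evaluation of the same season into two noticeably different WAR totals.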

Then there are the times when the two systems completely disagree, as they did with Jhonny Peralta in 2012:

Jhonny Peralta's Defense and WAR in 2012
Source: Baseball-Reference.com and FanGraphs

DRS had Peralta as a below-average defensive shortstop, which influenced his rWAR coming out to just 1.1. UZR, however, had him as an above-average defender, which influenced his fWAR coming out to 2.5.

Peralta's situation highlights not just how rWAR and fWAR are often going to disagree, but how the DRS and UZR systems themselves are imperfect. As carefully crafted as they are, they're subjective—certainly more subjective than the other stats that go into calculating WAR, which is an issue.

This doesn't mean that WAR is totally invalid as a measuring stick for position players. It just means that WAR can't be the beginning and end of debates. It must be part of an argument, not the argument.

I think that's our cue to call it a day. Provided I did my job and you were paying attention, you're now more of a baseball nerd than you were before. Congratulations.

One word of warning, though. This might start happening to you now:

Just a heads up.


Note: Stats courtesy of Baseball-Reference.com unless otherwise noted/linked.


If you want to talk baseball, hit me up on Twitter.
