Sean Forman: Interview with the Mind Behind the Numbers at Baseball-Reference
If you are a true baseball fan, you're probably a regular visitor to Baseball-Reference.com.
If you are a sports writer, you most likely set your sails for Baseball-Reference.com prior to writing your baseball pieces.
What exactly is Baseball-Reference.com?
It is a virtual warehouse of statistics covering every player who has played an inning of Major League Baseball.
It is the vast file cabinet where all baseball numbers reside, waiting to be retrieved at your command.
If you want to know what Roy Halladay's career record is against the New York Yankees, you will find it there. By the way, as of this morning (Saturday, 6/19/2010) he is 18-7 with a 2.98 ERA, and the Yankees are collectively batting .241 against him.
All of that information is found with fewer than a handful of mouse clicks. Impressive?
I wouldn't think of beginning a baseball article without spending some time at Baseball-Reference.com.
I must warn you: if you have never visited the site, be prepared to clear your calendar for the day. It is addictive, and it can make the hands on the clock move much faster than usual.
I enjoy the site so much (I'm a subscriber) that I wanted to find out more about it and discover a little about the man behind the statistics factory known as Baseball-Reference.com.
The following is an interview I conducted with the founder of the site, and President of Sports Reference LLC, Sean Forman.
CE: Sean, first I want to thank you for taking the time to answer a few questions and to enlighten the public on the many attributes of Baseball-Reference.com. My first question is, when did you launch the site?
SF: April 1, 2000. I had to wait until I could find a cheap web host offering 200MB of disk space.
CE: My, how times change. Today, 200MB is a BB rolling around in a boxcar. What was the impetus for this endeavor? What drove you to develop such a fantastic source for baseball fanatics?
SF: Basically, at that time there was no site that had historical stats. You couldn't get Babe Ruth's or Ty Cobb's stats on the Internet. I also thought that the Internet was the perfect form for an encyclopedia: centrally updated for all, lots of good hyperlinking and infinite space available.
CE: Do you have a background in sports, mathematics, or computers? What is the extent of your knowledge and experience?
SF: I have a Ph.D. in Applied Mathematical and Computational Sciences from the University of Iowa. I played a lot of sports growing up through high school. Golf was actually my main favorite. I've always loved sports stats, from sorting my baseball cards by the stats on the back, to keeping track of my fantasy league's stats, to collecting tackle stats for my dad's football team while in junior high.
CE: Almost everything on the site is free. The "Play Index" area (probably 95 percent) is free to sample and experiment with, but due to the enormous amount of information that is crunched in such a short time, a subscription is required to gain full access to it. What are some of the things it enables a researcher to discover?
SF: I like to say we put a friendly face on the Retrosheet data. You can search for things like all five-hit games against the Yankees, the most doubles by a catcher in the 1950s, the most pitches thrown in relief in 2000, and lots, lots more. You can get 95 percent of the data for free, but we think that last five percent is a compelling reason to subscribe.
CE: I concur completely. Being a Cincinnati Reds fan, I was watching a Fox Sports Ohio game broadcast by Chris Welsh, who mentioned that he had a subscription to Baseball-Reference.com. Obviously, the site is so popular that it is frequented not only by bloggers and freelance writers like myself, but also by professionals such as Welsh and many other writers and announcers. Did you ever, at the beginning, envision the far-reaching effect of this website?
SF: Certainly not. It was really just a site that I wanted to use. I have to admit that it is a lot of fun to sit in a press box or in the press room at the winter meetings and see most of the writers on your site at one time or another.
CE: Everything is menu-driven and linked together seamlessly. The entire process seems overwhelming and daunting to me. So many numbers are crunched and moved around every day, from the box scores of 14 or 15 games, that my head spins just thinking about it. How are your pages updated after each game? What has to be entered manually on a daily basis?
SF: Nothing happens manually. There are like 20,000 or more pages that are updated each morning, so it is all automated. We purchase the data for the in-season, so that arrives each morning and it all starts while I'm asleep. Occasionally something breaks or the stats don't arrive, but generally it is seamless.
CE: I can't imagine the programming that went into that design. Not only are the old-school "tried and true" statistics such as BA, HR, RBI, OBP, ERA, etc. found there, but many stats that most of us have never heard of can be found there as well. What are your personal thoughts on sabermetrics?
SF: I'm obviously a fan. I have a copy of every Baseball Prospectus ever written and really got into this because of an interest in sabermetrics. The field has obviously exploded, so I struggle sometimes to keep up with all of the research going on, spend time with my family and run our company (we have six, soon-to-be seven sites now). I don't think we are at the bleeding edge like FanGraphs or Baseball Prospectus, but we try to be very easy-to-use and also present numbers and data that will expand fans' interest in the game.
CE: I know that you have expanded your online presence to include the National Football League and the National Basketball Association. What other sports are offered by your company and do you have plans for further expansion?
SF: We have baseball, football, basketball and hockey. The Olympics data behind our Olympic site is just insane. It takes things like wind speeds for individual long jump attempts and much more. We just launched college basketball in the winter and we expect to launch college football this summer.
CE: Do you offer special customized (paid) services to high-profile clients?
SF: We can do specialized things every once in a while, but we are pretty focused on the consumer.
CE: How about player photos? Have you entertained the thought of placing photos of the players on their pages? What legal hurdles would that bring?
SF: We just did add photos for players who debuted pre-1960. We have the post-1960s in hand, but as you mention, those are at first glance not in the public domain, so it's on my to-do list to talk with our attorney as to whether we can use those or not. I'm hoping we can work something out.
CE: As a writer and baseball historian I can't begin to tell you how thankful I am for the services you provide. I want to thank you for taking the time to answer these questions and share this information with me and the rest of the baseball community. Thank you and I wish you all the success in the world.
SF: You're welcome, Cliff. It has been a very rewarding project to build and create. I'm just so happy that people like the site as much as they do and want to use it.
Just a few other things that the site allows you to do are:
Find information about how a player does against left-handers, in night games, in away games, at a certain park, or with runners in scoring position with fewer than two outs or with two outs.
It lets you know the percentage of runners a catcher has thrown out; what a pitcher's stats are with a 3-1 count; what a pitcher's stats are the third time in a game that he faces a batter.
It is just fascinating what all you can find.
You can see the 555 teammates that Tommy John had during his career.
Have you ever heard of Six Degrees of Kevin Bacon? There are only five "degrees" from Babe Ruth to Barry Bonds: Tony Lazzeri, Phil Cavarretta, Minnie Minoso, Jim Morrison, Bonds.
If you have never visited the site, or have been in seclusion for several years, go to Baseball-Reference.com and have yourself a blast.