
Simulate The Rest of The Baseball Season Any Way You Dang Well Please

Adam Intwinsland, Aug 25, 2009

A proliferation of statistics-focused baseball websites has cropped up in recent years. I won't comment on the feud between purists who dislike stat junkies and those who feel that understanding baseball statistics at a deep, complex level enhances their enjoyment of the game. I'll simply note that while baseball stats were once whatever was printed on the back of baseball cards, the number of fans who analyze them in much greater depth and with more powerful tools has exploded. What was once the purview of a few isolated stat "nerds" who trafficked in an alphabet soup of new-fangled statistics is becoming ever more mainstream. Many websites now offer ever more complex statistics, numbers from minor league and college baseball, and complicated analyses of those stats, and these sites have become must-read destinations for hard-core baseball stat-heads. There is a desire for both statistical analysis and quantitative modelling of the game at a level of complexity that was non-existent a few years ago.

While a comprehensive list is impossible, I'll outline some general trends. Modern baseball statistics sites don't shy away from math or data overload; they continually question the inherited wisdom about which stats are meaningful; and they try to generate new insight into a very complicated game, insight that was not possible before the era of computerized records of every single play. The Baseball Cube gives its users access to player stats from every minor league level and from college, trusting the fan to sort out what it means that their team's second-round draft pick from a year ago is OPSing .950 in A ball this year. Baseball Prospectus offers its subscribers a complex model of player development. The Hardball Times and Baseball Think Factory offer some of their own proprietary stats as well as featured articles that run the gamut of analytic baseball writing. Stand-out articles of late have ranged from topics as esoteric as the physics of hitting a home run at the various MLB ballparks to ontological discussions of fundamental concepts in baseball. For instance: what, exactly, is a hit as opposed to an error, and why?


One of my favorite new stats is available from Bill James Online (although he doesn't calculate it): the Fielding Bible's plus/minus numbers. The system uses computerized video review of every play of every MLB game to compare how players at the same defensive position did when balls were hit to about the same spot, at the same speed and at the same angle. A fielder earns merits or demerits for making or missing a play relative to the average of all the other players in the league at that position. This gives you an idea of the kind of technology (and money) being brought to bear on what is essentially a single problem, one that would be very lucrative to solve consistently: which baseball players are undervalued and which are overvalued?
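To make the plus/minus idea concrete, here is a rough sketch in Python. It is not the Fielding Bible's actual method, whose details I don't have. It assumes each play has already been boiled down to a hypothetical "bucket" label (position, spot, speed, angle), and it credits a fielder with the difference between what he did (1 for a play made, 0 for a miss) and the league-wide rate for that bucket. The function name and data layout are made up for illustration.

```python
from collections import defaultdict

def plus_minus(plays):
    """Toy plus/minus score.

    plays: list of (fielder, bucket, made) tuples, where `bucket` is a
    hypothetical label for "same position, spot, speed and angle" and
    `made` is True if the fielder converted the play.
    """
    # League-wide results per bucket: bucket -> [plays made, total chances]
    totals = defaultdict(lambda: [0, 0])
    for _, bucket, made in plays:
        totals[bucket][0] += int(made)
        totals[bucket][1] += 1

    # Each play is worth (result - league average rate for that bucket).
    scores = defaultdict(float)
    for fielder, bucket, made in plays:
        made_count, chances = totals[bucket]
        scores[fielder] += int(made) - made_count / chances
    return dict(scores)

# Made-up data: three shortstops see the same kind of ground ball.
sample = [
    ("Shortstop A", "SS-hole-hard-grounder", True),
    ("Shortstop B", "SS-hole-hard-grounder", False),
    ("Shortstop C", "SS-hole-hard-grounder", False),
]
print(plus_minus(sample))
```

Making a play that most of the league misses earns a big merit, while making a routine play that everybody makes earns almost nothing, which is the whole point of comparing against the average.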

There has even been some creep of this "SABRmetric" type of statistical analysis onto mainstream sports sites. While the number of foot-pounds of energy needed to get a ball over the Green Monster is apparently not making the front page on ESPN, you've probably noticed that baseball game previews on ESPN feature a projected winner, expressed not as how many runs Team A will win by but as the percentage likelihood of its winning. There was even a brief period last year when ESPN's standings page had columns predicting the chances that a team would win its division, win the wild card, and make the playoffs at all. Baseball Prospectus offers the same on its standings page, letting you use three different models of team performance to estimate how the division and wild-card races will turn out, each with slightly different results. Several sites offer this basic type of information using differing methodologies. For instance, BP's "PECOTA" model predicts player performance by matching a player with a cohort of historical players whose career trajectories, age, size, handedness, etc. match those of the player of interest.

Needless to say, these websites need to make money, and their projections of both individual player performance and team outcomes over a season are generated by labor-intensive proprietary models that no one expects them to simply give away. There's nothing wrong with this, but in the past it has left me wishing I could go deeper into the model. I'd like to be able to understand what the assumptions are and, perhaps more importantly, change them if I disagree. So I set about building my own website that would let one do that. It's not ready for rigorous use (I haven't ironed out the kinks for Internet Explorer yet, so it'll only work in Firefox/Safari at the moment), but this side project lets you model league-wide and team-specific baseball statistics in a way where the workings of the model are completely transparent and every assumption the model makes can be altered by the user.

A word to bleacherreport.com readers first: this is, to put it bluntly, a long article that, while it talks about other stuff, is essentially a link to a website I built. You might not find that acceptable, and if admins or readers think this article should be removed then I won't object. However, there are no advertisements on the site and I have no plans to monetize it. I made the program to fill what I saw as a hole in the Internet baseball statistics community, not to get rich off it. If people like using it then that's great.

OK, with that disclaimer out of the way I can send you on your way to mymlbsim.com. You'll notice when you click that link that the URL redirects to baseballmarkovchain.org, and you might wonder why that is. Well, while mymlbsim.com is easier to remember and, IMHO, better conveys what the website does, the heart of the site is a Markov Chain simulation of the game of baseball. Explaining what a Markov Chain is in detail is well beyond the purview of this article, but a number of articles (like this one) that explain the link in detail can be found by simply searching for "Baseball Markov Chain". In one sentence, a Markov Chain is a directed graph (the mathematical object, not the picture) whose nodes are states and whose edges carry the probability of moving from one state to the next. It is helpful for modelling baseball half-innings, where a state is typically the number of outs plus which bases are occupied, and the blog on my website explains in more depth how this works.
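For the curious, here is a minimal Python sketch of a half-inning as a Markov Chain. It is an illustration only and not the code behind the site. The states are (outs, bases occupied), the event probabilities in EVENTS are invented round numbers rather than real data, and the base-running rules are a deliberate simplification: on a hit every runner advances exactly as many bases as the batter, and on a walk only forced runners move. The function names are made up for this example.

```python
# Hypothetical per-plate-appearance probabilities (illustrative only; they sum to 1).
EVENTS = {"out": 0.68, "walk": 0.08, "single": 0.15,
          "double": 0.05, "triple": 0.01, "homer": 0.03}

def step(outs, bases, event):
    """One plate appearance: return (new_outs, new_bases, runs_scored).

    bases is a tuple of three booleans: (first, second, third).
    """
    first, second, third = bases
    if event == "out":
        return outs + 1, bases, 0
    if event == "walk":
        # Only forced runners advance; a run scores only with the bases loaded.
        runs = int(first and second and third)
        return outs, (True, second or first, third or (first and second)), runs
    advance = {"single": 1, "double": 2, "triple": 3, "homer": 4}[event]
    # Simplifying assumption: every runner moves as many bases as the batter.
    positions = [0] + [i + 1 for i, occupied in enumerate(bases) if occupied]
    new_bases = [False, False, False]
    runs = 0
    for pos in positions:  # position 0 is the batter at home plate
        dest = pos + advance
        if dest >= 4:
            runs += 1
        else:
            new_bases[dest - 1] = True
    return outs, tuple(new_bases), runs

def expected_runs(events=EVENTS, max_batters=60):
    """Push probability mass through the chain and add up expected runs.

    The half-inning is cut off after max_batters plate appearances, which
    loses a negligible amount of probability when outs are common.
    """
    dist = {(0, (False, False, False)): 1.0}  # start: no outs, bases empty
    total = 0.0
    for _ in range(max_batters):
        nxt = {}
        for (outs, bases), p in dist.items():
            for event, q in events.items():
                new_outs, new_bases, runs = step(outs, bases, event)
                total += p * q * runs
                if new_outs < 3:  # three outs is the absorbing end state
                    key = (new_outs, new_bases)
                    nxt[key] = nxt.get(key, 0.0) + p * q
        dist = nxt
    return total

print(round(expected_runs(), 3))
```

The key property is that what happens next depends only on the current state (outs and runners), not on how the inning got there. Changing a number in EVENTS and re-running is the bare-bones version of changing an assumption and seeing what the model says.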

What's more important than how the math works, though, is that you don't need to care how the math works to use the site. For now, if you use Firefox, the site is designed to be as intuitive as possible for something that lets you build a complex mathematical model of any or all of the 30 MLB teams. Say you think a player's default projected statistics for the season don't match reality; for example, you want to predict that Cliff Lee will be dominant, not just OK, for the rest of the season. After you sign in, go to the Phillies page and click the "cfg" (configure) button next to Cliff Lee. His projected season shows up along with three sliders that control things like how many strikeouts, walks and base hits he allows, and you can slide those around until the projected season looks like what you think he'll do.

To "completely" configure a team you need to give it a lineup and a pitching staff, using either a custom or default projection of performance for the players on the team's active roster. You need to configure the whole lineup (for NL teams you can leave the #9 spot, the pitcher's, blank), and you need 4 starters and 4 relievers. Teams that are not fully configured have their true "skill level" estimated from a measure, available from Baseball Prospectus, that looks at their ability to score and prevent runs and tries to isolate luck. Once one or two teams are configured you can run simulations involving a single team (how many runs does it score with this batting order?) or the entire league (what are a team's chances of making the playoffs with this lineup if a certain player starts playing better?).
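A league-wide question like "what are their chances of finishing first?" can be sketched as a Monte Carlo simulation in Python. This is a toy under stated assumptions and not how the site works internally: each team's "skill level" is taken to be a true winning percentage, single games are decided with Bill James's log5 formula (my choice for illustration), ties for first are broken at random, and the team names, records and schedule below are invented.

```python
import random

def log5(p_a, p_b):
    """Bill James's log5 estimate of the chance that team A beats team B,
    given each team's winning percentage against an average opponent."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))

def first_place_odds(teams, schedule, trials=10000, seed=0):
    """Estimate each team's chance of finishing with the most wins.

    teams:    name -> (current_wins, skill), skill being a winning percentage
    schedule: list of (team_a, team_b) games still to be played
    """
    rng = random.Random(seed)
    firsts = {name: 0 for name in teams}
    for _ in range(trials):
        wins = {name: w for name, (w, _) in teams.items()}
        for a, b in schedule:
            # Play one remaining game with a weighted coin flip.
            if rng.random() < log5(teams[a][1], teams[b][1]):
                wins[a] += 1
            else:
                wins[b] += 1
        best = max(wins.values())
        leaders = [name for name, w in wins.items() if w == best]
        firsts[rng.choice(leaders)] += 1  # assumption: ties broken at random
    return {name: count / trials for name, count in firsts.items()}

# Invented two-team race: the Bears lead by two games with six left to play.
teams = {"Bears": (70, 0.52), "Owls": (68, 0.56)}
schedule = [("Bears", "Owls")] * 6
print(first_place_odds(teams, schedule))
```

Bumping a team's skill number up or down and re-running is the crude equivalent of asking what happens if a certain player starts playing better.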

Sorry for the long, self-promoting post, but if you use Firefox and you're interested in this kind of stuff you can set up a free account at mymlbsim.com and play with your own custom baseball statistics web application. As noted, you'll need Firefox (for now), and a few small functions (password recovery, changing your account e-mail address, etc.) don't work just yet, but the meat of the site, the part that lets you build and configure a set of simulators, is all working. Detailed instructions are available at the site. Email adam AT baseballmarkovchain.org with questions, comments, gripes, bug reports, etc.
