Moneyball

If you’re wondering where my love for sports analytics came from, the answer is Moneyball. Published in 2003, Moneyball explains Sabermetrics through the story of Oakland Athletics General Manager Billy Beane. As defined by Wikipedia, Sabermetrics is “the empirical analysis of baseball, especially baseball statistics that measure in-game activity.”

Moneyball generated a lot of controversy. Critics charged that these stats nerds were ruining the game with complicated algorithms. At the time, Beane believed that players who excelled in on-base percentage were undervalued compared to those with a high batting average. Old-school analysts like Joe Morgan mocked those who valued on-base percentage as wanting to clog up the basepaths.

The Oakland Athletics haven’t come anywhere close to a World Series. So why is Moneyball so lauded?

I evaluated whether on-base percentage is a better statistic than batting average. If you think that’s a poor way to judge the value of Sabermetrics, I agree. But it is probably the best-known insight from Moneyball, and it makes for a simple test.

Using MLB data from 2003 to 2018, I fit a multiple linear regression predicting wins from on-base percentage (OBP) and batting average (BA). Here was the result:

Wins = (OBP * 560) – (BA * 204) – 49

In other words, to estimate how many games a team will win, multiply its on-base percentage by 560, subtract its batting average multiplied by 204, and subtract 49. The negative coefficient on batting average means that, after controlling for OBP, a higher BA is associated with fewer wins.
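To make the arithmetic concrete, here is a small Python sketch that plugs numbers into the equation. The team values (.320 OBP, .255 BA) are hypothetical and chosen only for illustration, and the coefficients are the rounded ones reported above, so treat the outputs as a worked example rather than a forecast. It also shows what “controlling for” means: change one input while holding the other fixed.

```python
# Coefficients as reported in this post (rounded); output is wins per team-season.
OBP_COEF = 560
BA_COEF = 204
INTERCEPT = -49


def predicted_wins(obp: float, ba: float) -> float:
    """Apply Wins = (OBP * 560) - (BA * 204) - 49."""
    return OBP_COEF * obp - BA_COEF * ba + INTERCEPT


# Hypothetical team: .320 OBP, .255 BA (illustrative values, not real data).
base = predicted_wins(0.320, 0.255)
print(f"{base:.2f}")  # 78.18

# Holding BA fixed, ten points of OBP (.010) adds 5.6 predicted wins.
print(f"{predicted_wins(0.330, 0.255) - base:.2f}")  # 5.60

# Holding OBP fixed, ten points of BA subtracts 2.04 predicted wins.
print(f"{predicted_wins(0.320, 0.265) - base:.2f}")  # -2.04
```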

How could this be? Do hits make you less likely to win? Not exactly. On-base percentage counts hits, walks, and hit-by-pitches. Since batting average counts only hits, on-base percentage captures everything batting average captures and then some.
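For reference, here are the standard definitions of the two statistics as a short Python sketch. The season line is hypothetical; the point is that a walk raises on-base percentage while leaving batting average untouched, because walks are not counted as at-bats.

```python
def batting_average(h: int, ab: int) -> float:
    """BA = hits / at-bats."""
    return h / ab


def on_base_percentage(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """OBP = (hits + walks + hit-by-pitch) / (at-bats + walks + hit-by-pitch + sacrifice flies)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


# Hypothetical season line, for illustration only.
h, ab, bb, hbp, sf = 150, 550, 60, 5, 5
print(f"BA  {batting_average(h, ab):.3f}")                       # BA  0.273
print(f"OBP {on_base_percentage(h, bb, hbp, ab, sf):.3f}")       # OBP 0.347

# Ten more walks: BA is unchanged, OBP rises.
print(f"BA  {batting_average(h, ab):.3f}")                       # BA  0.273
print(f"OBP {on_base_percentage(h, bb + 10, hbp, ab, sf):.3f}")  # OBP 0.357
```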

To confirm this, I ran two simple linear regressions: one predicting wins from on-base percentage and one predicting wins from batting average.

Wins = (OBP * 409) – 52
Mean Squared Error = 98

Wins = (BA * 340) – 7
Mean Squared Error = 113

First, the coefficient is larger for on-base percentage than for batting average: 409 compared to 340. Second, the mean squared error is smaller for on-base percentage (98 versus 113), meaning it predicts wins more accurately. Conceptually, I already valued on-base percentage more than batting average. But if you’re skeptical and want the data, there it is.
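The post doesn’t say which software produced these fits. As a hedged sketch, this is what ordinary least squares with a single predictor, plus its mean squared error, looks like in plain Python, assuming the MSE was computed as the average squared residual over the same team-seasons used for fitting. The function names are illustrative, and the toy data below is made up to lie exactly on a line (slope 500, intercept -80); it is not MLB data.

```python
from typing import Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(x: Sequence[float], y: Sequence[float],
                       slope: float, intercept: float) -> float:
    """Average squared gap between actual y and the fitted line."""
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y)) / len(x)


# Toy numbers that lie exactly on a line (not MLB data).
obp = [0.30, 0.32, 0.34]
wins = [70, 80, 90]
slope, intercept = fit_simple_ols(obp, wins)
print(round(slope, 6), round(intercept, 6),
      round(mean_squared_error(obp, wins, slope, intercept), 6))
```

Because both models predict the same wins, their mean squared errors share units (squared wins) and can be compared directly; slopes, by contrast, also depend on how widely each statistic varies across teams.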

The debate between on-base percentage and batting average is only one example of the clash between the statistics nerds and the old guard. More than a decade after Moneyball was published, on-base percentage is now properly valued. The promise of using statistics and economics to identify undervalued assets in sports remains as alluring as ever.

Thanks to Kyle Safran for helping me prepare this post.
