Naked Statistics and the Best Baseball Player of All Time
Over the weekend, I finished reading a relatively new mathy book titled “Naked Statistics” by Charles Wheelan. I’m a man who gets fired up about things like sample size, normal distribution and standard deviation, so it was a nice time for me. More importantly to this blog, there was a sports angle to one of the chapters, specifically an attempt to use statistics to name one man the best baseball player of all time.
To write his chapter about Descriptive Statistics, Wheelan contacted a serious baseball math expert (Steve Moyer, the president of Baseball Info Solutions) and determined the three most valuable statistics used to evaluate a non-pitcher. He goes into detail describing each, but I’ll assume my readers need none of that. The key stats are:
- On-base percentage (OBP)
- Slugging percentage (SLG)
- At-bats (AB)
After giving us these criteria, he concludes: “In Moyer’s view, the best player of all time was Babe Ruth because of his unique ability to hit and to pitch. Babe Ruth still holds the Major League career record for slugging percentage at .690.”
I have a lot to say about those three points and the conclusion, but it’s important to start off with a clarification: these stats don’t tell us the best baseball player of all time; they tell us the best batter of all time. There’s a good chance the same person is both, but it should be clear what is being measured.
We can debate things like the importance of pitchers vs. hitters or the impact of adjusting for rule changes, eras and ballparks. That’s fine. What we won’t debate is whether we can ever really come to a concrete conclusion about the best there ever was. Now that that’s out of the way, let’s talk about those three criteria.
Start with on-base percentage. For anyone who’s seen Moneyball, its importance seems obvious. Batting average is an important stat, but in a lot of situations, taking a walk is just as good as a single, so that should be taken into account. Teams want batters who get on base. Of equal importance, they want batters who don’t get out.
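To make the walk point concrete, here is a minimal Python sketch of the standard OBP formula, (H + BB + HBP) / (AB + BB + HBP + SF). The two stat lines are made up for illustration: both batters hit .300 (150 hits in 500 AB), and only their walk totals differ.

```python
def on_base_percentage(h, bb, hbp, ab, sf):
    """Standard OBP: times on base divided by the plate appearances that count."""
    times_on_base = h + bb + hbp          # hits, walks, hit-by-pitches
    opportunities = ab + bb + hbp + sf    # at-bats plus walks, HBP and sac flies
    return times_on_base / opportunities

# Two hypothetical .300 hitters who differ only in how often they walk.
patient = on_base_percentage(h=150, bb=80, hbp=5, ab=500, sf=5)
free_swinger = on_base_percentage(h=150, bb=20, hbp=5, ab=500, sf=5)
print(f"patient: {patient:.3f}, free swinger: {free_swinger:.3f}")
```

Same batting average, but the patient hitter reaches base far more often, which is exactly what batting average alone hides.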
On-base percentage doesn’t tell the entire story of a batter’s effectiveness. It’s obviously important to have a batter who is difficult to get out, but batting stats can generally be broken into average and power, and OBP doesn’t account for power. It should be relatively obvious that a team would rather have a .280 hitter who launches 50 homers a year than a .300 hitter who slaps singles all the time. Not all hits are equal, so SLG weights them appropriately.
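A rough sketch of how SLG weights hits, using the standard definition (total bases divided by at-bats). The two 600-AB seasons below are hypothetical, built only to match the .280/50-homer hitter and the .300 singles hitter described above.

```python
def batting_average(hits, ab):
    return hits / ab

def slugging(singles, doubles, triples, homers, ab):
    """SLG = total bases / AB, so a homer counts four times as much as a single."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / ab

# Hypothetical 600-AB seasons as (singles, doubles, triples, homers).
power_hitter = (88, 25, 5, 50)     # 168 hits -> .280 average
singles_hitter = (160, 15, 5, 0)   # 180 hits -> .300 average

for label, line in [("power", power_hitter), ("singles", singles_hitter)]:
    avg = batting_average(sum(line), 600)
    slg = slugging(*line, ab=600)
    print(f"{label}: AVG {avg:.3f}, SLG {slg:.3f}")
```

The singles hitter wins on average, but the power hitter piles up far more total bases, and SLG is what captures that gap.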
The first two stats measure quality; at-bats measure quantity. A batter who homers in his only plate appearance and then suffers a career-ending injury touching home plate would never be in this conversation, even though his OBP and SLG are maxed out. It doesn’t matter how well you bat if you don’t stay batting for very long.
Overall, if I had been asked about the most important stats in baseball, I would have given the same three. They offer a combination of hitting quantity (AB) and quality (OBP and SLG), with the quality metric itself broken down into quantity (OBP) and quality (SLG).
Actually, it doesn’t matter what I would say; Moyer is one of the most qualified people alive to declare which stats are the most important. I probably would have tried to get cute and account for era by adjusting OBP and SLG for how many standard deviations a batter is from the average for the years he played, but that’s me being nitpicky.
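For the nitpicky version, here is a minimal sketch of that standard-deviation adjustment: convert a raw OBP into a z-score against league values from the same era. The league numbers are invented purely for illustration, and treating a short list of OBPs as “the league” is a simplifying assumption; a real version would use full league data for every season a batter played.

```python
from statistics import mean, stdev

def z_score(value, league_values):
    """Sample standard deviations above (+) or below (-) the league mean."""
    return (value - mean(league_values)) / stdev(league_values)

# Invented league OBPs for two eras (illustration only, not real data).
low_scoring_era = [0.300, 0.310, 0.320, 0.330, 0.340]
high_scoring_era = [0.330, 0.340, 0.350, 0.360, 0.370]

# The same raw .370 OBP stands out more in the lower-scoring era.
for label, era in [("low-scoring", low_scoring_era), ("high-scoring", high_scoring_era)]:
    print(f"{label}: {z_score(0.370, era):+.2f} SD")
```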
If you learn nothing else from this post, learn that those three stats are the most important in evaluating batters. The most important word in that sentence is “batters.” Again, I didn’t use “players” because we’re not measuring playing, we’re measuring batting.
This, however, is where it gets tricky. In case you didn’t notice, Wheelan’s final conclusion doesn’t match up with his stated reasoning. He did the research and found the three magic stats, then explained that Ruth was the best player because of other criteria.
He goes through all the effort to find these important stats, and then his statement about the best player talks about how Ruth was a successful pitcher and hitter. He just went through the trouble of explaining why OBP, SLG and AB are important, and then he says a player is valuable because of positional versatility! He does say that Ruth has the highest career SLG ever, but he makes no mention of the other two members of this numerical triumvirate.
It seems to me like the logical place to go from those three important stats is some sort of formula that creates an overall rating. Something along the lines of (OBP+SLG)(AB) could be our starting point. I’d suggest squaring the quality metric before multiplying by the quantity metric, so the final product looks like (OBP+SLG)²(AB). We can call it the Wheelan Score. Of course, the Wheelan Score doesn’t mean anything on its own, but it would at least be useful for comparing players relative to each other.
Conveniently, there’s already a stat that combines on-base percentage and slugging. It’s called (get this) on-base plus slugging (OPS). To find our magic all-time rating, all we have to do is square that and multiply by career at-bats. Then we just rank high to low.
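Here is a small Python sketch of the Wheelan Score as defined above, next to the unsquared (OBP+SLG)(AB) starting point, so you can see what the squaring does. All three careers are hypothetical, including the one-at-bat homer guy from earlier.

```python
def ops(obp, slg):
    """On-base plus slugging."""
    return obp + slg

def linear_score(obp, slg, ab):
    """The unsquared starting point: (OBP + SLG) x AB."""
    return ops(obp, slg) * ab

def wheelan_score(obp, slg, ab):
    """Wheelan Score: (OBP + SLG)^2 x AB, i.e. quality squared times quantity."""
    return ops(obp, slg) ** 2 * ab

# Hypothetical careers, not real players.
careers = {
    "one-swing wonder": dict(obp=1.000, slg=4.000, ab=1),         # homer in his only AB
    "star": dict(obp=0.420, slg=0.580, ab=8000),                  # OPS 1.000
    "long-career regular": dict(obp=0.340, slg=0.410, ab=11000),  # OPS .750
}

for name, line in careers.items():
    print(f"{name}: linear {linear_score(**line):,.1f}, Wheelan {wheelan_score(**line):,.1f}")
```

With these made-up numbers, the unsquared version ranks the long, average-ish career (8,250) ahead of the star (about 8,000), while squaring flips that order (about 8,000 vs. 6,187.5). The one-swing wonder stays at the bottom either way (5 unsquared, 25 squared), because AB still caps his total.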
Before we look at the results, it’s important to recognize the limitations of the data. Here are a few points to consider:
- This only accounts for players who competed in Major League Baseball in America; it’s possible that the player we’re looking for played in another league or another country and doesn’t have the stats to qualify for this comparison
- Data on active players isn’t complete; their OPS likely won’t change much, but they are at a pretty severe disadvantage in total AB count
- Pros like Ted Williams and Willie Mays missed full seasons to serve in the military; while we shouldn’t penalize them for that in the way we remember them, losing those ABs does hurt them in this model
- This data only applies to batting; it doesn’t take into account fielding, base running or anything else
Once I had the formula figured out, I went to baseball-reference.com and found the all-time OPS leaders. I squared each player’s OPS and multiplied it by his AB total, and the top ten Wheelan Scores of all time are:
- Babe Ruth: 11,372
- Barry Bonds: 10,881
- Hank Aaron: 10,659
- Stan Musial: 10,445
- Ty Cobb: 10,209
- Willie Mays: 9,643
- Ted Williams: 9,589
- Lou Gehrig: 9,329
- Tris Speaker: 8,785
- Jimmie Foxx: 8,757
That seems like a lot of the names you would expect to be there, right?
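The “square OPS, multiply by AB, rank high to low” procedure is easy to script. This sketch uses hypothetical players with round OPS values rather than the real baseball-reference.com numbers, so the scores are for illustration only.

```python
from typing import NamedTuple

class Career(NamedTuple):
    name: str
    ops: float   # career OPS
    ab: int      # career at-bats

def rank_by_wheelan(careers):
    """Square each career OPS, multiply by AB, and sort high to low."""
    scored = [(c.name, round(c.ops ** 2 * c.ab)) for c in careers]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical players, not real career lines.
careers = [
    Career("Player A", 0.900, 9000),
    Career("Player B", 1.100, 7000),
    Career("Player C", 0.800, 11000),
]

for rank, (name, score) in enumerate(rank_by_wheelan(careers), start=1):
    print(f"{rank}. {name}: {score:,}")
```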
What if we took my earlier suggestion and used an adjusted quality number? Luckily, there’s a stat for that, conveniently called Adjusted OPS. I squared career Adjusted OPS numbers and multiplied by ABs to get another top 10 list; this time we’ll call it Adjusted Wheelan Score:
- Babe Ruth: 356,419,964
- Barry Bonds: 326,172,028
- Ty Cobb: 322,713,216
- Hank Aaron: 297,045,100
- Ted Williams: 278,186,600
- Stan Musial: 277,383,132
- Willie Mays: 264,800,016
- Lou Gehrig: 256,360,041
- Tris Speaker: 251,296,555
- Rogers Hornsby: 250,298,125
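Adjusted OPS (OPS+) is scaled so that 100 is league average, which is why squaring it produces numbers in the tens of thousands and pushes these scores into the hundreds of millions. As a sanity check, the integer inputs below reproduce three of the listed scores exactly. They were back-solved from the list, so treat them as an assumption about the career OPS+ and AB totals used rather than quoted stats.

```python
def adjusted_wheelan_score(ops_plus, ab):
    """Adjusted Wheelan Score: career OPS+ squared, times career at-bats."""
    return ops_plus ** 2 * ab

# (OPS+, AB, listed score); OPS+ and AB are back-solved assumptions.
listed = {
    "Babe Ruth": (206, 8399, 356_419_964),
    "Barry Bonds": (182, 9847, 326_172_028),
    "Lou Gehrig": (179, 8001, 256_360_041),
}

for name, (ops_plus, ab, expected) in listed.items():
    score = adjusted_wheelan_score(ops_plus, ab)
    print(f"{name}: {score:,} (listed: {expected:,})")
```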
The lists look almost identical. In fact, they have the same players at 1, 2, 8 and 9, and spots 3-7 are the same five players reshuffled. The only difference is the final player on each list. If you’re curious, Hornsby was 14th by Wheelan Score and Foxx was 15th by Adjusted Wheelan Score.
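To double-check which spots match, here is a quick comparison of the two top-10 lists exactly as printed above (last names only).

```python
wheelan = ["Ruth", "Bonds", "Aaron", "Musial", "Cobb",
           "Mays", "Williams", "Gehrig", "Speaker", "Foxx"]
adjusted = ["Ruth", "Bonds", "Cobb", "Aaron", "Williams",
            "Musial", "Mays", "Gehrig", "Speaker", "Hornsby"]

# 1-based positions where both lists have the same player.
same_spot = [i + 1 for i, (a, b) in enumerate(zip(wheelan, adjusted)) if a == b]
# Are positions 3-7 the same group of players, just in a different order?
shuffled = set(wheelan[2:7]) == set(adjusted[2:7])
# Players who appear on only one of the two lists.
only_in_one = set(wheelan) ^ set(adjusted)

print(same_spot)            # [1, 2, 8, 9]
print(shuffled)             # True
print(sorted(only_in_one))  # ['Foxx', 'Hornsby']
```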
After all that math, it turns out I could have just listened to Moyer in the first place. The Babe is mathematically the best. You can argue what you want in the comments, but you better come with some data.