What’s in a Name

If you don’t have anything more than a passing interest in baseball statistics, this post will bore you even more than it will if you do have more than a passing interest in baseball statistics. You have been warned.

We talk a lot about wins above replacement level in baseball now. Wins Above Replacement, or WAR as it’s often abbreviated, is the hilariously complicated statistic that expresses a simple idea: how many wins a player was worth above his Triple-A replacement. And, because it’s expressing a simple idea, WAR has slowly worked its way into the baseball mainstream. The always-interesting Joe Sheehan has a column in nearly every issue of Sports Illustrated, and he uses WAR a good amount; cover stories in the magazine about players almost always mention their wins above replacement total. It’s not on TV broadcasts, at least not yet, but if it’s in Sports Illustrated, I think it’s safe to consider it mainstream. So I feel okay saying WAR has made it to the big time.

So here’s what I’ve been thinking about: Why do other, equally useful and maybe even more useful, baseball stats fail to make it so far? There have been dozens of new baseball stats over the past few years. Why do some fail where others succeed? What makes a successful stat?

And here comes the rant. I’m going to pick on wOBA first. wOBA, although it expresses something simple (how good a hitter is at producing runs), is never ever going to work its way into the mainstream. It’s an excellent statistic, maybe the best at describing a player’s offensive contribution. It says a single is worth this much, a walk this much, a double this much, a triple this much, etc., then adds it all together and averages it out. That’s wOBA. But it’s not going to be widely used, and I think there are two reasons:

Reason 1a: It has a bad name. It’s a lower case letter in front of three capital letters, wOBA. If I’m saying it out loud, it seems like I should whisper the “w” and then yell the “OBA.” And it’s awkward to begin a sentence with. xFIP – see how weird that looks? – is doomed for similar reasons. I’m not trying to say that people can’t understand acronyms with lowercase letters in them because people are book-burning hobgoblins. But I am saying that putting a lowercase letter in front is like calling it “Dungeons-and-Dragons-OBA.” Rightly or wrongly, it’s going to be off-putting to some because it’s going to look needlessly complex.

Reason 1b: It’s a number based on the on-base percentage scale. The OBA in wOBA stands for on-base average, and the number is supposed to look like on-base average/percentage. So if a hitter’s wOBA number would be a good on-base percentage, he’s a good hitter.

The problem is that, five years ago as a more casual fan, I had no idea what a good on-base percentage was. I knew what on-base percentage was and how it was calculated, but I had no meaningful scale for the number like I did for batting average and ERA. I didn’t know what the single-season record was for OBP, who the all-time career leader was, what a terrible OBP looked like, because it wasn’t something talked about that often.

That’s a problem. Because taking a statistic and scaling it to on-base average is like making a baseball stat and then scaling it to the weight of apples. I have an idea how much an apple weighs, because I’ve held them and eaten them and thrown them at my siblings. But I have no idea exactly how much an apple weighs or how much a big apple weighs compared to a small apple. So if one hitter is eight ounces on the apple scale and another hitter is nine ounces, I have no idea if they’re both above-average hitters, both below-average, or if one is good and the other is bad. I have to go on Wikipedia and read all about apples first.

If you want casual fans to pick up wOBA without much effort, they have to understand the scale for OBA already. (And know that OBA is the same as OBP.) And I don’t know how many people know the normal range for on-base percentage off the top of their heads, because it’s not something most baseball fans grow up thinking about. I know I didn’t.
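For anyone who thinks better in code, here is a rough sketch of the "weight each event, add it all together, average it out" idea behind wOBA. Everything specific in it is an assumption for illustration: the weights are made up (they are not the official wOBA values), the function name and the season line are hypothetical, and dividing by plate appearances is a simplification of the real denominator.

```python
# Illustrative run-value weights per event. These numbers are assumptions
# chosen for the example, NOT the official wOBA weights.
ILLUSTRATIVE_WEIGHTS = {
    "walk": 0.70,
    "single": 0.90,
    "double": 1.25,
    "triple": 1.60,
    "home_run": 2.00,
}


def weighted_on_base(events, plate_appearances, weights=ILLUSTRATIVE_WEIGHTS):
    """Weight each event, add the results, and average over plate appearances.

    `events` maps an event name to how many times it happened.
    Using plate appearances as the denominator is a simplification.
    """
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    total = sum(weights[name] * count for name, count in events.items())
    return total / plate_appearances


# A hypothetical season line, invented for this example.
season = {"walk": 60, "single": 100, "double": 30, "triple": 5, "home_run": 20}
value = weighted_on_base(season, plate_appearances=600)
print(f"{value:.3f}")
```

The result lands in the same neighborhood as an on-base percentage, which is the point of the scale, and also the problem if you don't already know what a good on-base percentage looks like.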

There might be an argument that OPS, On-base Plus Slugging, is an offensive statistic on a strange scale that has been accepted into general use. OPS is used on ESPN broadcasts and it shows up on a decent number of scoreboards. So maybe there is hope for wOBA. OPS, like wOBA, is on a crazy scale, one that ranges from .500 for terrible hitters to 1.100 for Barry Bonds-like players, but it’s accepted anyway. My best guess why OPS works for a general audience is that:

A. Enough baseball fans know what on-base percentage is, enough know what slugging percentage is, so adding them together is easy enough to explain. I think. I’m just guessing here.

B. OPS happens, entirely by accident, to have a scale similar to how we grade things in school. For grades in school, a 90 and above is an A, 80-89 is a B, 70-79 a C, and so on. And where a 90 grade on a test is an A, an OPS of .900 happens to be an A-grade OPS for a batter. Hitters with an OPS in the .800s are grade B, in the .700s grade C, then Ds in the .600s and those failing at hitting in the .500s or below. The great hitters, the ones earning extra credit, come in above 1.000. Carlos Beltran has a .904 OPS with the Mets this season, clearly earning an A-grade on offense. David Wright has an .801 mark, a B, and Jason Bay comes in with a .704, earning himself a C. Jason Pridie earns a D with his .634 OPS. Brad Emaus failed baseball with a .424 OPS in the majors this year, and the Mets let him go for remedial work in the minors.
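The report-card comparison is simple enough to write down as a lookup. This is just a sketch of my rough grading scale, not any official standard; the function name is made up, and the "A+" label for hitters at 1.000 or better is my own stand-in for "extra credit."

```python
def ops_grade(ops):
    """Map an OPS to a school-style letter grade (an informal, assumed scale)."""
    if ops >= 1.000:
        return "A+"  # assumed label for the "extra credit" hitters
    if ops >= 0.900:
        return "A"
    if ops >= 0.800:
        return "B"
    if ops >= 0.700:
        return "C"
    if ops >= 0.600:
        return "D"
    return "F"  # the .500s or below


# The OPS figures quoted above for this season's Mets.
mets = {
    "Carlos Beltran": 0.904,
    "David Wright": 0.801,
    "Jason Bay": 0.704,
    "Jason Pridie": 0.634,
    "Brad Emaus": 0.424,
}

for player, ops in mets.items():
    print(f"{player}: {ops:.3f} -> {ops_grade(ops)}")
```

Nobody needs a program to do this, which is sort of the argument: the mapping from number to "good or bad" is one most fans already carry around in their heads.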

OPS isn’t as accurate as wOBA for identifying the best hitters; wOBA is easily the better of the two. But “OPS, Oh-Pee-Es” rolls off the tongue better than “wOBA, Whoa-Bah,” and it has a scale that’s familiar enough to most. At least that’s my theory why we use one and not the other.

On a similar note, you can argue about the merits of Ultimate Zone Rating (UZR) against Defensive Runs Saved (DRS), the two most popular defensive metrics. But, in my opinion, DRS has two huge advantages over UZR that have nothing to do with the actual numbers: The people who produce defensive runs saved round it off to a whole number, and they gave it a self-explanatory name. Look, which sentence is easier to understand without further explanation:

Nick Evans has three defensive runs saved at first base this season.


Nick Evans has a 1.9 ultimate zone rating at first base this season.

Maybe I’m wrong, but I have to go with the first option here. And if I’m a writer concerned with brevity and making things understandable for the reader, as I sometimes pretend to be, I’m going to use the first option every time. Because if I didn’t know anything about defensive statistics, I could understand what the first sentence meant. I’d have to spend a while on Google to understand the second.

Plus, the “Ultimate” in “Ultimate Zone Rating” gives a Warhammer 40,000 feel to UZR.

Maybe I’m just shouting into the void here. But I sometimes wonder whether, if some of these new baseball stats had different names, they might be all over TV broadcasts already. I suspect part of the reason football accepts passer rating, sometimes known as QB rating (a statistic that has a complicated formula and is weird), is because it’s called passer rating and not “weighted ultimate completion percentage.” I tend to believe the name matters.

So it seems to me that Wins Above Replacement is making its way into the mainstream because it’s struck the balance between usefulness and ease of use. Its acronym is WAR — all capital letters *and* spells an actual word we can say out loud — and it’s scaled to wins, which most fans have a feel for already. It’s still a little off-putting because it’s almost impossible to explain and it has decimal points . . . decimal points in a stat can be tough. We all have a feel for seven wins because it’s a real thing. We don’t all have a feel for 7.4 wins, because .4 wins is an abstract concept. You can’t win half a game. But WAR is a tough stat to round, because the differences between players are often so small that you need the .4 thrown in. It’s a necessary evil.

I have no idea if any of this is interesting to anyone besides me. Or if I’m even right, because most of this isn’t based on hard fact, but on my subjective opinion. I don’t actually know why people picked up OPS and not wOBA, and I’m just guessing. I’m probably wrong on plenty of stuff, if not all stuff, here.

But if you’re a sabermagician cooking up a new stat, it seems to me that it would probably be worth the extra time to come up with a good name and scale that are self-explanatory. Because if you want people to use something, it seems that setting up fewer hoops to jump through would be a good idea.



5 responses to “What’s in a Name”

  1. Two things: First, ‘OPS’ is not pronounced ‘oh-pee-ess’; it’s pronounced ‘ops’ — as in ‘black ops’. It’s cool.

    Second, I’m going to take this opportunity to remind you that back during spring training I predicted that as a unit Davis and Thole would average out to a sophomore slump. You owe me eighty-five million dollars. Or an apple I can throw at my sibling.

    Make it a nine ouncer.

  2. See, I can’t call it “ops,” because that stands for Operations. Immediately I picture Picard barking “Man the Ops station!” to Data – that just feeds into the stereotype of statistics being for robots and computers, not for true baseball lovers.

    (But at least it’s not Wesley manning Ops, which is an even worse mental image.)

    In any case, two observations.

    First, there already was (or presumably still is) a wOBA-like statistic scaled to a well-known number: EqA, Equivalency Average, scaled to batting average (.260 is average, .300 is raking, .350 is Pujolsian). It’s old enough that I remember a version of it included as a sortable stat way back in Sammy Sosa High Heat Baseball 2001. It has never caught on.

    Second, I suspect the reason things like EqA, wOBA, or even WAR (which is based on wOBA, IIRC) have an uphill climb is that they’re not simple enough to calculate on your own from the boxcar numbers. If you don’t know a guy’s slugging or his OBP, you could get it from the standard stats. It’s not hard. It doesn’t depend on context of park or position, nor is it relative to league average. You mentioned the trade off between usefulness and ease of use, and this is it.

    Recently at Joe Posnanski’s site, there was a combox squabble over WAR, with one person saying it was too *subjective*. I think he picked the wrong word, but his objection had a little weight: who decides that an offense from a shortstop is worth ten more runs than a third baseman’s, or five runs less than a catcher’s? Who decides how many runs better or worse a person’s baserunning is? And even if all of those levels are agreed on, how do you calculate it? It requires a database of dozens of individual component stats, and some of them (extra bases taken, for example, or reached on error) are obscure; it then requires a formula that changes from season to season based on the run scoring environment; and it requires park factors. One can’t grab a napkin and come up 30 seconds later with “This guy’s worth 1.5 more wins than that guy.”

    Honestly? That’s why passer rating has much less traction with casual football fans than the traditional yards, tds, ints, completion %, and maybe yards-per-attempt. All of those go into the rating, but nobody can figure it for themselves. Not only is the formula complicated, it’s scaled so that there’s a maximum possible rating. It would be like capping batting average at .350. Probably nobody cares if a QB is slightly more efficient than another (which is what the rating is supposed to measure).

    • Dammit nightfly, get it right – Wesley manned the helm, not ops!

      *slinks back into basement, Mountain Dew in hand….

    • “See, I can’t call it “ops,” because that stands for Operations. ”

      That’s a problem I run into a lot. It started when Mom’s society of Methodist women became the UMW. She tells me it means United Methodist Women, but I know my mom’s now a mine worker.

      And what about STAR? Up in this area we have at least four STARs, ranging from a police special tactics unit to a school for troubled children, and ending at a property tax rebate program for seniors.

      I feel you, man.

