If you go onto (the oddly green) Fangraphs.com and wander (with your mouse) into the value section, you can see that among the Mets’ everyday players, David Wright has been the most valuable in the first half of 2010. Wright has accumulated 4.1 Wins Above Replacement (WAR), the second most in the National League, with Joey Votto leading Wright by just 0.1 WAR. Angel Pagan is second on the Mets with 3.1 WAR. Makes sense, Wright then Pagan.
If you then wander over onto Baseball-Reference.com, you’ll see something interesting. According to Baseball-Reference’s version of WAR, Angel Pagan has been the most valuable Mets player, worth 4.0 WAR. David Wright is now second, with 3.9 WAR. Pagan’s 4.0 WAR makes him the second most valuable position player in the NL. He trails Adrian Gonzalez by 0.2 WAR on the leaderboard.
So, on one website, David Wright is tops on the Mets team, and the second best in the NL. On the next website, Angel Pagan is the best on the Mets, and second in the league. On one, Pagan is 3.1 WAR; on the other, 4.0 WAR. Joe Posnanski recently complained about this WAR discrepancy. Seeing that there are now two easily and freely available versions of WAR on the Internet, I think it might be worth it to look into the differences between the two versions of WAR, and WAR itself — by talking to myself, of course.
Okay, so what’s WAR?
WAR stands for Wins Above Replacement. It’s a statistic that’s supposed to measure how valuable a player has been to his team, and then express that value as a number of wins.
What is “Above Replacement”?
Replacement means “replacement level player.” Replacement level players are the sort of dudes every team has stocked away in AAA. All those guys on the 2009 Mets who were pulled out of nowhere after everyone good was injured? Like Emil Brown? Ramon Martinez? That’s basically replacement level. WAR is supposed to measure how much better a player is than someone barely serviceable.
But it has decimal points. “3.9 WAR.” I don’t like that.
I really don’t like it either. WAR would probably catch on more if it was expressed in whole numbers. Like, what the heck is 0.5 of a win? Baseball fans seem to be okay with whole numbers (RBI, saves) or percentages (batting average, slugging), but mixed numbers . . . not so much. Sabermetricians are generally pretty bad at coming up with accessible acronyms and numbers. Bill James is usually the exception, and he had the right idea by making his Win Shares whole numbers and multiplying them by three so the differences were larger.
So, why does Angel Pagan have two different WAR numbers?
Well, because there’s no single agreed-upon way to calculate WAR. It’s not like batting average, “hits divided by at bats.” Each site uses a different method for WAR. That doesn’t help it catch on more widely, either, because now there’s a steep learning curve.
Sigh. Okay. What are the differences between Fangraphs and Baseball-Reference WAR?
Well, first we’ll need to break down the pieces that add up to make WAR. We’re going to ignore WAR for pitchers, which is totally different, in this post; I’ll deal with them later in the week.
Think of it this way — every position player is responsible for two things:
A. Creating runs on offense
B. Preventing runs on defense
And that right there is really the heart of WAR. It measures a player’s offensive contributions and their defensive contributions in terms of runs, and then converts those runs into wins.
The problem is that Fangraphs and B-R evaluate both offense and defense differently, and each comes up with a different number of runs for the same player.
So I’m supposed to buy into a stat no one can agree how to calculate?
. . . yes?
I’ll play along for now. Well, to start, how is a player’s offense measured differently on each site?
Well, it’s not that different. Each version of WAR bases a player’s offensive contributions on basically the same few things: his number of singles, doubles, triples, home runs, stolen bases, caught stealing, reached on errors, walks, and hit by pitches. Each batting event is assigned a value, and then those values are adjusted for the park, year, and league the player is in, so that batters in 1968 Dodger Stadium can be compared to batters in 1997 Coors Field. The adjustments are slightly different on each site, but the ideas behind doing so are the same.
Baseball-Reference also adds in some baserunning events Fangraphs does not, such as advancing on passed balls and going first to third on a single. B-R also gives credit to hitters for avoiding hitting into double plays. Neither of these is an enormous difference, but they are differences.
Fangraphs combines all their offense into one number, called “weighted runs above average” (wRAA) . . .
Wait, why is the “w” in “wRAA” lowercase? Why do they always do that?
Um . . . I’m really not sure. I don’t see a problem with WRAA, other than it sounds like a crow’s noise if you read it aloud like a real word. The lowercase certainly doesn’t make it more appealing, and it’s going to look awkward when I don’t capitalize that “w” at the beginning of this next sentence . . .
wRAA is supposed to measure how many runs a player created on offense, above or below what an “average” player would create.
Baseball-Reference, for their part, breaks offense down into four numbers:
– Double play runs. (Not hitting into double plays.)
– Reached on error runs. (What it sounds like.)
– Baserunning runs. (Stolen bases, caught stealing, advancing on hits, wild pitches, or passed balls.)
– Batting runs. (Everything else: singles, home runs, etc.)
Each one is also compared to the average. B-R likes whole numbers, so sometimes you’ll see “12 + 1 + 1 + 0 = 15” for the four categories, but that’s only because of the decimal places the rounding hides.
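To see how four rounded numbers can fail to add up to their rounded total, here is a minimal Python sketch. The decimal values are made up for illustration (B-R doesn’t show its real decimals), and the category names are just my labels.

```python
# Hypothetical unrounded values, invented for illustration only.
# Baseball-Reference does not display its actual decimals.
components = {
    "batting": 12.4,
    "baserunning": 1.4,
    "reached_on_error": 1.3,
    "double_play": 0.0,
}

# What the site would show: each category rounded to a whole number.
displayed = {name: round(value) for name, value in components.items()}

# The total is computed from the unrounded values, then rounded once.
true_total = sum(components.values())

print("Displayed categories:", displayed)
print("Sum of displayed categories:", sum(displayed.values()))
print("Displayed total:", round(true_total))
```

The displayed categories add up to one run less than the displayed total, because each category lost a few tenths in the rounding.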
Here is what Angel Pagan’s actual 2010 looks like so far, only the table is cut off because I can’t figure out how to make it not do that:
And here’s what it looks like in B-R’s four offensive columns:
B-R says Pagan created about 12 runs above average in those four categories. Fangraphs credits Pagan with creating 13.8 runs above average.
That’s close. So the only difference in offense between the sites is in GIDP and baserunning?
Well, not exactly. Each site uses a slightly different formula and adjusts it differently, but the two are generally going to spit out similar numbers.
Also, at the end of the season, B-R’s offensive number is adjusted so that the number of runs a team is credited with creating matches up with the ACTUAL number of runs the team scored; Fangraphs doesn’t do the same thing. B-R’s version of WAR is rooted a bit more in what actually happened on the field.
Baseball-Reference also doesn’t bother to figure out reached on error runs and double play runs until after the season ends. I’m not sure why. You’ll see that right now, everyone, Pagan included, is 0 in both in 2010.
Fair enough. So . . . defense?
Ah, defense. The big problems show up in defense. Fangraphs uses a system called Ultimate Zone Rating (UZR) for their WAR. Baseball-Reference uses something called Total Zone. Fielding is where you’re going to find the really big differences between players. Fangraphs’ UZR gives Angel Pagan’s defense 6.4 fielding runs above an average player; Baseball-Reference’s Total Zone gives him 16 runs above average. Jose Reyes has either been -8 (Total Zone) or -0.4 (UZR). These numbers don’t always line up, and this causes most of the discrepancies between WAR numbers.
Okay. Why the enormous gap in fielding?
Mostly because it’s really, really difficult to figure out defense on an individual level. If Jose Reyes is at the plate and strikes out, we know it’s not David Wright’s fault; the strikeout is all on Reyes. Offense is easy to assign. On the other hand, if a ground ball scoots through between Wright and Reyes in the field, it’s a bit trickier to say whose fault that is — or if it’s the pitcher’s fault, or if it’s anyone’s fault. Assigning individual defensive credit is tough, and we’re not particularly good at it. This is why different systems can spit out vastly different numbers.
As for the differences between Fangraphs’ UZR and B-R’s Total Zone: basically, if you kept really, really good scorecards, you could figure out Total Zone on your own with just that; if you recorded every game on your DVR, you could figure out UZR with that. Not really, but almost. Both take into account a player’s range, and his arm if he is an outfielder, and his ability to turn double plays if he is an infielder. They just do it in different ways.
Catcher defense is also evaluated differently by both sites. B-R uses passed balls, wild pitches, caught stealing and the number of stolen bases allowed to evaluate a catcher; Fangraphs just uses CS and SB numbers.
Which one is better? UZR or Total Zone?
Definitely UZR, but it’s sort of like the difference between playing William Tell with someone who has 20/800 vision and playing with someone who has 20/200 vision. Neither system is perfect, or even close to that, but it’s better than letting the blind guy try to shoot the apple off your head, right? They’re better than the nothing we used to use.
On the other hand, Fangraphs goes with decimal points again for UZR. It’s not a huge deal, but I think it’s easier for everyone to understand Pagan saving 16 runs as opposed to 6.4 runs. No one ever wins a baseball game 6.4 to 4.2.
The big advantage of Total Zone is that it allows us to evaluate the defense of players throughout all of baseball history through Retrosheet, something UZR can’t do.
At the very least, both systems rate Pagan as a great defensive center fielder. The disagreement is about how great.
And that’s everything in WAR?
Almost. There are two other adjustments that need to be made.
The first adjustment is for the position of the player. In other words, defense-first positions like shortstop and catcher get bonus runs, and slugging positions like first base and designated hitter lose runs. Positions that stress defense are generally played by lighter hitters; this adjusts for that, so we can compare players across positions. Someone who can hit 25 home runs as a first baseman is easier to find than someone who can hit 25 home runs as a shortstop; WAR tries to adjust for that fact.
Pagan gets one run from Baseball-Reference for being a center fielder. Fangraphs, still going strong with the decimals, gives him 1.1 runs.
And the second adjustment?
The second adjustment is for “replacement runs.” Because it’s Wins Above Replacement, and the offense and defense are just compared to AVERAGE and not REPLACEMENT, we need to adjust for that — replacement players are worse than average players.
Basically, what replacement runs means is that if someone plays 150 games, he starts with about 20 or so runs to his credit just for running himself out on the field; more games get you more runs, fewer games get you fewer runs. Those are “replacement runs,” an estimate of the difference between replacement and average level. It’s a weird concept, and probably where most people tune out.
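As a rough sketch of the idea, assume replacement runs scale in a straight line with games played, using the “about 20 runs per 150 games” figure above. That is my simplification, not either site’s actual formula; the function name and the constant are hypothetical.

```python
# Assumption: replacement runs grow linearly with games played, anchored to
# the rough figure of 20 runs per 150 games. Neither site's real formula.
RUNS_PER_150_GAMES = 20.0


def replacement_runs(games_played: int) -> float:
    """Estimate the runs credited just for taking the field."""
    return RUNS_PER_150_GAMES * games_played / 150


for games in (40, 75, 150, 162):
    print(f"{games} games: {replacement_runs(games):.1f} replacement runs")
```

Under that assumption, half of a 150-game season works out to 10 runs, which is in the same neighborhood as a half season of everyday play.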
Both sites figure these out differently as well. Baseball-Reference uses a different number of replacement runs depending on the league. I don’t believe Fangraphs does the same thing; as far as I can tell, it just uses a blanket replacement level for both leagues.
Anyway, as for El Caballo Loco, Pagan gets 11 replacement runs from Fangraphs, and 9 from Baseball-Reference — about 10 runs for a half season.
But now that’s everything in WAR, correct?
*Awkward high five*
No, wait. How do we get from runs to wins?
Oh, right. As a general rule, 10 runs equals one win, but that changes from season to season. In years when fewer runs are scored, it takes fewer runs for a win, and vice versa. If every game finishes 5-3, one run is 12.5% of the total scoring. If every game is 13-7, one run is 5% of the scoring. A run is a run is a run, but all runs aren’t equally valuable: a run in a lower scoring league is more valuable because it’s more of the total scoring.
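The percentages are just one run divided by the total runs in the game. Here is a small sketch of that arithmetic; `run_share` is a name I made up for it.

```python
def run_share(winning_score: int, losing_score: int) -> float:
    """Fraction of a game's total scoring that one run represents."""
    return 1 / (winning_score + losing_score)


print(f"5-3 game: one run is {run_share(5, 3):.1%} of the scoring")
print(f"13-7 game: one run is {run_share(13, 7):.1%} of the scoring")
```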
Anyway, Baseball-Reference says Pagan is 39 runs above replacement this season, and turns that into 4.0 wins. Fangraphs says 32.3 runs, and turns that into 3.3 wins. So ten runs roughly equals a win this season.
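You can back out the conversion rate each site is implying by dividing runs by wins. This is only approximate, since the win totals are already rounded to one decimal, and the function name is mine.

```python
def implied_runs_per_win(runs_above_replacement: float, wins: float) -> float:
    """Back out the runs-per-win rate from a site's displayed totals."""
    return runs_above_replacement / wins


# Pagan's totals as quoted above; the win figures are already rounded.
baseball_reference = implied_runs_per_win(39, 4.0)
fangraphs = implied_runs_per_win(32.3, 3.3)

print(f"Baseball-Reference: {baseball_reference:.2f} runs per win")
print(f"Fangraphs: {fangraphs:.2f} runs per win")
```

Both come out a shade under ten runs per win.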
And that’s it?
*Even more awkward high five*
So . . . what’s all this good for?
It’s probably the best way we have to eyeball who’s having a good season, because it accounts for most things. It also lets us compare players across different eras more easily than something like batting average and home runs would. The league average batting average and the number of home runs hit change year to year, sometimes going through huge dips and rises. The goal of the game, on the other hand, is to win, and that remains unchanged.
Okay. Are there problems with WAR?
Oh yes. Many.
It’s a counting stat, so it has some of the same problems as runs scored and RBI. That means if there are two equal players, the one who plays more will have a higher WAR.
For example, good players who play for bad teams (bad teams that don’t score many runs) will get fewer plate appearances over the course of a season and less WAR because of that. Last season, Albert Pujols played 160 games for the NL Central Champion Cardinals; he had 700 plate appearances. Adrian Gonzalez played 160 games for the fourth-place San Diego Padres; he had 681 plate appearances. Similarly, American League players will generally get more plate appearances than National League players, solely due to the pitchers using up outs in the NL. Most of the WAR leaders this season are in the AL.
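To put a number on the playing-time effect, here is a sketch that gives both hitters the exact same made-up rate of production and changes only the plate appearances. The rate of 5 runs per 100 plate appearances is an assumption for illustration, not a claim about either player’s real value.

```python
# Assumption: a made-up rate, identical for both hitters, for illustration only.
RUNS_PER_100_PA = 5


def runs_above_replacement(plate_appearances: int) -> float:
    """Total runs for a hitter producing at the assumed per-PA rate."""
    return RUNS_PER_100_PA * plate_appearances / 100


pujols_pa = 700    # 2009 plate appearances quoted above
gonzalez_pa = 681

gap = runs_above_replacement(pujols_pa) - runs_above_replacement(gonzalez_pa)
print(f"Same rate, different playing time: {gap:.2f} runs apart")
```

Two identical hitters end up about a run apart over the season just because one came to the plate 19 more times.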
It also doesn’t take into account the timeliness of hitting, so a home run in the first inning is worth just as much as a home run in the ninth inning. How big of an issue this is depends on your thoughts on clutch hitting, which is another story.
For some reason, no one ever brings up clutch fielding, but it doesn’t measure that either. Also, as we saw above, defense in general is a mess. Defense is easily the biggest problem. If you see someone at -30 runs or +30 runs, it might be a fluke throwing off their value. Things smooth out over a career, but season to season is a minefield.
It’s also not going to measure immeasurable things, such as leadership.
So it’s got problems — why should anyone use it? Marvin Gaye says WAR is not the answer.
It’s the best system we have. It’s not perfect, but it’s the best we have, at least so far. It might be a Model T, but it sure beats walking everywhere. Jeff Francoeur would certainly agree that walking is overrated . . .
Sorry, lame joke.
Oh, no, I wasn’t booing that awful joke. It was just a reflex from hearing Francoeur’s name.
*Awesome, non-awkward high five*
So, show me Pagan’s WAR again, only this time broken down.
Fangraphs: Offense: 13.8; Defense: 6.4; Replacement: 11; Position: 1.1 = 3.3 WAR
Baseball-Reference: Offense: 12; Defense: 16; Replacement: 9; Position: 1 = 4.0 WAR
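If you want to check the addition yourself, here is a short sketch that sums each site’s components and finds where the two disagree most. The numbers are the ones listed above; the variable names are mine. B-R’s displayed whole numbers add up to 38 rather than the 39 runs it reports, presumably because of the rounding mentioned earlier.

```python
# Pagan's 2010 first-half components in runs, as listed above.
fangraphs = {"offense": 13.8, "defense": 6.4, "replacement": 11.0, "position": 1.1}
baseball_reference = {"offense": 12.0, "defense": 16.0, "replacement": 9.0, "position": 1.0}

fg_total = sum(fangraphs.values())
br_total = sum(baseball_reference.values())

# Per-component gap: positive means Baseball-Reference is more generous.
gaps = {name: baseball_reference[name] - fangraphs[name] for name in fangraphs}
biggest = max(gaps, key=lambda name: abs(gaps[name]))

print(f"Fangraphs total: {fg_total:.1f} runs")
print(f"Baseball-Reference total: {br_total:.1f} runs")
for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f}")
print("Biggest disagreement:", biggest)
```

Defense is the only component where the two sites are more than a couple of runs apart.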
Pagan’s been good?
Pagan has been awesome.
Where can I read more about this?
I’ll do that one later in the week. It’s even more complicated, believe it or not, but there’s no baseball on, so what else am I going to do?
Image via slgckgc’s Flickr.