-The K Zone-
December 25th, 2016
Dissecting WAR, by Ian Joffe
For years now, there has been a war going on over WAR. Wins Above Replacement (WAR, for short), a statistic engineered to encompass a player’s full, true contribution to a team, has drawn “radicals” both for and against it, as well as a bunch of people who have no idea what is going on or want to know more. This article is for those people, who want to learn all they can about the statistic and then decide whether or not it should be followed. First, a few things need to be cleared up. There is not one WAR; in fact, three major sites provide their own version of the statistic: Fangraphs has fWAR, Baseball Reference has bWAR, and Baseball Prospectus has WARP. I will begin by explaining fWAR, and then discuss its similarities to and differences from the other calculations. Furthermore, there is WAR for both hitters and pitchers. However, batter WAR is considered more refined and is much more widely used, so I will stick to batter WAR in this article. In the breakdown below, each component statistic is listed after the broader statistic it feeds into.
WAR (Wins Above Replacement): This is the big stat we are trying to break down. It attempts to measure a player’s value above replacement level, in the form of wins. Replacement level refers to what would happen if the player suddenly disappeared and had to be “replaced” by a typical minor league or bench option. The three WAR models each calculate how many wins a replacement-level player is worth, and use that baseline to calculate how many “Wins Above Replacement” someone else is worth. Additionally, WAR is park-adjusted and league-adjusted, meaning it uses league averages and park averages to put all players on an equal playing field. A hitter will not score a higher WAR just because they play in a league where runs are easier to come by or in a hitter-friendly ballpark.
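To make the overall structure concrete, here is a minimal Python sketch of how the pieces described below get combined, assuming the breakdown FanGraphs publishes for position players: batting, baserunning, fielding, positional, league, and replacement runs, summed and divided by runs per win. The function name and every input are hypothetical; in the real calculation each term is derived from season data, and runs per win (roughly 9 to 10) changes from year to year.

```python
def wins_above_replacement(batting_runs: float,
                           baserunning_runs: float,
                           fielding_runs: float,
                           positional_runs: float,
                           league_runs: float,
                           replacement_runs: float,
                           runs_per_win: float) -> float:
    """Sum a player's run contributions and convert them to wins (hypothetical helper)."""
    # The first five terms compare the player with an average player;
    # replacement_runs then shifts that baseline down to replacement level.
    total_runs = (batting_runs + baserunning_runs + fielding_runs
                  + positional_runs + league_runs + replacement_runs)
    # Dividing by runs per win converts the run total into wins.
    return total_runs / runs_per_win


# Hypothetical season: all inputs are made-up illustration values.
print(round(wins_above_replacement(30, 3, 5, 2.5, 1.0, 20, 10), 2))  # 6.15
```

In this made-up example, 61.5 total runs at 10 runs per win works out to about 6.2 WAR.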
Hitting Runs: wRAA (Weighted Runs Above Average): wRAA compares a player to the average major league hitter and says, in runs, how much better or worse they are. wRAA only measures hitting ability, and ignores areas of the game like speed and defense. A player with a positive wRAA is above average, and a player with a negative wRAA is below average. This is a little different from WAR, which compares a player to replacement level, not average. To fit into WAR, wRAA is adjusted for league and park, and then shifted to a replacement-level baseline.
wOBA (Weighted On Base Average): wOBA is like slugging percentage on steroids. It is the primary component of wRAA. While SLG simply counts a single as one base, a double as two, and so on, wOBA works to assign each of those four hit types its true run value. It turns out a double is only worth about 1.4 times as much as a single, and a home run only about 2.4 times as much. Unlike SLG, wOBA also includes walks (worth slightly less than a single). The so-called weights (how valuable each play is) are calculated using the run expectancy matrix, where each value is the average number of runs expected to score from that base-out state through the end of the inning:
| Runners | 0 Outs | 1 Out | 2 Outs |
|---|---|---|---|
| 1 _ _ | 0.831 | 0.489 | 0.214 |
| _ 2 _ | 1.068 | 0.644 | 0.305 |
| 1 2 _ | 1.373 | 0.908 | 0.343 |
| _ _ 3 | 1.426 | 0.865 | 0.413 |
| 1 _ 3 | 1.798 | 1.140 | 0.471 |
| _ 2 3 | 1.920 | 1.352 | 0.570 |
| 1 2 3 | 2.282 | 1.520 | 0.736 |
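One way to read the matrix: the run value of a play is the change in run expectancy plus any runs that score on it. The sketch below copies the table into a Python dictionary; the key format, the function name, and the two example plays are illustrative assumptions. The table has no bases-empty row, so only plays that start and end with runners on base can be valued with it. Roughly speaking, linear weights like wOBA's come from averaging this kind of value over every occurrence of an event in a season and then rescaling.

```python
# Run expectancy from the table: average runs scored from this state through the
# end of the inning. Keys are (runners, outs); "1__" means a runner on first only.
RUN_EXPECTANCY = {
    ("1__", 0): 0.831, ("1__", 1): 0.489, ("1__", 2): 0.214,
    ("_2_", 0): 1.068, ("_2_", 1): 0.644, ("_2_", 2): 0.305,
    ("12_", 0): 1.373, ("12_", 1): 0.908, ("12_", 2): 0.343,
    ("__3", 0): 1.426, ("__3", 1): 0.865, ("__3", 2): 0.413,
    ("1_3", 0): 1.798, ("1_3", 1): 1.140, ("1_3", 2): 0.471,
    ("_23", 0): 1.920, ("_23", 1): 1.352, ("_23", 2): 0.570,
    ("123", 0): 2.282, ("123", 1): 1.520, ("123", 2): 0.736,
}


def run_value(before: tuple[str, int], after: tuple[str, int],
              runs_scored: int = 0) -> float:
    """Runs a play added: change in run expectancy plus runs that crossed the plate."""
    return RUN_EXPECTANCY[after] - RUN_EXPECTANCY[before] + runs_scored


# Single with a runner on first and no outs; the runner stops at second.
print(round(run_value(("1__", 0), ("12_", 0)), 3))  # 0.542
# Single with a runner on second and one out; the runner scores, batter holds first.
print(round(run_value(("_2_", 1), ("1__", 1), runs_scored=1), 3))  # 0.845
```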
Singles, Doubles, Triples, Home Runs, HBP (Hit By Pitch), BB (Walks): I hope these are self-explanatory.
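Here is a minimal Python sketch of the wOBA and wRAA calculations, assuming the formulas FanGraphs publishes: wOBA divides the weighted sum of unintentional walks, hit-by-pitches, and hits by AB + BB − IBB + SF + HBP, and wRAA = (wOBA − league wOBA) / wOBA scale × PA. The weights, league wOBA, and wOBA scale below are approximate values assumed for illustration (roughly 2016-era); the real ones are recalculated every season. The park and league adjustments mentioned earlier are left out.

```python
# Approximate linear weights (assumed, roughly 2016-era); they change every season.
WEIGHTS = {"ubb": 0.69, "hbp": 0.72, "single": 0.88,
           "double": 1.24, "triple": 1.57, "hr": 2.01}
LEAGUE_WOBA = 0.318  # assumed league-average wOBA
WOBA_SCALE = 1.21    # assumed factor that puts wOBA on the same scale as OBP


def woba(ab: int, bb: int, ibb: int, hbp: int, sf: int,
         singles: int, doubles: int, triples: int, hr: int) -> float:
    """Weighted On Base Average: each way of reaching base, weighted by its run value."""
    ubb = bb - ibb  # intentional walks are excluded
    weighted = (WEIGHTS["ubb"] * ubb + WEIGHTS["hbp"] * hbp
                + WEIGHTS["single"] * singles + WEIGHTS["double"] * doubles
                + WEIGHTS["triple"] * triples + WEIGHTS["hr"] * hr)
    return weighted / (ab + bb - ibb + sf + hbp)


def wraa(player_woba: float, pa: int,
         league_woba: float = LEAGUE_WOBA, scale: float = WOBA_SCALE) -> float:
    """Weighted Runs Above Average: hitting runs compared with a league-average hitter."""
    return (player_woba - league_woba) / scale * pa


# Hypothetical season line for illustration only (630 PA = 550 AB + 70 BB + 5 HBP + 5 SF).
w = woba(ab=550, bb=70, ibb=5, hbp=5, sf=5, singles=100, doubles=35, triples=3, hr=30)
print(round(w, 3))                # 0.392
print(round(wraa(w, pa=630), 1))  # 38.4
```

An exactly league-average hitter comes out to a wRAA of zero, which is why positive values mean above average and negative values mean below average.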
Baserunning Runs: Baserunning Runs looks at the different events on the base paths and assigns each a run value. It then puts the total on the same runs scale used in the rest of the WAR calculation.
wSB (Weighted Stolen Bases): wSB uses a player’s stolen base attempts to measure more accurately how much their steals are worth. It looks not only at the sheer number of bases taken, but also at how often the player chose to run and how often they got caught.
SB (Stolen Bases), CS (Caught Stealing)
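A minimal sketch of wSB, assuming the formula FanGraphs documents: each steal is worth about +0.2 runs, each caught stealing costs −(2 × league runs per out + 0.075) runs, and the player is then compared with the league-average stealing rate per opportunity (singles + walks + HBP − intentional walks). The league runs-per-out figure, the league rate, and the player line below are hypothetical.

```python
RUN_SB = 0.2  # assumed run value of one stolen base


def run_cs(runs_per_out: float) -> float:
    """Assumed run cost of a caught stealing: losing the runner and adding an out."""
    return -(2 * runs_per_out + 0.075)


def wsb(sb: int, cs: int, singles: int, bb: int, hbp: int, ibb: int,
        runs_per_out: float, league_rate: float) -> float:
    """Weighted Stolen Base runs relative to a league-average base stealer."""
    opportunities = singles + bb + hbp - ibb  # roughly, times the player reached first
    player_runs = sb * RUN_SB + cs * run_cs(runs_per_out)
    # Subtract what an average stealer would have produced with the same opportunities.
    return player_runs - league_rate * opportunities


# Hypothetical player and league values for illustration only.
print(round(wsb(sb=30, cs=5, singles=100, bb=50, hbp=5, ibb=3,
                runs_per_out=0.16, league_rate=0.002), 3))  # 3.721
```

Note how the caught-stealing penalty is about twice the size of the stolen-base reward, so a runner who is thrown out too often can end up with a negative wSB despite a large steal total.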
UBR (Ultimate Baserunning): UBR tries to account for almost every baserunning event that occurs when the ball is put in play, such as fielder’s choices, tagging up, and advancing two bases on a single or three on a double. It uses video tracking systems to come up with its raw statistics.
7 Possible Situations: I am not going to list them here, but you can check out the full list on Fangraphs.
wGDP (Weighted Grounded into Double Plays): wGDP (NOT Gross Domestic Product; I hate it when people make that joke) looks at how often a player has an opportunity to ground into a double play and how often it actually happens. This stat is only a minor addition to the WAR puzzle.
GDP (Grounded into Double Plays), DP Opportunities
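FanGraphs’ exact wGDP formula is not reproduced here; this is a simplified, hypothetical version of the idea described above: compare a player’s double plays with what a league-average hitter would hit into given the same opportunities, and convert the difference to runs. The run value per double play and the league rate are assumed numbers. The second function reflects FanGraphs’ definition of Baserunning Runs as the sum of wSB, UBR, and wGDP.

```python
def wgdp_simplified(gdp: int, opportunities: int,
                    league_gdp_rate: float, runs_per_gdp: float) -> float:
    """Hypothetical simplified wGDP: runs saved by avoiding double plays vs. league average."""
    expected_gdp = league_gdp_rate * opportunities  # what an average hitter would hit into
    return (expected_gdp - gdp) * runs_per_gdp


def baserunning_runs(wsb_runs: float, ubr_runs: float, wgdp_runs: float) -> float:
    """Baserunning Runs as the sum of its three components."""
    return wsb_runs + ubr_runs + wgdp_runs


# All inputs are made-up illustration values.
gdp_runs = wgdp_simplified(gdp=8, opportunities=100, league_gdp_rate=0.11, runs_per_gdp=0.5)
print(round(gdp_runs, 2))                              # 1.5
print(round(baserunning_runs(3.7, 1.0, gdp_runs), 2))  # 6.2
```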
Fielding Runs: UZR (Ultimate Zone Rating): UZR uses video tracking technology to watch each ball hit toward a fielder. It looks at information such as hang time and distance from the fielder, and then takes a binary input: did the fielder catch the ball or not? This works well for measuring things like range and how often a player makes an error, and is generally a good representation of how well a player fields. NOTE: there is no UZR for catchers, so WAR uses alternate statistics for them.
Raw statistics such as batted-ball type (fly ball, ground ball, line drive, “fliner”) and contact quality (hard, medium, soft)
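UZR’s actual method is more involved than anything shown here, but the core idea of a zone rating can be sketched in a few lines: for each ball, estimate how often comparable balls (same zone, type, and hang time) are turned into outs, then credit the fielder for plays made that others often miss and debit them for plays missed that others usually make, scaled by the runs at stake. The class, probabilities, and run values are hypothetical, and this toy version ignores UZR’s separate error, arm, and double-play components.

```python
from dataclasses import dataclass


@dataclass
class BattedBall:
    out_probability: float  # share of comparable balls that average fielders turn into outs
    run_value: float        # assumed run difference between an out and a hit on this ball
    caught: bool            # the binary input: did this fielder make the play?


def zone_runs(balls: list[BattedBall]) -> float:
    """Toy zone rating: runs saved relative to an average fielder on the same balls."""
    total = 0.0
    for ball in balls:
        if ball.caught:
            # Credit for the part of the play an average fielder would not have made.
            total += (1 - ball.out_probability) * ball.run_value
        else:
            # Debit for the part of the play an average fielder would have made.
            total -= ball.out_probability * ball.run_value
    return total


plays = [BattedBall(0.2, 0.8, True),    # diving catch on a ball usually missed
         BattedBall(0.95, 0.8, False)]  # routine ball that got through
print(round(zone_runs(plays), 2))  # -0.12
```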
Positional Adjustment: WAR adjusts a player’s score based on their position. If a player is at a position where the typical player is a weaker hitter, largely because the position is harder to defend, they are considered worth more, and vice versa. Currently, catcher is considered the weakest-hitting position, with shortstop second. DH is considered the strongest, followed by first base and corner outfield. If a player splits time between positions, they receive a split adjustment based on how many games they played at each.
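A small Python sketch of splitting the positional adjustment by games played. The run values per 162 games are assumptions based on figures FanGraphs has commonly used (catcher +12.5, shortstop +7.5, second base, third base, and center field +2.5, corner outfield −7.5, first base −12.5, DH −17.5), which follow the ordering described above.

```python
# Assumed positional adjustments, in runs per 162 games.
POSITION_RUNS_PER_162 = {
    "C": 12.5, "SS": 7.5, "2B": 2.5, "3B": 2.5, "CF": 2.5,
    "LF": -7.5, "RF": -7.5, "1B": -12.5, "DH": -17.5,
}


def positional_adjustment(games_by_position: dict[str, int]) -> float:
    """Prorate each position's adjustment by the games played there."""
    return sum(POSITION_RUNS_PER_162[pos] * games / 162
               for pos, games in games_by_position.items())


# A hypothetical player with 100 games at shortstop and 50 at third base.
print(round(positional_adjustment({"SS": 100, "3B": 50}), 1))  # 5.4
```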
Well, those are the basics of the fWAR calculation. bWAR is very similar for the most part, but has a few minor differences. For example, it uses DRS (Defensive Runs Saved) instead of UZR for defense, though those two stats are very similar. bWAR also has different positional adjustments and a different replacement level. WARP is, unfortunately, less transparent. We do not know exactly how it is calculated, but we do know that it is a far more complex version of VORP (Value Over Replacement Player), a statistic from about 15 years ago that attempted to do something similar to WAR. For these transparency reasons, I generally use WARP less than the other two models, even though it is often hailed as the most accurate of the three. Anyway, the three models usually come out close to one another, no matter which one you use.
For context, a player with a WAR of about 2.0 is considered an average starter, a player with a WAR of 4.0 is considered all-star level, and a player with a WAR greater than 6.0 is considered a legitimate MVP candidate. In 1923, Babe Ruth racked up 14.1 bWAR, the most of all time in one season. He also holds the second- and third-place records. Ruth also owns a miraculous 183.6 lifetime bWAR, by far the most in history. Among modern-era players who did not take PEDs, Cal Ripken Jr. leads the charge with 11.5 bWAR in 1991 (Barry Bonds accumulated 11.8 bWAR in 2001). Mike Trout led 2016 in bWAR, posting a 10.6 mark, followed by Mookie Betts (9.6) and Kris Bryant (7.7). Every year, many regulars finish with a negative WAR, meaning a replacement player would have done a better job. Not to start a roast, but last year Alexei Ramirez had the worst fWAR in baseball, at -2.4. Better luck next year, Alexei!
So, the final question is: how good a stat is WAR, really? Wins Above Replacement drew a lot of criticism in 2012, when many people argued that AL WAR leader Mike Trout should have won the MVP rather than Triple Crown winner Miguel Cabrera, who had a far worse WAR in every model. By sabermetric standards, the Triple Crown, consisting of batting average, home runs, and RBIs, is a very outdated model, not even close to what WAR has reached today. And I personally tend to agree with the sabermetricians. WAR is a terrific stat, one of the best we have, and it should be strongly considered when discussing a player’s worth. Team WAR has also been shown to correlate strongly with team win/loss record, which is a good sign that it means something. But WAR is not everything. While it may appear to be an “everything stat,” it hardly is. The truth is that while WAR is some of the best information we have, it is still highly imperfect. However, when taken with a grain of salt, WAR can be a highly useful statistic for front offices and fans alike.
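The link between team WAR and wins follows from how WAR is built. Assuming a replacement-level team wins about 29.4% of its games (the .294 baseline FanGraphs uses; treat it as an assumption here), a quick sketch of a team’s expected wins is that baseline plus the team’s total WAR. The team WAR value in the example is hypothetical.

```python
REPLACEMENT_WIN_PCT = 0.294  # assumed winning percentage of a replacement-level team


def expected_wins(team_war: float, games: int = 162) -> float:
    """Rough expected wins: a replacement-level baseline plus the team's total WAR."""
    return REPLACEMENT_WIN_PCT * games + team_war


# A hypothetical team whose players combine for 40 WAR.
print(round(expected_wins(40.0), 1))  # 87.6
```

A real team’s record will still drift from this estimate because of luck, sequencing, and the imperfections discussed above.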