By Christopher Cwik
No area of player evaluation in baseball has risen in importance in recent years more than fielding. While the early 2000s focused on power and patience, the spectrum has shifted. Defense is being stressed more than ever before, and along the way we've seen the creation of various advanced defensive metrics. There's no better example of that than Elvis Andrus' eight-year, $120 million extension with the Texas Rangers. Though it seemed crazy at the time to hand such a huge deal to a shortstop with a career wRC+ of 86, the move was actually celebrated by stat types. Andrus' performance in nearly every one of these fielding metrics said the deal was more than fair.
Defensive metrics, however, do come with a bit of a stigma. Even those who have created their own metrics will caution against leaning too heavily on these stats, because defense is incredibly difficult to actually measure, and these stats are relatively new. On top of that, the people who create the stats usually keep some of their calculations private, making it hard to know exactly what goes into these metrics. Further complicating the matter is the fact that there's no one defensive metric everyone supports. Instead, the most prominent sabermetric websites feature different metrics, all of which spit out slightly different figures.
These issues make it tough to know exactly how to use defensive stats. Though they may have their flaws, these metrics still play a significant role in player evaluation. And while the different stats seem complicated, they are more similar than they appear.
Before exploring the intricacies of each stat, it's important to understand why these metrics were created. The simple answer is that the two most cited defensive stats prior to the creation of these metrics -- errors and fielding percentage -- have major flaws. Errors are determined by the official scorer, and are susceptible to human judgment. Anyone who has watched enough games can remember an instance where they thought an error would be given, but one was not. Fielding percentage is built on errors, and it doesn't take a player's range into account. A rangy player gets no extra credit for reaching more balls than a statue who can make plays only when the ball is hit right at him.
Because of this, stats like Ultimate Zone Rating (UZR), Defensive Runs Saved (DRS) and Total Zone (TZ) were created. Though these three stats work in slightly different ways, they all have the same goal. "Defensive metrics come down to three questions," according to Russell A. Carleton of Baseball Prospectus. "How many plays should [a defender] have made? How many did he make? What's the difference between those two numbers?"
That's precisely what UZR attempts to do. Created by Mitchel Lichtman, UZR is set on a scale where 0 equals the league average. A positive value means the player was an above-average defender, while a negative value means that player was a below-average defender. The higher the number, on either side, the more extreme the opinion about that player. The resulting number attempts to tell fans how many runs a player saved or cost his team due to defense. For reference, Baltimore's Manny Machado led all fielders with a 31.2 UZR last season. Shin-Soo Choo, playing out of position in center field for Cincinnati, was last in the league, with a -16.9 UZR. The data that goes into UZR is pulled from Baseball Info Solutions (BIS), a company that specializes in collecting and delivering baseball data. UZR can be found on FanGraphs.com.
In order to determine whether a player should have made a play, UZR uses a zone-based method to determine value. In this scenario, the field is broken up into different zones. Players are then assigned value based on how many plays they make in each zone. For example, let's say the shortstop zone is zone six and the third baseman zone is zone five. A ball is hit where those two zones meet, but Andrus is able to get over there to make the play. UZR does a few things here. It calculates how often Andrus typically makes plays in that area, and compares it to how often every other shortstop makes that play. If few shortstops make that play, Andrus receives a fraction of positive value toward his UZR -- and the rarer the play, the larger the credit. He is the only player whose UZR is impacted in this scenario.
Now, let's take the same situation and assume the ball was not fielded. UZR penalizes both the shortstop and the third baseman in this scenario. This is one of the unique aspects of UZR. On plays where the ball is not fielded, multiple players can be penalized. In this scenario, let's say shortstops only make that play 30 percent of the time, but third basemen usually make that play 65 percent of the time. The third baseman would receive a larger penalty than the shortstop for not getting to the ball.
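The credit-and-penalty logic described above can be sketched in a few lines of Python. This is a simplified illustration of the zone-based idea, not Lichtman's actual formula; the make rates come from the hypothetical example, and the run value of a hit is an assumed figure.

```python
# Simplified sketch of zone-based credit/penalty logic in the style of UZR.
# All numbers are illustrative, not Lichtman's actual parameters.

# League-wide probability that a fielder at each position converts a ball
# hit to the shared zone between short and third (hypothetical rates from
# the example above).
league_make_rate = {"SS": 0.30, "3B": 0.65}

HIT_RUN_VALUE = 0.55  # assumed average run cost of a ball falling for a hit

def credit_for_play(position):
    """Runs credited to the fielder who converts the ball into an out.

    The rarer the play is for that position, the more credit is awarded.
    """
    p = league_make_rate[position]
    return (1 - p) * HIT_RUN_VALUE

def penalty_for_miss():
    """Runs debited to EACH responsible fielder when nobody makes the play.

    Each fielder is charged in proportion to how often his position
    normally converts that ball, so the third baseman (65 percent) is
    penalized more than the shortstop (30 percent).
    """
    return {pos: -p * HIT_RUN_VALUE for pos, p in league_make_rate.items()}
```

With these toy numbers, the shortstop earns more credit for making the play (shortstops rarely do), while the third baseman takes the larger penalty when the ball gets through (third basemen usually make it) -- exactly the asymmetry the example describes.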
UZR also uses multiple years of data when determining how much to award or penalize a defender. When a defender makes a play, UZR looks at the last few seasons and isolates similar plays in order to determine how much value is assigned to a player. It is also park-adjusted, meaning it compensates for defenders who have to deal with strange quirks in their park, like the Green Monster in Boston. The data also includes infield shifts, but only when MLB's stat stringers -- people who record play-by-play data for MLB -- indicate there is a shift.
Like most metrics, UZR does have its flaws. First basemen do not get credit for scoops, which could penalize certain players who may not have range, but can make up for it at the bag. Another issue has to do with how individual players are valued. "A player can play exactly the same, but look better or worse depending on how everyone else in the league is doing," says Jeff Zimmerman of FanGraphs. Since UZR compares players to their contemporaries, their performances are all linked, which can sometimes skew the numbers. When an otherworldly defender is introduced at a position, like Andrelton Simmons last season, other shortstops may see their defensive numbers drop slightly due to Simmons' insane skills.
Different defensive strategies and positioning can also alter the numbers, according to Zimmerman. He referenced a situation on the 2012 Royals to illustrate his point. Shortstop Alcides Escobar, who is regarded as an excellent defender, was rated horribly by UZR. Third baseman Mike Moustakas, meanwhile, was leading the league in defense at third. Zimmerman said that after watching a few of their games, he noticed the team was allowing Moustakas to take anything shallow hit to the left side of the infield. This took balls away from Escobar, and left him with the deeper, harder-to-field balls, which impacted his defensive numbers.
DRS is similar in many ways. Both UZR and DRS operate on the same scale, where 0 is league average. DRS also utilizes zone-based data, which is pulled from BIS, in order to rate defense. Like UZR, DRS attempts to adjust for park factors. Each stat has slightly different ways of calculating the final product, but both are actually fairly similar. DRS was created by BIS, and is prominently featured in The Fielding Bible. It can also be found on Baseball-Reference.com.
The biggest difference between the two is that DRS uses a one-year sample when comparing plays. If Andrelton Simmons makes a play in 2013, that play is compared to all similar plays from only 2013. This is unlike UZR, which compares that play to multiple seasons of data before assigning a value. While both systems utilize zone-based data, DRS breaks the field up into smaller zones in an attempt to be more precise. It's unclear whether the smaller zones actually make the data more precise, but it's one of the main differences between the two stats. DRS also adjusts for whether the first baseman is holding a base-runner, and whether middle infielders are covering second on a hit-and-run play.
The biggest point of controversy with DRS lies in its sample size. Since DRS uses only one year of data to compare fielders, it's not pulling from a large set of data. This is done to compensate for possible year-to-year changes around the league. DRS operates under the belief that teams will utilize better defensive players as the league continues to recognize the importance of defense, and that those numbers should not be compared to past seasons, when the league didn't value defense as much. There's a reason for the adjustment, but it also leaves DRS with a smaller sample of data to pull from.
Invented by Sean Smith, Total Zone (TZ) is used prominently in both UZR and DRS. Since BIS data only goes back to 2002, TZ was created to determine defensive values for historical players. Any defensive numbers prior to 2002 were created using TZ. Since BIS did not exist prior to 2002, TZ relies on Retrosheet data. Retrosheet is a website that collects extensive play-by-play data for baseball games pre-1984. Retrosheet data isn't as exhaustive as what BIS provides, since past play-by-play data didn't include things like the speed of the ball off the bat, or the angle at which the ball travels.
Because of this, TZ mainly looks at where the ball is hit, and whether it was a ground ball, line drive, fly ball or popup. It then takes this data and determines how often a fielder makes a play on those types of balls, and assigns value. This system is set on the same scale as both UZR and DRS, so all the numbers are consistent. TZ also integrates park factors, outfielder arm and catcher data in its calculations. Like UZR, TZ pulls from multiple seasons of data before assigning value. Despite TZ being slightly less sophisticated, it's "pretty faithful about recording where the ball went," according to Carleton.
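That bucketing approach can be sketched simply: group plays by batted-ball type and rough location, compare a fielder's out-conversion rate in each bucket to the league rate, and scale the difference to runs. This is a toy illustration of the idea, not Smith's actual system; the bucket labels, data shapes and run value are all assumptions.

```python
# Toy sketch of the Total Zone approach, using Retrosheet-style information:
# only batted-ball type and rough location are known for each play.
# Data shapes and the runs-per-play constant are illustrative assumptions.

def total_zone_runs(league_plays, player_plays, runs_per_play=0.55):
    """Runs above/below average for one fielder.

    Both arguments map a (batted-ball type, location) bucket to a tuple of
    (outs recorded, total chances). For each bucket, the fielder's actual
    outs are compared to what the league-average conversion rate would
    have produced on the same chances, and the surplus is scaled to runs.
    """
    league_rate = {
        bucket: outs / chances
        for bucket, (outs, chances) in league_plays.items()
    }
    runs = 0.0
    for bucket, (outs, chances) in player_plays.items():
        expected_outs = league_rate[bucket] * chances
        runs += (outs - expected_outs) * runs_per_play
    return runs

# Hypothetical example: the league turns 60 percent of ground balls to deep
# short into outs; our fielder converts 70 of 100 such chances.
league = {("ground ball", "deep short"): (300, 500)}
player = {("ground ball", "deep short"): (70, 100)}
```

On those invented numbers, the fielder made 10 more plays than expected, which the sketch values at 5.5 runs above average. The real system layers park factors, arm ratings and catcher data on top of this core comparison.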
With all defensive metrics, there comes a question of reliability. At what point can these numbers be trusted? "If I was trying to get [a player's] value, I would look at a three-year comparison," says Zimmerman. Carleton agreed, adding that infield data was more reliable than outfield data. "Any measurement of outfield defense should be taken with a big chunk of salt," according to Carleton. Sean Forman of Baseball-Reference.com went a different route, saying that he feels comfortable using a season worth of DRS in order to get an accurate look at a player's value in that season.
Forman admits he's more comfortable than most when it comes to using a one-year sample, though he does add that in situations where he wants a player's true talent level, and not just what happened last season, he would rely on multiple seasons of data. Forman specifically cited Simmons, saying he wouldn't want to compare Simmons to Ozzie Smith until he had multiple seasons of data. Forman's point brings up another potential issue with defensive metrics. What should be done when there isn't a large enough sample, or when UZR and DRS disagree about a player?
In those situations, utilizing scouting reports is helpful. Despite the fact that defensive metrics exist, scouting remains an integral part of player evaluation. There are some instances where metrics will produce questionable results, and scouting reports can either confirm those figures, or cause analysts to delve deeper into why a player with a good scouting report rates poorly according to the metrics.
Since the scouting reports are compiled prior to the metrics, they can provide an honest look at a player's defensive capabilities. Baseball Prospectus' prospect maven Jason Parks says he doesn't use defensive metrics when compiling a scouting report. "To be honest, I never really look at [defensive metrics]," says Parks. He explains, "sometimes [the metrics are] telling me things that my eye disagrees with." Parks is far from an anti-numbers guy, but doesn't want his grades to be influenced by data that tells him what he should be seeing. If he's seen a player, he's going to trust his own evaluations.
One of the biggest differences between scouting and the metrics is the sample at which they start to become relevant. Where the stats crowd says three-to-four years of data is a good sample, the scouts may submit evaluations after just three-to-four games. "I will not put a grade on a player in a small sample," says Parks. "I want to see a full series." Typically, a series will include three to four games and a few player workouts. Parks admits that context is important when scouting. Over the course of a few games, you have to be fortunate enough to see a player get the opportunity to make plays in the field. He adds that a report can be skewed dramatically based on how the player looks over a short series. On top of that, scouts rarely scout just one player at a game, so they are devoting their focus to multiple players at the same time in some instances.
Again, while there are some flaws in scouting, it's still an incredible resource, especially when paired with the metrics. "I think the best organizations are the ones that have a really strong scouting side and a really strong analytical side," adds Parks.
The problem with scouting reports is that, outside of a select few baseball insiders, very few have access to this information. Prospect analysts, like Parks, can relay what other scouts have told them, but most of the actual reports will never be seen by the public. In order to combat that, Tom Tango, a respected analyst who works under a pseudonym, created the Fans Scouting Report (FSR).
Tango said the idea behind the FSR stemmed from Bill James crowdsourcing ratings for players back in the 1980s. Fans would mail James letters detailing their own ratings of players. Tango wanted to revive the idea. "With the internet, the reach was far higher, and the feedback is instantaneous," he explains. Fans can grade players based on their instincts, speed, first step and arm strength, for example. The final calculation of all those figures results in an overall rating of that player's defense. Tango adds, "of all the things I've been involved in, I think this is the one I'm most proud of." The FSR can be found at FanGraphs and on Tango's personal site, Tangotiger.net.
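Combining fan ballots into one rating can be sketched as a simple two-stage average: average each component grade across all ballots, then average the component means into a single score. This is an illustration of the crowdsourcing idea, not Tango's actual formula; the category names and equal weighting are assumptions.

```python
# Illustrative sketch of aggregating fan-submitted component grades into an
# overall defensive rating, in the spirit of the Fans Scouting Report.
# Categories and equal weighting are assumptions, not Tango's formula.

CATEGORIES = ["instincts", "first_step", "speed", "hands", "arm_strength"]

def overall_rating(ballots):
    """Combine fan ballots (each a dict of category -> 0-100 grade).

    Averaging within each category first, then across categories, keeps
    one fan's extreme grade in a single category from dominating.
    """
    category_means = [
        sum(ballot[cat] for ballot in ballots) / len(ballots)
        for cat in CATEGORIES
    ]
    return sum(category_means) / len(category_means)
```

For example, two hypothetical ballots grading a player 80 and 60 across the board would combine to an overall rating of 70. The real FSR weights the components and adjusts for how many ballots each player receives.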
As with every other method of player evaluation, there are flaws in the FSR. The data is coming from fans, not experts, which can lead to some reliability issues. While some analysts choose not to look at the FSR, Zimmerman said that he has turned to it when he didn't know much about a player. Tango believes there is value in the project, saying the FSR gives a "completely different perspective" than the metrics. He adds that all metrics have uncertainty, so it's up to each person to "weight them as best you can."
There's some hope that a better defensive metric is on the horizon. FIELDf/x is a service created by SportVision that attempts to "digitally record the position of all players and hit balls in real time." SportVision is the same company that developed PITCHf/x, which has revolutionized how analysts evaluate pitchers. FIELDf/x promises to provide things the metrics do not, including fielder speed and path to the ball, and the exact positioning of a fielder before, during and after each play. Forman referred to FIELDf/x as a "game-changer."
The problem is, FIELDf/x data has not been made public, and there's a question of whether it will ever be released. "FIELDf/x is this great tease," says Carleton. "We know it exists, we know what it does, but we don't have it." There are a number of reasons why the data hasn't been released. It's SportVision's project, and the company has every right to protect its product. There are also fears that the amount of data recorded by FIELDf/x is massive, and would be incredibly difficult to sort through.
Forman remains somewhat hopeful that SportVision and Major League Baseball will release some of the data. He says that, if anything, they could choose to release data from past seasons, which would allow analysts to get a better idea of the accuracy of the current defensive metrics. If past data was released, the current metrics could be tweaked in order to provide more accurate results. While FIELDf/x would drastically change the way the metrics are looked at, Parks said it would not impact how he scouts players.
The metrics may not be perfect, but that doesn't mean they aren't useful. "Conceptually, we're way ahead in our understanding of defense than we were 10 years ago," explains Carleton. The current stats are still in their primitive stages, and will be tweaked as analysts are able to further their understanding of defense. While FIELDf/x could be the push the metrics need to take the next step, the current metrics are already viewed as an upgrade over the traditional stats. As Carleton concludes, "The reliability [of the metrics] isn't quite where we would want it, but it's a hell of a lot better than fielding percentage."
* * *
Chris Cwik writes for various baseball sites on the internet, including CBSSports.com and FanGraphs.com. He has also contributed to ESPN and the Hardball Times Baseball Annual. Follow him on Twitter at @Chris_Cwik.