Updates to TotalZone using MLB Gameday Hit Locations

TotalZone has been around for a little over three years now. I invented the stat in an attempt to make sense of player defense using the information available in the Retrosheet files. The stat has become more popular than I ever expected: it can now be found on sites like Baseball-Reference.com and FanGraphs, as well as right here on my site. In general, I think it identifies good and bad fielders more often than not, but because of the shortcuts it takes in the absence of precise data, it falls short of stats like UZR and Plus/Minus.

Two shortcomings of TotalZone are its treatment of infield singles and its assignment of hits between fielders.

On infield singles, I had treated an infield single as a missed opportunity for the infielder. On some levels, this makes sense. An infield single to the shortstop is obviously a ball he could reach, and one he failed to turn into an out. The problem is that by keeping the ball on the infield, a fielder prevents runners from taking extra bases, and sometimes saves runs that would have scored had the ball gotten past him into the outfield. The problem is compounded by the split credit for balls hit between fielders. The shortstop who knocks a ball down but can't make the play gets charged with one hit. The shortstop who waves at it as it goes into center field gets charged with only 0.55 hits, since all Retrosheet knows is that a ground ball went through the infield, was fielded by the center fielder, and might have been the responsibility of the 2B or the SS. Since TZ doesn't know, it splits the responsibility.

On the macro level, this seems to work OK (at least when the fielders are of similar skill): if you have 100 ground balls to center field, some portion were closer to the 2B and some portion were closer to the SS. By contrast, on virtually all of 100 infield singles to short, the shortstop was the only player who had a chance to make the play. On the micro level, for an individual play, it does not work. When I first put this metric together, I was doing it for my own amusement. Now the stat-loving baseball audience has access to it every day, and that audience might even include some teams and some players. I can't imagine a player letting a ground ball go by because knocking it down would hurt his TotalZone, and I doubt any team pays enough attention to the stat for that to help him more than the earful from his manager would hurt him.

The first thing I can do is reduce how heavily infield singles count against a player's chances. They now count 0.9 times as much as outfield singles, since the run value of an infield single is lower. That is still more than the partial blame a player gets for outfield hits, which brings us to the second, and much more significant, change.

MLB Gameday includes hit-location coordinates for every batted ball in the majors or minors. These are given as X and Y coordinates measured from an origin beyond the left-field corner. With an estimate of where home plate sits, we can turn the Gameday hit coordinates into an angle. On the advice of some of the regulars on The Book Blog, I designated second base as 0 degrees, the 3B line as 45, and the 1B line as -45. Using these angles, I no longer have to guess and split the blame on hits among infielders. I assign a range of angles to each fielder, creating four infield zones; if a ball goes through a zone, it is charged to that infielder. Deciding which angles belong to each fielder is not cut and dried: even once I get to the shortstop's angles, the third baseman still makes some plays there. What I decided to do was start at the 3B line, move toward the 1B line, and hand each angle to the next fielder once he made more plays at that angle than the previous fielder. The entire field is covered; there is no zone in this system where no fielder is responsible for the hits.

I set separate zones depending on whether the defense faced a left-handed or right-handed batter. For right-handed batters, the third baseman starts at 45 (actually, there are a few hits in foul territory, and the third baseman gets those too: a ball that goes past the bag fair and is picked up by the left fielder in foul ground will show an angle greater than 45). The third baseman's responsibility ends at 25. Angles from 24 to -1 belong to the shortstop, and -2 to -27 to the second baseman. The first baseman gets -28 to -45, plus the balls in foul territory on his side. For left-handed batters, the 3B gets 45 to 23, the shortstop 22 to -4, the 2B -5 to -28, and the 1B -29 to the foul line and beyond.
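To make the angle and zone rules concrete, here is a minimal Python sketch. It assumes Gameday's x grows toward the first-base side and y grows toward home plate (origin beyond the left-field corner), and it uses a hypothetical home-plate location of (125, 200); those numbers are placeholders, not the estimate TotalZone actually uses. The zone boundaries are the ones listed above; rounding fractional angles to the nearest whole degree is an assumption about how the boundaries are applied.

```python
import math

# HYPOTHETICAL home-plate location in Gameday hit coordinates. Assumes the
# origin is beyond the left-field corner, x grows toward the first-base
# side and y grows toward home plate. Placeholder values, not a calibration.
HOME_X, HOME_Y = 125.0, 200.0

# Lowest whole-degree angle each fielder owns, scanning from the 3B line
# (+45) toward the 1B line (-45). Anything below the 2B zone goes to the 1B.
ZONES = {
    "R": [(25, "3B"), (-1, "SS"), (-27, "2B")],  # right-handed batter
    "L": [(23, "3B"), (-4, "SS"), (-28, "2B")],  # left-handed batter
}


def hit_angle(x: float, y: float,
              home_x: float = HOME_X, home_y: float = HOME_Y) -> float:
    """Spray angle in degrees: 0 = second base, +45 = 3B line, -45 = 1B line."""
    toward_outfield = home_y - y  # positive for balls hit out from the plate
    toward_first = x - home_x     # positive for balls to the right side
    return math.degrees(math.atan2(-toward_first, toward_outfield))


def responsible_fielder(angle: float, bats: str) -> str:
    """Infielder whose zone contains this angle, for a batter who bats 'R' or 'L'."""
    degree = math.floor(angle + 0.5)  # assumption: round to the nearest degree
    for lowest_angle, fielder in ZONES[bats]:
        if degree >= lowest_angle:
            return fielder  # also covers foul balls beyond +45 for the 3B
    return "1B"  # includes foul balls beyond -45


# A ball up the middle, slightly toward third base, against a righty.
print(responsible_fielder(hit_angle(100.0, 120.0), "R"))  # SS
```

Because each handedness table lists only lower bounds, the whole field is covered with no gaps, matching the rule that every hit belongs to exactly one infielder.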

So the two changes from the way TotalZone was calculated in the past are the assignment of hits allowed (on outfield singles) and the run value for infield singles. I'm using the angles of responsibility to assign hits to each infielder; there is no longer any such thing as a ball with shared responsibility. Against a right-handed batter, if my angle says 25, the ball belongs to the 3B; if 24, it belongs to the shortstop. Plays made, errors, and infield hits allowed are calculated the same as before (though infield singles now count as only 0.9 of a chance). Most plays (93% for shortstops) are made within a player's own zone. I count a player's in-zone and out-of-zone chances equally. This could be controversial; I explain why I do it later on.
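The bookkeeping described above can be sketched as a simple tally. This is a simplified illustration under stated assumptions, not TotalZone's actual code: the result codes and the `BattedBall` record are hypothetical, a "chance" here is any out, error, or hit charged to a fielder, and the conversion of these tallies into runs above average is left out. The zone owner of an outfield hit is assumed to come from the angle zones described earlier.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Infield singles count 0.9 as much as other hits (from the text).
INFIELD_SINGLE_WEIGHT = 0.9


@dataclass
class BattedBall:
    # Hypothetical result codes: "out", "error", "infield_single", "outfield_hit".
    result: str
    # Fielder who recorded the out, made the error, or fielded the ball.
    fielder: str
    # For outfield hits only: infielder whose angle zone the ball passed through.
    zone_owner: Optional[str] = None


def tally(balls):
    """Return (plays made, chances) per fielder under the revised rules."""
    plays = defaultdict(float)
    chances = defaultdict(float)
    for ball in balls:
        if ball.result == "out":
            # Plays count wherever the ball was hit, in zone or out.
            plays[ball.fielder] += 1
            chances[ball.fielder] += 1
        elif ball.result == "error":
            chances[ball.fielder] += 1
        elif ball.result == "infield_single":
            chances[ball.fielder] += INFIELD_SINGLE_WEIGHT
        elif ball.result == "outfield_hit":
            if ball.zone_owner is None:
                raise ValueError("outfield hit needs a zone owner")
            # The whole hit goes to one infielder; no more split blame.
            chances[ball.zone_owner] += 1
        else:
            raise ValueError(f"unknown result code: {ball.result!r}")
    return dict(plays), dict(chances)


# A ground ball through the middle that the center fielder picks up is
# charged entirely to whichever infielder owns that angle (here, the 2B).
sample = [
    BattedBall("out", "SS"),
    BattedBall("out", "SS"),
    BattedBall("infield_single", "SS"),
    BattedBall("outfield_hit", "CF", zone_owner="2B"),
    BattedBall("error", "2B"),
]
plays, chances = tally(sample)
print(plays)  # {'SS': 2.0}
print(chances)
```

The center fielder never appears in the infield chances: Retrosheet tells us who picked the ball up, but the angle decides who is charged.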

Here are the players whose ratings change the most with these new data:

Player Pos New TZ (runs) Old TZ (runs)
Branyan, Russ 1B 5 -1
Butler, Billy 1B -5 -12
Fielder, Prince 1B -13 -4
Howard, Ryan 1B 0 10
LaRoche, Adam 1B -23 -13
Overbay, Lyle 1B 17 12
Barmes, Clint 2B 1 10
Callaspo, Alberto 2B -8 -15
Ellis, Mark 2B -2 5
Hudson, Orlando 2B -2 5
Izturis, Maicer 2B 8 2
Johnson, Kelly 2B -1 4
Lopez, Felipe 2B 0 7
Lopez, Jose 2B -10 -1
Roberts, Brian 2B -6 -14
Uggla, Dan 2B -1 -11
Utley, Chase 2B 12 7
Valbuena, Luis 2B -12 -7
Jones, Chipper 3B -13 -8
Wright, David 3B -12 -17
Betancourt, Yuni SS -23 -11
Cabrera, Orlando SS -17 -23
Escobar, Yunel SS 28 23
Izturis, Cesar SS 16 10
Scutaro, Marco SS 8 15
Tejada, Miguel SS -16 -22
Tulowitzki, Troy SS 7 16

Under the old system, Chase Utley wasn't getting enough credit for his range; instead, his ability to stop balls from going into right field was making Ryan Howard look good. As for Prince Fielder and Adam LaRoche, old TZ didn't reflect just how bad they were. Mark Ellis goes from a slight positive to a slight negative; maybe age and injuries were getting to him. Brian Roberts wasn't as bad as TZ made him look; he just has the misfortune of being an Oriole. Yunel Escobar already ranked as having a great TZ season, and the refined numbers make him look even better. Old TZ did not appreciate the horror of Yuni Betancourt's fielding: he goes from -11 to -23. Miguel Tejada and Orlando Cabrera weren't quite as bad as the old numbers had them.

There is a potential problem with treating out-of-zone and in-zone plays the same: on an out-of-zone play, a fielder may not get enough credit. While he gets some credit for making the play, the hit he saved his team would not have been charged to him; it would have been charged to a teammate. This criticism goes back to the introduction of the STATS Zone Rating in the early 1990s. Let's use an example to illustrate with zone rating. The rules are: if a ball is in your zone, it's an opportunity, and if you record an out, it's a play made. If a ball is outside your zone, it can't hurt you, but if you make a play on it, it counts as both a play made and an opportunity. Say Joe Fielder has made 75 plays in 100 chances, for a .750 zone rating. On the next ball, he makes a great play out of zone, but after that he misplays an easy one for an error. His zone rating is now 76/102, or .745. Suppose instead that Joe had made the easy play, but dived and missed on the tough one. Since it was out of zone, it didn't count as an opportunity, so he's now at 76/101 for a .752 ZR. He rates higher despite having exactly as much value to the team as the Joe who made the rangier play. The alternative is to keep out-of-zone plays separate. Then the first Joe has a 75/101 zone rating with +1 out-of-zone play, the second has 76/101 with +0, and their run values come out the same.
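The Joe Fielder arithmetic can be checked with a short Python sketch comparing the two bookkeeping schemes. The 0.750 baseline rate used for the separate-OOZ version is an assumption for illustration; because both versions of Joe have the same 101 in-zone chances, any baseline gives them the same total.

```python
def old_zone_rating(in_zone_plays, in_zone_chances, ooz_plays):
    """STATS-style ZR: an out-of-zone play counts as a play AND a chance."""
    return (in_zone_plays + ooz_plays) / (in_zone_chances + ooz_plays)


def plays_above_baseline(in_zone_plays, in_zone_chances, ooz_plays, rate=0.750):
    """Separate-OOZ bookkeeping: in-zone plays vs. a baseline, plus OOZ plays."""
    return in_zone_plays - rate * in_zone_chances + ooz_plays


# Joe starts at 75/100 in zone.
# Version 1: makes a great out-of-zone play, then errs on an easy in-zone ball.
gem_then_error = dict(in_zone_plays=75, in_zone_chances=101, ooz_plays=1)
# Version 2: makes the easy play, dives and misses the out-of-zone ball,
# which is not counted as a chance at all.
easy_then_miss = dict(in_zone_plays=76, in_zone_chances=101, ooz_plays=0)

print(f"{old_zone_rating(**gem_then_error):.3f}")  # 0.745
print(f"{old_zone_rating(**easy_then_miss):.3f}")  # 0.752
print(plays_above_baseline(**gem_then_error))      # 0.25
print(plays_above_baseline(**easy_then_miss))      # 0.25
```

Folding the out-of-zone play into both numerator and denominator is what penalizes the rangier Joe; keeping it separate restores the tie.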

When John Dewan reinvented zone rating for Baseball Info Solutions, he gave us exactly this information. Unfortunately, this leads to other problems: now we run the risk of overvaluing players who have a lot of out-of-zone plays. Say a player has 75 plays made in 100 chances, plus another 20 out-of-zone plays, and say this player is exactly average. Now what happens if the scorers made some errors and recorded 20 of those in-zone plays as out-of-zone plays? In that case, he's 55/80 in zone, or 5 plays below average (an average fielder would convert .750 × 80 = 60). But he now has 20 more out-of-zone plays, a net boost of 15 plays above average. Before we can give a player credit for out-of-zone plays, we need some estimate of how many near-zone chances he had, and we need to be sure those plays were scored correctly.
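Here is the same miscoding scenario worked through in Python. The .750 rate and the baseline of 20 out-of-zone plays come from the example's "exactly average" player; treating that out-of-zone baseline as fixed is precisely the assumption that breaks down when scorers move plays across the zone boundary.

```python
# Baselines for the example's "exactly average" player (illustrative only).
AVG_IN_ZONE_RATE = 0.750
AVG_OOZ_PLAYS = 20


def plays_above_average(in_zone_plays, in_zone_chances, ooz_plays):
    """Split plays above average into in-zone and out-of-zone parts."""
    in_zone = in_zone_plays - AVG_IN_ZONE_RATE * in_zone_chances
    out_of_zone = ooz_plays - AVG_OOZ_PLAYS
    return in_zone, out_of_zone, in_zone + out_of_zone


# Correctly scored: 75/100 in zone plus 20 out-of-zone plays.
print(plays_above_average(75, 100, 20))  # (0.0, 0, 0.0)
# Same fielder, but 20 in-zone plays were recorded as out-of-zone plays.
print(plays_above_average(55, 80, 40))   # (-5.0, 20, 15.0)
```

Nothing about the fielder changed between the two calls; only the scoring did, yet he gains 15 plays.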

The intent of TotalZone is to use a zone large enough to encompass all of a player's chances. Yes, players still make some plays in other players' zones, but more often than not there seems to be a reasonable explanation, and these plays do not all represent great plays that deserve extra credit for saving hits from their teammates.

Yunel Escobar, the top-rated shortstop last season, made 48 plays out of his zone (only Miguel Tejada had more). Using MLB.com's game archives, I watched almost all of these plays. On 13 of them, he was on the second-base side because the defense was using the shift against a left-handed pull hitter. These were ordinary plays, not evidence of great range preventing a hit that the 2B should have had. He probably had more chances on the shift than most shortstops, playing in the same division as Ryan Howard and Adam Dunn; those two hit 11 of the 13 shift balls. In some cases the hit location is clearly miscoded: the coding puts the ball in the 3B zone, but the shortstop actually moves slightly to his left to field it, or the coding puts it on the 2B side (having second base as a marker makes the zone boundaries much easier to judge), but the shortstop clearly fields it on his side. There were 6 such miscoded plays, plus 21 more where the ball appears to have been in the shortstop's zone (though I can't be certain), including some routine grounders. There were 2 plays where I couldn't load the game or find the inning in question. Only 6 plays, in my judgment, were outstanding plays where Yunel ranged into another fielder's zone.
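As a quick arithmetic check on the film review, the categories above account for all 48 out-of-zone plays, and the genuinely outstanding ones are only about 13% of the 46 plays that could be reviewed. The category labels in this sketch are shorthand for the descriptions in the text.

```python
# Escobar's out-of-zone plays, by category, as counted in the film review.
escobar_ooz = {
    "ordinary play on the shift": 13,
    "hit location clearly miscoded": 6,
    "probably in the shortstop's own zone": 21,
    "game or inning unavailable": 2,
    "outstanding play in another fielder's zone": 6,
}

total = sum(escobar_ooz.values())
reviewed = total - escobar_ooz["game or inning unavailable"]
outstanding_share = escobar_ooz["outstanding play in another fielder's zone"] / reviewed

print(total)                       # 48
print(f"{outstanding_share:.0%}")  # 13%
```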

My conclusion is that this process is best: count all plays made, regardless of where the ball is hit, and count hits against a fielder when they pass through his zone. There are still some problems. On shift plays that aren't made, maybe the shortstop rather than the 2B should be charged, depending on where each was set up. And if some outs are clearly coded in the wrong zone, then some hits must be miscoded as well. Those are limitations I'll have to live with, since fixing the data errors would basically mean watching every game for every team.