This is an overview of how defensive range ratings are derived from BIS ZR and OOZ data. This system was initially developed for the 2008 card set. All cited examples are from the 2011 card set (2010 MLB data).
Basic Principles
ZR (zone rating) and OOZ (out of zone plays) are converted into plays +/- average per 162 games (1458 innings). The number of range plays each defensive position makes over 162 games is estimated, with average range (F) set to 50% of this number. The raw range rating is subject to downward adjustment for part-time players.
Breaking Down the Charts
In order to do direct mapping we have to figure out how the range plays are distributed per position. First, we calculate the PA distribution of pull/spray/opposite hitters for both IFR and OFR:
type    IFR     OFR
-----   -----   -----
lp/ro   12.8%   15.3%
lsp     29.4%   28.7%
rsp     26.4%   40.0%
rp/lo   31.4%   16.0%
Then we need to determine how many plays hit the IFR/OFR range. In 2430 MLB games there were 187,352 plays which would be determined by a standard 3-die roll, resulting in roughly 77.1 rolls per game. If a single player were to play every inning in the field (keep in mind an individual player is only on the field 50% of the time) there should be around 6245.1 rolls ( 77.1 / 2 * 162 ). By plugging the average number of range plays on the cards (50.1 IFR, 27.0 OFR) and the pull/spray/opposite distribution into the charts we can determine how the range plays would be distributed on the field:
pos   plays
---   -----
p       9.4
1b     67.7
2b     86.5
ss     98.2
3b     90.1
lf     72.8
cf     99.4
rf     75.7
So, this is the number of plays we have to work with at each position.
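The roll-count arithmetic above can be checked with a few lines of Python. This is only a sketch of the two calculations quoted in the text; the function and constant names are made up for illustration, and the step that maps rolls onto the fielding charts is not reproduced because the charts themselves are not given here.

```python
# Figures quoted in the text (2010 MLB season).
TOTAL_GAMES = 2430
TOTAL_ROLL_PLAYS = 187_352  # plays that a standard 3-die roll would decide
GAMES_PER_SEASON = 162


def rolls_per_game(total_plays: int = TOTAL_ROLL_PLAYS, games: int = TOTAL_GAMES) -> float:
    """Average number of rolls in one game, counting both teams' at-bats."""
    return total_plays / games


def rolls_per_fielder_season(per_game: float, games: int = GAMES_PER_SEASON) -> float:
    """Rolls that occur while one everyday fielder is on the field.

    A fielder is on defense for only half of each game, hence the division by 2.
    """
    return per_game / 2 * games


per_game = round(rolls_per_game(), 1)
print(per_game)                                       # 77.1
print(round(rolls_per_fielder_season(per_game), 1))   # 6245.1
```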
Converting RZR/OOZ to Plays vs Average (+/-)
For RZR this is fairly simple because it is a conversion rate: you assume an average number of balls were hit to the player, apply his actual conversion rate (RZR), and compare the result to the number of plays that would have been made with an average conversion rate. OOZ is more complicated because we don’t have a denominator. In an attempt to weed out distribution bias, I determine for each player how many balls (ground balls for infielders, fly balls for outfielders) were fielded by each position on the field while he was in the game (denoted as f1 for pitcher, f3 for first base, etc.). Then we determine whether those various positions fielded more or fewer balls than average based on the overall MLB distribution.
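As a rough sketch of the RZR half of that conversion: give the player and the average fielder the same number of chances and compare the plays made. The function name and the sample numbers below (an .830 RZR, an .800 positional average, 400 balls in zone) are hypothetical and chosen only to show the arithmetic.

```python
def rzr_plays_vs_average(player_rzr: float, average_rzr: float, avg_balls_in_zone: float) -> float:
    """Plays above (+) or below (-) average on balls hit into the zone.

    Both the player and the average fielder get the same average number of
    chances, so only the conversion rate differs.
    """
    player_plays = player_rzr * avg_balls_in_zone
    average_plays = average_rzr * avg_balls_in_zone
    return player_plays - average_plays


# Hypothetical inputs, not taken from the 2010 data.
print(round(rzr_plays_vs_average(0.830, 0.800, 400), 1))  # 12.0
```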
Using Alexei Ramirez as an example:
groundball distribution (bunts excluded)
p      1b     2b     3b     ss     lf     cf     rf
145    208    406    300    476    117    145    91

vs average
p      1b     2b     3b     ss     lf     cf     rf
-6.2   1.3    8.0    -30.9  32.7   10.1   30.5   1.9
IOW, while Ramirez played ss the White Sox 3b fielded 300 grounders (not all of these were turned into outs; the total includes infield hits and errors), which is almost 31 fewer than you would have expected given the MLB average distribution.
The next step is using the “vs average” numbers to determine a play bias. The assumption I’ve made is that any ball that was fielded by the position or the outfielders was a potential opportunity, whereas any ball fielded by an adjacent infielder by definition could not have been an opportunity (only one infielder can get credit for OOZ). So, the opportunity adjustment calculation for a shortstop is:
f6 + f7 + f8 – f1 – f4 – f5 (we ignore 1b & rf)
so for Ramirez this ends up being:
32.7 + 10.1 + 30.5 – (-6.2) – 8.0 – (-30.9) = 102.3
then we convert this into a ratio:
total balls (f1+f4+f5+f6+f7+f8) / ( total balls + opportunity adjustment )
for Ramirez this ends up being 0.939
The calculations above will be slightly different for the corners, e.g. for 3b we’d only use f1/f5/f6/f7.
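The shortstop calculation can be reproduced from the Ramirez tables. The sketch below uses the rounded values shown in the text, which give an adjustment of 102.4 rather than the 102.3 quoted (presumably the quoted figure came from unrounded inputs); the ratio still comes out to 0.939. The dictionary layout and function names are illustrative only.

```python
# Balls fielded by each position while Ramirez was at shortstop, and the
# difference from the MLB-average distribution (rounded values from the text).
fielded = {"p": 145, "1b": 208, "2b": 406, "3b": 300, "ss": 476, "lf": 117, "cf": 145, "rf": 91}
vs_avg = {"p": -6.2, "1b": 1.3, "2b": 8.0, "3b": -30.9, "ss": 32.7, "lf": 10.1, "cf": 30.5, "rf": 1.9}

# For a shortstop: his own position plus lf and cf count as potential
# opportunities; the adjacent infielders (p, 2b, 3b) count against him.
# 1b and rf are ignored.
SS_PLUS = ("ss", "lf", "cf")
SS_MINUS = ("p", "2b", "3b")


def opportunity_adjustment(vs_avg, plus, minus):
    """Sum of 'vs average' for opportunity positions minus adjacent infielders."""
    return sum(vs_avg[p] for p in plus) - sum(vs_avg[p] for p in minus)


def opportunity_ratio(fielded, vs_avg, plus, minus):
    """total balls / (total balls + opportunity adjustment)."""
    total = sum(fielded[p] for p in plus + minus)
    return total / (total + opportunity_adjustment(vs_avg, plus, minus))


print(round(opportunity_adjustment(vs_avg, SS_PLUS, SS_MINUS), 1))      # 102.4
print(round(opportunity_ratio(fielded, vs_avg, SS_PLUS, SS_MINUS), 3))  # 0.939
```

For a corner position the same functions would be called with different position tuples, e.g. only p, 3b, ss and lf for a third baseman.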
We then multiply the number of OOZ plays by this ratio. Converting OOZ into +/- then follows the previous process, where you compare the number of adjusted OOZ plays vs the average player.
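A minimal sketch of that last step, assuming the comparison is a simple difference between the adjusted OOZ total and an average OOZ total over the same playing time. Only the 0.939 ratio comes from the Ramirez example; the OOZ counts (80 and 70) are hypothetical.

```python
def ooz_plays_vs_average(ooz_plays: float, ratio: float, average_ooz: float) -> float:
    """Adjusted OOZ plays above (+) or below (-) the average player.

    ooz_plays   -- raw out-of-zone plays credited to the player
    ratio       -- total balls / (total balls + opportunity adjustment)
    average_ooz -- OOZ plays an average player would make (assumed input)
    """
    adjusted = ooz_plays * ratio
    return adjusted - average_ooz


# Hypothetical counts; the 0.939 ratio is the one derived for Ramirez.
print(round(ooz_plays_vs_average(80, 0.939, 70), 2))  # 5.12
```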
Range Ratings and PT Adjustments
The raw range rating is the combination of the two +/- values scaled to 162 games (1458 innings), yielding a total net plays above/below average. This is then converted to a raw range rating between 0 (K range) and 10 (A range).
Because defensive plays vary widely in difficulty, it is easier for players who log lower amounts of playing time to do well simply by avoiding the tougher plays. Therefore the raw range rating is subject to playing time adjustments which I have discussed previously. I’ll simply summarize this adjustment by saying that the fewer innings a player logs in the field, the more difficult it is for that player to achieve a high range rating.
New Bonus System
The playing time adjustment has an unfortunate side-effect of punishing utility players who do not log a lot of innings at any specific position, so I’ve tried to come up with bonuses to help these players. Previous years used a rather simplistic system (which is detailed at the end of the post referenced above), but for 2011 I’ve implemented a new bonus system which works much better.
The range bonus is added to the PT adjusted range. It is calculated by applying the following formula to each similar position the player appeared at and summing the results:
innings @ similar position / 1080 * ( unadjusted range @ similar position – PT adjusted range )
The bonus cannot result in a range higher than the unadjusted range (e.g. if unadjusted range is 5, adjusted is 2, and the bonus is 4, the resulting range is still 5). The “similar positions” used for the bonus are:
pos  similar
---  --------
1b   2b,3b,ss
2b   3b,ss
3b   2b,ss
ss   2b,3b
lf   cf,rf
cf   lf,rf
rf   lf,cf
(for ss & cf the range at the similar position is required to be at least 3 grades higher in order to result in a bonus)
Example using Jerry Hairston:
uRg = unadjusted range, aRg = PT adjusted range

pos  inn  uRg  aRg  bonus  final
---  ---  ---  ---  -----  -----
ss   490  8    8    0      8
2b   381  8    7    0.5    8
3b   19   10   2    4.8    7
At 2b, Hairston’s bonus is:
490 inn @ ss / 1080 * ( 8 - 7 ) + 19 inn @ 3b / 1080 * ( 10 - 7 )
( 0.45 ) + ( 0.05 ) = 0.5
resulting in a final range rating of 8 (C range).
At 3b, Hairston’s bonus is:
490 inn @ ss / 1080 * ( 8 - 2 ) + 381 inn @ 2b / 1080 * ( 8 - 2 )
( 2.72 ) + ( 2.12 ) = 4.8
resulting in a final range rating of 7 (D range).
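The Hairston numbers can be reproduced with a short sketch of the bonus formula. Three things here are assumptions rather than stated rules: a similar position with a lower range contributes nothing instead of a negative amount, the "3 grades higher" test for ss and cf is measured against the PT adjusted range, and the final rating is rounded to the nearest grade. The data layout and function names are illustrative.

```python
BONUS_DIVISOR = 1080  # innings divisor from the bonus formula

SIMILAR = {
    "1b": ("2b", "3b", "ss"),
    "2b": ("3b", "ss"),
    "3b": ("2b", "ss"),
    "ss": ("2b", "3b"),
    "lf": ("cf", "rf"),
    "cf": ("lf", "rf"),
    "rf": ("lf", "cf"),
}

# position -> (innings, unadjusted range, PT adjusted range)
hairston = {"ss": (490, 8, 8), "2b": (381, 8, 7), "3b": (19, 10, 2)}


def range_bonus(pos, player):
    """Sum the bonus contributed by each similar position the player appeared at."""
    _, _, adjusted = player[pos]
    bonus = 0.0
    for other in SIMILAR[pos]:
        if other not in player:
            continue
        innings, other_unadjusted, _ = player[other]
        gap = other_unadjusted - adjusted
        # Assumption: for ss and cf the gap is measured against the PT adjusted range.
        if pos in ("ss", "cf") and gap < 3:
            continue
        # Assumption: a negative gap contributes nothing.
        bonus += innings / BONUS_DIVISOR * max(gap, 0)
    return bonus


def final_range(pos, player):
    """PT adjusted range plus bonus, capped at the unadjusted range."""
    _, unadjusted, adjusted = player[pos]
    return min(adjusted + range_bonus(pos, player), unadjusted)


for pos in ("ss", "2b", "3b"):
    # Rounding to the nearest grade is an assumption that matches the table.
    print(pos, round(range_bonus(pos, hairston), 1), round(final_range(pos, hairston)))
```

Run against the table above, this prints a bonus of 0 at ss, 0.5 at 2b and 4.8 at 3b, with final ratings of 8, 8 and 7.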
Love the work Sean, but it made me wonder whether some of the range adjustments should be weighted based on position. In your example above using Alexei Ramirez, the 3b and pitcher probably have a higher chance of stealing an opportunity from Ramirez than a 2b does. You could argue that half of the balls the pitcher fields are missed opportunities for the ss, whereas less than 25% of the 2b chances would be an opportunity for the ss. I’m curious about your take on this. Oh, and thanks for all the awesome work and information that you provide.
Mike
You bring up a good point. The adjustment that we’re talking about is essentially a crude attempt to account for the fact that one guy’s OOZ play could be another guy’s routine RZR play. However, I’m not sure how you come up with accurate weighting values without more detailed location data, and unfortunately we don’t get that from Retrosheet (if we did then we wouldn’t need OOZ/RZR in the first place). I also think there’s so much noise in the data to begin with that it’s always going to be difficult to know whether your change is making things “better” or “worse”. Generally the way I end up tweaking this stuff is when there’s an easily identifiable counter-intuitive outlier case (i.e. some guy who is generally an excellent/poor fielder getting a “wtf” rating) but I think the existing system does a pretty good job of preventing those.
[…] (0.656 * 146 + (0.352 + 0.175) * 139 + 0.550 * 223) / (146 + 139 + 223) = 0.574 Standard rules are then applied for playing time adjustments. Since we are using aggregated data there is no need for a complicated bonus system. […]