In 2005 I implemented a regression algorithm for the LH/RH splits on the cards, replacing the legacy PtP method of basing each side of the card on the raw split data. This is a brief explanation of the rationale and methodology.
The only way the legacy PtP system can generate accurate results is if the “user” ensures that each card receives the proper PA distribution of LH/RH. What years of experience have shown is that our “users” do not do this, nor do they want to do this. The game isn’t as fun when player usage is rigidly enforced. Therefore, the PA distributions are not allocated properly and the overall player performance is skewed. If we strike a balance between replicating the platoon split and replicating the overall performance of the player we will improve the overall accuracy of the system.
Now, on one end of the spectrum we have games which ignore platoon splits or enforce a standard split for all players, and on the other end we have the PtP-inspired system which treats vLH/vRH data separately. I decided that a hybrid system, one which regresses platoon splits toward the norm, retains the original intent of the PtP system which we all enjoy while making the game more accurate in reproducing overall performance. This rationale is not based on ability, but rather on a tacit acknowledgement of how our “users” interact with the product.
Implementation
Once the raw target totals have been calculated for a player card, they are passed into the regression algorithm. The regression algorithm takes the existing LH/RH ratio, and regresses it on a sliding scale toward the standard ratio (LL/LR for LH, RL/RR for RH).
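As a rough illustration of that step, here is a minimal Python sketch. It assumes the regression is a straight linear blend between the player's observed ratio and the standard ratio. The function name regress_ratio and the sample numbers are hypothetical, and the real card generator may blend differently.

```python
def regress_ratio(observed_ratio, standard_ratio, amount):
    """Pull an observed LH/RH ratio toward the standard ratio.

    amount is the regression fraction: 0.0 keeps the observed ratio,
    1.0 replaces it with the standard ratio. A linear blend is an
    assumption made for this sketch, not a confirmed detail.
    """
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0 and 1")
    return observed_ratio + amount * (standard_ratio - observed_ratio)


# Hypothetical player: twice as productive vs LH as vs RH, where the
# standard ratio is an even 1.0, regressed by 50%.
print(regress_ratio(2.0, 1.0, 0.5))  # 1.5
```

The amount passed in depends on the player's PA sample: the smaller the sample, the harder the ratio is pulled toward the standard.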
Regression amount is a simple linear function starting at 75% for 0 PA and scaling down to 0% at 900 PA. This divides neatly, with 300 PA being the 50% point and 600 PA being the 25% point. For each event, we determine which side generated more offense and use that PA sample as the value to plug into the regression function. We do this because the offense has the final say on the pitcher/batter matchup.
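The sliding scale can be written down in a few lines. This Python sketch uses only the numbers given above (75% at 0 PA, 0% at 900 PA); the names are made up for illustration, and clamping to 0% beyond 900 PA is an assumption about what happens past the end of the scale.

```python
MAX_REGRESSION = 0.75   # regression amount at 0 PA
FULL_SAMPLE_PA = 900    # PA at which regression reaches 0%


def regression_amount(pa):
    """Return the fraction of regression to apply for a given PA sample.

    pa should be the PA total of the side that generated more offense
    for the event in question.
    """
    if pa < 0:
        raise ValueError("pa cannot be negative")
    # Linear from 75% down to 0%; assumed to stay at 0% past 900 PA.
    return max(0.0, MAX_REGRESSION * (1 - pa / FULL_SAMPLE_PA))


for pa in (0, 300, 600, 900, 1200):
    print(f"{pa:>4} PA -> {regression_amount(pa):.0%}")
```

Running it prints 75%, 50%, 25%, 0% and 0% for the five sample sizes, matching the checkpoints described above.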
When adjusting the events to match the new target ratio you have to use an “exchange rate” based on the PA split. So, if you’re moving events from the LH side to the RH side you have to do so at the ratio of RH/LH PA.
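Here is one way to read the exchange rate, sketched in Python. It assumes each side of the card is stored as a per-PA rate, so adding to the RH side has to be paid for on the LH side at RH PA / LH PA in order to leave the PA-weighted overall rate unchanged. The function names and sample numbers are hypothetical and are not taken from the actual card generator.

```python
def overall_rate(lh_rate, rh_rate, lh_pa, rh_pa):
    """PA-weighted overall rate across both sides of the card."""
    return (lh_rate * lh_pa + rh_rate * rh_pa) / (lh_pa + rh_pa)


def shift_to_rh(lh_rate, rh_rate, lh_pa, rh_pa, rh_delta):
    """Move rh_delta of per-PA rate onto the RH side, paid for by the LH side.

    The LH side gives up rh_delta * (rh_pa / lh_pa), the assumed
    "exchange rate", so the overall rate is preserved.
    """
    if lh_pa <= 0 or rh_pa <= 0:
        raise ValueError("both sides need a positive PA total")
    exchange_rate = rh_pa / lh_pa
    return lh_rate - rh_delta * exchange_rate, rh_rate + rh_delta


# Hypothetical HR rates: .100 in 100 PA vs LH, .020 in 300 PA vs RH.
before = overall_rate(0.10, 0.02, 100, 300)
new_lh, new_rh = shift_to_rh(0.10, 0.02, 100, 300, rh_delta=0.01)
after = overall_rate(new_lh, new_rh, 100, 300)
print(round(new_lh, 3), round(new_rh, 3))  # 0.07 0.03
print(round(before, 3), round(after, 3))   # 0.04 0.04
```

In this example the RH side has three times the PA, so one point of rate added there costs three points on the LH side, and the overall rate stays at .040.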
Out of curiosity, did this change also have the effect of smoothing out zoids with extreme LH/RH splits? I’m thinking, by way of example, of a Jeff Reed card from back in the mid-90’s — the one where most of his card against LH was a HR result due to 3 or 4 HRs in only 11 PAs against LH.
Oh — and hi — nice to see everything still ticking along after all these years.
Yes, that’s exactly what the split regression system is designed to do. It preserves a reasonable individualized platoon split for each card, but makes it much more likely that a player will approach his overall stat line.
Good to hear from you after all these years, Richard!
With the split regression formula, I’m guessing that for hitters with adequate sample data (say, 90+ plate appearances vs. RHP or LHP overall), the adjustments to the split numbers on their cards will not be so great, correct? If they have a lot of at bats, the split numbers should already accurately reflect actual performance. I’m thinking about an old Rob Deer PTP card from ’92. His HR numbers were HUGE against LHP and he was given something like a 93 AB limitation against LHP, but his tendency toward injury (frequent) and a lack of better options on the Detroit bench seemed to keep his platoon numbers from getting too skewed. Of course, I managed him like his actual managers tended to…sitting him down against really tough RHP with reps for big strikeouts and very few fly balls (he never had close to 500 AB in a season in his career, even when healthy).
The more playing time on the side that generates offense, the less the splits are regressed toward the mean. The IBL used to have rules governing the use of players to whom PtP would give PA limitations, but we just voted to remove those restrictions during the regular season because of split regression.
If you want to see a simple model of split regression in action you can try using the web tool I created.
Hi,
Is there a way to see the split regression formula, so I can plug it into an Excel spreadsheet? By the way, this is a great idea; it eliminates the issue with card creation and those cards with crazy numbers for Snell PA’s.
The actual formulas used to generate the cards aren’t going to work in Excel because splits are regressed to the MLB averages, which change every season. The formula used in the web tool is simplified to regress to an even split, which is close enough when you’re trying to get a general idea. If you contact me via email I can share the parameters of the simplified scheme.