BaseRuns - Addendum

Some data and formulas used in my BaseRuns article

By Tangotiger

1974-1990, all data generated from Retrosheet and Ray Kerby

Events           Official Stat  lwts_RC    lwts     Freq      A       B      C  D
Single           x                0.463   0.460   436518      1   0.726      0  0
Double           x                0.753   0.750   103215      1   1.948      0  0
Triple           x                1.035   1.033    15962      1   3.134      0  0
HR               x                1.402   1.402    53874      0   1.694      0  1
Walk             x                0.303   0.303   202435      1   0.052      0  0
IBB              x                0.176   0.176    22260      1  -0.483      0  0
HBP              x                0.330   0.330    12652      1   0.163      0  0
Error            -                0.481   0.478    31270      1   0.799      0  0
Interference     -                0.357   0.357      369      1   0.277      0  0
OtherSafe        -                0.631   0.631     2947      1   1.433      0  0
Sac              x                0.106  -0.090    28251  0.080   0.727  0.920  0
Strikeout        x               -0.111  -0.269   361333      0  -0.057      1  0
Out              -               -0.098  -0.265  1367321      0  -0.004      1  0
SB               x                0.193   0.193    48729      0   0.813      0  0
CS               x               -0.282  -0.437    20543      0  -1.188      0  0
Pickoff          -               -0.120  -0.228     9654      0  -0.507      0  0
PickoffError     -               -0.082  -0.182       30      0  -0.347      0  0
Balk             -                0.250   0.250     4980      0   1.054      0  0
PB               -                0.276   0.276     5021      0   1.165      0  0
WP               -                0.279   0.278    18371      0   1.174      0  0
DefensiveIndiff  -                0.132   0.132      184      0   0.555      0  0
OtherAdvance     -               -0.252  -0.362      633      0  -1.061      0  0
FoulError        -                    0       0     1056      0   0.001      0  0
implied outs     -               -0.451       0     3461      0  -1.488      1  0

A couple of quick points as to why I did certain things. I weighed the merits of not breaking up "official stats" like strikeouts, so that we can apply these measures historically. Some strikeouts are also safe plays, but that's rare enough that we can ignore it. The sac bunt has safe plays as well, though, and there I could not ignore them. This is why you see that annoying ".08" in the baserunner ("A") field for the sac, with the other ".92" in the out ("C") field.

I also have a problem with partial innings, and with the case where you have runners left on base. So, I introduced "implied outs", which is the number of outs "left in the game", to balance things out.

The "lwts" values and the "lwts_rc" values are identical except for the outs. However, you can get outs with a single (a runner thrown out on the bases, for example), which is why the two numbers are not exactly the same for the single. Using the Retrosheet scoring system of events, anything that happens following a single is credited to the single. Even a double-steal counts as "1" in the SB field.

Anyway, there are a lot of technical things that don't amount to a hill of beans, but I had to make certain assumptions and decisions so that everything added up. I'm not sure that all the decisions are correct, and I suspect that I may change some of this in the future.

"Freq" is how often that event occurred. If you multiply the "freq" by the "lwts_rc" for each event and add up the results, you get the total number of runs scored. If you do the same with "freq" and "lwts", you get zero (or pretty close to it).

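As a rough check of those two sums, here is a minimal Python sketch. The rows are transcribed from the table above, and the names ROWS and weighted_total are just illustrative.

```python
# Each row: (event, lwts_rc, lwts, freq), transcribed from the table above.
ROWS = [
    ("Single", 0.463, 0.460, 436518),
    ("Double", 0.753, 0.750, 103215),
    ("Triple", 1.035, 1.033, 15962),
    ("HR", 1.402, 1.402, 53874),
    ("Walk", 0.303, 0.303, 202435),
    ("IBB", 0.176, 0.176, 22260),
    ("HBP", 0.330, 0.330, 12652),
    ("Error", 0.481, 0.478, 31270),
    ("Interference", 0.357, 0.357, 369),
    ("OtherSafe", 0.631, 0.631, 2947),
    ("Sac", 0.106, -0.090, 28251),
    ("Strikeout", -0.111, -0.269, 361333),
    ("Out", -0.098, -0.265, 1367321),
    ("SB", 0.193, 0.193, 48729),
    ("CS", -0.282, -0.437, 20543),
    ("Pickoff", -0.120, -0.228, 9654),
    ("PickoffError", -0.082, -0.182, 30),
    ("Balk", 0.250, 0.250, 4980),
    ("PB", 0.276, 0.276, 5021),
    ("WP", 0.279, 0.278, 18371),
    ("DefensiveIndiff", 0.132, 0.132, 184),
    ("OtherAdvance", -0.252, -0.362, 633),
    ("FoulError", 0.0, 0.0, 1056),
    ("implied outs", -0.451, 0.0, 3461),
]


def weighted_total(rows, weight_index):
    """Sum freq * weight over all events (weight_index 1 = lwts_rc, 2 = lwts)."""
    return sum(row[weight_index] * row[3] for row in rows)


runs_total = weighted_total(ROWS, 1)  # should approximate total runs scored
net_total = weighted_total(ROWS, 2)   # should be close to zero

print(f"freq x lwts_rc: {runs_total:,.0f}")
print(f"freq x lwts:    {net_total:,.0f}")
```

If the transcription is right, the first figure should come out a little under 300,000 runs, and the second should be tiny by comparison. Since the weights are rounded to three decimals, an exact zero is not expected.
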
The BaseRuns formula follows the true definition of scoring runs: BaseRunners x scoreRate + HR. BaseRunners is denoted by "A", scoreRate is "B" / ("B" + "C"), and HR is "D". Each of A, B, C and D is the total, over all events, of "freq" times the value in that column of the table. You can make the case that the CS removes a baserunner, except that sometimes the runner on a CS is still safe. And you can also take it from the point of view of "initial baserunners". Again, I'm not 100% sure that these decisions are the best ones.
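
To make the formula concrete, here is a minimal Python sketch that totals the A, B, C and D columns over the 1974-1990 frequencies and plugs them in. It assumes each component is the sum of "freq" times the event's value in that column; EVENTS, component_totals and base_runs are just illustrative names.

```python
# Each row: (event, freq, A, B, C, D), transcribed from the table above.
EVENTS = [
    ("Single", 436518, 1, 0.726, 0, 0),
    ("Double", 103215, 1, 1.948, 0, 0),
    ("Triple", 15962, 1, 3.134, 0, 0),
    ("HR", 53874, 0, 1.694, 0, 1),
    ("Walk", 202435, 1, 0.052, 0, 0),
    ("IBB", 22260, 1, -0.483, 0, 0),
    ("HBP", 12652, 1, 0.163, 0, 0),
    ("Error", 31270, 1, 0.799, 0, 0),
    ("Interference", 369, 1, 0.277, 0, 0),
    ("OtherSafe", 2947, 1, 1.433, 0, 0),
    ("Sac", 28251, 0.080, 0.727, 0.920, 0),
    ("Strikeout", 361333, 0, -0.057, 1, 0),
    ("Out", 1367321, 0, -0.004, 1, 0),
    ("SB", 48729, 0, 0.813, 0, 0),
    ("CS", 20543, 0, -1.188, 0, 0),
    ("Pickoff", 9654, 0, -0.507, 0, 0),
    ("PickoffError", 30, 0, -0.347, 0, 0),
    ("Balk", 4980, 0, 1.054, 0, 0),
    ("PB", 5021, 0, 1.165, 0, 0),
    ("WP", 18371, 0, 1.174, 0, 0),
    ("DefensiveIndiff", 184, 0, 0.555, 0, 0),
    ("OtherAdvance", 633, 0, -1.061, 0, 0),
    ("FoulError", 1056, 0, 0.001, 0, 0),
    ("implied outs", 3461, 0, -1.488, 1, 0),
]


def component_totals(events):
    """Return (A, B, C, D), each the sum of freq * coefficient over all events."""
    totals = [0.0, 0.0, 0.0, 0.0]
    for _, freq, *coefs in events:
        for i, coef in enumerate(coefs):
            totals[i] += freq * coef
    return tuple(totals)


def base_runs(a, b, c, d):
    """BaseRunners x scoreRate + HR, with scoreRate = B / (B + C)."""
    return a * b / (b + c) + d


a, b, c, d = component_totals(EVENTS)
print(f"A={a:,.0f}  B={b:,.0f}  C={c:,.0f}  D={d:,.0f}")
print(f"BaseRuns estimate: {base_runs(a, b, c, d):,.0f}")
```

For the full 1974-1990 sample, the estimate should land within a fraction of a percent of the "freq" x "lwts_rc" total.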