Tango on Baseball Archives

© Tangotiger


Winter Leagues, Redux (February 5, 2004)

"Chance" is the cumulative binomial probability distribution for Nsame and N, assuming a 50/50 chance for Nsame; the closer chance is to 1, the more likely this is a real effect, and not just random dumb luck.
...
For many of the players on these lists, their winter league performance really indicates nothing more than a return to form following an unusual regular season.

This second sentence is really the key here. I wouldn't be surprised if all of this was simply regression towards the mean. I think conclusions like "the closer chance is to 1, the more likely this is a real effect, and not just random dumb luck" do not fit this sample properly. The guys who improved the most in winter ball were probably guys who had an abnormally low regular-season EqA, and they would have played better the following year regardless.
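
For reference, here is a rough sketch of how a "Chance" figure like the one quoted could be computed. It assumes "cumulative binomial probability" means P(X <= Nsame) for X ~ Binomial(N, 0.5); the article does not spell out the exact formula, and the function name chance is mine.

```python
from math import comb

def chance(n_same: int, n: int, p: float = 0.5) -> float:
    """Cumulative binomial probability P(X <= n_same) for X ~ Binomial(n, p).

    Assumption: this is what the quoted "Chance" column means; the source
    only says "cumulative binomial probability distribution".
    """
    if not 0 <= n_same <= n:
        raise ValueError("n_same must be between 0 and n")
    # Sum the binomial probabilities for 0, 1, ..., n_same successes.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n_same + 1))

# Hypothetical example: 15 of 20 players moved in the same direction.
print(chance(15, 20))
```

With those made-up counts the result is about 0.994. Note what the number does and does not say: it tells you the agreement is more frequent than a coin flip would give, not why it is more frequent.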

What I would have done is look only at players with an EqA of .220 to .240, and THEN look at how they did in winter ball to see what happened the following year. I wouldn't be surprised if the effect were rather muted, and nowhere near the effect that is being shown.
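
A minimal sketch of that comparison, assuming you have each player's 2003 EqA, a winter figure translated to the same scale, and his 2004 EqA. The tuple layout, the function name banded_comparison, and the sample numbers are all hypothetical, made up only to show the call.

```python
from statistics import mean

def banded_comparison(players, lo=0.220, hi=0.240):
    """players: iterable of (eqa_2003, eqa_winter, eqa_2004) tuples.

    Keeps only players whose 2003 EqA falls in [lo, hi], then returns
    (mean 2004 EqA of winter improvers, mean 2004 EqA of winter decliners).
    A group with no players yields None.
    """
    band = [p for p in players if lo <= p[0] <= hi]
    improvers = [p[2] for p in band if p[1] > p[0]]
    decliners = [p[2] for p in band if p[1] < p[0]]
    return (mean(improvers) if improvers else None,
            mean(decliners) if decliners else None)

# Made-up numbers, for illustration only (not real players).
sample = [
    (0.230, 0.260, 0.250),
    (0.225, 0.200, 0.235),
    (0.300, 0.320, 0.310),  # outside the band, ignored
    (0.235, 0.250, 0.240),
]
print(banded_comparison(sample))
```

Because everyone in the band starts from roughly the same 2003 EqA, the gap between the two group means is a fairer read on what winter ball adds than a comparison across all players.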


--posted by TangoTiger at 04:54 PM EDT


Posted 7:05 p.m., February 5, 2004 (#1) - MGL
  I have to agree with Tango 100%. This is, I think, a horrible and misleading study. It is a classic case of how regression and selective sampling can give you completely erroneous and misleading results.

Selective sampling in baseball, as the name implies, occurs when you sample a group of players who were not randomly, or at least semi-randomly, chosen. You get in real trouble when the group you select non-randomly is biased toward having been very lucky or very unlucky, which can happen all the time in these types of studies.

Clearly, the players who improved from 2003 to the Winter League are, as a group, players who were either unlucky in 2003, lucky in the Winter League, or some combination of the two. Vice versa for players whose performance declined.
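
To put a rough number on this, here is a small simulation sketch. Every player has a fixed true talent, and 2003, winter ball, and 2004 are each that talent plus independent luck. The talent and noise values are hypothetical, and giving the winter season the same noise as a full season is a simplifying assumption.

```python
import random

def simulate_sign_agreement(n_players=20000, talent_sd=0.020,
                            noise_sd=0.025, seed=0):
    """Fraction of players whose winter change (winter vs. 2003) has the
    same sign as their next-year change (2004 vs. 2003), when all three
    seasons are just fixed talent plus independent luck."""
    rng = random.Random(seed)
    same = 0
    for _ in range(n_players):
        talent = rng.gauss(0.260, talent_sd)        # hypothetical true EqA
        eqa_2003 = talent + rng.gauss(0, noise_sd)
        eqa_winter = talent + rng.gauss(0, noise_sd)
        eqa_2004 = talent + rng.gauss(0, noise_sd)
        winter_up = eqa_winter > eqa_2003
        next_up = eqa_2004 > eqa_2003
        same += (winter_up == next_up)
    return same / n_players

print(simulate_sign_agreement())
```

Under these assumptions about two-thirds of players move the same direction in winter ball and in the following season (the exact value for equal noise is 1/2 + arcsin(1/2)/pi = 2/3), even though nobody's talent changed at all. Both changes are measured against the same 2003 number, so bad luck in 2003 pushes both of them up. A noisier winter season would pull the fraction down toward 1/2, but it stays above it.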

Of course, the Winter League stats give you more information to make a projection for 2004, but if you want to figure out what kind of weight to give it or if you should have a different weighting system for banner Winter League seasons (which is possible, as I found out in my banner year BB study), you HAVE TO adjust the data for the effect of the selective sampling and regression.

I cringed when I read this study, because it was obvious what was coming, and I cringed again when Clay only casually mentioned the regression issue, as if it was a potential and minor problem. It is a huge problem which invalidates the results and conclusions of the entire study! Unless I'm missing something, I am shocked that someone as good as Clay would overlook (or at least severely minimize the importance of) such an obvious thing...

Posted 5:35 a.m., February 6, 2004 (#2) - Pete Rose
  What's the over/under on the number of times MGL has cringed in the last month?

Posted 5:36 a.m., February 6, 2004 (#3) - Pete Rose
  I'll take the over.