Tango on Baseball Archives

© Tangotiger

Archive List

Banner Years
by Tangotiger

Suppose you have two players: one who had a banner year followed by three average years, and another who had three average years followed by a banner year. Which player will be better in the fifth year?

The question essentially deals with how much weight to give each of the seasonal performances, and whether recent performance is a better indicator than older performance. So, what do you think?

True Talent Level

Before we go on, we should understand what a player's seasonal performance represents.

Every player has a true talent level, which is fairly static but might change from day to day because of changes in training, approach, or diet. Layered on top of this is the environment in which a player performs: the pitcher, the fielders, his teammates, the park, the weather, the base-out situation, the inning-score, etc, etc. And to top it off, the player only has 4 chances each day to perform (small sample size). So, a player's seasonal performance represents a combination of his talent, his environment, and luck. The more he can play, and the more his environment matches what the average player sees, the more representative his performance will be.

Unfortunately, one season's performance does not balance everything out. Rather than represent his true talent level, it is only an indicator of his true talent level. Of course, if we can increase the sample size from 500 AB to 2000 AB, things look a lot better. To do that, we need to span seasons, which brings along another implication: change in talent level from year to year. While talent level is fairly static day to day, as we expand this across many years, we start to lose confidence as to what a player's true talent level is.

Ok, now that I got that out of the way: if we look at a group of players, many of these problems are reduced, so that the aggregate performance level is fairly representative of the group's true talent level. Let's go back to our two players. Have you decided which player would be better, if either? Have you decided how much to weight each of the banner years, if at all?

Study 1 - The Five Year Cycle

I looked at all players from 1919-2001 who over a 5 year period had at least 300 PA (this is a selection bias as well, as only players of a certain talent level will play this much, this long). Then I looked for players who had a banner year (at least 25% better than league average) followed by performing at an average level (within 10% of league average) for three straight years. Unfortunately, I only found 18 such players. I did the same for the second group, but this time I looked for the banner year at the end of the cycle. 19 players fit the bill. Here are the results, along with the group's performance in year 5:


   Year     Early banner  Recent banner
   Year 1       133%          104%
   Year 2       102%          100%
   Year 3       101%          100%
   Year 4       102%          131%
   Year 5       102%          107%
(Stats based on Linear Weights Ratio.)

We see here that having a recent banner year is more predictive than having an old banner year. In fact, the group of players with old banner years performed as if the banner year had never happened: they were average for the last 3 years, and they were average again.

For the group of players with a recent banner year, we see that it does have some impact. However, they kept very little of that performance level in the following year. If you were to do a simple average of the 4 years, you would get a predicted value of 109%. That is, without giving the recent performance any extra weight, you can predict the group's expected performance level.

Study 2 - The Four Year Cycle

Seeing that the first year didn't have any predictive value for the first group, let's redo our study, this time looking at a 4 year cycle instead of 5 years. (This will also allow us to increase our sample size.) There were 61 players in the first group and 61 in the second. Here are the results, along with the group's performance in year 4:


   Year     Early banner  Recent banner
   Year 1       1.32          1.02
   Year 2       1.02          1.00
   Year 3       1.02          1.34
   Year 4       1.09          1.13

We see here that a 3-year sample should suffice (if we are to assume that the small sample size of the first study is not an issue). We also see that the recent banner year has more of an effect than an old banner year. However, there is still not that much of a difference.

Study 3 - The Stars

Let's look at one last group of players, before we get too far ahead of ourselves: the stars. There are 655 strings of players who were at least 25% better than league average for three years running. (We might have an issue here as players like Hank Aaron have many strings of 3 great years. I'm not sure how to resolve this, nor if we have to.) Here are the results, along with the group's performance in year 4:


   Year 1    1.49 
   Year 2    1.49 
   Year 3    1.49 
   Year 4    1.42

Interesting. Even though our group of stars was 49% above league average for 3 straight years, their next year averaged 42% above. What this implies is that these players were probably slightly lucky for three straight years, and that their true talent level is actually 42% above average. These players retained 86% of their value above average, meaning that they regressed 14% towards the mean.
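The retention arithmetic here is easy to check. A minimal sketch, using only the two numbers quoted above:

```python
# Quick check of Study 3's retention arithmetic: the stars were 49% above
# league average for three years, then 42% above average in year 4.
excess_before = 0.49   # average excess over league in years 1-3
excess_after = 0.42    # excess over league in year 4
retained = excess_after / excess_before

print(round(retained, 2))      # 0.86: about 86% of the excess retained
print(round(1 - retained, 2))  # 0.14: about 14% regression to the mean
```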

Let's put it all together

Let's go back to our banner year groups. If we assume that 14% of the weight should be given to regression towards the mean, then we can use the following weights to predict next season's performance level:


   24% - Year 1
   24% - Year 2
   38% - Year 3
   14% - average

By using these weights, we can predict the Year 4 values that were produced by the last two studies. As a shorthand, you can say 1 part average, 2 parts each first two years, 3 parts third year.
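As a sanity check, here is a minimal sketch applying those weights to the Study 2 group lines. The group labels and function name are mine, and the league average is taken as 1.00:

```python
# Apply the article's weights (2 parts each of the first two years, 3 parts
# year 3, 1 part league average) and compare against the observed Year 4.
W1, W2, W3, W_AVG = 0.24, 0.24, 0.38, 0.14
LEAGUE_AVG = 1.00

def predict_year4(y1, y2, y3):
    return W1 * y1 + W2 * y2 + W3 * y3 + W_AVG * LEAGUE_AVG

print(round(predict_year4(1.32, 1.02, 1.02), 2))  # early-banner group: ~1.09
print(round(predict_year4(1.02, 1.00, 1.34), 2))  # recent-banner group: ~1.13
```

Both predictions land on the observed Year 4 values from the study, which is the point of the weighting scheme.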

I also see no evidence that a player's banner year, far above his previously established level, signals a new, higher established level. In essence, such a player was slightly unlucky for 2 years, and very lucky for one year: he was a 110 player who happened to play at 100 for 2 years, and at 130 in the third year. While it is possible that some players do establish a higher level of performance, we cannot find it in the traditional seasonal stat lines.

In the past, I've also looked only at HR, to see whether some part of a player's talent can signal a new established level of performance, and the results pretty much match what these new studies show. If you want to look for a new established talent level, you have to look elsewhere.


October 31, 2002 - R P Coltrane

You define banner year as 25% better than league average. Does this account for position? If not, there are probably a whole lot of middle infielders and catchers who've had what I would think of as banner years whom you're missing out on.

Moreover, I think it would be more helpful, rather than defining banner year relative to the league, to define a banner year relative to the player himself. So, you'd be looking for players who either in year 1 were 25% better than in years 2-4, or in year 4 were 25% better than in years 1-3. This way, we can see what trends apply for _all_ players, not just those whose banner years happen to top 25% of the league average.

October 31, 2002 - Stephen Hobbes

I disagree with your assessment of the great players. 3 years at 149 followed by a year at 142, and you think this means they were lucky for three years and then we see the actual value of them? Isn't it more likely that they were unlucky the 4th year? I think it is most likely that after a few years of high performance they were just getting older and starting to decline slightly. Attributing this to luck has no factual basis.

I'm really not sure what to think about the rest of it either. The extremely small sample sizes lead me to believe the parameters need to be refined. There have to have been more "career" years than that to study. Where'd they go?

October 31, 2002 - Mikael (e-mail)

Interesting. Even though our group of stars were 49% above league average for 3 straight years, their next year averaged 42%. What this implies is that these players were probably slightly lucky for three straight years, and that their true talent level is actually 42% above average. These players retained 86% of their value, meaning that they regressed 14% towards the mean.

It seems likely to me that this sample is heavily weighted with players reaching the end of their peak. A few Hank Aarons will have more than three years at 150% of league average, but most players will reach that peak for a few seasons and then decline. I think you may have found an aging trend - a real reduction in skill - not a regression.

October 31, 2002 - Walt Davis

Sorry if I missed it, but was there any control for age in this? The "stars" decline in year 4 may well be an age effect.

What this implies is that these players were probably slightly lucky for three straight years, and that their true talent level is actually 42% above average.

Even though you used "implies" this strikes me as an overly strong and frankly illogical statement. I certainly can't think of any reason to look at 3 years of 1.49 and 1 year of 1.42 and come to the conclusion that 1.42 is our best estimate of their true level. If anything, it implies that they were unlucky in year #4 and that their true talent level is 49% above average. Though, as noted above, age and other factors are potential non-luck predictors.

I also think there may be some slight selection bias. In either the 4 or 5 year cycles, you show that those who have the banner year at the end of the cycle retain a little of that banner year. But due to your sample selection, the group whose banner year came at the beginning of the cycle didn't retain any of it in Year 2! In other words, this group consists of folks who never retained anything from their banner year in the next season, so it's not really surprising that they didn't retain anything 3-4 years later.

And I agree with the earlier poster that it probably makes more sense to measure "banner year" relative to the player. To me the question of retention is whether the player retained their improved performance. In the case of the ones who had the banner year at the end, we can compare to their pre-banner performance and see what they retained. But we can't do this with the folks who had the banner at the beginning of your period of observation. For example, what we know about those folks is:

133 102 102 101 102

and we conclude that they didn't retain anything. But perhaps with a fuller set of numbers like:

95 95 133 102 102 101 102

we'd see that they did retain some of their improvement.

I think what you really want to do is look at, say, the 3 years before and after a banner (or fluke) year. That would be better able to address what, if any, retention there is and how long it lasts.

October 31, 2002 - tangotiger (www)

Good comments, guys. I actually meant to address these two issues, and I'm glad you brought them up.

Age: definitely should be looked at, but I can tell you that there is no big age bias, even with the 149 group. I will do the breakdown, hopefully before the end of the day.

Banner selection: one of the considerations I had was that I did not want to select players that were, say, 110-110-110-140, because of the "regression towards the mean" issue I brought up. That is, even though you've got a guy who you think is at a 110 level for three years, he might actually be 107 or 113, etc. The closer you are to 100, the more likely it is that this is a 100 player. Furthermore, by introducing all players, I get into trouble with losing players. While it is unlikely that you will lose a player from the pool who is 100-100-100, it is very possible that you might lose a player who is 80-80, thereby introducing a bias. Of course, this also depends on position.

Having said that, I was thinking about running through the data anyway, and see what happens. And with the much larger sample size this would allow me, I can select 35% above "previous 3 years" to really highlight the banner years. I'll try to get to this next week.

October 31, 2002 - tangotiger (www)

Walt, good comment at the end, and this is exactly what I did with the HR study I linked to. And rather than seeing a "retention", we see essentially that what the player did the year after the banner year was repeated the year after that as well.

Again, what I am talking about is not "retention", even though I used that term. We are presuming that a player's performance level is a sample of his true talent level. Therefore, by selecting 130-100-100, I am choosing those players that had a great year followed by 2 average years. This does not imply that this player had an injury or something that forced him to go down to 100. The more years I tack on, the smaller my sample size. You are correct that I can simply show year1, select on years 2 through 4 (whether 100-100-130, or 130-100-100), and then look at year 5. My guess is that year 5 production will be only slightly different than year 1 production, age notwithstanding. This is a good idea, and I will run that next week as well.

Walt: any comment about the Hank Aaron issue?

October 31, 2002 - Walt Davis

For what you're doing with it, I don't think "the Hank Aaron issue" is an issue. If I understand what you did correctly, the primary issue with Hank Aaron is that he contributes numerous 3/4 year spells of "star" performance -- i.e. he appears multiple times in your data. This does create the problem that the observations are not independent of one another. However, the assumption that observations are independent is an assumption that relates to calculating standard errors and such, and is generally only a problem when you're performing statistical tests. But here you're only calculating means and not testing anything. (note, lack of independence almost always leads to larger standard errors than what you get by assuming independence, meaning the problem is usually one of finding more significance than there really is. Note also that these days most statistical packages have ways of performing appropriate tests.)

There are other potential complications I suppose. Hank Aaron may be an outlier of sorts. For example, if he contributes lots of 149 149 149 149 spells while others are going 149 149 149 132, he could be masking what is really a fairly sharp decline for every player of less caliber than Aaron. [note, banner years are essentially outliers too and the real question is always to what extent do you allow such observations to impact your analysis]

Or essentially the same thing from a different angle is that numerous Aaron spells may be pulling up the "true" level of the whole group.

Another comment on your comments: while it's true that a player who's close to 100 is more likely to be a true 100 player, I don't see why this is important. Doesn't it make the same sense just to look for players who were relatively stable for a 3-year period and then saw their production jump by X% over the mean of the previous 3 years? I don't see why it should matter whether the pre-banner (or post-banner, I suppose) performance is league (position?) average; I think the important characteristic is that it's stable and substantially lower than the peak.

October 31, 2002 - MGL

Excellent work!

You lost me a little on this one (you have a great writing style which generally makes everything crystal clear; either you deviated a little from such a style or I suddenly became thick, and the latter is entirely possible). In any case, could you explain the following (as if I were a 6-year-old child)?

"Let's put it altogether

Let's go back to our banner year groups. If we assume that 14% of the weight should be given to regression towards the mean, then we can use the following weights to predict next season's performance level:

24% - Year 1
24% - Year 2
38% - Year 3
14% - average

By using these weights, we can predict the Year 4 values that were produced by the last two studies. As a shorthand, you can say 1 part average, 2 parts each first two years, 3 parts third year."

(Also, why the blue font? Or is that just my browser? I can't seem to highlight, in order to "copy and paste", any of the blue text!)

For those of you who question why 3 149% years followed by a 142% year means "3 lucky years" and one year "more indicative of talent": that is exactly true! It's hard to explain why, but I'll try. First of all, the 142% is exactly what we would expect after 3 years of 149%!

ANY TIME WE SAMPLE A PLAYER'S TALENT (1 YEAR, 2 YEARS, 5 YEARS) WE EXPECT THAT HIS TRUE TALENT LEVEL IS EXACTLY EQUAL TO THE SAMPLE MEAN (LWTS RATIO, OPS, OR WHATEVER) PLUS A REGRESSION TOWARDS THE MEAN OF THE POPULATION THAT PLAYER COMES FROM!

Without knowing the age, height, weight, etc., of that player, we have to assume that that player comes from a population of professional baseball players only. So the mean towards which our 3-year sample will regress is 100% (the mean normalized lwts ratio of all players, which is, by definition, 100%). And of course, the smaller the sample we have (say 2 years of 149%, as opposed to 3 years), the more we expect the next year to regress. Without doing the work, I KNOW that 2 years of 149% will be followed by something LESS than 142%.
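The smaller-sample-regresses-more point can be sketched with a standard shrinkage form. Everything here is illustrative: the constant k is hypothetical, chosen only because k = 0.5 "seasons" happens to reproduce the article's 14% regression on a 3-year sample (3/3.5, about 86% retained):

```python
# Shrinkage sketch: weight the sample by n/(n+k) and the population mean
# by k/(n+k). Smaller n means less weight on the sample, i.e. more
# regression toward the mean. k = 0.5 is a hypothetical constant, not a fit.
def regressed_estimate(sample_level, n_years, pop_mean=1.00, k=0.5):
    w = n_years / (n_years + k)
    return w * sample_level + (1 - w) * pop_mean

print(round(regressed_estimate(1.49, 3), 2))  # 3 years at 149% -> ~1.42
print(round(regressed_estimate(1.49, 2), 2))  # 2 years regresses further: ~1.39
```

As the comment predicts, 2 years of 149% projects to something less than 142%.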

Any given year, where we don't know a priori what the value of that year is, is neither expected to be lucky nor unlucky. That is the 142% year: simply a random year (for players who already had 3 good years, of course) whose value is unknown before we calculate it. Therefore, we expect it to be neither a lucky nor an unlucky year.

The 3 149% years ARE, BY DEFINITION, LUCKY YEARS! That is because we purposely chose players with 3 good years, relative to the league average. Any time we purposely choose good or bad years, we are, again by definition, choosing "good (or bad) AND lucky (or unlucky)" years. That is why all good or bad years will regress towards more average ones!

Believe it or not, now that we have 3 years of 149% followed by 1 year of 142%, we have a "new" sample talent of around 146.7 (ignoring the first 149% year, as it is getting old). In fact, let's bump up the 146.7 to 147 because of the first 149% year. Even though this is our "new" talent sample for our player, it is still not our best "estimate" of his talent, i.e., our prediction for the next year! We still think that all 4 years have been lucky (remember, all samples above average are automatically lucky and all samples below average are automatically unlucky; how lucky or unlucky depends upon the size of the sample; here we have 4 years of above average performance - it is still lucky performance, but not by that much), so we think that our 147% projection is TOO HIGH! Again, without looking at the database, I can tell you that the 5th year will be something less than 147% - probably around 145% (maybe less, because we may start to have an age bias - the longer the sample you look at, the more likely it is that you are looking at older players).

All that being said, although I don't think it is a problem either, I would address the age thing: either control for age or at least include it in your results.

I would also do some park adjusting, or at least include some park info in your results. I am a little concerned that a banner year is weighted towards players who have changed home parks, so that, for example, a banner year followed by 3 average years will tend to be a hitter's park followed by a pitcher's park, so that in the 5th year the player is more likely to be in a pitcher's park, whereas 3 average years followed by a banner year suggests that the player is more likely to be in a hitter's park for the 5th year. This would "screw up" the weighting system...

October 31, 2002 - Walt Davis

In that linked HR analysis, tango wrote:

So, here is a question to ask some statisticians: Q - Is the fact that 15% of players play at the banner-level, and 15% play at the pre-banner level, in their post-banner years: can this be explained simply by random chance?

Others can follow the link to see the numbers. This is hard to say for sure just based on the data presented in the article, but we can make some guesses. For example, Hack Wilson's pre HR rate was essentially .064/AB. His peak rate was .096/AB. In the article you say that a good approximation of the post HR rate is 60% pre + 40% peak, which in this case gives us an expected post HR rate of .077. His actual post HR rate was about .04.

We can use the binomial distribution to find out how likely that is over, say, 2 seasons of 600 AB each, given these rates. There are some reasons why assuming the binomial distribution is not ideal, but it's likely a good enough approximation.

For fairly large samples, say 100 or greater (especially with small %ages at play), this can be done fairly easily by hand (well, calculator). Assuming .077 is the "true" rate and we have 1200 AB:

mean = .077 * 1200 = 92.4
variance = .077 * (1 - .077) * 1200 = 85.3
standard deviation = sqrt(variance) = 9.2

A rough 95% confidence interval of the expected # of "successes" is the mean +/- 2*sd. OK, technically it should be more like 1.96. Anyway, in this case, assuming a true HR rate of .077 and 1200 AB, we'd expect Hack to hit 74 to 110.8 HR. He actually hit 48. So we can safely say that Hack hit significantly fewer HR than expected.
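This back-of-the-envelope test is easy to reproduce. A sketch of the same arithmetic (the .077 rate and 1200 AB come from the comment; the two-sigma interval is the rough 95% CI described above):

```python
import math

# Rough binomial check: assume a "true" post-peak HR rate of .077 over
# 1200 AB and ask whether Hack Wilson's actual 48 HR is plausible.
p, n = 0.077, 1200
mean = p * n                            # expected HR: 92.4
sd = math.sqrt(p * (1 - p) * n)         # binomial standard deviation: ~9.2
lo, hi = mean - 2 * sd, mean + 2 * sd   # rough 95% CI: ~74 to ~111 HR

actual = 48
print(round(lo, 1), round(hi, 1))
print(actual < lo)   # True: 48 HR falls far below the interval
```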

Even by "rough" standards, that still doesn't fully answer your question. If we'd hypothesized beforehand that Hack would underperform, the above would be the appropriate statistical test. But we didn't. And regardless, by using a 95% CI (or a 5% alpha), we've essentially said we're willing to accept a 5% error rate. In other words, even by random chance, there's a 5% chance that a hitter with a true HR rate of .077 will hit fewer than 74 or more than 110.8 HR in 1200 AB.

To guesstimate this, for 116 hitters in your sample, we'd expect about 6 of them to exceed their 95% CI. Note I said _their_ 95% CI, as each hitter would have a different expected post HR rate and slightly different standard deviation. You'd perform the above test for every hitter, find out how many exceeded their 95% CI, then see whether this number was substantially bigger than 6. If so, that's pretty good evidence that something non-random is also happening.

Another, I suppose easier, way to approximate this is to regress the actual rates on the expected rates. Look at the mean squared error. If the remainder is purely random, then the MSE should be about the same as the above standard deviation divided by 1200. Probably better if you calculate the standard deviation using the mean expected post HR rate but the differences will be trivial.

October 31, 2002 - tangotiger (www)

MGL, yes, I agree with almost everything you said. Two points:

1 - yes, not my best writing work, as I wrote it in 30 minutes, but what is it that is unclear? Was it the weighting thing at the end? It basically means that you put more weight on the most recent year, and you have some weight to regress towards the mean. Or was it something else?

2 - As for the 149,149,149, which I selected for, the 4th year was 142. However, you then say that this group is actually a "147" group . Not! Because my group is "fairly large", then I would say that this selected group of 149,149,149 is a 142 player. And, if I looked at the 5th year, I would bet that this group would also exhibit 142. I would also bet that the year prior to 149 would also be 142. I would say *every* year around the 149 years would be 142. Do you agree? (Age of course is an issue if I start going crazy and start to consider pre-24 and post-36, etc years.)

However, for a single player, if I had a 149,149,149,142 player, since I didn't select such a player, then I would have to guess that he is a 147 player.

I think we are on the same page, but I'm not sure.

*** As for parks and changing teams, etc, yes that is always a problem. It's "possible" that the park may play an influence in the selection of my players, but I doubt it. The banner year was 25% above the base years, and so, while playing at Coors does increase the chances that he will be selected in the banner year, I don't think this is the case. I'll look into it though.

*** By the way, the more I look into this, this is just like MGL's hot/cold streak study. While he is looking at 15-day periods, I am looking at 3-year periods. We are (or will in my case) looking at the pre-selected and post-selected period, and we are (or might/will in my case) finding that those two values pretty much match, regardless of the intervening period.

October 31, 2002 - Stephen Hobbes

MGL - The flaw in your argument is with us knowing the population. All the baseball players may regress to 100%, but the elite players, the ones who compose the 149 group, may only regress to 130%, for instance, compared to the average. I think it's a mistake to consider their base talent level the same way. I'm not clear that 142 becomes the "true" assessment after 3 149's (especially when this is all we have). Also, since "true" talent values vary constantly, and for most people not named Bonds decrease as they age into their 30's, while he may be a 142 at the end, couldn't he have been a 155 having slightly off years for a while?

I have to agree with RP that comparing the "banner" year to the league rather than to the player himself is a mistake. If you have an 80 who suddenly sports a 120, wouldn't that be a banner year? Also, the same thing applies north of the border: is a 130 player always having a banner year, or only when he spikes to 160?

I think comparing to the player rather than the league will also help alleviate the 142 as "true" issue.

October 31, 2002 - tangotiger (www)

First off, I'm not trying to capture ALL banner years, just some of them. As well, I am not suggesting at all that 149,149,149 is banner performance. I am using that type of player to show that a 149,149,149 player is not in fact a 149 player but a 142 player.

So, when you look at a 130,100,100 player, a player that certainly had a banner year, we should treat the 130 with some hesitation, since, as we've seen, this performance was "lucky" in some respect.

****

Anyway, I've re-run, so that we have "x", 149,149,149, "y". That is, how did the players with 3 great years do just before the "banner 3 years", and just after? Here are the results

   Year 1   1.42
   Year 2   1.49
   Year 3   1.49
   Year 4   1.49
   Year 5   1.41

This population of players had 593 strings.

Now, if we break it down by age (in Year 5), this is what we get:

   Age     Yr 1   Yr 2   Yr 3   Yr 4   Yr 5     n
   34+     1.46   1.51   1.51   1.48   1.37   173
   30-33   1.44   1.49   1.49   1.47   1.41   229
   29-     1.36   1.46   1.49   1.51   1.45   191

Again, as you can see, the "3 selected years", were pretty constant around that 149 level. The before/after years are consistent with the age grouping. But, in all cases, the before year was less than the selected period, even for the old guys.

There is also about an annualized 2% change in performance level between year 1 and year 5, which is also consistent with my findings in aging patterns previously done.

So, the "true talent level" is year 1 and year 5, and everything in-between is "lucky".

October 31, 2002 - Kenny

I hope you guys can indulge a stats novice for a minute.

It's been a few years since I took true score theory, but from what I remember, outcomes are a function of true score plus measurement error. So, in other words, in some book in heaven somewhere, it may be written that Barry Bonds is a .320 hitter. Everything else represents measurement error.

I think I understand regression to the mean. If the average baseball player hits .280, then we would expect Barry Bonds to follow a "true score" season with something less than .320. If I were a betting man, I would go with that.

So if Barry Bonds hits .320 for three years in a row, his failure to regress represents luck. But why does that mean that his true score is not .320? Why can't Barry just be a lucky player who happened to hit at his true score level for three straight years?

Thanks.

October 31, 2002 - MGL

Stephen, there is no flaw in my argument! All players (as a group, NOT every single player; heck, some 149/149/149 players are actually 155 players!) will regress towards the mean, because...

Some of those players are actually 150 (let's ignore the exactly 149 players) or better players and got UNLUCKY, and some of those players are less than 149 players and got lucky. Those are the only two choices. The chances that any given player is less than a 149 player and got lucky are MUCH higher than the chances that he is better than a 149 player and got unlucky, simply because there are many, many more sub-149 players.

The chances that a random 149/149/149 player is actually a sub-149 player who got lucky, as compared to an above-149 player who got unlucky, go down as the sample size of the 149 group goes up (for example, 149/149/149/149). However, the upper limit (when the sample size is infinite) of the real talent of a sample group of players who have a sample performance of 149 is 149! It can never be higher, and it can never be exactly 149. It must be lower! That is why all samples of performance above or below average will ALWAYS regress towards the mean of the population they come from. This is not my argument or opinion. It is a mathematical certainty, based on the "upper limit theorem" I describe above.

The only caveat is the definition of the "population they come from". In Tango's study, he looked at ALL players. Any player who had a 149/149/149 period qualified. Yes, this group is comprised of mostly good players (140, 135, 155, etc.), but they still come from a population of all ML players (technically all ML players who have played full time for 3 consecutive years, which is probably an above-average population, so they will not regress towards 100%, but maybe 110%). Now if we only look at first basemen or players over 6 feet tall, then the number towards which we regress will change...

David, your writing is excellent (isn't that what I said?). I just got hung up on the part I quoted in my last post. Could you re-explain that part please? As I also said, it may just be me being thick. Why did you and DS claim that I said that you didn't write the article well? That's an example of only telling part of the story and thereby distorting the truth (like politicians and commentators do all the time in order to prove a point). I know you guys didn't intentionally do that, but it is a bugaboo of mine...

October 31, 2002 - Brad Wenban

Kenny,

A player with 3 straight .320 seasons could be a .320 hitter in actuality. In fact, he could be a .350 hitter that got unlucky.

We know that baseball talent roughly follows the normal distribution. One thing this means is that (thinking graphically) for any player who has a true talent of distance X from the mean, there will always be a greater number of players within + or -X of that player on the side closest to the mean.

In simpler terms, it's always more likely that Bonds is more average than his numbers suggest, because there are many more "truly" average players out there than there are "truly" great players.

So we can't say that Bonds (to use your example) is definitely not a "true" <.320 hitter. What we can say is that there is a greater chance that Bonds is a <.320 hitter than that he is an = or >.320 hitter.
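Brad's point can be checked with a quick Monte Carlo sketch. The talent distribution here (mean .280, SD .020) and the 500-AB season length are assumed numbers for illustration, and the binomial season is approximated with a normal draw:

```python
import math
import random

random.seed(1)

# Assumed talent distribution: true BA ~ Normal(.280, .020).
# Simulate a 500-AB season (normal approximation to the binomial),
# then look at the true talent of players whose SAMPLE BA landed near .320.
N, AB = 100_000, 500
below, total = 0, 0
for _ in range(N):
    true_ba = random.gauss(0.280, 0.020)
    sd = math.sqrt(true_ba * (1 - true_ba) / AB)
    sample_ba = random.gauss(true_ba, sd)
    if 0.315 <= sample_ba <= 0.325:
        total += 1
        below += true_ba < 0.320
print(below / total)  # roughly 0.9: most ".320 hitters" are true sub-.320 hitters
```

The asymmetry comes entirely from the prior: far more players sit below .320 than above it, so a .320 sample is more often a below-.320 talent who got lucky.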

October 31, 2002 - tangotiger (www)

MGL, sorry for the bugaboo.

To go back to your question, let me amplify. The 149 performance is regressed 14% towards the mean to match the "expected probable" true talent level of 142. So, generally speaking, we should regress all 3 year performances by 14%.

Now, of the three remaining components (year x, year x-1, year x-2), we weight the most recent season (x) as 38%, and the other two as 24% each.

As a shorthand, rather than remembering kooky percentages, you can apply integer weights of "3" for "x", "2" each for "x-1" and "x-2", and "1" for the mean. Maybe I should have skipped this part, as it's probably more confusing than it should be.
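For what it's worth, the integer-weight shorthand can be written out directly. The 149/149/149 line is the example used earlier in the thread, with 100 as league average:

```python
# Integer-weight shorthand from the post: 3 for year x, 2 each for
# years x-1 and x-2, and 1 for the league mean (100 = average).
perf_x, perf_x1, perf_x2 = 149, 149, 149  # league-normalized performance
league_mean = 100

estimate = (3 * perf_x + 2 * perf_x1 + 2 * perf_x2 + 1 * league_mean) / 8
print(estimate)  # 142.875, close to the 142 figure quoted above
```

The 3/2/2/1 weights are 37.5%/25%/25%/12.5% of the total, which is why they approximate the 38%/24%/24%/14% split.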

November 1, 2002 - Stephen Hobbes

MGL - Well of course you're correct; the idea of regression to the mean was not what I was questioning. I probably did a poor job explaining it, but what I was getting at was what you mentioned in your caveat. Once the smaller "elite" group is taken out of the main population, I don't feel that it is accurate to then compare them to the same 100% level. The mean for the elite group is going to be much higher, but I don't know the actual numbers. My point is that then you have a completely different percentage. While 7 out of 49 is about 14%, if the elite mean was 120, then you have 7 out of 29 for that grouping, which is 24%. Much different, but also more accurate for this caliber of player.

To take it one step further, how do we know that the elite group's mean isn't actually 160? Total baseball population may have a 100 mean, but within this group, based upon the info provided, all we really know is that the elite group is not going to be 100 (well, at least it shouldn't be). IF, and that's a big hypothetical if, that was the case (a mean of 160), then we would actually expect the next season to be higher to approach the mean right? I'm not saying this is what happened, just that it is a theoretical possibility based upon the information provided.

I guess I'm really just not happy with the idea of comparing an individual's season to the league when considering "his" banner season. At least not totally. I guess it is needed for the context of the "banner" season (1987 for example probably had quite a few banner years), but if you're trying to gauge a player's talent level, as far as rising and falling, it really can only be gauged against the player, with the league averages being used only as a modifier. If the league did 3% better the second year, then take his expected performance plus 3 and compare. That way you are actually seeing the talent level of the player, since any league effects have been removed. If the player jumps 30 points, but the league jumps 10, then he's up a net 20. If the league drops 10, though, then he's really probably just using a corked bat or similar.

Then again, this was really a fairly narrow analysis: take fairly average players that had one very good year out of 4, and prove that they really are basically average players. Nobody, except apparently some GM's, really fails to understand that principle. It is always nice to see something that is expected to be a certain way proven out, though.

I think I just wanted this analysis to be more than it was, and then wasn't happy when it wasn't. Fool that I am.

November 1, 2002 - MGL

[i]I hope you guys can indulge a stats novice for a minute.

It's been a few years since I took true score theory, but from what I remember, outcomes are a function of true score plus measurement error. So, in other words, in some book in heaven somewhere, it may be written that Barry Bonds is a .320 hitter. Everything else represents measurement error. I think I understand regression to the mean. If the average baseball player hits .280, then we would expect Barry Bonds to follow a "true score" season with something less than .320. If I were a betting man, I would go with that.[/i]

You actually have a nice handle on what's going on! Basically, any player who hits better or worse than average over any period of time is "expected" to hit closer to the mean than his sample hitting indicates. It's as simple as that. It is not conjecture. It is a mathematical certainty. It is a fundamental aspect of sampling theory. I think you completely understand how that works.

Given that we have a sample of a player's hitting (1 year, 5 years, whatever), that sample number is ALWAYS the "limit" of our "best estimate of his true talent", which is, of course, the same as his projection. For example, if Bonds' sample BA is .320, that is the "limit" of his true BA. Now the only thing left is to determine whether a player's sample performance, like Bonds' .320 BA, is the upper or lower limit of his true level of performance. The way we do that is simple, once we know the mean performance of the population that our player was selected from. If that mean is less than the player's sample performance, then the sample performance is the upper limit of his true talent. If it is greater, then his sample performance is the lower limit of his true talent. In practice, it is usually easy to guess whether that mean is greater or less than a player's sample performance. In some cases, however, it is not so easy.

For Bonds, if his sample BA is .320, we are pretty sure that no matter what, the mean BA of the population that he comes from is less than that, so we estimate his true BA at something less than .320. That doesn't mean that we KNOW or that it is 100% certain that his true BA is less than .320. That's where a lot of people are making a mistake. There is a finite chance that he is a true .320 hitter, a true .330 hitter, or even a true .250 hitter who has been enormously lucky. All these things have a finite chance of being true. It's just that if you add up all the various true BA's times the chances of their occurrence, sampling theory tells you that you get a number which is closer to the population mean than his sample BA. How much closer is completely a function of how large your sample of performance is and nothing else.

The other tricky part that gets people in trouble is "What IS the population of players that a particular player comes from, and what is the mean of that population?" After all, that is an important number, since that is the number that we need to regress to. Finding out or estimating that number can be tricky sometimes. If we pick a player from a list of all players without regard to anything other than that he has a high or low BA, or whatever we happen to be looking for, then we know that the population is ALL BATTERS. It doesn't matter that we are picking a player who has a high or low BA deliberately. There is no "selection bias" as far as the population and its mean is concerned. Remember, no matter what criteria we use to choose a player, the population that that player belongs to, for purposes of estimating a performance mean that we will regress to, is the group of players that we are selecting FROM, NOT the group of players that we think that our player belongs to (good hitters, for example)! If we pick a .320 player from a list of all ML players (or some random sample of all ML players), then that player comes from a population of ALL players, and hence the population mean that we regress to is the mean of all ML players.

Now if we find out something else about that player we chose, then all of a sudden we have a different population of players, and we have to choose a different mean BA, which is not all that easy sometimes. For example, if we find out that that player is a RF'er, then all of a sudden we have a player who comes from the population of ML RF'ers and NOT all ML players. Obviously the mean BA of all ML RF'ers is different than that of ALL ML players. Same thing if we find out that our player is tall or heavy or LH or RH or fast or slow, etc.

Anyway, for the umpteenth time, that's regression to the mean with regard to baseball players, in a nutshell, for whatever it's worth...

[i]So if Barry Bonds hits .320 for three years in a row, his failure to regress represents luck. But why does that mean that his true score is not .320? Why can't Barry just be a lucky player who happened to hit at his true score level for three straight years?[/i]

See the explanation above. Yes, he could be a true .320 player, just like he could be a true .350 player or .280 player. It's just that the best mathematical estimate of his true BA is NOT .320; it is something less, depending upon the mean BA of his population (big, possibly steroid-laden, black, LH, RF'er who has a very nice looking swing, has a great reputation, a talented father, etc.) and how many AB's the .320 sample represents...

Whew!

November 1, 2002 - tangotiger (www)

Good job, MGL!

The mean of the players who played in those 5-year spans, with at least 300 PA, is 115%. Now, this may sound like a lot, but don't forget, we have a lot of repeating players in there (like Aaron).

I'm not sure that the regression towards the mean should be towards 115%, but I'd like to hear from the statistics-oriented fellows about their thoughts on this matter. I would guess at this point that the Aaron situation comes up, and I should identify unique players only.

November 1, 2002 - tangotiger (www)

Since age is an issue, and I can easily control for it, I will re-run using that.

As well, the "mean" of the players is 115%. If we look only at one age group for the 5-year period (say ages 26-30), we see that each year they average 115%. If we select any other time period, like 24-28, you also get similar results. And of course, no player could possibly exist more than once in each age group. Therefore, the mean is 115%.

Therefore, I should probably select players that center around 115%, and that center around the 27 age group. I'll get to this next week.

November 1, 2002 - MGL

Tango, why do you think that the sample mean of any player you choose who has played 5 years with > 299 AB per year will not regress towards 115%? It will, if that 115% is the mean of the population of all such players. Now, in order to get (estimate) the mean of that population, you cannot weight players' numbers. For example, you cannot have more than one Aaron in your group. The population mean must come from a random, non-weighted sample of all players who have played at least 5 years with 300 or more AB per year (or whatever your criteria was). So you must find all players who fit that description and give each player equal weight, even though some of those players (Aaron for example) may have many such 5-year spans.

What numbers (let's just call it BA) do you use for those players who appear more than once (have more than one 5-year span with 300 or more AB's each year)? You would take the average BA over all years for that player, as that would represent the best estimate of that player's true BA for any 5-year period...

November 1, 2002 - MGL

BTW, I contradicted myself (in a subtle way) in my second-to-last post. I said that in order to determine what specific population a player comes from, we look at the "list of players" that we selected our player FROM. Then I went on to say that if we found out afterwards that a player was tall, we would change our population (from ALL ML players to ALL tall ML players). This appears to be a contradiction, which it is.

What I meant was that we can use any characteristic we know about our player (either before or after we chose him) to define or estimate the population of players he comes from. We cannot, however, use his BA (or whatever it is we are trying to regress) to determine what population he comes from (for example, if it is .320, we cannot say "Oh, he must be from a population of good hitters"), because that is what we are trying to ascertain in the first place (the chances that he IS a good hitter versus the chances that he is a bad or average hitter who got lucky, etc.). It's sort of analogous to the dependent and independent variables in a regression analysis. The characteristics of the player we are regressing (like height, weight, position, etc.) are all the "independent" variables, and his BA (or whatever number we are trying to regress) is the dependent variable. The "independent" variables determine the population that he is from for purposes of figuring out what number we should regress to (the mean BA of that population), while the dependent variable (the sample BA of that player) CANNOT be used to make any inferences about that population (for purposes of establishing a BA to regress to)...

November 1, 2002 - tangotiger (www)

MGL, maybe you missed my last post, but if I only look at one 5-yr period, say ages 24-28, then of course Aaron can only exist once in this string. And, the players in this group are 115% of league average. Now, if I select some other age group, the unique players in that group are also 115%.

However, if I decide to combine the two groups, I might have two Aarons, and two Ruths, etc. I don't see why I would want to remove one of them from the groups.

I think it would be easier to keep all the age groups separate (24-28, 25-29, 26-30, etc, etc) and report on each one separately. This removes the conflicting players, but addresses the Aaron issue. However, I don't see the problem in then combining these three age groups afterwards, AND KEEPING the mean at 115%.

Or maybe I'm missing something?

November 2, 2002 - Contrarian

If anyone who is new to B' has stumbled across this discussion, please ignore it. tangotiger and MGL are the two most prolific statistical ignoramuses you may ever encounter. Note in particular how Walt Davis tried to introduce actual math to this thread, only to be brushed off by those who don't care to comprehend anything more complicated than linear algebra.

November 2, 2002 - MGL

As far as I know, and I am no math or statistics maven (maybe slightly ahead of an ignoramus but something short of a sciolist), linear algebra is an advanced, college and graduate level, field of mathematics. So anyone who comprehends nothing more than linear algebra is indeed more advanced than I...

November 2, 2002 - David Smyth

Hey mgl, I took linear algebra in high school (advanced placement, etc.). The only problem is that I never took a statistics course :).

The poster Contrarian mentioned Walt Davis. We can all see by his posts that Walt is very bright and apparently has very good statistical training. But where are his articles? Where are his original saber studies? MGL and Tango are not PhD mathematicians, but at least they are out there, taking the lead in this field (along, of course, with others).

November 2, 2002 - tangotiger (www)

Contrarian: I've already admitted my shortcomings in many areas, including statistics. I've taken enough that I can follow conversations, but that's as far as I would take it. I also know enough to apply the basics. This is no news to people who've been reading me, and any of my comments should be taken like that.

I am always interested to hear from Walt Davis, and frankly I just missed his second post (the way Primer regenerates the site, there is a lag, and Walt's post got sandwiched in-between).

I have no problems with people criticizing my approach, or my comments, or anything I do. It would be nice though if you would provide an email address so that we can correspond privately, and you can elaborate further.

November 2, 2002 - MGL

Actually I wanted to add one more thing about "regression" as it relates to projecting talent in baseball, assuming of course, that not EVERYONE is ignoring this thread now that the cat's out of the bag (that Tango and I are ignoramuses when it comes to statistics).

While the mean of a population from which a player comes determines the upper or lower limit of his true BA (from now on, when I use BA, it is simply a convenient proxy for any metric which measures talent), it isn't that useful in terms of knowing how much to regress a player's sample BA in order to estimate his true BA. In fact, it isn't necessary at all. Nor does the size of the player's sample BA tell us how much to regress, UNLESS AND UNTIL WE KNOW OTHER CHARACTERISTICS OF THE POPULATION.

What I mean by that is that there are actually 2 things that tell us exactly how much to regress a player's sample BA to determine the best estimate of his true BA, and one of them is not the mean BA of the population to which the player belongs.

One of those things IS the size of the sample BA (1 year, 4 years, etc.). The other is the DISTRIBUTION OF THE TRUE BA'S OF ALL PLAYERS IN THE POPULATION.

Once we know those 2 things, we can use a precise mathematical formula (it isn't linear algebra, I don't think) to come up with an exact number which is the best estimate for that player's true BA.

Let's back up a little. In normal sampling statistics, a player's BA over some time period would be a sample of his own true BA, and our best estimate of that player's true BA would be exactly his sample BA. So if player A had a .380 average during one month and that's all we knew about this player, regular sampling theory would say that our best estimate of his true BA was .380, and we could use the number of AB's that .380 was based on (the sample size) to determine how "confident" we were that the .380 WAS in fact his real BA, using the standard deviation of BA, which we can compute using a binomial model, etc., etc. Most of you know that.

Now here is where we sort of veer away from a normal "experiment" in sampling statistics, when it comes to baseball players and their talent. We KNOW something about the population of all baseball players, which means, both mathematically and logically, that the .380 sample BA in one month (say 100 AB's) is not necessarily a good estimate of that player's true BA. We know logically that if a player hits .380 in a month he is NOT a .380 hitter. The only reason we know that, however, is because we know that there is either no such thing as a .380 hitter or at least that a .380 hitter is very rare. If in fact we knew nothing about the range of what ML baseball players usually hit, we would then HAVE TO say that our player was a .380 hitter (within a certain confidence interval, which would be around plus or minus 90 points to be 95% confident, as the SD for BA in 100 AB's is around 45 points).

So now the question, as always, is, given that our player hit .380 in 100 AB's, and given that we KNOW that players rarely if ever have true BA's of .380, what IS the best estimate of our player's true BA (still within the various confidence intervals)?

Let's say the mean BA of the population of ML baseball players (for the same year as our .380 sample) is .270. According to my other posts, that is the number that we regress the .380 towards, and the number of AB's the .380 is based on (100) determines how much we regress. Well, the first part is always true (the .270 is the lower limit of our player's true BA), but the second part is only true given a certain set of characteristics of the population of baseball players. IOW, it is these characteristics that FIRST determine how much we regress the .380 towards the .270. Once we establish those characteristics, the more sample AB's we have, the less we regress.

What are those characteristics we need to determine before we can figure out how much to regress the .380 towards the .270? It is the percentage of batters in the population (ALL ML players in this case, since we know nothing about our .380 hitter other than that he is a ML player) who have various true BA's. IOW, we need to know how many ML players are true .210 hitters, how many are true .230 hitters, true .320 hitters, etc. Obviously, there is a whole continuum of true BA's among ML players, but it would suffice for this kind of analysis if we estimated the number of players in each range. Now, estimating the number of players in baseball for each range of true BA's is not easy to do, and is a little circuitous as well. The only way to do that is to look historically at players who have had a long career and assume that their lifetime BA is their true BA. Of course, even that lifetime BA would have to be regressed in order to get their true BA, so that's where the "circuitous logic" comes from - "in order to know how much to regress a sample BA, we have to find out the true BA's of ML players, and in order to find out those true BA's we have to know how much to regress a player's lifetime BA..."

We have other problems in terms of trying to figure out how many players in ML baseball have true BA's of x. For example, not many players who have true BA's of .210 have long careers, so if we only looked at long careers to establish our percentages, we might miss some types of players (those with very low true BA's). In any case, let's assume that we can come up with a fairly good table of frequencies for the true BA's of all ML players. It might look something like <.200 (.1%), .200-.220 (1%), .220-.230 (3%),..., .300-.320 (2%), etc.

NOW we can use Bayesian (lower on the totem pole than linear algebra) probability to figure our .380 player's true BA! The way we do that goes something like this:

What are the chances that our player is a true .200-.220 hitter (1%, if we know nothing else about this hitter other than that he is a ML player) GIVEN THAT he hit .380 in 100 AB's (much less than 1%, of course)? What are the chances that he is a .300-.320 hitter given that he hit .380, etc. (more than 2%, of course)?...

Do all the multiplication and addition (arithmetic, MUCH lower than linear algebra) and voila, we come up with an exact number (true BA) for our .380 hitter (which still has around a 90-point-either-way 95% confidence interval).

Remember that the mean BA doesn't tell us anything about how much to regress or what the final estimate of the true BA of our .380 hitter is; it only tells us the limit of the regression, and in fact, we don't even need to know that number, as in the calculations above. For example, let's say that the mean BA for all ML players were .270, as in the above, but that all ML players had the same true BA. The true BA for our .380 hitter, or ANY hitter with any sample BA in any number of AB's, would be .270. Let's say that 1% of all ML players had true BA's of .380 and 99% had true BA's of .290. What would our .380 player's true BA be?

It is either .380 or .290, so it's not really a "fair" question. We could answer it in 2 ways. One would be that "there is an X percent chance that he is a .290 hitter (who got lucky in 100 AB's) and a Y percent chance that he is a .380 hitter (who hit what he was "supposed to" in 100 AB's)". The other answer is that he is a .zzz hitter, where the .zzz is X percent times .290 plus Y percent times .380, divided by 100. Here's how we would do that calculation:

The chances of a .290 hitter hitting .380 or better is .025 (.380 is 2 standard deviations above .290 for 100 AB's). The chances of a .380 hitter hitting .380 or better is .5. So if we didn't know anything about the frequencies of .290 or .380 hitters in our population, our player is 20 times more likely to be a .380 hitter than a .290 hitter (.5/.025), or he has a 95.24% chance of being a .380 hitter. But since 99% of all players are .290 hitters and only 1% are .380 hitters, we now have the chances that our player is a .380 hitter at 20%, rather than the initial 1%. So we can say that our hitter has a 17% chance of being a .380 hitter and an 83% chance of being a .290 hitter, or we can say that our hitter is a .305 hitter. We get the 20% chance of our hitter being a .380 hitter by the following Bayesian formula: the ratio of the chance of being a .290 hitter who hit .380 or better (.99 times .025, or .02475) to the chance of being a .380 hitter who hit .380 or better (.01 times .5, or .005) is 4.95 to 1. That means that it is 4.95 times more likely that our .380 hitter is a true .290 hitter who got lucky, so the chances of our hitter being a .290 hitter are .8319, and hence .1681 for being a .380 hitter.

That same above Bayesian calculation would apply for any number of categories of true BA's in the population and the percentage of players in each category.

Now, given the difficulty in determining the categories and frequencies for true BA's in the population of ML baseball players, and given the cumbersome nature of the ensuing Bayesian calculations, we can forgo all of that by using a linear regression formula to approximate the same results. If we used a single regression formula for, say, the above example (a player who hits .380 in 100 AB's), we would take a bunch of data points constituting all players with a certain BA in 100 AB's (our independent variable) and regress this on those same players' BA for the next year, or preferably multiple years. As usual, this will yield two coefficients, A and B in our y = Ax + B linear equation, and B will be close to (1 - A) times the mean BA of all baseball players (actually the mean BA in the sample group we are using in the regression analysis). Remember that these coefficients will only work for 100 AB's. If we want to do the same thing for a player with 500 AB's, we have to do a new regression analysis and derive a new equation, OR we can do a multiple regression analysis where number of AB's is one of the independent variables. Unfortunately, due to my status as a statistics ignoramus, I don't know whether there is a linear relationship if we include # of AB's (I don't think there is), in which case you would have to do a non-linear analysis, which is beyond my abilities...
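For anyone who wants to see the regression shortcut in action, here is a sketch on simulated data. The talent spread (true BAs normally distributed around .270 with a .020 standard deviation) and the sample sizes are assumptions chosen for illustration, not measurements from real players:

```python
import numpy as np

rng = np.random.default_rng(0)
n_players, ab = 10_000, 100

# Hypothetical population: true BA spread around .270 (an assumed distribution).
true_ba = rng.normal(0.270, 0.020, n_players)
year1 = rng.binomial(ab, true_ba) / ab    # observed BA in 100 AB
year2 = rng.binomial(ab, true_ba) / ab    # a second 100-AB sample, same talent

# Fit y = Ax + B: next-year BA regressed on this-year BA.
a, b = np.polyfit(year1, year2, 1)

# Prediction for a .380 hitter in 100 AB: heavily regressed toward .270.
print(round(a * 0.380 + b, 3))
```

With only 100 AB, the fitted slope A comes out small (under these assumptions, roughly 0.15 to 0.20), so a .380 sample BA predicts a next-year BA in the .280s - the same heavy regression the Bayesian calculation produced. Repeating the fit at 500 AB would yield a larger A, which is the dependence on AB the post describes.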

November 3, 2002 - MGL

In case there is anyone on the planet still reading this thread, the 4th sentence in the third paragraph from the bottom should read (among lots of other spelling and grammar errors):

But since 99% of all players are .290 hitters and only 1% are .380 hitters, we now have the chances that our player is a .380 hitter at 17% (NOT 20%), rather than the initial 1%.

November 3, 2002 - Sancho Gamwich (e-mail)

This is interesting stuff, but there's similar analysis already done and out there, such as here:

http://www.economics.pomona.edu/GarySmith/BBregress/baseball.html

Another excellent resource is at:

http://personal.bgsu.edu/~albert/tsub/datasets.htm

SG

November 3, 2002 - Sancho Gamwich

Sorry - that last address should be:

http://personal.bgsu.edu/~albert/

SG

November 4, 2002 - tangotiger (www)

Sancho, thanks for the links. The first one I had not seen, and it's not a bad one. As for Albert, I'm frankly disappointed. There's a long list of math professors who have tackled baseball issues and really either miss something, or write so dryly that I miss something. (Of course, there's an even longer list of sabermetricians who miss some math issues as well.)

November 4, 2002 - Mike Green (e-mail)

What about the effect of contract status? Have studies been done of performance outside of normal variations in the last year of a contract with impending free agency/arbitration? I can think of a number of players who had their banner year as they entered free agency.

November 5, 2002 - Shaun

Intuitively it seems to make sense that a player who had a banner year in his fourth year would have a better fifth year than a player who had a banner year followed by three average years. It seems like the player who broke out during the fourth year learned something from the previous three years, while a player who has a banner year followed by three average years had a fluke season early on, but opponents made adjustments and he couldn't counter-adjust. I think you have to take age into consideration when doing a study like this.

November 5, 2002 - tangotiger (www) (e-mail)

Shaun, I agree age should be taken into account, and I'm currently working on this. I should have something to show as soon as I get the time (which these days is not too much).

As for contract status, certainly this would have an impact. However, by having an aggregate of players, this impact should not be too noticeable. And of course, since my data is from 1919 onwards, there's an even smaller population that would be affected by this at all.

As for learning and improving, etc. This is the issue. Is it the case that the player is learning and improving, or is it simply random chance that the player happens to have a banner year. Hopefully, with the new data I have, we'll have a better answer.

November 6, 2002 - F James

I think there is an important point being missed here. The original question you posed was: Given the set of all players who hit at least 25% better than league average for 3 years in a row, what was their relative performance in Year 4? That was what led to the 149, 149, 149, 142 sequence. You then assert that 142, not 149, represents that group's real ability, and that you can expect a typical player in the group to regress toward the mean by 7/49 = 14%. But what is really going on here? I strongly suspect that most of the apparent regression is caused not by most players falling by moderate amounts but by a few players falling by big amounts. Remember, a player had to be at least 25% better than average 3 years running to make the cut, but there is no such requirement for Year 4. It would only take a few players falling below the 25% threshold to entirely explain the 7 point drop in the average.

To illustrate, suppose we started with 5 players whose scores in each of the 1st 3 years were distributed as follows:

135, 142, 149, 156, 163. The average is 149.

Now suppose the player who scored 135 before has a really bad year and falls to 100 while the other 4 stay the same. The new average is 142, implying a 7 point regression for the group. But in reality it was caused by the simple fact that you no longer required performance 25% better than average.

To see how much of the "regression" is real and how much is due to the relaxation of the selection criterion, segregate your database into 2 groups: those that met the +25% criterion in all years and those that did not.
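The proposed check is easy to run on simulated data. Every number below (the talent spread around 115, the year-to-year noise of 12) is an assumption chosen for illustration, not an estimate from real players:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical league: true ratios spread around 115, plus seasonal noise.
true_ratio = rng.normal(115, 15, n)
seasons = true_ratio[:, None] + rng.normal(0, 12, (n, 4))

# Select players who beat 125 in each of the first 3 seasons.
picked = (seasons[:, :3] > 125).all(axis=1)

print(round(seasons[picked, :3].mean(), 1))   # selection-inflated 3-year mean
print(round(seasons[picked, 3].mean(), 1))    # year 4: noticeably lower
print(round(true_ratio[picked].mean(), 1))    # matches year 4, not the 3-year mean
```

Splitting the picked group by whether it stayed above 125 in year 4 is one more boolean mask; either way, under these assumptions the year-4 mean lands on the selected group's average true talent rather than on its selection-inflated 3-year mean.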

November 6, 2002 - MGL

Brother of Jesse,

Sorry but your argument is mathematically (statistically) unsound. I don't have the time right now to explain why.

BTW, Tango's 149 149 149 149 142 observation is not a revelation. It doesn't "need" explaining, nor is it open to "criticism".

It is a mathematical certainty that, no matter what the distribution of true linear weights ratios in the population of baseball players is, any player or players who show an above- or below-average performance in any number of years will "regress" towards the mean (100 in this case) in any subsequent year. How much they regress (in percentage, like the 7/49, if you want to put it that way) depends entirely on how many PA's the historical data is comprised of. In this case, 4 years of 149 regressed to 142 in the 5th year. One year of 149 will regress to, I don't know, something like 120. Tango's observations were just to make sure that nothing really funny (like the statistician's view of the world being completely f'ed up) was going on. We don't need to look at ANY data to tell us that 4 149's will be followed by something less than but close to 149, or that 2 149's will be followed by something even further below 149. Again, it is a mathematical certainty, even if there is lots of learning and developing going on with baseball players. The learning and developing can only decrease the amount of regression; it cannot eliminate it! Of course, what we will and do find if we look at real-life data is that this learning and developing (to the extent that people "read into" these banner years) is small or non-existent beyond the normal or average age progression. The reason we know this is that if the learning and developing were a large or even moderate factor, we would see much smaller regressions after banner years than we do. The regressions we do see comport very nicely with what a statistical model would predict if no learning and developing were going on. Given that, there can be only one conclusion - THAT A BANNER YEAR IS 99 PARTS FLUCTUATION AND 1 PART LEARNING AND DEVELOPMENT (i.e., the concept of a "breakout year" is a FICTION!)

November 6, 2002 - DW

Mike Green asked: What about the effect of contract status? Have studies been done of performance outside of normal variations in the last year of a contract with impending free agency/arbitration? I can think of a number of players who had their banner year as they entered free agency.

The first "serious" study I ever did was on this topic - unfortunately, the only copies of it reside in a professor's office (or trashcan) and in a dead laptop somewhere in Middle Earth, so forgive my lack of detail. I looked at changes in the performance cycle (how players do in year one, year two, etc...) before and after free agency, wondering if players would play better (work harder) in the year prior to free agency and play worse in the year after FA. Used PROD+ and PROD+ times PA, iirc, as the principal measures of performance (though results were independent of metric). Controlled for age and "quality" (average regular, star, etc...). Modeled with a variety of regression frameworks (I remember that I used single and multiple regs, but not much of the details). The twist is that I wasn't privy to a player's FA status - I estimated it by mandating high PA totals in the first n years of his career (to estimate service time) and comparing the pre- and post-free agency periods. Result: saw no impact, either in playing time or performance per game, resulting from the advent of free agency. (Mind you - there are things I'd change about that study - most notably, actually knowing players' FA statuses and having a backup copy of everything on disk somewhere - but it's a faint datapoint against the shirking argument.)

November 7, 2002 - F James

Sorry, MGL, but I've got to shoot down your argument. Sure, any group of players will regress toward the mean. The question is, what mean? It is NOT the mean of all players (i.e., 100) but the true mean for the group itself. If you assemble a team of All-Stars, you most certainly would not expect their true mean to be the overall population mean. Just because you can't know precisely what that number is doesn't mean you should assume it is 100. Indeed, if all you have to go on is their 3-year performance (e.g., 149, 149, 149) then your best estimate of the true mean for the group is 149.

November 7, 2002 - MGL

Mr. James,

You need to read my protracted discussion on "regression" as it applies to baseball talent. I think it is in this thread, but I'm not sure.

Despite your moniker, you got no shot to "shoot me down" on this one!

A player's stats get regressed to the mean of the population that he comes from. Yes, if we assemble ALL-STARS and choose players from that group, the mean is greater than 100. Same is true if we assemble right fielders and choose a player or players from that group. If we assemble a group of sub-6-foot players, our mean will probably be less than 100.

Tango looked at all ML players and chose those who had high lwts ratios (an average of 149) for 4 straight years. EVEN THOUGH THESE ARE OBVIOUSLY GREAT PLAYERS, THEY CAME FROM A GROUP OF ALL ML PLAYERS. That is why you regress toward the mean of all ML players (actually you regress toward the mean of all ML players who have played at least 5 years). If you assemble All-Stars and choose from that group, THERE HAS TO BE AN INDEPENDENT REASON FOR YOU CALLING THEM ALL-STARS, OTHER THAN THE CRITERIA YOU USED TO SELECT PLAYERS FROM THAT GROUP. IOW, in order to regress those same 149 players to a mean greater than 100, you would need to assemble your ALL-STARS first by using some criteria independent of having a high lwts ratio for 4 straight years - say, a high lwts ratio for the previous 3 years. If you do that, then you regress toward the mean of all players who have had high lwts ratios for 3 straight years and have played for at least 5 more years. Get it!

You regress towards the mean of the population that your player comes from! You cannot make any inferences about that population based upon your sampling criteria! That's the whole point of regression!

Read this - it is important:

To put it another way, in Tango's example, he looks at the entire population of baseball players. They include all players of all true talent. They have a certain mean lwts ratio, which we can easily measure, and of course, we define it as 100. Next he ignores those players who have not had a minimum number of PA's for at least 5 years, right? So now we have a population of players who have had a min # of PA's for 5 straight years. We take the mean of that population, which is probably higher than 100 (say 105). That is the number we regress to! The fact that we now select only those players who have had at least a 125% lwts ratio for 4 straight years DOES NOT CHANGE THE POPULATION THAT THESE PLAYERS WERE SELECTED FROM. That is the population whose mean we regress to! Yes, that group of > 125% players are ALL-STARS as a group. Their true lwts ratio is much greater than 105, but it is NOT 149, as we can see from the 5th year ratio of 142. By definition, when we regress to the mean (this is not my "opinion", it is a rule of statistics), we regress to the mean of the population from which we chose our players, regardless of what criteria we used to select those players. By criteria, I mean "What range of lwts ratios?", like the > 125% that Tango chose. If we choose criteria (these are independent variables) like what position, or what weight, OR WHAT WAS THEIR LWTS RATIO IN THE PRIOR YEAR OR YEARS, then we have a new population and hence, a new mean to regress to. Anyway, I got off on a tangent as far as the important thing to read...

When Tango chose those players who had ratios above 125% for 4 straight years, the reason we regress at all is that those players selected consist of: 1) players with a true ratio of less than 149 who got lucky, 2) those players with a true ratio around 149, the sample mean, and 3) those players who have a true ratio GREATER than 149. We don't know in what proportion they exist, but even though it is more likely that a player who has a sample mean of 149 is a true 149 player, and it is less likely that he is a less-than or greater-than 149 true player, there are many, many more players in our original group (our population) that were true sub-149 players, so it is much more likely that an "average" player in our 149 sample group of players is a true sub-149 player who got lucky. It just so happens that the proper mean to regress to is the mean of the original group, whatever that is (105?).

If we had chosen a group of ALL-STARS, based on, say, 3 years' worth of lwts ratios above 125%, we now have a group whose true lwts ratio is around 135 or so. Now, if FROM THAT GROUP, we select those players who have had 4 MORE years of > 125%, then we have the same experiment as Tango's, except that that group is ALREADY made up of players who have a true ratio of around 140, as opposed to in Tango's example, where the group he selected from was ALL ML players who have played for 5 years (etc.). They only have a mean ratio of around 105. So in the second experiment, where we choose from a group of KNOWN good players (on the average, not all of them), many more of the players we select are good players who did not get lucky or got a little lucky for the next 5 years (after the initial 3 years of > 125% performance). Many more (percentage-wise) in the second group, as opposed to the first group, are also true > 149 players who got unlucky. That's why the true ratio in the second group is more than 142 (probably 145 or so). It is still not 149, since the mean of the group of ALL-STARS is only around 135, so we still have to regress the 149 sample mean to 135. The "reason" for this is that we still have some lingering players who are not very good, but managed to make it into the ALL-STAR group through luck, and ALSO managed to make it into the > 125% for the next 5 years group. Obviously not many players will make it through these 2 hurdles, but as long as there is a finite chance that any true sub-149 player will make it, the true ratio of the 149 group will ALWAYS be less than 149! You may say, wait, there is an equal chance that a > 149 player made it through both groups as a sub-149 player, so they would cancel each other out! In fact that is true! There is an equal likelihood that any player in our 149 group is a true 154 or a true 144 (each one is 5 points different from the mean).
But here is the kicker that makes us regress downward in either experiment: There are many more players in either population who are true sub-149 players than there are who are > 149 players, so an average player in our 149 group is MORE likely to be a 144 player who got lucky than a 154 player who got unlucky, simply because there are more 144 players!

Now if we chose our ALL-STARS such that our estimate of the average true ratio in that group of ALL-STARS were 155 (let's say we selected all players who had a ratio for 3 years of greater than 140 - not too many players, of course), then if we did the same experiment, and the sample ratio of players who had > 125% for the next 4 years was still 149, we would actually regress upwards, such that our estimate of the 149 group's true mean ratio would be like 153 or so!

I hope this explains everything, because I just missed my appointment at the gym!

November 7, 2002 - F James

You are confusing random sampling and selective sampling. If I select a group of major league players at random, by throwing darts at a dartboard, then I would expect them to regress to the overall population mean. But if I deliberately set out to choose the best players in the game, I would expect them to maintain their advantage over the rest of the league from year to year, except for a slight aging effect. Yes, there will be a few ringers in any group of All-Stars, average players who got extremely lucky for one year. There may even be one or two who can do it 2 years in a row. But by requiring at least 125% performance for 3 consecutive years you have virtually guaranteed that only the very best will qualify for your sample. This is a highly selective sample; it will NOT regress to the overall population mean.

November 7, 2002 - MGL

I'm done trying to explain how it works with baseball talent (or any similar experiment). Either we are misunderstanding one another or you are very stubborn or both. Maybe someone else can explain it to you or maybe we can just drop it.

If a sample of players (yes, NOT a random sample), using a measurement criterion (like above 125% for 4 straight years), drawn from the population of baseball players DID NOT regress toward the mean, then you would NOT see the 5th year at 142 - you would see it at 149, right?

Do an experiment which should make everything obvious - in fact, you don't even have to do the experiment - the results should be obvious:

Look at all BB players from 2001. Take only those who had an OPS above .900 (non-random sample - obviously). Their average (mean) OPS is something like .980. What do you think their OPS as a group is in 2002? It is the .980 regressed toward the mean of the entire population of baseball players (around .770), which will be maybe .850 or .880. The 2002 OPS is also the best estimate of these players' true OPS (given a large enough sample of players, the best estimate of those players' average true OPS is the next year's - or any large random sample's - OPS). We KNOW this, right? We know that if we take any non-random sample of players defined by a certain performance (less than .700 OPS, greater than .800, etc.), their subsequent (or past) OPS will REGRESS (toward the mean of the whole population of BB players)! That is regression towards the mean (for an experiment like this)! There does not have to be random sampling, although the exact same thing would happen if we took a random sample!

What do you think would "happen" (what would the future look like) if we looked at all those players who had an OPS of over 1.000 for one week? (See my thread on hot and cold streaks.) Would their future (or past) OPS regress towards the mean or wouldn't it? Would their average OPS (of around 1.100) remain the same? Do you not understand what I am getting at here? What do you think the true talent (OPS) of these one-week 1.100 guys is? I know you don't think it is 1.100, which means they will continue at a 1.100 clip. I know you know that it will continue at a pace closer to the league average (probably around .880). What do you call that other than regression to the mean?

(Light bulb went off in my head!) Now I see what you are saying! My apologies. Yes, technically, these "higher than average groups" (the 149 for 4 straight years guys or the better than 1.000 OPS for one week guys) will regress toward THEIR mean lwts ratio or OPS and NOT the mean of the league as a whole. Yes, that is true and I think that is what you are trying to say. Again, my apologies. You are absolutely correct. IN PRACTICE, you can use the mean of the whole league to regress to, because you don't know what the mean of the group you selected is - in fact, that is what you are trying to figure out. IOW, if we take the 149 ratio guys and want to figure out their true ratio or what their ratio will be in the next year (same thing, basically), then technically, we must use their true ratio to regress to, but that's what we are trying to figure out in the first place - what that true ratio is. Since we don't know that, and the only thing we know is the mean ratio of all players, then we have to regress that 149 towards that all-player mean of 105 or whatever it is. Yes, technically that 149 doesn't get regressed towards 105. It gets regressed towards something less than 149 and more than 105, but since we don't know what that is, it LOOKS like it gets regressed towards the 105.

Anyway, there is no argument anymore, unless you think that the true ratio of that 149 group is 149, rather than something like 142 (the 149 regressed towards the true ratio). If you do, you must wonder why the next year comes out to 142. If you do, you must think that one year of a more than .900 OPS is a player's true OPS, again, in which case you must wonder why these guys show a .800 OPS or so the next year. And if you do, you must certainly wonder why a group that shows a 1.100 in a week does not continue at that clip, even though we did not randomly select this group of players (we only looked at players who had greater than a 1.000 OPS in a certain week)...

Cheers!

November 7, 2002 - tangotiger (www) (e-mail)

MGL, no. F James specifically said that these 149, 149, 149 players would not regress, except for aging. In fact, they do regress to 142.

This group of players will regress towards THEIR mean, I agree. In fact, they will regress 100% towards their mean. But since we don't know what their mean is (without looking at other non-sampled years), we take the next best thing: the mean of the population they were drawn from. This mean is in fact 115%. Therefore, given the number of years (3), the number of players (I don't remember, let's say 100), and the number of PAs (let's say 500 / player / year), the best players will regress 7/34 (20%) towards the mean of the population they were drawn from. Different years, different # of players, and different # of PAs will regress differently.
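The 7/34 figure is just this arithmetic (all three numbers come from the posts above):

```python
banner, pop_mean, year5 = 149, 115, 142   # observed level, population mean, next year

points_regressed = banner - year5     # 7 points of regression actually observed
max_regression = banner - pop_mean    # 34 points: the limit of the regression

print(round(points_regressed / max_regression, 3))   # 0.206, i.e. about 20%
```

The 7 changes with the number of years, players, and PAs; the 34 only moves if the population the players were drawn from changes.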

Now, I know little of statistics, and perhaps Walt Davis or Ben V can put this matter to rest.

I'll be back in a week or two with detailed data, broken down by age.

November 8, 2002 - MGL

I'm gonna try one more time! Forget my last post!

Forget about the expression "regression to the mean". It is a generic expression which could have different meanings depending upon the context. Pretend it doesn't exist.

Remember that when I say that a value (call it value 1) gets "regressed" to another value (call it value 2), THAT MEANS two things and two things only:

1) Value 2 represents the direction in which we move value 1 (it can be moved either up or down).

2) If we don't know how much to move value 1 (which we usually don't in these types of experiments), value 2 represents the limit of the movement.

For example, if value 1 is 149 and value 2 is 135, we know 2 and only 2 things, assuming that we have to "move" value 1. One, we move it down (towards the 135), and two, we move it a certain unknown amount but we never move it past 135.

How does this very vague above concept apply to baseball players? I'm glad you asked.

First, I am going to call "value 2", as described above, "the mean" and I am going to substitute the word "regress" for the word "move" as used in the context above. This is literally an arbitrary choice of words. We might as well say "wfogbnlnfl to the slkvdn". I'm using the word "mean", not in any mathematical sense, but to represent the "limit of how much we move value 1". Likewise, I am using the word "regress", also not in any mathematical sense, but purely as a substitute for the word "move".

So "regression to the mean" from now on simply means "We move value 1 either up or down, depending on whether value 2 is less than or greater than value 1, and value 2 also represents the most we can move value 1."

Now here are some experiments in which I will attempt to apply the above methodology. You tell me whether it should be applied or not (and if not, why). If you think that it is appropriate to apply it, you must also tell me what value we should use for value 2. The correct answers will appear at the end of this post.

Experiment #1:

Let's say that we have a population of BB players and we don't know whether all players in that population have the same true OPS or not. Either everyone has the same true OPS (like a population of all different denominations of coins, where every coin has the same true H/T "flipping" ratio), or some or all of the players have different true OPS's (like if we had a population of coins and some coins were 2-sided, while others were 3-sided or 4-sided, etc.).

Now let's say that we randomly sample 100 of these players and look at a 1-year sample of each player's OPS. We basically have 100 "man-years" of data that were randomly sampled (not exactly, but close enough) from a population of all baseball players.

Let's say that the mean OPS of these 100 players is .780. This is our value 1, by the way. Let's also say that WE DO NOT KNOW WHAT THE MEAN OPS OF THE POPULATION OF ALL PLAYERS IS. Remember that we randomly sampled these 100 players and 1 year's worth of data for each player from a population of all players and all years.

What is the best estimate of the true OPS of these 100 players, given that the average OPS for all 100 players for 1 year each is .780?

In order to arrive at that estimate, did you need to determine a value 2 and did you need to move value 1 (.780) in the direction of value 2 and does value 2 represent the furthest you should move value 1? If the answer is yes to all 3 related questions, how did you arrive at value 2? If the answer is yes to some and no to some (of the above 3 questions), please explain.

Experiment #2:

Same as experiment #1, but we now know (G-d tells us) that the mean OPS of the population of all players is .750 AND we know that all players have the same true OPS.

Again, what is the best estimate of the true OPS of our 100 players chosen randomly (and their 1-year stats each)? This is a no-brainer right? It is not a trick question. The answer is as obvious as it seems.

Given your answer to the above question, did you move value 1 (the .780), is there a value 2 (and if so, what is it), and if the answer to both questions is yes, do we know exactly how much to move value 1 ("towards" value 2)? IOW, is there regression to the mean (remember my above definition - movement, direction, and limit, where "regression to" is "movement towards" and "the mean" is "value 2")?

Experiment #3: (We are getting closer and closer to Tango's experiment)

Same as above (#2), only this time not only do we know that the mean OPS of all players is .750, we also know (again from G-d, not from any empirical data) that all players in the population have different true OPS's. IOW, some players have true OPS's of .600, some .720, some .850, some .980, etc. In this experiment we don't know, nor does it matter, what percentage of players have true OPS's of x, what percentage have true OPS's of y, etc. We only know that different players have different true OPS's. So in our random sample of 100 players, each player could have a true OPS of anything, assuming that every OPS is represented in the population in unknown proportions.

Now, remember, like the previous experiments, the mean OPS of our 100 randomly selected players for 1 year each (at this point it doesn't matter that we used 1-year data for each player; we could have used 2-year or 6-month data) is .780. Remember also that we KNOW the true average (mean) OPS of all the players is .750. And don't forget (this is what makes this experiment different from #2) that we KNOW that different players in the population have different true OPS's, of unknown values and in unknown proportions (again, the last part - the "unknown values and in unknown proportions" - doesn't matter yet).

So now what is your best estimate of the true average (mean) OPS of the 100 players? Is this an exact number? Do we use a "regression to the mean"? If yes, what is value 2 and do we know exactly how much to move (regress) value 1 (again, the .780) towards value 2?

Here are the answers to the questions in experiments 1-3:

Experiment #1 (answer):

The best estimate of the average true OPS of the 100 players is .780, the same as their sample OPS. There is no "regression to the mean". There is no value 2; therefore there is no movement from value 1. (Technically, we could say that value 2 is .780 also, the same as value 1, and that we regress value 1 "all the way" towards value 2.) The above comes from the following rule in sampling statistics:

When we sample a population (look at the 1-year OPS of 100 players) and we know nothing about the characteristics of the population, as far as the variable we are sampling (OPS) is concerned, the sample mean (.780) is the best estimate of the population mean.

Experiment #2 (answer):

The answer is that no matter what the sample OPS is (in this case .780), the true OPS of any and all players (including the average of our 100 players) is .750! This is simply because we are told that the true OPS of all players is .750! Any sample that yields an OPS of anything other than .750 MUST BE DUE TO SAMPLING ERROR.

Experiment #3 (answer):

The best estimate of the average true OPS of the 100 players is somewhere between .750 and .780. There IS a value 2 (.750, the known population mean), and we move value 1 (the .780) toward it, but since we don't know the values and proportions of the true OPS's in the population, we don't know exactly how far to move it - only the direction and the limit.
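The three experiments condense to a few lines. The 0.5 shrink factor used for Experiment 3 below is an arbitrary placeholder - its real value depends on the unknown spread of true talent versus luck, which is exactly the point:

```python
sample_mean, pop_mean = 0.780, 0.750   # value 1 and the known population mean

# Experiment 1: nothing known about the population -> no value 2, no movement;
# the sample mean is the best estimate.
est1 = sample_mean

# Experiment 2: all players known to share the same true OPS (.750) -> the
# entire .030 gap is sampling error; regress all the way to value 2.
est2 = pop_mean

# Experiment 3: true OPS's vary in unknown proportions -> move value 1 partway
# toward value 2. The factor k = 0.5 is an assumed illustration only.
k = 0.5
est3 = pop_mean + k * (sample_mean - pop_mean)

print(est1, est2, round(est3, 3))   # 0.78 0.75 0.765
```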

November 8, 2002 - MGL

I'm gonna try one more time! Forget my last post!

Forget about the expression "regression to the mean". It is a generic expression which could have different meanings depending upon the context. Pretend it doesn't exist.

Remember that when I say that a value (call it value 1) gets "regressed" to another value (call it value 2), THAT MEANS two things and two things only:

1) Value 2 represents the direction in which we move value 1 (it can be moved either up or down).

2) If we don't know how much to move value 1 (which we usually don't in these types of experiments), value 2 represents the limit of the movement.

For example, if value 1 is 149 and value 2 is 135, we know 2 and only 2 things, assuming that we have to "move" value 1. One, we move it down (towards the 135), and two, we move it a certain unknown amount but we never move it past 135.

How does this very vague above concept apply to baseball players? I'm glad you asked.

First, I am going to call "value 2", as described above, "the mean" and I am going to substitute the word "regress" for the word "move" as used in the context above. This is literally an arbitrary choice of words. We might as well say "wfogbnlnfl to the slkvdn". I'm using the word "mean", not in any mathematical sense, but to represent the "limit of how much we move value 1". Likewise, I am using the word "regress", also not in any mathematical sense, but purely as a substitute for the word "move".

So "regression to the mean" from now on simply means "We move value 1 either up or down, depending on whether value 2 is less than or greater than value 1, and value 2 also represents the most we can move value 1."
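[Editor's note: MGL's two rules can be written down in a few lines of code. This is only a sketch of his definition; the function name and the explicit clamping at the limit are mine, not the thread's.]

```python
def regress(value1, value2, amount):
    """Move value1 toward value2 by a fraction `amount` (0 to 1).

    Per the definition above: value2 gives the direction of the move,
    and also its limit - value1 is never moved past value2.
    """
    amount = max(0.0, min(1.0, amount))   # clamp: never overshoot value2
    return value1 + amount * (value2 - value1)

# MGL's numeric example: value1 = 149, value2 = 135.
print(regress(149, 135, 0.5))   # 142.0 - partway down toward 135
print(regress(149, 135, 1.0))   # 135.0 - the limit (full regression)
print(regress(149, 135, 1.5))   # 135.0 - clamped, never past value2
```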

Now here are some experiments in which I will attempt to apply the above methodology. You tell me whether it should be applied or not (and if not, why). If you think it is appropriate to apply, you must also tell me what value we should use for value 2. The correct answers will appear at the end of this post.

Experiment #1:

Let's say that we have a population of BB players and we don't know whether all players in that population have the same true OPS or not. Either everyone has the same true OPS (like a population of all different denominations of coins, where every coin has the same true H/T "flipping" ratio), or some or all of the players have different true OPS's (like if we had a population of coins and some coins were 2-sided, while others were 3-sided or 4-sided, etc.).

Now let's say that we randomly sample 100 of these players and look at a 1-year sample of each player's OPS. We basically have 100 "man-years" of data that were randomly sampled (not exactly, but close enough) from a population of all baseball players.

Let's say that the mean OPS of these 100 players is .780. This is our value 1, by the way. Let's also say that WE DO NOT KNOW WHAT THE MEAN OPS OF THE POPULATION OF ALL PLAYERS IS. Remember that we randomly sampled these 100 players and 1 year's worth of data for each player from a population of all players and all years.

What is the best estimate of the true OPS of these 100 players, given that the average OPS for all 100 players for 1 year each is .780?

In order to arrive at that estimate, did you need to determine a value 2 and did you need to move value 1 (.780) in the direction of value 2 and does value 2 represent the furthest you should move value 1? If the answer is yes to all 3 related questions, how did you arrive at value 2? If the answer is yes to some and no to some (of the above 3 questions), please explain.

Experiment #2:

Same as experiment #1, but we now know (G-d tells us) that the mean OPS of the population of all players is .750 AND we know that all players have the same true OPS.

Again, what is the best estimate of the true OPS of our 100 players chosen randomly (and their 1-year stats each)? This is a no-brainer right? It is not a trick question. The answer is as obvious as it seems.

Given your answer to the above question, did you move value 1 (the .780), is there a value 2 (and if so, what is it), and if the answer to both questions is yes, do we know exactly how much to move value 1 ("towards" value 2)? IOW, is there regression to the mean (remember my above definition - movement, direction, and limit, where "regression to" is "movement towards" and "the mean" is "value 2")?

Experiment #3: (We are getting closer and closer to Tango's experiment)

Same as above (#2), only this time not only do we know that the mean OPS of all players is .750, we also know (again from G-d, not from any empirical data) that all players in the population have different true OPS's. IOW, some players have true OPS's of .600, some .720, some .850, some .980, etc. In this experiment we don't know, nor does it matter, what percentage of players have true OPS's of x, what percentage have true OPS's of y, etc. We only know that different players have different true OPS's. So in our random sample of 100 players, each player could have a true OPS of anything, assuming that every OPS is represented in the population in unknown proportions.

Now, remember, like the previous experiments, the mean OPS of our 100 randomly selected players for 1 year each (at this point it doesn't matter that we used 1-year data for each player; we could have used 2-year or 6-month data) is .780. Remember also that we KNOW the true average (mean) OPS of all the players is .750. And don't forget (this is what makes this experiment different from #2) that we KNOW that different players in the population have different true OPS's, of unknown values and in unknown proportions (again, the last part - the "unknown values and in unknown proportions" - doesn't matter yet).

So now what is your best estimate of the true average (mean) OPS of the 100 players? Is this an exact number? Do we use a "regression to the mean"? If yes, what is value 2 and do we know exactly how much to move (regress) value 1 (again, the .780) towards value 2?

Here are the answers to the questions in experiments 1-3:

Experiment #1 (answer):

The best estimate of the average true OPS of the 100 players is .780, the same as their sample OPS. There is no "regression to the mean". There is no value 2; therefore there is no movement from value 1. (Technically, we could say that value 2 is .780 also, the same as value 1, and that we regress value 1 "all the way" towards value 2.) The above comes from the following rule in sampling statistics:

When we sample a population (look at the 1-year OPS of 100 players) and we know nothing about the characteristics of the population, as far as the variable we are sampling (OPS) is concerned, the sample mean (.780) is the best estimate of the population mean.

Experiment #2 (answer):

The answer is that no matter what the sample OPS is (in this case .780), the true OPS of any and all players (including the average of our 100 players) is .750! This is simply because we are told that the true OPS of all players is .750! Any sample that yields an OPS of anything other than .750 MUST BE DUE TO SAMPLING ERROR, by definition. I told you this one was a no-brainer! In this experiment, there is "regression to the mean" (again, per my definition - re-read it if you forgot what it was). Value 1 (.780) gets moved towards value 2 (.750). It just so happens that we know exactly how much to move it (all the way). Value 2 is still the limit on how much we can move value 1 in order to estimate the average true OPS of the sample group of players. And in this case, value 2 is equal to THE MEAN OF THE POPULATION! How do you like that? In this experiment, regression to the "mean" is really to the "MEAN"!

Experiment #3 (answer):

Remember we still know the population mean (.750). This time, however, not only are we not told that all players have the same true OPS, we are told that they definitely don't. The answer is that the best estimate of the true OPS of our 100-player sample (with a sample average OPS of .780) is something less than .780 and something more than .750. We don't know the value of the "somethings", so there is no exact answer other than the above (given the information we have). So again we have "regression to the mean", with value 2 still being .750, value 1 still .780, and the amount of regression or movement unknown. The movement must be down, since value 2 is less than value 1, and the limit of the movement is .750, since that is the value of value 2. (We can actually estimate the amount of the movement given some other parameters, but that is not important right now - we are only interested in whether "regression to the mean" is appropriate in each of these experiments, and if it is, what is the value of value 2.) BTW, as in experiment #2, value 2 happens to be the population mean, so the expression "regression to the mean" is somewhat literal, although again, that is somewhat of a coincidence.

Back to some more experiments (leading up to Tango's)...

Experiment #4:

We know the average OPS of all players is .750 and we know that all players have the same true OPS. This time, however, we select X players, not randomly, but all those who had an OPS in 1999 greater than .850. Let's say that there were 25 such players and that their average (sample) OPS was .912 (for that 1 year).

What is the average true OPS of our 25 players? Again, easy (trick) answer! It is still .750, since you are told that all players have a true OPS of .750. Again, it doesn't matter what criteria we choose to select players or what any player's 1-year sample OPS is. All sample OPS's that differ from .750 are due to statistical fluctuation (sample error), by definition. Again, "regression toward the mean", where the "regression" is 100% and the "mean" (value 2) is the mean OPS of the population. So we have "regression to the mean" even though we did not choose a random sample of players from the population. We chose them based upon the criteria we set - greater than an .850 sample OPS in 1999.

Experiment #5 (same as Tango's):

Same population of players. It has a mean OPS of .750. Unlike the above experiment, each player can have a different true OPS. This time we only look at players who had a sample OPS of greater than .990 during the month of June in 2000. The average (sample) OPS of this group (say 50 players) is 1.120.

What is your best estimate of the true OPS of this group of players? (This question is of course exactly the same question as "What is your best estimate of what this group of players will hit in July 2000 or August 2000, not counting things like changes in weather, etc.?") Well what is it, and how do you arrive at your answer? Is your answer reasonable?

In order to arrive at your answer, was there any "movement" from the 1.120, like there was in the last experiment (the .912 had to be moved toward the .750 - in fact, all the way to it)? In which direction - up or down? Why? If you did move the 1.120 to arrive at a different value for the "best estimate of the sample players' true OPS", how much did you move it? How much should you move it? Is there a limit on how much you should move it? If you did move it, is there a value 2 that tells us in what direction to move value 1 (the 1.120)? How did you arrive at the value 2? Does this value 2 (like all value 2's are supposed to do) represent the limit of the movement? If not, why not, and what is the limit of the movement? Was there "regression to the mean" in arriving at your estimate of the sample players' average true OPS? If yes, what value represented "the mean"?

I'm not going to answer any of the above questions in the last experiment. If you answer them (and the others) correctly, you will know everything there is to know about Tango's and similar experiments and whether there is or is not "regression to the mean", in the generic sense, in determining sample players' OPS, no matter how we choose our sample (randomly or not), and what value is represented by "the mean", given that "regression" simply means "some amount of movement"...
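[Editor's note: MGL leaves Experiment #5 unanswered, but it can be simulated. Below is a minimal Monte Carlo sketch; the distributional assumptions (true OPS normal around the .750 population mean with an SD of .030, and one month of play adding noise with an SD of .090) are mine, not the thread's.]

```python
import random

random.seed(1)

POP_MEAN = 0.750
TRUE_SD = 0.030    # assumed spread of true OPS across players
MONTH_SD = 0.090   # assumed sampling noise in one month's OPS

true_sel = []      # true OPS of the players the selection rule picks
obs_sel = []       # their observed one-month OPS
for _ in range(200_000):
    true_ops = random.gauss(POP_MEAN, TRUE_SD)          # a player's true talent
    observed = true_ops + random.gauss(0.0, MONTH_SD)   # one month's sample
    if observed > 0.990:                                # MGL's selection rule
        true_sel.append(true_ops)
        obs_sel.append(observed)

avg_true = sum(true_sel) / len(true_sel)
avg_obs = sum(obs_sel) / len(obs_sel)
# avg_obs sits far above .750 (that's the selection bias at work), while
# avg_true sits only slightly above .750: value 1 must be regressed most
# of the way toward value 2 for a one-month sample, as MGL argues.
```

Under these assumptions, the selected group's true average lands between the population mean (value 2) and the observed average (value 1), which is exactly the "direction and limit" claim.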

November 8, 2002 - F James

First of all, your Experiment #5 is fundamentally different than Tango's. His requirement to enter his sample was that each player had to outperform the population as a whole by at least 25% for 3 years in a row. Your requirement is only that each player outperform the population for one specific month. ANY player can have an outstanding month, even Rey Ordonez. Thus, ANY player has a non-zero chance of showing up in your sample. Rey Ordonez has NO chance of showing up in Tango's sample, even if you simulated his performance over a thousand years. So your sample is fairly representative of the population at large. Tango's is NOT.

Not only are the Rey Ordonezes of baseball excluded entirely from Tango's sample (but not from yours), but I would go further and say that even average players are effectively excluded from his sample. As I said in my last post, "Yes, there will be a few ringers in any group of All-Stars, average players who got extremely lucky for one year. There may even be one or two who can do it 2 years in a row. But by requiring at least 125 performance for 3 consecutive years you have virtually guaranteed that only the very best will qualify for your sample."

Tango, you can help us out here. Of the 655 strings of players in your Study 3, what was their distribution of career performance scores prior to Year 1? That is, how many were under 100, how many were in the 100 to 109 range, 110 to 119, etc. And how about their career performance after Year 3?

November 8, 2002 - MGL

Well, Frank, not only did you not answer my last few questions, but it is obvious to me that you know virtually nothing about basic statistical principles (at least not enough to have an intelligent discussion about this kind of analysis).

Your statement:

"Rey Ordonez has NO chance of showing up in Tango's sample, even if you simulated his performance over a thousand years."

is of course patently wrong. Neither Rey Rey nor even my 13-year-old cousin has NO chance of showing up in Tango's sample. Everyone has SOME FINITE chance, no matter how infinitesimal. The tails of a bell curve reach out to an infinite distance.

That being said, the discussion (with you) is probably over. Nevertheless, my sample will contain slightly better than average players whereas Tango's sample will contain distinctly better than average players. We all know this of course.

Nevertheless, the answer to my questions will be the same whether we use my sample or Tango's sample. Both samples, of course, are non-random, which was the crux of your original criticism. Tango's is very non-random and mine is slightly non-random. At what point do you switch answers (that you do or don't use "regression to the mean" to estimate the sample's true lwts ratio or OPS, and that "the mean" is not the mean of all players)? You don't like "one week" because it encompasses most players (again, it is still non-random, as the worst players will tend to not have any or many one-week OPS's above .980, whereas the great players will have many - in fact, probably around half of all their weeks will be above .950)? What about 2 weeks? 6 months? One year? 2 years? 4 years, like in Tango's study? At what point do we no longer "regress to the mean of the population" in order to estimate true OPS or lwts ratio? Your argument is silly!

The reason I used one week is to make it obvious to you that even if we take a non-random sample of players from the population of all players, and our selection criterion is greater or less than a certain OPS or lwts ratio, the estimate of true OPS or ratio in that non-random group must be "less extreme" than the sample result. You know that; everyone knows that. The one-week experiment makes it obvious. You are just digging in your heels at this point, for whatever reason. Remember that what we use for value 2 (the so-called "mean") is simply the number that represents the limit of our regression.

Read this!!!!

Here is where you are getting screwed up, no offense intended. In the one-week experiment, you know intuitively that the true OPS for our sample players is actually somewhere near the mean of the population (a little higher), so you don't mind using the population mean as the number to regress towards (remember that number, value 2, just tells us DIRECTION and LIMIT). In Tango's experiment, you know intuitively that the sample group is mostly very good players with very high true ratios - which is true - so you don't like using the population mean as value 2. That's fine. The true OPS of Tango's sample players is nowhere near value 2 (the mean of the population); we still use it though to give us direction and limit - that's all value 2 is - remember? The concept of value 2 being the limit becomes silly when we choose mostly very good players of course, but we still use it to give us direction because it is the only KNOWN value that we have. In those extreme cases, we don't really NEED it, as we can just say that the true ratio of Tango's sample players is "something slightly less than 149", but we can also say that it is "149 regressed towards 105" (or whatever the mean ratio of all 5-year players is). That's all I have been trying to say.

Basically, Frank, once you establish value 2 as the direction and limit of your regression (and value 2 is always, by definition, the mean of the population), then you can decide how much to regress your sample mean (the 149 in Tango's case). Without doing a regression analysis or having other information about the distribution of true ratios in the population, you can only guess at how much to regress. In my case - the one-week guys - you regress a lot, which you know intuitively. Still doesn't change value 2, does it, even though you know intuitively that the final answer (the best estimate of the sample players' true OPS or ratio) is going to be close to value 2?

Let's take 6-month players. Intuitively, you know to regress less than you do for the one-week players, but still a lot. Again, we still use value 2, or the population mean, to tell us direction and limit. If you didn't use a precise value for value 2, you might make a mistake and regress too much! For example, if we did the same experiment and used one-month samples above an OPS of .980, you would know intuitively that their true OPS's were a lot less than that, right? Let's try to guess .850. Is that too high or too low? Without knowing the population mean and using it as our value 2 (the limit of the regression), we can't tell! If I told you that the population mean is .750, then you would say "Phew, that's not too low!" In fact, you would probably now say that their true OPS was around .780 or .800, wouldn't you? IOW, you need to have that value 2 in order to have SOME IDEA as to how much to regress! And that value 2 is always the population mean, arbitrarily and by definition, simply because you know that the true OPS of your sample of players cannot be less than that (since you selected your players based on a criterion of some time period of performance GREATER than the average).

How about 1-year samples above .980? We know to regress even less, but we still need a value 2 to give us SOME IDEA how much to regress and to make sure that we don't regress too much! When we get to 4-year samples, as in Tango's study, we don't all of a sudden say "No more regression!" We still regress, only this time a very small amount! Do we need to know what direction? Well, it's kind of obvious, since we know from our selection criteria that our sample includes more players who got lucky than unlucky. So we know to regress downwards. Do we need to know exactly what the limit of regression is - i.e., do we need to know value 2? No, of course we don't, but it doesn't change the fact that, like the 1-week, 1-day, 1-year, or 3-year experiments, there still is a value 2, which is still the mean of the overall population, and that technically that number establishes the direction and limit of our regression.

If in Tango's study you don't want to call it "regression to the mean", that's fine. Who cares? It is only semantics. If you want to call it "a little regression downwards" that's fine too. It doesn't change my discussion an iota! I hate when someone takes an "expression" (out of context), criticizes it (with valid criticism), and then uses THAT criticism to invalidate a person's whole thesis. Why don't you just say "I agree with your whole thesis (in fact, as I said before, there is no agreeing or disagreeing - I'm just parroting proper statistical theory back to you in my own words and relating it to these baseball experiments), but I don't like sentence #42"? It's like if I tell you the entire and correct theory of the origin of the universe (I know that it is debatable), and I finish off by telling you that the universe is contracting rather than expanding, and you tell me that I'm completely full of crap (political commentators do this all the time)!

Here's an example, BTW, of where knowing value 2, and making sure that it is the mean of the whole population, IS important, even when we take multi-year samples. Let's say we did the same experiment as Tango, but we select players who were only 5% above average for 2 straight years. Let's say that the average OPS for that sample was .788. Even though we are reasonably certain that our sample is made up primarily of above-average players (no Rey Rey's in this sample), do we need to know value 2 in order to "hone in on" how much to lower the .788 to arrive at an estimate of the true OPS of our sample? Sure! Let's say I don't tell you what the mean OPS of the population of ALL 3-year players (players who have played for at least 3 or 4 years) is. You might choose .775 as the best estimate of our sample players' true OPS. Whoah! If I then told you that the mean OPS in the population (of all 3-year players) was .776, you would know that you went too far! If I told you that it was .750, then you would know that you were in the ballpark (no pun intended). So while sometimes knowing value 2 is important and sometimes it is not (it is obvious in which direction to regress and it is obvious about how far), it doesn't change the fact that we always HAVE a value 2, and that it is always the mean of the population!
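[Editor's note: the sanity check in this example is easy to encode: whatever estimate you settle on must lie between value 1 (the sample mean) and value 2 (the population mean). A small sketch; the helper name is mine.]

```python
def within_regression_limits(estimate, sample_mean, pop_mean):
    """True if estimate lies between value 1 (sample_mean) and value 2 (pop_mean)."""
    lo, hi = sorted((pop_mean, sample_mean))
    return lo <= estimate <= hi

# MGL's numbers: sample mean .788; a guess of .775 looks fine if the
# population mean is .750...
print(within_regression_limits(0.775, 0.788, 0.750))   # True
# ...but if the population mean is actually .776, the .775 guess
# regressed too far:
print(within_regression_limits(0.775, 0.788, 0.776))   # False
```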

Whew....

November 8, 2002 - tangotiger (www) (e-mail)

F James: I think I wrote this already, but it might have gotten lost in all of MGL's explanation: the year before the 149,149,149 string was 141, and the year after the string is 142. Subsequent years drop off slightly from 142, and in fact match what you would expect from normal aging. (This will become clearer when I do the breakdown by age... eventually, whenever that is.)

Essentially, MGL's point boils down to: whatever period you take, however many players you take, however many PAs that performance makes up, you have to regress to some degree. The amount to regress is related to the variables I just mentioned. By choosing 1 day, we are regressing almost 100%; by choosing 5 years of performance between the ages of 25 and 29, where in each of those years the player has 1 googol PAs, you regress close to 0%. Everything in-between is subject to more analysis.

Given my sample of 3 years of 600+ players of about 500 PAs, the regression of the 149 player is 20% towards the mean of 115 to achieve the true talent level of 142 (more or less).
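[Editor's note: Tango's closing arithmetic can be checked directly; the variable names are mine.]

```python
observed = 149     # the 149,149,149 players' performance score
pop_mean = 115     # the mean Tango regresses toward
amount = 0.20      # 20% regression toward the mean
estimate = observed - amount * (observed - pop_mean)
print(round(estimate, 1))   # 142.2 - the "true talent level of 142 (more or less)"
```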