Tango on Baseball Archives

© Tangotiger

Archive List

Advances in Sabermetrics (August 18, 2003)

A BPro reader asked the following:
what do you think are the most important advances in baseball analysis in recent years?
and the BPro author answered:
Off the top of my head, and only the stuff that’s publicly available: First, Keith Woolner’s study on defining replacement level...

If I were asked, I would answer the following off the top of my head:
- DIPS (concept of separating fielding-dependent PAs from fielding-independent PAs)
- MGL's UZR (easily the most state of the art fielding analysis publicly available)
- Regression towards the mean and sample size (understanding that the observed does not equal the true, especially with only 50 PAs)

What do you guys have off the top of your head as the most important recent advances in baseball analysis?
--posted by TangoTiger at 10:45 PM EDT


Posted 1:44 a.m., August 19, 2003 (#1) - Greg Tamer(e-mail)
  I suppose we'll find out in a couple of months how well PECOTA did for this season. One season probably isn't enough to warrant a verdict on it, but I'm looking forward to seeing some results and how well it does versus other prediction models.

Also, since you're obviously not the self-accolade kind of guy, I'd say your research deserves mention. Off the top of my head - LI, your SABR 101, 201, 301 series, etc. I'm not sure how recent this stuff is, but you seem to be spitting out quality material on a weekly basis...more than everyone else combined.

Posted 7:15 a.m., August 19, 2003 (#2) - Tangotiger
  Greg, thanks for the kind words, as I wake up from a night of a crying baby. I don't think any of the stuff I did is a great advance... maybe I'm making a better hammer, a better tool, but UZR or DIPS is a building by comparison.

I think the next great advance would be to get "tools-based" analysis done properly. There's a ton of information in the heads of scouts that needs to be extracted, quantified, and put to more systematic and widespread use. Of course, this may already be done by others, and we just don't know about it.

Posted 8:35 a.m., August 19, 2003 (#3) - Hantheman
  I can't answer the question as posed very well, but since Tangotiger mentioned it, I think the next great advances will be

1 Proper assessment of catcher's defense
2 Determining pitcher injury probabilities based on over-use (and exactly what that means in terms of pitch counts, 4-man vs 5-man rotation)
3 Improvement in projections of hitter and pitcher development based on college and low minors stats

I myself have spent a lot of time on #1, and hope to make more progress in the next year

Posted 9:01 a.m., August 19, 2003 (#4) - tangotiger
  I've actually been working on #1. If you send me an email, I'll let you know what I've done.

Your #2 and #3 are great ones. I think #3 will be a huge advance if we can also incorporate scouting.

Posted 9:38 a.m., August 19, 2003 (#5) - Patriot
  Base Runs along with DIPS.

Posted 10:09 a.m., August 19, 2003 (#6) - Rally Monkey (homepage)
  I'd put reliever's leverage index up there. Before, we understood that a stopper's innings might have greater impact, but just how much more?

I don't think there's anything new about regression to the mean or sample size. For example, in spring 1984, did anyone mention Butch Davis (see above link) along with Dawson and Murphy as one of the best outfielders in baseball? If not, then there was an understanding of small sample size.

Posted 11:19 a.m., August 19, 2003 (#7) - tangotiger
  I meant more in say Brady Anderson or others with great 600 PA numbers, but not great 1800 PA numbers.

Essentially, and this happens in hockey too, you give a player a 4-year contract based on his first breakout year. And how did they know it was a breakout year? They were hoping/looking for it, and those 600 PAs confirmed it for them.

Unless something fundamental has changed about a hitter's or pitcher's approach, a breakout year is virtually impossible to identify. They exist, but I don't think you can find one until well after the fact. 1997 may have been a breakout year for Pedro, in that maybe everything finally came together for him at that time. But we only know that by 1999 or 2000, if we use only the numbers.

Was 1988 a breakout year for Mark Davis? Well, he did even better in 1989, and then phhft. If you do a systematic view, I would be surprised if you can find a breakout year, based only on the numbers.

Your visual tools might have spotted Pedro's breakout in 1997, and you might have realized that Mark Davis might have been getting lucky in 88/89.

Posted 12:12 p.m., August 19, 2003 (#8) - dlf
  Regression to the mean is certainly not a new advance in sabermetrics. I don't know who first identified the concept, but would suggest John McGraw's willingness to trade players after their big seasons was the same information presented in a less structured format. Bill James was writing about regression 20 years ago calling it the "plexiglass principle" and applied it to both teams and individuals.

Same with fluke years. Four decades ago, we talked about the '61 season of Norm Cash as a fluke, just as newer followers of the game did with Brady Anderson's 50 taters.

I think of things like DIPS, Woolner's Catcher ERA study, zone based defensive analysis such as ZR and UZR, as being significant changes. Arguably, measuring run production & prevention against team performance ala Win Shares is a significant change in direction, but I'm not sure the state of the art will follow that lead or will continue to try to isolate individual performance apart from team influence.

I expect the next step to be somewhere along the lines of the A's attempts at medical review of pitcher mechanics. I also doubt that the information will be divulged at a level (both quantitatively and qualitatively) at which we amateurs can participate and move the knowledge forward.

Posted 12:33 p.m., August 19, 2003 (#9) - tangotiger
  It's interesting that we've now got 2 readers saying such a thing, that this advance is 20 years old.

Let me ask a question then: all you know is the following 2 bits of information
- Johnny Walker has an OBA of .380, with 600 PA
- the league OBA is .340

What is your best guess as to JW's true OBA talent level? That is, if he were to have 1 million PAs, what's your single best guess as to his true OBA level? Are his chances of really being .380+ equal to, more than, or less than 50%?

Posted 2:39 p.m., August 19, 2003 (#10) - Ed
  Your best guess should be that JW's true level is .380.
To guess otherwise is to misunderstand regression to the mean.

Posted 2:48 p.m., August 19, 2003 (#11) - Patriot
  What if he had 100 PA? You still want me to guess .380? That's preposterous.

Posted 2:52 p.m., August 19, 2003 (#12) - tangotiger
  Ed, this is where we will have to disagree. Our best guess is that it is less than .380.

A player's performance numbers are a sample of his true talent. It is observed.

If I had 1000 such players, regression towards the mean would say that as a group, they would move towards .340. Therefore, if the group moves towards the mean, more than half of them have to move toward the mean (unless you think the moves are so asymmetrical that we can't say such a thing).

Little Johnny Walker: .380 OBA, 50 PAs
League: .340 OBA

What's your best guess there?
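Tango's selection-effect argument can be sketched with a quick Monte Carlo (the Normal(.340, .020) talent spread, the normal approximation to binomial PA noise, and the player count are all illustrative assumptions, not numbers from the thread):

```python
import math
import random

random.seed(1)

# Assumed league: true OBA talent ~ Normal(.340, .020); a 600-PA sample
# adds binomial noise, approximated here with a normal draw.
LEAGUE_MEAN, TALENT_SD, PA = 0.340, 0.020, 600

selected_true = []
for _ in range(200_000):
    true_oba = random.gauss(LEAGUE_MEAN, TALENT_SD)
    sample_sd = math.sqrt(true_oba * (1 - true_oba) / PA)
    observed = random.gauss(true_oba, sample_sd)   # what the stat line shows
    if observed >= 0.380:                          # the ".380 hitters"
        selected_true.append(true_oba)

mean_true = sum(selected_true) / len(selected_true)
print(f"{len(selected_true)} players observed at .380 or better")
print(f"average true OBA of that group: {mean_true:.3f}")
```

With these assumptions, the group observed at .380+ always averages a true OBA well below .380 (and well above .340), which is the sense in which more than half of them have to move toward the mean.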

Posted 3:14 p.m., August 19, 2003 (#13) - Rally Monkey
  I agree that the best guess would be less than .380. How much regression is correct? That's a tougher question to answer.

I'll guess .367 is the true ability for Johnny Walker and .344 for Little J. I don't know if the amount of regression I used is correct, but it's what I use to come up with next-year stats in my APBA league:

Full time players: 2/3 prior ability + 1/3 current season
150-400 PA: 4/5 prior + 1/5 current
50-150 PA: 9/10 prior + 1/10 current
under 50 PA: use prior stats.

In this case I sub the league average for prior ability, since it's unknown.

Throw in an age adjustment, and this has come up with realistic player progressions since 1991.
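Rally Monkey's blending rule can be written out as a small function (reading "full time" as 400+ PA, which his table leaves implicit, and substituting the league average for the unknown prior ability, as he describes):

```python
def project_oba(observed, pa, league=0.340):
    # Rally Monkey's rule of thumb: blend prior ability (here, the league
    # average) with the current season, weighted by playing time.
    if pa >= 400:        # full-time players: 1/3 current + 2/3 prior
        w = 1 / 3
    elif pa >= 150:      # 150-400 PA
        w = 1 / 5
    elif pa >= 50:       # 50-150 PA
        w = 1 / 10
    else:                # under 50 PA: use prior stats only
        w = 0.0
    return w * observed + (1 - w) * league

print(round(project_oba(0.380, 600), 3))  # Johnny Walker, 600 PA -> 0.353
print(round(project_oba(0.380, 50), 3))   # Little Johnny Walker, 50 PA -> 0.344
```

For Little Johnny Walker this reproduces his .344; for the full-time Johnny Walker the 1/3-2/3 weights give about .353, a bit more regression than his .367 guess.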

Posted 3:30 p.m., August 19, 2003 (#14) - dlf
  Well, obviously Johnny Walker Black regresses much more to the mean than does Johnny Walker Blue and clearly Johnny Walker Red is the inferior performer to both.

OK, leaving the Harry Caray liquid diet plan.

  Tangotiger, I can't answer your question. There is a formula (or series of formulae?) which describes the mathematical concept of regressing to a population mean, where that mean and the standard deviation from it are known and there is an observed sample such as the one in the example you set out. I don't know it, and am unwilling to dig through college textbooks that have been gathering dust since Fernando Valenzuela was the best pitcher in baseball (or, to carry the original idiom forward, since well before my most recent bottle of Glenlivet purple was laid down in oaken casks). But the concept of regression as it applies to baseball players and teams is not new.

I don't discount the importance of regression. Rather, I point out that it has been observed and intuitively understood for generations. Further, the saber-community has used it for decades, including James' "Plexiglass Principle" applied to teams. Also, the Brock2 system released in 1985 used a series of interlocking formulae to shape a player's projection based both on that player's own performance AND on league-wide norms. I'm willing to bet that James wasn't the first to publish articles or books which implicitly or explicitly account for regression; those two are just the first examples that pop into my head.

Posted 3:31 p.m., August 19, 2003 (#15) - tangotiger
  Rally, yup it would have to be under .380. And you are right, the second part, the degree of movement would be dependent on the number of PAs.

For 600 PAs, I'd guess a 30% regression on OBA, or .368. For 50 PAs, I'd guess an 80% regression, or .348. Just some guesses, and a little ingenuity really. Each component has its own regression factor based on # of PAs.
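The arithmetic here is just a linear pull toward the league mean; as a sketch:

```python
def regress(observed, league, pct):
    # Pull the observed rate pct of the way back toward the league mean.
    return observed + pct * (league - observed)

print(round(regress(0.380, 0.340, 0.30), 3))  # 600 PA guess -> 0.368
print(round(regress(0.380, 0.340, 0.80), 3))  # 50 PA guess  -> 0.348
```

The regression percentage itself is the part that depends on sample size and on which stat is being regressed.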

Posted 3:36 p.m., August 19, 2003 (#16) - tangotiger
  I agree that James did do a good job of it. But I'm not so sure he understood WHY he was doing it, nor do I think the stat fan really understood the implication of all this. Even fans on this site will quote you the 11-20 Shawn Green has against Bacardi Rum, and that this means something of any significance.

James also "invented" replacement level, so I'm not sure what Gary H was talking about with Woolner's advances on that. Comparing the two concepts James invented, replacement level and regression towards the mean, along with the extra knowledge that Woolner has added to the former and the extra knowledge about regression we've all gained from the statisticians around, I think the regression issue saw the bigger advance.

Anyway, sorry for manipulating the topic. Any other advances, or advances-to-come?

Posted 6:28 p.m., August 19, 2003 (#17) - Scoriano
  Any other advances, or advances-to-come?

Practical hydrogen fuel cells will replace the standard transmission and delivery power grid as energy is relocalized as in the pre-industrial state of affairs.

On the baseball front, I imagine efforts will be made to marry the improved statistical insights with the scouting and player development efforts in Hispanic jurisdictions. If the statistical insights are seen as valuable, then creating a regime in foreign countries that will allow those insights, and not only traditional scouting, to be applied to players would be a great advance.

Posted 6:39 p.m., August 19, 2003 (#18) - David Manning
  Could someone here at Primer just state the policy that you're not only not going to link to BP articles, free or not, you're not even going to name those guys unless you can bash them or get free publicity? It's as disgraceful as the Akbar and Piazza posts. In a thread that regularly cites BP's people and stats, it's disingenuous not to credit them.

And as non-sabermetric as Carroll's stuff is on the injury side, his bringing it out for open discussion has to be considered a pretty positive advance. Whoever figures out how to do it objectively will be the Bill James of the field while Carroll will probably be remembered only by the purists and Pete Rose fans.

Posted 6:51 p.m., August 19, 2003 (#19) - Patriot
  Yeah, mentioning what somebody said and then using that as the starting point for a discussion is really disgraceful.

Posted 6:52 p.m., August 19, 2003 (#20) - Scoriano
  Could someone here at Primer just state the policy that you're not only not going to link to BP articles, free or not,

Primer did so a few times last week and has done so many other times. The links to the breaking of the Rose story seemed to me to be intended to disseminate potentially important news broken by BP.

Gary H and Woolner are credited by Tango in the lead in and/or thread.

This thread is not intended to link to the article as news, but to start a discussion of an idea that Tango thought worth putting out for further discussion, and not to become a per se critique of the article. I think that is fair.

Also, what stats are referred to that are BP's and not credited?

I am not sure what it is that seems to be bugging you.

Posted 8:50 p.m., August 19, 2003 (#21) - Tangotiger
  I get these insulting messages every now and then, so I suppose I should address them every now and then.

Off the top of my head, I linked to a BP article regarding sample size, the Red Sox, and BIP, and I praised them for taking the time to take a full paragraph to explain the limitations.

Nate had an excellent piece on replacement level, and I praised that too.

Sheehan had an article on pitcher workloads, and I can't remember exactly what I said, but I think I was complimentary.

So, that was probably in the last 2 or 3 months.

Over at Clutch, I linked to the Jack Morris reprint article of Joe Sheehan. I remember another link to Craig Biggio, and then another to Andruw Jones too.

I'm pretty sure I had a direct link back to the article.

Oh yeah, I think I linked to a Randy Johnson quote that Ryan Wilkins had, but I don't remember if I did link to them.

I've made, I think, 2 links to Keith Scherer on ball-and-strike counts, though I distinctly remember at least one.

Ok, so I didn't put a direct link in the title to Brooks Kieschnick, but I did reference it in my comment, and it was not even germane to BP. I just liked the idea of dual players. And same here. I just like the question that the BP reader had. Do I have to link to the whole chat?

And finally, I make some tongue-in-cheek comments about "unnamed authors" at BP with their TP series (no bylines). So, I intentionally didn't name Gary in my opening piece as a play on that. But, who cares anyway?

Now, is that satisfactory to you?

In terms of BP and Primer referencing each other, I have to say that there are 100 BP links from Primer to every 1 Primer link from BP. If you want to ask about policy, don't ask me.

Now that you've p-ssed me off, and taken 10 minutes out of my life, I'm going to play with my kid now. Why do I waste my time? Sheesh...

Posted 10:10 p.m., August 19, 2003 (#22) - Scoriano
  Tango, don't waste your time. You'll get a much better return on investment of your time and effort from your kid, and the silliness of these criticisms will be all the more apparent from just one junior primate's smile.

Posted 12:32 a.m., August 20, 2003 (#23) - Ed
  I think we have a conflation of concepts here. The question, as originally posed, was:

"Let me ask a question then: all you know is the following 2 bits of information
- Johnny Walker has an OBA of .380, with 600 PA
- the league OBA is .340

What is your best guess as to JW's true OBA talent level? That is, if he were to have 1 million PAs, what's your single best guess as to his true OBA level? Are his chances of really being .380+ equal to, more than, or less than 50%?"

Now, we are not all being consistent in how we are using the term "true talent level." I am using it in the traditional frequentist sense. Call JW's true talent level T. I assume the 600 PA are a random sample of PA for JW. From that sample, we calculate an estimate of T, T*. Sampling theory tells us that the sample mean is an unbiased estimator of the population mean. That is, your best guess of T is T*. It is not T* plus or minus some value, depending on whether T* for JW is above or below the average OBA of a whole bunch of players. When we calculate T*, we can also calculate SE(T*), the standard error of T*, which reflects our level of uncertainty. T* is our best guess for T, whether the number of PA is 600 or 100. The standard error will be much larger for the latter, of course.

But what about the league average? Let's call it MLB-T to distinguish it from JW-T. Everyone wants to factor that in for our guess for JW-T, but unless one is willing to make a set of complicated dependence assumptions wrt the connection between the random draws for JW-T and MLB-T, MLB-T is irrelevant, given the question as posed, though one could go a bayesian route and make some auxiliary assumptions. One could assume that the league OBA is always around .340, and treat the problem as an updating problem. We start with a prior for any given player, JW included, of .340. Then we update our beliefs about a player's ability (but not "true ability", which doesn't work very well in a bayesian framework) as information (PAs) comes in. At the end of the 600 PAs at a .380 level, we will have updated our beliefs to be somewhere between .380 and .340. With more PAs at .380, we will get closer to .380. And this is not regression to the mean. It's just a way to work MLB-T into the calculations.

I think people want to work in regression to the mean because of the well known regression effect (usually credited to Galton) that leads to declines above the mean and increases below the mean, given non-perfect correlations. If you want to forecast OBA(t+1) from OBA(t), the regression forecasts for players above the mean at t will tend toward the mean (on average) and players below the mean at t will have forecasts that will tend toward the mean. But the forecasts for t+1 are not the same thing as the best guess of the true values for individual players. How could they be? If there is measurement error or other random errors that lead us to believe that one set of 600 PA is only an estimate of a player's true value, we should believe the same thing about any set of 600 PA, right?

To sum up a windy post, I agree with what people are saying when they say that our forecasts will be somewhere between .340 and .380 (because of the regression effect). I disagree that that is the same thing as making a best guess as to JW-T, given the way the question was posed. For that, I'll lean on sampling theory and feel quite safe.
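The Bayesian updating route Ed describes can be sketched with a beta-binomial model (the Beta prior and its 200-PA-equivalent weight are illustrative assumptions; nothing in the thread fixes them):

```python
# Assumed prior: a Beta distribution centered on the league .340 with a
# weight of 200 "PA-equivalents" -- the 200 is an illustrative choice.
PRIOR_MEAN, PRIOR_PA = 0.340, 200
alpha0 = PRIOR_MEAN * PRIOR_PA          # 68 "times on base" worth of prior
beta0 = (1 - PRIOR_MEAN) * PRIOR_PA     # 132 "outs" worth of prior

def posterior_mean(times_on_base, pa):
    # Beta-binomial conjugate update: posterior mean after pa trials.
    return (alpha0 + times_on_base) / (alpha0 + beta0 + pa)

# Johnny Walker: .380 over 600 PA = 228 times on base
print(round(posterior_mean(228, 600), 3))    # -> 0.37
# Ten times the evidence at the same .380 rate
print(round(posterior_mean(2280, 6000), 3))  # -> 0.379
```

As Ed says, more PAs at .380 pull the updated estimate ever closer to .380; the .340 prior matters less and less.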

Posted 10:13 a.m., August 20, 2003 (#24) - tangotiger
  But the forecasts for t+1 are not the same thing as the best guess of the true values for individual players. How could they be? If there is measurement error or other random errors that lead us to believe that one set of 600 PA is only an estimate of a player's true value, we should believe the same thing about any set of 600 PA, right?

  Actually, the way I worded the question, by saying "1 million PAs", was a way to say "don't worry about random variations that will exist in all PAs, as they will cancel out, at least to the 3 significant digits, after 1 million PAs." I actually don't know if 1 million PAs will give me something less than +/- .0005 99.999% of the time, but whatever "1 million" should be, that's what I meant. I should have used the safe Austin Powers "100 billion".

Even if you knew that his true OBA was .368, that doesn't mean that in the next 600 PAs he will perform at .368, just that his performance will center around that, with some distribution around it. Just like flipping a coin.

Anyway, I'm using "true rate" and "future estimated rate, with five 9s, within .0005" interchangeably, even though, technically that is inaccurate.

Posted 11:20 a.m., August 20, 2003 (#25) - Jim R
  Tango,

Maybe I'm missing something because there are other factors that you and the more learned posters here are using, but I'm not sure I understand how regression to the mean would be a recent advance either.

Let me take your two examples:

Little Johnny Walker: I wouldn't have a clue based on this information alone as to what his talent level would be.

Johnny Walker: I still wouldn't venture to guess, but my best guess would be .340+ in that context.

The reason that I point this out is that, I would offer, the evaluation in this context is not based on simple regression to the mean. True, in large sets of data, the data will typically take on a normal distribution, but I don't see the applicability of this concept for individual or isolated players. Johnny Walker may be under- or over-performing his true talent level with the .380 OBA, and we have no empirical method of determining this from that data alone.

What I have seen many of you talented researchers do is look for strongly correlated points of data that show consistent indicia of talent. That is, attributes that are not likely to wildly fluctuate from year to year.

For example, you could give me and A-Rod four at bats against an A-ball pitcher, with these results:

Arod 1: Liner caught on the warning track.
Arod 2: Dinger
Arod 3: Missile right at a deep-playing center fielder.
Arod 4: Popup.

Me 1: The Whiff
Me 2: The whiff
Me 3: Squibber that rolls down the 3B line and goes for an IF single
Me 4: The Whiff

This is a scenario that would not astound probability. We would both have an OBA of .250, the league average could be .330, and you still aren't able to determine that A-Rod has a talent way above .330 and I have a talent way below .330.

Replace me in the extreme example with Tony Womack for 60 AB. Replace Womack with Furcal in 800 AB, and you still end up with the same scenario.

Now what you may be able to do is find attributes that are constant despite a small sample. For instance, in the me-and-A-Rod example, you may be able to determine bat speed, approximate exit velocity of the baseball, and overall athleticism.

In the Womack example, you may also use (1) and (2), but they will be pretty close in (3) for just an observer without other instrumentation. Also this skill may reach an apex without the use of additional instrumentation. You may add other observation variables, such as ability to make contact on different pitches, swings at good and bad pitches, etc..

In the Furcal example, it's entirely possible that my (1), (2) and (3) aren't noticeable without instrumentation. (I know A-Rod hits it harder, but bear with me.) Some of the Womack observation criteria also start to break down, but some may still let you make a meaningful determination. Also, at this point it becomes a little more difficult to observe and remember, so you add instrumentation and start recording results. At this level, you can probably draw out meaningful differences by using things such as slugging and walk rate.

In essence, the more data you have, the better able you are to discern true ability from past performance. Each time your data set increases, you start to include measures that are relatively constant within those intervals.

Nevertheless, regression to the mean does not help you better predict any single player. If you give me a data set that includes all players who overperformed the mean for a specific interval, I can guess for each player that their performance will drop. In doing this I would be right more times than I am wrong. If I can make bets or gain credibility by doing this, I would derive income and prestige.

Yet, using regression to the mean would not help me with any given player. I will be right more times than I am wrong if I guess their performance will drop, but it doesn't help me in any way determine what that player's true ability level would be.

Baseball performance is entirely different than running a simulation on a widget. When I run a simulation on the widget, I know that its true performance level is constant when I begin the exercise. Each subsequent rerun of the simulation can allow me to determine a different mean, and a level of confidence in this being the true performance level.

A single baseball player is different, because (1) his true performance level will differ over time with variables that are difficult to account for, and (2) the bases of comparison are other baseball players who themselves have differing true performance levels.

For instance, if you pose the following hypothetical (which is closer to Brady Anderson):

Johnny Walker Blue in 600 AB- .380 OBA
Johnny Walker Blue career in 1800 AB - .360

The model is closer to where you can use regression to the mean to determine Johnny Walker blue's performance in his next 600 AB.

Correct me if I'm missing something.

Posted 12:00 p.m., August 20, 2003 (#26) - tangotiger
  I don't think you are wrong on anything.

In essence, the more data you have, the better able you are to discern true ability from past performance.

I agree with this statement especially.

And for the regression towards the mean, I agree that the larger your group, the better you will be able to estimate the group's true mean by applying the appropriate regression factor, but as the group gets smaller and smaller (down to a group of 1), your confidence in that regression becomes smaller and smaller. While you may guess that that single player's OBA would be .368, that's really just an average estimate centered around .368 that could be anywhere between .310 and .440, with various probabilities for each point. In fact, even that exact point (.3680000000) is impossible.

The likelihood is that JW IS a below .380 true hitter, but I might only be [insert appropriate number here]% sure of that.

And like you mentioned, the more data you have (whether more "n", or more description of each "n"), the more reliable your estimate can be.

Posted 12:08 p.m., August 21, 2003 (#27) - studes (homepage)
  I think Ed raises a good point here. "Regression to the mean" isn't exactly what's going on, unless you know the player's true mean, right? Without it, we're using the league mean to estimate the player's true mean. And that's one step removed from regression to the true mean.

The more information we have about a player (i.e. the longer he has played in the majors), the better we can estimate his true mean, particularly if we adjust for age. Barry Bonds is the example I'm thinking of here: I'm guessing that his current season shows some regression to his own true, age-adjusted mean, rather than the overall league mean.

I do think that UZR and DIPS are the biggest, most recent advancements in sabermetrics, and the most important underlying "technological" trend in the area is wider access to and use of pbp data. Tango, I also think that your leverage index is a great advancement.

And I want to echo dlf's point: I'd like to see more analyses done within team contexts. I think tango and mgl intend to get there with slwts.

Posted 12:41 p.m., August 21, 2003 (#28) - tangotiger
  A player will regress 100% towards his true mean.

A group of players will regress [insert number]% toward the population mean, from which they were drawn.

Regression towards the mean is the second case, and not the first.

We are trying to estimate the group mean as best we can by looking at the observed mean of the group, the observed mean of the population, and regressing a certain amount based on other factors (correlation between the two samples of the population).

If we already knew the individual player's true mean, we wouldn't need regression towards the population mean. We already know his true mean.

Posted 12:59 p.m., August 21, 2003 (#29) - studes (homepage)
  Thanks, Tango. I really should go back to school. Just because I was last there 20 years ago is no excuse!

Posted 7:32 p.m., August 21, 2003 (#30) - David Smyth
  "A player will regress 100% towards his true mean."

What does this mean? The way it is worded, it seems silly. Bonds has regressed from 2001/2002, but who is to say that his 2003 performance is his true mean? I assume that I am misunderstanding that statement, but it needs to be clarified.

Posted 11:01 p.m., August 21, 2003 (#31) - Tangotiger
  His 2003 performance is a sample of his true mean.

We cannot ever know a player's true mean. We just know that every single day he is performing at his true talent level, which differs day-to-day based on his conditioning, state of mind, etc.

I was just pointing out that regression is towards the population mean.

There's no need to think of regression towards his own mean, since, by definition, he will always play to his own mean.

Posted 7:25 a.m., August 22, 2003 (#32) - studes (homepage)
  Did you see that Will Carroll mentioned "regression to the mean" in his August 21 column? Here's the quote:

"Luck--or whatever you want to call it--tends to even out. Perhaps injuries have a regression to the mean, but the explanation is secondary to the result."

I'm not exactly sure what he means, but it's nice to see the intent.

Posted 7:38 a.m., August 22, 2003 (#33) - Tangotiger
  To answer David directly, yes it was silly of me to say that you regress 100% to your own mean, and I probably made that more confusing than it was. My post 31 hopefully makes that clearer.

As for the Carroll statement, I'm reading and reading it, and I'm not sure what he's after there. That there's a large luck component to getting injured, and that other than your personal history and maybe position, there's not much more to it than that? Probably.

Posted 8:32 a.m., August 22, 2003 (#34) - Ed
  "A player will regress 100% towards his true mean."

I know what you, um, mean, Tango, but I think a better way to put this to avoid confusion is to say that in the long run, players converge to their true abilities. This can be justified by the law of large numbers, or what some people call the law of averages. It also provides the underpinning for estimating a population mean with a sample mean.

Our best guess for a player with 9791 PA, an OBA of .482 during a period of LgOBA of .356, say, is that his true OBA is .482, not a little less than .482. For these types of problems, think convergence, not regression.

Posted 9:59 a.m., August 22, 2003 (#35) - tangotiger
  9791 PA is not enough, and our best guess is that his .482 OBA IS a little less, probably regressed 5 to 10% towards the population mean, or .470 to .475 or so.

A player will always play to his true mean for every play, and this mean will be different play to play. As his sample number of plays approaches infinity, his average performance level in those plays will approach his average true talent level over that time span of plays.

So, I should never have said the "100% towards his own mean". I just meant the above paragraph.

So, a player himself does not regress towards the population mean, but we regress his sample performance towards the population mean to infer as a best guess what his average true talent level was over that time span.

Posted 2:17 p.m., August 22, 2003 (#36) - Rob Wood
  Yes, this issue is confusing due to the different approaches associated with classical statistics versus bayesian statistics. Ed's post #23 does a good job describing the different approaches.

I have always taken "regression to the mean" to be related to the bayesian updating approach. That is, updating our best guess as to the player's true ability level, taking into account league average, the player's observed average, and his number of plate appearances.

Lastly, I think that 10,000 plate appearances has got to be enough sample for us to be pretty darned sure of the player's true ability. I cannot believe that even with 10,000 PAs, we would need to regress 5 or 10% to the league average. I'll try to dig up the standard updating formulas in this case to see what the regression pcts are at different sample sizes.

Posted 4:03 p.m., August 22, 2003 (#37) - tangotiger
  The regression factor would be different for various events. For example, with OBA, the year-to-year r might be .70 for 600 PA, and therefore, you would regress 30% towards the mean. But, the year-to-year r might be .50 for BA, and so you regress 50% towards the mean. Each metric has its own regression factor.

My guess is that at 10,000 PAs, the OBA needs to be regressed between 3 and 10%, while the BA needs to be regressed between 5 and 15%, towards the population mean.

Rob, I'm looking forward to your results.
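To make the arithmetic concrete, here is a minimal sketch of that per-metric regression. The r values are the year-to-year correlations quoted above, and the observed and league rates are made-up illustrative numbers:

```python
# Regress (1 - r) of the way from the observed rate back to the league rate,
# where r is the year-to-year correlation for the metric at this sample size.
def regressed_estimate(observed, league, r):
    return r * observed + (1 - r) * league

# OBA with r = .70 (regress 30%); BA with r = .50 (regress 50%).
oba_est = regressed_estimate(0.400, 0.340, 0.70)
ba_est = regressed_estimate(0.320, 0.270, 0.50)
print(round(oba_est, 3), round(ba_est, 3))  # 0.382 0.295
```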

Posted 4:12 p.m., August 22, 2003 (#38) - Jim R
  "A player will always play to his true mean for every play, and this mean will be different play to play. As his sample number of plays approaches infinity, his average performance level in those plays will approach his average true talent level over that time span of plays."

I was with everyone until this clarification by Tango. So, I'll do what I always do, when I'm confused--I'll try to impose my own language and see if you guys will tell me where I'm wrong.

First, I think we have two different means that we are talking about

(1) Performance Mean - This is what it is: a player's mean performance in some category over some interval. You can adjust the performance mean for other factors, but the performance itself is fixed. By definition, this is a backward-looking measure, and thus it does not regress. We can hypothesize that it will regress in the future (more on this later). When Tango said "...100%...", he was referring to a performance mean with an interval of 1. It's reasonable to presume, through documented evidence, that the performance statistics of all major league baseball players take on a normal distribution over the interval of one season, after those statistics are adjusted for the most egregious factors affecting our performance instrumentation (e.g., park).

(2) Ability Mean - This is a hypothesis/theorem/law (hereafter hypothesis), and is the primary tool we use for making projections. We hypothesize that a player has a true talent level within any interval. We hypothesize that the number of variables affecting ability mean is not so large or egregious as to keep the ability means of all major league players within a single season from taking on a normal distribution.
By definition ability mean can include an interval for future events. We do not have the instrumentation to measure ability mean, but pursuant to the law of large numbers, we hypothesize that ability mean and performance mean converge for a single set of data when the interval becomes large enough. While it is true that a player will have the same ability mean for any interval (not necessarily 1), we are not able to observe what this mean may be.
Because of our previous hypothesis, we can presume that ability mean and performance mean for all major league players over the course of a season are equal within a tolerable degree of error.
However, we hypothesize that the data set for a single player in one major league season is not large enough where ability mean and performance mean converge.
Rob hypothesizes that 10,000 PA is an adequate sample for ability mean and performance mean to converge for a single player. Tango says that it is not large enough to make this judgement.
If we pick a random player with no performance data and do not have any other information about their performance level, we can say with x degree of confidence that this player's ability mean will match the league ability mean, which equals the league performance mean. Thus we can project his performance mean to be equal to the league performance mean.
If we pick a random player with some performance data, we can say with y degree of confidence that his ability mean is the same as his performance mean. Yet because the numbers are too small, y is usually going to be too small for us to have any degree of confidence.
Instead, we can start with a guess of x degree of confidence that his ability mean=league ability mean=league performance mean. We can use his performance measures to adjust our hypothesized ability mean with actual performance numbers. As we do this, we will converge on a number within a tolerable degree of confidence of the player's ability mean.
So the only remaining questions would be:

(1) At what number of events does the performance mean converge on the ability mean? It seems we are in agreement that it is greater than 600 but less than 450,000.
(2) When/how do we adjust for the time-dependent variables in each mean (age, era, etc.)?

Posted 6:51 p.m., August 22, 2003 (#39) - Tangotiger
We regress a player's sample performance mean towards his population mean as an estimate of his true talent mean, such that we minimize the error for the group.

A player's sample performance was done at discrete points over a certain amount of time (days, years, etc). His true talent level at those discrete points was not constant (since he is human... except maybe Bonds).

So, seeing how he performed, seeing how you can account for his context, and seeing how the population (i.e., the average player) does in that context, you're now ready to regress a group of players similar to your player a certain amount towards the mean.

*****
A true .400 player, over 10,000 PAs will have a stdev of .005. So, that true known .400 player will be between .390 and .410 95% of the time over 10,000 PAs (hopefully I'm doing this right, going from true to observed and not the other way around.... been a long day for me too).

Get 100 times more PA, and the standard deviation shrinks by a factor of 10, so that .005 becomes .0005. So, at 1 million PAs, your true talent and your performance levels converge (to within +/- .0005, 95% of the time).

I think.
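A quick sketch of the binomial arithmetic behind this (the .400 talent level is the example from the post):

```python
import math

# Standard deviation of an observed rate around a true talent level p,
# over n binomial trials: sqrt(p * (1 - p) / n).
def sd_of_observed(p, n):
    return math.sqrt(p * (1 - p) / n)

print(round(sd_of_observed(0.400, 10_000), 4))     # 0.0049
print(round(sd_of_observed(0.400, 1_000_000), 5))  # 0.00049
```

A hundredfold increase in PA shrinks the spread by sqrt(100) = 10, which is the factor in the post.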

Posted 10:39 p.m., August 22, 2003 (#40) - Tangotiger
  To confirm the above numbers, I ran a sim, where I gave my true .400 OBA player 10,000 PAs each season for 600 seasons. The standard deviation was .00495.

Our expectation was sqrt(.4*.6/10000) = .00490

So, 10,000 PAs is not enough to say that performance ~ true talent.
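For anyone who wants to reproduce the check, here is a sketch of the same simulation (seeded for repeatability; the exact stdev will vary a little with the seed):

```python
import random

random.seed(1)
TRUE_OBA, PA, SEASONS = 0.400, 10_000, 600

# Simulate 600 "seasons" of 10,000 PA each for a true .400 OBA player.
rates = []
for _ in range(SEASONS):
    on_base = sum(random.random() < TRUE_OBA for _ in range(PA))
    rates.append(on_base / PA)

mean = sum(rates) / SEASONS
sd = (sum((x - mean) ** 2 for x in rates) / SEASONS) ** 0.5
print(round(mean, 4), round(sd, 5))  # sd lands near sqrt(.4*.6/10000) = .0049
```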

Posted 12:10 p.m., August 25, 2003 (#41) - Sylvain(e-mail)
So, back from holiday; off the top of my head:

Major improvements (and a great thanks to those who contributed):
DIPS, LI, PAP and pitch counts, Replacement Level; as the DIPS Solved thread showed, better "usage" or "understanding" (from me at least) of stats (regression to the mean...) (which implies I'll have to dig my college books back out); pbp data usage...

Future improvements (on top of my dreams):
Linking PAP and biomechanics (delivery type, pitch thrown...), DIPS and pitch-by-pitch data, catcher's influence on the game, protection and hustle, and who deserves the ROY award? (I hope I'll be among those who participate)

Sylvain

Posted 3:43 p.m., August 25, 2003 (#42) - Robert Dudek
  Base Runs, Leveraged Index, DIPS, Win Shares.

One question, Tango. Doesn't it make more sense to regress to an individual player's established norms rather than the league average?

E.g. Suppose Sammy Sosa's HR/AB rates are

year 1: 600 AB, 8% HR
year 2: 650 AB, 7 % HR
year 3: 630 AB, 7.5% HR

year 4: 500 AB, 9% HR

Let's assume, for the sake of simplicity, that the league average for the period as a whole is 3% (homeruns) of ABs. The evidence suggests that Sosa's "true" homerun rate for year 4 is much greater than 3%. Regressing to something like 7.5% would make more sense, wouldn't it?
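One way to sketch Robert's suggestion: pool the observed rates by AB, then regress toward whichever population you think the player was drawn from. The AB/rate pairs and the 3% league mean are from the post; the subgroup mean and the weight r below are made-up illustrative numbers, not estimates:

```python
# Sosa's hypothetical seasons from the post: (AB, HR/AB).
years = [(600, 0.080), (650, 0.070), (630, 0.075), (500, 0.090)]

total_ab = sum(ab for ab, _ in years)
pooled = sum(ab * rate for ab, rate in years) / total_ab  # ~.078

league_mean = 0.030     # all hitters, per the post
subgroup_mean = 0.075   # e.g. power-hitting RF (hypothetical)
r = 0.90                # illustrative weight on the observed rate

est_league = r * pooled + (1 - r) * league_mean
est_subgroup = r * pooled + (1 - r) * subgroup_mean
print(round(est_league, 4), round(est_subgroup, 4))  # 0.0732 0.0777
```

The choice of prior only moves the estimate by how much weight the prior gets, which shrinks as the sample grows.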

Posted 3:59 p.m., August 25, 2003 (#43) - David Smyth
  You're not supposed to regress "to" the mean; you're supposed to regress "towards" it.

Posted 4:01 p.m., August 25, 2003 (#44) - Robert Dudek
  I guess the question I'm asking is: why use league average performance as representative of the "population a player was drawn from"? Shouldn't we form a sub-group, say power hitting outfielders, as more representative of the population Sammy Sosa was drawn from?

Posted 4:54 p.m., August 25, 2003 (#45) - tangotiger
Yes! You can use anything you want, really, as long as you specify your criteria not based on the numbers you are measuring. That is, is Sosa a power hitter because the variable you are studying says so about him?

But, yes, you can select RF who swing hard and are bulky and make that your representative population, and draw Sammy from there.

Posted 11:52 p.m., August 25, 2003 (#46) - Rob Wood
  I programmed the bayesian updating formula and here are the results.

At bats Pct to regress toward league average (prior mean)
500 38.9%
1000 24.1%
2000 13.7%
3000 9.6%
4000 7.4%
5000 6.0%
6000 5.0%
7000 4.3%
8000 3.8%
9000 3.4%
10000 3.1%

Hope this helps.
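As a check, Rob's percentages match the familiar shrinkage form C / (C + AB) with C around 318 AB. The constant is back-solved from his table here, not something he stated:

```python
# Regression pct toward the prior mean, as a function of sample size.
C = 318  # inferred "prior weight" in AB, back-solved from the table above

def regress_pct(ab):
    return 100 * C / (C + ab)

for ab, posted in [(500, 38.9), (1000, 24.1), (5000, 6.0), (10000, 3.1)]:
    print(ab, round(regress_pct(ab), 1), posted)  # computed vs. posted agree
```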

Posted 7:50 a.m., August 26, 2003 (#48) - Hantheman
Of course, most of the answers re: Walker's sample OBA of .380 assume classic textbook statistics, i.e., that we know NOTHING other than the results of the small sample. In reality, most of the time there is much more a priori knowledge of a hitter, so reasonable estimates looking forward will combine the a priori knowledge (which might say something like: based on his college record, his KO rate, his power, his speed, etc., his OBA projects to be .360 +/- .060) with the sample OBA = .380, 50 PA, sample stdev = .070. As the sample PA go up, the a priori knowledge becomes less relevant. By 10,000 PA, it is washed out.

Posted 10:12 a.m., August 26, 2003 (#49) - tangotiger
Rob, great stuff! I can confirm that between 500 and 2000 PA, those results are in line with empirical results of year-to-year r, with the regression towards the mean being 1-r. Great!

Posted 11:04 a.m., August 26, 2003 (#50) - Casey Jones
  Do the regression towards the mean figures being provided only apply to OBA? What about SLG?

Posted 11:20 a.m., August 26, 2003 (#51) - tangotiger
I think they only apply to OBA, and only if OBA is dependent solely on the batter's skills. Therefore, I think you have to regress a little more towards the mean with OBA, a lot more with BABIP, etc., etc.

For SLG, it's more complicated. You have to have successes/trials, and SLG doesn't work that way.