Tango on Baseball Archives

© Tangotiger


MGL takes on the Neyer challenge (January 13, 2004)

This article is by MGL.


Rob Neyer recently wrote an article on whether a player’s (or team’s) quality of competition (QOC) was important or not. In the article, he wondered aloud how much of an effect the average quality of a player’s opponents has on that player’s stats. On the BP web site, they give us a list of batters’ quality of pitchers faced. As you will see, in my opinion, the phrase quality of pitchers faced is a bit misleading, unless it is interpreted to mean sample of the quality of pitchers faced.

Not one to turn down a challenge (Rob’s, in this case), I did some work to establish how much every player last year was helped or hurt by the true quality of his competition (the average opposing pitcher for batters and the average opposing hitter for pitchers). There are two ways to do a true QOC adjustment. One is the right way and the other is the wrong way. The way that BP did it is the wrong way.

The wrong way is to look at the performance of a player's average opponent in the same time frame as that which you want to adjust, or any particular time frame for that matter. By that I mean if you did it that way, you would adjust a player’s 2003 stats according to the 2003 stats of his average opponent. That is wrong. That is what BP did. Since you are trying to adjust a player's stats according to the true quality of his opponents, you have to translate his opponents' sample (one-year) stats into true stats.

To illustrate how "wrong" the BP method is and what kind of erroneous results it could lead to, imagine that all pitchers were of the same quality. Of course, their one-year stats will be different, and in fact, will give the illusion that they were NOT of the same quality. If you adjusted each batter's stats by the sample stats of these same-quality pitchers, you would be making erroneous adjustments, since each batter actually faced pitchers of equal quality, and therefore should not be adjusted at all. In fact, some of those batters will have faced a sample of pitchers who had very extreme stats (by luck alone, since we are saying that all pitchers are of the same quality), and thus their stats would be adjusted a lot, making for some very erroneous results.

Basically, what happens when you do it the "wrong" way is that all players are getting adjusted for QOC a lot more than they should be, since the variation in sample quality among a player's opponents is much larger than the variation in their true quality.

The correct way to do the QOC adjustments, as with park adjustments, is to take a player's opponents separately, and for each opponent, use an estimate of his true quality (not his sample quality for that year only), and then average all of the opponents' true qualities, weighted by how many times that player faced each opponent of course. This will yield much different (more conservative) results than the "wrong" method, and is the only (or at least the most) correct way to do it.
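
To make that concrete, here is a rough sketch of the averaging step in Python, with made-up numbers (the true_quality values stand in for the regressed estimates described below):

# PA-weighted average of the estimated true quality of each opposing
# pitcher a batter faced. All numbers here are hypothetical.
true_quality = {"Pitcher A": 0.115, "Pitcher B": 0.122, "Pitcher C": 0.130}
matchups = {"Pitcher A": 25, "Pitcher B": 12, "Pitcher C": 8}

total_pa = sum(matchups.values())
qoc = sum(true_quality[p] * pa for p, pa in matchups.items()) / total_pa
print(round(qoc, 4))  # PA-weighted true quality of this batter's opponents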

The way I estimated the true quality of a player's opponents (basically every player in the league) is much like how you would do a projection (since true quality, i.e. ability, is essentially the same thing as a projection - the only difference is that a projection sometimes includes a context and sometimes refers to a particular time-frame and hence a particular player age). I take each player's last 3 years' stats and average them using a 5/4/3 weighting for batters (each year is worth around 25% more than the previous year) and a 5/3/2 weighting for pitchers (each year is worth around 50% more than the previous year). After doing a weighted average of the last 3 years (2001-2003 in this case), I then regressed each component stat toward a certain constant, according to a "chart" of appropriate regression percentages and constants (the constants are essentially the mean values of all similar players). (See my Primate article on regressing batter and pitcher component stats.) How much each stat gets regressed depends on two things, of course: one, the ratio of luck to skill in the particular component, and two, the size of the sample (the number of PA's that the player had in those 3 years). When I refer to a component, I mean a component rate stat, like BB's per PA or HR's per BIP.
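
Here is a minimal sketch of that estimate for a single component rate. The 5/4/3 weights follow the article, but the league-mean constant and the amount of regression are placeholder values, not the figures from MGL's chart:

# Estimate a "true" rate for one component (e.g., HR per PA) from three
# seasons: 5/4/3 weighted average, then regression toward a league-mean
# constant. league_mean and regress_pa are placeholders, not MGL's values.
def true_component_rate(rates, pas, weights=(5, 4, 3),
                        league_mean=0.030, regress_pa=500):
    # rates and pas are ordered most recent season first (2003, 2002, 2001)
    num = sum(r * pa * w for r, pa, w in zip(rates, pas, weights))
    den = sum(pa * w for pa, w in zip(pas, weights))
    weighted_rate = num / den
    effective_pa = den / max(weights)     # PA expressed at full (recent) weight
    shrink = regress_pa / (regress_pa + effective_pa)
    return weighted_rate + shrink * (league_mean - weighted_rate)

print(round(true_component_rate([0.040, 0.035, 0.028], [600, 550, 400]), 4))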

I realize that it is somewhat complicated and confusing, so suffice it to say that I first made a "list" of every player's true quality. The "list" consisted of every player's true stat line (singles, doubles, triples, HR, BB, and SO, all per PA). It is these stats, and not their actual one-year (2003) stats, that were used for the QOC adjustments. So in the list below, when you see that "so-and-so" pitcher faced a pool of batters with an average OPS of .767, or an average OPS+ of 102, that means that the average true quality of all the batters he faced was an OPS+ of 102. The actual (for that year) average OPS+ of that pitcher's competition was probably a lot higher, but we don't care what it was - we only care what the true quality of that competition is/was.

Here are the 10 pitchers and batters who faced the best/worst competition last year. To qualify for the list, a player had to bat at least 300 times or pitch to at least 300 batters. Keep in mind that no adjustments were made for the average handedness of a player's opponents - only overall true quality of those opponents. For batters, if you really want to compare one to another on a level playing field, technically, you should also adjust for opponents' handedness. An extreme example would be a platoon player, who has the advantage of batting mostly against opposite side pitchers. His stats are going to be biased (in his favor) as compared to a player who bats against everyone. Again, my numbers here do not include any adjustments for opponents' handedness.

For pitchers, it is probably unfair to adjust for opponents' handedness. A pitcher’s overall quality should include the types of batters that managers choose to bat against him. For example, Randy Johnson should not be penalized (adjusted) for the fact that almost no LHB's are used against him. Basically, for the most part then, it is not appropriate to adjust for opponent handedness when it comes to pitchers. There are some exceptions. For example, if you really want to know the true overall quality of a LOOGY, you might want to adjust his stats for the fact that he faces more than his fair share of LHB's. Then again, the practical value of a LOOGY comes from the fact that he is mostly brought in to face LHB's. On the other hand, if you want to know how he might fare as a regular reliever, you might want to adjust his sample stats for opponent handedness before attempting to assess his true value.

Anyway, without any further ado, here are the players and numbers promised:

Batters vs. best pitchers in 2003

Name, PA, average lwts of opp (per 500 PA), average normalized ERC of opp (4.00=league average)

D. Rolls, 404, -1.9, 3.86
C. Everett, 313, -1.9, 3.86
L. Bigbie, 319, -1.8, 3.86
B.J. Surhoff, 354, -1.7, 3.87
T. Lee, 613, -1.7, 3.87
J. Lugo, 482, -1.7, 3.87
J. Gonzalez, 346, -1.7, 3.87
L. Matos, 485, -1.5, 3.89
S. Halter, 393, -1.4, 3.89
M. Anderson, 535, -1.4, 3.89

These batters were actually better in 2003 than their stats suggest.

Interestingly, every player in the top 12 is on Bal, Det, TB, or Tex. The reason, of course, is that most of those teams are in the AL East, which had the toughest pitching (Bos, NYY, and Tor), and all of those teams did NOT have great pitching themselves, especially Texas. You might infer then that much of the QOC adjustment "comes from" not playing against your own team AND from the imbalanced schedule.

Most importantly note how little the effect of QOC is once you do it the "right" way. These are the players in the AL and NL who faced the best pool of pitchers and yet the most they were affected was by less than 2 lwt runs per 500 PA. Once you see the top 10 batters who faced the worst pitchers (below), you might conclude that QOC for batters in one season is worth around plus or minus 2.5 runs per 500 PA, which is not much at all.

Batters versus the worst pitchers in 2003

S. Stewart (min), 304, 2.5, 4.19
C. Patterson, 347, 2.3, 4.18
M. Ensberg, 442, 2.3, 4.18
M. Lawton, 429, 2.3, 4.18
J. Bagwell, 702, 1.9, 4.14
J. Uribe, 343, 1.9, 4.14
B. Phillips, 393, 1.9, 4.14
E. Karros, 365, 1.8, 4.14
B. Ausmus, 509, 1.8, 4.14
L. Berkman, 658, 1.8, 4.14
L. Rivas, 521, 1.8, 4.14

These players were worse than their 2003 stats suggest.

Most of these players are from Hou, Min, and the Cubs - Hou and the Cubs because of their good pitching and Min because of the poor pitching in the rest of their division.

Notice also that most of the players on the lists have relatively few PA's. That is because the fewer PA's a player has, the more likely he is to have faced a biased opposition. The more years of data you have for a player (a high number of PA's), the more these QOC adjustments will shrink, even with imbalanced schedules; even more of a reason why QOC is not very important and hardly worth the effort.

Pitchers who faced the best batters in 2003

Name, PA, lwts per 500 PA, effect on ERA

E. Dubois, 305, 3.1, .22
K. Jarvis, 413, 2.8, .20
R. Lopez, 663, 2.8, .20
V. Zambrano, 836, 2.7, .19
C. Vargas, 491, 2.7, .19
F. Heredia, 303, 2.6, .19
J. Johnson, 858, 2.6, .19
C. Lewis, 594, 2.1, .15
J. Gonzalez, 668, 1.9, .14
J. Haynes, 448, 1.8, .13

These pitchers performed better in 2003 than their stats suggest. The last column is the amount in runs that QOC affected their ERA.

Pitchers who faced the worst hitters in 2003

R. Stone, 350, -2.9, .21
R. Johnson, 489, -2.8, .20
J. Santana, 644, -2.8, .20
J. Speir, 319, -2.7, .19
O. Daal, 434, -2.7, .19
E. Loaiza, 922, -2.7, .19
J. Nathan, 316, -2.6, .19
R. Harden, 324, -2.3, .16
K. Rogers, 851, -2.3, .16
J. Rincon, 370, -2.2, .16

These pitchers pitched worse than their 2003 stats suggest.

Only one of these pitchers came out of the AL East, which had the great Yankee and Boston offenses, so again, you can see how much of an impact the imbalanced schedule does have.

You will also notice that the pitcher QOC's are more pronounced than the batter ones, even for a large number of PA's. That is mostly because true quality among hitters is much more variable than true quality among pitchers, at least to the extent that we can estimate it. It would be fair then to say that the effect of QOC on pitchers in one season is plus or minus .20 runs in ERA or so.

Overall, although it is an interesting exercise, you can see that adjusting for QOC, even with the new imbalanced schedule, is not of much value, and may not even be worth the effort. On the other hand, every little bit helps. For the record, QOC adjustments are included in the batting portion of my Super-lwts.

--posted by TangoTiger at 10:22 PM EDT


Posted 10:25 p.m., January 13, 2004 (#1) - Tangotiger
  This appeared at Clutch Hits

Here's a quick way to figure out the impact of quality of competition.

Suppose that the true talent of your average opponent was a .536 record. With a 10-1 runs to wins, that's +.36 run differential, or +.18 on offense and +.18 on defense (I'll assume that your opponent was just as good on off as on def). Remember, this is just a rule of thumb.

If the average team scores 4.5 RPG, we see here the impact is 4% (4% x 4.5 = .18).

So, a hitter who came out with 80 RC was really 4% too high, or 3 RC. Someone with 125 RC was 5 runs too high.
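
That arithmetic, worked through in code:

# Tango's rule of thumb, step by step.
opp_record = 0.536                               # true talent of the average opponent
runs_per_win = 10.0
run_diff = (opp_record - 0.500) * runs_per_win   # +0.36 runs per game
off_share = run_diff / 2                         # +0.18 of it on offense
league_rpg = 4.5
pct_effect = off_share / league_rpg              # about 4%

for rc in (80, 125):
    print(rc, round(rc * pct_effect, 1))         # roughly 3 and 5 runs too high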

I understand that you should look at how good the opposing pitchers were, and not the whole team (absent play-by-play).

But, this is basically the extent of the impact.

Posted 11:00 p.m., January 13, 2004 (#2) - Dackle
  The impact could be double though if comparing a guy facing a .536 opponent with another guy facing a .464 opponent. It's a 36-point swing from .500, but 72 points when comparing them with each other.
For teams last year I found Minnesota gained .043 in w% due to their opponents, and the Mets lost .043. That's a 14-game swing in the standings, which is huge. They were 23-1/2 games apart but the difference was more like 9-1/2 games. Mets had 72 games against the Braves, Marlins, Phils and Expos and another 12 against the Giants and Yankees. That's 84 games against tough teams which Minnesota only played 10 times. Twins had 38 games against the Indians and Tigers, and 31 games against the Devil Rays, Padres, Brewers, Rangers and Orioles. Total -- 69 games against weaklings that the Mets only had the luxury of facing 18 times.

Posted 12:30 a.m., January 14, 2004 (#3) - MGL
  Again, and I know this is a hard concept for some people to swallow, when you adjust team records for "strength of opponent," as Dackle (and others) did, you CANNOT use the other teams' actual records! You must use a regressed version of their records. In fact, you must take each team that a team played, one by one, and regress each of their records separately, and THEN average them all to get a team's strength of schedule.

I realize that everyone wants to and has done it the "old" way (it seems intuitively correct), but I can't emphasize enough that you cannot assume anything from the actual w/l records of a team's opponents. As in the explanation/example in my article, if all teams in baseball had the same talent (complete parity - Selig bite your tongue), you would be making a big mistake adjusting any team's record according to the actual records of their opponents! (Why should the Mets care about the records of teams they played if those records didn't mean anything, i.e., they were just statistical fluctuations?)

What if there were near parity in the league (i.e., all teams were between .490 and .510 in talent)? Then we would still be making a mistake, albeit not as large, in adjusting teams' records according to the actual w/l records of their opponents, which might be .420, .610, or whatever. In order to avoid those mistakes, we have to first figure out the actual parity in the league and then apply that parity to each team's w/l record in order to do the adjustments. The way to do that is to simply regress each team's w/l record to estimate its true w/l record and then use these records for the opponent adjustments. Tango can easily tell us the correct regression percentage for a team's w/l record in a 162-game schedule.

I suppose in some philosophical sense (D. Smythe, you can stop reading now lest I bore you!), since the resultant adjusted w/l records of each team after doing the "strength of schedule (SOS)" adjustments are just samples of their true talent anyway (albeit "better" samples after the adjustment is done), you might as well do the SOS adjustments the old-fashioned way and then just regress the whole damn thing even more afterward if you want to know a team's real talent, but that's a different and complex story altogether....

Posted 2:42 a.m., January 14, 2004 (#4) - Michael (homepage)
  Here's a nice little link that explains why you need to regress the talent level in general in an experiment. But as MGL gets around to, you could do the SoS adjustments and then regress, or you could regress and then do the SoS adjustment.

At the team level, when estimating team quality and the effect of unbalanced schedule and SoS on team results, I think it is most intuitive to approach both simultaneously (but that's just me). If you take a maximum likelihood approach you can say we know roughly what the distribution of team quality is like by looking at ML history and knowing what the rough distribution of quality for teams is (take the observed results of previous years' teams and regress by the appropriate amount that is based on the year-to-year correlation between team wins). We can assign each team this probability mapping. Then for each given probability amount we say what is the likelihood that we'd observe the outcome (team a wins 84 games, team b wins 96 games, etc.) given the (true talent of team a is X, team b is Y, etc.). Then we take this amount for all possible (true talent of team a X, team b Y, etc.) and multiply it by the probability that you'd get a team a of X quality and a team b of Y quality etc., which was based on our original probability function. OK, my explanation is not the clearest, so let's use a simplified example:

Let's suppose there were only three types of people in the world, those with talent .6, .5, or .4. Further let's suppose that 10% of people are .6 and .4 respectively while 80% are .5.

Now 4 teams A, B, C, D play each other 16 times but A plays B 10 times and C and D each 3 times (likewise B plays A 10 times and C and D 3 times each while C and D play each other 10 times each). At the end A has a 7-3, 1-2, 2-1 record respectively. B has a 3-7, 1-2, 1-2 record respectively. C has a 2-1, 2-1, 5-5 record (for games vs A, B, D respectively). D has a 1-2, 2-1, 5-5 record. Now for each of the 3^4 tuples of true talent for A,B,C,D [from A=.4,B=.4,C=.4,D=.4; A=.4,B=.4,C=.4,D=.5; ...; A=.6,B=.6,C=.6,D=.6] find the likelihood that we'd see the above-described records (in other words, when A=.4,B=.4,C=.4,D=.4, what is the probability that A vs. B is 7-3 AND A vs. C is 1-2 AND ... AND C vs. D is 5-5). Then take this likelihood and multiply it by the a priori probability of A=.4,B=.4,C=.4,D=.4 given the original distribution (.1^4 given my initial supposition). Then take all of these numbers and look for the most likely true talent levels.

Now in baseball we'd probably say there was a continuous distribution of talent rather than my 3 values so you'd be integrating the cdf once you'd multiplied the probability functions, but the idea would be the same.
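
A short script along these lines (it assumes head-to-head win probabilities follow the log5/odds-ratio rule, which the example above does not specify):

from itertools import product
from math import comb

# Prior: 10% of teams are .4 true talent, 80% are .5, 10% are .6.
prior = {0.4: 0.1, 0.5: 0.8, 0.6: 0.1}

# Head-to-head records from the example: (team1, team2, team1 wins, games).
results = [("A", "B", 7, 10), ("A", "C", 1, 3), ("A", "D", 2, 3),
           ("B", "C", 1, 3), ("B", "D", 1, 3), ("C", "D", 5, 10)]

def log5(p, q):
    # Assumed head-to-head rule: P(a team of talent p beats a team of talent q).
    return p * (1 - q) / (p * (1 - q) + q * (1 - p))

best = None
for talents in product(prior, repeat=4):        # all 3^4 talent assignments
    t = dict(zip("ABCD", talents))
    post = 1.0
    for talent in talents:                      # prior probability of this tuple
        post *= prior[talent]
    for t1, t2, w, g in results:                # likelihood of the observed records
        p = log5(t[t1], t[t2])
        post *= comb(g, w) * p**w * (1 - p)**(g - w)
    if best is None or post > best[0]:
        best = (post, t)

print(best[1])   # most probable true-talent assignment for A, B, C, D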

Of course, if you are willing to use more information you might be willing to say that each baseball team isn't actually equally likely to fall anywhere on the MLB team talent level since we could use previous year's data to say that a team that won 100 games last year is not most likely to win 81 games next year and is more likely to win 101 games than 61 games. But that is just a minor tweak as you'd perform the same process; you'd just have different initial a priori probability functions for each team that you'd use in the multiplication step.

Did anyone understand what I just wrote?

Posted 3:16 a.m., January 14, 2004 (#5) - MGL
  Michael (Humphries?), yes that is the exact Bayesian approach that I have also attempted to describe (it ain't easy) many times here and on Fanhome. That is the rigorous and precise way to estimate a true value from a sample value if you know or can estimate the distribution of true talent in the population. I'm glad that someone else recognizes this method, as I was beginning to think I was nuts! Anyway, if you assume a somewhat normal (even with a skew) distribution of true talent (either among teams or players), you can do a shortcut of course (at least that is what Tango and others have said and done). That shortcut is to look at the observed variance in sample talent (e.g., variance of w/l records of all teams in a given year) and compare it to what is expected (in variance) if everyone were of the same talent, and the difference in variance, I think, is attributed to the true talent distribution, or something like that! I'm not sure how, if at all, the skew of the true talent distribution affects the validity of this "shortcut." I think that Tango would say that it is not a shortcut, but the real thing. But I think that is only the case if the true talent distribution were exactly normal (no skew), but I'm not sure.
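
A sketch of that variance shortcut, with an assumed observed spread (the .070 is illustrative, not measured):

# Observed variance of team win% = true-talent variance + binomial "luck"
# variance, so the true-talent variance is the difference.
games = 162
obs_sd = 0.070                        # assumed observed spread of team win%
luck_var = 0.5 * 0.5 / games          # variance expected if every team were .500
true_var = obs_sd ** 2 - luck_var
print(round(true_var ** 0.5, 3))      # implied spread of true talent (about .058)

# The same two numbers give the regression-toward-the-mean fraction:
print(round(luck_var / obs_sd ** 2, 2))   # fraction of a record to regress (about .31)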

As you said, or implied, since the true talent distribution is smooth and continuous, regardless of what the curve looks like (there is a finite chance that a team can have a true w/l record of anything), I think that the rigorous Bayesian method that you describe would have to use integral math (calculus), but I'm not sure as I am no mathematician, although once upon a time, many years ago, I majored in math in college (I switched to another major when the math started getting weird and the professors even weirder).

Michael, what a great frickin' article (the homepage link)! There is Tango's regression = 1-r formula! I've never seen that anywhere else, even in the many statistics books and websites I have consulted over the years. Also what a great explanation of "regression" in general! Should be must reading for all sabers and aspiring sabers! Thanks for the great link! I bookmarked it...

Posted 9:05 a.m., January 14, 2004 (#6) - David Smyth
  ---"(D. Smythe, you can stop reading now lest I bore you!)"

I said "boor", not bore. And to add a bit of fuel to the fire, it is "boorish" that you don't know how to spell my last name after a few years of seeing it most days. :)

Anyway, nice little article. Everyone remember--if you want to impress, you must regress!!

Posted 9:43 a.m., January 14, 2004 (#7) - tangotiger
  MGL, further proof of your lack of memory... this is getting real old.

That article that Michael linked was the very very first article I ever linked to on "Regression Towards the Mean". And, yes, that's where I got the 1-r.

Posted 12:57 p.m., January 14, 2004 (#8) - FJM
  If you want to do regression toward the mean on team W-L records, shouldn't you use different regression parameters for each margin of victory or defeat? For example, if a game decided by one run was replayed (or simulated) one hundred times, wouldn't you expect it to come out close to 50/50? Whereas a game decided by 10 or more runs would almost always result in the same winner, albeit generally by a smaller margin. That's the basis for the Pythagorean formula, and that's why it can be off significantly for teams that have very good or very bad records in close games.

Posted 1:10 p.m., January 14, 2004 (#9) - MGL
  FJM, sure, and that is why I said (somewhere) that you are better off (going to be more accurate in your estimation of that team's true talent) not using a team's actual w/l record at all. As you say, you should use their pythag record! If you do, you need to use the appropriate regression percentage for pythag records and not real records; the former will be smaller than the latter, since a pythag record will correlate with a pythag record (or a real record) in another time frame better than a real record correlates with another real record.

So you can do it either way and you should use the appropriate regressions. If you have a choice and the resources, pythag record is better. Good point!

Of course, you can get more granular than using a team's pythag record, which will also yield more accurate results. Use a team's total park adjusted offensive lwts and their lwts against for defense. Then compute their theoretical w/l record from that. That SHOULD be better than their pythag record. Regress each value first, or regress the final estimated w/l from the lwts.

Even better than that, take each individual player's multi-year value (lwts, OPS, and lwts against or OPS against for pitchers, or whatever), regress each one according to PA (like you are doing an OPS projection for each player), add all the players up, prorating by number of actual PA's in the year in question, and THAT is probably the best estimate yet! As the procedure gets more complicated, gotta be careful about not introducing too much error. It is a tradeoff and balance between rigor and accuracy of the results (getting closer to the team's true talent) and the possibility of tainting the results with all kinds of potential errors in the complex calculations....

Posted 1:11 p.m., January 14, 2004 (#10) - Dackle
  What if the question you want to answer is not "What is the true value of this player/team's opponents" but rather "How much were this player/team's statistics displaced by the distribution of its opponents within a self-contained season?" In that case I'd use the old method. MGL, how would you calculate the "true value" of a team? Is it done the same way as players? Do you use 5-4-3 weights?

Posted 1:29 p.m., January 14, 2004 (#11) - tangotiger
  If you're going to regress each game one at a time, independent of the other games, the regression will be virtually 100%.

If on the other hand you regress each game, but dependent on the team's identity, I don't think you'll get much further than just taking the actual RS/RA distributions of the teams, and coming up with the expected win%.

Essentially, the reason you want to take it one game at a time is for the "playing to the score" variable. The likelihood is that there's no such thing, so why bother doing it that way, and just use the scoring distributions of the teams in question.

And, even taking the scoring distributions of the teams in question, and assuming that both teams are 4.5 RPG teams, the result will not be that much different than assuming they have the same scoring distribution (I worked it out once, and at the very extremes, I think a 4.5 RPG team against a 4.5 RPG league might win .520 games with an extreme distribution).

So, you are essentially just back to using the mean RPG.

Take the team's mean RS and RA, regress it, and come up with the probability of winning, using the Tango Distribution.

Posted 4:17 p.m., January 14, 2004 (#12) - MGL
  MGL, how would you calculate the "true value" of a team? Is it done the same way as players? Do you use 5-4-3 weights?

Dackle, you probably didn't see my post #9, when you posted #10. I think I answered that question.

What if the question you want to answer is not "What is the true value of this player/team's opponents" but rather "How much were this player/team's statistics displaced by the distribution of its opponents within a self-contained season?" In that case I'd use the old method.

Let's say that a team (or player), team A, played against another team, team B, once (let's say) that had an overall w/l of .600 in a 100 game season, and there were 101 teams or something like that. Other than using it to estimate that team's true talent, of what relevance is it to team A what team B's record was against other teams? That is why your question makes no sense (I don't mean that to criticize your question). IOW, team A's record in that one game with team B is only "displaced" by team B's true talent, not by team B's record against other teams!

Like I said before, let's say that all teams had the same talent. Then why would team B's sample record against all other teams have ANY relevance to that one game of team A versus team B? Why would you want to adjust or displace that one game by team B's record versus other teams? You might as well adjust or "displace" it against the record of some team they never even played! The only reason you use team B's (or any team that you played) record against other teams (and against you of course) is to help you to estimate team B's true talent, which is the ONLY thing you are interested in! Like I also said, it's like an MLE. First you have to establish the true talent level of the environment. Then and only then can you do the adjustments or translations. Whatever will help you to establish that true talent, you use. Sometimes it is a team's record against all teams, sometimes it is that AND something else, and sometimes it is not even that!

Posted 8:05 p.m., January 14, 2004 (#13) - David Smyth
  I have a quick question for MGLE :) or Tango. The article gives the weights 5/4/3 for batters and 5/3/2 for pitchers. After that, each component is regressed independently. But say you want to do a quick projection for players who are "full-time" over the last 3 years (using ERA for pitchers and RC/G for batters). How should you weight the regression to avg? Should it be (for batters) 5/4/3/2, or 5/4/3/1, or 5/4/3/3 ? And similarly for pitchers?

Posted 9:14 p.m., January 14, 2004 (#14) - Tangotiger
  Always use 5/4/3/2 for hitters, regardless of # of PAs. Remember, the "2" means 2x600.

Posted 11:07 p.m., January 14, 2004 (#15) - Dackle
  From post 12 --

"Let's say that a team (or player), team A played against another team, team B, once (let's say) that had an overall w/l of .600 in a 100 game season, and there were 101 teams or something like that. Other than using it to estimate that team's true talent, of what relevance is it to team A what team B's record was against other team's?"

Because you adjust team A and B's strength by the strength of their opponents. You also adjust the strength of team A and B's opponents (the other 100 teams in the league, say teams C through Z) by team A and B's strength. Following the game between team A and team B, the strength of team C through Z's opponents (which includes teams A and B) has to be adjusted slightly.
Does it bother anyone about pythag that removing a 16-1 win from an 81-81 team scoring and allowing 787 runs reduces its pythag wins by 1.8? I wonder if a game-by-game calculation would make it more accurate? I'm all for the granular approach, but it doesn't explain that elusive way in which teams turn components into wins. I'm starting to believe that wins explain the flaws in runs scored/allowed, not the other way around.
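
The 16-1 example is easy to check; the sketch below uses an exponent of 2, and a lower exponent (as some pythag variants use) lands closer to the 1.8 figure:

# Effect of removing a 16-1 win from an 81-81 team with 787 RS and 787 RA.
def pythag_wins(rs, ra, games, exp=2.0):
    return games * rs ** exp / (rs ** exp + ra ** exp)

full = pythag_wins(787, 787, 162)              # 81.0 expected wins
without = pythag_wins(787 - 16, 787 - 1, 161)  # drop the 16-1 game
print(round(full - without, 1))                # about 2 wins with exp=2
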
It would make my day if someone came up with a correct set of weights, both for prior seasons and at any point during the current season. A good vigorous least-squares regression analysis on all players with 20 PAs from 1900 to 2003 would hit the spot nicely. I suppose that "someone" will have to be me, one of these years.

Posted 11:31 p.m., January 14, 2004 (#16) - Tangotiger
  but it doesn't explain that elusive way in which teams turn components into wins.

It's not elusive at all. The Tango Distribution explains it pretty well. The combination of the various components (h,hr,bb, etc) gives you the mean and the variance.

Once you've got that, it's rather trivial to come up with an expected win% given 2 distributions.

A good vigorous least-squares regression analysis on all players with 20 PAs from 1900 to 2003 would hit the spot nicely

I'm not sure what you are asking here, but a player's overall talent level can be determined by regressing the performance by
209 / (209 + PA)

Posted 11:37 p.m., January 14, 2004 (#17) - Tangotiger
  David, to expand on the regression, say you have
2003: 200 PA
2002: 600 PA
2001: 100 PA

How do you regress?

Marcel says:
performance PA = 200 x 5 + 600 x 4 + 100 x 3 = 3700
league mean PA = 600 x 2 = 1200

So, 1200 / (1200+3700) = 24.5%

That's what Marcel the Monkey says.

How about something a bit better?

To get "effective" PAs, I'd do:
effective PA = 200 x 1 + 600 x 0.8 + 100 x .6 = 740

regression = 209 / (740 + 209) = 22%

You'll note that 740 is really just 1/5 of 3700.

To get Marcel in-line with this, I should actually do 500 x 2, and not 600 x 2.

The "209" came from another recent thread.

So, we know exactly how much to regress knowing how many PAs.
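
Both calculations, in code (the 209 constant is the one Tango cites from the other thread):

pa = {2003: 200, 2002: 600, 2001: 100}

# Marcel: 5/4/3 weights on performance, plus 2 x 600 PA of league average.
perf_pa = 5 * pa[2003] + 4 * pa[2002] + 3 * pa[2001]   # 3700
league_pa = 2 * 600                                    # 1200
print(round(league_pa / (league_pa + perf_pa), 3))     # 0.245

# "Effective" PA version: 1 / 0.8 / 0.6 weights, then regress by 209/(209+PA).
eff_pa = 1.0 * pa[2003] + 0.8 * pa[2002] + 0.6 * pa[2001]   # 740
print(round(209 / (209 + eff_pa), 3))                       # 0.220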

Posted 2:17 a.m., January 15, 2004 (#18) - MGL
  Because you adjust team A and B's strength by the strength of their opponents. You also adjust the strength of team A and B's opponents (the other 100 teams in the league, say teams C through Z) by team A and B's strength. Following the game between team A and team B, the strength of team C through Z's opponents (which includes teams A and B) has to be adjusted slightly.

I know how to do SOC iterations. What I am trying to say (I said in the article that it is hard to understand AND hard to swallow) is that the "strength" of a team is NOT necessarily defined by its record. I think we all understand that. A team's record is a sample of its strength, but it is not its strength per se. If we want to know a team's strength we can take its sample overall record and use that to estimate its strength.

How much a team's overall record can be used as a substitute for (proxy for, estimate of, whatever) its strength depends on the length of the schedule (try doing a SOC or QOC adjustment the "old" way for a 3-game schedule and see what kind of screwy results you get!) and the distribution of talent in the league. As I said, we are trying to adjust each team's record by the true strength of its opponents and NOT by their actual records. These 2 things are sometimes close and sometimes they are not. For very short schedules they will not be close to the same thing. If the true talent distribution of teams were such that all teams had about the same true strength, then they would also not be close. Etc. Sure, you could just go ahead and adjust everyone by the actual w/l records of their opponents and you wouldn't be that far off in a 162-game season. But you would be making some bad mistakes if you did that for a much shorter season or if baseball had a lot more parity than it does.

If my explanation is still not making sense maybe someone (Tango) can help me out here...

Posted 12:00 p.m., January 15, 2004 (#19) - Dackle
  Pythagoras shouldn't work as a proxy for strength. A team which has a pythag w% of .750 after five games is not a .750 team.

"True talent" is just one way of looking at the question. I'm more interested in how the won-lost records have been displaced by the schedule. If I learn that a .536 team would be a .550 team (using the old method) with a balanced schedule, I'm not assuming that the "true strength" of the .536 team is therefore .550. I'm just recasting the won-lost record of the .536 team, and its leaguemates, in a way that removes the distortion of the schedule. There is nothing wrong with accepting won-lost records at face value. We don't, for example, adjust the games-behind column in the standings for the "true talent" of the teams involved. Many schedule adjustments or power ratings, which use the old method, are really just advanced extensions of the winning percentage and games-behind columns.

Posted 12:40 p.m., January 15, 2004 (#20) - tangotiger
  But, I don't think it does.

If we know that every team is equal, but that, as it turns out, the opposing actual win% of the Yanks was .530, we can't assume that they played an unbalanced schedule.

In fact, since we know that every team is equal, Yanks played a group of .500 teams that managed to play .530 over their games.

While I agree that these SOS type things do assume that the opponent was .530 in this case and therefore the Yanks played an unbalanced schedule, in fact they did not.

The problem is that we don't know that every team is the same (and we are pretty sure they are not). Therefore, we need to regress the opponents record to account for this.

Perhaps AED can explain how he handles the SOS that takes this into account.

(SOS = strength of schedule)

Posted 3:36 p.m., January 15, 2004 (#21) - AED
  It's a tricky problem. Regressing opponents' rankings will wash out real disparities in schedules. In other words, if my opponents have a true ability of 0.450 and indeed go 0.450 (the average outcome), regression to the mean would give me credit for having played a tougher schedule than I actually did. On the other hand, not regressing opponents' rankings leaves you more susceptible to errors.

The optimal approach is to get the most accurate preseason projection possible and regress each team to its projection. Of course doing so will bias your system severely, so it would be unacceptable for the usual computer ranking purposes.

What I do is a two-step process, in which I first compute all team strengths with no regression and then compute each team's ranking using its prior and the opponent strengths.

Posted 5:51 p.m., January 15, 2004 (#22) - David Smyth
  Perhaps a slight bit off-topic, but I have a question, which will betray my duffer's understanding of regression. Let's say that the spread of observed W/L % in a lg is "always" about .350-.650. And let's say that, after the weighting by recency of season and regression towards the means (.500 in this case), the "true talent" only ranges from .400-.600. So every year, the observed range is greater, due to chance. It has been said that the true talent is the most accurate projection. But if you use that here, you are projecting a lesser spread of results than actually occurs. This implies, therefore, a built-in limit on the accuracy of the projections. But why not try to "project the luck", since it seems to have a consistent overall effect on the distribution? So, if you expect to have a .650 or so team, which team is most likely to achieve that? It should be the true talent .600 team, since they have the least discrepancy between their true talent and the .650 actual expected. And if you work that scheme out, you are essentially canceling out the regression to the mean.

Where am I going wrong here? Is it in my initial assumption that projections which feature regression towards the mean have a tighter spread than what actually occurs? Is it wrong that the "distribution" of a projection system should be about the same as that which actually occurs?

Posted 6:05 p.m., January 15, 2004 (#23) - MGL
  Pythagoras shouldn't work as a proxy for strength. A team which has a pythag w% of .750 after five games is not a .750 team.

Sure, pythag just gets you closer to a team's "real" strength so that you don't have to regress as much!

"True talent" is just one way of looking at the question. I'm more interested in how the won-lost records have been displaced by the schedule. If I learn that a .536 team would be a .550 team (using the old method) with a balanced schedule, I'm not assuming that the "true strength" of the .536 team is therefore .550. I'm just recasting the won-lost record of the .536 team, and its leaguemates, in a way that removes the distortion of the schedule.

You're mixing 2 things. One thing is the w/l records of the teams (either before or after the SOS adjustment). No reason we have to worry about their "true strength." People just want to know a team's records. Not too often do you hear someone say, "But have you regressed the Rockies' record to estimate its true record?" But, if you just want to see "how other teams displaced their record," you have to worry about the true w/l record of their opponents, and not the actual w/l record of their opponents! It makes no sense to adjust the w/l record of a team by the w/l record of its opponents! The result doesn't mean anything, unless you KNOW that the actual w/l records of your opponents are a good estimate of their actual strength! You can do it, and it's technically not wrong. It's just not the best way to do it. In practice, in baseball, it is OK to do it that way, because for a 162-game major league baseball schedule, you can be sure that a team's actual w/l record will likely be pretty close to its actual strength. But that is NOT always the case, which is why it is important to understand what's going on.

What about 10 days into the season? Would you be comfortable doing a traditional SOS adjustment for each team? Why not? What about 50 days? At what point do you feel comfortable? Doing the appropriate regression allows you to not have to worry about how many games have been played!

If you are doing "rankings," to some extent, it changes everything. For example, if you are ranking teams based on actual strength, you don't have to do any regressions, assuming all teams have played the same number of games. You can safely and correcty rank all teams based on their actual w/l records. Regressing first won't chnage anything...

Posted 7:20 p.m., January 15, 2004 (#24) - Tangotiger
  David, you minimize your errors by projecting the true talent as the actual record, rather than increasing the spread of the true talent to match the expected spread of actual records.

Posted 8:40 p.m., January 15, 2004 (#25) - Michael
  David, you are sort of right that if you were trying to predict the highest w% of any team next year, it will probably be .650 even though the most talented few teams might be true talent .600. But for each of the given .600 teams you'd make the most accurate prediction by predicting they'd win at a .600 rate. But inherently you know there will be some random fluctuation. Even if you knew a priori the exact skill level of all 30 teams, you'd still have some error in your predictions just based on random fluctuation (and you could model this easily in a simulation).
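
A quick simulation of the kind Michael mentions; the uniform .400-.600 talent distribution is assumed purely for illustration:

# Simulate one 162-game season for 30 teams to show that observed win%
# spreads wider than true talent.
import random
import statistics

random.seed(1)
true_talent = [random.uniform(0.400, 0.600) for _ in range(30)]
observed = [sum(random.random() < t for _ in range(162)) / 162
            for t in true_talent]

print(round(statistics.pstdev(true_talent), 3))
print(round(statistics.pstdev(observed), 3))   # almost always the wider spread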

P.S. MGL I'm not Humphries, I'm Bodell.

Posted 3:22 a.m., January 16, 2004 (#26) - Dackle
  Yes, I'm comfortable doing a SOS 10 games into the season. If Baltimore starts 9-1, it doesn't bother me that their w% is .900, even though they were 71-91 last year, 67-95 in 2001. And so if they'd played seven of those games against the Yankees, three against Boston, and I do schedule adjustments and bump them up to .923, that's fine. I know they aren't a .900 team, or a .923 team. But .900 is an accurate description of their play thus far, and .923 is another description which combines Baltimore's won-lost record and the performance of its opponents. Maybe the root of our differences is this: I want to describe the past (the sample data), you (MGL) want to predict the future (the true value which caused that sample data, and what sample data will likely emerge in the future).

Posted 3:50 a.m., January 16, 2004 (#27) - MGL
  Dackle, no, you are mixing up the 2 things again (the past and future of the records you are adjusting and the past and future of the records you are using for the adjustments). The question about the 10-game season is not whether you would want to adjust the 9-1 team to account for SOS, but whether you would want to adjust, say, a .500 team (5-5) that played that 9-1 team 3 times already. Let's say that your .500 team played that 9-1 team 3 times and for their other 7 games, they played .500 teams. IOW, they haven't played the teams with bad records yet. So their SOS would be 3 games versus .900 teams and 7 games versus .500 teams, or an average opponent of .780. Do you really want to adjust that .500 team by .780 and say that their w/l record, adjusted for SOS, is now 8-2, because they played that 9-1 team 3 times already? You may want to, but that information doesn't mean very much. Doing some computation and spitting out the result is silly if the result and the computation don't mean anything. Yes, we know that our 5-5 team played 3 games against a tough opponent (the 9-1 team) so their 5-5 record is not really fair. That means something! But how unfair is it? It depends on how tough that 9-1 team really is! To just go ahead and adjust the 5-5 team using the tough team's 9-1 record is arbitrary and yields a result (my 5-5 team is now 8-2) which very likely has no meaning or truth other than "the result of adjusting my .500 record by .780." Doesn't mean anything. Yeah, we know that the 5-5 record is NOT fair and should be adjusted by something, but NOT the 9-1 record of the team you played 3 times. You might as well say "I make my 5-5 team 6-4 now because they have played a tough schedule so far." That would be closer to the truth. Where is the "magic" in using the 9-1 record to adjust the 5-5 team's record? As I've said before, you might as well adjust a team's w/l record by the actual w/l record of their opponents AND the temperature on the day of each game. The specific w/l records of a team's opponents don't mean ANYTHING other than as a very weak (for only 10 games) approximation of the opponents' actual strength. Why use that exact number (9-1)? Why not 8-2 or 7-3?

If you don't get what I'm saying, we're better off dropping it now. There is no value in arguing an untenable point unless it helps you to understand the tenable one, although in this case it is a little bit of a semantical argument...

Posted 12:23 p.m., January 16, 2004 (#28) - tangotiger
  And so if they'd played seven of those games against the Yankees, three against Boston, and I do schedule adjustments and bump them up to .923, that's fine. I know they aren't a .900 team, or a .923 team.

Dackle, what you are saying is fine, but ONLY if you know what the "true talent" win% of Bos and NYY were.

There are two issues here:
1) recasting the 9-1 record into something "true"
2) establishing the opponent's strength

As far as Dackle is concerned, he doesn't care about 1), and neither should we. If KC happens to go 9-1, then that's what they did, and we don't take it away from them. Sosa goes 3-3, with 3 HR and 10 RBIs, and so he gets to keep that performance.

But who did KC do the 9-1 against? Who did Sosa go 3-3 with 3 HR against?

If you look at KC's opponents, even if we "knew" that all of KC's opponents were .500, they would not have performed at exactly .500 over the 10 games. Therefore, if we know the opponents are .500, but they actually PERFORMED at .600, this does NOT mean that we set KC's opponents' strength at .600 (and reset their .900 record to .920 or something). The opponents ARE a .500 team, but they just happened to play like a .600 team. KC's 9-1 record was done against a .500 team, and therefore, no adjustment needed.

Same thing with Sosa. If he happens to do that against Pedro Martinez, but Pedro in his two only previous games was shelled, we don't say "oh, Sammy's strength of opponent had a 7.89 ERA". The limited sample of the opponent does not carry enough information about what the opponent is truly capable of.

In order to do SOS, you need to establish the true talent level of the opponent (whether by team or player).

This does not mean that you regress KC's 9-1 or Sammy's 3 HR night (for what Dackle is trying to do anyway).

Posted 1:54 p.m., January 16, 2004 (#29) - villageidiom
  In keeping with my new year's resolution to foster recursion, here's a link to Neyer's latest article, in which he references this study and also provides a link back to Baseball Primer.

Granted, his link is to Clutch Hits... but at least he was well-intentioned.

Posted 2:36 p.m., January 16, 2004 (#30) - RossCW
  If you are trying to prove that there is very little difference between players, the best way to accomplish that may be to use MGL's method - to the extent his method is described here (just once I'd like to see a reproducible description of one of these "studies"). It starts by using three years of data and then regressing that to further limit the defined difference. The not surprising conclusion is that there isn't much difference in the competition.

But that hardly says anything about the impact of competition on the raw data of a single season for a single player.

Posted 3:46 p.m., January 16, 2004 (#31) - MGL
  Amazing! A scintilla of a compliment from Ross! Maybe I'm reading it wrong...

Posted 9:23 p.m., January 16, 2004 (#32) - RossCW
  Maybe I'm reading it wrong...

No - you did a very good job of manipulating the data to get the result you wanted. I guess that's a compliment.

Posted 10:39 p.m., January 16, 2004 (#33) - Dackle
  Tango, if you take Kansas City's 9-1 record at face value, then you have to do the same for its opponents. You can't mix actual records with opponents' "true talent," because Kansas City is an "opponent" for 13 other teams. This would result in one set of calculations using KC's 9-1, and another set for the other 13 teams using 6-4 as KC's record. The league is a self-contained interlocking unit, where every team is also an opponent. Because of this, you have to treat teams and opponents the same way. KC's 9-1 record is as much an anomaly as a 7.89 ERA by Pedro, but nevertheless, it is a description of what has actually happened on the field. I see no problem with modifying that descriptive information with a schedule adjustment which relies wholly on other descriptive information.

Posted 11:21 p.m., January 16, 2004 (#34) - Tangotiger
  Dackle, I think we are at a definite impasse, as both sides understand (but don't accept) each other's perspective.

Posted 11:47 a.m., January 17, 2004 (#35) - Dackle
  I accept MGL's method as an alternative to the old method. It's just not the way I'd like to proceed.

Posted 8:01 p.m., January 17, 2004 (#36) - AED
  Dackle, that's not correct -- it does not have to be a self-consistent system. If you want to know a player's opponent-adjusted contributions, you should do so using the best estimate of opponent strengths (which involves a prior) but the player's actual contributions. Were MGL attempting to compute players' opponent-adjusted skills, he would indeed want a self-consistent system where everything was regressed.

Ross, use of a prior is not an attempt to manipulate the data; it is in fact demanded by Bayes' theorem. Failing to use a prior is a violation of statistical principles, and only justified if the prior is too difficult to measure accurately.

Posted 12:54 a.m., January 18, 2004 (#37) - Dackle
  I want a system where I can take adjusted team records, compute expected wins by each team against every other team, and have the results equal the actual major league standings. I call the difference between the adjusted and actual records a "schedule adjustment." If I make a little dice league and replay the season, each team's chance of winning has to be calculated by working backward from the actual won-lost records. Using regressed values doesn't work, because it just leads to regressed results, when actual results are what I want.
I would also feel very uncomfortable if the league totals didn't add up to .500, which is a possibility if the system isn't self-contained.
Finally, it seems misleading to say "The Royals' 9-1 record was helped 0.8 wins by the schedule," when the 0.8 is calculated using regressed values. Especially when the speaker goes on to argue that this means quality of competition is unimportant, because the adjustment is so small. You should say "The Royals' 6-4 regressed record was helped 0.8 wins by the schedule." But if the listener is more interested in knowing the adjustment in terms of the actual record, then you should say: "The Royals' 9-1 record was helped 2.8 wins by the schedule," where that 2.8 was calculated using actual records.
I admit though that if we travelled back to April, 2003, each team's true value was unchanged, and we played the season again, the standings would come out differently. I have a feeling that it is this "true value" that MGL, AED and Tango are interested in. This is valid and interesting, but it doesn't tell me what I want to know. I want, on average, for the 2003 Red Sox to go 95-67 in my theoretical dice replay. I can do this with the actual records and just one piece of additional information: the schedule. The difference between the actual and adjusted records will then reflect the schedule and nothing else.

Posted 1:27 a.m., January 18, 2004 (#38) - MGL
  "The Royals' 9-1 record was helped 0.8 wins by the schedule," when the 0.8 is calculated using regressed values. Especially when the speaker goes on to argue that this means quality of competition is unimportant, because the adjustment is so small. You should say "The Royals' 6-4 regressed record was helped 0.8 wins by the schedule."

You got it backwards again!

Dackle you just have something which makes little sense stuck in your head and you refuse to give it up. You got some really smart people telling you that what you're trying to do is nice but makes little sense other than the "numbers add up nicely." To each his own...

Posted 4:45 a.m., January 18, 2004 (#39) - Dackle
  Hmm, I must be on the right track. No criticism from MGL besides familiar bombast.

Posted 8:27 a.m., January 18, 2004 (#40) - Tangotiger
  I want, on average, for the 2003 Red Sox to go 95-67 in my theoretical dice replay.

And I suppose you want the Royals to go 9-1 each time, right?

Under those requirements then, Dackle's approach is the only one that can fit the bill.

Posted 11:46 a.m., January 18, 2004 (#41) - Dackle
  And I suppose you want the Royals to go 9-1 each time, right?

Exactly. Otherwise it wouldn't be a fair description of what happened.

Posted 3:43 p.m., January 18, 2004 (#42) - AED
  Dackle, you've basically reinvented several existing ranking systems, albeit without the necessary prior. The reason a prior is needed can be illustrated easily. Suppose that the Royals were 10-0 instead of 9-1, and that their average opponent's schedule-adjusted record was 0.550. Following your logic, this means that the Royals' schedule-adjusted record was 1.050 -- 10.5 wins in 10 games.

Finally, it seems misleading to say "The Royals' 9-1 record was helped 0.8 wins by the schedule," when the 0.8 is calculated using regressed values.
It isn't the least bit misleading. After 162 games have been played, the number of wins contributed by opponents in the Royals' first ten games will probably be a lot closer to 0.8 than to 2.8. Using 2.8 would therefore be misleading, because it's probably very far from the truth.

Posted 4:37 p.m., January 18, 2004 (#43) - Tangotiger
  Actually, AED, there's no reason to make the SOS additive. I think this was just an example.

In the case where KC is 9-1 and the SOS was .600, I would do:
KC Odds: 9:1
Opp Odds: 1.5:1

KC adj Odds: 13.5:1 (9x1.5)

KC adj win% = 13.5/14.5 = .931
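
Tango's calculation above, wrapped in a small function:

# Odds-ratio schedule adjustment: multiply the team's odds by its average
# opponent's odds, then convert back to a winning percentage.
def sos_adjust(win_pct, opp_pct):
    adj_odds = (win_pct / (1 - win_pct)) * (opp_pct / (1 - opp_pct))
    return adj_odds / (1 + adj_odds)

print(round(sos_adjust(0.900, 0.600), 3))   # 13.5:1 odds, or .931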

***

The problem here is Dackle trying to justify his approach, when really all he cares about is replicating KC's 9-1. From that standpoint, all the discussion and debate goes away. What Dackle wants is to replicate KC's 9-1, and therefore, he has no choice but to make that its true talent. And therefore, everyone else is in the same boat... they all have their true talents kept. So, from that perspective, Dackle is fine.

His premise though is hard to accept. But, given his premise, I think we have to accept his methodology.

Posted 4:39 p.m., January 18, 2004 (#44) - Tangotiger
  Essentially, whether KC's .900 is done in 10 games or 10 million, Dackle WANTS the same result.

Posted 4:54 p.m., January 18, 2004 (#45) - MGL
  Here is one more way to look at it, using a hypothetical discussion:

So the Royals are 9-1 so far this season, huh? Wow, they must be a great team!

Sure, they seem pretty good so far, but that 9-1 record is a little misleading.

Oh really, why is that?

Well, they played some really crappy teams.

Really, how do you know that?

The combined record of all the teams they played so far is 20-80!

Wow, I guess they did play some crappy teams. So their 9-1 record is kind of misleading, huh?

Yes it is, it should be more like 7-3 or 6-4 or maybe even 5-5. I'm not really sure.

Hey, I've got an idea, let's "adjust" their 9-1 record to account for how bad their opponents were and then we can tell people "Here's what the Royals' record SHOULD be or here's what it WOULD be if they had played average quality teams!"

Great idea, but how should we adjust their 9-1 record?

Well let's adjust it by the collective 20-80 record of their opponents! There are some really great formulas out there, like the log5 and odds ratio methods that can tell us exactly how to adjust a 9-1 record for the "quality of competition"!

Now I'm confused! Why are we adjusting the Royals' 9-1 record?

Because they played crappy teams, on the average! Haven't you been following the discussion?

OK, I get that their opponents are PROBABLY crappy since they had a 20-80 record, but are you sure they are crappy and that 20-80 wasn't just bad luck?

Well, I'm not exactly sure, but it is a good bet that they are crappy since they have a 20-80 record!

Yeah, I guess you are right! So let me get this straight. We are going to adjust the Royals' 9-1 record to account for the fact that their opponents were probably bad?

Right!

Why are we going to use the 20-80 record of their opponents to make that adjustment?

Because that's what their record was!

Yeah, but I thought the whole idea of adjusting KC's 9-1 record was to account for how BAD their opponents were! Why are we using 20-80 to represent how bad they were?

Because that's what their opponents were, 20-80! I already said that!

Yeah, but do we know that they were really that bad?

No, we don't know for sure, but that was their record!

Well, are we adjusting the Royals' 9-1 record by 20-80 because "that's what their opponents' record was," or "because that's how bad we think they were?"

Hmmm... I think both!

But we really want to adjust that 9-1 record by how good their opponents were, right!

Well sure!

So if God came down and told us that their opponents were really average teams, and that their 20-80 record was just bad luck, what would we do then?

Well, then we wouldn't adjust the 9-1 record anymore, you idiot, because the 9-1 is the right record! If we told everyone that the 9-1 record should be 7-3 or 5-5, we would be lying to them, because we told everyone that we were going to figure out what the Royals' record would be if they had played average teams, and if we knew they were average, we wouldn't want to adjust the 9-1 record! But God is not coming down, is he?

No, but why are we using the 20-80 when you said that the ONLY sensible reason to adjust the 9-1 record is so we can tell everyone what a "fair" record for the Royals is? You know, what their record would be if they played average teams.

Cause that's all we have to go by, since I doubt that God will come down and tell us how bad those teams really are!

Oh, I get it! The only reason we are using the 20-80 record is because that's all we have to be able to guess how bad those opponents really are. If we knew they were really average, we wouldn't adjust the 9-1 at all. And if we knew that they were only a little bad, we would only adjust the 9-1 A LITTLE - not by the 20-80?

Of course. If we KNEW how good their opponents were, why would we want to use the 20-80 record to adjust the 9-1 record? That would be stupid. It wouldn't be good information to use for anything, would it? Like you said, if we KNEW that those opponents were really average teams, and that their 20-80 record was just bad luck, we would be misleading everyone if we adjusted that 9-1 record to 6-4 or 7-3 or 5-5 or whatever it would come out to. What would that adjusted record MEAN if their opponents were really average? Nothing that I could think of!

OK, but since we don't KNOW how bad the Royals' opponents were and how lucky or unlucky that 20-80 record was, we pretty much have to go with the 20-80 record as an estimate of how bad those opponents actually were?

Now you got it!

Wait a minute, I forgot, some really smart people told me that if you want to estimate how good or bad a team is, you don't use their w/l records!

Really? There's a better way to figure out how bad the Royals' opponents were than using their actual w/l record of 20-80?

Yup!

Well, why didn't you say that in the first place? What is it?

It's something to do with "regression" or something like that. And it depends on how many games each team plays.

Well, that makes sense, since a team that is 3-7 is probably not as bad as a team that is 30-70!

Are you sure that this regression method is a better way of guessing how bad the Royals' opponents were than just using their 20-80 record?

Yup, I'm positive!

Boy, I see your point, but I really want to use that 20-80 record to adjust KC's 9-1 record! After all, that WAS the record of their opponents!

Yeah, but that gets back to my original question - why use that 20-80 record just because that is the EXACT RECORD of their opponents, when the whole point of the adjustment is to let people know what the Royals' record SHOULD look like if they played average opponents?

Cause that's how bad their opponents really...Ah! I got you! That's NOT how bad they really are - that 20-80 record! If I want to guess how bad those opponents really are so I can come up with a fair record for the Royals, I have to do that regression thing! If I use the 20-80 record to do the adjusting, even though it "seems" like the right thing to do, I'm really not using a very good number to adjust the Royals' record! There are better numbers I can use to accomplish what we want to accomplish! Probably 21-79 is better, since that is closer to how bad their opponents really are. Even 22-78 is probably better! Using 20-80 is not only arbitrary (even though it happens to be their actual record - so what!), it is not a good number to use if we want to present a "fairer" record for the Royals! If we use the 20-80 to adjust that 9-1 record, it may "seem" like the right thing to do, and we can call it something "nice" like "an opponent or schedule adjusted w/l record," but it doesn't mean anything. If it means anything at all, it is an attempt to present the Royals' record as if they played average teams, but it is a poor attempt! Right?

Right!

Dackle, that's the best I can do! The rest is up to you! (A rough numerical sketch of this regress-then-adjust idea follows below.)
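
[To put the dialogue's regress-then-adjust point into numbers, here is a minimal Python sketch. It is not MGL's actual code; the regression constant of 70 league-average games is purely an assumption for illustration, and the odds-ratio function is the log5-style adjustment the dialogue mentions.]

def regress_to_mean(wins, losses, k=70, mean=0.5):
    # Shrink an observed winning percentage toward the league mean by
    # adding k "phantom" league-average games.  k = 70 is an assumed
    # constant for illustration, not a derived value.
    return (wins + mean * k) / (wins + losses + k)

def odds_ratio_adjust(team_pct, opp_pct):
    # Log5 / odds-ratio method: convert a record compiled against
    # opponents of strength opp_pct into the equivalent record
    # against .500 opponents.
    odds = (team_pct / (1 - team_pct)) * (opp_pct / (1 - opp_pct))
    return odds / (1 + odds)

raw_opp = 0.200                          # opponents' combined record, 20-80
true_opp = regress_to_mean(20, 80)       # ~.324, a better guess at their quality

print(odds_ratio_adjust(0.9, raw_opp))   # ~.692, about 7-3 (over-adjusted)
print(odds_ratio_adjust(0.9, true_opp))  # ~.812, about 8-2 (milder, and fairer)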

Posted 6:54 p.m., January 18, 2004 (#46) - AED
  Tango, how do odds ratios work for unbeaten teams? Forget the 9-1 example; a 10-0 (or 0-10) team shows why a prior has to be used.

Posted 7:05 p.m., January 18, 2004 (#47) - tangotiger
  Essentially, as the odds approach infinity, the rate approaches 1.

So, regardless of the opposition (unless that opposition is also 100%), the opposition's strength is ignored for a team with a win% of 100.
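
[The unbeaten case AED asks about is easy to see with the same odds-ratio formula as in the sketch above - purely an illustration: as the winning percentage approaches 1 the odds blow up, so no finite opponent strength can drag the adjusted figure down.]

def odds_ratio_adjust(team_pct, opp_pct):
    # Same log5 / odds-ratio neutralization as in the earlier sketch.
    odds = (team_pct / (1 - team_pct)) * (opp_pct / (1 - opp_pct))
    return odds / (1 + odds)

for pct in (0.9, 0.99, 0.999999):
    # Even against very weak (.200) opponents, the adjusted figure
    # climbs back toward 1 as the raw record approaches perfection.
    print(pct, round(odds_ratio_adjust(pct, 0.2), 6))

# A literal 1.000 team divides by zero: its odds are infinite, so the
# opposition's strength drops out of the calculation entirely.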

Posted 7:52 p.m., January 18, 2004 (#48) - AED
  So every unbeaten team has a schedule-adjusted record of 1.000, no matter how easy or difficult its schedule? In football terms, Carroll College of Montana (14-0) and St. John's of Minnesota (14-0) should be ranked ahead of LSU (13-1)? If you don't use priors, you're making a serious mistake.

Posted 9:04 p.m., January 18, 2004 (#49) - Dackle
  Tango's #43 post, I think, explains my perspective very clearly and fairly. AED's #48 brings up a thorny problem of doing iterations backward from the real records, although I don't think the answer is using priors. Sometimes at really low game levels the iterations don't interlock perfectly, and the culprit is either: (a) not every team has played every other team yet, so we can't judge the strength of the opponents by the opponents' opponents (if that makes any sense); or (b) there are some unbeaten/winless teams. In the case of (b) I would start every team with a .500 record and let the iterator run 10,000 times. The unbeaten teams get up to .998 or thereabouts. But if they're a 14-0 team against really shitty competition (you would know this by the records of the competition's opponents), they won't get nearly as high as .998. It's a flaw, but it's a mathematical problem that could be solved with a more clever approach. I don't think it means that priors are required.
It's hard to know exactly what people want to know when they ask "How much was Minnesota's 90-72 record helped by playing in the Central?" If I say it was 6.97 wins, I agree that might comprise 4.5 wins due to luck and 2.47 wins due to the opponents' true talent (or lack thereof), and so 2.47 wins should be the correct figure, not 6.97. It depends on your perspective and the degree of sophistication you require. But if you give me 2.47 wins, then I'd like to see 90-72 broken down by luck and skill as well.
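
[Dackle doesn't spell out his iterator, so the following is only one standard no-prior scheme in the same spirit - a Bradley-Terry-style fixed-point iteration, not necessarily his exact method. It shows the behavior AED points at in #48: with an undefeated team in the schedule, no finite answer exists.]

from collections import defaultdict

def iterate_strengths(games, n_iter=10000):
    # games is a list of (winner, loser) pairs.  Every team starts
    # equal (the ".500" starting point Dackle describes), and the
    # strengths are updated with the standard Bradley-Terry rule.
    teams = {t for g in games for t in g}
    wins = defaultdict(int)
    faced = defaultdict(lambda: defaultdict(int))
    for w, l in games:
        wins[w] += 1
        faced[w][l] += 1
        faced[l][w] += 1
    s = {t: 1.0 for t in teams}
    for _ in range(n_iter):
        new = {}
        for i in teams:
            denom = sum(n / (s[i] + s[j]) for j, n in faced[i].items())
            new[i] = wins[i] / denom if denom > 0 else s[i]
        total = sum(new.values())
        s = {t: v * len(teams) / total for t, v in new.items()}  # rescale
    return s

# Two-team toy case: KC 9-1 vs DET -> implied head-to-head of .900.
print(iterate_strengths([("KC", "DET")] * 9 + [("DET", "KC")]))

# With an undefeated team in the schedule there is no finite solution:
# its strength runs away from everyone else's, and its implied record
# comes out as 1.000 no matter who it played.  A prior, even a weak
# one, is what pins it down.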

Posted 11:20 p.m., January 18, 2004 (#50) - AED
  Dackle, the system you suggest is EXACTLY like Sagarin's Elo-based system, but without the ratings. And I can assure you that such a system would give all undefeated teams infinitely high rankings.

Posted 11:21 p.m., January 18, 2004 (#51) - AED
  Sorry, typing faster than thinking...

"Dackle, the system you suggest is EXACTLY like Sagarin's Elo-based system, but without the *priors*. And I can assure you that such a system would give all undefeated teams infinitely high rankings."

Posted 9:41 a.m., January 19, 2004 (#52) - tangotiger
  And I think that's exactly what Dackle wants.

Treat any 9-1 record as really being 9 billion wins to 1 billion losses.

Or 5-0 being 5 gazillion wins and 0 losses.

Posted 12:31 p.m., January 19, 2004 (#53) - Dr Doppler's Cthulhuite Spawn Counterpart
  Hm, I think I understand some things about what you are doing, but I have some questions.

1. Why do you use regressed performance from three previous years? You weight each year based on the observed correlation between past and future performance, right? And shouldn't 4-years-ago performance also have some correlation with future performance?

2. Also, when you are done, you have a Quality-adjusted estimate of a player's true value, right? Isn't this even more accurate than the 'true' values you used before? Shouldn't you therefore re-do the entire process, using the Quality-adjusted values for the previous three years for each player to determine a 'true+' value, which would be better than the 'true' values you originally used? Presumably you would keep iterating until the Quality-adjusted values stopped changing much between iterations.

Posted 12:50 p.m., January 19, 2004 (#54) - MGL
  Dr. Doppler's Cthulhuite Spawn Counterpart,

Yes and yes to both questions. Using 3 years, rather than 4 or more, is just a matter of convenience. Since each prior year is given 20% less weight than the subsequent year, there is diminishing value in using lots of years, plus many players don't even have more than 3 years of major league history, of course. QOC adjustments end up being so minor that a perfect "true value" estimate for each player is NOT necessary.

After one or two iterations, nothing changes, again because the adjustments end up being so minor AND because the "true value" estimates are based on at least 3 years of data. When you adjust 3 years of data for QOC, you get almost no changes in sample performance...
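
[As a rough illustration of the "weighted average of recent years, then regress" estimate described above: the 20%-per-year weight decay comes from the post, while the regression constant (1200 PA of league-average performance) and the sample numbers are assumptions for illustration only.]

def true_rate(yearly_rates, yearly_pa, league_rate, k_pa=1200):
    # Most recent year first; each earlier year gets 20% less weight.
    weights = [1.0, 0.8, 0.64]
    num = sum(w * r * pa for w, r, pa in zip(weights, yearly_rates, yearly_pa))
    den = sum(w * pa for w, pa in zip(weights, yearly_pa))
    sample_rate = num / den
    # Regress the weighted sample rate toward the league mean; the more
    # (weighted) PA, the less regression.  k_pa = 1200 is an assumption.
    return (sample_rate * den + league_rate * k_pa) / (den + k_pa)

# e.g. a hitter's HR-per-PA over 2003/2002/2001 (made-up numbers):
print(true_rate([0.060, 0.050, 0.040], [600, 550, 500], league_rate=0.030))
# ~0.042: noticeably pulled back toward the league mean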

Posted 11:35 a.m., January 22, 2004 (#55) - Sam M
  Well, I'm coming really late to this, but I do have a question, if MGL in particular is still looking around here. In #45, you wrote:

So their 9-1 record is kind of misleading, huh?

Yes it is, it should be more like 7-3 or 6-4 or maybe even 5-5. I'm not really sure.

Hey, I've got an idea, let's "adjust" their 9-1 record to account for how bad their opponents were and then we can tell people "Here's what the Royals' record SHOULD be, or here's what it WOULD be if they had played average quality teams!"

Do you mean that literally, or just to illustrate the point? Isn't 10 games just too few to even do the adjustment with any confidence in it as an actual indication of what "it would be if they had played average quality teams"? In other words, it seems to me that 10 games against crappy opponents (or against ANY competition) tells us nothing about how good they are, or how they would do against average teams, simply because 10 games just isn't enough of a sample to tell us that. And adjusting the 10 games doesn't tell us much of anything, either.

It seems to me the truly correct answer, if we're talking about only 10 games, would be, "Well, they've played crappy opponents, so that 9-1 record may well be even more unreliable as an indication of quality than any 10 game sample inherently is, but either way, 10 games just isn't enough to tell us (a) if they're any good or (b) how good they truly are."

Or am I underestimating what we can learn from 10 games, even from 10 games against a particular level of competition?

Posted 9:03 p.m., January 22, 2004 (#56) - MGL
  I'm still here. Again, you're mixing up two different things. I'm not talking at all about the 9-1 team's true strength based on that 10-game sample. I'm simply talking about taking that 9-1 record, for whatever value it has, and "re-doing it" to account for the true strength of their competition. For whatever that is worth. I meant the language in my little dialogue literally. Converting the 9-1 team's record into a "true record" is another story altogether. OK, the two things are related. If I say so-and-so is 9-1 so far, what does that tell us? It tells us that, unless we know a priori that all teams are really the same strength, our 9-1 team is probably better than average. How much better, or what their true w% is, we don't know unless we have more information. Could be 90% or it could be 50%, or anything in between. Now, if we find out that the 10 teams that it played had a true or estimated true w% of 40%, that gives us more information that we can use to estimate the 9-1 team's true strength! That's all it does, and that is the proper way to do it. So we convert the team's 9-1 record into what it "would have" done had it played average teams. That is trivial. We use the log5 or odds ratio method to do that (the arithmetic is sketched below). Now we have a QOC-adjusted record of, say, 8-2. We can leave it at that if we want to, and just say, here is an "opponent neutral" version of my team's record now. It is still a sample record, but it is closer to the truth than the original 9-1 record now that we know that we played bad teams. Or we can go further and estimate the true w% of our now 8-2 team. I don't really see the confusion.

BTW, we are getting so used to regressing everything that we have forgotten that the default rule is that a sample mean is the best estimate of the population mean! No regression! If all we know is a 90% sample result, then the true value is 90%! It is only when we have more information that we may start to regress! And heck, we may even have to regress a 90% sample result upwards, depending upon the information! The extra info in this hypothetical is that these are some sports teams' records and we know that the mean of the population is by definition .5 and that all sports leagues have some degree of parity! If we didn't know any of that, then we wouldn't regress!
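
[For what it's worth, the arithmetic behind the "say, 8-2" figure above - just the odds-ratio step named in the post, with .400 taken as the opponents' estimated true w%:]

team, opp = 0.900, 0.400
odds = (team / (1 - team)) * (opp / (1 - opp))   # 9 * (2/3) = 6
neutral = odds / (1 + odds)                      # 6/7 ~= .857
print(neutral)   # ~.857, i.e. roughly 8.6 wins per 10 games,
                 # in the neighborhood of the "say, 8-2" in the post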

Posted 10:03 p.m., January 22, 2004 (#57) - jto(e-mail)
  MGL,
I was wondering if you would be willing to submit your OPS and ERA player projections for 2001-2003. I am trying to do an independent research project during my final semester at college comparing various projection systems. Being the poor student that I am, it's been hard to come across the needed data. Please email the above address if you could help me out. I would really appreciate it! Keep up all the good work. I enjoy reading your input on all these threads. (I probably should have posted this on a different thread.)

Posted 4:29 p.m., January 23, 2004 (#58) - MGL(e-mail)
  jto, why don't you e-mail me directly. You can tell me about the project...

Posted 4:02 p.m., January 30, 2004 (#59) - Dr Cane's Cthulhuite Spawn Counterpart
  If anyone is reading this thread anymore, I have another question for MGL &c. Does your system deal with adversarial situations?

For example, imagine a team's 25-man roster is split into a Strong squad and a Weak squad, each with 9 position players and 3-4 pitchers. Which players go on each squad is determined by scouting/scrimmaging results at the start of the season. When the team plays an opponent with a winning record, it uses its Strong squad, and when it plays opponents with a losing record, it uses its Weak squad.

Or imagine that every team has Strong and Weak squads. Every day, they choose to play with the same squad as their opposition; on the first day of a series, they play Strong vs Strong, then the next day Weak vs Weak, etc. So all the Strong players never play the Weak players, and vice versa.

In both cases, at the end of the season, the Strong and Weak players might have stats with identical average values and distributions, but actually the Strong and Weak players would have very unequal relative skill/values.

These are obviously 'devil's advocate' situations, but for many individual players (e.g. platoons based on opposing pitcher, pitchers who only use a certain catcher (or a groundball pitcher whose manager backs him with better-fielding but weaker-hitting defenders than the rest of the staff normally gets), teams trying to use/avoid using their ace against a particular team, pinch hitters brought in mostly in close games - i.e., probably against a comparable-quality team - etc.) this sort of thing can happen to an obviously noticeable extent - I mean, managers specifically try to do these things.

Perhaps what I am trying to say is that the method you use puts a lot of stock in the predictive value of the metrics you have - yet in the 'devil's advocate' cases, the metrics are perfectly predictive, but still radically wrong about relative value of players and teams (because the Strong players are rated as being equal with the Weak ones). So how can these methods detect how much devil is in the details? Sure, you can use 'common sense' and look at a manager's strategy and say "obviously, the Strong squads are better", but there is nothing inherent in the method that will notice that.

So how do you know that the method doesn't suffer from these same problems (to a smaller extent) with platoons/aces/etc. that I mentioned above? Or you could imagine that trying to use a single system to rate/value minor and major league players on the same scale would have similar problems. After all, you can't detect the problem by simply looking at predictions, since it will give mostly correct predictions of performance - it will just be consistently wrong about player skill.

Posted 10:57 p.m., January 30, 2004 (#60) - Dr Wily's Cthulhuite Spawn Counterpart
  Just to add a note of clarity, I'm not saying that a system should be able to magically overcome a radically adversarial situation; if the Strong players only take the field against other Strong players, and the Weak against the Weak, then no performance-based system could tell which group was generally more skilled.

However, it would be nice if a system could give some sort of confidence score or estimate based on how well-interwoven a player's playing time was with other players. Probably this isn't much of a problem, but I wonder how much an extreme real player could be affected by something like this...
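
[One crude way to get at "how well-interwoven" playing time is - purely an illustration, not anything the thread says MGL's system does - is to build a graph of who actually faced whom and check whether it splits into disconnected groups. That is exactly the Strong-vs-Strong / Weak-vs-Weak failure case: two or more groups means the ratings say nothing about the gap between them.]

def connected_groups(matchups):
    # matchups is a list of (player_a, player_b) pairs of who faced whom.
    adj = {}
    for a, b in matchups:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:                        # simple depth-first search
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        groups.append(comp)
    return groups

# Two groups here: {A, B, C} and {X, Y}.  A performance-based rating can
# order players within a group, but is silent about across-group gaps.
print(len(connected_groups([("A", "B"), ("B", "C"), ("X", "Y")])))  # 2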

Posted 2:05 p.m., January 31, 2004 (#61) - RossCW
  it would be nice if a system could give some sort of confidence score or estimate based on how well-interwoven a player's playing time was with other players.

Be careful - even suggesting the Emperor is only wearing shorts is not well-received here.

The problem comes when the uncertainty is bigger than the differences reported by the performance-measuring system. Everyone probably agrees that Coors has some impact on almost any player's performance; it's hard to make that argument for Yankee Stadium. The Rockies have never figured out how to adjust their lineup to take advantage of Coors; the Yankees have always considered their stadium in constructing their lineup.