Tango on Baseball Archives

Do Win Shares undervalue pitching? (December 15, 2003)

For those who are not going to read the whole article, here are a few snippets of how I responded:

36% of win shares goes to pitchers. So, an average pitcher with 252 out of 1440 innings will get 15.3 win shares. An NL non-pitcher with 535 out of 680x8 PAs will also get 15.3 win shares. 127 games for a 3B being equal to 36 starts of 7 innings for a starter? I don't think so. Do you?
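The arithmetic behind those two 15.3 figures can be sketched as follows. This is a rough reconstruction, not the author's worksheet: it assumes an average team wins 81 games, that the system awards three win shares per win, and that non-pitchers split the 64% left over after pitchers take 36%.

```python
# Assumptions (not spelled out in the post): an average team wins 81 games,
# each win is worth 3 win shares, and non-pitchers receive whatever is left
# after pitchers take their 36%.
TEAM_WIN_SHARES = 81 * 3
PITCHING_SHARE = 0.36
NON_PITCHING_SHARE = 1 - PITCHING_SHARE


def pitcher_win_shares(innings, team_innings=1440):
    """Win shares for a league-average pitcher, prorated by innings."""
    return TEAM_WIN_SHARES * PITCHING_SHARE * innings / team_innings


def position_player_win_shares(pa, team_pa=680 * 8):
    """Win shares for a league-average NL non-pitcher, prorated by plate appearances."""
    return TEAM_WIN_SHARES * NON_PITCHING_SHARE * pa / team_pa


if __name__ == "__main__":
    print(round(pitcher_win_shares(252), 1))          # 36 starts x 7 innings
    print(round(position_player_win_shares(535), 1))  # roughly 127 games
```

Both calls round to 15.3, which is the equivalence the post objects to.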
...
Here are the boundaries that would ensure that you have a .300 team on offense or defense:

RS / AVG / RA
3.0 / 5 / 7.8
2.3 / 4 / 6.4
1.64 / 3 / 5

As you can see, the boundary levels should not be 50%/150%, or 52%/152%, but rather at the Fibonacci levels of 61%/161%.

--posted by TangoTiger at 04:24 PM EDT

Posted 6:06 p.m., December 15, 2003 (#1) - David Smyth
Ahhh, the old WShares analysis. The Fibonacci is much better, but as I recall from doing lots of playing around with this topic, there is no inherent reason to think that Fibonacci must be correct. So far, I have not been able to call up properly the article cited by Tango.

Posted 6:14 p.m., December 15, 2003 (#2) - David Smyth
No offense to anyone, but I posted the Fibonacci thing at least a year ago on Fanhome. And the Fibonacci is not perfect either, because it is just a "ratio" based calculation.

Posted 6:25 p.m., December 15, 2003 (#3) - tangotiger
David, I do remember you bringing up Fibonacci. I think I, and Patriot as well, said that .58 and its reciprocal was the best one, and then you brought up Fib.

Posted 7:09 p.m., December 15, 2003 (#4) - AED
It really has nothing to do with either 1.52/0.52 or the Fibonacci values. The real question is how many runs a replacement-level player would cost the team at each position. Consider baseball vs. slow-pitch softball. Assuming that pitching is important in baseball and unimportant in slow-pitch softball, you would find that (compared with baseball) "win shares" for a slow-pitch softball league should be awarded more heavily for offense than defense -- say, 1.30/0.30.

The overall assignment of Win Shares on offense and defense seems reasonable, but the problem is that batting and fielding replacement levels are not independent. A player who is both a replacement-level fielder and a replacement-level hitter would be below replacement level overall. Actual replacement level is only slightly worse than replacement-level hitting combined with average fielding.

Posted 8:30 p.m., December 15, 2003 (#5) - tangotiger
AED, the overall assignments are NOT reasonable at all. I've already shown that a bad off team scoring 3 and allowing 5 will win .300 of its games, and a bad def team scoring 5 and allowing 8 will win .300 of its games. So, what makes you say that WS are reasonable between off/def? They are most definitely not. Win Shares sets the boundaries so that 2.5 off and 7.5 def are equivalent. See? WS is shortchanging the def by 0.5 runs per game, and overcrediting the off by 0.5 runs per game. That's a huge, huge problem.

My comments regarding .6/1.6 is meant as a shorthand explanation to this.

Your comments on fielding are definitely valid.

Posted 10:00 p.m., December 15, 2003 (#6) - Charles Saeger (e-mail)
So, I'm a little lost ... how would you go about fixing this? Something like multiplying Claim Points by ERA+ or something?

Posted 11:05 p.m., December 15, 2003 (#7) - studes (homepage)
Time for me to tackle Fibonacci. I was sort of sneaking up on this. I printed out the old Fanhome discussion a long time ago. So I'm going to plug in .61/1.61 just for kicks and see what happens. I'll get it done in a day or two.

Posted 11:26 p.m., December 15, 2003 (#8) - tangotiger
I haven't thought about how to fix this.

1 - If you want a guy who starts 36 games and averages 7 innings to be equivalent to a CF or 3B who plays in 127 games (with all players performing at league average), you better be prepared to prove it. Right now, all people who accept Win Shares accept this. (This is as bad as the replacement level thing with EqA.)

2 - That 3 runs scored and 5 runs allowed has the same win impact as 5 runs scored and 8 runs allowed. You can't make it 2.5 and 7.5 and think that you can make the system "work".

Posted 11:42 p.m., December 15, 2003 (#9) - Guy
Tango, you say that "WS is shortchanging the def by 0.5 runs per game, and overcrediting the off by 0.5 runs per game. That's a huge, huge problem." However, the RS/RA ranges you are talking about don't actually happen. Yes, there are .350 teams, but none that are .350 solely because of poor hitting or poor defense -- it's always some of both. At the team level in today's game, a great offensive team is one run per game above average, a very weak team one run below. Assuming a 5 run environment, a top offensive club will be 6RS/5RA or .59, while a top defensive club will be 5RS/4RA, or about .61. If you use James' .52/1.52 bookends, the good offensive team is credited with +3.4 R/G and the good defensive team is at +3.6, very close to the correct relationship (which I assume is why the .52/1.52 were selected). I could see how in the 1960s low-run environment James might have a small problem, but still nothing like the magnitude you suggest.
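The figures in that paragraph can be checked with a short sketch. It assumes the plain Pythagorean estimator with exponent 2 (the post does not say which estimator was used) and a 5-run league average; the function names are made up for illustration.

```python
def pythag_win_pct(rs, ra, exponent=2.0):
    """Pythagorean winning percentage; exponent 2 is an assumption here."""
    return rs**exponent / (rs**exponent + ra**exponent)


def marginal_runs(rs, ra, league_avg, off_floor=0.52, def_ceiling=1.52):
    """Marginal runs per game credited to offense and defense under the
    .52/1.52 bookends: runs scored above 52% of league average, and runs
    allowed below 152% of league average."""
    offense = rs - off_floor * league_avg
    defense = def_ceiling * league_avg - ra
    return offense, defense


if __name__ == "__main__":
    for label, rs, ra in (("top offense", 6.0, 5.0), ("top defense", 5.0, 4.0)):
        off, dfn = marginal_runs(rs, ra, league_avg=5.0)
        print(label, round(pythag_win_pct(rs, ra), 3), round(off, 1), round(dfn, 1))
```

Under these assumptions the 6/5 club is a .590 team credited with 3.4 marginal offensive runs, and the 5/4 club is a .610 team credited with 3.6 marginal defensive runs.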

The real issue is VARIANCE, as AED suggests and you said in your original post. To try another metaphor, in little league games that use a machine to pitch, the game is presumably something like 75% hitting, 25% fielding, and 0% pitching. If every "pitcher" is the same, pitching no longer matters for explaining wins and losses. If every single MLB hitter performed in a range of .780 to .800 OPS, but we had the current variance among pitchers and fielders, baseball might indeed be 75% pitching. And that would still be true even if those MLB hitters were much better hitters than the rest of us, as long as there were enough of those hitters to fill all ML rosters (admittedly, an unlikely distribution of human athletic talent). The distribution of hitting and defensive talent seems fairly balanced in the current game, but that doesn't mean it is exactly the same. If there is more variance in RA than RS, then defense becomes a bit more important (James seems to say this in justifying his 52/48 split, but the case is murky). I could also see how this could vary in different historical eras: when 4 starters provided most of a team's innings, pitching variance may have been greater (or smaller); modern gloves probably affected the range of defensive abilities in the game; etc.

That said, it seems to be the case that the offensive and defensive performance ranges are never greatly different, and this presumably reflects some fundamental facts about athletic ability and the rules of baseball.

Posted 1:09 a.m., December 16, 2003 (#10) - studes (homepage)
Guy, I'm not completely following everything you say, but I do want to point out that Tango's numbers do occur today. The 3/5/7.8 distribution is pretty close to today's averages.

Also it's very important to remember that moving the thresholds to 1.61/.61 DOES NOT imply that pitching/fielding are more important than batting. It's a mathematical thing. It means that a run not allowed helps slightly more than a run allowed hurts.

Or, from the batting point of view, a run not scored hurts a bit more than a run scored helps. Defense and offense are still 50/50 responsible for the outcome of a play.

Also, regarding AED's comments on replacement level -- I think I understand the issue, but I'm not convinced it's a big deal with Win Shares. Yes, Win Shares does use a pseudo-replacement level for batting and fielding, but it's a relatively low replacement level. It's not nearly as high as Baseball Prospectus's, for instance.

Once/If I get to it, I have a feeling we'll find that replacement level for a position player is something like eight Win Shares. It won't matter if the Win Shares are from batting or fielding -- eight will about do it. And I think it will be lower for pitchers (at least, given how the current system is configured). Something like four.

I may be wrong (I often am), but it seems to me that you don't need a zero-based system to avoid double counting replacement levels.

Posted 8:46 a.m., December 16, 2003 (#11) - tangotiger
Variance: yes, it's all about the variance to split up the distribution. The RS/RA standard deviation is virtually the same for any era since 1900. So, 50/50 is the best split, based on that granularity. It's NOT so obvious if you include hitting, baserunning, pitching, fielding as the various distributions (though 3 of these 4 are not independent of each other). In any case, 50/50 should be the given, and others should prove against it.

Assuming a 5 run environment, a top offensive club will be 6RS/5RA or .59, while a top defensive club will be 5RS/4RA

Those are not equals, are they? But these are:
RS/RA
4.0 / 5.0
5.0 / 6.2

This will give you about the same win%. So, that's 80% of league and 124% of league.

And, in any case, James uses 52/152, so HE's setting the boundaries at a low level. 61/161 is more in line with what he wants to do.
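A quick way to see why 4.0/5.0 and 5.0/6.2 come out about even: under a fixed-exponent Pythagorean estimator (exponent 2 is assumed below; the post does not name its estimator), winning percentage depends only on the ratio of runs scored to runs allowed, so the matching defensive line is the reciprocal of the offensive one. The helper names are illustrative only.

```python
def pythag_win_pct(rs, ra, exponent=2.0):
    """Pythagorean winning percentage; exponent 2 is an assumption."""
    return rs**exponent / (rs**exponent + ra**exponent)


def matching_runs_allowed(rs, ra, league_avg):
    """Runs allowed that give a league-average offense the same RS/RA ratio
    (and therefore the same Pythagorean win%) as the team (rs, ra)."""
    return league_avg * ra / rs


if __name__ == "__main__":
    weak_offense = pythag_win_pct(4.0, 5.0)
    weak_defense = pythag_win_pct(5.0, 6.2)
    print(round(weak_offense, 3), round(weak_defense, 3))
    print(matching_runs_allowed(4.0, 5.0, league_avg=5.0))
```

With exponent 2 the exact match for a 4.0/5.0 team is 6.25 runs allowed, i.e. 80% of league on offense against 125% on defense, which is close to the 6.2 quoted above.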

Posted 9:41 a.m., December 16, 2003 (#12) - studes (homepage)
I plugged in 61% this morning. Pretty interesting impact.

The percent of WS attributed to batting/baserunning drops from 48% to 39.5%. Pitching increases from 36% to 42%. Fielding goes from 16.6% to 19%. So it appears that offense definitely gets underweighted.

The WS leader (these are not the original numbers -- they include some of the methodological changes I've made) was Delgado at 33, but he drops to 31 and ARod is the leader. He only drops from 33 to 32. In fact, everyday players, other than the DHs and 1B's, don't see their totals move too much.

One of the unintended consequences (but maybe a good one) is that starters are helped a bit more relative to relievers. Their innings pitched help them garner a larger percentage of pitching Win Shares.

For instance, Hudson moves from 24 to 27, while Foulke moves from 20 to 22.

So I think the practical implications of Fibonacci are too extreme.

Posted 10:50 a.m., December 16, 2003 (#13) - Steve Rohde
One question I have had, when thinking about replacement level, is whether the replacement level for position players should vary depending on the position a player is playing. I agree with the notion that we shouldn't think of position players as having separate replacement levels for their offense and their defense -- logically it seems to make more sense to think of a replacement level for the overall package of offense and defense. However, in the win shares system, it seems clear that as a practical matter, playing certain positions provides the opportunity to accumulate more defensive win shares than playing other positions. At the extreme, being a DH provides no opportunity to accumulate defensive win shares. So instead of concluding, for example, that the replacement level for position players is something like 8 win shares and the replacement level for a pitcher is something like 4 win shares, maybe the overall replacement level for a position player should vary to some degree based on the position he is playing.

Posted 11:01 a.m., December 16, 2003 (#14) - ColinM
I'm late to the game here, but I think AED and Guy are on to something. Actually, I believe I posted something very similar to AED's comments regarding replacement level about a month ago on one of those never-ending Win Shares threads.

I can only make a short comment right now but I want to say this:
The 70-30 split makes sense when comparing runs saved to an AVERAGE baseline. There's no reason to assume this holds true when comparing to some hypothetical 0-level baseline like Bill James uses. The split can still be 70-30 compared to an average team, but it may be 80-20 or something when compared to a marginal team. I would imagine the split moves as the replacement line moves. I feel this is actually the source of the undervaluing of pitchers, more of those marginal runs at the really low level belong to the pitcher, not the fielders.

I may elaborate later if anyone cares...

Posted 11:07 a.m., December 16, 2003 (#15) - studes (homepage)
Steve, I agree completely. I didn't mean to make a categorical statement. I do believe replacement value will vary by position. Colin, I think you have a very valid point too. If you could elaborate (here or in the thread on my site) that would be great.

In the end, it may well be that varying replacement levels by position and function is the only thing that can make Win Shares whole.

Posted 11:43 a.m., December 16, 2003 (#16) - Guy
Tango: On RS/RA, if one run saved = 1.2 runs scored, which I accept, then it seems that James' .52/1.52 benchmarks need tweaking, but are not "half a run/game off" for both offense and defense. Is that fair?

It occurs to me that Tango's point matters much more at the individual pitcher level than for the question of team level allocation. No team is much more than 1 R/G above or below average. But Pedro takes 2 R/G away from the opposing team, and it's concentrated in 30 or so games. So perhaps his contribution has proportionally more impact than WS suggests (where his runs prevented per inning are treated as only twice as valuable as preventing 1 R/G or scoring an extra 1 R/G). Does this seem like a useful distinction? Seems to address the widespread perception (which I share) that WS undervalues great starters.

If std dev for RS and RA has always been the same, then 50-50 is absolutely right for offense/defense split. Interesting it's been so stable.

Tango: To go back to original question of pitching/defense split, you say you've concluded it's 70-30. Could you post the cite? However, I think you've said elsewhere that responsibility for BIP is about 50-50. If that's true, and you then credit pitchers for Ks, HR, and BB, won't pitchers account for far more than 70% of total variance in RA?

Seems to me that if we give defense 50%, give pitchers 75-80% of that, and credit top starters for full impact of high level of runs prevented per game, then starters will get their due in WS.

Posted 12:09 p.m., December 16, 2003 (#17) - tangotiger
The 52/152 split: we can't just come up with numbers to fit our liking. They have to be derived somehow. Since James is intent on doing a .300 replacement level by off and by def, then the PROPER levels to set are 61/161. If you want to do a .400 replacement level, then 80/125 is more like it. (Essentially you want the reciprocal.)
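One way to derive such reciprocal boundaries is sketched below. It assumes a Pythagorean estimator with a fixed exponent of 1.8 (one of the exponents mentioned elsewhere in this thread, chosen here because it lands near the quoted levels; it is not necessarily what was actually used), and `baseline_ratios` is a made-up helper name.

```python
def baseline_ratios(target_win_pct, exponent=1.8):
    """Return (offensive floor, defensive ceiling) as multiples of league
    average, such that an otherwise average team sitting at either boundary
    has the target Pythagorean winning percentage.

    Solving r**x / (r**x + 1) = w for r gives r = (w / (1 - w)) ** (1 / x);
    the defensive boundary is the reciprocal, 1 / r.
    """
    odds = target_win_pct / (1 - target_win_pct)
    floor = odds ** (1 / exponent)
    return floor, 1 / floor


if __name__ == "__main__":
    for target in (0.300, 0.400):
        low, high = baseline_ratios(target)
        print(target, round(low, 2), round(high, 2))
```

Under that assumption a .300 target comes out near .62/1.60 and a .400 target near .80/1.25. With exponent 2 the .300 boundaries move to roughly .65/1.53, so the exact levels depend on the estimator.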

Now, the pitcher and hitter effect are different on a game level. While the RPW converter might be 11 for a hitter, it would be 9 for a pitcher. However, James applies, essentially, a constant RPW converter. Therefore, to counteract that, you may have to "tweak" the 61/161 to something else so that it adds up properly. Essentially, fudge one number and fudge another so you come up with something right.

As for the 70/30 split: the run value of a safe and out play for a BIP is almost exactly the same as the safe and out play for a non-BIP (i.e., the HR is worth 1.4, the BB is worth 0.33, and so on average, that's almost exactly the same as a 1B/2B/3B). There are about 75% BIP and 25% non-BIP. Give the pitcher 60% of the BIP, and we get:
.6 x .75 + 1.0 x .25 = .70
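The same weighting, written as a small function so the inputs can be varied. The 75% BIP share and 60% pitcher credit are the figures from the post; the 50% alternative is the BIP split Guy mentions earlier in the thread, and the function name is illustrative.

```python
def pitcher_share_of_defense(bip_fraction=0.75, pitcher_credit_on_bip=0.60):
    """Pitcher's share of run prevention: partial credit on balls in play,
    full credit on everything else (HR, BB, SO)."""
    non_bip_fraction = 1 - bip_fraction
    return pitcher_credit_on_bip * bip_fraction + 1.0 * non_bip_fraction


if __name__ == "__main__":
    print(round(pitcher_share_of_defense(), 3))                            # 60% credit on BIP
    print(round(pitcher_share_of_defense(pitcher_credit_on_bip=0.50), 3))  # 50% credit on BIP
```

Dropping the pitcher's credit on BIP from 60% to 50% lowers the overall pitcher share from .70 to .625, so the split is fairly sensitive to that one input.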

Posted 12:50 p.m., December 16, 2003 (#18) - Guy
On 70/30 split: Have you also factored variance into this? My perception was that the variance in teams' park adjusted H/BIP rates was relatively smaller than variance in HR/SO/BB rates (certainly, we know individual pitchers can be very successful with high H/BIP and vice versa), which would mean pitchers' performance could explain more than 70% of defense variance, even if BIP account for 75% of all RS. Is the variance actually about the same, or am I thinking about this wrong?

Posted 12:58 p.m., December 16, 2003 (#19) - Guy
In addition to needing different RPW converter for pitchers vs. hitters, I think the other important implication of Tango's analysis is that preventing 2 R/G is more than twice as valuable as preventing 1 R/G: the first run increases W% by .095, second by .106. Thus, great starters have disproportionate impact on game outcomes.

(Conversely, each additional marginal RS is less valuable than the preceding RS -- first is worth .083, second is .070. Perhaps this is underpinning of old saw that "great pitching beats great hitting?")
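The pattern described here (each extra run prevented is worth more, each extra run scored is worth less) falls out of any Pythagorean-style estimator. The sketch below assumes exponent 2 and a 5-run environment; since the post does not say which estimator produced its figures, the values printed will not match them exactly, only the direction.

```python
def pythag_win_pct(rs, ra, exponent=2.0):
    """Pythagorean winning percentage; exponent 2 is an assumption."""
    return rs**exponent / (rs**exponent + ra**exponent)


def marginal_gains(league_avg=5.0):
    """Win% gained from the first and second run prevented, and from the
    first and second extra run scored, starting from an average team."""
    base = pythag_win_pct(league_avg, league_avg)
    saved_1 = pythag_win_pct(league_avg, league_avg - 1) - base
    saved_2 = pythag_win_pct(league_avg, league_avg - 2) - pythag_win_pct(league_avg, league_avg - 1)
    scored_1 = pythag_win_pct(league_avg + 1, league_avg) - base
    scored_2 = pythag_win_pct(league_avg + 2, league_avg) - pythag_win_pct(league_avg + 1, league_avg)
    return saved_1, saved_2, scored_1, scored_2


if __name__ == "__main__":
    for name, value in zip(("saved 1st", "saved 2nd", "scored 1st", "scored 2nd"), marginal_gains()):
        print(name, round(value, 3))
```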

Posted 1:46 p.m., December 16, 2003 (#20) - studes (homepage)
Conversely, each additional marginal RS is less valuable than the preceding RS -- first is worth .083, second is .070. Perhaps this is underpinning of old saw that "great pitching beats great hitting?"

Well, as I tried to say before, this is a mathematical phenomenon. In particular, it's a result of basing a system on variance away from a .500 level, which is what Win Shares does. If you build a system on a bottom-up level, like WARP or Win Advancement, I would think you'd have different impressions of the relative weight between the two.

Posted 3:38 p.m., December 16, 2003 (#21) - AED
Tango, you are implicitly assuming that a team with average offense and replacement-level defense would have the same record as one with replacement-level offense and average defense. I don't see this as a given. In my example of slow-pitch softball, the wins cost by replacement-level defense would be less than the wins cost by replacement-level defense in baseball. Requiring the two numbers to be reciprocals of each other is only valid if you believe that replacement-level defense is as bad as replacement-level offense.

I disagree that 0.62/1.62 (or 0.52/1.52, 0.50/1.50, etc.) is a reasonable definition of replacement level. Replacement level is any two values that would produce a Pythagorean (^2, ^1.8, or whatever you prefer) win percentage of 0.30. If you believe that replacement-level offense is equally bad as replacement-level defense, you end up with something around 0.80/1.25. If you're playing home run derby (all hitting), it's 0.62/1.00. Regardless, the problem here is that you have to decide if you're trying to compute marginal wins relative to a zero-level team or relative to actual replacement level.

Fundamentally, I think it's pointless to try to quantify specific replacement levels for fielding, pitching, and hitting. A replacement-level player is one whose overall contribution is at the replacement level, not whose fielding, batting, or baserunning skills specifically are at replacement level. Unless the correlation between fielding skill and hitting skill is exactly one, the overall replacement level at a position is better than the combination of replacement-level hitting and fielding.

Posted 4:52 p.m., December 16, 2003 (#22) - tangotiger
Tango, you are implicitly assuming that a team with average offense and replacement-level defense would have the same record as one with replacement-level offense and average defense.

First of all, let's not forget what Bill James is trying to do. He's NOT after wins above replacement, but wins above zero. James is SOMEHOW trying to get there by doing wins above some baseline, and when he adds up these wins he again SOMEHOW ends up with an absolute wins total.

Therefore, in terms of apportioning ABSOLUTE wins, you've got to assume that the split is 50/50, since the standard deviation of RS and RA are the same across all eras. If you want to use something else, you've got to prove it.

I am not at all saying that a replacement level def and avg off, and vice-versa, are equals. This is not a question about replacement level as the rest of us talk about it, but as Bill James talks about it.

A team that scores 3 and allows 5 will win as often as a team that scores 5 and allows 8. Therefore, those are the boundaries. Why would Bill James' boundaries of 2.5/7.5 be more right than mine?

I don't see this as a given. In my example of slow-pitch softball, the wins cost by replacement-level defense would be less than the wins cost by replacement-level defense in baseball. Requiring the two numbers to be reciprocals of each other is only valid if you believe that replacement-level defense is as bad as replacement-level offense.

The real replacement level, unlike the Bill James baseline level, can be completely determined by the overall talent distribution of the pitchers and non-pitchers. My best guess is that the talent distribution of the nonpitchers is wider than that of the pitchers, and therefore, the average pitcher, relative to the replacement level, is LESS than the average nonpitcher. By how much, I don't know (yet).

I disagree that 0.62/1.62 (or 0.52/1.52, 0.50/1.50, etc.) is a reasonable definition of replacement level.

We should probably use different terms. The real replacement level is .80/1.25. The Bill James marginal baseline level should be .61/1.61.

Fundamentally, I think it's pointless to try to quantify specific replacement levels for fielding, pitching, and hitting.

I never said as such, and I apologize if I said something that might have been construed as such.

A replacement-level player is one whose overall contribution is at the replacement level, not whose fielding, batting, or baserunning skills specifically are at replacement level.

Yes, I know, as I railed against BP's WARP calculation precisely for this reason.

Posted 6:24 p.m., December 16, 2003 (#23) - David Smyth
Bill James has admitted, I think, that WSh undervalues pitchers. It should not be assumed that there is only 1 cause--it may be that there are problems on the different levels (offense vs defense, pitchers vs fielders, pitchers vs other type pitchers) that happen to go in the same direction instead of canceling out.

In any case, the main subject of this thread is the larger offense vs defense level. It is interesting that, because James has put himself "out there", it is very easy to trace what has gone on.

Tango used the phrase "house of cards" to describe another system (where it doesn't really apply), but it applies to Win Shares like paint applies to drywall.

James first constructs the system with the "ideal" .5/1.5 ratio. Then he realizes that pitchers (or, more properly, defense as a whole) are being "undervalued". So he starts moving off the .5/1.5. But when he gets to the region (say, the Fibonacci) which takes care of *that* problem, another has surfaced--that of having regular players and even a few teams which are below the zero point on offense. So James, having no clear conceptual underlying fabric, tries to compromise by moving to a point (.52/1.52) which does nothing other than produce an end result which is the most palatable to his own sensibility.

But that doesn't mean that he has solved the problem within the system. And, he is quite aware that he hasn't. His "house of cards" is still standing, but it is weak. To make us (and perhaps himself) think that it is strong, James adopts the strategy of acknowledging the problem (of undervaluing defense)--thereby selling us on his honesty and awareness--but at the same time writing in a tone which is intended to minimize (in our minds) the magnitude or significance of this drawback.

The system is state of the art and must go on, he says. Every system has imperfections and is capable of being refined, he says. Twenty years from now, someone will solve the problem, he says--maybe even his own son.

Sorry Bill, but these types of problems are not really capable of being solved within the system--because that's not where they are located. The problem is the system itself.

Posted 6:49 p.m., December 16, 2003 (#24) - AED
Yes, there's a big difference between whether you are measuring contributions above replacement level or contributions above the Bill James baseline. Replacement level is something that can be easily determined by averaging the typical contributions from replacement-level players. More to the point, replacement level actually means something in terms of a player's value. If a player is below replacement level, he has zero value since you can find someone better to replace him without too much difficulty.

So I'm not sure why one would prefer Win Shares instead of value over replacement, but back to that topic. The conversion from Pythagorean winning percentage to run differential is merely a first-order Taylor series approximation:
N*(s^2)/(s^2+a^2) ≈ (s-a+N*m)/(2m) = (s-N*bs)/(2m) + (N*ba-a)/(2m)
where N is the number of games, s is runs scored, a is runs allowed, m is the league average per game, and bs and ba are the baseline values for runs scored and allowed (the second equality requires ba - bs = m). This approximation is only valid where the winning percentage is between 0.3 and 0.7, not over the entire range of possible values. The baseline values of ba and bs are set where the winning percentage is zero in the approximation but much higher than zero (about 15%) in the Pythagorean equation. Therefore, using the Pythagorean equation to find the baseline values is not correct. Instead, you should look back to the fact that the Taylor series approximation is centered on the league average, where a run scored is equally valuable as a run prevented. As such, the most reasonable choice for baseline values is 0.50/1.50.
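A small numerical comparison of the two sides of that approximation, as a sketch. It assumes exponent 2, a 162-game season, and a 5-run league average, none of which are specified in the post; the exact Pythagorean value at the baseline shifts with the exponent, so it need not match the percentage quoted above.

```python
def pythag_wins(s, a, games, exponent=2.0):
    """Pythagorean wins from season totals of runs scored (s) and allowed (a)."""
    return games * s**exponent / (s**exponent + a**exponent)


def linear_wins(s, a, games, m):
    """First-order approximation around the league average m (runs per game)."""
    return (s - a + games * m) / (2 * m)


if __name__ == "__main__":
    games, m = 162, 5.0  # assumed season length and run environment
    cases = {
        "average team": (1.0 * m * games, 1.0 * m * games),
        "good team (6 RS, 5 RA per game)": (6.0 * games, 5.0 * games),
        "0.50/1.50 baseline team": (0.5 * m * games, 1.5 * m * games),
    }
    for label, (s, a) in cases.items():
        print(label, round(pythag_wins(s, a, games), 1), round(linear_wins(s, a, games, m), 1))
```

Near the league average the two estimates agree closely. At the 0.50/1.50 baseline the linear form gives zero wins while the Pythagorean form is still clearly above zero, which is the mismatch the post is pointing at.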

Adopting Tango's 70/30 split for defense, this would give 50% offense, 35% pitching, and 15% fielding. However, to avoid the double-counting of "value over replacement" problem I mentioned before, one might assume that the variance in offense and fielding is largely independent. In other words, the total variance from 50% from one variable and 15% from another is similar to 52.2% from only one variable. This is equivalent to giving position players 60% of the win shares and pitchers 40%. Dividing the position players' contributions back for the right offense vs. fielding breakdown, this means that win shares should be 46% offense, 40% pitching, and 14% fielding. This means baselines of 0.54/1.54 and a pitching/fielding split of 0.74/0.26.
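The chain of numbers in that paragraph can be reproduced step by step. The sketch below takes "combining 50% and 15% from independent variables" to mean adding them in quadrature (root of the sum of squares), which is what yields 52.2; that reading, and the function name, are assumptions.

```python
import math


def win_share_split(offense=0.50, pitching=0.35, fielding=0.15):
    """Re-derive the offense/pitching/fielding split when offense and
    fielding are treated as independent contributions of one player."""
    # Independent contributions combine in quadrature (assumed reading).
    position_player = math.hypot(offense, fielding)
    total = position_player + pitching
    position_share = position_player / total
    pitching_share = pitching / total
    # Split the position players' share back in the original 50:15 proportion.
    offense_share = position_share * offense / (offense + fielding)
    fielding_share = position_share * fielding / (offense + fielding)
    return {
        "combined": position_player,
        "position_players": position_share,
        "offense": offense_share,
        "pitching": pitching_share,
        "fielding": fielding_share,
        # An offensive floor b with ceiling 1 + b gives offense a share of 1 - b.
        "offense_baseline": 1 - offense_share,
        "pitching_of_defense": pitching_share / (pitching_share + fielding_share),
    }


if __name__ == "__main__":
    for key, value in win_share_split().items():
        print(key, round(value, 3))
```

Rounded, this gives 52.2 combined, a 60/40 position-player/pitcher split, 46/40/14 for offense/pitching/fielding, baselines of 0.54/1.54, and a 0.74/0.26 pitching/fielding split.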

Posted 8:45 p.m., December 16, 2003 (#25) - ColinM
Damn AED, that's some great stuff. I was going to elaborate on my earlier post but all I can really say now is... yeah, what he said. This is exactly the way I was thinking when I mentioned a moving pitching/fielding split depending on the replacement level that you set. (Except I won't pretend to have been thinking in such nice, clear, statistical terms).

Posted 9:55 p.m., December 16, 2003 (#26) - studes (homepage)
In other words, the total variance from 50% from one variable and 15% from another is similar to 52.2% from only one variable. This is equivalent to giving position players 60% of the win shares and pitchers 40%.

I apologize, but this conversation is beyond me. However, AED, I particularly did not follow this thought, and would like to. How did you get to the 52.2%, and then to the 60/40 split?

Thanks.

Posted 2:43 a.m., December 17, 2003 (#27) - Michael Humphreys
This is one great thread.

I'm on board with the basic idea that replacement level is the appropriate measure (though it would be an enormous task actually to measure it, and require all sorts of subjective judgments, particularly for non-starting pitchers), that outstanding pitchers are obviously significantly undervalued (just look at Pedro's runs allowed--can't get any more direct than that), and that fielding ratings are way too compressed--Derek Jeter deserves fewer Win Shares; Mike Cameron more. Given these very obvious and fundamental mismeasures of value, one has to wonder whether it's worth refining the system.

Perhaps there is a way. And maybe what may help us find a way is thinking about how Win Shares manages to do a good job on *offense*.

We all know how to value offense reasonably well. And because we can, we're able to draw a very clear line as to what constitutes a "zero" value (.250?) hitter under Win Shares, so we can add up Runs Created per player above that line in such a way that they actually add up to runs scored by the team above the league-average-minus-50% line. Studes has some great graphs showing how well Win Shares works on offense.

It seems to me that if we're going to make Win Shares work on its own terms (more about the benefits of that), we need to find a way of explicitly and accurately *measuring* that ".250" line per pitcher (independent of fielding) and per fielding position (independent of pitching).

Somewhere in Win Shares (the book) James admits that he adjusted the fielding baselines to get more "variance" into fielder ratings. Maybe I missed it, but I don't believe that James anywhere says that "such-and-such combination of Ks, BBs and HRs (to use a DIPS approach) constitutes a .250 pitcher," or "such-and-such level of (context-adjusted) assists constitutes a .250 shortstop." He just sets a baseline for fielders ("Subtract .200 from each Claim Percentage"--page 67) and looks to the pitcher's ERA.

Here's how we could do it. We need to examine--using DIPS, UZR, DRA, AED's system, David Pinto's system--what a team of .250 pitchers and fielders would actually look like, position-by-position, in terms of runs "allowed" relative to the league average. In other words, we need to find the lowest baseline at each position (based on actual levels of atrocious performance) that would cause the team collectively to give up 50% more runs than the average team, in the same way that we can (obviously far more easily) "construct" a team of ".250" hitters. (And by the way, offense and defense do in fact have approximately a 50/50 impact on wins.)

Now maybe there's no way of "mapping" such UZR, DRA or AED values "onto" pitching and fielding Win Shares. James believes it is impossible to context-adjust individual fielding statistics. So he first fixes fielding and pitching Win Shares at the team level before allocating them around using various 40/30/20/10 and other formulas. He also puts rigid limits on minimum and maximum team fielding win shares, which probably adds distortions.

Though the same "top-down" principle is applied to offense, it somehow seems different--as if he really calculates Runs Created per player *first* and *then* adjusts them to "fit" the number of runs the team actually scored.

But perhaps all we need to do is de-emphasise DPs and errors in allotting gross team fielding win shares, increase the weight for (park-adjusted) DER, and refine the weights James assigns for walks, strikeouts and home runs allowed. And maybe we could try a DIPS approach that causes the pitchers' ratings to add up to the team SO/BB/HR rating. (Yes, I know pitchers have some effect on BIP. I have my own simple approach to this problem and there are others out there as well.)

Another complicating factor is that James effectively position-adjusts *offense* by adjusting *fielding* win shares. (See the cryptic reference to the "intrinsic weights in modern baseball" on page 67). He grants catchers a lot of fielding win shares to get their overall ratings up without grappling with the issue of whether catchers really add that kind of value *in the field*. Maybe what we could do is simply ignore the "intrinsic weights" in actually trying to measure ".250" fielding and then add/subtract explicit position adjustments.

Now you may ask, "Why the hell should we go through all the effort to measure whatever ".250" fielding and pitching is, when we could go straight to a measure of *replacement level* performance?" After all, .250 is not replacement level--it's far below it. And the problem is further compounded in that we're effectively adding greater-than-.250 value on both offense and fielding, which happens with Davenport's system. The resulting system will probably still accord somewhere between 5 and 10 Win Shares to a full-time player who adds no "major league value", i.e., value above replacement players available in the minor leagues.

Two answers. First, it's not clear what the replacement level is--.350? .425? To some extent, as James so aptly says somewhere (though not in connection with replacement value), "You just have to pick a number." At least when you're comparing full-time players, you'll still probably come out with marginal differences in Win Shares that reflect real differences in contributions to wins.

Second, the Win Shares system needs that extremely low baseline to prevent wins-per-run distortions when teams out- or under-perform their Pythagorean projection. I think you can guess how the math works without an example, but think about it this way--if we used a .425 line, and the team outperformed its Pythagorean projection by 5 wins or so (which happens all the time), you'd get some huge win shares for the marginal runs created or saved. Even using the .250 baseline, you get some fairly big distortions--think of the 1969 Mets players. Cleon Jones was not a 30-Win-Share player, even in 1969.
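A rough sketch of the arithmetic behind this point, under simplifying assumptions that are not taken from James's book: a 162-game season, a hypothetical team projected to win 81 that actually wins 86, and credit spread over a fixed pool of marginal runs, so that the inflation in credit per run is just the ratio of actual to projected wins above the baseline. The function name and inputs are illustrative only.

```python
def credit_inflation(actual_wins, expected_wins, baseline_win_pct, games=162):
    """Ratio of credit per marginal run when a team beats its projection.

    Simplifying assumption: the pool of marginal runs is fixed, so credit per
    run scales with wins above the baseline.
    """
    baseline_wins = baseline_win_pct * games
    return (actual_wins - baseline_wins) / (expected_wins - baseline_wins)


# Hypothetical team: Pythagorean projection of 81 wins, actual record of 86.
# Win Shares hands out all 3 x wins, which acts like a zero-win baseline here.
zero_baseline = credit_inflation(86, 81, 0.0)
high_baseline = credit_inflation(86, 81, 0.425)

print(f"zero-win baseline: x{zero_baseline:.2f}")
print(f".425 baseline:     x{high_baseline:.2f}")
```

Under these assumptions the five extra wins inflate each marginal run's credit by about 6% when all wins are distributed, and by about 41% when only wins above a .425 team are distributed.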

But I like the idea that player Win Shares actually add up to team Wins (multiplied by three). A more "accurate" replacement level system would probably have to forgo that nice feature.

Maybe all of this is too much effort to make a system work that is based on a fundamental contradiction between marginal and absolute value, and that is unbelievably complicated and prone to clear errors. Nevertheless, I'm pretty confident that if we actually measured that .250 line for fielders and pitchers we'd see a lot more Greg Luzinskis "zeroed out" in the field (as they should be and currently aren't), which would free up more Win Shares for strong pitchers and fielders. And we wouldn't have to resort to mathematics dating from the Middle Ages (pace Fibonacci) to get the values to make sense.

Posted 3:28 a.m., December 17, 2003 (#28) - Michael Humphreys
Sorry for the long post and for failing to make the most important point explicit (I'm recovering from a brutal exam and my brain is fried).

The whole idea of the above is that we need to measure actual ".250" fielding and pitching (independent of each other) and give credit for value (claim points, runs saved, whatever) above that level. James just posits "floors" without explanation, and the floors selected result in terrible fielders getting defensive win shares that belong to better fielders and to pitchers. If that were fixed, we could go with the 50/50 offense/defense split that actually reflects reality and get the pitcher ratings up.

Posted 6:45 a.m., December 17, 2003 (#29) - AED
Studes, sorry for making a few leaps of logic. I'll step through it.

1. The approximation of Pythagorean wins to a linear function of run differential is centered at 50% wins (zero run differential), so by definition the marginal win value of a run scored equals that of a run prevented.

2. Win Shares per skill (offense, pitching, or fielding) are awarded proportionally to the standard deviation in wins gained or lost because of that skill. This is pretty straightforward, as it means that if you are comparing batters to 2-sigma bad batters to compute their values, you should also be comparing pitchers to 2-sigma bad pitchers and fielders to 2-sigma bad fielders.
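Point 1 can be checked numerically. The sketch below assumes the classical Pythagorean formula with an exponent of 2 and uses made-up run totals (750/750 and 800/700); the function names are illustrative.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean winning percentage (exponent of 2 is an assumption)."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def marginal_win_values(runs_scored, runs_allowed, step=1.0):
    """Central-difference win-pct value of a run scored and of a run prevented."""
    run_scored = (pythag_win_pct(runs_scored + step, runs_allowed)
                  - pythag_win_pct(runs_scored - step, runs_allowed)) / (2 * step)
    run_prevented = (pythag_win_pct(runs_scored, runs_allowed - step)
                     - pythag_win_pct(runs_scored, runs_allowed + step)) / (2 * step)
    return run_scored, run_prevented


# Hypothetical .500 team: the two marginal values coincide.
even_scored, even_prevented = marginal_win_values(750, 750)
# Hypothetical good team: a run prevented is worth slightly more.
good_scored, good_prevented = marginal_win_values(800, 700)

print(f".500 team: scored {even_scored:.6f}, prevented {even_prevented:.6f}")
print(f"good team: scored {good_scored:.6f}, prevented {good_prevented:.6f}")
```

The equality holds only at zero run differential, which is why the linear approximation has to be centered there for the argument to go through.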

Putting 1 and 2 together, one concludes that Bill James, at least, feels that the standard deviations of offense, pitching, and fielding have a ratio of 52:35:17. Since most of offense and fielding are from position players, one can estimate that the ratio of standard deviation of total position player contributions to that for pitchers is 54.7:35, or 6:4. This means that pitchers really should have 40% of the win shares.

Dividing the 60% of win shares given to position players back up into 75% offense and 25% fielding, the final distribution of win shares is 45%, 40%, and 15%. (This works out differently from the previous example since I'm using James' breakdown of pitching/fielding rather than Tango's. For Tango's breakdown the final distribution is 46/40/14.)

This isn't quite right, though, because a standard deviation of 7 in pitching and of 3 in fielding gives a defensive standard deviation of 7.6, rather than 10 (a fact I overlooked in my earlier post). Accepting that the standard deviation of runs scored equals that of runs allowed, let's call the ratio of offense to defense 50:50. If pitching and fielding are statistically independent (mostly true), a 7:3 ratio of pitching:fielding standard deviations would give an offense:pitching:fielding ratio of 50:46:20 in standard deviations. Again assuming that offense and fielding are uncorrelated, this gives a ratio of 54:46 for position players to pitchers; breaking down the position player standard deviation into a 5:2 ratio of offense and fielding would distribute win shares by 39% offense, 46% pitching, and 15% fielding. In other words, the .52/1.52 baseline should become .61/1.61 (albeit for entirely different reasons from Tango's Fibonacci argument), and the average distribution of defensive win shares should be 75% pitching and 25% fielding.
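The chain of steps in this paragraph can be reproduced in a few lines. This is a sketch of the arithmetic as described, assuming a 50:50 offense/defense ratio, a 7:3 pitching:fielding ratio, and independence between the components; the variable names are mine, and the last line assumes the baseline structure discussed in this thread (offense measured from a floor, defense from floor + 1, times league average).

```python
import math


def quadrature(*sds):
    """Standard deviation of a sum of independent components."""
    return math.sqrt(sum(sd * sd for sd in sds))


offense_sd = 50.0
defense_sd = 50.0

# Scale a 7:3 pitching:fielding ratio so the two combine in quadrature to 50.
scale = defense_sd / quadrature(7.0, 3.0)
pitching_sd = 7.0 * scale   # about 46
fielding_sd = 3.0 * scale   # about 20

# Position players supply offense and fielding, assumed uncorrelated.
position_sd = quadrature(offense_sd, fielding_sd)   # about 54

pitcher_share = pitching_sd / (position_sd + pitching_sd)
position_share = 1.0 - pitcher_share

# Split the position-player share linearly by the offense:fielding SD ratio
# (roughly 5:2).
offense_share = position_share * offense_sd / (offense_sd + fielding_sd)
fielding_share = position_share - offense_share

# Implied offensive floor if an average team's offense share is (1 - floor).
offense_floor = 1.0 - offense_share

print(f"offense {offense_share:.0%}, pitching {pitcher_share:.0%}, "
      f"fielding {fielding_share:.0%}, floor {offense_floor:.2f}")
```

Carrying full precision rather than the rounded 46 and 20 changes the intermediate values slightly but still rounds to 39/46/15, with roughly three quarters of the defensive share going to pitching.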

With a win advancement program, you can calculate these values directly as well as check for correlations between team fielding, batting, and pitching. My calculations of this sort using Retrosheet's 1991 and 1992 data indicate that win shares should be distributed something like 37% offense, 45% pitching, and 18% fielding.

I'm not sure which breakdown is better, but it does seem that pitchers deserve much more than the 35% of win shares they are currently allocated -- something closer to 45-46% would be right.

I'm not sure Win Shares is worth spending any more time tweaking, aside from the overall distributions. As Michael and many others have pointed out, there are serious problems throughout, so it's useful only as a nifty toy. If you really want absolute wins, you should compute wins contributed relative to replacement level and then sprinkle the remaining wins among a team's players using 39/46/15, 37/45/18, or some other reasonable percentage of offense/pitching/defense.

Posted 9:10 a.m., December 17, 2003 (#30) - tangotiger
Great stuff AED! I agree that Win Shares has a multitude of problems. Though it seems that people prefer handling problems, as evidenced by the posts in this thread!

I've done some preliminary work on the talent distributions of hitting, pitching, fielding, baserunning. From this point on, I'm going to talk on a PER-PLAY BASIS. While the distribution is in that order, when you consider that hitting, fielding, and baserunning are really the same player, something wonderful happens. The talent distribution of the pitcher and nonpitcher are close to each other. Again, on a per-play basis. (I don't want to say that they are the same, because this is all preliminary work, and I still have lots of work to do.)

However, the nonpitcher is involved in 63% of the plays and the pitcher is involved in 37% of the plays. I'm not sure if that means that the split should be that.

***

Using Win Advancement, which I work on off and on, I get about a 38% allocation to pitchers on wins above replacement (the "real" replacement level). However, I don't have the problem that James has about capping the pitchers on the bottom at 0, and as I noted in another thread, RJ and Pedro come out much, much higher than they do under James's system, on a wins above average metric.

***

Still lots of work to do.

Posted 12:16 p.m., December 17, 2003 (#31) - studes (homepage)
AED, that's great. Thanks for taking the time to go through the details. I'm not a mathematician (my son got those genes, which seem to skip a generation) but I'll spend a bit of time with your description and see what I can apply to Win Shares.

I know Win Shares drives some folks crazy (and I understand why) but hey, I'm having fun with it on my site, and a few people seem interested. If nothing else, all this work will help me better present some of the stats and graphs on my site next year. I certainly don't intend to re-work all of Win Shares, but a few tweaks can make the output more valid.

Having said that, Michael, if you want to try and tackle some of the methodology you outlined, I'd be happy to try it. I'd need a lot of supervision, however!

Anyway, thanks for the dialogue.

Posted 1:35 p.m., December 17, 2003 (#32) - studes (homepage)
"Putting 1 and 2 together, one concludes that Bill James, at least, feels that the standard deviations of offense, pitching, and fielding have a ratio of 52:35:17. Since most of offense and fielding are from position players, one can estimate that the ratio of standard deviation of total position player contributions to that for pitchers is 54.7:35, or 6:4."

This is where I still get lost. Has this methodology been covered in another thread that I'm missing? How does one make the leap from 52:35:17 to 54.7:35?

Posted 1:49 p.m., December 17, 2003 (#33) - tangotiger
I don't know how, but you are adding two distributions, and so you would expect the spread to increase.

new dist ^ 2 = dist1 ^ 2 + dist2 ^ 2

So, if you have a standard deviation of "3" for the first one, and a "1" for the second one, the standard deviation for the new distribution would be sqrt(3^2+1^2) = 3.16

So, if the "3" corresponds to "52", then 3.16 would correspond to 54.77

Posted 2:28 p.m., December 17, 2003 (#34) - AED
Right. If you have two independent variables with standard deviations of x and y, the standard deviation of the sum of the variables equals sqrt(x^2+y^2).

So in this case, sqrt(52^2+17^2) = 54.7.
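For anyone who wants to check the rule, here is a small sketch that computes the combined value and compares it against a simulation of two independent normal variables. The normal distribution, the seed, and the sample size are arbitrary choices for illustration, not anything from Win Shares.

```python
import math
import random
import statistics


def combined_sd(*sds):
    """SD of a sum of independent variables: square root of summed variances."""
    return math.sqrt(sum(sd * sd for sd in sds))


offense_plus_fielding = combined_sd(52, 17)
print(round(offense_plus_fielding, 1))  # 54.7

# Simulation check with independent draws (normality is an assumption here).
random.seed(0)
draws = [random.gauss(0, 52) + random.gauss(0, 17) for _ in range(100_000)]
simulated = statistics.pstdev(draws)
print(round(simulated, 1))
```

The simulated figure should land close to the computed one, and well short of the 69 you would get by adding 52 and 17 directly.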

Posted 2:37 p.m., December 17, 2003 (#35) - tangotiger
Assuming a 70/30 split between pitching/fielding, 90/10 between hitting and baserunning, and a 50/50 on off/def, and using the stdev^2 = stdevA^2 + stdevB^2, then we get the following breakdowns:

hitting = 41%
pitching = 38%
fielding = 16%
baserunning = 4%

How does this match to the 50/50 off/def split?
41^2 + 4^2 = 38^2 + 16^2

So, the pitchers get 38% and the nonpitchers get 61%. (Obviously, I shouldn't have rounded off. In any case, these are just estimates.)

As well, the pitchers are also fielders and hitters, so you can probably say that nonpitchers and pitchers would have a 60/40 split.
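The breakdown in this post can be reproduced without the rounding. The sketch below takes the stated splits (50/50 offense/defense, 90/10 hitting/baserunning, 70/30 pitching/fielding) and assumes each pair combines in quadrature to its side's total; the helper name is made up.

```python
import math


def split_in_quadrature(total_sd, weight_a, weight_b):
    """Scale two components in ratio weight_a:weight_b so that
    sqrt(a^2 + b^2) equals total_sd."""
    scale = total_sd / math.sqrt(weight_a ** 2 + weight_b ** 2)
    return weight_a * scale, weight_b * scale


hitting, baserunning = split_in_quadrature(50.0, 90, 10)
pitching, fielding = split_in_quadrature(50.0, 70, 30)

# Express each component as a percentage of the simple sum of all four.
total = hitting + baserunning + pitching + fielding
shares = {
    "hitting": 100 * hitting / total,
    "pitching": 100 * pitching / total,
    "fielding": 100 * fielding / total,
    "baserunning": 100 * baserunning / total,
}

for name, pct in shares.items():
    print(f"{name:12s}{pct:5.1f}%")
```

Unrounded, both sides satisfy the 50/50 check exactly, since hitting and baserunning combine to 50 and so do pitching and fielding.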

Posted 2:37 p.m., December 17, 2003 (#36) - studes (homepage)
Ah. Thank you.

Posted 3:29 p.m., December 17, 2003 (#37) - AED
Tango, that's not right, unless there is a perfect correlation between hitting, fielding, and baserunning. In reality, the correlations are close to (but not exactly) zero. This means that the hitting, fielding, and baserunning standard deviations should be added in quadrature; this makes a ratio of sqrt(41^2+16^2+4^2) = 44% nonpitchers and 38% pitchers. So pitchers get 46% of win shares and nonpitchers get 54%.
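The difference between the two ways of combining the nonpitcher components is easy to see side by side. This sketch uses the rounded 41/38/16/4 figures from post #35 and assumes zero correlation for the quadrature case; variable names are illustrative.

```python
import math

hitting, pitching, fielding, baserunning = 41, 38, 16, 4

# Straight addition: only valid if the three skills are perfectly correlated.
linear_nonpitchers = hitting + fielding + baserunning

# Addition in quadrature: valid if the three skills are uncorrelated.
quad_nonpitchers = math.sqrt(hitting ** 2 + fielding ** 2 + baserunning ** 2)

linear_pitcher_share = pitching / (linear_nonpitchers + pitching)
quad_pitcher_share = pitching / (quad_nonpitchers + pitching)

print(f"perfectly correlated: pitchers {linear_pitcher_share:.0%}")
print(f"uncorrelated:         pitchers {quad_pitcher_share:.0%}")
```

Since the real correlations are small but not exactly zero, the true figure would sit between the two, much nearer the uncorrelated result.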

Posted 4:27 p.m., December 17, 2003 (#38) - Guy
I think there may be a problem with the starting assumption that "the standard deviations of offense, pitching, and fielding have a ratio of 52:35:17." Assuming that the ratio of offense SD to total defense SD is 52:52, and the ratio of pitching SD to fielding SD is something like 2:1, it does NOT follow that the ratio of offense to fielding is 3:1 (52:17). For the very same reason that one can't simply add the SDs for offense and fielding and get the combined SD for position players, it would follow that the individual SDs for pitching and fielding -- also largely uncorrelated -- can't be added to equal the total SD for defense. So if SD for total defense does indeed equal 52, then the SD for pitching must be more than 35 and/or the SD for fielding is larger than 17, using this scale. Right?

Can anyone provide the actual team-level SDs for all these factors? Might help clarify the discussion.

Posted 4:54 p.m., December 17, 2003 (#39) - AED
Guy, this is what I did in my earlier post (#29). I mentioned that Win Shares seems to be based on the assumption that 52:35:17 is the ratio and ran through the math (to answer Studes' question), but then corrected for the statistical independence of offense and fielding, which is how I got 50:46:20. This results in pitchers getting 46% of the win shares.

In the same post, I also used actual team-level SDs of the factors computed using win advancements to estimate 45% pitching Win Shares and essentially confirm the result obtained from assuming equal offense and defense and a 70/30 pitching/fielding division of defense.