Individual Poster Page

See copyright notice at the bottom of this page.



Aging Patterns

June 26, 2002 - tangotiger (www) (e-mail)

In a study I did in March 2001 (which included the hitter's last year, but used a much larger sample of players), I found that hitters:
- improve their walk ratio virtually every year
- strike out the least at age 29
- get their best HR ratio at age 27
- see their success on balls in play decline almost immediately
- keep their line-drive power fairly flat for a long period of time
- lose speed, as measured by triples, almost immediately
- peak in speed, as measured by SB, at age 24, after which it declines at almost the same rate as the triples

My intention is to eventually re-run that study but with the new information I've discovered recently regarding the last year effect.

A thread with all the data can be found here - http://baseball.fanhome.com/forums/showthread.php?threadid=662692#post1958322


Aging Patterns

June 26, 2002 - tangotiger (www) (e-mail)

As to your other question on different aging patterns for different types of players, I also took a look at this a while ago. My sample set was pretty small, and so I wouldn't want to draw any strong conclusions from it, but the evidence was showing that all types of players age the same way. The Tim Raines class of runners would lose their abilities across the board (SB, HR, Hits) to the same extent that the Wade Boggs class of runners would. This is another area that I will (eventually) be looking at.


Aging Patterns

June 26, 2002 - tangotiger (www) (e-mail)

I usually only look at 1919 and later because I don't think that "power" is well-represented in the pre-1919 time period. Even HR are not a true representation of "power", but they're still pretty good to use.

In that March 2001 study (which, I'll reiterate, used a slightly different methodology), I found virtually no difference between the aging patterns of the various skill sets between 1919-1979 and 1979-1999.

In that study, I concluded the following: "...the historical averages match up very well with the recent period. While today's ballplayers may be better, and playing longer, the "curve" of their aging is the same. There is no age bias with today's regimen of training and medication. It affects all age groups the same."


Aging Patterns

June 26, 2002 - tangotiger (www) (e-mail)

jmac: yes, I agree that you have other forces at work. This is why I first presented that large chart breaking up the performances by number of years in the league (for players who debuted at age 25).

What I can do is present a similar chart for players who debuted at age 22, 23, etc., so that we can determine a more specific pattern. The only reason I did not do so is that it would be so overwhelming that the reader wouldn't know where to begin. As well, this kind of analysis suffers from sample size issues, and so the conclusions will not be reliable. Let me think about how I can best present such data.


Aging Patterns

June 27, 2002 - tangotiger (www) (e-mail)

In other words (and more simply), age is calculated as of Dec 31st.

I was wondering when someone would say that. Yes, I simply make Age = Year - YOB. Not only is it a snap to calculate, you also don't need to know the month the player was born.
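A minimal Python sketch of that convention (the function names and the sample birth date are hypothetical). It also checks, with an exact-age helper, that Year - YOB is the same as the player's age on December 31 of that season, which is why the birth month never matters.

```python
from datetime import date


def season_age(season: int, birth_year: int) -> int:
    """Age convention from the post: Age = Year - YOB."""
    return season - birth_year


def age_on(day: date, birth: date) -> int:
    """Exact age in whole years on a given day."""
    had_birthday = (day.month, day.day) >= (birth.month, birth.day)
    return day.year - birth.year - (0 if had_birthday else 1)


birth = date(1964, 7, 24)  # hypothetical birth date
print(season_age(1990, birth.year))       # 26
print(age_on(date(1990, 12, 31), birth))  # 26: same as Year - YOB
print(age_on(date(1990, 7, 1), birth))    # 25: a mid-season date can differ
```

Because December 31 is on or after every possible birthday, the exact age on that date always equals Year - YOB.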

Do you know that batting averages ...follow a normal distribution?

I seem to recall looking at this a long time ago to determine how many single-hit and multi-hit games a player would have, if it did follow such a distribution. And it did.
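One way to make the single-hit/multi-hit calculation concrete is a binomial model, which is the usual way to count hits per game (the remark above says "normal distribution"; the binomial is a swapped-in model, not necessarily what was used at the time). A minimal Python sketch, assuming each at-bat is an independent trial that succeeds with probability equal to the batting average and a fixed number of at-bats per game; the function names and the sample hitter are hypothetical.

```python
from math import comb


def hit_distribution(avg: float, at_bats: int) -> list[float]:
    """P(k hits) for k = 0..at_bats, treating each AB as an independent
    trial that succeeds with probability `avg` (binomial model)."""
    return [comb(at_bats, k) * avg**k * (1 - avg) ** (at_bats - k)
            for k in range(at_bats + 1)]


def game_breakdown(avg: float, at_bats: int, games: int) -> dict[str, float]:
    """Expected number of hitless, single-hit and multi-hit games."""
    p = hit_distribution(avg, at_bats)
    return {
        "hitless": games * p[0],
        "one_hit": games * p[1],
        "multi_hit": games * sum(p[2:]),
    }


# Hypothetical .300 hitter with 4 AB in each of 150 games.
print(game_breakdown(0.300, 4, 150))
```

For that hypothetical hitter the model expects roughly 36 hitless, 62 one-hit and 52 multi-hit games; comparing such expectations with a player's actual game logs is the kind of check described above.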

Do you stick to that in strike years, when 300 PA were harder to come by?...

Yes, on all counts. It's not as if I can say that the reliability of 300 PA in 1982 is similar to 200 PA in 1981. 300 PA is 300 PA. In cases like this, my sample size goes down somewhat. My other option would be to limit my sample set so that 1981 is not part of the study. I can do this for pairs of seasons, but to the extent that I looked at this issue, I needed to have 5, 10, and 15 consecutive years. Removing 1981 from such a study would drastically reduce my sample size. Your point is valid, however.

I do think the league and park adjustments should be done, so that there is more confidence in the conclusions.

I agree. As my sample size goes down, these adjustments become much more important. As for follow-up articles, I think I'm going to have to make it a whole series of them, because there are so many results that this dataset will give us.


Aging Patterns

June 28, 2002 - tangotiger (www) (e-mail)

One of the previous posters mentioned that players who play longer will have different aging curves than those who don't.

Always working from the same sample, I broke my main sample into three subsets. The first subset is those players who played from the age of 24 (or earlier) through 33 (or later). This is a group of players that have at least 10 years of experience, and who had the chance to play during the "traditional" peak years. The second subset is those players whose careers were over by the age of 31. These might be the guys who you think peaked earlier. The third subset is everyone else: players whose careers ended after the age of 31 but who don't qualify for the first subset. These are a whole bunch of different types of players, but skewed slightly towards the older players.
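The three-way split can be written as a small classification rule. A minimal Python sketch (the function name is hypothetical), assuming a player's first and last ages in the sample are known, and assuming that "over by the age of 31" includes age 31 itself; that boundary is an assumption, so it is exposed as a parameter.

```python
def career_subset(first_age: int, last_age: int, short_cutoff: int = 31) -> str:
    """Assign a player to one of the three subsets described above.

    Long:  played from age 24 (or earlier) through age 33 (or later).
    Short: career over by `short_cutoff` (inclusive here, an assumption).
    Rest:  everyone else.
    """
    if first_age <= 24 and last_age >= 33:
        return "Long"
    if last_age <= short_cutoff:
        return "Short"
    return "Rest"


print(career_subset(22, 36))  # Long
print(career_subset(23, 30))  # Short
print(career_subset(26, 35))  # Rest: career ran past 31 but started after 24
```

The Long test comes first, although a Long player can never also satisfy the Short condition, so the order only matters for readability.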

Because of sample size issues, some of the results might look strange. In any case, here it is:
Age  # Long   Long  # Short  Short  # Rest   Rest
19        2  0.565        1  0.810
20        5  0.687        4  0.871
21       25  0.816       15  0.921       3  0.936
22       61  0.857       36  0.939       9  1.025
23       99  0.930       81  0.999      16  1.007
24      140  0.936      131  0.987      21  0.972
25      140  0.982      167  0.994      79  0.929
26      140  0.989      191  0.994     119  0.973
27      140  1.000      181  1.000     166  0.991
28      140  0.987      141  0.994     185  0.989
29      140  0.981       80  0.966     203  1.000
30      140  0.979                     218  0.969
31      140  0.977                     162  0.980
32      119  0.954                     114  0.976
33       88  0.938                      80  0.951
34       58  0.930                      55  0.945
35       39  0.885                      35  0.930
36       27  0.887                      20  0.905
37       17  0.856                      11  0.937
38       11  0.842                       4  0.906
39        5  0.809                       1  1.014
40        4  0.790                       1  0.942
41        2  0.672
42        2  0.563

Hopefully, the formatting comes out here.

What do we see? As you would expect, the players with long careers centered around the expected peak years peaked right in those years. They had their best years between the ages of 25 and 29, peaking at age 27. They had a bit of a jump prior to that. Then they had a slowly declining phase to age 34, after which they plummeted. However, don't forget that we specifically selected this subset for players who played between the ages of 24 and 33. Therefore, we should not be surprised to see demarcation points close to these ages. Furthermore, the peak was still at age 27. The slowly declining phase is a result of the selective sampling.

The second set of players is far more interesting. These are players who, for whatever reason, had their careers end by the age of 31. These players did not have the traditional aging curve. Essentially, they stayed at a peak level between the ages of 23 and 29. Is it possible that there is a class of players that don't get to the next level? That perhaps there is a class of players that peaks at age 23 and stays there? Or is this again a bias in our sample? Because we specifically chose players whose careers ended before age 31, is this exactly what we should expect to see? This is the much more likely explanation. Management did not give these players a chance to show their stuff, and simply cut them before their truly good seasons could be shown.

The last set of players is biased towards older players, and the results show this. These players had their best years between the ages of 26 and 32, peaking at age 29.
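The sample-size caveat above can be made concrete by picking each subset's peak age straight from the table. A minimal Python sketch (the names are hypothetical, and the 50-player minimum is an arbitrary assumption): taking the raw maximum of the Rest column lands on age 22, where only nine players contribute, while ignoring ages with fewer than 50 players recovers the peaks described in the text.

```python
# (number of players, performance index) by age, copied from the table above.
LONG = {
    19: (2, 0.565), 20: (5, 0.687), 21: (25, 0.816), 22: (61, 0.857),
    23: (99, 0.930), 24: (140, 0.936), 25: (140, 0.982), 26: (140, 0.989),
    27: (140, 1.000), 28: (140, 0.987), 29: (140, 0.981), 30: (140, 0.979),
    31: (140, 0.977), 32: (119, 0.954), 33: (88, 0.938), 34: (58, 0.930),
    35: (39, 0.885), 36: (27, 0.887), 37: (17, 0.856), 38: (11, 0.842),
    39: (5, 0.809), 40: (4, 0.790), 41: (2, 0.672), 42: (2, 0.563),
}
REST = {
    21: (3, 0.936), 22: (9, 1.025), 23: (16, 1.007), 24: (21, 0.972),
    25: (79, 0.929), 26: (119, 0.973), 27: (166, 0.991), 28: (185, 0.989),
    29: (203, 1.000), 30: (218, 0.969), 31: (162, 0.980), 32: (114, 0.976),
    33: (80, 0.951), 34: (55, 0.945), 35: (35, 0.930), 36: (20, 0.905),
    37: (11, 0.937), 38: (4, 0.906), 39: (1, 1.014), 40: (1, 0.942),
}


def peak_age(column: dict[int, tuple[int, float]], min_players: int = 0) -> int:
    """Age with the highest index among ages backed by at least min_players."""
    eligible = {age: index for age, (n, index) in column.items() if n >= min_players}
    return max(eligible, key=eligible.get)


print(peak_age(REST))                  # 22: nine players produce the raw maximum
print(peak_age(REST, min_players=50))  # 29
print(peak_age(LONG, min_players=50))  # 27
```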

Selectively choosing your sample set leads to many biases.


Aging Patterns

July 1, 2002 - tangotiger (www) (e-mail)

Do you feel the Linear Weights Ratio is a much better measure of offensive performance than OPS? If so, what evidence do you have? If not, may I suggest you use OPS+ in your future studies?

On June 26, I posted the following on fanhome: "[OPS] is extraordinarily useful and practical because:
- it's readily available
- it's made up of the two most important rate stats we have
- it's highly correlated to runs scored
- it can be used in research when you have the power of sample size that masks its deficiencies

It is NOT useful because:
- you cannot count on it for game-level decisions
- you cannot count on it to evaluate players of weird profiles
- it does not properly weight all the events

So, depending on what you are trying to do, OPS is either a godsend or a bane.

The reason I hate it is that people use it for the exact reasons that it is not useful."

In sum, OPS is used as a stand-in when you don't have something better. LWR is something better.
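To see the weighting issue concretely, here is a minimal Python sketch of OPS computed from counting stats with the standard OBP and SLG definitions. It does not implement LWR (the full LWR formula is on the geocities page mentioned in this post), and the stat line is hypothetical. It simply shows that the relative credit OPS gives a walk versus a single falls out of the two denominators rather than from what each event is worth in runs.

```python
def ops(ab: int, h: int, doubles: int, triples: int, hr: int,
        bb: int, hbp: int = 0, sf: int = 0) -> float:
    """OPS = OBP + SLG, computed from counting stats."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = total_bases / ab
    return obp + slg


# Hypothetical season line.
base = dict(ab=500, h=150, doubles=30, triples=5, hr=20, bb=60, hbp=5, sf=5)
print(round(ops(**base), 3))  # 0.877

# OPS credit for one extra walk versus one extra single: the gap comes
# from the OBP and SLG denominators, not from the run value of each event.
print(ops(**dict(base, bb=61)) - ops(**base))
print(ops(**dict(base, ab=501, h=151)) - ops(**base))
```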

I'm not sure why I need to show "better" evidence to use LWR over OPS. All the deficiencies of OPS are taken care of in LWR. I can use LWR to convert it into Runs Created or Runs over average, or really anything, in a simple one-step process. LWR is the best "rate" measure we have. (If you want more discussion on LWR, you can check out my site at http://www.geocities.com/tmasc/lwr.html which will give you the full formula, as well as a link that discusses LWR.)

******** Not only will this take care of the adjustments for park and year which you indicate are necessary, but it will make it easier to incorporate your findings with other studies that are OPS+ based...

I have the data to calculate the park/year adjustments, I just didn't want to add another layer of complexity. (If someone wanted to reproduce the above research, they could. If I added park and year adjustments, they couldn't.) As I indicated, I'll add that layer the next time. All studies that are OPS+ based are flawed for the reason that they rely on an indicator that has deficiencies that are circumventable.


Felipe Alou: Is He Afraid of the Walk?

November 13, 2002 - tangotiger (www) (e-mail)

I know that Walker hated the way Felipe would talk to him about hitting approach.

The first poster is doing exactly what I said we shouldn't: looking at team totals.

However, the first poster is correct that Alou does have a choice, from within the 15 hitters, of which ones to play. But Alou was not dealt a good hand. If you're given the bottom of the barrel, you should expect to have low walk totals, period.

The team was selected by the GM, and therefore the low team walk total is more an indication of the type of team that the GM has selected.

I'm going to continue the analysis tomorrow, looking at it from another angle. I don't know what the results will be, so I'll report whether they favor Alou or not.


Banner Years

October 31, 2002 - tangotiger (www)

Good comments, guys. I actually meant to address these two issues, and I'm glad you brought them up.

Age: definitely should be looked at, but I can tell you that there is no big age bias, even with the 149 group. I will do the breakdown, hopefully before the end of the day.

Banner selection: one of the considerations I had was that I did not want to select players that were, say, 110-110-110-140, because of the "regression towards the mean" issue I brought up. That is, even though you've got a guy who you think is a 110-level player for three years, he might actually be 107 or 113, etc. The closer you are to 100, the more likely it is that you are really a 100 player. Furthermore, if I include all players, I run into trouble with players dropping out of the pool. While it is unlikely that you will lose a player from the pool who is 100-100-100, it is very possible that you might lose a player who is 80-80, thereby introducing a bias. Of course, this also depends on position.

Having said that, I was thinking about running through the data anyway, to see what happens. And with the much larger sample size this would allow me, I can select players who are 35% above their "previous 3 years" to really highlight the banner years. I'll try to get to this next week.
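That selection rule, as a minimal Python sketch (the function name is hypothetical). It assumes performance is indexed so that 100 = league average, as in the 110-110-110-140 example; the default 35% threshold comes from this post, while the original study used 25%. It does not address the player-dropout bias described above.

```python
from statistics import mean


def is_banner(previous_three: list[float], season: float, threshold: float = 0.35) -> bool:
    """True if `season` is at least `threshold` above the average of the
    three seasons before it (performance indexed so 100 = league average)."""
    if len(previous_three) != 3:
        raise ValueError("need exactly three prior seasons")
    return season >= (1 + threshold) * mean(previous_three)


print(is_banner([100, 100, 100], 140))                  # True: 140 >= 135
print(is_banner([110, 110, 110], 140))                  # False: 140 < 148.5
print(is_banner([110, 110, 110], 140, threshold=0.25))  # True: 140 >= 137.5
```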


Banner Years

October 31, 2002 - tangotiger (www)

Walt, good comment at the end, and this is exactly what I did with the HR study I linked to. And rather than seeing a "retention", we see essentially that what the player did the year after the banner year was repeated the year after that as well.

Again, what I am talking about is not "retention", even though I used that term. We are presuming that a player's performance level is a sample of his true talent level. Therefore, by selecting 130-100-100, I am choosing those players that had a great year followed by 2 average years. This does not imply that this player had an injury or something that forced him to go down to 100. The more years I tack on, the smaller my sample size. You are correct that I can simply show year1, select on years 2 through 4 (whether 100-100-130, or 130-100-100), and then look at year 5. My guess is that year 5 production will be only slightly different than year 1 production, age notwithstanding. This is a good idea, and I will run that next week as well.

Walt: any comment about the Hank Aaron issue?


Banner Years

October 31, 2002 - tangotiger (www)

MGL, yes, I agree with almost everything you said. Two points:

1 - yes, not my best writing work, as I wrote it in 30 minutes, but what is it that is unclear? Was it the weighting thing at the end? It basically means that you put more weight on the most recent year, and you have some weight to regress towards the mean. Or was it something else?

2 - As for the 149,149,149, which I selected for, the 4th year was 142. However, you then say that this group is actually a "147" group. Not! Because my group is "fairly large", I would say that this selected group of 149,149,149 is a 142 player. And, if I looked at the 5th year, I would bet that this group would also exhibit 142. I would also bet that the year prior to the 149s would also be 142. I would say *every* year around the 149 years would be 142. Do you agree? (Age, of course, is an issue if I start going crazy and consider pre-24 and post-36 years, etc.)

However, for a single player, if I had a 149,149,149,142 player, since I didn't select such a player, then I would have to guess that he is a 147 player.

I think we are on the same page, but I'm not sure.

*** As for parks and changing teams, etc., yes, that is always a problem. It's "possible" that the park may have an influence on the selection of my players, but I doubt it. The banner year was 25% above the base years, and so, while playing at Coors does increase the chances that a player will be selected for his banner year, I don't think it is driving the results. I'll look into it though.

*** By the way, the more I look into this, the more it looks just like MGL's hot/cold streak study. While he is looking at 15-day periods, I am looking at 3-year periods. We are (or will be, in my case) looking at the pre-selected and post-selected periods, and we are (or might be, in my case) finding that those two values pretty much match, regardless of the intervening period.


Banner Years

October 31, 2002 - tangotiger (www)

First off, I'm not trying to capture ALL banner years, just some of them. As well, I am not suggesting at all that 149,149,149 is banner performance. I am using that type of player to show that a 149,149,149 player is not in fact a 149 player but a 142 player.

So, when you look at a 130,100,100 player, a player that certainly had a banner year, we should treat the 130 with some hesitation, since, as we've seen, this performance was "lucky" in some respect.

****

Anyway, I've re-run, so that we have "x", 149,149,149, "y". That is, how did the players with 3 great years do just before the "banner 3 years", and just after? Here are the results:

 Yr1   Yr2   Yr3   Yr4   Yr5
1.42  1.49  1.49  1.49  1.41

This population of players had 593 strings.

Now, if we break it down by age (in Year 5), this is what we get:

Age     Yr1   Yr2   Yr3   Yr4   Yr5     n
34+    1.46  1.51  1.51  1.48  1.37   173
30-33  1.44  1.49  1.49  1.47  1.41   229
29-    1.36  1.46  1.49  1.51  1.45   191

Again, as you can see, the "3 selected years", were pretty constant around that 149 level. The before/after years are consistent with the age grouping. But, in all cases, the before year was less than the selected period, even for the old guys.

There is also about an annualized 2% change in performance level between year 1 and year 5, which is also consistent with my findings in aging patterns previously done.
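One way to check the annualized figure is to compound the Year 1 to Year 5 change over the four intervening seasons. A minimal Python sketch, assuming geometric (compound) rather than linear annualization; the numbers are copied from the tables above. On these figures the youngest and oldest groups move by roughly 1.6% a year in opposite directions and the middle group by about half a percent, so how close you get to 2% depends on the group and on how you annualize.

```python
# (Year 1, Year 5) levels by age group, copied from the tables above.
GROUPS = {
    "34+": (1.46, 1.37),
    "30-33": (1.44, 1.41),
    "29-": (1.36, 1.45),
    "All": (1.42, 1.41),
}


def annualized_change(start: float, end: float, years: int = 4) -> float:
    """Compound annual rate of change from start to end over `years` years."""
    return (end / start) ** (1 / years) - 1


for group, (year1, year5) in GROUPS.items():
    print(f"{group:>5}: {annualized_change(year1, year5):+.1%} per year")
```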

So, the "true talent level" is year 1 and year 5, and everything in-between is "lucky".


Banner Years

October 31, 2002 - tangotiger (www)

MGL, sorry for the bugaboo.

To go back to your question, let me amplify. The 149 performance is regressed 14% towards the mean to match the "expected probable" true talent level of 142. So, generally speaking, we should regress all three-year performances by 14%.

Now, of the three remaining components (year x, year x-1, year x-2), we weight the most recent season (x) at 38%, and the other two equally at 24% each.

As a shorthand, rather than remembering kooky percentages, you can apply integer weights of "3" for "x", "2" for "x-1","x-2", and "1" for "mean". Maybe I should have skipped this part, as it's probably more confusing than it should be.
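Those weights as a minimal Python sketch (the function name is hypothetical). It assumes the "mean" is league average, indexed at 100, which is what the 14% figure implies: (149 - 142) / (149 - 100) = 7/49, about 14%. With the integer weights 3/2/2/1, a 149,149,149 player comes out at about 143; with the 38/24/24/14 percentages, at about 142.

```python
def project(x: float, x1: float, x2: float, mean: float = 100.0,
            weights: tuple[int, int, int, int] = (3, 2, 2, 1)) -> float:
    """Weighted true-talent estimate from the last three seasons plus the mean.

    x is the most recent season; x1 and x2 are the two before it.
    The default integer weights 3/2/2/1 approximate 38%/24%/24%/14%.
    """
    w_x, w_x1, w_x2, w_mean = weights
    total = w_x + w_x1 + w_x2 + w_mean
    return (w_x * x + w_x1 * x1 + w_x2 * x2 + w_mean * mean) / total


print(project(149, 149, 149))                            # 142.875
print(project(149, 149, 149, weights=(38, 24, 24, 14)))  # about 142.14
print(project(130, 100, 100))                            # 111.25
```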


Banner Years

November 1, 2002 - tangotiger (www)

Good job, MGL!

The mean of the players who played in those 5-year spans, with at least 300 PA, is 115%. Now, this may sound like a lot, but don't forget, we have a lot of repeating players in there (like Aaron).

I don't think that the regression towards the mean would be towards 115%, but I'd like to hear the thoughts of the statistics-oriented fellows on this matter. I would guess that this is where the Aaron situation comes up, and that I should identify unique players only.


Banner Years

November 1, 2002 - tangotiger (www)

Since age is an issue, and I can easily control for it, I will re-run using that.

As well, the "mean" of the players is 115%. If we look only at one age group for the 5-year period (say ages 26-30), we see that each year they average 115%. If we select any other time period, like 24-28, we also get similar results. And of course, no player could possibly exist more than once in each age group. Therefore, the mean is 115%.

Therefore, I should probably select players that center around 115%, and that center around the 27 age group. I'll get to this next week.


Banner Years

November 1, 2002 - tangotiger (www)

MGL, maybe you missed my last post, but if I only look at one 5-yr period, say ages 24-28, then of course Aaron can only exist once in this string. And, the players in this group are 115% of league average. Now, if I select some other age group, the unique players in that group are also 115%.

However, if I decide to combine the two groups, I might have two Aarons, and two Ruths, etc. I don't see why I would want to remove one of them from the groups.

I think it would be easier to keep all the age groups separate (24-28, 25-29, 26-30, etc.) and report on each one separately. This removes the conflicting players, and so addresses the Aaron issue. However, I don't see the problem in then combining these three age groups afterwards, AND KEEPING the mean at 115%.

Or maybe I'm missing something?


Banner Years

November 2, 2002 - tangotiger (www)

Contrarian: I've already admitted my shortcomings in many areas, including statistics. I've taken enough that I can follow conversations, but that's as far as I would take it. I also know enough to apply the basics. This is no news to people who've been reading me, and any of my comments should be taken that way.

I am always interested to hear from Walt Davis, and frankly I just missed his second post (the way Primer regenerates the site, there is a lag, and Walt's post got sandwiched in-between).

I have no problems with people criticizing my approach, or my comments, or anything I do. It would be nice though if you would provide an email address so that we can correspond privately, and you can elaborate further.


Banner Years

November 4, 2002 - tangotiger (www)

Sancho, thanks for the links. The first one I had not seen, and it's not a bad one. As for Albert, I'm frankly disappointed. There's a long list of math professors who have tackled baseball issues, and who either miss something, or write so dryly that I miss something. (Of course, there's an even longer list of sabermetricians who miss some math issues as well.)


Banner Years

November 5, 2002 - tangotiger (www) (e-mail)

Shaun, I agree age should be taken into account, and I'm currently working on this. I should have something to show as soon as I get the time (which these days is not too much).

As for contract status, certainly this would have an impact. However, by using an aggregate of players, this impact should not be very noticeable. And of course, since my data is from 1919 onwards, only an even smaller share of the population would be affected by this at all.

As for learning and improving, etc.: this is the issue. Is the player learning and improving, or is it simply random chance that he happens to have a banner year? Hopefully, with the new data I have, we'll have a better answer.


Banner Years

November 7, 2002 - tangotiger (www) (e-mail)

MGL, no: F James specifically said that these 149,149,149 players would not regress, except for aging. In fact, they do regress to 142.

This group of players will regress towards THEIR mean, I agree. In fact, they will regress 100% towards their mean. But since we don't know what their mean is (without looking at other non-sampled years), we take the next best thing: the mean of the population they were drawn from. This mean is in fact 115%. Therefore, given the number of years (3), the number of players (I don't remember, let's say 100), and the number of PAs (let's say 500 / player / year), the best players will regress 7/34 (20%) towards the mean of the population they were drawn from. Different years, different # of players, and different # of PAs will regress differently.

Now, I know little of statistics, and perhaps Walt Davis or Ben V can put this matter to rest.

I'll be back in a week or two with detailed data, broken down by age.


Banner Years

November 8, 2002 - tangotiger (www) (e-mail)

F James: I think I wrote this already, but it might have gotten lost in all of MGL's explanation: the year before the 149,149,149 string was 141, and the year after the string was 142. Subsequent years drop off slightly from 142, and in fact match what you would expect from normal aging. (This will become clearer when I do the breakdown by age... eventually, whenever that is.)

Essentially, MGL's point boils down to this: whatever period you take, however many players you take, however many PAs that performance makes up, you have to regress to some degree. The amount to regress is related to the variables I just mentioned. By choosing 1 day, we are regressing almost 100%; by choosing 5 years of performance between the ages of 25 and 29, where in each of those years the player has a googol PAs, you regress close to 0%. Everything in between is subject to more analysis.

Given my sample of 3 years of 600+ players of about 500 PAs, the regression of the 149 player is 20% towards the mean of 115 to achieve the true talent level of 142 (more or less).
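The arithmetic behind these percentages: the regression fraction is the share of the gap between the observed level and the population mean that disappears in the true-talent estimate. A minimal Python sketch (the function names are hypothetical). Measured against the population mean of 115, going from 149 to 142 is 7/34, about 20%; measured against a league-average mean of 100, the same drop is 7/49, about 14%, which appears to be where the earlier 14% figure came from.

```python
def regression_fraction(observed: float, true_talent: float, population_mean: float) -> float:
    """Share of the distance from the population mean that is given back."""
    return (observed - true_talent) / (observed - population_mean)


def regress(observed: float, population_mean: float, fraction: float) -> float:
    """Move `observed` the given fraction of the way towards the mean."""
    return observed - fraction * (observed - population_mean)


print(regression_fraction(149, 142, 115))  # 7/34, about 0.206
print(regression_fraction(149, 142, 100))  # 7/49, about 0.143
print(regress(149, 115, 0.20))             # about 142.2
```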


Let's Contract Two Different Teams

July 12, 2002 - tangotiger (www) (e-mail)

Proofreader guy: you know, I read and reread and re-reread my article, and it amazes me what I miss. How about "here" for "hear", and "marker" for "market"? Competitif is French for "competitive", so I don't think I can use the French excuse.

Common Sense: do you think that if Steinbrenner reduces his payroll from $140 million to $90 million, he will give that $50 million of savings to you? In fact, don't you think that now that he has set up YankeeNets, it will be very easy for Steinbrenner to claim much less revenue because the YankeeNets corporation owns the cable rights, and not Steinbrenner?

If teams claim that they can't play on the same playing field as the Yankees, then either level the playing field by introducing teams into a lucrative market to siphon off some of that revenue, or take some of that cable money, or realign the two leagues by market size. Let the Yankees and Mets and Red Sox and Braves and Dodgers spend themselves crazy. Let the A's and Expos and Royals and Twins spend smart.

To think that by controlling player salaries that you will get an outcome that is different from today is ludicrous. Nothing is going to change. In 5 years, you'll be right back to where you started.


Let's Contract Two Different Teams

July 14, 2002 - tangotiger (www) (e-mail)

There's no question that we are introducing accountants into the fold with the owners' plan. As if lawyers aren't bad enough. How many white collar solutions do we have to introduce to "solve" the problem?

Just realign based on market size. 4 divisions of 8 teams. The top team of each division goes forward, while the 2nd and 3rd place teams go into a wild-card system where the 2nd place team of Division 1 plays the 3rd place team of Division 4, etc. There's no need to force a socialistic solution. Just change dance partners.

There's no need to overhaul anything. If you want to overhaul, then disband the league, and do it right.


Let's Contract Two Different Teams

July 15, 2002 - tangotiger (www) (e-mail)

Common sense: it seems that you've been getting more and more common sense. How much longer before we get Commen Sense the third?

Seriously, when I said to "contract" the New York teams, I intended it humorously. But the point of contracting the teams is to reposition the power that is highly concentrated in the New York teams. Since Steinbrenner is consolidating and hiding his power and revenue in a second enterprise (one that exists only because of the first), it is unlikely that he will reduce the market value of his interests.

Why would 29 intelligent men buy out a franchise that has limited value (Expos) when they can buy out a franchise that has substantial value (Yankees)? Steinbrenner used the system to its fullest; he capitalized on it with the unanticipated TV value that has created the great divide. Everyone has his price. So, buy out Steinbrenner at fair market price, and redeploy the value of the Yankees by siphoning away the cable and TV value, and selling the rest of the team to an interested buyer. That is, buy Steinbrenner's TV and cable rights away from Steinbrenner.

If that is too hard or too expensive to do (as if maintaining the status quo does not have its own expenses), then just take the "barnstorming" idea to something more palatable. Put the Yanks, Mets, Red Sox, Dodgers, Braves, Orioles, Rangers, and Cubs in the "Division 1" league. Put the Expos, Pirates, Brewers, Reds, A's, Marlins, Devil Rays, and Blue Jays into the "Division 4" league.

What would happen? Well, all those Division 1 teams will soon realize that they can't hope to buy their way in because they've got too much competition for too few spots. They'll have to be smart. The Division 4 teams will realize that with just a little effort and smarts, they'll have a decent chance to make a run for the playoffs.

Once in the playoffs, anything can happen (especially if you make the first round 5 games instead of 7).

Without spending a single dollar on either side, we can reshape the entire competitive balance by simply changing divisions.

And what's more shocking: that I say to redistribute the wealth of the Yankees to the poor teams, or Selig redistributing the wealth of the Expos to the rich teams?


Let's Contract Two Different Teams

July 16, 2002 - tangotiger (www) (e-mail)

Willy Loman is all in favor of the American dream. And I didn't say to steal it from the Boss, but buy it back from him. MLB made a huge error in not securing the TV rights the way the NFL did. Now they've got to pay for it. Literally. Once they do that, the chips will fall into place. But to restrict player salaries through non-American ways? I don't think so.


Let's Contract Two Different Teams

July 24, 2002 - tangotiger (www) (e-mail)

I think the soccer relegation/promotion idea is viable. But I question the 30 teams/league decision. The disparity will still exist. Why not have a 12 team premier league, 24 team division-1, etc, etc. Which just brings us back to my proposal of having leagues segregated by market size, but having ALL of them play for the World Series. By having each league have its own championship, the fans will question the legitimacy of any except the World Series.


Let's Contract Two Different Teams

July 24, 2002 - tangotiger (www) (e-mail)

As for pay for performance, why not simply limit contracts to 1 year? And make everyone a free agent? That would make it a truly free market. You'd end up paying rotisserie-style prices (about 15% of payroll to your top player), because of the abundance of supply. So, a team with a $60 million payroll will pay, say, Mike Piazza $9 million. A-Rod would have a tough time getting more than $15 million.

So, we have a mechanism that can severely limit top players' earning potential. All owners have to do is declare everyone a free agent, and no more guaranteed contracts. Too hard. It's like going to the Playboy mansion and being told you have a chance with 1 girl, and 1 night only. The owners want control, and they want to feel empowered. It'll cost you.


Let's Contract Two Different Teams

July 31, 2002 - tangotiger (www) (e-mail)

It would turn exactly into a rotisserie style system. The top guy would get at most 20% of the payroll. In any case, it doesn't matter how much the #1 guy gets. It's the overall payroll that matters. Players will be willing to sign for below market in some cases, simply because they don't want to be left out.

Teams will have their budgets before the bidding starts, and they won't try to run up prices, because of all the other fish in the sea.

Owners need help controlling themselves, and this is the best way. And if they overpay? So what, it'll only be for 1 year. You won't have all those 5 year contracts guaranteed to worry about.


Forecasting 2003

February 13, 2003 - tangotiger (www) (e-mail)

I still have not decided how to "rank". With only 32 players, using differentials or RMSE might not be the most appropriate (esp with the Bonds thing). I could create "classes of differentials" (consider each class to be 1 SD of error, and max out at 3 SDs or something like that). Or I might use differentials, while capping the individual differential at 3 SDs. Really, it's not important. I'm going to present the full data, and the reader is free to analyze and interpret the data as well.
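A minimal Python sketch of the capped-differential option (the function name, the SD value, and the sample numbers are hypothetical). It reports RMSE alongside the mean absolute differential with each individual miss capped at 3 SDs, so that a single outlier season cannot dominate the ranking the way it does under RMSE.

```python
from math import sqrt


def score_forecasts(projected: list[float], actual: list[float],
                    sd: float, cap_sds: float = 3.0) -> tuple[float, float]:
    """Return (RMSE, mean absolute differential with each miss capped at cap_sds * sd)."""
    if len(projected) != len(actual) or not projected:
        raise ValueError("need matching, non-empty lists")
    errors = [p - a for p, a in zip(projected, actual)]
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    capped = [min(abs(e), cap_sds * sd) for e in errors]
    return rmse, sum(capped) / len(capped)


# Hypothetical OPS projections; the third player has a Bonds-style outlier season.
rmse, capped_mae = score_forecasts([0.800, 0.750, 1.000], [0.820, 0.740, 1.380], sd=0.050)
print(round(rmse, 3), round(capped_mae, 3))
```

Here the outlier's 0.380 miss counts only as 0.150 (3 SDs) in the capped average, while it dominates the RMSE.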


Forecasting 2003

February 13, 2003 - tangotiger (www) (e-mail)

...retrospectively, using the same methodology, at last year or other prior years?

I guess surprises are out of the question around here! Voros has looked at the various forecasters for the year 2000 hitters. I was going to also add the "baseline forecast" to his list to see how that stacks up. Stay tuned in a couple of weeks.

When do fantasy drafts usually occur? The last weekend in March?


Forecasting 2003

February 13, 2003 - tangotiger (www) (e-mail)

I probably should have said this in the article.

If I were to ask the Primer readers to estimate 200 players' ERA or OPS, I'd get a smattering of responses. By limiting it to something reasonable (32), I hope to get decent participation, while at the same time getting reasonable (though not conclusive) results from the forecasters. This is similar to what the WSJ does with the top 10 picks from the brokerages. The intent is not to prove anything. I also selected the 32 players who showed the most deviation, and therefore we'd expect the forecasters and the Primer readers to have little agreement on these.

I have also asked the forecasters to participate in a second parallel study, where they would submit the projections for a large number of players. I've only received a positive response from 2 of them. This is essentially what Voros did with his study, except he did the hard work by compiling everything himself. I can understand that the forecasters don't want to give everything away (which is why it was easy to ask them for only 32). I hope though that by the end of the season, they'll give me their list, so that I can save some work. So, you'll get the study that you are looking for, plus the other readers will have some fun (I hope) as well.

I hope this answers your concern.


Forecasting 2003

February 13, 2003 - tangotiger (www) (e-mail)

Yes, included with the ballot for the 32 players will be your estimate of MLB OPS and ERA (which will default to the 2002 level if you don't choose anything).

This is critical because if a forecaster underestimates all his projections (and the league level along with them), it doesn't matter, as long as you only use his system. Therefore, that's not a bad thing.

Really, I wanted to ask everyone to submit their OPS/lgOPS, but that loses too much meaning.

Great question!


Forecasting 2003

February 13, 2003 - tangotiger (www) (e-mail)

By the way, if anyone has a systematic forecasting system, then send me an email. It could really be based on anything, like

- weighted or unweighted recent performance
- lefty/righty splits
- gb/fb tendencies
- comparable players
- age 
- height/weight
- position
- regression towards mean
- injury history

You don't have to tell me how your engine processes everything, just what it considers and how. I can then throw you into the systematic forecaster pool. Thanks...


Forecasting 2003

February 17, 2003 - tangotiger (www) (e-mail)

Just to reiterate (or maybe iterate, since I was not very clear), the point is not to figure out who has the best forecasting system, but rather if a systematic forecasting system is any better than a baseline or back of the envelope (card) system.

What the WSJ study shows is not that Lehman Brothers has a better forecasting system (hard to say with only 10 stocks) but rather that mom-and-pop investors do better using a baseline (the S&P500 index) than by paying the professionals.

To determine which professionals are better, you need far more than just 10 sample points, and the WSJ also does this by looking at all stock picks. This would be part of a second parallel study if I get decent participation from the forecasters as well, similar to what Voros did in the 2000 link I provided. However, given that I've chosen 30 players who have very inconsistent performances, I think it might show something about the forecasters, but it will be far from conclusive. (If I had chosen the 30 most consistent players, my guess is that all systematic forecasters would come up with very similar estimates. I've removed the Colorado players and the inexperienced players from the study, and there again, some forecasting systems might be better with those players.)


Forecasting 2003

February 18, 2003 - tangotiger (www) (e-mail)

Erstad: good call!

The next set of players that missed making the cut were, in order: Renteria, Erstad, Beltran, Sosa, Javy Lopez, Mark Loretta, Vina, Giles, Magglio Ordonez, Garret Anderson, Ben Molina.


Forecasting 2003

February 19, 2003 - tangotiger (www) (e-mail)

Minks: no, you would only supply the unadjusted OPS. The only reason to supply lgOPS and lgERA would be to establish your basis. Suppose that you miss all your OPS projections by 50 points, but that you also projected the lgOPS to be off by 50 too. Then, this scores 100% (in my book). A person using the results of such a projection will be perfectly happy (as long as he uses only this projection).
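A minimal sketch of that scoring idea follows. It assumes the league adjustment is a simple additive shift in OPS points (the "off by 50 points" case); the function name and the numbers are hypothetical.

```python
def league_adjusted_misses(forecasts, actuals, projected_lg, actual_lg):
    """Per-player misses after removing the forecaster's miss on the league level.

    OPS is in points (thousandths), so .800 is 800. Assumption: the league
    adjustment is a simple additive shift, as in the "off by 50 points" example.
    """
    shift = actual_lg - projected_lg  # how far the forecaster missed the league level
    return [actual - (forecast + shift) for forecast, actual in zip(forecasts, actuals)]


# Every player missed by 50 points, but the league projection also missed by 50:
print(league_adjusted_misses([800, 750], [850, 800], projected_lg=700, actual_lg=750))  # [0, 0]
```

A ratio adjustment (OPS/lgOPS) would be the other obvious choice, but the additive shift keeps the misses in familiar OPS points.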

David: the back-of-the-card forecasters are just like mom-and-pop investors. They each have access to their own private information, public information, and intuition, and combine all their data into some sort of target price for a stock. The collection of all these investors makes up the market. You can benefit from this "wisdom" by buying the S&P500 index (SPY). The systematic forecasters follow a rigid, repeatable process, like brokerage houses such as Lehman and Smith Barney. The baseline is the monkey throwing darts at a stock chart. So, for this study, I don't really care whether I am comparing apples and oranges. I'm trying to put this study on the same plane as the WSJ's study.

A second parallel study, looking at extended picks that the systematic forecasters provide (which the WSJ also does when selecting their best analysts), might satisfy the fruit requirements.


Forecasting 2003

February 21, 2003 - tangotiger (www) (e-mail)

Vinay, excellent. I did not know about this. We are essentially after the same goal, but where they have 27 humans projecting 125 players, I'm hoping to get the reverse (100+ humans projecting 32 players).

What is very interesting to me, and which matches the stock market with its S&P 500 index, is that the collective wisdom of the market matches the top forecaster, with all of his intricacies.

I think Wilton's "missing big or getting big" projections are probably attributable to a lack of regression toward the mean in that system. I'd have to look at the data more carefully, though. Because we are dealing with sample performances, you should expect a few guys to have seasons that are out of the norm, and therefore a system like STATS or Palmer will miss the outliers in exchange for being closer on the bulk of the population. Silver's PECOTA should give the readers the best of both worlds.


Forecasting 2003

February 24, 2003 - tangotiger (www) (e-mail)

We are trying to forecast a player's performance for the upcoming year. This performance is a combination of a player's expected true talent level, context in which that talent will manifest itself, and luck.

ERA has more luck (from the pitcher's perspective) than other measures. The point of this forecast is to try to predict a player's performance numbers, with the reader trying to do as little as possible.


The 2003 Projections

May 6, 2003 - tangotiger (www) (e-mail)

I didn't think about that sort of thing when making my projections; they were more seat-of-the-pants than that, and I assume that they were for most people.

I hope this is the case, as this is what I was hoping for. Can you get 100+ baseball fans to make seat-of-the-pants calls on extreme players, average them, and come up with something decent? We'll see in a few months...


Crucial Situations

December 3, 2002 - tangotiger (www) (e-mail)

Really? Hmmm, are you using an old version of Netscape? What's your browser version?


Crucial Situations

December 4, 2002 - tangotiger (www) (e-mail)

...but the shading doesn't print for me. Just pages of empty grids.

Hmmm... maybe I should put text and color? I'll see what I can do about that.

In some innings, does the blue shading bleed into the -4/+4 run columns (or even further)?

Good question! I was thinking about that, but since I used the same program as for Bonds, I limited it to -3/+3. Maybe next time I'll expand to something larger.

...although the column headings would be better if repeated every half-inning, not just every inning.

Thanks for the suggestion! My artistic skills fall short of even an average person's, so any formatting suggestions are appreciated. I'll do this next time as well.

But is there anything here that isn't intuitive? I was a bit surprised by how much the leverage changes as soon as you get one guy on base, especially the late innings.

...but can there really be an argument for pinch hitting for guys (other than your pitcher) in the 3rd or 4th inning. I mean, you'd run through your bench, pretty fast.

I mentioned at the end that that was not what I was suggesting. Though I would consider it if my batter were Ordonez, and Piazza had the day off.

...like what to do when you're in a particular colored box, so the practical value can be perceived. In contrast, your earlier, similar piece on when to walk Bonds seemed eminently practical, ...

There's really no end to this WE stuff. Eventually, I will be producing charts for the SB break-even points, when to bring in your reliever, should you go for the DP or try the runner at home, should you test the RF's arm, etc, etc. Any suggestions you can offer would be appreciated as well.

What are your definitions of 'Very high-leverage', 'High Leverage', etc.?

It gets a bit dry (a series of math equations), but I just picked some arbitrary thresholds to try to easily distinguish the various situations. I could have put +.054 wins and +.013 wins, etc., but who the heck knows what that means?


Crucial Situations

December 4, 2002 - tangotiger (www) (e-mail)

I'll do my best.

This is what you do:

1 - Determine the WE for every inning/game/base/out for an average team. I've provided a subset of that in the initial link.

2 - Assume that your "great pitcher" or "great hitter" or whatever is going to come in for 1 PA. What is the expected WE following this player's PA? (I used a player whose component stats translate to a .750 win%.)

3 - Take the difference between the two. That is the impact in wins of a "typical super-great" player for 1 PA.

The biggest swing, in this example, is about +.07 wins, and that occurs in the bottom of the 9th, fielding team up by 1, with men on 2B and 3B and 1 out. That is, if you bring in, say, Pedro or RJ or Mo, or Bonds or Thome or Giambo, for ONE SINGLE PA, he will have an effect of .07 wins (assuming these guys are .750 players) over an average player.

How much is +.07 wins? Well, the typical star is +6 wins in 600 PA (+.01 wins per PA). If you bring in Giambi IN THIS PARTICULAR SITUATION 100 times, he'll have as much impact as playing full time.
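The comparison above is simple arithmetic; a few lines of Python make the scale explicit. This sketch takes the +.07 swing and the +6 wins per 600 PA figure as given (the WE tables behind them are not reproduced), and the variable names are illustrative.

```python
# Figures quoted in the text; the WE tables behind the +.07 are not reproduced here.
STAR_WINS_PER_SEASON = 6.0  # typical star: +6 wins...
SEASON_PA = 600             # ...over 600 PA
CRUCIAL_PA_SWING = 0.07     # biggest swing: one .750-player PA vs an average player

star_wins_per_pa = STAR_WINS_PER_SEASON / SEASON_PA              # about +.01 wins per random PA
random_pas_per_crucial_pa = CRUCIAL_PA_SWING / star_wins_per_pa  # about 7 random PAs
wins_from_100_crucial_pas = 100 * CRUCIAL_PA_SWING               # about 7 wins, roughly a full season

print(round(star_wins_per_pa, 3), round(random_pas_per_crucial_pa, 1), round(wins_from_100_crucial_pas, 1))
```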

Now, now, you won't have this situation 100 times, and not having Giambi regularly in the lineup might even mean you might have this situation zero times, who knows. But this is the magnitude of the impact.

So, while Theo Epstein and Bill James are saying that tied games in late innings are very important (AND THEY ARE!), my research shows that having a great pitcher on the mound when the fielding team is up by 1 has even more of an impact.

Anyway, the thresholds I used are .01 / .02 / .04. I just made them up to try to get a balance to the chart. Well, I used .01 because that's what a great player is worth in a random PA, and .04 because that is the equivalent of 4 such PAs in a game. So, given the choice of having Piazza hit 4 times randomly, or once in the "very high-leverage" situation, it's a wash. Of course, if that situation doesn't come up, well, you lose on the deal.
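Here is a sketch of how those cutoffs could be applied to one situation's win impact. It assumes the cutoffs are inclusive lower bounds; only the "very high-leverage" and "high-leverage" labels echo the chart, while "moderate" and "low" are placeholder names.

```python
RANDOM_PA_VALUE = 0.01  # what a great player is worth in a random PA (from the text)


def leverage_label(win_impact):
    """Bucket a situation by the win impact of one great-player PA.

    Assumptions: the .01/.02/.04 cutoffs are inclusive lower bounds, and the
    "moderate"/"low" names are placeholders.
    """
    if win_impact >= 0.04:
        return "very high-leverage"
    if win_impact >= 0.02:
        return "high-leverage"
    if win_impact >= 0.01:
        return "moderate"
    return "low"


def random_pa_equivalent(win_impact):
    """How many random PAs by the same great player match one PA in this situation."""
    return win_impact / RANDOM_PA_VALUE


print(leverage_label(0.07), round(random_pa_equivalent(0.04), 1))  # very high-leverage 4.0
```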

Is that enough detail? Too much?


Crucial Situations

December 4, 2002 - tangotiger (www) (e-mail)

Chris, you got it!

If you followed my "Runs Created" series, it shows that the "run environment" (really WIN environment) already exists when the batter/pitcher matchup comes up. That is, the runners on base already have a built-in chance of scoring, given the environment they are playing under.

So, if you then introduce a great player into the mix at that point in time, the entire environment changes. Now, the chances of winning change (sometimes drastically). With 1 out, more damage can be done (not only with the runner on base, but with the batter getting on base). Bring in Bonds as a PH with 1 out, and not only is the guy on base likely to score, but Bonds will now put himself in a position to extend the inning.


Crucial Situations

December 4, 2002 - tangotiger (www) (e-mail)

Oliver, I really don't know. You'd really have to compare how teams should make their choices optimally against how they really make their choices. And you'd have to break it down by the kinds of choices as well (steals, sacs, taking extra base, throwing to wrong base, bringing in the wrong reliever, batting order, etc, etc, etc). It's gotta be a few wins at least. I don't know, 5? 6?

In the business world, I would perform a cost/benefit analysis. But, the reason I'm doing all this baseball stuff is so that I can get away from doing these boring dry cost/benefit reports! Please don't make baseball like a job for me!!


Crucial Situations

December 5, 2002 - tangotiger (www) (e-mail)

Here's a printer-friendlier version

http://groups.yahoo.com/group/tangotiger/files/crucialpa.pdf

Still no text, though.


Bruce, Lee, and the Goose

December 17, 2002 - tangotiger (www) (e-mail)

Vinay, the starter is almost exactly 1.00. I looked at two starters, one who was good and went long (Blyleven), and one who went short and was not so good (Knepper). Bert was .99 and Bob was .98.

As for historical, I have the LI for the 20 pitchers with the most relief games from 1974-1990:

pitcherid   Leverage Index
suttb001    1.90
smitl001    1.76
fingr001    1.75
gossr001    1.72
rearj001    1.69
smitd001    1.52
orosj001    1.48
laveg001    1.46
quisd001    1.45
mintg001    1.45
tekuk001    1.41
garbg001    1.38
lyles101    1.35
campb001    1.31
stanb001    1.30
martt001    1.24
leffc001    1.20
hernw001    1.18
baird001    1.10
andel001    1.03

This is the LI only while as a reliever.

Sorry, but my data is limited to the pbp provided by Retrosheet.

I agree that doing the LI by year, and then doing the multiplication by year would add the "timeliness" aspect as well. I might do it for one of these guys, maybe Quiz.


Bruce, Lee, and the Goose

December 17, 2002 - tangotiger (www) (e-mail)

Thanks... good question. I haven't calculated it yet, but I have another tool (which, for lack of a better name, I call the Tango Distribution) that shows the expected distribution of runs per game, given a team's runs per game. Using this, I can figure out the win% for any two teams, broken down by run differential. Surprisingly (to me), there is little difference between a .400, .500, and .600 team in terms of the number of close games. I suspect I could extend this to the number of crucial situations as well, and therefore expect that most teams face a similar number of crucial situations. The further a team gets from .500, the fewer crucial situations it faces. I'm not sure what the relationship is between team win% and crucial situations (yet). I'll keep this in mind next time I'm working on this. Great question!


Bruce, Lee, and the Goose

December 17, 2002 - tangotiger (www) (e-mail)

Craig: Mark Eichorn, in 1986, was 1.32. If you look at the list of 20 players I listed in my followups, you will note that Bob Stanley is 1.30. I think this is probably what you'd find with your multi-inning non-closing firemen. Eichorn did have 10 saves, and finished half his games, so you might want to be careful about how you extrapolate his usage to other relievers. In any case, his 157 relief innings are equivalent to 207 typical innings. The guys below 1.00 in LI are the true mop-up guys.

Devin: Your point is valid regarding what it would take. Since the HOF is a self-defining institution, I don't see how I can answer that question with any basis. Writers, like fans, are flying by the seat of their pants in trying to establish the potential impact a reliever has.

Pete Palmer and Bill James try to answer this question by using a combination of SV, GF, and G to come up with a reasonable estimate. I'm offering the same type of solution from a different angle. Therefore, I think it is irrelevant how we think they have an impact, and how they've changed the way the game is played and managed. The fact is that the impact of the best relievers, while real, is not substantial enough to catapult them to the levels of the superstars. And the best of the lot are good enough to put them in line with star pitchers who lack longevity. This is why relievers are paid the way they are. GMs may have figured out their true value already.

However, your point is just as valid: the HOF may not simply be about "overall value". And perhaps relievers do deserve a special spot. I don't know, and I think that the writers also don't know.

As for my sims, I'll run a couple more, like for Quiz and Reardon and Bob Stanley, using their LI. I'll let you know what shows up.

Charles: thank you! I had a lot of fun doing this piece! I just wish I could devote more time to this.


Bruce, Lee, and the Goose

December 17, 2002 - tangotiger (www) (e-mail)

No, you didn't miss it. I explained it in another article.

If you go to the first line of the article, and click on the link, you get a general explanation of how I determined the leverage of the situations. If you then go into the comments section, in one of the December 4 comments, I elaborate on how I derived the leverage values. Hope that's good enough?

I apologize for making each of these win expectancy articles links to links to links. They're all related, and it's very tough (for me) to write it adequately, without making it a mathfest.


Bruce, Lee, and the Goose

December 17, 2002 - tangotiger (www) (e-mail)

If ever I get pbp prior to 1999, and I get the World Series pbp, I'd *love* to look at Mariano Rivera. He may turn out to be a borderline candidate like Goose or Lee Smith, based on regular season numbers. But when you add in his tremendous playoff performance, that may be enough to get him over.

In fact, I am surprised how little play playoff heroics get. The NHL has the same problem. The NHL and NBA are *all* about the playoffs, but the awards and HOF, etc., are mostly about the regular season. Rarely do you see the two combined. I believe in soccer, they combine all games, regardless of "league". Pele has 1,241 (or whatever) goals, with no split.


Bruce, Lee, and the Goose

December 17, 2002 - tangotiger (www) (e-mail)

Thanks Colin.

Yes, your point is very valid, and Vinay also brought up the issue in his comment. Essentially, the next breakdown is to look at each PA within the context of the leverage of the situation (which is what the Mills Brothers and Doug Drinen did). So, while Bruce Sutter may be 1.90 overall, he'd have say 30% at a 4.0 leverage, and 40% at a 1.50 leverage, and 30% at a 0.3 leverage. And maybe during the 4.0 leverage situations, that's when he was his best (whether because his manager used him during his peak years, or he rose to the occasion, or by luck), and therefore, he was even more valuable.
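The breakdown described above can be sketched with the hypothetical 30/40/30 split. Note that 0.3 × 4.0 + 0.4 × 1.50 + 0.3 × 0.3 = 1.89, which matches the 1.90 overall figure. The per-class performance values below are made up purely to show how leveraging each PA separately differs from applying one overall LI when a pitcher is at his best in the biggest spots.

```python
# Leverage mix from the hypothetical Sutter example: (share of PAs, LI).
mix = [(0.30, 4.00), (0.40, 1.50), (0.30, 0.30)]
# Hypothetical per-PA value (wins above average, before leverage) in each class;
# made-up numbers in which he is at his best in the 4.0 situations.
value = [0.030, 0.020, 0.010]

overall_li = sum(share * li for share, li in mix)  # about 1.89, i.e. his ~1.90
avg_value = sum(share * v for (share, _), v in zip(mix, value))

# One overall LI applied to the summed performance:
lumped = overall_li * avg_value
# Each class of PA leveraged on its own, then summed:
per_pa = sum(share * li * v for (share, li), v in zip(mix, value))

print(round(overall_li, 2), round(lumped, 4), round(per_pa, 4))
```

With these made-up values the per-PA approach credits more value, because his best work coincides with the 4.0 situations; if his performance were the same in every class, the two approaches would agree.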

It's possible that there is some impact here, especially with the relief wild card.

It requires some upfront work on my part to get the whole thing set up. I'll see if I can devote some time to this. Perhaps after XMAS.


Bruce, Lee, and the Goose

December 17, 2002 - tangotiger (www) (e-mail)

I think this is the exercise that Vinay did, but in response to Joe's point, let's go through it step-by-step.

Let's assume that Gossage has an ERA+ of 126. Let's assume that he had 251 IP as a starter, and 1558 as a reliever. Let's assume that as a starter, his ERA+ was 100, and as a reliever it was 130. Fair enough?

Now, his LI as a starter was 0.99. His LI as a reliever was 1.72. As we see in the above paragraph, he pitched his best as a reliever. So, if we take his 1558 innings and multiply by 1.72, that gives us his "adjusted typical" innings. Do the same for the 251 x .99. Good so far?

Now, weight the 130 ERA+ by 1558x1.72 and the 100 ERA+ by 251x.99. You end up getting an ERA+ of 127. That's compared to the initial value of 126.
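The weighting above can be checked directly. This sketch uses the assumed Gossage splits from the example (IP, LI, ERA+ for each role); the function names are illustrative.

```python
def plain_era_plus(splits):
    """ERA+ weighted by raw innings."""
    total_ip = sum(ip for ip, _, _ in splits)
    return sum(ip * era_plus for ip, _, era_plus in splits) / total_ip


def leveraged_era_plus(splits):
    """ERA+ weighted by leveraged innings (IP x LI)."""
    total_lev_ip = sum(ip * li for ip, li, _ in splits)
    return sum(ip * li * era_plus for ip, li, era_plus in splits) / total_lev_ip


# (IP, LI, ERA+) for the assumed starter and reliever splits in the example.
gossage = [(251, 0.99, 100), (1558, 1.72, 130)]
print(round(plain_era_plus(gossage)), round(leveraged_era_plus(gossage)))  # 126 127
```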

The point is that because very little of Goose's innings came as a starter, the change won't affect much. If this was Eckersley, then that's a different story.

That said, while the impact is small in this case, we should still do the breakdown as I mentioned in the previous post, so that we are leveraging each particular PA, and not applying an overall leverage on the sums of the PA.


Bruce, Lee, and the Goose

December 18, 2002 - tangotiger (www) (e-mail)

Good point.

There are two things to consider with "leverage". You can take the position of what was the leverage of the situation, assuming that the pitcher will pitch to the end of the inning. So, if it's top of 9th, 1 out, man on 1B, up by 1, the leverage is not that particular situation, but rather that particular situation as the starting point, until the end of the inning. It could be that that particular PA may have a leverage of "4", but the "starting from that PA to end of inning" may have a leverage of "2".

Furthermore, you can also take the point of view that if a reliever gets himself into a jam, the manager is "bringing him in" to get himself out. That is, after every PA, the manager is deciding whether to keep his existing pitcher or bring in a new one.

Remember, my point of view is crucial PAs. So, PA by PA, what is the leverage. I don't know if it's the pitcher or the fielders that caused the change in leverage. And really, I don't care. What I care about is how often did he face a high-leverage situation.

It is important that you don't make a stat do what it wasn't designed to do.

If I were designing a model to decide when is the optimal point in the game to bring in a reliever, such that he will pitch to end of inning, I would have different leverage numbers. And if I design a model, such that my pitcher will pitch to end of inning, plus one more full inning, I'd have again, different leverage numbers.

All these methods are good, within the context of their design assumptions.


Bruce, Lee, and the Goose

December 19, 2002 - tangotiger (www) (e-mail)

To add to the point about "how often do relievers cause high-leverage PAs for themselves": Bruce Sutter comes into a high-leverage situation, and he keeps it high-leverage. Fat Rojas comes into a high-leverage situation, and turns it into a low-leverage situation (by giving up 3-run HRs).

So, there are various reasons for one type of leverage situation turning into another. It's not just an "if he's bad, then..." kind of deal. It's a lot more intricate than that.


Bruce, Lee, and the Goose

December 19, 2002 - tangotiger (www) (e-mail)

Gossage gave up .5 more walks than Gooden, but .7 fewer non-HR hits. Gooden's run environment was slightly higher than Gossage's. The actual runs allowed by relievers are a little suspect because of the "accountability" issue. Not that it should be ignored, but you have to account for it.

On a "rate" basis, of pitchers born since 1950, I have Guidry, Cone, Rijo, Sutter, Blyleven, Gooden, and Gossage all being "equivalent". In terms of IP or leveraged-IP, clearly Blyleven is the one who stands out here. By the way, John Smoltz is also in this group.

Gossage is borderline, in my view.


Bruce, Lee, and the Goose

December 19, 2002 - tangotiger (www) (e-mail)

Paul, welcome! I don't think I've seen you around here? I'd love for Retrosheet to get more PBP, and I'd love to run the 73 Hiller, and the Franco career through their paces.

Walt, my inclination is to say that they are in the pen because they are not Roger Clemens or Greg Maddux. However, they are David Cone or Ron Guidry, and those guys were pretty darned good. I don't have separate standard for catchers or anyone else. I look at it as how many wins did they contribute over some baseline. If you are a catcher, and you only play 120 games, and you are done by 34, then I don't have different standards. Not to say I'm right or anything.

You make a valid point that relievers can be considered similar to catchers (can't play long enough in a season or a career). So, you have to first resolve why they have shorter careers (because of the position, or the quality of the players there). Then you have to resolve if you want to have a different standard.

It's a tough call no matter what your perspective is.


Bruce, Lee, and the Goose

December 20, 2002 - tangotiger (www) (e-mail)

If I remember my post, Eichorn had 157 real innings, and 203 leveraged innings. It may be that that's as much mileage as you can get out of a reliever. That, because of all his warm up tosses, etc, etc, you won't get more than that. Then, he's got to do that for 15 years.

However, I don't know if this is a physical limitation, as it is for catchers (who play 130 games instead of 150, and who play 13 years instead of 18, e.g.). If relievers are physically limited to 200 leveraged innings instead of 250 for starters, and 12 years instead of 16 for starters (just examples), then it may be fair to hold relievers to lower standards, like catchers.

However, this should be studied to the extent that catchers' careers have been, before we pronounce sentence.

Even after all this though, people can still choose to not lower the standards for the C/RP.


Bruce, Lee, and the Goose

December 26, 2002 - tangotiger (www) (e-mail)

If there's anyone still out there, Eric Gagne's LI last year was 1.83, and Smoltzie was 1.79.


Are Managers Optimizing Their Best Relievers?

December 31, 2002 - tangotiger (www) (e-mail)

But first, I'd like to suggest that the most optimal use of the best relievers would generally be as a starter.

Agreed.

Why don't we use the same thinking for relievers? Why is the 9th inning any more important than the 1st?

If you bring in Mariano Rivera with a 6-run lead 50 times, you won't change the outcome of the game any more than if you brought in an average pitcher.

If you bring in Mo 50 times with a 1-run lead, the Yanks will win a few more games than if you brought in an average pitcher.

If there's a one-run game, aren't each of the starters' six innings just as vital as the closer's ninth?

I'm not taking anything away from the starters. Their LI is about 1.00.

In fact, I would even argue that the first 7 innings of pitching are MORE important than the 9th because the score after the 7th (and often the 6th) influences the choice of relievers the opposing manager will use.

7 innings of LI of 1.00 is 7 leveraged innings. 2 innings of LI of 2.00 is 4 leveraged innings. Yes, the first 7 are more important, or at least, they have more impact to the final outcome of the game.


Are Managers Optimizing Their Best Relievers?

December 31, 2002 - tangotiger (www) (e-mail)

Is it reasonable to conclude that Rivera's LI could be somewhat deflated due to the fact that the Yankees have been consistant winners over the last few years...

I believe I mentioned that as a possibility that the Yanks pay (earn) this price.

I don't recall any mention in the previous article of a correlation between overall team LI and team W-L records (though I would expect really bad teams to have the lowest LI's for their relievers).

On my to-do list. I should be able to come up with the LI, on a team-by-team, year-by-year basis, from 1974-1990. I expect the LI to peak with teams at .500, and slowly degrade the further the team's win% is from .500 (on either side).

It would also be fun to see the converse of this study, i.e. what is the xFIP for pitchers with the highest LI?

Also on my to-do list. I just ran a preliminary report for 1974-1990, and Todd Worrell actually tops the list at 1.97. Bruce Sutter is second at 1.90. The top of the list is all the usual suspects. The first name that I didn't recognize was Victor Cruz at 1.58. Next was Steve Foucault at 1.50.

Among "middle-relievers", Tim Burke was 1.54. He's a favorite of mine, and it certainly looks like he was used prominently. Paul asked earlier, and John Hiller was 1.62. Mike Marshall was 1.51.

Among pitchers with at least 2000 PA, Dave Tomlin was 0.73, and worst of the bunch.


Are Managers Optimizing Their Best Relievers?

December 31, 2002 - tangotiger (www) (e-mail)

Thanks, I'm enjoying this as well!

The problem with the "out" is that sometimes an out increases your WE (win expectancy), say a flyball with a man on 3B in a tie game in the 9th inning. Strictly speaking, you have to look at the change in WE for every possible event, and then come up with the variance (and the frequency of those events). In essence, how much swing potential in winning does a particular game state provide? That's the question to answer.
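A rough sketch of that calculation follows, assuming you already have, for one game state, the frequency of each possible PA outcome and the change in WE it produces. The event list below is entirely hypothetical; real inputs would come from a WE table built from play-by-play data.

```python
def we_swing(events):
    """Frequency-weighted mean and variance of the change in win expectancy.

    events: (frequency, change_in_we) pairs for every possible outcome of the PA.
    Frequencies need not sum to 1; they are normalized here.
    """
    total = sum(freq for freq, _ in events)
    mean = sum(freq * d_we for freq, d_we in events) / total
    variance = sum(freq * (d_we - mean) ** 2 for freq, d_we in events) / total
    return mean, variance


# Hypothetical game state: the numbers are placeholders, not real WE values.
# Note the fly-ball out with a positive change in WE.
events = [
    (0.25, +0.30),  # hit, run scores
    (0.10, +0.30),  # fly-ball out, runner tags and scores
    (0.45, -0.10),  # other outs
    (0.20, +0.05),  # walk
]
mean, variance = we_swing(events)
print(round(mean, 3), round(variance, 4))
```

Treating every out as a negative event would misstate the swing in a state like this one, which is why the sketch works from the WE change of each event rather than from its hit/out label.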

I'd love a faster computer, as I'm running this on a 650 MHz machine (with 512 MB of RAM). Sometimes, I have to run stuff overnight.


Are Managers Optimizing Their Best Relievers?

January 1, 2003 - tangotiger (www) (e-mail)

I will give a performance breakdown for Shuey and Stanton, among crucial, normal, and non-crucial situations. Look for this in a few days. We'll see if they can "handle" the pressure...


Are Managers Optimizing Their Best Relievers?

January 2, 2003 - tangotiger (www) (e-mail)

What we are after is *not* to maximize a pitcher's LI, but rather to maximize their leveraged-innings (LI x IP). LI of 1.00 with 120 IP will have the same win impact as a reliever's 1.50 LI with 80 IP. Of course, it's not that simple, as you have to take the totality of your starters and relievers, and maximize the leveraged innings for the good pitchers, and minimize the leveraged innings for the bad pitchers, such that all innings are accounted for. You have other constraints as well, with respect to the tiredness of a pitcher's arm, etc.

Mark Eichorn, for example, had 200 leveraged innings (LI of about 1.3) in his great year. That is an excellent total.
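Leveraged innings are just LI × IP, so the comparisons in this thread reduce to one multiplication. The sketch below uses figures quoted in the posts: the 120 IP / 80 IP example above, Eichorn's 157 relief innings at an LI of 1.32 from the December 17 post, and the 7-innings-versus-2-innings game comparison from December 31.

```python
def leveraged_innings(ip, li):
    """Innings scaled by leverage index: LI x IP."""
    return ip * li


# Two usages with the same win impact.
print(leveraged_innings(120, 1.00), leveraged_innings(80, 1.50))  # 120.0 120.0
# Eichorn 1986: 157 relief innings at an LI of 1.32.
print(round(leveraged_innings(157, 1.32)))  # 207
# One game: 7 starter innings at LI 1.00 vs 2 relief innings at LI 2.00.
print(leveraged_innings(7, 1.00), leveraged_innings(2, 2.00))  # 7.0 4.0
```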


Are Managers Optimizing Their Best Relievers?

January 2, 2003 - tangotiger (www) (e-mail)

All things equal, you are better off having your pitcher as a starter.

Your considerations would be to take someone like Urbina or Wetteland, and determine his level of effectiveness as a starter or reliever.

Say that as a starter, their performance would be a win% of .600. And as a reliever, they would be .650. You know that you can get say 160 leveraged-innings as a reliever, or 220 leveraged-innings as a starter. What do you do?

Compared to a baseline level of .450 (the effective level of rejigging your whole pitching lineup), you get 160/9 * (.650-.450) = +3.6 wins as a reliever or 220/9 * (.600-.450) = +3.7 wins as a starter. Essentially, a wash.
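The same calculation as a small function: leveraged innings divided by 9 gives leveraged games, multiplied by the win% edge over the .450 baseline. The win% and leveraged-innings inputs are the illustrative ones from the example; the function name is hypothetical.

```python
def wins_above_baseline(leveraged_ip, win_pct, baseline=0.450):
    """Leveraged innings converted to games (/9) times the win% edge over the baseline."""
    return leveraged_ip / 9 * (win_pct - baseline)


reliever = wins_above_baseline(160, 0.650)
starter = wins_above_baseline(220, 0.600)
print(round(reliever, 1), round(starter, 1))  # 3.6 3.7
```

Changing the baseline shifts both numbers, so the starter/reliever decision depends on what level you really would be replacing the innings with.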

So, you really have to go into it deeply, determine the effectiveness level of all your pitchers based on the starter/reliever role, determine how you can best optimize your leverageable innings, and come up with your plan. It's not so easy, especially considering injuries throw a wrench in your whole plan. Unless you are the Yankees.


Are Managers Optimizing Their Best Relievers?

January 2, 2003 - tangotiger (www) (e-mail)

Shuey and Stanton breakdown

The leverage classes were broken up into high-leverage (LI of 2 or greater), low-leverage (LI of 0.5 or less), and the rest.

$H is non-HR hits per ball in play. All the others should be self-explanatory.
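The bucketing rule and the $H rate are easy to state in code. This sketch assumes one record per PA holding its LI and a simplified outcome code; the record layout and the outcome codes are hypothetical, and it treats home runs, strikeouts, and walks as not in play, which is a common convention but an assumption here.

```python
from collections import defaultdict


def leverage_bucket(li):
    """High: LI of 2 or more; low: LI of 0.5 or less; everything else is normal."""
    if li >= 2.0:
        return "high"
    if li <= 0.5:
        return "low"
    return "normal"


def dollar_h_by_bucket(pas):
    """$H (non-HR hits per ball in play) for each leverage bucket.

    pas: iterable of (li, outcome) pairs. Hypothetical outcome codes: "hit" is a
    non-HR hit, "bip_out" is an out on a ball in play; anything else ("hr", "k",
    "bb", ...) is not counted as a ball in play.
    """
    hits = defaultdict(int)
    balls_in_play = defaultdict(int)
    for li, outcome in pas:
        bucket = leverage_bucket(li)
        if outcome == "hit":
            hits[bucket] += 1
            balls_in_play[bucket] += 1
        elif outcome == "bip_out":
            balls_in_play[bucket] += 1
    return {b: hits[b] / n for b, n in balls_in_play.items() if n}


# Made-up PAs, just to exercise the code.
pas = [(2.5, "hit"), (2.5, "bip_out"), (2.1, "bip_out"), (1.0, "hit"), (0.3, "k")]
print(dollar_h_by_bucket(pas))
```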

Paul Shuey? He was at his best in high-leverage situations. Mike Stanton? He was by far his best in high-leverage situations. Note the small sample of PAs. Note also that it's easier to get more WP in high-leverage situations, since high-leverage situations occur more often with men on base. In any case, Shuey's WP rate wasn't so high, relative to his other situations.

I think there are some interesting DIPS numbers in there as well. In the high-leverage situations, each pitcher gave up fewer hits per ball in play, and had fewer Ks as well. It's almost as if the pitcher had to bear down in the high-leverage situations, and therefore took a different pitching approach, lowering his K rate and improving his $H rate. We may in fact find that pitchers DO control the hits per ball in play A LOT. And it may simply be that once you reach the majors, pitchers are similar in this regard overall.


Are Managers Optimizing Their Best Relievers?

January 2, 2003 - tangotiger (www) (e-mail)

FJM: yes that is correct. The second guy was on a hotter seat, and that's what LI is reflecting. As I mentioned on another thread, LI is not about rewarding a player, but classifying each PA.

Note that a manager is choosing to bring back the same reliever. If he had chosen to replace the reliever after 2 hits with another reliever, we'd have no problem saying that the replacement was on a hot seat.

It doesn't matter who sets the fire. We are capturing the existence of the fire, and we are capturing that the manager is letting someone pitch in that fire.

Doug Drinen's reliever reports work based on when the reliever enters and exits the inning. This metric works great in other areas, for other purposes. Eventually, I'll probably create an LI for this as well.


Are Managers Optimizing Their Best Relievers?

January 3, 2003 - tangotiger (www) (e-mail)

Well, I provided the LI for 10 top relievers of 99-02, as well as the historical LI for all pitchers in the 74-90 time period (see Clutch hits).

As for biased, again, there is no bias. It's a reflection of the game state for each PA. I know what you are saying about say John Franco or Mel Rojas being arsonists.

But it's not like there is a giant in a land of pygmies, even Mariano, that we should be concerned about. In the 74-90 time period, Clemens and Gooden are probably the giants. Their LI are 0.96 and 1.03. Hershiser was 1.03. Ryan was 1.05 and Blyleven was 0.98.


Are Managers Optimizing Their Best Relievers?

January 4, 2003 - tangotiger (www) (e-mail)

FJM: Again, I don't know how much effect it has, but I suspect a little. I'll find out eventually.

But again, remember the purpose of leveraged PAs. It's about describing the level of fire during that PA, regardless of whether that fire was arson or not. The manager is bringing back Mel Rojas, the arsonist, for the next PA.

As mentioned in another article, I can also create leveraged appearances, whereby I only note the fire level when the reliever is first brought in. This I will also do eventually. (Drinen essentially did this already.)

It's important to realize that a stat is constructed to answer a specific question, and it should not necessarily be used to answer other questions. Nor is it a shortcoming of the stat if it can't answer this new question.


Are Managers Optimizing Their Best Relievers?

January 4, 2003 - tangotiger (www) (e-mail)

If you page up to my Jan 2 comment, you will see a link to Paul Shuey and Mike Stanton, and how they performed in the various leverage situations. Paul Shuey, and especially Stanton, have excelled in high-leverage situations, when given the chance. The sample size is small, so who knows.

I was surprised by Percy too. I thought he was better, but his K, BB, and HR numbers don't compare with the best, though he would have made the 11-20 list.

As for more analysis, I would love to do it. But my time is really constrained. I want to do an analysis on a team-by-team year-by-year basis for the last 4 years, and within that, show how each pitcher performed in the high-leverage and low-leverage situations. There is really so much I want to do, I don't know where to begin.

Right now, I'm taking a break from relievers and concentrating on baserunners.


Are Managers Optimizing Their Best Relievers?

January 6, 2003 - tangotiger (www) (e-mail)

David, thanks much! I'm actually bringing a lot of different concepts into all this, so it's rewarding to me as well.

As for 2002 PBP, astrosdaily.net has it, so I'm fine there. What I need is *time*. Can you help me there?


OPS: Begone!

May 20, 2003 - tangotiger (www) (e-mail)

The A's have an additional point that by being able to work the count longer, a team can "choose" their opposing pitchers to the point where the average opposing pitcher is worse than by random chance.

They "choose" the pitcher by forcing their opponent to bring in the 10th reliever, because they wore out the starter. While this is certainly conceivable, you would need a whole team of such batters for this to work. As well, there's no guarantee that your team will benefit from it, since your opposition's next opponent might reap the benefits.

In the end, we are talking about a max .20 run difference/GP (see a previous Clutch hit for the calculation), and only if the whole team is like this and they are the ones who get the benefit. I fail to see how raising the OBA weight from 1.8x to 3x would capture this. The "extra pitches" is not a function of OBA, but of (BB+K)/PA. By raising the weight from 1.8 to 3, you are capturing only part of this effect (BB/PA), buried in a whole bunch of other noise (H, HR, outs). This extra 1.2 is sort of trying to rise above the noise to find the BB/PA. If this is what the A's are trying to do, I don't think they're doing it in the best way. It's hard to comment further without having the specifics (like the James comment on Todd Walker as the best #2 hitter). Based on what we think they are trying to do, they are wrong.
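To make the "OBA is not (BB+K)/PA" point concrete, here is a small sketch with two hypothetical hitters (made-up stat lines, not from the post). OBA is simplified to (H+BB)/PA, ignoring HBP and sacrifices. Both hitters share the same OBA, yet they differ sharply in (BB+K)/PA, the rough proxy for working the count.

```python
def oba(h, bb, pa):
    """Simplified on-base average: (H + BB) / PA (HBP and sacrifices ignored)."""
    return (h + bb) / pa

def bb_k_rate(bb, k, pa):
    """(BB + K) / PA: a rough proxy for how deep a hitter works counts."""
    return (bb + k) / pa

# Hypothetical stat lines chosen so both hitters have the same OBA.
patient = {"pa": 600, "h": 150, "bb": 60, "k": 80}
free_swinger = {"pa": 600, "h": 180, "bb": 30, "k": 60}

for name, p in [("patient", patient), ("free_swinger", free_swinger)]:
    print(name,
          round(oba(p["h"], p["bb"], p["pa"]), 3),
          round(bb_k_rate(p["bb"], p["k"], p["pa"]), 3))
```

Any weight you put on OBA treats these two hitters identically, even though one of them forces far more pitches per PA.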


OPS: Begone!

May 20, 2003 - tangotiger (www)

The "additional point" thing is what I'm capturing. It doesn't matter if you do: 3*(OBA-.3)+(SLG-.35) OR 3*OBA+SLG-1.25

It's the same thing.
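Written out, the baselines just fold into one constant:

```latex
3(\mathrm{OBA} - 0.300) + (\mathrm{SLG} - 0.350)
  = 3\,\mathrm{OBA} + \mathrm{SLG} - (3 \times 0.300 + 0.350)
  = 3\,\mathrm{OBA} + \mathrm{SLG} - 1.25
```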

As for the "wearing out the starter", Ted is correct in his approach. If you have a team of players whose "true talent level" is .333/.400, this team would score about 4.5 runs per game. However, because these guys all work the count, they have a synergistic effect in tiring out the starter and bringing in the 10th man. Because they feed off each other in this manner, they will end up with .343/.405 numbers (let's say). Now, all of a sudden, this team of .333 talent with the synergy effect acts just like a team of .343 talent with no synergy effect.

The A's are capturing this extra effect inside the OBA by overweighting that metric. However, there's no reason to rely on such a noise-filled metric when what you want is (BB+K)/PA or pitches/PA. Because of the amount of noise, to pick up the little extra pitches/PA buried in the OBA, you have to severely overvalue the OBA to find it.


OPS: Begone!

May 20, 2003 - tangotiger (www) (e-mail)

The other reason for using "3" for OPS is if you are actively looking for those types of players. If you really, really want guys with high OBA, then you would overweight OBA. You would do this because maybe you feel that it's a better predictor of future production. Or you feel that you need to get the players to toe the company line, or whatever. Guys like Vlad, Nomar, and Soriano would not be properly appreciated in such a system.


OPS: Begone!

May 20, 2003 - tangotiger (www) (e-mail)

This is how SLOB*k and SLOB*PA*k (where k is some constant to make things add up nicely) look for the 6 equivalent players from that last chart:

SLOB*k   SLOB*PA*k
  81        76
  81        78
  79        79
  77        79
  74        78
  70        76

SLOB by itself works ok, except at the real extreme. SLOB*PA works much better. SLOB*PA is essentially Runs Created, and we already know that BaseRuns is more logical/accurate than Runs Created.

The best one in this group remains static Linear Weights. The best one "on the market" right now is BaseRuns-generated custom Linear Weights.


OPS: Begone!

May 20, 2003 - tangotiger (www) (e-mail)

Rob, you know what, you are right! I goofed.

While I was using outs as my baseline in the last chart, I should have used PA instead. Each player on the team should have the same number of PAs, not outs. Let me re-run the chart, and I'll publish the update on my site.

Good catch!

(Vinay, you are right about RC = SLOB*AB, and not PA as I mentioned in my last post.)


OPS: Begone!

May 20, 2003 - tangotiger (www) (e-mail)

For the last example, I should have been more careful.

What I should have done is fix the team outs to some total. In my example, I actually fixed it so that each player made the same number of outs (440), which is wrong.

Anyway, what I did now (see link) was start with the team outs (3960) and, making sure each player had the same number of PAs, find the 8 typical guys and the 1 variable guy that would produce 3960 outs.

Things actually change. The best-fit becomes 1.64 (and not 1.75). I suspect that the best-fit will fall somewhere between 1.5 and 2.0, and for ease, I'd probably use 1.5.
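For readers who want to see what "finding the best-fit OBA coefficient" means mechanically, here is a generic sketch. It is not the author's chart or the "plus 1" method: it uses six synthetic, noise-free teams (hypothetical numbers) whose runs are built from 1.8*OBA + SLG, then grid-searches the coefficient k that lets a least-squares line through k*OBA + SLG fit runs best.

```python
# Synthetic, noise-free "teams" (hypothetical, not the post's data), built so
# that runs per game depend on 1.8*OBA + SLG by construction.
teams = [(0.320, 0.390), (0.335, 0.420), (0.350, 0.400),
         (0.340, 0.460), (0.360, 0.440), (0.330, 0.380)]
runs = [9.0 * (1.8 * oba + slg) - 4.0 for oba, slg in teams]

def sse_for_k(k):
    """Fit runs = a*(k*OBA + SLG) + b by ordinary least squares; return the squared error."""
    x = [k * oba + slg for oba, slg in teams]
    mx, my = sum(x) / len(x), sum(runs) / len(runs)
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, runs))
         / sum((xi - mx) ** 2 for xi in x))
    b = my - a * mx
    return sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, runs))

# Grid search over the OBA coefficient k from 1.00 to 3.00 in steps of 0.01.
grid = [1.0 + i / 100 for i in range(201)]
best_k = min(grid, key=sse_for_k)
print(f"best-fit OBA coefficient: {best_k:.2f}")
```

With real team data the error curve is noisy and flat near its minimum, which is one reason a range like 1.5 to 2.0 is a more honest answer than a single value.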

(Static) Linear Weights now looks less good than originally. I like this change, as it shows that the component values should change if the underlying environment also changes.

Custom Linear Weights wouldn't have this issue. Though at this point, I don't want to pronounce that custom LWTS will see all these guys as the same. It would definitely see all the teams as the same (just like BaseRuns). I think custom LWTS will still show some differences among these players, though I'm not sure how much.

Great catch again, Rob!


OPS: Begone!

May 21, 2003 - tangotiger (www) (e-mail)

See link for the values I used. For the categories I didn't use, I set them to "zero". It's not too important for what I am trying to do though.

As for the other question, you are asking if you can only know one thing, OBA or SLG, which one correlates to run scoring the best? I seem to remember Dan Werr doing a correlation study a month or 2 ago that showed the r to be pretty even between the two. That doesn't mean they are "equally important", especially if you have both.

As well, the coefficient itself (1.56, 1.64 or whatever) doesn't specify the level of importance. If you made lilSLG = 1/4S + 2/4D + 3/4T + HR, all divided by AB, what do you think would happen? The best-fit would be 1.64*OBA + 4*lilSLG. That doesn't make lilSLG twice as important as OBA, now, does it?
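A quick numeric check of the lilSLG point, using a hypothetical batting line (not from the post) and a simplified OBA of (H+BB)/(AB+BB). Since lilSLG is just SLG/4, 1.64*OBA + SLG and 1.64*OBA + 4*lilSLG are the same number: the coefficient only reflects the scale of the stat.

```python
# Hypothetical batting line (not from the post).
ab, bb = 550, 60
singles, doubles, triples, hr = 100, 35, 5, 20
h = singles + doubles + triples + hr
pa = ab + bb  # simplification: ignores HBP, SF, etc.

oba = (h + bb) / pa
slg = (singles + 2 * doubles + 3 * triples + 4 * hr) / ab
# lilSLG: every base weighted by 1/4, so lilSLG == SLG / 4
lil_slg = (singles / 4 + 2 * doubles / 4 + 3 * triples / 4 + hr) / ab

with_slg = 1.64 * oba + slg
with_lil_slg = 1.64 * oba + 4 * lil_slg  # same metric; lilSLG just needs a 4x coefficient
print(round(with_slg, 4), round(with_lil_slg, 4))
```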


OPS: Begone!

May 21, 2003 - tangotiger (www) (e-mail)

I agree that it would be a rush to judgement to make any conclusions without having all the information.

While you can conclude that using 3*OBA+SLG is a poor way to evaluate current run production, it's not so clear whether you'd want to use that equation to try to evaluate future run production (or for other secondary reasons). And you certainly can't indict someone or some organization overall. Sample size! You need a lot more evidence.

I also agree that being able to work with a group of people, respecting their views, regardless of what it is, as long as they respect your views as well, is very important. Respect, courtesy, professionalism. Isn't that the police motto?

However, I'll note that in the ESPN chat, Bill James said: "Baldelli's a lot of fun. In my office we were making fun of some scout who compared him to Joe DiMaggio, but when you see him play you realize what people are reacting to. Of course, he doesn't have DiMaggio's entire package, but he does have more than half of it." I kinda didn't like the first part, which left me with the impression that the stat-heads and the scouts clash behind each other's backs. But this was a throwaway sentence, so who knows what James meant.

Finally, as for anyone's ability to deal with people, I'm not sure that you can necessarily say that DePodesta is good or bad, nor could you say that about me, or Voros, or anyone else, unless you deal with these people on different issues in different settings (or you have some second-hand knowledge... definitely not third-hand or worse). I don't think that an executive is necessarily a better people-person, or better able to deal with people, than a non-exec.

I agree that arrogance is a turn-off to most people, and that's something that a speaker should be conscious of. Mike Gimbel, who I've had occasion to e-mail from time-to-time, seems like a pleasant enough fellow. But I've heard from many many people that he is insufferable. That by itself, truth or perception, will keep Gimbel out of MLB, in my view.


OPS: Begone!

May 21, 2003 - tangotiger (www) (e-mail)

I agree with your comment on the corporate world (as I've been here for...geez, almost 13 years... my "corporate world" anniversary will be in 1 month).

Rather, I'm talking about the ability to be persuasive when dealing with people who have dissenting or at least ambivalent viewpoints, which at the very least involves some combination of:

That sentence alone is interesting to read!

But to really do all of those things well is fairly unusual, and I would guess that among the pool of the 15 or 50 or 500 or whatever leading analysts, there's a lot more differentiation in terms of interpersonal ability than technical ability.

That's an interesting thought too. I'm not sure if there is more differentiation in one or the other, or how you would qualify/quantify all that. And even if the differentiation is more in one category, the impact of that differentiation might not be as much as the other category.

Did it just feel like we had an OBA v SLG discussion? (More differentiation in SLG, but more impact with OBA differences.)

As with everything, there are degrees of impact, and it's rather pointless to label them black/white (not that that's what I think anyone is doing here). Even if you have a terribly insufferable analyst, his work might be of such quality that it tips the scales towards good. Even if you were able to classify DePodesta as a mediocre sabermetrician (and I'm not doing that), the rest of his skills might be so strong that he can make an impact with his research, while others might not (even with better "stuff").

The fact that a successful organization has him employed, and he is highly regarded by other successful people, even though his experience is not as vast as other baseball execs, must show that his total package is something to respect highly. He's a mover and a shaker, and he gets things moving and shaking in generally the right direction.


OPS: Begone!

May 21, 2003 - tangotiger

I think that you should give the benefit of the doubt when you can. I've heard nothing but good (in fact great) things about DePodesta, so, without him actually saying anything, I give him that benefit.

Now, I can interpret the 3 thing as being "you know, I've got this great formula, and you know what, this correlates highly to 3*OBA+SLG. I don't use 3OPS, I have my own, but as it turns out, it's close to 3OPS. BaseRuns, which I don't use, is close to 1.6OPS. I'm sure Tango/David don't use 1.6OPS, but their equation is close to that".

I don't think that explanation is unreasonable, is it?


OPS: Begone!

May 22, 2003 - tangotiger (www) (e-mail)

David, I agree that the 1.64 value is a little suspect, since it is based only on the 6 players that I happened to construct. I mentioned that the correct value, if you were to look for it, would fall somewhere between 1.50 and 2.00.

I've used the plus-1 method in the past, and I find I can minimize the runs error by using 1.83 as the coefficient for OBA. That is, 1.83*OBA+SLG. I think that as long as you use something between 1.5 and 2.0, you'll be ok, or at least better than not. I suppose if you really wanted to find the best-fit via the "plus 1" method, you'd look at 200 regular hitters, and figure it out that way.

(For the uninitiated, the "plus 1" method was described in the "Runs Really Created" series last year. Check out the archives.)


OPS: Begone!

May 22, 2003 - tangotiger (www) (e-mail)

Interesting. You know, I'm pretty sure I never include the IBB, but it was several months ago when I did that 1.8 thing. Interesting results though. I suppose we should compare it to the full-blown BsR version in that case.


OPS: Begone!

June 2, 2003 - tangotiger

3*(OBA-x)+(SLG-y)

This works out to 3OBA+SLG-(3x+y) which works out to 3OBA+SLG-k

Therefore, it is irrelevant what "k", "x", or "y" is. Whatever numbers you choose won't change the ranking of the players, or how far apart they are from each other, compared with simply using 3OBA+SLG.
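A small check of that claim, using four hypothetical (OBA, SLG) lines (not from the post): scoring them with 3*(OBA-x)+(SLG-y) gives the same order and the same gaps whether the baselines are zero or .300/.350, because the baselines only subtract one shared constant.

```python
# Hypothetical (OBA, SLG) lines, not from the post.
players = {"A": (0.380, 0.450), "B": (0.340, 0.580),
           "C": (0.360, 0.500), "D": (0.320, 0.420)}

def score(oba, slg, x=0.0, y=0.0):
    """3*(OBA - x) + (SLG - y); x and y are arbitrary baselines."""
    return 3 * (oba - x) + (slg - y)

def ranking(x=0.0, y=0.0):
    """Player names sorted from best to worst score."""
    return sorted(players, key=lambda n: score(*players[n], x, y), reverse=True)

def gaps_from_leader(x=0.0, y=0.0):
    """Each player's distance behind the top score, rounded to hide float noise."""
    scores = {n: score(*players[n], x, y) for n in players}
    top = max(scores.values())
    return {n: round(top - s, 9) for n, s in scores.items()}

print(ranking(), ranking(0.300, 0.350))
print(gaps_from_leader() == gaps_from_leader(0.300, 0.350))
```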


OPS: Begone! Part 2

May 27, 2003 - tangotiger (www)

Nick, very well said, and I especially liked this:

"...because he generates extra PA at (mostly) his teammates' ability levels, not at his own."

It would have taken me a paragraph to explain this, but you said it perfectly in half a sentence.

As for the batting average thing, I suppose that's another myth. It's pretty clear that given two guys with the same OBA and SLG, you want the guy with the LOWER BA (though in reality, we're not talking about much difference).

I suppose if you really needed to quantify it, probably something like 3*OBA+2*SLG-BA (I really don't know, but it would be of some form like that). I'll bow out of any discussion on trying to find the best-fit equation using OBA,SLG,BA. I already don't have much use for OPS, and I know I won't like OPSMB!


OPS: Begone! Part 2

May 27, 2003 - tangotiger (www) (e-mail)

Jason, interesting thought.

I just tried with a weird environment (OBA/SLG of .393/.493), and in this case, the higher the BA, the more runs scored. I then tried the other way, with .289/.351, and this time the LOWER the BA, the more runs scored.

The "break-even" point seems to be about .360/.450. That is, at that level, the change in batting average (and I checked from .200 to .340) made zero change to the run production of the team.

Great call!


OPS: Begone! Part 2

May 27, 2003 - tangotiger (www) (e-mail)

"Key" situation is another topic entirely.

Click the above link, select your "key" situation, and plug in the numbers (on a /PA or /600PA basis). That'll tell you which guy you want.

If by key you mean inning/score as well as base/out, then you need another tool to evaluate it.


OPS: Begone! Part 2

May 27, 2003 - tangotiger (www) (e-mail)

I just want to make it clear: do not, absolutely do NOT, rely on OBA/SLG/AVG to make game decisions.

You must break it down into its components, and you must apply those components against the context being faced (base/out states, inning/score/base/out game state, game/pitcher state, etc.).

OPS is quick and dirty and has no place in game decisions. Relying on it for some cases will make you rely on it for most cases, and sometimes all cases. That's a bad habit to start. OPS, begone!


OPS: Begone! Part 2

May 27, 2003 - tangotiger

Every game context produces different "win potential" for H, HR, BB, outs, SB, sacs. The values between those components are not static. In a completely "run potential" world, you would never call for an IBB or a sac. But in a "win potential" world, there are many many times that you need to call for the IBB or sac.

OPS, if left to its own devices, would become the de facto mechanism to evaluate game situations, when in fact its purpose is a quick, rough pass at player evaluation. I don't believe in taking baby steps and the long path to get the job done. I also don't believe that we should hand-hold the manager for 20 years to lead him to the proper tools.

Give them the right tools for the right job, and let them decide if they want it. If Felipe Alou says that looking at OPS is b.s. to decide whether to walk Bonds, I'm going to agree with him. Should I say that OPS is less b.s. than using BA? A rose by any other name...


OPS: Begone! Part 2

May 30, 2003 - tangotiger

Yes, what you want is win-based LWTS (or a sim). And I would guess that a manager will be able to be right (using only his experience) more often than using just OPS, in a tight in-game decision.


OPS: Begone! Part 2

May 30, 2003 - tangotiger (www)

1) "How can injecting more than 40 extra bases into the same number of plate appearances or outs produce a negative result?"

It's 40 extra bases on hits, but 100 fewer bases on walks.


OPS: Begone! Part 2

May 30, 2003 - tangotiger (www)

Comparing the top guy and the bottom guy, the bottom guy has 100 more walks, 19.7 more HR, and 119.7 fewer singles.

Everything else is the same.

Straight static LWTS says that works out to +33, +28, -56 = +5, or some such.
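The arithmetic behind those figures, as a sketch. The run values below are assumptions (typical static linear-weight values, not stated in the post) that happen to reproduce the rounded +33, +28 and -56; the event differences are the ones listed above.

```python
# Assumed static linear-weight run values (not stated in the post); they are
# typical values that roughly reproduce the post's rounded +33/+28/-56.
RUN_VALUE = {"BB": 0.33, "HR": 1.42, "1B": 0.47}

# Bottom guy minus top guy, per the post.
diff = {"BB": +100, "HR": +19.7, "1B": -119.7}

contributions = {event: n * RUN_VALUE[event] for event, n in diff.items()}
total = sum(contributions.values())
for event, runs in contributions.items():
    print(f"{event}: {runs:+.1f}")
print(f"net: {total:+.1f} runs")
```

The ~41 extra bases on hits are outweighed by the 100 walks, which is why the bottom guy comes out slightly ahead.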


OPS: Begone! Part 2

May 30, 2003 - tangotiger

RC has its own problems, magnified substantially when the HR/H or HR/PA becomes out of whack. RC does not model run scoring at all: it just got lucky that it looks like it models it. If you've got a computer, there's zero reason to use RC, when you've got BsR (unless you want to propose a model that's better).

I don't really care about the different denominators. The whole idea of OPS centers around: more good, less bad. The more walks, the more hits, the more TB, the fewer outs, the better the number. There's nothing inherent in OPS that ensures that the balance is proper. It's just plain old luck that, for the run environment of MLB, it works out that way.

Believe me, if the run environment was half what it is today, or double what it is, there'd be some other "quick" estimator that would get lucky to model run creation.

Sorry for the rant.


How are Runs Really Created

August 12, 2002 - tangotiger (www) (e-mail)

Devin, excellent points.

"...If the results of the RC formula didn't correspond roughly to actual runs, James wouldn't be using it."

As I mentioned, as long as you are using typical teams in the .300 to .400 OBA range, and as long as the HR rate per game is around the norm, RC works fine as something useful.

The problem is when you try to extend that to Barry Bonds types of teams (not that they exist) or Pedro Martinez types of teams (and they exist plenty, as Pedro, when on the mound, is his own team).

My point is that just because the results of Runs Created work on a particular set of samples doesn't mean that you can extend that methodology to other types of things you may be doing.

There's a reason RC fails, and it's in its treatment of the HR.
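A sketch of the HR problem. It uses basic Runs Created, (H+BB)*TB/(AB+BB), and one commonly published simple form of BaseRuns (A*B/(B+C)+D, with the B coefficients below); treat the exact BaseRuns coefficients as an assumption, since several versions circulate. The test case is a hypothetical game with no walks where every hit is a solo HR, so actual runs equal the HR count. RC multiplies the HR into both its on-base and advancement terms and runs away quadratically; BaseRuns scores the HR as an automatic run and gets it exactly.

```python
def runs_created(h, bb, tb, ab):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)

def base_runs(h, bb, hr, tb, ab):
    """A commonly published simple BaseRuns form: A * B / (B + C) + D."""
    a = h + bb - hr                                       # baserunners who still need to score
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02   # advancement factor
    c = ab - h                                            # outs
    d = hr                                                # home runs score automatically
    return a * b / (b + c) + d

def solo_hr_only_game(n_hr, outs=27):
    """Hypothetical game: no walks, every hit a solo HR, so actual runs == n_hr."""
    return dict(h=n_hr, bb=0, tb=4 * n_hr, ab=outs + n_hr), n_hr

for n in (1, 5, 10, 27):
    line, actual = solo_hr_only_game(n)
    rc = runs_created(line["h"], line["bb"], line["tb"], line["ab"])
    bsr = base_runs(line["h"], line["bb"], n, line["tb"], line["ab"])
    print(f"{n:>2} HR: actual {actual}, RC {rc:.1f}, BsR {bsr:.1f}")
```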

2) "Okay, my common sense has a problem with a run value system that has events with the same outcome (walk, HBP, interference) having different run values."

Let's take a really simple example: a regular walk v IBB. Since an IBB almost always occurs with first base open, an IBB has virtually zero "moving over" value. Since the IBB is given out much more often with 2 outs than with 0 outs, the "run scoring" value of the IBB is much less than that of a regular walk.

So, based on the frequency of when the events happen, and the effect of each event, the values can change drastically.

As for a regular walk v HBP, HBPs occur in a more or less random fashion. A walk occurs more frequently with 2 outs than with 0 outs, and more frequently with no runner on 1B than you'd expect by random chance. These two things reduce the "moving over" value of the walk and the "run scoring potential" of the walk.

If you are thirsty for more, I've published PRELIMINARY results on the run values of various hitting events by the 24 base-out states. (I should be publishing an updated table in a few weeks.) From there you will see there are virtually no differences between the walk, IBB, and HBP, as you'd expect.

http://www.tangotiger.net/lwtsrobo.html

Thanks, Tom


How are Runs Really Created

August 13, 2002 - tangotiger (www) (e-mail)

Rob, your question on base-out differences in run value can be found here http://www.geocities.com/tmasc/lwtsrobo.html

I looked at the batting order differences in run values, and there was a long thread posted on fanhome. It is not easily digestible, and someday I'll write an article on the discoveries there. But yes, as you'd expect, the leadoff hitter's HR value was 1.30 while the #5 hitter's was somewhere around 1.47.

John Warren: the steal is an interesting point. The run value of the SB is largely independent of the run environment, as the additive value of the SB is around .17 to .21 for the most part. The CS, however, changes HIGHLY, since the value of an out is the most dependent on the run environment. The break-even point is therefore much lower against Pedro, and more steals should be attempted against him.
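The break-even logic as a sketch: a steal attempt pays off when p*SB + (1-p)*CS > 0, so the break-even success rate is -CS/(SB - CS). The SB value sits in the post's .17 to .21 range; the two CS values are hypothetical, chosen only to show that a cheaper out (a low-scoring, Pedro-type environment) lowers the break-even rate.

```python
def break_even_sb_rate(sb_value, cs_value):
    """Success rate where a steal attempt has zero expected run value.

    Solves p * sb_value + (1 - p) * cs_value = 0 for p (cs_value is negative).
    """
    return -cs_value / (sb_value - cs_value)

SB_VALUE = 0.19     # inside the post's .17 to .21 range
CS_TYPICAL = -0.45  # hypothetical cost of a CS in an average run environment
CS_VS_ACE = -0.30   # hypothetical, smaller cost when runs are scarce (a Pedro start)

print(f"average environment: {break_even_sb_rate(SB_VALUE, CS_TYPICAL):.1%}")
print(f"against an ace:      {break_even_sb_rate(SB_VALUE, CS_VS_ACE):.1%}")
```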

Mike: I've previously published charts on win expectancy which I have to update in the near future. There's no doubt that win expectancy is really the most important aspect of analysis, since that's what we are after. Again, for those thirsty for more, you can consult my preliminary chart on WE here: http://www.geocities.com/tmasc/we.htm . Again, where this comes most into play is the IBB. While the run value of a regular walk is .30 runs and the run value of the IBB is .17 runs, the win values are far different. Because the IBB occurs in game situations where it is "controlled" to minimize the impact on win/loss, its win value would also decrease.

Thanks for all your great comments.


How are Runs Really Created

August 13, 2002 - tangotiger (www) (e-mail)

GIDP: it's worth around -.45 runs. I was thinking of breaking up the "outs" PA into "outs 1, outs 2, outs 3", but decided against it. Maybe I will fix that.

Jason: what I am presenting is how runs are really created. It's the building blocks to whatever it is you want answered. From this, you can generate win expectancy tables, if you like, or the more detailed run values by the 24 base-out states. You can then further extend this to a 24x9 run values that ALSO includes batting order. And from that standpoint, you can evaluate the #9 v Bonds with the bases loaded.

These other run evaluators give no option to do this simply because they are ends in themselves, not means. They were built to answer a specific question, and therefore are not very extendable. Play-by-play analysis is very extendable.


How are Runs Really Created

August 14, 2002 - tangotiger (www) (e-mail)

Linear Regression

There are certain things that must be understood about linear regression and using it to determine the relationship between hitting events and runs scored.

First, a little background on linear regression. If you have two things, say, the price of a stock and the earnings per share, you can probably find a relationship between these two variables. The higher the earnings, the higher the price of the stock. You will end up with a formula like P = m times E + b, where P is price, E is earnings, b is some constant and m is the slope. The price of a stock, and runs in baseball, is influenced by more than one variable. You end up with an equation that says y = m1a1 + m2a2 + m3a3 + ... + b. Linear regression lets you input the independent variables a1, a2, a3..., the dependent variable y, and solve for m1, m2, m3..., and b.

Here are 4 major problems with using this in baseball: 1 - Linear regression is LINEAR. Linear as in a straight line. While there is a somewhat linear relationship between runs and singles, doubles, triples, and walks, there is NOT a linear relationship between runs and HR, or runs and everything else like SB, WP, BK, etc. Baseball is non-linear.

2 - The independent variables are not independent. There is an interdependence between all these variables. A walk is only worth what it is because of the other things that happen. Linear regression attempts to "freeze" all the other variables when calculating the value of the unfrozen variable. As your run environment increases however, we know that the values of these variables change. Baseball is interdependent.

3 - Even if you assume for ease that run creation is linear and independent (a safe assumption for very controlled environments), what sample data will you use to run your regression against? Most people will use team season totals, which is an aggregate of individual games, which is an aggregate of individual innings. If you want to run a proper regression analysis, at the very least run it on a game or inning level. Your sample size will explode to something much more reliable.

4 - Not accounting for all the variables. Triples have a strong relationship to speed. If you don't have SB in your set of variables, the regression analysis will award more weight to the triples as a stand-in (because of their relationship to steals). It is possible, based on some samples, that the value of a triple could exceed the value of the HR! What other variables are you not accounting for?
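A minimal sketch of problem 4, using made-up team totals (not real data). Runs are built exactly as 1.0 per triple plus 0.2 per steal, with no noise. A regression that includes SB recovers both weights; drop SB and the triple absorbs part of the steals' value, because triples and steals move together.

```python
def mean(xs):
    return sum(xs) / len(xs)

def slope_one(x, y):
    """OLS slope of y on a single regressor x (intercept included)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def slopes_two(x1, x2, y):
    """OLS slopes of y on two regressors (intercept included), solving the
    2x2 normal equations on mean-centered data with Cramer's rule."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

# Hypothetical team totals (not real data). Runs are built exactly from
# "true" weights of 1.0 per triple and 0.2 per steal, with no noise.
triples = [20, 25, 30, 35, 40, 45]
steals = [60, 90, 80, 130, 120, 150]
runs = [600 + 1.0 * t + 0.2 * s for t, s in zip(triples, steals)]

t_weight, sb_weight = slopes_two(triples, steals, runs)
t_weight_alone = slope_one(triples, runs)
print(f"with SB:    triple = {t_weight:.3f}, steal = {sb_weight:.3f}")
print(f"without SB: triple = {t_weight_alone:.3f}")
```

The inflated triple weight is exactly the true weight plus 0.2 times the slope of steals on triples in this sample, which is the textbook omitted-variable bias.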

Arvid - Let me get back to your post. The purpose of this article is to explain the building blocks of run creation at the team level. I have not shown how to extrapolate this to individual players. The end-result is not to end up with linear values for each hitting event, since these linear values only apply to a given run environment. We need to determine the linear values for EVERY run environment! As I said, the value of a single in Pedro's run environment is far less than a single in an average pitcher's run environment.

I am interested in the pieces of how runs get created, an actual model. I am not interested in a formula that estimates runs based on whatever variables and that ONLY works for a given run environment. Runs Created and Linear Weights work fine for that. BaseRuns is the key, and I will present this hopefully by the end of this month.

Michael - The building blocks of run creation do lie in run expectancy tables for the 24 base-out states. I am not introducing anything new here, but rather showing how we should extend this to other run environments. I have not read Curve Ball. Please clarify your post further so that I can properly answer you.

Rob - Are you asking me what would a player's run value be using a context-neutral approach (i.e., the final weighted average values I presented) compared to a context-specific approach (i.e., the specific values by the 24 base-out states)? If this is the case, the answer is about +/- 10 runs at the extremes. I looked at this last year, with regards to Ichiro. You can find that article here http://www.geocities.com/tmasc/lwbymob.htm though I only looked at the 8 base states. If this is not what you are talking about, please clarify further.


How are Runs Really Created

August 14, 2002 - tangotiger (www) (e-mail)

Michael: I agree that the most easily digestible measure of run creation is one that is context-neutral, and therefore, I am not adding anything new here, except more precise values to use (and adding values for the obscure events like RBOE or BK).

My interest lies "under the hood", and the how and the why.

The important point that I'm also trying to get across is that even if you stick to a linear context-neutral measure like linear weights, that you should use a custom version, based on the run environment. It really makes no sense to apply the same formula to Mel Rojas as to Pedro Martinez. We only do this, because it's easy for us. And if we keep doing it, we will forget to question why we do it. Runs Created, as great as it was then, is an example of this. It completely fails us at the extreme player level.

I think I am in basic agreement with your point of view.

Rob: OUCH! First of all, I did look at the batting order about 2 years ago, and there was an effect of something like 15-20 runs for Rickey Henderson in the leadoff spot. That is, putting a player whose skillset uniquely qualifies him for the batting spot with the most variability (which is Rickey to a tee) into that spot had, in his best season I think, a variability of close to 20 runs (compared with putting Rickey in, say, the #5 spot). The #2 hitter also showed great variability, and I concluded elsewhere that in certain (many!) situations, your best hitter should bat #2.

With the MVP/Ichiro thread, I showed that batting great with men on base, or being given a lot of men on base, will add 10 runs. Give both, and you're close to 20 runs as well.

I really don't need to run a simulator to determine all this, though. This is a simple problem of determining the frequency of facing the 24 base-out states, and your success in those same states.

I wouldn't be surprised if you have a player who is ideally qualified for a particular spot (say Ichiro for #2, though I don't know that), who faces more than the normal number of high-leverage situations, who is one of the best hitters in the game, and who performs far above his "neutral" performance level, would add 30 more runs than if placed in a "neutral" spot and performing at his normal high level. This is of course a rarity, and I would guess in practical terms that 1 standard deviation would be +/- 4 runs.

This issue, however, is very interesting to look at, but it would be something that I would have to prioritize among the other equally interesting things I'm looking at.


How are Runs Really Created

August 14, 2002 - tangotiger (www) (e-mail)

Michael, I would not look at SF actual run output to determine anything since 6000 PA is not a very small sample.

Anyway, I once ran sims where I had a team of 9 .333 OBA guys, and another team with 8 .300 OBA guys and 1 .600 OBA guy. Overall, both groups are the same. I also made the SLG average about 30 or 40% higher for each player.

I then moved this Bonds type player through the batting order.

From what I remember, I did not notice much difference between the 9 equal guys and the Bonds + bad team.

I'll have to redo that study now that I have better data available. It is again another interesting question that I must look at.


How are Runs Really Created

August 14, 2002 - tangotiger

I meant IS a small sample.


How are Runs Really Created

August 15, 2002 - tangotiger (www) (e-mail)

Michael, I guess I didn't make myself very clear, since what you replied is exactly what I said.

"Anyway, I once ran sims where I had a team of 9 .333 OBA guys, and another team with 8 .300 OBA guys and 1 .600 OBA guy. Overall, both groups are the same. I also made the SLG average about 30 or 40% higher for each player."

So, the 9 equal hitters at .333 OBA had a weighted team average of .333 OBA. The 8 equal hitters at .300 OBA plus the Bonds-like .600 OBA hitter would also have a weighted team average of .333. So, the first team has the Bonds magic spread around. We are talking about two teams that are equal in overall talent, except that the spread is far different.

As I mentioned, I don't remember seeing any noticeable difference. It might have been a 2% difference (say 15 runs over a season), and only to the extent that you'd be able to optimize the batting order so that the .600 guy could do the most damage. I will redo the study at some point in the future, though, to get more accurate results.

Here is a link to the results of the study I did last year. Please take it as preliminary and crude. Spreading the Bonds magic


How are Runs Really Created

August 15, 2002 - tangotiger (www) (e-mail)

tango I think your crude Bonds analysis goofed up in exactly the types of ways you intended to prevent with your article.

I was doing my best to avoid lone-gunman types like Bonds. I did that analysis ONLY to show the effect on runs at the team level of having either 9 equal hitters, or 8 equal hitters and 1 outlier, even though overall they have the same stats. I did not want to talk about the "run environment" because...

The problems I see are that as you so elegantly noted walks are valuable only because others drive you in. By using just OBA you missed that completely as most of Bonds exceptional value is in the walks.

...because Bonds doesn't get to partake in his own run environment. Bonds' run environment (the chances that the runners ahead of him will score, and the chances that he himself will score) is determined by all the other batters. You can't measure Bonds' value in moving runners over if those chances partly include Bonds' own effect.

So, I was hoping that everyone would overlook this, because the Bonds effect to the run environment is outside the scope here. However, since you brought it up, what you have to do, in this case, is establish a run environment for each batting spot for this particular team, such that if Bonds is the #3 hitter, then the run environment of the #2 hitter includes Bonds, but the run environment of the #3 hitter should "assume" an "average" type of ballplayer.

I went into this in great and deathly detail in the batting order thread on fanhome. I really want to avoid talking about that here, because we would get away from the basics too fast.

Your point is well-taken and accurate.

It would be interesting to analyze all of the line-ups that have been tried to see which would be the most effective based on the run environment concept, and of course see if you could find a better one.

The run environment concept applies to the basic building blocks of run creation, and I did apply this to the above mentioned thread on batting order.

The correct and proper way to do what you are suggesting is to use the proper model (a simulator) to go through all the variations. The run environment concept, with its building blocks of run creation, will however greatly reduce the number of player combinations you need to look at.

For those interested in the batting order thread, drop me a line, and I'll point you there. tom@tangotiger.net


How are Runs Really Created

August 15, 2002 - tangotiger (www) (e-mail)

new: you pretty much have got it, except near the end. The run environment is established by the overall offense + pitching + fielding. You CAN create the run expectancy tables and all that with a little programming. You can also extend this into win expectancy tables, which is where the real fun and learning experience lies.
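As an example of the "little programming" involved, here is a minimal sketch that builds a run expectancy (RE) table from play-by-play. The input format, a list of complete innings where each play is a (bases, outs, runs_on_play) tuple with the state recorded before the play, is a hypothetical simplification of what you would extract from Retrosheet event files. It also assumes every inning passed in was played to three outs; partial innings (such as a walk-off bottom of the 9th) should be filtered out first.

```python
from collections import defaultdict


def run_expectancy(innings):
    """Average runs scored from each base-out state to the end of the inning.

    innings: hypothetical list of complete innings; each inning is a list of
    (bases, outs, runs_on_play) tuples, where (bases, outs) is the state
    *before* the play.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for plays in innings:
        runs_in_inning = sum(runs for _, _, runs in plays)
        scored_so_far = 0
        for bases, outs, runs in plays:
            # Runs still to come from this state, including this play.
            total[(bases, outs)] += runs_in_inning - scored_so_far
            count[(bases, outs)] += 1
            scored_so_far += runs
    return {state: total[state] / count[state] for state in count}


empty = (0, 0, 0)
innings = [
    [(empty, 0, 0), (empty, 1, 1), (empty, 1, 0), (empty, 2, 0)],  # out, HR, out, out
    [(empty, 0, 0), (empty, 1, 0), (empty, 2, 0)],                 # 1-2-3 inning
]
print(run_expectancy(innings))
```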


How are Runs Really Created

August 16, 2002 - tangotiger (www) (e-mail)

Voros: BaseRuns does not fall into the trap that RC and LWTS do. You will find it an appropriate measure, though you lose the great additive advantages that LWTS affords you. Readers of fanhome know what I am talking about here. For the others, please bear with me until the end of the month.

Walt: your terrific dissection deserves a generous response, and I will give one in due course. I do want to make three specific points in the meantime, though: 1 - the linear models that are presented with regards to baseball are almost always to the power of 1, and that was the basis for my statement.

2 - John Jarvis did a regression analysis on, I believe, the 1976-2000 TEAM SEASONAL totals and came away with a regression value of .62 (or something) for a double, and .87 (or so) for a triple. Those values are nonsensical in reality. It doesn't matter that his r-squared was 90% or that the standard error was very low. It's wrong. I've done regression analysis on team totals by era, and the results were also strange in some cases.

3 - But I'm really not seeing what this has to do with how slopes change by run environment. Modeling that would suggest other possibilities like a series of dummy variables representing different run-scoring eras or a multi-level random effects model.

Yes, we've tried that, but it doesn't work. As I've shown, each element would have to have its own best-fit linear or parabolic or whatever equation, with respect to the run environment. And the run environment itself would have to be known before the fact. Since we are attempting to determine what is the actual run environment without knowing the number of runs scored, we're stuck. This is where BaseRuns comes in. An elegant, simple and accurate equation.

I will reply to your lengthy post soon. Thanks...


How are Runs Really Created

August 16, 2002 - tangotiger (www) (e-mail)

Not exactly. Linear regression is linear because it's "linear in the parameters." There are many ways to model non-linearities among the variables using multiple regression.

Thanks for clarifying some points. I should then say that baseball is virtually linear in the parameters, but is non-linear to its environment.

In regression, a coefficient gives you the impact of adding that particularly variable to the model, after having removed all the influence of the other variables from both the dependent variable and the independent variable in question (aka "statistical control"). I don't see any inherent problem with doing that here, but perhaps I'm missing something.

The problem is that if you freeze say all the hits, HR, etc, but leave the walk to be the independent variable in question, its value is dependent on the values of hits and HR. So, you freeze hits and HR at say 10 and 2, then the value of 1 walk might be .30, the value of 2 walks might average .32, the value of 3 walks might average .34. Furthermore, if you then freeze the hits at 11 and the HR at 1, all these values change. So, exactly what is the value of the walk?
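One way to see this is to take any run estimator that is not linear in the counts and compute the marginal value of one more walk at different frozen lines. The sketch below uses a BaseRuns-shaped formula, runs = A x B/(B + C) + D, with B built only from the hit and walk weights in the full "B" list posted later in this thread. Dropping the other entries is a simplification for illustration, and the game lines are made up.

```python
def base_runs(singles, doubles, triples, hr, bb, outs):
    """BaseRuns-shaped estimate A * B / (B + C) + D, with a simplified B."""
    a = singles + doubles + triples + bb                   # non-HR baserunners
    b = (0.73 * singles + 1.95 * doubles + 3.13 * triples  # advancement: subset of
         + 1.69 * hr + 0.05 * bb)                          # the posted B weights
    return a * b / (b + outs) + hr


def walk_value(**line):
    """Marginal runs from one extra walk, holding everything else frozen."""
    with_walk = dict(line, bb=line["bb"] + 1)
    return base_runs(**with_walk) - base_runs(**line)


low_power = dict(singles=6, doubles=2, triples=0, hr=0, bb=3, outs=27)
high_power = dict(low_power, hr=3)
print(walk_value(**low_power), walk_value(**high_power))
```

With these made-up lines the extra walk is worth more when the same game also has 3 HR, which is exactly the dependence described above: there is no single "value of the walk" once the model is non-linear.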

But I'm really not seeing what this has to do with how slopes change by run environment. Modeling that would suggest other possibilities like a series of dummy variables representing different run-scoring eras or a multi-level random effects model.

Yes. But that's really really hard.

This is a good point, but of course this has nothing to do with the appropriateness or inappropriateness of multiple regression, but rather with what the proper unit of analysis is.

Yes, my third statement was exactly this point.

...That's a lot of conditional RE tables. :-)

Yes, the 24 basic states are the minimum number of states that you should accept. 24 x 9, to include the batters, would be better. The point about the fielders, etc., should be factored into the RE tables before the game, so that you have a customized set of 24 x 9 RE tables based on the actual 9 hitters, the pitcher, and the fielders.

However, if things like HBP and interference are truly random, omitting them from the model will not bias the coefficients for variables included in the model.

Things like HBP may be an indication of poor or wild pitching and therefore before we omit anything, we have to determine if they are truly random. Interference I'm sure we can ignore.

All this aside, chances are none of this will have much impact. Baseball scoring is not all that variable, most of the important variables have been identified, etc. Chances are the best we can hope for is minor improvement in the level of error. The proof's in the pudding there and I hope a future article will compare the accuracy of your method to the existing ones.

John Jarvis has gone through the exercise of comparing the various estimators, so I don't need to rehash that.

As I said in the article, as long as you adhere to typical MLB teams who play at the typical OBA levels, then really any run estimator will "work". That's because at a very narrow given specific run environment, what you say is correct, and there is not much variability.

For a "team" like Pedro, this does not apply whatsoever. And Voros is correct that while there are no teams of 9 Bonds hitters, there are effectively 9 Bonds hitters when a really bad pitcher is on the mound. That pitcher provides his opposition with a Bonds environment.

This is why it is important to understand the building blocks of run creation, and their high dependence on the run environment (which is itself determined by the various offensive events working together in a non-linear, interdependent fashion).

Great comments Walt, and I hope that my lack of knowledge on specific statistical concepts did not take away from the comments I have presented. Thanks.


How are Runs Really Created

August 17, 2002 - tangotiger (www) (e-mail)

Here are the results of a basic linear regression, using team totals from 1969-1999 (808 teams). For each event, the first number is the coefficient and the second is its standard error (parentheses denote negative values).

event coefficient std.error
outs (0.11) 0.00
1b 0.51 0.01
2b 0.72 0.03
3b 1.10 0.08
hr 1.47 0.03
bb 0.34 0.01
k (0.10) 0.01
sb 0.21 0.03
cs (0.19) 0.07

The r-squared is 95.5%. I still wouldn't take those numbers. They are nice guidelines. Very good ones in fact. But when we have access to play-by-play that tells us exactly what each event, on average, is worth, what does looking at the aggregated seasonal line tell us?



How are Runs Really Created - Second Installment

August 20, 2002 - tangotiger (www) (e-mail)

I've gotten a few response emails with some nice remarks, but I guess there's not much controversy in what I'm saying.


How are Runs Really Created - Second Installment

August 25, 2002 - tangotiger (www) (e-mail)

Tango, don't let the lack of response deter you from completing this project. It is important to get all this down on paper in one place. Combined, it will become an important source for proper RC understanding.

David, I agree with your sentiment. I have a half-dozen other projects whose answers I don't yet know, and that therefore interest me more (things like when exactly to walk Barry Bonds, based on score, inning, men on base, outs, and batting order). However, I've read so many run evaluator articles that I hope to put a stop to the gobbledygook approaches and steer the search in the right direction.

Your BaseRuns is using the only right approach, by definition. As I've mentioned, the only thing left in understanding run creation is the score rate, and how to calculate it.

Tango, trust me, it's not yawning -- it's digesting. I am thoroughly enjoying the work.

Thanks, and hopefully the next article in the series will be as satisfying.

...it was the position of several well-qualified analysts that models don't matter--all that matters is accuracy in the range of interest. To them, the range of interest was the 3 to 6 R/G range of real MLB teams. To me, and to you, the range of interest is 0 to infinity R/G.

Yes, the shortcut way, while easier, and sometimes even more accurate, does not lend itself to extrapolating beyond what it was designed for. Since we are living in the time of Pedro and Bonds, maybe we care more about the extreme types.

I hope you delve into the subject of how to apply a successful team run formula to individuals. I believe that this area has NOT to this point been analyzed properly, and I am curious as to what you have come up with.

This area also needs a lot of work, but I will present what I have nonetheless (probably in the 4th installment).

*** A few others have responded by email, and I appreciate any feedback (positive or negative).


How are Runs Really Created - Second Installment

August 26, 2002 - tangotiger (www) (e-mail)

Thanks Ben. To answer your question, there are 24 base-out states to consider. Depending on when the K occurs, its value would be different from that of a regular out.

You can check out a chart that breaks down the various run values by the 24 base-out states here: Run values by the 24 base-out states.

The K also has an extra wrinkle in that the batter can be safe after a K, or other events can happen afterwards. I've chosen to include them as part of the K, but in reality the effect is very tiny.
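To illustrate why the K's value depends on when it happens: the run value of any play can be read off an RE table as RE(state after) - RE(state before) + runs scored on the play. The sketch below applies that to a plain strikeout (batter out, runners hold, ignoring the dropped-third-strike wrinkle). The RE numbers are toy values for illustration, not taken from the linked chart.

```python
def play_value(re, before, after, runs_scored):
    """Run value of a play: RE(after) - RE(before) + runs scored on it.

    States are (bases, outs); any state with 3 outs is worth 0 runs.
    """
    def expected(state):
        _, outs = state
        return 0.0 if outs >= 3 else re[state]
    return expected(after) - expected(before) + runs_scored


def strikeout_value(re, bases, outs):
    """Plain strikeout: one more out, runners hold, no runs score."""
    return play_value(re, (bases, outs), (bases, outs + 1), 0)


# Toy RE values for illustration only.
toy_re = {
    ((0, 0, 0), 0): 0.50, ((0, 0, 0), 1): 0.27,
    ((1, 1, 1), 1): 1.60, ((1, 1, 1), 2): 0.80,
}
print(strikeout_value(toy_re, (0, 0, 0), 0))   # leading off an inning
print(strikeout_value(toy_re, (1, 1, 1), 2))   # bases loaded, two out
```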



How are Runs Really Created - Third Installment

September 16, 2002 - tangotiger (www) (e-mail)

Actually, the linear weights that I use can determine either absolute runs scored, or runs above average.

As described in the previous article, the reconciliation between the two methods is simply subtracting .16 runs per out (for 1974-1990).
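The reconciliation is a one-line adjustment. A minimal sketch, using the .16 runs per out figure quoted above for 1974-1990; the player line is hypothetical.

```python
RUNS_PER_OUT = 0.16   # the 1974-1990 figure quoted above


def runs_above_average(absolute_runs, outs, runs_per_out=RUNS_PER_OUT):
    """Convert absolute linear-weights runs to runs above average."""
    return absolute_runs - runs_per_out * outs


# Hypothetical player: 100 absolute runs while making 400 outs.
print(runs_above_average(100, 400))   # 100 - 0.16 * 400
```

For another era you would pass that era's runs-per-out figure instead of the default.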


How are Runs Really Created - Third Installment

September 16, 2002 - tangotiger (www) (e-mail)

That is, the number of home runs hit isn't independent of other run-generating events. It's not that more runners are scoring per home run when a large number of home runs are hit, but rather, that there are more (run-generating) singles, doubles and triples in these games as well.

*** First of all, this is not true. Generally speaking, those events ARE independent of the HR. I ran further studies that controlled for those events (for example, looked only at games with 2 to 4 walks, 6 to 8 singles, etc, and separated them by the HR class) so that there was virtually no difference in those events. The results were the same.
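The control described above amounts to filtering games to narrow bands of the other events and then grouping what is left by home runs. A minimal sketch, with a hypothetical list of per-game dicts as input; the bands are the examples given in the text, and the sample games are made up.

```python
from collections import defaultdict


def runs_by_hr_class(games, walk_band=(2, 4), single_band=(6, 8)):
    """Average runs per game for each HR count, among games whose walks
    and singles both fall inside the given inclusive bands.

    games: hypothetical list of dicts with keys 'bb', 'singles', 'hr', 'runs'.
    """
    totals = defaultdict(lambda: [0, 0])   # hr -> [runs, games]
    for g in games:
        if (walk_band[0] <= g["bb"] <= walk_band[1]
                and single_band[0] <= g["singles"] <= single_band[1]):
            totals[g["hr"]][0] += g["runs"]
            totals[g["hr"]][1] += 1
    return {hr: runs / n for hr, (runs, n) in sorted(totals.items())}


sample = [  # made-up games
    {"bb": 3, "singles": 7, "hr": 0, "runs": 3},
    {"bb": 2, "singles": 6, "hr": 0, "runs": 2},
    {"bb": 4, "singles": 8, "hr": 1, "runs": 5},
    {"bb": 9, "singles": 7, "hr": 1, "runs": 9},   # too many walks: excluded
]
print(runs_by_hr_class(sample))
```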

Why the hell should I care about extreme outcomes?

*** In extreme examples, you can't hide the shortcomings of your models or estimates. They stick out like a sore thumb. And note, in my extreme examples, the dataset went only so far as Barry Bonds' 01/02. So, it was not "unrealistic" extreme, but realistic extremes.

If I'm a major league GM, how does BaseRuns help me to build a better team? Does it have better predictive value than runs created?

*** As I said, almost all run evaluators are similar in the .300 to .400 OBA range. This will help you determine the true value of those extreme players, for whom GMs are trying to figure out whether they are overpaying.

Better predictive value? Neither this system nor Runs Created addresses predictive value. Voros, MGL and a few others do a good job there.

Does it do a better job of explaining run creation in "realistic extreme" environments such as Barry Bonds 01/02 or the Deadball Era or Coors Field?

Yes. This series of articles explains a team of Barry Bonds, a team of Pedros (i.e., Pedro himself), and virtually any run environment, regardless of whether that run environment is due to the hitter, the pitcher, the fielders, or the park. If Runs Created says that the run value of a HR is LESS than 1 (an impossibility) for deadball-type pitchers, or worth more than 2, close to 3 runs (!), in a Barry Bonds run environment, what does that tell you?


How are Runs Really Created - Third Installment

September 17, 2002 - tangotiger (www) (e-mail)

There is some correlation. Here's the data. But since I've shown the breakdown by OBA (where there is HIGH correlation by definition between the # of singles, doubles and OBA), and I broke it down by HR (where the correlation that does exist has very little impact overall), I don't know what it is that you are after.

S D T HR HBP BB
6.3 1.4 0.2 - 0.2 3.1
6.3 1.5 0.2 1.0 0.2 3.3
6.4 1.6 0.2 2.0 0.2 3.4
6.5 1.6 0.2 3.0 0.2 3.6
6.4 1.6 0.2 4.0 0.2 3.7
6.7 1.7 0.2 5.0 0.3 3.8
7.3 1.7 0.3 6.0 0.2 4.2
7.6 2.1 0.3 7.0 0.4 4.4

As you can see, the most important variable here is the HR. It has by far the most impact on how many runs should be scored, *in this grouping of data*.

And the impact that it does have is nowhere near as high as Runs Created says it is. Its impact, as shown in the article, is virtually exactly what BaseRuns says it should be.

If BaseRuns is 50% more accurate at dealing with extreme cases, and 1% less accurate at dealing with realistic cases (I don't know that it is), that seems to me like one step forward and two back.

Again, the point of the articles is to paint a picture as to how runs are created. BaseRuns is the first step in trying to figure out what the score rate should be. I'm sure there'll be better ones to come around. But the basic model is correct by definition. What Runs Created, static Linear Weights, et al do is to ignore the model, and instead fit their formula to the sample data they have on hand.

What is the question you're trying to answer?

How are runs really created.

What are the implications of your research?

The implication is that by ignoring the actual basis of how runs are scored (that the HR has an absolute minimum value of 1 and caps off somewhere below 2, and that all events should converge to 1 as the OBA converges to 1), you are fitting your formula to reduce the RMSE. While you might (and should!) get better results by fitting your formula to known sample data, you are misleading the reader about how runs are really created.

The implication is also that the HR does not have an ever-increasing value. There is a law of diminishing returns for the HR specifically.

I apologize in advance for my confrontational tenor,

I would appreciate it if the confrontational aspect were reduced slightly. Thanks. I'd prefer the debate center on the merits of the data and the interpretation of the data.

but your advocacy of BaseRuns comes across as almost cultish, based on a series of assumptions it conflates with Truth

Again, except for BaseRuns' definition of the score rate, everything I've said is truth. What assumptions are you referring to?

We start off with a point of fact: runs = BR x scoreRate + HR. Now, we're trying to figure out what the score rate is. David's B/(B+C) seems too simple, but in actual fact it ends up conforming to reality. There's a problem at the very high end, and that's where we should look for better answers. But runs = BR x scoreRate + HR must hold.

, without regard for the world around it.

BaseRuns is the only model that accounts for the world around it.
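The relationship runs = BR x scoreRate + HR, with scoreRate = B/(B+C), fits in a few lines of Python. This is a sketch: A (the BR term) is taken to be baserunners other than home runs, since the HR hitters are counted in D, and computing B itself is left to the caller. The checks in the usage below show two limits discussed in this thread: with no other baserunners each HR is worth exactly 1 run, and as outs shrink toward zero the score rate approaches 1, so every baserunner scores.

```python
def base_runs(a, b, c, d):
    """runs = A * scoreRate + D, with scoreRate = B / (B + C).

    a: baserunners other than home runs
    b: advancement component
    c: outs
    d: home runs
    """
    if b + c <= 0:
        raise ValueError("B + C must be positive")
    score_rate = b / (b + c)
    return a * score_rate + d


print(base_runs(a=12, b=6, c=27, d=1))     # a made-up game line
print(base_runs(a=0, b=5, c=27, d=2))      # no other runners: each HR is 1 run
print(base_runs(a=12, b=6, c=1e-9, d=1))   # almost no outs: nearly everyone scores
```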

But how about one Barry Bonds and eight mortals? What I'd like to see is a comparison of the systems within an actual major league context, not a simulation that you've designed to produce an outcome that is preordained to be favorable to your cause.

Preordained? Would you believe me if I told you that I wrote the first 2 articles BEFORE I ran the data in the third article? I was happy to use my sim, but then I decided to run against real data. I was the biggest skeptic of BaseRuns when David first introduced it to me. There's no bigger skeptic of "new math" than me, be it DIPS, BaseRuns, or Win Shares.

As for 1 Barry and 8 mortals, that would require the use of a sim, because of the problem I mentioned regarding the pond and the aquarium. There are other factors, specifically, the batting spot you put Barry in. He has a different effect if batted 1st than 5th. I intend to look at the batting order effect at some point, but I'd be happy to share anything specific you would like to know.


How are Runs Really Created - Third Installment

September 17, 2002 - tangotiger (www) (e-mail)

is Base Runs designed to provide new information

At the risk of repeating myself, it is designed to present how runs are really created. It's up to the reader to decide how valuable it is to know this.

As I mentioned on a few occasions, if all you look at are players and teams with an OBA around .300 to .400, it really doesn't matter what you use.

But if you are interested in extreme examples, like Pedro, or a high run scoring environment, BaseRuns' value comes through in that it doesn't take the shortcuts that the other run estimators rely on to be accurate.

I agree, if you are not bothered by back of the envelope calculations, and you don't care about extreme situations, then stick with basic RC or static LWTS. They'll serve your purpose. I've said as much in past articles.

I am presenting a framework to understand how runs are created, and that at some point the marginal value of the HR decreases, while other run evaluators never consider this.

And if you are looking at college or high school ball, then BaseRuns becomes much more valuable.


How are Runs Really Created - Third Installment

September 17, 2002 - tangotiger (www) (e-mail)

As for how accurate BaseRuns is in "real-life" situations, here's the data behind the "by OBA" chart. "R" is actual runs scored.

OBA R BsR LWTS RC
0.030 0.12 0.10 (1.93) 0.03 ... BsR better
0.077 0.41 0.31 (1.18) 0.20 ... BsR better
0.124 0.61 0.63 (0.37) 0.51 ... BsR better
0.176 1.17 1.22 0.63 1.09 ... BsR better
0.224 1.88 1.97 1.67 1.86 ... RC better
0.275 2.83 2.99 2.89 2.93 ... LWTS better
0.324 4.04 4.21 4.19 4.21 ... LWTS better
0.371 5.31 5.58 5.51 5.65 ... LWTS better
0.421 6.87 7.28 6.95 7.42 ... LWTS better
0.468 8.64 9.18 8.41 9.33 ... LWTS better
0.515 10.34 11.33 9.89 11.49 ... LWTS better
0.566 12.10 13.91 11.78 13.84 ... LWTS better

As you can see, when the OBA is between .275 and .375, all three measures are very, very similar. But for the Pedros and Thomes and Bonds of the world, things are different.

Notice also that LWTS is better at the "high end". This is because LWTS benefits from having the HR value fixed at 1.40, so it doesn't fall into the RC trap.

If I present a similar table broken down by HR class, BsR will take over in *all* respects. This is why I say that David's BsR is the first step. Clearly, it still falls into a trap similar to RC's, in that it overvalues each event, though not as much as RC does. The search is on to find out how to better represent the interaction.
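The "... better" labels in these tables mark whichever estimator lands closest to actual runs. A small sketch of that rule, reading parenthesized figures as negatives; the three rows used are copied from the OBA table above.

```python
def closest_estimator(actual, estimates):
    """Name of the estimator with the smallest absolute error."""
    return min(estimates, key=lambda name: abs(estimates[name] - actual))


rows = [  # (OBA, actual R, {estimator: estimate}) from the OBA table
    (0.030, 0.12, {"BsR": 0.10, "LWTS": -1.93, "RC": 0.03}),
    (0.224, 1.88, {"BsR": 1.97, "LWTS": 1.67, "RC": 1.86}),
    (0.275, 2.83, {"BsR": 2.99, "LWTS": 2.89, "RC": 2.93}),
]
for oba, runs, est in rows:
    print(oba, closest_estimator(runs, est))
```

The same function reproduces the labels in the OPS and HR tables if you feed it those rows.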


How are Runs Really Created - Third Installment

September 17, 2002 - tangotiger (www) (e-mail)

Ugh, just trying to turn off those italics. Sorry about all that.


How are Runs Really Created - Third Installment

September 17, 2002 - tangotiger (www) (e-mail)

Are those italics ever going to die?

...this as the sort of tough love you'd encounter in defending a dissertation. It is clear that all of this makes sense to you, but it is not so transparent to a well-informed audience.

*** Yes, this is very clear to me. Since I don't have Bill James's natural, honed gift for writing, I'll do my best to convey my message better.

I also think that you're inviting somewhat more ... aggressive feedback when you say things like "Runs Created is dead, BaseRuns is the now",

*** That's ok. I say these things with basis. I've gone to great lengths to show different scenarios, etc.

...or invoke (incorrectly) something like the Heisenberg Uncertainty Principle. I mean, you're talking the talk.

*** I didn't mean to suggest that the Heisenberg Uncertainty Principle was at work here, nor that my example was one of Heisenberg. The specific quote I took was one where it's hard to distinguish between what is being observed, without interacting with the system you are observing. Barry Bonds does not interact with himself, only with his teammates. But by throwing Bonds into the mix, you are changing the relative values of the teammates you are trying to study.

2. Thank you for presenting the table of correlations.

*** Sure. I'd be glad to show more detailed data; it's just that this forum is not well suited to it. Email me if you want more.

3. The fundamental point is that there's no "Holy Grail" here. Runs are created by the particular combination of batting and baserunning events in a particular inning of a particular game.

*** The search is for the holy grail, and BaseRuns is *not* it.

Any attempt to generalize these unique sequences of events into something universally applicable has got to make approximations and assumptions.

*** You don't want to make them universally applicable. The holy grail reference was in reference to things to come. You have to understand all the contexts, the base-out situation, the pitcher/batter matchup, the runners, the fielders, the park. I don't expect that we will end up with 1 formula for all that. I do expect that we will get a series of principles that will follow all that. The work that I have done all leads to this.

Linear weights cuts a different corner than BaseRuns does, by focusing on data at the season level rather than at the game level. You assert that focusing on data at the game level is True, without presenting evidence either as to the utility of this approach

*** The game is the unit, since the interaction of the events occurs at the game level. You make a more accurate point that the interaction occurs at the inning level, and this is true. If I had the data, I would have presented it at the inning level.

(what would Billy Beane do with it?),

*** I'm sure he's a smart guy. But not all these things have to be applied by GMs. My audience is myself, and people who think like me. Maybe there aren't many people out there like that; that's fine.

or to its aesthetic purity. Why focus on the game level, rather than at the inning level? Why try and take all of the context out of run creation at all?

*** BaseRuns tries to (wrongly) take the context out. LWTS, as I do them, forces all the context right back in.

4. My point in criticising your experimental design is that the coefficients you use in BaseRuns were derived based on data gathered at the game level, and that you then use game-level data in order to test its superiority.

*** As discussed, the interactions occur at the inning level. Game-level is the best I had.

If you tested the various systems based on season-level data, linear weights would triumph, because that's how it is derived.

*** Not necessarily so. Dynamic Custom Linear Weights would always triumph over static linear weights. Assuming you meant static linear weights, this is probably true, but not necessarily. But I agree generally with this statement. The point however is that the single from last week does nothing to determine the impact of run scoring tomorrow. It might have some predictive value, but it has no impact on it. This is why inning-view (or game-view if you are also looking at wins and lineup construction) is the correct view.

5. In the OBA chart you present above, BaseRuns is considerably less accurate over the entire normal range of OBA's. Missing by an additional .25 runs per game would amount to about 40 runs or 4-5 games over the course of a season. That is not "very very similar"; you have made a substantial trade-off here!

*** If I present the chart by HR, or by OPS, BaseRuns would triumph over the normal range. It depends what data it is that you are using, the context that you are presenting. Again, the strength of BaseRuns is how it handles the HR. Its weakness (relatively speaking to itself, but not to the other evaluators) is the rest of the components. This is why I don't support BsR as the end-all and be-all, but as the first step. As I've mentioned a few times, the search is on for a better score rate. If you have a run model that doesn't adhere to something as fundamental as R = BR x scoreRate + HR, what are you supposed to do with this?

*** Here is the data by the OPS class (grouped by .100)

opsClass R BsR LWTS RC
0.055 - 0.03 (1.76) 0.02 ... RC is better
0.160 0.22 0.18 (0.75) 0.17 ... BsR is better
0.258 0.47 0.50 0.11 0.48 ... RC is better
0.358 0.93 1.02 1.01 0.97 ... RC is better
0.454 1.62 1.72 1.94 1.65 ... RC is better
0.552 2.49 2.60 2.90 2.52 ... RC is better
0.651 3.47 3.63 3.88 3.59 ... RC is better
0.748 4.62 4.79 4.82 4.83 ... BsR is better
0.846 5.85 6.08 5.72 6.27 ... LWTS is better
0.945 7.17 7.48 6.59 7.92 ... BsR is better
1.043 8.60 9.00 7.42 9.78 ... BsR is better
1.141 10.11 10.53 8.18 11.85 ... BsR is better
1.239 11.57 12.25 9.06 14.14 ... BsR is better
1.338 13.16 13.89 9.76 16.66 ... BsR is better
1.443 15.78 16.15 10.78 19.62 ... BsR is better

(You'll note that when RC is better, it is barely better. As the OPS rises, BsR is far better.)

And here it is broken down by HR class:

HR R BsR LWTS RC
- 3.08 3.06 3.79 3.03 ... BsR is better
1 4.62 4.62 4.44 4.66 ... BsR is better
2 6.12 6.12 5.00 6.41 ... BsR is better
3 7.65 7.65 5.62 8.37 ... BsR is better
4 9.03 9.00 6.07 10.29 ... BsR is better
5 10.55 10.49 6.73 12.45 ... BsR is better
6 12.33 12.32 7.52 15.35 ... BsR is better
7 16.22 14.32 8.34 18.27 ... BsR is better


How are Runs Really Created - Third Installment

September 17, 2002 - tangotiger (www) (e-mail)

At the risk of being overly reductive, if a model like this doesn't have a use for baseball management then I'm not sure what the point of the research is.

*** The point of the research is to enlighten people as to how runs are really created. It doesn't have to have an application beyond that. However, if you want to properly value a player, you should value him on how he really creates runs. And BaseRuns helps in that regard for the extreme players.

BR could help with strategic questions concerning lineup selection or how to efficiently run your offense against a top notch pitcher.

*** That's possible, but I would not rely on BsR for that. Personally, I would use BsR to generate custom linear weights values, and THEN I'd use linear weights to assist in answering those questions. This is what I do, and I am very, very confident in the results I get from that.

...so I'm curious if you have a sense of how using BaseRuns should change the approaches we've all been using. And I apologize if this is a point you feel you've hammered home in your previous articles, because if it is I'm not sure I've understood it.

*** I don't think I've really addressed this issue. The approach is to get away from the "typical" run estimators, because they don't model reality. To quote someone's "Equivalent Runs" or "XRuns" or "Runs Created" almost makes it seem as if those estimators are accurate. They may yield accurate results in some or most cases, but the calculations used to derive those results are not correct. Suppose that we know that 3 = 6 x .33 + 1. But I come out and say, well, you know, 3 is also equal to (6+1) x .429. I may end up getting the same answer using the same data, but the way I combined the data is wrong. But since most players and most teams do not deviate much from the norm, who really cares? It all works out.

However, I care about the extremes, about Pedro, Bonds, Thome, et al. And just because something works "on average" doesn't mean it works in the extreme.
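The 3 = 6 x .33 + 1 example can be made concrete. Below, 1/3 and 3/7 are the exact values behind the rounded .33 and .429; the lumped model is fitted so that it matches at the baseline, and then both are asked what happens when the HR count changes. The inputs are illustrative only.

```python
def right_way(baserunners, score_rate, hr):
    """Runs = baserunners x score rate + HR."""
    return baserunners * score_rate + hr


def wrong_way(baserunners, hr, factor):
    """Lump HR in with the other baserunners and scale everything."""
    return (baserunners + hr) * factor


SCORE_RATE = 1 / 3   # the ".33" in the text
FACTOR = 3 / 7       # the ".429" in the text, fitted to the baseline

for hr in (1, 3):
    print(hr, right_way(6, SCORE_RATE, hr), wrong_way(6, hr, FACTOR))
```

At the baseline both give 3 runs. With 3 HR the first gives 5 and the second about 3.86, because the lumped model treats each HR like a runner who still has to be driven in, so every HR is valued below 1 run.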

So, to get back to your question, BaseRuns should change your approach as to how you view how runs are created, and should force you to question when you see a run evaluator that "works".

For low-level, or game-level actions, a custom set of LWTS or RE or WE charts is what you want (and I've provided some links above throughout the article).

============== Rob, here is the full "B" component I use. Just a caution: you DON'T need to have all this data. But I have this data, and this is what I am using. You will recognize most from the Retrosheet event files. If you want me to clarify some of the items, let me know. Note: because of "partial innings", you have to be very very careful (which is why I have that last entry). The short answer is that the RE chart at the bottom of the 9th inning of a tied game is DIFFERENT from the RE chart at any other point in the game. Again, if you need the long answer, let me know.

To all: Again, adding each of these components beyond the basics adds very very little to the accuracy of the run construction. But, for completeness, I am providing it.

Event              B weight
Single              0.73
Double              1.95
Triple              3.13
HR                  1.69
Walk                0.05
IBB                (0.48)
HBP                 0.16
Error               0.80
Interference        0.28
OtherSafe           1.43
Sac                 0.73
Strikeout          (0.06)
Out                (0.00)
SB                  0.81
CS                 (1.19)
Pickoff            (0.51)
PickoffError       (0.35)
Balk                1.05
PB                  1.17
WP                  1.17
DefensiveIndiff     0.56
OtherAdvance       (1.06)
FoulError           0.00
implied outs       (1.49)
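A minimal sketch of applying the B weights above to a set of event counts. It assumes the parenthesised values are negatives (the usual accounting convention) and that B is simply the sum of weight x count; the event keys mirror the table, and the counts in the usage line are hypothetical. This covers only the "B" component, not the full BsR equation.

```python
from collections.abc import Mapping

# B weights as posted; parenthesised values are read as negative (assumption).
B_WEIGHTS = {
    "Single": 0.73, "Double": 1.95, "Triple": 3.13, "HR": 1.69,
    "Walk": 0.05, "IBB": -0.48, "HBP": 0.16, "Error": 0.80,
    "Interference": 0.28, "OtherSafe": 1.43, "Sac": 0.73,
    "Strikeout": -0.06, "Out": 0.00,  # posted as (0.00)
    "SB": 0.81, "CS": -1.19, "Pickoff": -0.51, "PickoffError": -0.35,
    "Balk": 1.05, "PB": 1.17, "WP": 1.17, "DefensiveIndiff": 0.56,
    "OtherAdvance": -1.06, "FoulError": 0.00, "implied outs": -1.49,
}


def b_component(counts: Mapping[str, float]) -> float:
    """Sum weight * count over every event type present in `counts`."""
    unknown = set(counts) - set(B_WEIGHTS)
    if unknown:
        raise KeyError(f"no B weight for: {sorted(unknown)}")
    return sum(B_WEIGHTS[event] * n for event, n in counts.items())


# Hypothetical counts: two singles, a walk, a stolen base, three strikeouts.
print(round(b_component({"Single": 2, "Walk": 1, "SB": 1, "Strikeout": 3}), 2))
```

Unknown event names raise an error rather than silently counting as zero, so a typo in the counts does not quietly shrink B.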


How are Runs Really Created - Third Installment

September 17, 2002 - tangotiger (www) (e-mail)

Italics: yes, this was all my fault. I did not have a proper closing italic tag, and it left everything subsequent in italics. I threw in a whole bunch of closing italic tags in my previous post just to make sure that there was no nesting going on, to close it off, and that seemed to work. Sorry about all that...


How are Runs Really Created - Third Installment

September 17, 2002 - tangotiger (www) (e-mail)

Arvid:

1. It seems almost oxymoronic that BaseRuns doesn't do particularly well relative to different levels of OBP, but does do very well relative to different levels of OPS. This suggests to me that there is some sort of interaction between the "getting on base" element and the "moving runners along" element that has been lost in the attempt to segregate those two things from one another. For one thing, the probabilities of particular batting outcomes aren't independent of the bases occupied during a given plate appearance.

*** As mentioned, BaseRuns is the first step toward modeling the score rate. It does very well with the high-OPS and high-HR classes, simply because it handles the HR properly. Further improvement is called for in cases where no HR are hit. This is why I say that the search is on for a better score rate.

2. I suppose I'm still somewhat put off by the implications that BaseRuns is a "true" or "real" or otherwise aesthetically pure measure of run creation. Even if you look at data on an inning-by-inning basis, it is still an approximation:

BB-1B-1B-HR-K-K-K produces 4 runs, whereas HR-K-K-1B-1B-BB-K normally produces 1.

*** The model assumes a roughly random distribution of events. It does not purport otherwise. Pick any single example, and any model will be wrong there. You need the sample size behind it.
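To see why order matters within a single inning, here is a small Python sketch that replays the two innings quoted above. It assumes a simple station-to-station model (runners advance exactly as many bases as the batter on a hit, walks only force runners, strikeouts move no one); that model is an illustration, not anything from the article. Same events, different order, different runs, which is why a model built on a roughly random ordering needs a large sample behind it.

```python
HIT_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def runs_in_inning(events: list[str]) -> int:
    """Runs scored by a sequence of 'BB', 'K', '1B', '2B', '3B', 'HR' events.

    Station-to-station assumption: on a hit every runner moves exactly as many
    bases as the batter, a walk only advances forced runners, and a strikeout
    moves nobody. Events after the third out are ignored.
    """
    bases = [False, False, False]  # first, second, third
    runs = outs = 0
    for event in events:
        if outs == 3:
            break
        if event == "K":
            outs += 1
        elif event == "BB":
            # Push runners along only while every base behind them is occupied.
            if bases[0]:
                if bases[1]:
                    if bases[2]:
                        runs += 1  # bases loaded: the runner on third is forced home
                    bases[2] = True
                bases[1] = True
            bases[0] = True
        else:
            advance = HIT_BASES[event]
            new_bases = [False, False, False]
            for base, occupied in enumerate(bases):
                if occupied:
                    if base + advance >= 3:
                        runs += 1  # runner crosses the plate
                    else:
                        new_bases[base + advance] = True
            if advance == 4:
                runs += 1  # batter scores on a home run
            else:
                new_bases[advance - 1] = True  # batter takes first, second or third
            bases = new_bases
    return runs


first = ["BB", "1B", "1B", "HR", "K", "K", "K"]
second = ["HR", "K", "K", "1B", "1B", "BB", "K"]  # same events, different order
print(runs_in_inning(first), runs_in_inning(second))  # prints: 4 1
```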

That unusual and random sequences tend to get lost in the noise when data is aggregated to the season level is as much an advantage as a disadvantage.

*** There's no need to aggregate by team, though, since that introduces a bias. By aggregating on other criteria, as I've done, you get "better" data.

Paul:

is sufficient to justify replacing existing estimators when measuring typical examples within the controlled set of major league baseball teams

*** As I said, if all you care about is the typical example, then the typical estimators are all you need.

but you can hardly claim to be surprised to meet some challenges when you throw down a gauntlet like that.

*** I have no problem arguing against RC, since it does not have a basis in logic. Its basis is gobbledygook math that is fitted to a narrow sample of data, and its flaws are exposed when it is taken out of that environment. This is also true of static LWTS. This is not the case with BsR or with custom LWTS.

because as Davenport himself acknowledged, for most situations in actual MLB you’ll do just fine using 1.83 (or even 2), but that if your particular interest is in studying extreme examples, you need a custom exponent

*** I didn't know Clay said this, but this is exactly my position as well.

I’ve found Tango’s tone to be far more modest in subsequent posts on this discussion

*** I must be getting old these last few hours. I'll try to be more "O'Reilly Factor" from time-to-time.

Arvid:

BaseRuns is simply a refinement of linear weights

*** BsR allows the generation of custom LWTS. There's no other relationship between the two.

I would suggest that he discuss the offensive performance of the 2001 San Francisco Giants in terms of BaseRuns.

*** For that you need custom LWTS by batting-order slot and by each of the 24 base-out states. BsR is not appropriate, except to help establish the baseline custom values.


How are Runs Really Created - Third Installment

September 18, 2002 - tangotiger (www) (e-mail)

Paul:

No, I only meant modest on a relative scale, where 0 is the amount of arrogance to be found in an average Tango article. For God's sake, there's never a need to turn on Fox News.

Ah-hahaha... the Linear Weights Arrogance Tango Scale! I love it! As for Fox News, they are extremely biased. PBS, maybe CNN, and 60 Minutes are really the only good ones out there. Seriously, watch the BBC or other world news and you get a very different perspective on the world. Did you watch those 3 Arab-American kids from Florida on Larry King 2 nights ago? They were extremely believable, and given the choice between them and that lady, I'd choose them. Of course, the American media was all over them before the King appearance, and since then? Exactly.

You know what else they said when asked if they would sue her? No! They said no! How un-American is that??

Brian:

Excellent summary overall. I agree with almost everything, except

might be to attempt to separate the run scoring, or the moving over (driving in) components

I have done this in Article 2, under the "building blocks of run creation". The separating into components is what you need to do to get custom Linear Weights components.

I'll have to think about your "8.2" concept. Sounds interesting.

I would like to also see the data shown grouped by number of triples in a game, number of walks in a game, number of doubles in a game

Sure, no problem. I'll try to get that done by this weekend (I usually run my research while my newborn is asleep, which is not often these days!).

It seems that if there is any inaccuracy here, it is likely in ... the relative weights assigned to the impact of the individual events

No, that is not possible. Those numbers were generated such that the plus 1 method yields exactly the LWTS coefficients determined by the play-by-play data. Therefore, any inaccuracy would come from the fact that we can't have such a simple "B" equation.
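A minimal sketch of the plus 1 method described above: compute BaseRuns for a team, add one event, and take the difference as that event's custom linear weight. It assumes the standard BaseRuns form R = A x B / (B + C) + D; the team totals are hypothetical, the A/C/D contribution of each event is an assumption, and the B values 0.73 (single) and (0.06) (strikeout) are taken from the B weights posted earlier in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Components:
    """BaseRuns components: A = baserunners, B = advancement, C = outs, D = home runs."""
    a: float
    b: float
    c: float
    d: float

    def __add__(self, other: "Components") -> "Components":
        return Components(self.a + other.a, self.b + other.b,
                          self.c + other.c, self.d + other.d)


def base_runs(t: Components) -> float:
    """R = A * B / (B + C) + D."""
    return t.a * t.b / (t.b + t.c) + t.d


def plus_one_value(team: Components, event: Components) -> float:
    """Custom linear weight of one extra event: BsR(team + event) - BsR(team)."""
    return base_runs(team + event) - base_runs(team)


# Hypothetical season totals for one team (illustrative numbers only).
team = Components(a=1800.0, b=1950.0, c=4100.0, d=170.0)

# Assumed per-event contributions; B values from the posted B weights.
single = Components(a=1, b=0.73, c=0, d=0)
strikeout = Components(a=0, b=-0.06, c=1, d=0)

print(f"single: {plus_one_value(team, single):+.3f} runs")
print(f"strikeout: {plus_one_value(team, strikeout):+.3f} runs")
```

Because the team totals enter the calculation, the same event gets a different weight in a different run environment, which is what makes these "custom" LWTS.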

============

I will post the complete BsR equations that I used by the end of today. What I provided to Rob above was only the "B" equation; I neglected to include the "Baserunner" portion.


How are Runs Really Created - Third Installment

September 18, 2002 - tangotiger (www) (e-mail)

2. Somewhat less accuracy in normal run-scoring environments.

This may very well be true, but that is only because the other measures (except LWTS) are "cheating" to get there. They ignore the constraint that a HR is worth at least 1 run, and the constraint that you can't score more runs than you have runners, which gives them enough wiggle room to force-fit coefficients to their sample data and get the lowest RMSE possible.

Static LWTS values are derived from the pbp and therefore do no cheating. Well, they cheat in that their values only apply to the environment of the data they were generated from: the typical run-scoring environment.

BaseRuns may be 1% less accurate in the typical environments but "50%" more accurate in the extremes. You (?) said that this is 1 step forward, 2 steps back. From my standpoint, this is 2 steps forward, 1 step back.

I don't like that the accuracy of the other formulas is fitted to the typical data, *especially since almost everyone then takes that formula out of that environment and applies it to Pedro, Barry, and Thome*. That little disclaimer is always ignored.

Anyway, to repeat: if all you care about is the typical, use the typical. If you want to know how the events interact with each other to produce runs in various run environments, then you need to use R = BR x scoreRate + HR. For now, BaseRuns is it.
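The R = BR x scoreRate + HR structure builds in the two constraints mentioned above. A short Python sketch, assuming the usual BaseRuns convention that BR (the A term) counts baserunners excluding home runs and that the score rate is B / (B + C); the extreme inputs are hypothetical. Because the score rate stays between 0 and 1, runs can never fall below the HR total or exceed BR + HR.

```python
def score_rate(b: float, c: float) -> float:
    """Share of baserunners who score: B / (B + C)."""
    return b / (b + c)


def base_runs(baserunners: float, b: float, c: float, hr: float) -> float:
    """R = BR x scoreRate + HR, with BR counted excluding home runs (assumption)."""
    return baserunners * score_rate(b, c) + hr


# Two extreme, hypothetical lines:
print(base_runs(baserunners=0, b=30, c=27, hr=20))     # HR-only: every HR is exactly one run
print(base_runs(baserunners=50, b=200, c=1e-9, hr=0))  # almost no outs: close to one run per runner
```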


How are Runs Really Created - Third Installment

September 18, 2002 - tangotiger (www) (e-mail)

I think you are onto something here

Thus total outs are being divided by 4.5. If you have 27 outs in a game, that would mean you are counting 6 of them, which it seems to me might be roughly the number of outs made per game with men on base in your data set.

That is not possible. About 45% of all PAs occur with men on base, and 65% of all PAs are outs. Therefore, the number of outs with men on base is about .45 x .65 x 39 = 11 per game.
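The arithmetic above, spelled out next to the "divide by 4.5" reading. It assumes, as the one-line estimate implicitly does, that having men on base and making an out are independent, and it reads the 39 as plate appearances per team per game.

```python
share_pa_men_on = 0.45  # share of PAs with men on base (from the post)
share_pa_outs = 0.65    # share of PAs that end in an out (from the post)
pa_per_game = 39        # read here as PAs per team per game

outs_with_men_on = share_pa_men_on * share_pa_outs * pa_per_game
outs_implied_by_divisor = 27 / 4.5  # the quoted "27 outs divided by 4.5" reading

print(f"outs with men on base per game: {outs_with_men_on:.1f}")
print(f"outs implied by the 4.5 divisor: {outs_implied_by_divisor:.1f}")
```

The first figure comes out near 11.4 and the second at 6.0, which is the gap the reply points to.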

However, I did notice a very interesting relationship between the B component values and the LWTS values in the past. I haven't been able to quantify it well yet, though. I'm sure you are on the right path.


How are Runs Really Created - Third Installment

September 18, 2002 - tangotiger (www) (e-mail)

btw, the number I derived was 4.25. Not sure what to do with it yet.


How are Runs Really Created - Third Installment

September 19, 2002 - tangotiger (www) (e-mail)

If we go back to article 2, and the definition of the score rate (or just using common sense), we have:

% of runners scoring = (runners who score) / (runners who score + those who don't)

This is the score rate.

Runners who score are represented by the "B" equation, though, as astutely noted in the earlier post, we should strip out the 4.25 (or whatever the constant is) for it to actually represent that.

The "outs" portion of the score rate, the "C", represents those runners who don't score: those left on base and those put out on the bases.


How are Runs Really Created - Third Installment

September 23, 2002 - tangotiger (www) (e-mail)

Give me another week please on posting the formula. I'll have to write a whole article on it, as it's not as simple as I thought I could make it.


How are Runs Really Created - Third Installment

September 30, 2002 - tangotiger (www) (e-mail)

I've added a BaseRuns article, which is an addendum to the RC series. I apologize for not making it better, but I'd rather get it out there than let it sit on my back burner.

BaseRuns Addendum


SABR 301 - Talent Distributions (June 5, 2003)

Discussion Thread

Posted 11:58 a.m., June 5, 2003 (#2) - tangotiger
  In a typical plate appearance, a player will not face the median pitcher, but the average pitcher. If you've got, say, 360 pitchers, but the BFP are spread out based on talent, with the 360th pitcher barely pitching, then the batter will face a weighted version of the 360 pitchers, which comes out to exactly average.

However, the concept of median and other things that you can gather from these charts is certainly interesting. You just have to be careful in how you apply it, and its purpose.
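A small Python sketch of the difference between the average pitcher a batter faces and the median or unweighted-average pitcher. The five pitchers and their BFP are hypothetical; talent is rescaled so that the BFP-weighted average is 1.00, mirroring how the 1.00 level is defined in this series.

```python
from statistics import median

# Hypothetical pitchers: (raw talent, batters faced). Better pitchers get more BFP.
pitchers = [(2.0, 900), (1.2, 800), (1.0, 600), (0.8, 300), (0.6, 100)]

total_bfp = sum(bfp for _, bfp in pitchers)
weighted_mean = sum(t * bfp for t, bfp in pitchers) / total_bfp

# Rescale so the BFP-weighted average (the pitcher a typical batter faces) is 1.00.
scaled = [t / weighted_mean for t, _ in pitchers]

faced = sum(s * bfp for s, (_, bfp) in zip(scaled, pitchers)) / total_bfp
print(f"average pitcher faced (BFP-weighted): {faced:.2f}")
print(f"unweighted mean pitcher: {sum(scaled) / len(scaled):.2f}")
print(f"median pitcher: {median(scaled):.2f}")
```

With these made-up numbers the typical batter faces a 1.00 pitcher by construction, while the unweighted mean is about 0.83 and the median about 0.74.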


SABR 301 - Talent Distributions (June 5, 2003)

Discussion Thread

Posted 1:11 p.m., June 5, 2003 (#9) - tangotiger
  Philly, I'll answer your specific questions in post #8, and if you have other questions from post #7, please rephrase them. I had a hard time following it.

1 - 1.00 is just a "fictitious" number, like Pamela Anderson is a 9.5 / 10, or what have you. I say "fictitious" in quotes because there is some reason behind it, but I haven't presented it here, though I will in the future. It's a number that can be multiplied and divided. A guy with .5 talent level compared to a 1.0 talent level is the same as a 1.0 talent level compared to a 2.0 talent level. That is, if you have a 1.0 hitter against a .5 pitcher, the resultant expected matchup is exactly the same as a 2.0 hitter against a 1.0 pitcher.

Those numbers might be roughly equivalent to a single-A pitcher (.50), an avg MLB hitter or pitcher (1.00) and Barry Bonds/Pedro (2.0). (Maybe top college and not single-A... I don't know yet.)
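The post doesn't spell out how a matchup is computed from two talent levels, so this is only a sketch under an assumed model in which the expected result depends on nothing but the hitter/pitcher ratio (the `matchup` function is hypothetical). Under that assumption, a 1.0 hitter against a .5 pitcher and a 2.0 hitter against a 1.0 pitcher come out identical, which is the property described in point 1.

```python
def matchup(hitter_talent: float, pitcher_talent: float) -> float:
    """Hypothetical matchup score that depends only on the hitter/pitcher talent ratio."""
    if hitter_talent <= 0 or pitcher_talent <= 0:
        raise ValueError("talent levels must be positive")
    return hitter_talent / pitcher_talent


print(matchup(1.0, 0.5), matchup(2.0, 1.0))  # prints: 2.0 2.0
```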

2 - If there are, say, 680x9x30 PA in MLB, and you have, say, 14x30 hitters, then the average number of PAs per hitter is about 437. A top hitter would have, say, 700 PAs, or 160% of average, or 1.60. Something like that. I was thinking of putting actual numbers, say on a scale of 0 to 750, instead of 0 to 1.8, or whatever. I kind of like having the average at 1.0 though.
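The playing-time arithmetic in point 2, worked through. Reading 680 as PAs per batting-order slot per team-season is our assumption; the other inputs are the ones in the post. The average comes out at about 437 PAs per hitter, and 700 PAs is about 1.60 times that.

```python
pa_per_lineup_slot = 680  # assumed: PAs per batting-order slot per team-season
lineup_slots = 9
teams = 30
hitters_per_team = 14

total_pa = pa_per_lineup_slot * lineup_slots * teams       # 183,600
average_hitter_pa = total_pa / (hitters_per_team * teams)  # about 437
top_hitter_playing_time = 700 / average_hitter_pa          # about 1.60

print(f"{total_pa:,} PA, {average_hitter_pa:.0f} per hitter, "
      f"top hitter at {top_hitter_playing_time:.2f}")
```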


SABR 301 - Talent Distributions (June 5, 2003)

Discussion Thread

Posted 4:23 p.m., June 5, 2003 (#12) - tangotiger
  Chart 6 has a Y-axis labelled "Playing Time" 1-100. Draw a vertical line on that chart so that there is exactly as much Playing Time on each side. That line would probably fall between 4.50 and 4.55 (in any event, to the right of 4.45).

Yes, that's correct.

But 4.45 is defined as 1.00 ("MLB average") in chart 4.

4.45 is defined as the talent level of 1.00. The talent level of 1.00 is the MLB average.

Why is the mid-point of Playing Time chart (Chart 6) not identical to the MLB average (1.00) in Chart 4?

Because the distribution is not normal. Chart 6 is a multiplication of Chart 5 (with a very high skew to the right) and Chart 2 (with a very high skew to the left).

The mid-point of chart 6 (Playing Time chart) would be the *median* and not the mean. Because of the skew, this is pretty much what we expected.


SABR 301 - Talent Distributions (June 5, 2003)

Discussion Thread

Posted 4:31 p.m., June 5, 2003 (#13) - tangotiger
  I made an error in my article. If you multiply Chart SIX by Chart 4, you'll get 1.00. That's the same as multiplying Charts 2 (number of players at each level), 4 (talent at each level), and 5 (playing time at each level).

Vinay, you might remember I had a thread last week regarding "ERA by era" or some such. In there, I showed that regardless of the run environment, your ERA relative to the league-average ERA was pretty constant. So, if you are Pedro with a 2 ERA in a league of 4, you should expect to be a 2.5 ERA in a league of 5. That makes his ERA+ 200, or twice the league average. That gives him a talent level of 2.00, in a league of 1.00.
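A minimal sketch of the ERA-relative-to-league idea, assuming ERA+ is league ERA divided by pitcher ERA (times 100) with no park adjustment; the function names are illustrative.

```python
def era_plus(era: float, league_era: float) -> float:
    """ERA+ as 100 * league ERA / pitcher ERA (park adjustment ignored)."""
    return 100 * league_era / era


def translate_era(era: float, league_era: float, new_league_era: float) -> float:
    """Carry an ERA into a new run environment, holding ERA relative to league constant."""
    return era * new_league_era / league_era


print(era_plus(2.0, 4.0))            # 200.0, i.e. a talent level of 2.00
print(translate_era(2.0, 4.0, 5.0))  # 2.5
```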

If you decided to double the number of MLB teams, the talent level would drop to, say, .83, while Pedro maintains his 2.00 talent level. I used 1.00 as a convenient marker, and kept it fixed so that I didn't always have to redo the baseline.

Consider the 1.00 level to be the avg MLB player in 2002.


SABR 301 - Talent Distributions (June 5, 2003)

Discussion Thread

Posted 10:29 p.m., June 5, 2003 (#17) - tangotiger
  Walt: I was trying to avoid technical terms so that I wouldn't get slammed for using normal, binomial, standard normal, etc improperly.

common or typical a normal distribution is

I don't think I said that a normal distribution was typical, but rather that the distribution that does exist in Chart 6 was a typical looking distribution (in lay terms).

For my education: if the median is to the left of the mean, what is that distribution called? Does a normal distribution imply that the mean and median are equal? Does a standard normal distribution imply that 68% of the points fall within 1 SD and that the mean and median are equal?

Thanks...

Tom: yes, that's pretty much it.

Kevin: please explain the purpose of the equation. I don't know what it's trying to tell me.


SABR 301 - Talent Distributions (June 5, 2003)

Discussion Thread

Posted 9:43 a.m., June 6, 2003 (#19) - tangotiger
  Michael, thanks much for all that info.

I really don't get the use of the covariance for the ERA, in terms of what it's trying to tell us.

I showed mathematically that an ERA of 2.00 in a 4 RPG environment is the same as a 2.50 in a 5 RPG environment. Therefore, an ERA+-type measure fits the bill.

However, if you are trying to use the covariance to say "how hard" it is to get an ERA+ of 200, based on the talent distribution of your opponents, or your peers or something, that's another issue. Maybe talent was so diluted that an ERA+ of 200 in 1906 is equivalent to 160 in 1993, or something like that. That's really a whole other ball of wax (and really more in line with what I'm trying to do here than with what ERA+ equivalencies are normally used for).

It almost looks like Kevin is trying to work backwards, inferring what the talent distribution could have been to produce those results. It's a worthwhile exercise, but I must ask what the confidence level and sampling error are in doing so.


SABR 301 - Talent Distributions (June 5, 2003)

Discussion Thread

Posted 12:34 p.m., June 6, 2003 (#21) - tangotiger
  Well said.

Another thing that opens up now is "Regression towards the mean". What mean? The unweighted mean of all MLB players (say talent level .91)? The weighted mean by PA of all MLB players (talent level 1.00)?

The issue is with PA: are they handed out based on a player's sample performance, or based on his "tools"? This becomes critical, especially for rookies. If I were to regenerate the charts, but only look at, say, 22-year-olds, things are going to get skewed differently. Are 22-year-olds given PAs by talent or by performance? What mean do we regress them towards? What's the difference between a 22-year-old's performance level of .85 in MLB and the same level in Triple-A?

The only reason he's in MLB is because he was selected there for some reason. If that selection was based on tools, that's one thing. But if it was based on sample performance, that's quite another.


SABR 301 - Talent Distributions (June 5, 2003)

Discussion Thread

Posted 1:32 p.m., June 6, 2003 (#23) - tangotiger
  Dave, you want me to multiply Chart 2 (number of players, per SD) by Chart 4 (talent level, per SD)? That resultant will be a weighting of the players, by talent level, per SD.

That is, if you had 300 players at 4.2 and 150 at 4.6, and the talent level at 4.2 was .8 and the talent level at 4.6 was 1.6, then you'd get "240" at 4.2 and at 4.6.

What would this represent?

If you want to multiply them all AND add them, then divide by the number of players to come up with one number, that's a different story. This would be the unweighted average talent level in MLB. The answer to that question would be 0.92. That is, if you took the 1500 players who play in MLB in any given year, and they were each to play the same amount, their mean talent level would be 0.92.


SABR 301 - Talent Distributions (June 5, 2003)

Discussion Thread

Posted 12:36 p.m., July 7, 2003 (#25) - tangotiger
  With these distributions, we are in a good position to establish how much "talent dilution" exists with adding or subtracting teams.

If we assume that the current 2003 average player has a talent level of 100, what would happen if half the teams were to disband, leaving us with 14 or 16 teams? What would the average player look like?

I figure that the average player in such a league would have a talent level of 110, which is roughly equivalent to a player that is right now about +1 win / 162 GP over the 2003 average player.

How about if we double the number of teams from 30 to 60 teams? These talent distributions say that the average player in such a league would have a talent level of 90, or about -1 win / 162 GP from the average 2003 player. Troy O'Leary or Ricky Ledee would be an average player in such a league.

So, when we talk about adding or subtracting 4 teams, how much impact is that? This would have the effect of making the player with the 97 talent level or 103 talent level average. Effectively, this would be imperceptible to the viewer.

So, when people talk about "talent dilution", it's hard to see it, if you are talking about adding/subtracting 2 or 4 teams.


SABR 301 - Talent Distributions (June 5, 2003)

Discussion Thread

Posted 9:02 p.m., July 7, 2003 (#27) - tangotiger
  At fanhome I did a very long study on timeline adjustments. However, depending on the assumptions, you could make the case that Ruth's era had half the talent of today's, making Ruth average today. Using a very slightly different assumption, Ruth became much, much better.

The study also suffered from my lack of understanding (at the time) of regression towards the mean, and of the fact that a player's performance is only a sample of his true talent, not a representation of it. This basically invalidated everything I've just said about Ruth.

I've seen this error repeated when looking at aging patterns for pitchers, where regression towards the mean is much more important to understand. You can spot these studies when they show a pitcher's peak age to be 23.

I was going to rerun that study (eventually).

The talent distributions that I've listed here are probably best used as an explanation after the fact, rather than as a way to reach a conclusion.


SABR 101 - Relative and Absolute Scales (June 6, 2003)

Discussion Thread

Posted 7:43 a.m., June 9, 2003 (#6) - tangotiger
  Because James views "negative" value as being "bad", so bad that you'd be better off as 0 runs above average in 5 PAs, than 5 runs below average in 1000 PAs. But, the key to understanding this issue is the point I made in the article regarding the "key" point.


SABR 101 - Relative and Absolute Scales (June 6, 2003)

Discussion Thread

Posted 12:59 p.m., June 9, 2003 (#9) - tangotiger
  But those lists in Total Baseball are implicitly intended to be a ranking of players according to their value

Palmer is wrong in how he sells Linear Weights by doing this.

James is wrong in saying that because he doesn't buy the TB ranking, he can't possibly buy Linear Weights, and then going on to tell you why he can't buy Linear Weights. James can't get past the concept of "zero" not meaning zero in an absolute sense, where zero is the "absence of something". In LWTS, zero is defined as average.

The issue I have with James is not that he doesn't buy it, because I really don't care. I also don't care too much that he gives weak arguments against Linear Weights. My issue with James is that he has an enormous amount of influence (more than all of us put together), and the reader has a certain amount of trust in James, that they won't feel they have to do the dirty work to validate what James is saying, and that James then puts out these weak arguments that it takes us 20 years (and counting) to undo the damage.

That run-on sentence means: James has to be responsible with what he says, because people treat him as judge, jury and executioner. James derives (or at least derived) his income by getting people to buy into what he says, and he should be more responsible with his analysis.


SABR 101 - Relative and Absolute Scales (June 6, 2003)

Discussion Thread

Posted 10:56 a.m., November 12, 2003 (#12) - tangotiger
  Just bringing this one forward as a companion to Patriot's article.



Velocity loss of a pitched baseball (June 10, 2003)

Discussion Thread

Posted 7:49 a.m., June 17, 2003 (#7) - tangotiger
  Must be a typo. I think some tennis player just hit 147 mph over the weekend. The max is around 102 or so.


SABR 201: Linear Weights by the 24 base/out states, 1999-2002 (June 10, 2003)

Discussion Thread

Posted 9:41 a.m., June 11, 2003 (#4) - tangotiger
  Andrew, your suppositions are plausible, though I don't think the sample size at this level will make it statistically significant. That would be my guess as well though, that walks are more likely to be issued by below average pitchers, and to the top of the order.

As for lineup slot, yes, I could do that as well, but there are a few problems:
1 - sample size will definitely be an issue here (I think I'd have to do this for all the Retrosheet years, which really isn't more time, but I'd need a more powerful computer)

2 - selective sampling (Mike Piazza would be 3% of the cleanup spot instead of random noise in the overall, which makes that pretty significant, unless all cleanup hitters are of the same "type" as Piazza... that may be true, but that won't be the case for the #1 or #2 hitters)

3 - we'd want to separate pitchers batting from non-pitchers

4 - most importantly, I find the current chart already a little unwieldy, and I can't imagine readers enjoying NINE of these, and really, if I separate out pitchers, 18 of them!

However, I know I would, and I'd guess you would as well!


SABR 201: Linear Weights by the 24 base/out states, 1999-2002 (June 10, 2003)

Discussion Thread

Posted 3:39 p.m., June 11, 2003 (#8) - tangotiger (homepage)
  Andrew, I did not take it as such... just pointed out some areas to think about.

Jim, I didn't talk about win probability here, as that would be another topic. However, if you are interested in that, please go to my site (see homepage link above), and there's plenty there for you. The two important things I've done are:

1 - I've given you the win expectancy for inning/score/base/out for the 7th inning and on, with a score differential of 1 or 0

2 - Created "leverage" situations for ALL innings and ALL base/out states with a score differential of 3 runs or less, which you could use for pinch-hitting talk and, to a slightly smaller extent, bullpen usage

Hope this answers at least some of your questions.


SABR 201: Linear Weights by the 24 base/out states, 1999-2002 (June 10, 2003)

Discussion Thread

Posted 3:42 p.m., June 11, 2003 (#9) - tangotiger
  As always, there are other variables to consider:
- batter/pitcher matchup
- batters due up
- potential pitchers due up
- runner speed
- fielding talent and positioning
- park

in addition to the inning/score/base/out. But I'm limited in time; otherwise, I'd love to generate a WE that incorporates all this.


SABR 201: Linear Weights by the 24 base/out states, 1999-2002 (June 10, 2003)

Discussion Thread

Posted 1:18 p.m., June 12, 2003 (#11) - tangotiger
  The way to read it is:
with man on 2b and 0 outs, there's a certain run expectancy from that point on to the end of the inning, based on the actual chain of batters that faced that situation

with man on 2b and 0 outs, and an IBB then issued, there's a certain run expectancy from that point on to the end of the inning, based on the actual chain of batters that faced that situation

The chain of batters is not necessarily random (well, it probably is in the first case), and it is certainly not random in the second.

Therefore, you have to be very careful in how you read the chart when making comparisons, what-ifs, etc.
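For readers wondering what is behind each cell: an empirical run expectancy is just the average of runs scored from a base-out state to the end of the inning, over whichever innings actually reached that state. A minimal Python sketch with hypothetical records and state labels follows. The averaging step knows nothing about *why* an inning reached a state, which is exactly the selection issue described above.

```python
from collections import defaultdict

# Hypothetical records: (state when the PA began, runs scored from there to the
# end of the inning). A real chart would use every PA in the play-by-play data.
records = [
    ("2B, 0 out", 1), ("2B, 0 out", 0), ("2B, 0 out", 2),
    ("1B+2B, 0 out, after IBB", 1), ("1B+2B, 0 out, after IBB", 3),
]

totals = defaultdict(lambda: [0, 0])  # state -> [total runs, times the state occurred]
for state, runs_to_end in records:
    totals[state][0] += runs_to_end
    totals[state][1] += 1

run_expectancy = {state: runs / n for state, (runs, n) in totals.items()}
print(run_expectancy)
```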


SABR 201: Linear Weights by the 24 base/out states, 1999-2002 (June 10, 2003)

Discussion Thread

Posted 7:55 a.m., June 13, 2003 (#13) - tangotiger
  No.

In order to establish the validity of the IBB, you first need to use Win Expectancy, as I did when I looked at "When to walk Barry Bonds" last Oct. To do that, you have to do some work behind the scenes, to establish a "what-if" scenario.

The empirical results are just what they are.


SABR 201: Linear Weights by the 24 base/out states, 1999-2002 (June 10, 2003)

Discussion Thread

Posted 8:12 a.m., June 14, 2003 (#15) - tangotiger
  http://pub119.ezboard.com/fbaseballfrm8


Applications of Win Probabilities (June 13, 2003)

Discussion Thread

Posted 9:51 a.m., June 15, 2003 (#2) - tangotiger
  It's always important to use the right tool for the job. Phil's data is completely empirical, and therefore, the situation you find yourself in should be similar to what the empirical data reflects.

If you have additional variables like "the pitcher is 20% above average, the hitter at bat is 30% above average, but the batter on deck is 10% below average, and the hitter after him is 5% above average", you have to create your model to reflect what it is you want. Empirically, you won't find the sample size to match that, which is why you need a Markov chain that handles all this (or you run a sim).

The empirical data or a basic Markov may help you and guide you to an answer, but if the variables not being considered are very relevant, your conclusion will be suspect.



Hitting the cutoff man (June 13, 2003)

Discussion Thread

Posted 9:54 a.m., June 16, 2003 (#2) - tangotiger
  Sylvain,

I know you said drag is not considered, but that has to be a key consideration right? I mean, I'm sure Tim Raines can throw a ball at 75mph, but he'd never be able to throw it 330 feet, certainly not on the fly. I'd also think that maybe a throw on 1 or 2 bounces would be better, in terms of "accuracy".

Thanks for producing your work, as it is very fun to try to decipher!


Hitting the cutoff man (June 13, 2003)

Discussion Thread

Posted 11:15 a.m., June 16, 2003 (#4) - tangotiger
  No need to go to the trouble for the pdf file. I meant deciphering not in your presentation layout, but in the equation itself. I'm trying to remember my physics classes from 15 years ago.


Hitting the cutoff man (June 13, 2003)

Discussion Thread

Posted 9:26 a.m., June 17, 2003 (#8) - tangotiger(e-mail)
  Sylvain,

Send me the file, and I'll ask Primer to post it.


Hitting the cutoff man (June 13, 2003)

Discussion Thread

Posted 1:46 p.m., June 17, 2003 (#9) - tangotiger (homepage)
  Sylvain's file has been posted to the above link. It is an Excel file. If you have trouble opening it, right-click the "homepage" link and choose "save target as".


Hitting the cutoff man (June 13, 2003)

Discussion Thread

Posted 12:05 p.m., June 18, 2003 (#10) - tangotiger
  Sylvain has issued an update (with graphs!). You can find it at the same link from post #9.


SABR 301 - Rocco Baldelli, sabermetrician (June 16, 2003)

Discussion Thread

Posted 5:10 p.m., June 16, 2003 (#1) - tangotiger
  And try to answer that question WITHOUT referencing a player's K or BB numbers.


SABR 301 - PZR - Blueprint (June 17, 2003)

Discussion Thread

Posted 1:15 p.m., June 19, 2003 (#2) - tangotiger
  I agree that we don't know what the knuckleball advantage is, especially since we've got such a small sample to deal with. It could even be that "good knucklers versus bad knucklers" have a huge gap in $H, while "good flyballers versus bad flyballers" have a smaller gap in $H, etc.

In all cases, we are always comparing an MLB pitcher to other MLB pitchers, who *may* have been specifically selected to play in the majors because they can keep the $H down. Lots of work ahead of us.


SABR 301 - PZR - Blueprint (June 17, 2003)

Discussion Thread

Posted 1:56 p.m., January 13, 2004 (#3) - tangotiger
  I'm bringing this thread forward in conjunction with the True Talent Fielding thread.


SABR 301 - PZR - Blueprint (June 17, 2003)

Discussion Thread

Posted 9:47 a.m., January 14, 2004 (#5) - tangotiger
  I know Tango is now thinking, "What took that idiot (boor) MGL so long to figure this out?"

Not at all. I only criticize your memory!

I'll comment on the rest of your post in a while.


SABR 301 - PZR - Blueprint (June 17, 2003)

Discussion Thread

Posted 10:40 a.m., January 14, 2004 (#6) - tangotiger
  Actually, this PZR thing is getting more complicated than I thought. I'm going to need some time to try to sort things out.

(For those also trying to work on this, there are two layers to consider: things that the pitcher controls and those that he doesn't. We probably also want to consider HR and not just BIP. We don't want to consider the end-result of the play, so that we can ignore the fielder's impact.)


SABR 301 - PZR - Blueprint (June 17, 2003)

Discussion Thread

Posted 12:50 p.m., January 14, 2004 (#8) - tangotiger
  PZR should be calculated independently. But, as a test, they should all add up.

Team UZR does not necessarily apply equally to all the pitchers on the team (if for example you have a great CF and you have a pitcher that rarely allows a ball to that CF, then he won't benefit as much, etc, etc).

The point of PZR is that we should be able to get it independent of the fielders, while, as a test, the fielding + pitching + park should get you the total of defense on contacted balls or BIP (not sure which yet).

The handedness issue is an interesting thought. After all, RJ faces a disproportionate number of RH hitters because he is a LHP.


SABR 301 - PZR - Blueprint (June 17, 2003)

Discussion Thread

Posted 2:10 p.m., January 14, 2004 (#11) - tangotiger
  MGL, you can't just blanket-apply the team UZR to each pitcher. If you have 3 great-fielding OF, and 4 poor-fielding IF, and you have 1 GB pitcher and 1 FB pitcher, you can't give them the same UZR runs / BIP impact.

What if you have a great SS,3B,RF, but poor other fielders?

If you are going to go down the path of adjusting a pitcher's stats by taking the UZR of the fielders, then you'll have to do it one position at a time.

I really have no problem with this, and it should be done.

I'm (trying to) offer a way to do PZR without needing to know about UZR. But, by calculating PZR, UZR and parks, you'll end up with a team's DER.



Sheehan: Pitcher Workloads (June 19, 2003)

Discussion Thread

Posted 3:37 p.m., June 19, 2003 (#7) - tangotiger
  Varb: it would be interesting to see if the spread of hitting talent is tighter across years, and not concentrate so much on the position.

Rob: yes, pure speculation, but a variable to consider.

Ross: there are about 5.4 pitches thrown per BB, 4.8 per K, and 3.3 per BIP. So, if you have a pitcher with lots of (BB+K) / PA, you can bet that that pitcher will have more pitches / PA than league average. Dan Quisenberry I will wager had the lowest pitch / PA count of any pitcher with at least 1000 PA in the last 50 years. (If not lowest, then in the bottom 10). Just for fun, go look at any active pitcher's (BB+K) / PA, and you will see a direct correlation to pitches / PA. MLB has the data from 1998-2003 on their site. Benitez is I think the #1 guy in (BB+K)/PA and pitches/PA (or pretty close).
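The per-event pitch costs above can be turned into a rough pitches/PA estimate for any pitcher's event mix. This is a minimal sketch, not the unpublished estimator mentioned in this thread: it assumes every PA that is not a BB or K costs the BIP figure (so HR and HBP are lumped in with BIP), and the function name is hypothetical.

```python
# League-wide pitch costs per event, as quoted in the post.
PITCHES_PER_BB = 5.4
PITCHES_PER_K = 4.8
PITCHES_PER_BIP = 3.3


def estimated_pitches_per_pa(bb: int, k: int, pa: int) -> float:
    """Blend the per-event costs by a pitcher's BB / K / other mix.

    Assumption: every PA that is not a BB or K is charged the BIP cost,
    so HR and HBP are treated like balls in play.
    """
    if pa <= 0:
        raise ValueError("pa must be positive")
    other = pa - bb - k
    total = PITCHES_PER_BB * bb + PITCHES_PER_K * k + PITCHES_PER_BIP * other
    return total / pa


if __name__ == "__main__":
    # A higher (BB+K)/PA pushes the estimate above the all-BIP floor of 3.3.
    for bb, k in [(5, 10), (10, 20)]:
        print(bb, k, round(estimated_pitches_per_pa(bb, k, 100), 2))
```

Because BB and K both cost more pitches than a BIP, any increase in (BB+K)/PA raises the estimate, which is the correlation the post predicts.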

Is this what you are getting at? Otherwise, please restate the concern.

- higher OBA means more batters / game meaning tougher to get a complete game

I'm not sure why you are talking about the 50s, when the low OBA was probably in 1967/68, but let's say there are 25 batting outs / 27 outs (I realize that a pitcher with few baserunners will be closer to 27/27).

Anyway, if your OBA is .300, that means .3 safe plays per .7 outs, or 10.7 safe plays per 25 batting outs, or 35.7 batters. A .350 OBA works out to 38.5 batters. (Just an example here). So, as the OBA rises, so does the number of batters per 27 outs. With more batters available because of the run environment/playing conditions, the more pitches needed overall.
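The OBA arithmetic above follows from each PA ending in an out with probability (1 - OBA). A minimal sketch, assuming (as the post does) 25 batting outs per game and ignoring any other DP, CS, or baserunning outs; the function name is illustrative.

```python
def batters_per_game(oba: float, batting_outs: float = 25.0) -> float:
    """Expected batters faced to record `batting_outs` outs at the plate.

    Each PA ends in an out with probability (1 - oba), so the expected
    number of PA is batting_outs / (1 - oba).
    """
    if not 0.0 <= oba < 1.0:
        raise ValueError("oba must be in [0, 1)")
    return batting_outs / (1.0 - oba)


if __name__ == "__main__":
    for oba in (0.300, 0.350):
        batters = batters_per_game(oba)
        print(f"OBA {oba:.3f}: {batters:.1f} batters, {batters * oba:.1f} safe plays")
```

For a .300 OBA this gives 35.7 batters (10.7 safe plays), and for .350 it gives 38.5 batters, matching the figures above.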

I agree with your last point that there are more variables at play for pitches thrown across eras. I've got a working model, though I have no data to validate it against. Essentially, it shows Cy Young with, if I remember right, 2.8 pitches / batter, and Nolan Ryan with 3.9 pitches / batter (or somewhere around there). Eventually, I'll present a mathematical proof that shows the relationship between K, BB, BIP and pitches thrown. This is as much as I can say for the moment.


Sheehan: Pitcher Workloads (June 19, 2003)

Discussion Thread

Posted 4:34 p.m., June 19, 2003 (#8) - tangotiger
  This article would have been much more interesting if Joe had gotten a hold of a pitch count estimator like Tango's and actually checked how many pitches were being thrown per start by today's aces versus yesteryear's.

To expand on this point, and tie it in to Ross's point, our estimators are based on "all things being equal". I am reasonably confident that the extended pitch count model (unpublished) that I have works for pitchers from the 1990s to today. I wasn't certain how well it would hold up for 1960s pitchers, and so I was very happy to see how well it matched Koufax.

So, for Koufax and for the 90s pitchers, we can say that the batter/pitcher matchups were pretty equal in terms of "all things". Because it worked on Koufax, will it work on all other pitchers of his era? I would bet yes, but I'm not sure. If it works in the 60s and 90s, should it work in the 70s? I would say yes, but Sheehan brings up a good point that maybe baseball changed the way it was played in the 70s. I can't just discount it, especially since Ryan and Carlton and a few others had kinda high pitch counts. If you look at the Ryan progression, he completely tails off at around age 30. It's kind of remarkable, and I don't know if he was injured or what happened. Maybe the Angels didn't want to wear him out so that he could re-sign with them as a free agent? But again, I would bet that if we really looked at it, baseball was the same from the 50s to today in terms of batter/pitcher matchups, how often balls/strikes were thrown, etc.

But how about earlier? Well, again, the more you go back, the less likely things are the same. You get to the point where you have very few BB+K per PA across the whole league. Those pitcher/batter matchups may not be "all things equal".

To get back to the point, I agree the article would have been enhanced if he used my estimator or someone else's (it seems Nate/BP also has an estimator). However, the article is excellent, and gives a good push for people to tackle some of the issues, most notably, the talent distribution across eras.


Sheehan: Pitcher Workloads (June 19, 2003)

Discussion Thread

Posted 4:58 p.m., June 19, 2003 (#11) - tangotiger
  there are about 5.4 pitches thrown per BB, 4.8 per K, and 3.3 per BIP

Has that been consistent across eras?

It's not even consistent among pitchers of the same era. However, I've got a working model that takes the "type" of pitcher you are, and spits out the pitches / event. It's quite logical, but there's room for error. Because of the way that model works, it's transportable to other eras, assuming that the way batters/pitchers approach each other in terms of throwing/looking for strikes is the same. For this reason, I'm not too crazy about publishing this until I get some data to at least point me to the right answer.


So, if you have a pitcher with lots of (BB+K) / PA, you can bet that that pitcher will have more pitches / PA than league average

Well you can bet on it, but it may not be true. The average pitches thrown for each category may vary widely by pitcher.

It actually does vary widely. But I think Eric's post #6 explained it very well. I won't extend my answer beyond that, and that's where my bet would be. Of course I could be wrong, but that's why betting is fun!

Is this what you are getting at?

No. Although it is part of it. The question is what are the effects of the difference in K's and walks between eras, which is the way Sheehan is using it. It is perfectly possible that 50 years ago more 2 strike counts ended in a ball in play rather than a third strike. That does not change the number of pitches at all.

Sure, it's possible. As I said, we don't know what the strike/ball matchups were for pitcher/batter.


So, as the OBA rises, so does the number of batters per 27 outs.

Assuming that the number of double plays, caught stealings and runners thrown out advancing are all the same. I don't think they are.

I agree, which is why I added my provision. But again, what's the impact here? I'm sure I can come up with an r of over .95 between OBA and PA/27 outs historically (if I had the data back then).

With more batters available because of the run environment/playing conditions, the more pitches needed overall.

Assuming the number of pitches per batter is constant for all pitchers and regardless of the number of batters.

Assuming the same K and BB rates, I'd say yes, you are right about your first part.

I agree with your last point that there are more variables at play for pitches thrown across eras

I think there is a fundamental problem with using average results for comparing elite pitchers. The question is whether Roger Clemens has to face more batters and throw more pitches to accomplish the same things Steve Carlton did. The fact that Sean Bergman gave up a lot of baserunners, or that Bobby Witt struck out a lot of batters and walked a lot of batters, doesn't seem to have much to do with that. And yet when we compare era totals they are included in the analysis. It may be that there were fewer Bobby Witts and Sean Bergmans pitching in the mid-60's, but that doesn't make Steve Carlton's job easier than Clemens'.

I think I understand what you are saying, but then I don't. I agree that you don't necessarily want to use the basic pitch count estimator, as it won't work too well on the extreme pitchers. But then again, that basic estimator was 2% off for Clemens and Koufax compared to their actual totals.


Sheehan: Pitcher Workloads (June 19, 2003)

Discussion Thread

Posted 7:51 p.m., June 19, 2003 (#12) - tangotiger
  Ross, just so I'm not going to go nuts again, here's what I'm going to do. You let me know if this will satisfy the issue.

Select
- all starter seasons
- from 99 to 02
- min 400 PA in that season

Calculate
- seasonal OBP
- seasonal PA / 9IP

Then run a regression of OBP against PA/9IP. I expect to get an r over .90, and more likely over .95.

My contention is that the more runners on base / batter faced, the more batters at the plate / 9ip, regardless of pitcher. This I think is rather obvious, so rather than wasting my time again, you tell me what you want me to run, and what contention you are positing. Thanks...
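The selection and regression above can be sketched as follows. It assumes a hypothetical `StarterSeason` record (not a real data source) with no role field, so the caller is assumed to pass starter seasons only; it approximates OBP as times on base / PA and reports Pearson's r, which for a single predictor is the signed square root of the regression's R^2.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Sequence, Tuple


@dataclass
class StarterSeason:
    # Hypothetical record layout; field names are illustrative only.
    year: int
    pa: int             # batters faced
    times_on_base: int  # H + BB + HBP allowed
    ip: float           # innings as a true decimal (200 2/3 -> 200.667)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def obp_vs_pa_per_9(seasons: Sequence[StarterSeason], first: int = 1999,
                    last: int = 2002, min_pa: int = 400) -> Tuple[float, int]:
    """Apply the selection above, then correlate OBP with PA / 9 IP.

    Returns (r, number of seasons used).
    """
    chosen = [s for s in seasons
              if first <= s.year <= last and s.pa >= min_pa]
    obp = [s.times_on_base / s.pa for s in chosen]  # approximate OBP
    pa_per_9 = [9.0 * s.pa / s.ip for s in chosen]
    return pearson_r(obp, pa_per_9), len(chosen)
```

Since PA / 9 IP behaves roughly like 27 / (1 - OBP) when few outs are made on the bases, a strong positive r is the expected result, not a surprise.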


Sheehan: Pitcher Workloads (June 19, 2003)

Discussion Thread

Posted 11:30 a.m., June 20, 2003 (#17) - tangotiger (homepage)
  So there is no reason to think that a pitcher of the same quality will need to throw more pitches to complete a game today than they did 20 or 30 years ago.

I don't think I was really talking about this, but I agree. The number of pitches a pitcher has to throw is dependent on the style of the pitcher/batter, in which both are dependent on the skillset of the pitcher/batter and the "run environment", which is partially provided by the pitcher/batter in question.

This is neatly evidenced by Bartolo Colon in 2001/2002, who must have changed his pitching style drastically (assuming he maintained the same quality) to produce that change in pitch count and in his walk and K rates.

See above link for his stats. From 2002 to today (about 1400 batters), his pitches/batter was 3.6. In 1999-2001, he averaged 4.0 pitches/batter.

His (K+BB)/PA rate in 1999-2001 was 32%. From 2002 to today, he's at 23%.

Looking at his HR/H:
1999-2001: 12.5%
2002-today: 10.6%

His (BB+H)/PA:
1999-2001: .319
2002-today: .297

So, Bartolo changed his style to the point that batters were no longer going deep in the count, probably because Colon was giving what looked like more "hittable" pitches (though they probably weren't). The net effect is that by not going deep in the count (or by batters not taking Colon deep, who knows), Colon:
- reduced his pitches/batter
- reduced his walk and K rates per batter
and as a side effect
- improved his overall performance level (though not necessarily to a statistically significant degree... I didn't check). That is, maybe Colon's skillset remained the same, but he found a more optimal pitching style to increase his performance level.


Sheehan: Pitcher Workloads (June 19, 2003)

Discussion Thread

Posted 12:22 p.m., June 20, 2003 (#18) - tangotiger
  In 2002, Colon reached 3 balls or 2 strikes in 501 of 966 PA (51.9%).

In 1999-2001, he did so in 41.6% of PA.

Boy, that's not at all what I was expecting. Assuming I didn't make a mistake somewhere, Colon reached 3 balls or 2 strikes MORE often when not concentrating on striking players out. Yet, somehow, he ended up with fewer pitches/batter. I'm not sure whether his 2-strike fouls changed. I can only guess he had more pitcher's counts, so that he reached 2 strikes having thrown fewer balls.

Let's see.... ok, here's how many balls per PA Colon threw in the PA where he got to 2 strikes:
Year   PA with 2 strikes   Balls/PA
2002   455                 1.58
2001   459                 1.77
2000   432                 1.77

Interestingly, he managed to get to 2 strikes while throwing fewer balls, which may explain some of his good performance (more pitchers counts, and fewer pitches to get there).


Koufax Pitch Counts (June 19, 2003)

Discussion Thread

Posted 3:20 p.m., June 19, 2003 (#2) - tangotiger
  Rally: no they are not, but I'll see if I have permission to post them for you.



After Sabre-School Special (June 19, 2003)

Discussion Thread

Posted 3:46 p.m., June 19, 2003 (#2) - tangotiger
  Park: well, that makes a lot more sense. I should have verified what he did. That looks like just about what you and I discussed a year or two ago, about putting the PF at the center of the time period.

28/29: well, .28 is my preference, because I look at extreme teams. .29 is the best fit for all actual teams.


After Sabre-School Special (June 19, 2003)

Discussion Thread

Posted 7:47 p.m., June 19, 2003 (#5) - tangotiger
  I agree with Patriot that you won't see any real difference (max 1 run / 600 PA is my guess), especially since a random team really isn't that different from the average. A team OBA range of .300 to .360 is no big diff, really.


After Sabre-School Special (June 19, 2003)

Discussion Thread

Posted 7:48 p.m., June 19, 2003 (#6) - tangotiger
  I mean no big diff for what's being proposed here.


Actual Pitch Log for Koufax, game-by-game (June 20, 2003)

Discussion Thread

Posted 1:21 p.m., June 20, 2003 (#3) - tangotiger (homepage)
  Keith Woolner passed on the above link to me; I remember reading it, but I'd quickly forgotten it. There's some excellent stuff in there, specifically about how the average pitches / start was the same for the Old Dodgers as for our current pitchers, but the distribution was much different: starters were pulled much, much faster than today in some outings, and kept going longer and longer in others. It seems that managers have all agreed over the last 60 years that 100-120 pitches is what your top starter should average per start, but have not agreed on how that distribution should be reached.


Actual Pitch Log for Koufax, game-by-game (June 20, 2003)

Discussion Thread

Posted 2:41 p.m., June 24, 2003 (#6) - tangotiger
  Well, if you maintain your pitch count level and improve your BB rates, then you'll get to face more batters (which is what happens here). On top of which, he got better, meaning he got more outs/batter, and increased his chances for a CG.

I'm still shocked his overall average is essentially what a 1990s starter would get.



Making Money (June 23, 2003)

Discussion Thread

Posted 8:48 a.m., June 24, 2003 (#2) - tangotiger
  You don't have to look for the article as it is in the link of the title of this thread.

Your quote of mine was me referring to Voros' equation of

Team Revenue = (W% * $430,169,580) + (Metro Population * $3.46) - (Teams in Metro Area * $27,962,685) + (Home Playoff Games * $2,446,043) + (Per Capita Income of Area * $2,655.60) - $160,287,379 + ($22,906,159 if the team’s stadium is less than two years old).
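Voros' equation translates directly into a function, which makes the "all things equal" comparison explicit: hold every non-winning input fixed and vary W% alone. This is a transcription of the coefficients above; the parameter names are mine and purely illustrative.

```python
def team_revenue(w_pct: float, metro_pop: float, teams_in_metro: int,
                 home_playoff_games: int, per_capita_income: float,
                 new_stadium: bool = False) -> float:
    """Voros' team-revenue equation as quoted above, in dollars.

    new_stadium: True if the team's stadium is less than two years old.
    """
    revenue = (w_pct * 430_169_580
               + metro_pop * 3.46
               - teams_in_metro * 27_962_685
               + home_playoff_games * 2_446_043
               + per_capita_income * 2_655.60
               - 160_287_379)
    if new_stadium:
        revenue += 22_906_159
    return revenue


# Holding the non-winning variables fixed, W% alone moves revenue:
# each .010 of W% (about 1.6 wins over 162 games) is worth .010 * $430,169,580.
```

Because every term is additive, the marginal value of W% does not depend on market size; only the playoff term bends the curve for top teams.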

So, the non-winning variables are: population, teams, per capita income. I meant "all things equal" among these three, and let's only focus on the relationship between winning and revenue.

I also said that he has an extra variable for playoffs, which will force a curved relationship for the higher-achieving teams. According to the Voros equation, each home playoff game adds 2.5 million$. If you are an "upper echelon" team, maybe you expect to add an extra 5 million$, of which you'd give 4 million to the players. So, instead of having an 80 million$ payroll to finance an 87-win team, you really have an 84 million$ payroll, or 2 million$ instead of 1.85.

I also didn't say anything about any conspiracies, but rather that there's a fundamental equilibrium point around which the free market would be forced to center.

So, what exactly am I saying that you disagree with?


Making Money (June 23, 2003)

Discussion Thread

Posted 5:06 p.m., June 25, 2003 (#4) - tangotiger
  Steve, thanks for writing back.

I am surprised that the effect on revenue works as it does (additively), rather than as a multiple of the other factors (like population, etc). Makes life easier that way, though... And the playoff impact wasn't as great as you would have thought, which did surprise me.

It's good to remember that Voros' equation is based only on 3 years of data, and so, we would have issues there. It would be worthwhile to re-run for the last 10 years.


Making Money (June 23, 2003)

Discussion Thread

Posted 8:00 p.m., June 26, 2003 (#6) - tangotiger
  I suppose what we want is "disposable net income". NYC residents and workers also pay a city tax.



Reliever Usage Pattern, 1999-2002 (June 24, 2003)

Discussion Thread

Posted 2:39 p.m., June 24, 2003 (#2) - tangotiger
  Well, I'm not so surprised that Percy gets 1/4 of his batters when they basically don't count; I'm surprised that that's the best figure, and that the top relievers get 1/3. That seems a bit high to me. I'm guessing 15-20% would be a better target?

I could understand if say Mariano comes into the 8th in a high-leverage situation and gets out of the jam, and then the Yanks take a big lead, making the 9th inning a mop-up job for Mo (though you could consider taking him out if that happens). But, this is not what usually happens with the current fireman.

Give me the usage pattern of the late 70s to mid 80s.


Reliever Usage Pattern, 1999-2002 (June 24, 2003)

Discussion Thread

Posted 4:25 p.m., June 24, 2003 (#5) - tangotiger
  Rally: I looked at Gossage from 82-86. His line was: 20,9,14,27,31. That's probably what we should shoot for.

Tribe: Yes, you are accurate. Remember what we are NOT measuring. We are NOT measuring the performance level of a pitcher. What we ARE measuring is the level of fire a pitcher finds himself in. If you are John Franco and you have a propensity to get men on base, and then work yourself out of a jam, well, that's a big fire you had to put out, even though you were the arsonist.

As well, the manager is making the call to leave Franco in after every batter. I can't say that "If Franco stays in, the LI is 1, but if Strickland comes in to bail him out, the LI is 2", can I?

To measure performance, you need a "win probability added" type of measure (which I have, and have shown some results in the past). What this does is combine the performance of the pitcher with the leverage of the situation. So, say you are at LI 1.0, and you give up a hit. That brings you to LI 2.0, and you give up a walk. That walk will count as "2 walks". Now you are at 3.0, and you give up a HR. That counts as "3 HR with 2 men on each". At this point, the game may be out of reach, at which point your LI is now 0.3.
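The weighting in the example above (a walk at LI 2.0 counts as "2 walks") can be sketched as an LI-weighted event tally. This covers only that weighting step; a full win probability added measure uses the actual change in win probability on each play. The (li, event) input format is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def leveraged_event_counts(events: Iterable[Tuple[float, str]]) -> Dict[str, float]:
    """Count each event as `li` events, where li is the LI when it happened."""
    totals: Dict[str, float] = defaultdict(float)
    for li, event in events:
        totals[event] += li
    return dict(totals)


if __name__ == "__main__":
    # The sequence from the example: hit at LI 1.0, walk at 2.0, HR at 3.0.
    print(leveraged_event_counts([(1.0, "H"), (2.0, "BB"), (3.0, "HR")]))
    # {'H': 1.0, 'BB': 2.0, 'HR': 3.0}
```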

Leveraged Index is NOT this measure. It's one-half of what you need to measure the performance of a reliever. Eventually, I'll combine the two for a complete measure. For the moment, take my LI and multiply it by Wolverton's ARP. That'll get you close enough.


Reliever Usage Pattern, 1999-2002 (June 24, 2003)

Discussion Thread

Posted 6:37 p.m., June 24, 2003 (#10) - tangotiger
  Nick, maybe you should sell someone at FOX on the idea! I love the story!

Jim, the example is based on figuring out what the win probability is at each situation, and the possible distribution of win probabilities based on each possible outcome. That "variance" gives you the leverage. Check out Phil Birnbaum's article that I posted elsewhere on this STUDIES section of Primer as he gets into it as well. I figured this mathematically, for all possible inning/score/base/out, with the score +/- 18 runs.
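The post describes leverage as the spread of win probabilities across the possible outcomes of a PA. The sketch below measures that spread as the probability-weighted mean absolute change in win probability rather than a literal variance, which is a swapped-in choice. The outcome probabilities and the normalizing constant (`average_swing`, a hypothetical input) would have to come from a full inning/score/base/out win-expectancy table, which is not reproduced here.

```python
from typing import Sequence, Tuple


def raw_leverage(current_wp: float, outcomes: Sequence[Tuple[float, float]]) -> float:
    """Probability-weighted mean absolute swing in win probability.

    outcomes: (probability, win_probability_after) for every possible
    result of the PA; the probabilities should sum to 1.
    """
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-6:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * abs(wp - current_wp) for p, wp in outcomes)


def leverage_index(current_wp: float, outcomes: Sequence[Tuple[float, float]],
                   average_swing: float) -> float:
    # average_swing: raw leverage averaged over all game states, so that
    # the typical PA comes out at LI = 1.0.
    return raw_leverage(current_wp, outcomes) / average_swing
```

A state where outcomes swing win probability twice as much as the average state gets an LI of 2.0.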

As for the cutoffs, they are arbitrary, but I tried to group them so you get 50% in the first grouping, and then about 10-15% in each of the other 4.

FJM: I started to do that a little while ago, with Urbina, Shuey, Stanton, and Benitez. Eventually I'll do it, but I'm just trying to find the best way to present it. My feeling is that I will not find a difference.


Reliever Usage Pattern, 1999-2002 (June 24, 2003)

Discussion Thread

Posted 8:06 p.m., June 24, 2003 (#12) - tangotiger
  I agree, when he was brought in would ALSO be a good metric. Heck, when he was REMOVED would also be a good metric too. Both are a snap to calculate as well. A manager therefore can do 3 things:
1 - when does he put him into the fire, and how big is that fire
2 - how long does he let him stay in the fire, and how big is that fire while he's battling it
3 - when does he take him out, and how big is the fire when he leaves

You can look at 1 and 3 and say that the difference in fire is his performance level, but
a - if he pitches multiple innings, then his offense can help/hurt
b - his fielders always help/hurt

Drinen handles #1/#3 by taking him "out" at the end of the inning, and inserting him back "in" at the start of the next inning. It's a good way to do it.

Like I said, there are a lot of descriptions you can generate using the LI concept. For the moment, I've only done #2. Drinen has done #1 and #3.


Reliever Usage Pattern, 1999-2002 (June 24, 2003)

Discussion Thread

Posted 4:41 p.m., June 25, 2003 (#14) - tangotiger (homepage)
  What might be interesting for this chart is an indication of each reliever's percentage of the team's total reliever innings at each leverage category.

You are absolutely correct. I actually did this for the Yankees (see homepage link), and I should do this for all the teams.


Reliever Usage Pattern, 1999-2002 (June 24, 2003)

Discussion Thread

Posted 5:11 p.m., June 25, 2003 (#16) - tangotiger
  Walt, good idea! I can try to pick out say the "typical" LI for each class.

As for the % of distribution, I really wasn't sure what to expect. 45% for 0-0.5, 15% for 0.5-1, 10% for 1 to 1.5, 8% for 1.5 to 2.0, 7% for 2.0 to 2.5 among our group of relievers here. I guess that's an ok distribution, though I was surprised by the big dropoff from 45% to 15%.
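Tabulating a distribution like the one above means bucketing each PA's LI at the 0.5 cutoffs. A minimal sketch; treating each bucket as lower-inclusive (an LI of exactly 1.0 falls in 1.0-1.5) and adding a 2.5+ bucket for the remainder are assumptions.

```python
from bisect import bisect_right
from typing import Dict, Sequence

# Upper edges of the LI classes used above; anything >= 2.5 goes in the last bucket.
CUTOFFS = [0.5, 1.0, 1.5, 2.0, 2.5]
LABELS = ["0-0.5", "0.5-1.0", "1.0-1.5", "1.5-2.0", "2.0-2.5", "2.5+"]


def li_distribution(li_values: Sequence[float]) -> Dict[str, float]:
    """Share of PA falling in each LI class (lower edge inclusive)."""
    if not li_values:
        raise ValueError("need at least one PA")
    counts = [0] * len(LABELS)
    for li in li_values:
        # bisect_right puts a value equal to a cutoff into the higher bucket.
        counts[bisect_right(CUTOFFS, li)] += 1
    n = len(li_values)
    return {label: c / n for label, c in zip(LABELS, counts)}
```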

Essentially, half the PAs in MLB have very little value.


Reliever Usage Pattern, 1999-2002 (June 24, 2003)

Discussion Thread

Posted 9:04 p.m., June 25, 2003 (#23) - tangotiger
  LI *is* normalized to 1.0, so your equation holds for PA, by definition.

From 1974-1990, the LI for starters was 1.01, and relievers was .98. I haven't checked the 99-02 period, yet.


Reliever Usage Pattern, 1999-2002 (June 24, 2003)

Discussion Thread

Posted 6:37 a.m., June 26, 2003 (#25) - tangotiger
  Doug, your concern was also expressed by post #13, and I addressed part of that in #14.

Btw, the overall LI, by team, varies by about 0.1. I don't think you will find quite the distribution differences that you might be looking for. But the concern is definitely valid.


Reliever Usage Pattern, 1999-2002 (June 24, 2003)

Discussion Thread

Posted 9:40 a.m., June 26, 2003 (#27) - tangotiger
  You know, I spoke too quickly. ARP includes base/out, but not inning/score. So, I can't use LI and multiply it by ARP, even for a basic version.

Yes, all you need to do is calculate the change in WE (win expectancy). This is what Drinen did in the other article I posted (WE before pitcher came in, after he left, and give difference to pitcher. In multi-innings, assume the pitcher left the game, and came back in, so that the change in offense won't affect him). I've actually also done this, and posted the results somewhere.
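The Drinen-style credit described above can be sketched as a sum over per-inning stints. This assumes each stint is given as (WE when he entered or the inning began, WE when he left or the inning ended), measured from the pitching team's point of view; like the current approach described here, it gives all the credit to the pitcher, with nothing split off to the fielders. The function and input format are illustrative.

```python
from typing import Sequence, Tuple


def pitcher_we_credit(stints: Sequence[Tuple[float, float]]) -> float:
    """Total change in win expectancy while the pitcher was on the mound.

    Each stint covers at most one inning, so WE changes caused by his own
    team's offense between innings are never charged or credited to him.
    """
    return sum(we_end - we_start for we_start, we_end in stints)


if __name__ == "__main__":
    # Two innings: +0.05 in the first; his offense then lifts WE from 0.65
    # to 0.70 between innings (not counted); -0.08 in the second.
    credit = pitcher_we_credit([(0.60, 0.65), (0.70, 0.62)])
    print(round(credit, 2))
```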

But, I cannot use LI directly to get into my change in WE.

The issue comes with "crediting" the fielders. Right now, I give it all to the pitcher (which is why I'm not too crazy about publishing my current results). Same with hitters, and I give all the credit to the hitter, and none to the runners.


Reliever Usage Pattern, 1999-2002 (June 24, 2003)

Discussion Thread

Posted 2:21 p.m., June 26, 2003 (#29) - tangotiger
  Jim, check out the other link I added, called "Win Probability Added".

What I intend on doing (eventually) is:
1 - win probability added using inning/score/base/out
2 - win probability added using no context
3 - calculate hitter's LI

Take #3 and multiply by #2. This gives you his win probability, if he were to hit the same regardless of the leveraged situation.

Compare the result of this to #1. The difference is the player's "clutch" performance.

To measure his underlying clutch skill, you'd check year-to-year correlation.
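The three steps and the persistence check above can be sketched as follows. The inputs (WPA in context, context-neutral WPA, and the hitter's average LI) are assumed to be computed elsewhere; the function names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def clutch(wpa_in_context: float, wpa_context_neutral: float, hitter_li: float) -> float:
    """Step 1 minus (step 3 x step 2): WPA beyond what his LI predicts."""
    expected = hitter_li * wpa_context_neutral
    return wpa_in_context - expected


def year_to_year_r(year1: Sequence[float], year2: Sequence[float]) -> float:
    """Pearson r between the same players' clutch scores in consecutive years."""
    n = len(year1)
    m1, m2 = sum(year1) / n, sum(year2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(year1, year2))
    v1 = sum((a - m1) ** 2 for a in year1)
    v2 = sum((b - m2) ** 2 for b in year2)
    return cov / sqrt(v1 * v2)
```

The two lists passed to `year_to_year_r` must be aligned player by player; an r near zero would point to little persistent clutch skill.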

Now, if you read The Hidden Game, I believe they did this kind of thing with the Mills brothers' PWA (2 years only). We're in a position to do it with 20+ years of data.

Not now though.


Reliever Usage Pattern, 1999-2002 (June 24, 2003)

Discussion Thread

Posted 11:02 p.m., June 26, 2003 (#30) - tangotiger
  I wrote this on Jan 2, in one of the Primer articles, and it bears repeating:
==========================
What we are after is *not* to maximize a pitcher's LI, but rather to maximize their leveraged-innings (LI x IP). LI of 1.00 with 120 IP will have the same win impact as 1.50 LI with 80 IP to a reliever. Of course, it's not that simple, as you have to take the totality of your starters and relievers, and maximize the leveraged innings for the good pitchers, and minimize the leveraged innings for the bad pitchers, such that all innings are accounted for. You have other constraints as well, with respect to the tiredness of a pitcher's arm, etc.

Mark Eichhorn, for example, had 200 leveraged innings (LI of about 1.3) in his great year. That is an excellent total.
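The leveraged-innings product from the quote is simple to compute, and summing it across a staff is the bookkeeping half of the allocation problem described there. This sketch does not attempt the actual optimization or the arm-fatigue constraints; the function names and the (name, li, ip) tuple format are hypothetical.

```python
from typing import Iterable, Tuple


def leveraged_innings(li: float, ip: float) -> float:
    """Innings pitched scaled by the average LI they were pitched at."""
    return li * ip


def staff_leveraged_innings(staff: Iterable[Tuple[str, float, float]]) -> float:
    """Total leveraged innings for (name, li, ip) entries on one staff."""
    return sum(leveraged_innings(li, ip) for _, li, ip in staff)


if __name__ == "__main__":
    # The example above: LI 1.00 over 120 IP equals LI 1.50 over 80 IP.
    print(leveraged_innings(1.00, 120), leveraged_innings(1.50, 80))
```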


Reliever Usage Pattern, 1999-2002 (June 24, 2003)

Discussion Thread

Posted 1:48 p.m., June 27, 2003 (#32) - tangotiger
  That would be good to know, but I'm not sure what you mean about "adjusting".

LI is a reflection of the game state. It is dependent only on the inning/score/base/out.

Certainly, if Bonds were at plate, the LI would be different. Certainly, if Clemens is throwing 160 pitches already, the LI would be different. But, I'm trying to keep the player's identity out of the picture. (For this purpose anyway.)


Reliever Usage Pattern, 1999-2002 (June 24, 2003)

Discussion Thread

Posted 4:42 p.m., July 11, 2003 (#34) - tangotiger (homepage)
  Phil,

See above link for a discussion on wearing out the pitcher. There's also a cool chart by MGL in post #25 I think it was. I'd ignore the back/forth between Steve and Ross... they no like each other.

As for your other comment, I think this definitely has value. I've been meaning to break down the performance of all relievers by classes of leverage. It would be interesting to see at the team level this difference, as you propose. Something tells me that we shouldn't find much difference.



2003 Win Shares, updated (June 24, 2003)

Discussion Thread

Posted 12:31 p.m., June 25, 2003 (#5) - tangotiger
  The highest single-season LI that I have found was 2.3. No way is any pitcher getting an average of 3.

10-15% of all batters for a team occur with an LI over 2.5. That translates to about 60 innings. I think the single-season record for a single pitcher is probably the equivalent of about 40 innings. The current relievers have the equivalent of 25 innings of LI over 2.5.

So, if you've only got 25 innings with an LI over 2.5 for the season, you've got 50-60 innings with an LI under 2.5.


2003 Win Shares, updated (June 24, 2003)

Discussion Thread

Posted 3:54 p.m., June 25, 2003 (#7) - tangotiger
  Oh, I wasn't trying to say whether WS was valuing Gagne correctly or not. Gagne's LI last year was 1.8, so if he has a 2 this year, I can accept that.

In the case of Gagne and Nomo, if you have something like
Gagne: 40 IP, 1.50 "component" ERA
Nomo: 120 IP, 2.50 "component" ERA

our LI comes in and treats Gagne's 40 IP as 80 (an LI of 2).

The question then is if you want a 1.50 ERA in 80 IP, or a 2.5 ERA in 120 IP. Compared to say an average of 4.00, that makes Gagne worth +22 runs and Nomo is +20 runs. Compared to a replacement of 5.00, then Gagne is worth +31 runs, and Nomo is +33.
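The run figures above come from scaling innings by LI and comparing each "component" ERA, treated here as runs per 9 innings, to a baseline. A minimal sketch; the LI of 2 for Gagne is the one implied by the example, and the function name is illustrative.

```python
def runs_above_baseline(era: float, ip: float, baseline_era: float,
                        li: float = 1.0) -> float:
    """Runs saved relative to a baseline ERA over LI-weighted innings."""
    leveraged_ip = ip * li
    return (baseline_era - era) * leveraged_ip / 9.0


if __name__ == "__main__":
    for baseline in (4.00, 5.00):
        gagne = runs_above_baseline(1.50, 40, baseline, li=2.0)
        nomo = runs_above_baseline(2.50, 120, baseline)
        print(f"baseline {baseline:.2f}: Gagne {gagne:+.0f}, Nomo {nomo:+.0f}")
```

This prints +22 / +20 against the 4.00 baseline and +31 / +33 against 5.00, matching the figures above: the lower the baseline, the more the extra innings favor Nomo.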

I see no problem with a reliever having 1/3 the innings and still being worth the same, if he is more effective.


Redefining Replacement Level (June 26, 2003)

Discussion Thread

Posted 10:31 a.m., June 26, 2003 (#2) - tangotiger
  Patriot, good stuff! I had forgotten about that equation. (I think I was looking for something where it would cap at a ratio of 1, and never got around to it.)

The other thing that we should remember is that Nate groups them by classes, when really, we want the aggregate to that point. For example, if you have a 10-yr time frame, you don't want the guy with 5000 AB, but *all* the guys up to the guy with 5000 AB. I'm not sure if I'm saying it correctly.

From that standpoint, I would guess that you might get similar results (as this would slow down Nate's curve) using completely different approaches, which as you say, is a great thing.

As well, we have to remember about "chaining", which is a concept that Patriot I think introduced. This is easier to think about with hockey, where you have 6 defensemen, each getting ice time based on their (perceived) talent. When the #1 guy goes down, the other 5 get more ice time, and the 7th defenseman comes in to play with limited ice time. The #1 guy was replaced by a combination of the other 5 and this 7th guy. When the #6 guy goes down, the #7 slides right into place. The "replacement level" is higher for the #1 guy than the #6 guy, because of the chaining effect. In baseball, it's a little different because the positions are not so interchangeable, and the talent depth is not so even at each position/team.


Redefining Replacement Level (June 26, 2003)

Discussion Thread

Posted 4:40 p.m., June 26, 2003 (#6) - tangotiger
  Using a running total of Nate's numbers, I'll add a 3rd column to Patriot's chart

Years   Repl-Nate   Repl-Tango   Running-Nate/Tango
1       .76         .81          .80
2       .84         .86          .85
3       .88         .88          .88
4       .92         .90          .91
5       .94         .92          .94
6       .96         .93          .95

and it caps out at 1.00 at 11 years.


Redefining Replacement Level (June 26, 2003)

Discussion Thread

Posted 1:45 p.m., June 27, 2003 (#13) - tangotiger
  A nice quick simple function would be

repl rate = 1 - 1 / (2n + 4)
where n = years

You get the following

years, repl rate
0.001 0.75
0.25 0.78
0.5 0.80
1 0.83
2 0.88
3 0.90
4 0.92
5 0.93
6 0.94
7 0.94
8 0.95
9 0.95
10 0.96
11 0.96
12 0.96
13 0.97
14 0.97
15 0.97
16 0.97
17 0.97
18 0.98
19 0.98
20 0.98
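The formula above is easy to tabulate. This sketch reproduces the table's values (rounded to two decimals) and makes the shape explicit: the rate starts at 0.75 for a career of essentially zero length and approaches, but never reaches, 1.0. The function name is illustrative.

```python
def repl_rate(years: float) -> float:
    """Replacement level, as a fraction of league average, for `years` of playing time."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return 1.0 - 1.0 / (2.0 * years + 4.0)


if __name__ == "__main__":
    for n in (0.25, 0.5, 1, 2, 3, 5, 10, 20):
        print(f"{n:>5} {repl_rate(n):.2f}")
```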


Redefining Replacement Level (June 26, 2003)

Discussion Thread

Posted 6:25 p.m., June 27, 2003 (#17) - tangotiger
  I'm not sure why we need a "general" evaluation system.

A player does add value PA by PA. From that standpoint, then, the comparison level is always the average player, since that is the environment he finds himself in: average teammates, average opponents, average pitchers, average park, average everything. His value is therefore derived against the average player. The average player is worth "0" relative to average. The "0" win value corresponds to the average salary of, say, 4 million$ for a regular.

The problem is that you also have a time component. The more you play, the more you can keep somebody else out of a job (the bad guy, hopefully). This is worth positive dollars. And the longer you play, the longer you keep someone out of a job. But if you are a .400 player playing for 15 years, then you kept a GOOD player out of a job. That creates negative absolute value.

Anyone above the sliding time-dependent replacement level scale creates absolute positive value, even if he has negative relative value. If you drop BELOW the replacement level, then you've created negative ABSOLUTE value.

If we insist on one scale as a general evaluation system, I'd say to use the .450 scale or the 90% scale, as this corresponds roughly to a player's 3- or 4-year career. But I really like the time-dependent scale.


Redefining Replacement Level (June 26, 2003)

Discussion Thread

Posted 11:05 p.m., June 27, 2003 (#22) - tangotiger
  I agree with Patriot that the starting point must be .500. You can argue that the final point should be something else like .450 or .400 or whatever.

But, the run values of all events, the win values of all events always presumes the actual context being played in. And the average actual context is average.

The reason that the HR has a 1.4 run value is because, given average teammates to get on base, average pitchers to hit against, average fielders, average park, average everything, we expect the HR to add, on average, 1.4 runs to the game.

So, .500 must be the starting point in actual game conditions.

Now, the next step is: "Well, how would the bottom of the barrel have done, given that average environment?" And, for one PA, you can argue that the bottom of the barrel is 80% of league average.

But the argument here is that, over his career, the average 3-yr hitter is at 90% of league average. THAT's the guy you have to beat. It's not the 2-month guy who gets the emergency callup and has no business being in the majors. It's that 3-yr guy who's your bottom of the barrel. Anyone below that is keeping you from finding that 3-yr guy.

It's like if you have a company, and you suddenly get a lot of work, and you need someone, anyone. So you hire some schlub. But if you keep this schlub for 3 years, you know what happens? You start to lose business. You actually lose absolute dollars.

I don't see why the "emergency" baseline is the baseline we need to compare against all the time. Use that to compare against for an "emergency" time period (1 month). Use a different baseline for different needs.

I feel another 108 post thread coming...


Redefining Replacement Level (June 26, 2003)

Discussion Thread

Posted 1:01 p.m., June 28, 2003 (#26) - tangotiger
  I agree with Patriot's good explanation.

I also want to talk about "salaries".

Salaries have 2 components: value relative to average, and playing time. In terms of marginal dollars, each marginal win is worth about $2 million in salary. So an average team (81 wins) will pay the average payroll ($70 million). If you are a player who is +1 win over average, you should get $2 million more than average... but what average?

That's where playing time comes in. If the average regular gets $4 million, then this player is worth $5 million. If this guy did +1 as a backup, then maybe the average backup gets $2 million, and this guy is worth $3 million.

Enter replacement level. This guy, who is +1 over average, might be +1.5 over replacement as a backup (while as a regular he might be worth +3 over replacement).

Multiply by $2 million/win, and you get his salary worth ($3 million or $6 million, depending on the playing time).

This is just another dimension to replacement level, and one that really relies on the replacement level being about 80% all the time. On a one-year salary basis, I think this makes sense.
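The salary arithmetic above boils down to one multiplication. Here is a minimal Python sketch, assuming the post's rough $2 million per marginal win; the function name is hypothetical, and the inputs are the post's +1.5 (backup) and +3 (regular) wins over replacement.

```python
# Rough rule of thumb from the post: about $2 million per marginal win.
DOLLARS_PER_WIN = 2_000_000


def salary_worth(wins_over_replacement: float,
                 dollars_per_win: float = DOLLARS_PER_WIN) -> float:
    """One-year salary worth = wins over replacement x marginal $/win."""
    return wins_over_replacement * dollars_per_win


# The +1-over-average player from the post, in two playing-time roles.
as_backup = salary_worth(1.5)   # +1.5 wins over replacement as a backup
as_regular = salary_worth(3.0)  # +3 wins over replacement as a regular
print(f"backup: ${as_backup:,.0f}  regular: ${as_regular:,.0f}")
# prints: backup: $3,000,000  regular: $6,000,000
```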

I'll be gone for a week, so have fun!


SABR 301- Win Probability Added (June 26, 2003)

Discussion Thread

Posted 7:12 a.m., July 2, 2003 (#4) - tangotiger
  I've only got a sec to reply to the first post.

No, I do not yet adjust for park or league or any other environmental condition (opposing pitcher or batter, etc). But I should.

To measure a reliever's past effectiveness, you need some form of WPA, meaning inning/score/base/out at least, plus park/league preferably, and batter/fielders, too.

Wolverton does only base/out, which is also pretty good. For the moment, your best bet is to stick with Wolverton.

As for using LI, the thing is that it assumes the pitcher is equally effective in all base/out states, which is one reason you can't simply multiply it by Wolverton's number. The other is that Wolverton already "LI's" the base/out, in effect. Therefore, I'd have to give you an LI based only on inning/score, for you to multiply by Wolverton's number.

Your other option is to evaluate a reliever simply by using "peripheral ERA", and then multiply by LI. That might get you part of the way there.
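A minimal Python sketch of that "peripheral ERA times LI" shortcut. It assumes runs saved are measured against a league-average ERA over the reliever's innings (ERA standing in for all runs allowed); the function name and the sample reliever (3.00 peripheral ERA, 4.50 league ERA, 90 IP, LI of 1.7) are hypothetical illustrations, not figures from the post.

```python
def leveraged_runs_saved(peripheral_era: float, league_era: float,
                         innings: float, leverage_index: float) -> float:
    """Rough reliever value: runs saved vs. league average, scaled by LI.

    Runs saved = (league ERA - peripheral ERA) * IP / 9. Using ERA for all
    runs allowed is a simplifying assumption.
    """
    runs_saved = (league_era - peripheral_era) * innings / 9.0
    return runs_saved * leverage_index


# Hypothetical reliever: 3.00 peripheral ERA, 4.50 league ERA, 90 IP, LI 1.7.
print(round(leveraged_runs_saved(3.00, 4.50, 90, 1.7), 1))  # 25.5
```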



Bonderman and Age 20 (June 26, 2003)

Discussion Thread

Posted 12:36 p.m., June 27, 2003 (#3) - tangotiger
  I would also propose that you use a "pitch-count estimator" instead of innings. Or, at the very least, an estimate of BFP.
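If innings are all you have, batters faced can be roughed out from the pitching line. A minimal Python sketch of a common approximation (not a formula from the post): outs recorded, taken as 3 x IP, plus hits, walks, and hit batters. It ignores errors, double plays, and outs made on the bases, and the inputs in the usage line are hypothetical.

```python
def estimate_bfp(innings: float, hits: int, walks: int, hbp: int = 0) -> float:
    """Approximate batters faced as outs recorded plus batters who reached base.

    `innings` must be a true decimal (6 1/3 innings = 6.333...), not box-score
    notation such as 6.1. Errors, double plays, and baserunning outs are ignored.
    """
    outs = 3 * innings
    return outs + hits + walks + hbp


# Hypothetical line: 100 IP, 90 H, 30 BB, 2 HBP.
print(estimate_bfp(100, 90, 30, 2))  # 422
```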


Estimating Pitch Counts (July 2, 2003)

Discussion Thread

Posted 10:12 p.m., July 2, 2003 (#5) - tangotiger
  bob, you are right, and I'm wrong. He's essentially giving out 2.8 pitches / non-K out (which, as we know, is too low). The line should go through zero, though.


UZR inter-positional linear correlations (July 6, 2003)

Discussion Thread

Posted 10:23 a.m., July 6, 2003 (#1) - tangotiger
  I think another good test is to look at players who switch teams, and see how their UZR changes. I.e., Ventura next to Jeter and next to Reyrey, etc. Put all of the 3B UZR with good SS UZR in one pile, and the same 3B UZR with bad SS UZR in another pile, and see if there's any relationship there.


UZR inter-positional linear correlations (July 6, 2003)

Discussion Thread

Posted 12:24 p.m., July 8, 2003 (#7) - tangotiger
  Mike,

That's very interesting! I looked at all players who played for multiple teams, and tracked their UZR/162 (weighted by the lesser of their games), to see if there was anything there for all players.

The biggest discrepancy was with Tor, followed by Bos, Det, Cin, ChA. On the flip side were the teams where the "advantage" went the other way: Min, LA, Ari, Tex, Bal.

This might suggest that the park factors employed by MGL are not good enough. Looking at Cincy specifically, there were 979 matched games.

Here's the breakdown

team  other team  diff  G
CIN MIL 65 37
CIN COL 51 143
CIN MIN 36 106
CIN TEX 32 48
CIN SEA 30 281
CIN KCA 2 61
CIN PIT 0 115
CIN SDN -6 48
CIN TBA -11 140


In almost all cases, a player playing for Cincy had a higher UZR than playing for another team.
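As a quick check on the Cincy table, the games-weighted average difference can be computed directly. A minimal Python sketch, assuming the G column is the weight (the post says each pair was weighted by the lesser of the player's games with the two teams). The data are copied from the table above, which uses the 30-game cutoff, so the result need not match the no-cutoff report further down.

```python
# (other team, diff in UZR/162, matched games) for Cincinnati, from the table above.
CIN_PAIRS = [
    ("MIL", 65, 37), ("COL", 51, 143), ("MIN", 36, 106), ("TEX", 32, 48),
    ("SEA", 30, 281), ("KCA", 2, 61), ("PIT", 0, 115), ("SDN", -6, 48),
    ("TBA", -11, 140),
]


def weighted_diff(pairs):
    """Return (games-weighted mean diff, total matched games)."""
    total_games = sum(games for _, _, games in pairs)
    weighted_sum = sum(diff * games for _, diff, games in pairs)
    return weighted_sum / total_games, total_games


avg, games = weighted_diff(CIN_PAIRS)
print(games, round(avg, 1))  # 979 22.2
```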

Here's the report from Texas:

team  other team  diff  G
TEX COL 25 145
TEX CLE 4 53
TEX KCA -2 100
TEX OAK -14 47
TEX SEA -15 382
TEX CHA -24 217
TEX ANA -25 47
TEX NYA -31 42
TEX CIN -32 48
TEX SDN -38 48
TEX DET -98 145


That's 1274 matched games. Here is the matchup on the last line, TEX/DET, which makes up 145 games:

Juan Gone went from +15 in Det to -7 in Tex, over 53 games in RF. Kapler went from +55 in Det to -21 in Tex in CF.

As for why this might be, it could be the park or the pitchers (or something wrong with MGL's programs). That is, MGL has tried as best he could to isolate the park and pitchers, etc., so that all we are left with is the fielder's performance. It could be that the "batted ball velocity" isn't doing the job, and that maybe Texas pitchers suck so bad that it's not being measured properly in the data. Or that Texas is such a hard place to field in that we can't adjust for it well enough.

Cincy may have the opposite problem: maybe their pitchers are much better at controlling balls in play than the league, or their park is much easier to play in (and the park adjustment does not reflect that properly).

Note: I chose a cutoff point of at least 30 games played with both teams in the above analysis.

Here is the full report using all players, with no min cutoff:

team diff G
TOR 8 1861
DET 7 2046
ANA 7 1026
CHA 6 1333
NYA 6 2200
CIN 5 2647
PHI 4 1044
CLE 4 2012
SLN 3 1714
BOS 3 1832
TBA 2 2232
SEA 2 3751
HOU 1 1765
CHN 1 3468
MIL 1 2674
SFN 0 1685
FLO -1 1322
NYN -2 2912
SDN -2 3193
MON -2 1856
ATL -3 2501
LAN -3 1842
PIT -3 1261
COL -4 3005
OAK -4 1886
KCA -4 2072
BAL -5 1808
TEX -6 3548
ARI -8 1271
MIN -13 727


Essentially, this suggests that Tor fielders are overcompensated by 8 runs in UZR (if their opposing teams are league average... I'd have to adjust for this as well, since maybe the other teams that the Tor players played for were +4 or something... this is similar to a strength-of-schedule analysis).


UZR inter-positional linear correlations (July 6, 2003)

Discussion Thread

Posted 12:26 p.m., July 8, 2003 (#8) - tangotiger
  The "diff" is difference in UZR/162.


UZR inter-positional linear correlations (July 6, 2003)

Discussion Thread

Posted 12:38 p.m., July 8, 2003 (#9) - tangotiger
  Ughhh.. "opposing teams" is really "the non-Toronto team that the Toronto player has also played for at the same position".... replace "Toronto" with whatever team you want.


UZR inter-positional linear correlations (July 6, 2003)

Discussion Thread

Posted 10:17 a.m., July 9, 2003 (#10) - tangotiger
  I re-ran the above to include the standard deviation. I assumed a 70% success rate (p), and a sample number of plays = G x 4, with each play worth .8 runs. Taking Toronto as an example, 1 SD (for plays) is SQRT(.3 x .7 / (1861 x 4)). Multiplying that figure by .8 gives us the SD (in runs) per play. To convert this into a /162 GP figure, I multiply that value by 162 x 4. So, for Tor, 1 SD (in runs per 162 GP) is 2.8, and their diff is 2.91 SD from the mean.
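The standard-deviation calculation above, as a small Python sketch. The 70% success rate, .8 runs per play, and 4 plays per game are the post's stated assumptions; the function names are hypothetical. The per-play helper also reproduces the .006 runs/play figure used later for a 670-game, 5-plays-per-game sample.

```python
import math

P_SUCCESS = 0.70      # assumed success rate per play (from the post)
RUNS_PER_PLAY = 0.8   # assumed run value of a play (from the post)


def sd_runs_per_play(n_plays: int) -> float:
    """Binomial SD of the success rate over n plays, converted to runs per play."""
    return math.sqrt(P_SUCCESS * (1 - P_SUCCESS) / n_plays) * RUNS_PER_PLAY


def sd_runs_per_162(games: int, plays_per_game: int = 4) -> float:
    """One SD expressed in runs per 162 games (162 x plays_per_game plays)."""
    return sd_runs_per_play(games * plays_per_game) * 162 * plays_per_game


tor_sd = sd_runs_per_162(1861)
print(round(tor_sd, 1))       # 2.8
print(round(8 / tor_sd, 2))   # 2.91, using the rounded +8 diff
```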

Now, how did all teams do? Only 10 of 30 were within 1 SD, when we would have expected about 20. 23 of 30 were within 2 SD, and all but Texas (3.01) were within 3 SD. So, I do think there is some park bias going on.

Here's the data

team  diff     G  1SD  diff/1SD
TOR      8  1861  2.8      2.91
DET      7  2046  2.6      2.67
NYA      6  2200  2.5      2.37
CIN      5  2647  2.3      2.17
ANA      7  1026  3.7      1.89
CHA      6  1333  3.3      1.84
CLE      4  2012  2.6      1.51
PHI      4  1044  3.7      1.09
BOS      3  1832  2.8      1.08
SLN      3  1714  2.9      1.05
SEA      2  3751  1.9      1.03
TBA      2  2232  2.5      0.80
CHN      1  3468  2.0      0.50
MIL      1  2674  2.3      0.44
HOU      1  1765  2.8      0.35
SFN      0  1685  2.9      0.00
FLO     -1  1322  3.3     -0.31
MON     -2  1856  2.8     -0.73
PIT     -3  1261  3.3     -0.90
NYN     -2  2912  2.2     -0.91
SDN     -2  3193  2.1     -0.95
LAN     -3  1842  2.8     -1.08
ATL     -3  2501  2.4     -1.26
OAK     -4  1886  2.7     -1.46
KCA     -4  2072  2.6     -1.53
BAL     -5  1808  2.8     -1.79
COL     -4  3005  2.2     -1.85
ARI     -8  1271  3.3     -2.40
MIN    -13   727  4.4     -2.95
TEX     -6  3548  2.0     -3.01
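The "within 1/2/3 SD" counts can be checked against what a normal distribution would predict. A minimal Python sketch: the z-scores are the last column of the table above (SFN's blank taken as 0), and the expected counts use the standard normal probability erf(k/√2) of landing within k SD. Texas, at -3.01, falls just outside 3 SD.

```python
import math

# diff / 1SD for each team, from the table above (SFN's blank taken as 0).
Z_SCORES = {
    "TOR": 2.91, "DET": 2.67, "NYA": 2.37, "CIN": 2.17, "ANA": 1.89,
    "CHA": 1.84, "CLE": 1.51, "PHI": 1.09, "BOS": 1.08, "SLN": 1.05,
    "SEA": 1.03, "TBA": 0.80, "CHN": 0.50, "MIL": 0.44, "HOU": 0.35,
    "SFN": 0.00, "FLO": -0.31, "MON": -0.73, "PIT": -0.90, "NYN": -0.91,
    "SDN": -0.95, "LAN": -1.08, "ATL": -1.26, "OAK": -1.46, "KCA": -1.53,
    "BAL": -1.79, "COL": -1.85, "ARI": -2.40, "MIN": -2.95, "TEX": -3.01,
}


def observed_within(k: float) -> int:
    """Number of teams whose |z| is strictly less than k."""
    return sum(1 for z in Z_SCORES.values() if abs(z) < k)


def expected_within(k: float) -> float:
    """Expected count if the z-scores were standard normal."""
    return len(Z_SCORES) * math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: observed {observed_within(k)}, "
          f"expected {expected_within(k):.1f}")
# within 1 SD: observed 10, expected 20.5
# within 2 SD: observed 23, expected 28.6
# within 3 SD: observed 29, expected 29.9
```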



What value firemen? (July 6, 2003)

Discussion Thread

Posted 7:14 a.m., July 7, 2003 (#3) - tangotiger
  According to the Voros link I put up last week, the "playoff factor" will increase the marginal $/win to $2 million or $2.15 million or something per win. It's certainly possible that his study would be improved greatly by increasing the sample size, and that maybe we'll find $1.5 million/win for the average team, and $2.5 million/win for teams in the hunt.


What value firemen? (July 6, 2003)

Discussion Thread

Posted 9:29 a.m., July 7, 2003 (#5) - tangotiger
  Since the media has mangled his original definition of how to use a bullpen, I wouldn't be surprised if they've mangled anything else he has said on the subject.

All I can say is that Rollie Fingers, Bruce Sutter, and Goose Gossage were fine and did not suffer any psychological depression or anxiety attacks.


What value firemen? (July 6, 2003)

Discussion Thread

Posted 12:39 p.m., July 7, 2003 (#7) - tangotiger
  I agree that the biggest difference is quantity. As I showed in an earlier article, the leverage situation of Gossage and Sutter in the 9th was pretty much what Percy and Hoffman have been getting. The difference is that Goss/Sutter have received equally high-leveraged situations in the 8th and even 7th, innings that Percy and Hoff don't see.


What value firemen? (July 6, 2003)

Discussion Thread

Posted 8:56 p.m., July 7, 2003 (#10) - tangotiger
  Vinay, that 80/87 thing is very interesting! What's the breakdown for, say, Pettitte?

jto: An "optimal" LI would be somewhere above 2.0. So, if the average RPW converter is 10 to 11, then a leveraged RPW converter would be about 5 for an optimally used top reliever. However, a top reliever gets an LI of about 1.7 for the most part, giving him an RPW of over 6.
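The arithmetic in that reply divides the average runs-per-win converter by the reliever's LI. A tiny Python sketch of that division, using the midpoint of the post's "10 to 11" and its LI figures of 2.0 and 1.7; the function name is hypothetical.

```python
def leveraged_rpw(average_rpw: float, leverage_index: float) -> float:
    """Runs needed per win in a reliever's innings: average RPW scaled down by LI."""
    return average_rpw / leverage_index


AVERAGE_RPW = 10.5  # midpoint of the post's "10 to 11"

for li in (2.0, 1.7):
    print(f"LI {li}: {leveraged_rpw(AVERAGE_RPW, li):.2f} runs per win")
# LI 2.0: 5.25 runs per win
# LI 1.7: 6.18 runs per win
```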


What value firemen? (July 6, 2003)

Discussion Thread

Posted 11:12 a.m., July 9, 2003 (#12) - tangotiger
  The causal relationships would be the following
1 - more talent+experience leads to more payroll
2 - more talent leads to more wins
3 - more wins leads to more revenue

So, if we take say the Yanks or Redsox, they do #1 (get good players), and they pay for it. At the same time, those players will do #2, and generate wins. #2 leads into #3, causing more revenue.

Combining #2 and #3, the causal relationships become:
1 - more talent+experience leads to more payroll
2 - more talent leads to more wins which leads to more revenue

With #2, we know that when the talent adds 1 win, it adds $2.65 MM in revenue (of which, say, $1.85 MM will be redirected to the players).

If a team spends more initially, you are talking about #1, which is somewhat related (though not directly causally) to #2.

I don't see how a team having a higher base of revenue, say like the Redsox, would be able to spend their disposable income any wiser than the Blue Jays.


What value firemen? (July 6, 2003)

Discussion Thread

Posted 3:12 p.m., July 9, 2003 (#14) - tangotiger
  Ah, gotcha.

Sure, I can buy that there may be other variables. I, for one, would have expected the increase not to be linear, but tied to the "fan base". That is, I would have expected that, with more people to draw from, the Redsox revenue increase per win would be proportional to their fan base, rather than as linear as every other team's. It seems that the only non-linearity (or at least the only pronounced non-linearity) is tied to the playoff possibilities.

Until other variables are identified, the marginal wins x 2MM rule-of-thumb seems to be pretty good.



UZR, multiple positions (July 7, 2003)

Discussion Thread

Posted 9:10 p.m., July 7, 2003 (#4) - tangotiger
  ADP, your comments about high-school SS, or other sports (say, an avg NFL QB vs. an avg NFL OT), are bang-on, and something that I've been talking about at fanhome for the longest time.

It's a given that the avg HS SS is better than the avg HS player at almost any other position (except for probably pitcher). So, if we were to expand MLB to 3000 teams, I think we can say the same thing. What if we only had 6 MLB teams? (The NHL had 6 teams for the longest time.) Well, I think we can see how maybe the avg LF or the avg SS in MLB might be better than at other positions. So, somewhere between 6 and 3000 teams, there's a balance where the avg player at each position is equivalent.

There's no reason that balance has to be at 26 or 30 teams. And really, we shouldn't even look for it. We also shouldn't expect that if you do find that the balance should exist for 30 teams that this should exist EVERY YEAR. Sometimes, the avg SS overall is better than average, or the avg 1B overall is better. Why balance to every year?

I think we can maybe accept that over a 25-year period there would be a balance, but that's only to make our life easier: the argument being that trying to balance it to the last 2 or 3 runs might not be worth our while. And maybe that's true. But using a 1-year adjustment is just plain wrong.


UZR, multiple positions (July 7, 2003)

Discussion Thread

Posted 9:13 a.m., July 8, 2003 (#6) - tangotiger
  David, you are probably right that the avg 3B is better than the avg player overall (since I believe that, as a hitter, he is lg average).

I don't buy the "someone has to play that position" argument, because you can make that argument about HS baseball too, or the NFL. The avg QB gets paid far more than the avg player at other positions. The avg HS SS is far better than the avg HS player at most other positions. Every team needs a QB or SS.

This is useful not only for MVP discussion, but for how much to pay the player. The avg 3B, if the above analysis is correct (above average fielder, average hitter) should be paid more than the average player overall.


UZR, multiple positions (July 7, 2003)

Discussion Thread

Posted 10:33 a.m., July 8, 2003 (#7) - tangotiger
  I have revised my process. Please see article above, and page down to the "Revised" section.


UZR, multiple positions (July 7, 2003)

Discussion Thread

Posted 3:08 p.m., July 8, 2003 (#8) - tangotiger
  I revised the article again. In addition to the "Revised" section, look for the "Practical Application" section.


UZR, multiple positions (July 7, 2003)

Discussion Thread

Posted 10:29 a.m., July 9, 2003 (#10) - tangotiger
  Well, it is only based on 4 years of data, so I'm only generally confident. If you want to bump some position up or down .003 runs / play (about 2 runs / 162 GP), I wouldn't disagree with you.

I'm a little annoyed about the LF-RF thing not being closer. But since LF-RF is the largest matched pair of all the matchups, it's the one we should have the most confidence in. As well, position-wise, there's really very little difference in playing LF or RF (though I suppose if you wanted to add a trait like "what's his UZR against LH/RH while in LF/RF?", that would be a good thing).

Generally speaking, the values I presented are in-line with the fielding spectrum.

As for the Yankees comment, putting Bernie in LF and Matsui in CF should be done immediately. Switching, say, Jeter with Ventura is probably a net loss, knowing the traits of these 2 players specifically. Generally, you want your better fielder at SS and the worse one at 3B, but when one guy is 27 and the other is 35 (or whatever), speed becomes critical. Putting, say, Soriano in LF and Bernie at 2B might also be a positive (after some time).

As for the process, what I did was take UZR / 162 GP and turn that into UZR / play (using 4 plays per game for 3B, 3 plays for LF and RF, etc.). That gives you this list that I published (position, runs / play, plays per game):
6 +.011 (6)
4 +.009 (5)
5 +.007 (4)
8 +.005 (4)
7 -.016 (3)
9 -.021 (3)
3 -.025 (2)

Then, that final table was generated using only the above figures (difference in runs / play, and number of total plays / 162 GP). If you look at the Hubie column, those would be the neutral "positional adjustments".
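Going from the per-play list back to runs per 162 games is just runs/play x plays/game x 162. A minimal Python sketch using the list above; the function name is hypothetical. The ".003 runs / play is about 2 runs / 162 GP" remark in the earlier reply works out the same way, with 4 plays per game.

```python
# (position number, runs per play, plays per game), from the list above.
PER_PLAY = [
    (6, 0.011, 6), (4, 0.009, 5), (5, 0.007, 4), (8, 0.005, 4),
    (7, -0.016, 3), (9, -0.021, 3), (3, -0.025, 2),
]


def runs_per_162(runs_per_play: float, plays_per_game: int) -> float:
    """Scale a per-play run figure up to a 162-game season."""
    return runs_per_play * plays_per_game * 162


for pos, rpp, plays in PER_PLAY:
    print(f"{pos}: {runs_per_162(rpp, plays):+.1f} runs / 162 GP")
```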


UZR, multiple positions (July 7, 2003)

Discussion Thread

Posted 5:14 p.m., July 9, 2003 (#11) - tangotiger
  I'm looking at the LF-RF multiple positioning, and I split it up into "Primary LF", "Primary RF", and "Neutral LF/RF", with the split being that the primary position has at least 50% more games than the secondary position. In each matchup, there were between 1100 and 1700 games, which is pretty good.
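The primary/secondary split described above can be written as a small classifier. A minimal Python sketch, assuming the rule is applied to a player's LF and RF games exactly as stated (the primary position needs at least 50% more games than the other); the function name and the sample game counts are hypothetical.

```python
def classify_lf_rf(lf_games: int, rf_games: int) -> str:
    """Label a player who played both corners by the 50%-more-games rule."""
    if lf_games <= 0 or rf_games <= 0:
        raise ValueError("player must have games at both LF and RF")
    if lf_games >= 1.5 * rf_games:
        return "Primary LF"
    if rf_games >= 1.5 * lf_games:
        return "Primary RF"
    return "Neutral LF/RF"


print(classify_lf_rf(90, 40))   # Primary LF
print(classify_lf_rf(50, 60))   # Neutral LF/RF
```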

Anyway, when the player whose primary position was LF got moved to RF, this was the change:
-7 runs (in LF) to +2 runs (in RF).

So, right away, we know that the guy who moves from LF to RF is a below-average LF, and he becomes above average at RF.

What about the primary RF who moves to LF? He was +1 run at RF and +2 runs in LF. So, an average RF gets moved to LF, and he continues to be around average. In fact, he looks slightly better.

And what about the players with no primary position? They were -2 runs in LF and -4 runs in RF, meaning that there was tougher competition in RF (and that the guys with no primary position were below average).

It seems that the next breakdown I should do would be based on:
primary LF - above average
primary LF - average
primary LF - below average
... how does each do when moved to RF?

Repeat for primary RF moving to LF, and non-primaries.

Not sure when I'll get to it, but I also want to repeat this for the 2b/ss, the other natural comparison point.


UZR, multiple positions (July 7, 2003)

Discussion Thread

Posted 10:11 a.m., July 10, 2003 (#12) - tangotiger
  Breaking down by Primary, Secondary and Neutral positions adds an interesting layer.

Let's look at the IF, starting with SS as the primary position. I will present the UZR runs / play at each position, relative to the league average of that position.

SS/2B (823): 0.000, -.004
SS/3B (647): +.001, +.008

So, what does this mean? In the 823 games where our player played at SS and 2B, with SS being the primary position of the player, he performed at league average level at SS, and slightly BELOW league average at 2B. Since we "know" that the average 2B is a worse fielder than the avg SS, we would have expected that our SS would have performed better at 2B. He didn't. Sample size is an issue. The other factor would be experience, that there's a natural dropoff in performance when playing at a secondary position. On the other hand, looking at SS/3B, we get pretty much what we expected. We have essentially an average SS performing very well at 3B.

Let's look at the 2B as the primary position
2B/3B (689): +.002, +.001
2B/SS (710): +.013, +.001

We see that the guys who shift from 2B to 3B are slightly above average fielders as 2B. They pretty much maintain that same level of performance at 3B.

The next line is the very interesting one. Of the players whose primary position is 2B, and they were asked to play SS, they were the cream of the crop, with a very very above average +.013 runs / play at 2B (the equivalent of +11 runs / 162 GP). When they played at SS, they performed essentially as the equivalent of a league average SS.

Finally, with 3B as the primary position:
3B/2B (638): +.022, -.008
3B/SS (513): +.017, +.007

The 3B to 2B shift is only done with very good 3B, and they end up performing below average at 2B. Interestingly, the same level of 3B also gets moved to SS, but he performs at an above average level at SS. That is, there is more of a dropoff going to 2B than to SS, even though the base talent level he is being compared to is higher at SS than 2B.

Whew! That's a lot of inconsistencies on the surface. With a sample of 670 games, at 5 plays per game and .8 runs / play, one standard deviation is .006 runs / play. Essentially, I really don't have the sample size here to say anything with confidence. So what I'm about to say in the rest of this post, I'm saying without the numbers actually supporting me.

The SS to 3B move indicates that the avg SS is .007 runs / play better than the avg 3B. The 3B to SS move indicates a .010 runs advantage for the SS. Splitting the difference works out to a .009 advantage. We can even say that the difference between the .009 and .007 is the "experience/familiarity/similarity" factor.

Doing the same process for SS to 2B (.004 advantage to 2B) and 2B to SS (.012 advantage to the SS), we can say that the SS gets a .004 advantage (with that whopping .008 difference being the familiarity factor).

The 2B to 3B (.001 advantage to 3B) and 3B to 2B (.030 advantage to 2B) moves imply a whopping .015 advantage to the 2B, with the largest familiarity factor of the bunch as well. Essentially, it's tough to ask players to switch between 2B and 3B.

Listing this mathematically, we have:
3B + .009 = SS
2B + .004 = SS
3B + .015 = 2B

Trying to force a best-fit equation gives us the following spectrum among these three positions:
SS: +.004
2B: +.004
3B: -.008
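The post doesn't say how the best fit was forced. One method that happens to reproduce the listed spectrum is ordinary least squares with the three ratings centered on zero; with exactly one observed gap per pair, the least-squares rating of each position is simply the average of its gaps to every position (itself counting as 0). A minimal Python sketch of that method, using the three equations above; the function and variable names are hypothetical.

```python
# Observed gaps from the equations above, keyed (higher, lower), in runs / play.
GAPS = {("SS", "3B"): 0.009, ("SS", "2B"): 0.004, ("2B", "3B"): 0.015}
POSITIONS = ["SS", "2B", "3B"]


def gap(a: str, b: str) -> float:
    """Observed rating(a) - rating(b); every pair must have been observed."""
    if a == b:
        return 0.0
    if (a, b) in GAPS:
        return GAPS[(a, b)]
    return -GAPS[(b, a)]  # raises KeyError if the pair was never observed


def centered_fit(positions):
    """Least-squares ratings with mean zero for a complete set of pairwise gaps."""
    n = len(positions)
    return {p: sum(gap(p, q) for q in positions) / n for p in positions}


fit = centered_fit(POSITIONS)
print({p: round(v, 3) for p, v in fit.items()})
# {'SS': 0.004, '2B': 0.004, '3B': -0.008}
```

The fitted gaps (SS over 3B by about .012, 2B over 3B by about .012, SS over 2B by about .001) are the compromise among the three observed gaps.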

I think what this exercise shows is two important points:
1 - Sample size, sample size, sample size
2 - That the primary/secondary component is critical, since the familiarity/tools aspect comes into play. Specifically, the 2B and 3B tools are different enough that the experience needed to leverage those tools can bring the whole house down.

Anyway, I don't think I can do much more without a large enough sample.


UZR, multiple positions (July 7, 2003)

Discussion Thread

Posted 10:14 a.m., July 10, 2003 (#13) - tangotiger
  By the way, the level of talent that gets moved around supports what we already knew: the fielding spectrum is SS/2B/3B. The players moved from SS to 2B to 3B are much less talented than those moved from 3B to 2B to SS.


UZR, multiple positions (July 7, 2003)

Discussion Thread

Posted 1:07 p.m., July 10, 2003 (#14) - tangotiger
  I had this lying around, so I thought it would be interesting to look at too.

Offensive LWTS by position, both leagues, 1989-2001.


Pos LWTS
ss -13
c -10
2b -6
cf -1
3b 0
lf 7
rf 9
1b 17


This looks pretty similar to the "Hubie" column. Let me put them up, side-by-side:

Pos  OffLWTS  FieldingAdj (Hubie column)
ss       -13  -11
c        -10  ???
2b        -6   -7
cf        -1   -3
3b         0   -2
lf         7    7
rf         9   10
1b        17    6

The first column is the actual Offensive value. The second column is how Hubie Raines would play, relative to lg average, at each position.

As you can see, the big difference is at 1B. I mentioned in the article that there's only so much damage a fielder can do at 1B, since the opportunities for damage are less. (Even more so at DH, where the fielding value of Hubie, relative to an average fielder at DH is ZERO.)

If you take col 1 and subtract col 2, you get the overall value at that position. The 1B has a sizeable advantage in this regard (i.e., we put too much penalty on the 1B fielding value). In pretty much all other cases, it seems that the managers have properly balanced fielding against hitting.
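The "col 1 minus col 2" step, as a short Python sketch over the side-by-side table above. Catcher is skipped because its fielding figure is unknown; the variable names are hypothetical.

```python
# (offensive LWTS, Hubie fielding column) by position, from the table above.
# Catcher is omitted because its fielding figure is "???".
TABLE = {
    "ss": (-13, -11), "2b": (-6, -7), "cf": (-1, -3), "3b": (0, -2),
    "lf": (7, 7), "rf": (9, 10), "1b": (17, 6),
}

overall = {pos: off - fld for pos, (off, fld) in TABLE.items()}
print(overall)
# {'ss': -2, '2b': 1, 'cf': 2, '3b': 2, 'lf': 0, 'rf': -1, '1b': 11}
```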


UZR, multiple positions (July 7, 2003)

Discussion Thread

Posted 10:28 p.m., July 11, 2003 (#16) - tangotiger
  Excellent post David! I pretty much agree with the sentiment.

I also think the key is to look at the replacement level to help us out. However, I think this applies far more in the NFL than in MLB. What do the best QB and the best tackle not in the NFL have in common? They are both worth exactly 0 dollars to an NFL team. AND they (pretty much) have no other position to play except QB and tackle, respectively. Therefore, for the NFL, I have no problem setting a player's worth relative to the best player at his position not in the bigs.

In baseball, CF/LF/RF are very much related, as are SS/2B/3B. And the pool to draw from for 1B is the largest. C might be a unique position. P is. (And guys who can't make it as an SS/2B/3B can still make it as OF.)

Therefore, I think it's hard to find that replacement level by position. Which is why the multiple-position player analysis comes in handy. It's a built-in conversion factor between positions that you can chain for the whole spectrum. But, as we've found, going from a primary position to a less difficult secondary position is not always advantageous. This is because it's not THAT easy to make the switch (at least insofar as 2b/ss/3b is concerned).

I agree, and what I think we will find is that we'll be better off using a long-term offensive-based positional adjustment (OPA): 10 or 20 years or something.

I would NOT do like some people do and use a 1-year OPA.


UZR, multiple positions (July 7, 2003)

Discussion Thread

Posted 11:08 a.m., July 30, 2003 (#17) - tangotiger
  Someone sent me an email regarding figuring out position-neutral UZR. Here is my response:

===============================
For example, if you go to the bottom of [the above article], you'll see my hypothetical neutral fielder called Hubie Raines. This fielder, at SS, would be 11 runs below the MLB avg SS. Put him in RF, and he'd be 10 runs above the MLB avg RF, etc., etc.

So, in terms of trying to get his neutral fielder rating, and assuming all you know is that he's 11 runs below the MLB avg SS, this translates to "0" relative to ALL fielders.

Trying to look for a best-fit equation, I would then say that
UZR(neutral) = UZR(SS) x 0.667 + 8.

So, Jeter, if we think that he's really a -18 UZR at SS would come out as -4 as a fielder at a neutral position.

Doing the same for 1B: UZR(neutral) = UZR(1B) * 2 - 16

So, if you've got a top 1B who is +10 UZR relative to the avg 1B, he'd come in at +4 as a fielder at a neutral position.

Of course, we can't take it to this extreme. There are certain tools of a 1B that just won't translate to SS, and vice versa. Zeile could be a +10 or more at 1B, but the (lack of) tools that he can hide at 1B would be exposed at SS. And what he can leverage at 1B might not be exploitable at SS.

It is better to think of "what would an average fielder do at this and that position". From this standpoint, now you can compare Jeter's -18 to Hubie's -11. (Jeter is 7 worse). And you can compare Zeile's +10 to Hubie's +8. Zeile is 2 better.

You've given them the same comparison baseline player, without getting involved with actually figuring out how to move Zeile or Jeter around.

It allows you to do these neutral positional comparisons, without losing the argument (which you would) about moving Zeile to SS or Jeter to 1B.


UZR, multiple positions (July 7, 2003)

Discussion Thread

Posted 3:20 p.m., August 4, 2003 (#18) - tangotiger
  Having said what I said above, and all that still holds, for those who wish to compute a UZR(neutral) for all positions, here's the best-fit equations:

Pos  slope  intercept
3    2.0    -16
4    0.8      6
5    1.0      5
6    0.65     8
7    1.3    -10
8    1.0      3
9    1.3    -13

So, for second base, it would look like this:
UZR(neutral) = UZR(2B) * 0.8 + 6

A 2B who has a -7.5 UZR as a secondbaseman would be considered an "average" fielder overall.

Again, I'll repeat it. This does not mean that all 2B who are -7.5 would be average at a neutral position. It's just a neat little way to say that an average fielder at a neutral position would be a -7.5 if he played 2B.
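The slope/intercept table above can be applied mechanically. A minimal Python sketch using the table's coefficients as given; note that the table's SS slope (0.65) differs slightly from the 0.667 quoted in the earlier post, and, per the caveat above, the result is a comparison baseline, not a prediction of how a player would field elsewhere. Function names are hypothetical.

```python
# Position number -> (slope, intercept), from the table above:
# UZR(neutral) = UZR(position) * slope + intercept
FIT = {
    3: (2.0, -16), 4: (0.8, 6), 5: (1.0, 5), 6: (0.65, 8),
    7: (1.3, -10), 8: (1.0, 3), 9: (1.3, -13),
}


def neutral_uzr(position: int, uzr: float) -> float:
    """Convert a position-relative UZR to the neutral-position scale."""
    slope, intercept = FIT[position]
    return uzr * slope + intercept


def average_fielder_at(position: int) -> float:
    """UZR an average fielder (neutral UZR of 0) would post at this position."""
    slope, intercept = FIT[position]
    return -intercept / slope


print(neutral_uzr(3, 15))               # 14.0, the +15 first baseman
print(average_fielder_at(3))            # 8.0
print(round(average_fielder_at(4), 1))  # -7.5
```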

You'll end up with weird numbers if, say, your 1B was +15 runs above the avg 1B. By this process, this 1B would end up as +14 relative to an average fielder at a neutral position. Hard to believe you can have a 1B that good. But, in reality, what I'm saying is that an average fielder at a neutral position would be +8 as a 1B. So, if you do have a 1B who was +15, then he must have really, really leveraged his skills to add an extra 7 runs. What kind of skill? Maybe his height or his scooping or his charging the plate or whatever... things that essentially don't come into play anywhere near as much at a neutral position. And maybe he was able to hide his lack of speed, a weakness that would be quickly exposed at a neutral position.

Hope all that was clear...


UZR, multiple positions (July 7, 2003)

Discussion Thread

Posted 7:28 p.m., December 18, 2003 (#19) - tangotiger
  I'm going to make this thread "Required Reading of the Week". The article has pertinent information in trying to compare fielders at different positions. Please read the article and all posts herein, prior to commenting.


UZR, multiple positions (July 7, 2003)

Discussion Thread

Posted 10:59 p.m., December 18, 2003 (#21) - tangotiger
  I had forgotten that I did work on the primary/secondary position thing (post #12). It's definitely a critical component. As soon as I get my hands on the 2003 UZR, I'll update all this.



Ruane - Cost of outs, and speed (July 9, 2003)

Discussion Thread

Posted 11:41 a.m., July 10, 2003 (#4) - tangotiger
  A few months ago, I showed that the GIDP rate of Rickey, Coleman, and Raines was much lower than the league average. Something like: if the league average was around .115, these guys were at the .060 to .080 level. Can't remember exactly, though.


Ruane - Cost of outs, and speed (July 9, 2003)

Discussion Thread

Posted 1:23 p.m., July 10, 2003 (#5) - tangotiger (homepage)
  The above link has the data I mentioned.


Fewest BB / PA, since 1947, min 150 PA (July 10, 2003)

Discussion Thread

Posted 2:39 p.m., September 29, 2003 (#4) - tangotiger
  Wells ended up the season with 20 walks, in 213 IP, while facing 887 batters.

After his incredible start, Wells ends up with 23 walks per 1000 batters. Still great, but not the record.



SABR 201 - Should we non-sac bunt more? (July 10, 2003)

Discussion Thread

Posted 1:15 p.m., July 11, 2003 (#4) - tangotiger
  For a great hitter, the break-even point is much higher.

If we look at the bases empty, 1 out situation, here are the run values for all events:
1b .27, 2b .40, 3b .65, hr 1.00, bb .27, out -.18 (for 1999-2002).

Let's assume you have a .333/.440/.675 hitter. What's his run value / PA? In this case, it works out to +.08 runs / PA.

That puts the break-even success rate for this hitter's non-sac bunt at .500 (assuming he doesn't miss his bunt and end up down in the count).

How often would Bonds, Pujols et al. have to bunt to keep the fielders "honest"? I think the fielders are accepting that risk, that he might bunt, by playing them the way they do. It would probably cost them too much to do otherwise.

But for your Vizquels, the bunt is a good weapon.


Aaron's Baseball Blog - David Wells (July 10, 2003)

Discussion Thread

Posted 9:18 p.m., July 11, 2003 (#2) - tangotiger
  Aaron is the Santana of baseball writing.



SABR 201 - Custom Linear Weights (July 11, 2003)

Discussion Thread

Posted 7:42 a.m., July 13, 2003 (#2) - tangotiger
  Actual totals from 1974-1990.


SABR 201 - Custom Linear Weights (July 11, 2003)

Discussion Thread

Posted 2:44 p.m., July 17, 2003 (#5) - tangotiger
  It might be unclear, because it sounds like the same thing.

If the RE in state1 is .60 and the RE in state2 is .90, and a walk will ALWAYS get you from state1 to state2, then its value is .30.

To expand, there are 24 states to consider, and therefore, 24 (or more) transitions to consider. Each of the 24 start states will give you a run value for the walk (LWTS by the 24 base-out states). The weighted average of the walk (frequency of each walk by start state) will give you the LWTS value of the walk.
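The weighted-average procedure above, as a minimal Python sketch. Each entry records how often the event happened from one start state, the run expectancy (RE) before and after, and any runs that scored. The first example is the post's own .60 to .90 walk; the second, two-state list uses a made-up RE value purely for illustration, and the function name is hypothetical.

```python
def linear_weight(transitions):
    """Frequency-weighted run value of an event.

    transitions: iterable of (count, re_before, re_after, runs_scored),
    typically one entry per start state (up to 24 base-out states).
    """
    total_count = 0
    total_value = 0.0
    for count, re_before, re_after, runs_scored in transitions:
        total_value += count * (re_after - re_before + runs_scored)
        total_count += count
    return total_value / total_count


# The post's example: a walk that always moves RE from .60 to .90.
print(round(linear_weight([(1, 0.60, 0.90, 0)]), 2))  # 0.3

# Hypothetical: 3 walks from that state, plus 1 bases-loaded walk that forces
# in a run and leaves the (made-up) RE of 2.30 unchanged.
print(round(linear_weight([(3, 0.60, 0.90, 0), (1, 2.30, 2.30, 1)]), 3))  # 0.475
```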

I hope that was clear?!?


Workshop on Pitch Counts (July 14, 2003)

Discussion Thread

Posted 2:22 p.m., July 14, 2003 (#1) - tangotiger
  Just to qualify the "regardless of time period": I mean that as long as the time period has the modern-day rules (4 balls, 3 strikes, no strike three on a 2-strike foul), the model should work. I suppose that this should apply to minor leaguers and college too.



GIDP (July 14, 2003)

Discussion Thread

Posted 4:00 p.m., July 15, 2003 (#2) - tangotiger
  Yes, that method would be better, so that at least we could control somewhat for the fielder influence.

Comparing different pitchers on the same team would also help.



Previous DIPS research (July 22, 2003)

Discussion Thread

Posted 3:55 p.m., July 22, 2003 (#1) - tangotiger (homepage)
  This is also a good link.



SABR 301 - Regression towards the mean (July 22, 2003)

Discussion Thread

Posted 10:10 a.m., January 14, 2004 (#1) - tangotiger
  I'm bringing this back as a blast from the past. Let's make this thread the "required reading thread of the week".


SABR 301 - Regression towards the mean (July 22, 2003)

Discussion Thread

Posted 5:10 p.m., January 17, 2004 (#4) - Tangotiger (homepage)
  The above is another good link.


Chances of making the playoffs (July 23, 2003)

Discussion Thread

Posted 3:25 p.m., July 24, 2003 (#6) - tangotiger
  If all teams were .500, then your guess is pretty good. Dackle is showing a 55% chance of winning the division in that case, looking only at the lead.

If you use dackle's expectation of the true talent of that team and its opponents (which I don't), then you get a much higher value.



Sabermetric Site to Visit - ESPN (July 25, 2003)

Discussion Thread

Posted 3:31 p.m., July 29, 2003 (#3) - tangotiger
  Well, you can visit my site for my individual work, which has been categorized.

For a more all-encompassing list, you can go to baseballstuff.com, and I believe that James Fraser's or Jim Furtado's sites (see the links on the right of that site) have a pretty good list of such topics and categories.

My suggestion to those who really want to do this (and it'll be a lot of work) is to become an "editor" on the open source directory, specifically for sabermetrics, and build the thing directly there. The value is that many search sites (including Google, I believe) use that open source directory to build their search engines.


Sabermetric Site to Visit - ESPN (July 25, 2003)

Discussion Thread

Posted 11:10 a.m., July 30, 2003 (#5) - tangotiger
  Yes, my idea was simply to have a set of links, just like John Skilton's site, but better (with categories along the lines of what I have on my site, and expanded).

If you want to create such an index, and if the open-source directory that Google uses doesn't want it, I'd be happy to post it here.


Competitive Balance (July 25, 2003)

Discussion Thread

Posted 11:46 a.m., July 26, 2003 (#4) - tangotiger
  Thanks guys...

One thing that I would want to do as well is to add the "expenses". The cost of doing business in NYC is a lot higher than in a typical city, so the potential net income, aside from salaries, is not as great as the revenue would suggest. This is complicated, especially if you try to figure out a cost for the ballpark.


Competitive Balance (July 25, 2003)

Discussion Thread

Posted 2:38 p.m., July 29, 2003 (#7) - tangotiger
  What the Voros equation shows is that, aside from winning, all of a team's base revenue is inherent.

The only way to increase revenues (long-term anyway) is to put a product on the field that performs better than the competition.

Any short-term gimmick ("Come see our prospects!", "Come see our European players!", "Come eat our free food") will not impact revenues for an extended period of time.

(This is just like stocks, where fundamentals drive the price of the stock long-term, but the technicals set the pace short-term. A stock's fundamentals, its expected future earnings, are equivalent to a team's talent level, its expected future wins.)

The one doubt I have is the linear relationship between wins and revenue (except for that curve at the end for the playoff-bound teams). I really expected that an extra win would result in something like a 2% increase in revenue (that is, proportional to the base revenue), instead of the linear relationship Voros is presenting. This is what Palmer reported in The Hidden Game, and this is what I found in a very quick look last year (where Palmer and I only looked at attendance, which is our best stand-in for revenue). And this is contrary to Voros' more detailed, but smaller-sampled, study.


Competitive Balance (July 25, 2003)

Discussion Thread

Posted 2:40 p.m., July 29, 2003 (#8) - tangotiger
  If not proportional to its base revenue, then at least proportional to parts of its base revenue. And this may be where Jim is heading about a team being able to increase its revenues. I think they can only leverage a certain part of their base components, and they can't do anything unless that leverage is based on the team winning.


Competitive Balance (July 25, 2003)

Discussion Thread

Posted 4:29 p.m., July 29, 2003 (#10) - tangotiger
  Sure thing... you are talking about making an investment. Let me spend $200 million on a new stadium, or $5 million on new luxury boxes, or $500,000 on a new scoreboard. I can drive more people to come to the game, and that will drive more revenue to my team.

I can choose, or not, to spend those revenues on my team to sustain that increase in attendance.

So, yes, there is an extra way to get more revenue. You can either get more revenue by getting more wins from your players (which may or may not cost you more money), or you can get more revenue by paying for it now (spend $10 million in expenses, and hope to get $1 million in revenue each year for the next 20 years, which, discounted at a certain rate, might be worth $11 or $12 or $7 million in revenue today). If I can leverage it right, spending that money can generate more revenue.
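The "discounted at a certain rate" step is a present-value calculation. A minimal sketch, assuming level end-of-year payments; the three discount rates are illustrative assumptions, not figures from the post.

```python
def present_value(annual_revenue: float, years: int, rate: float) -> float:
    """Discount a level stream of end-of-year payments back to today."""
    return sum(annual_revenue / (1 + rate) ** t for t in range(1, years + 1))


# $1 million a year for 20 years, against $10 million spent up front.
for rate in (0.05, 0.06, 0.12):
    pv = present_value(1.0, 20, rate)
    print(f"{rate:.0%}: ${pv:.1f} million vs. $10 million spent")
```

At 5%, 6%, and 12% this comes to about $12.5, $11.5, and $7.5 million, which is the spread of outcomes described above: the same $10 million outlay can be a good or bad investment depending on the rate you discount at.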



Bats Right, Throws Left (July 29, 2003)

Discussion Thread

Posted 11:15 a.m., July 30, 2003 (#2) - tangotiger (homepage)
  The list of bats-R, throws-L players was simply to get a list of "great players" with that weird combination of handedness. The easiest ranking to use was Runs Produced (though I don't subscribe to the name, I do subscribe to its idea). I could have used just runs, or just games, or just hits, or whatever. It wouldn't have made a difference.

As for subtracting the HR from R+RBI, I suggest you click the Homepage link above, and read the discussion I have with a few of the fanhome regulars. (Unfortunately, some of the data presented was lost in transition.)



Open Directory Project - Sabermetrics (July 30, 2003)

Discussion Thread

Posted 6:06 p.m., July 30, 2003 (#3) - tangotiger
  I agree.

The suggestion here is for Sylvain to do the work of coordinating all of the material, with help from Primates wishing to contribute their favorite links.

After Sylvain has gathered, sorted, indexed, and done his stuff, he can go to the editor, or become the editor, of the Sabermetrics directory at dmoz.org.

And, if they don't want him, or they take too long, Primer can be the official source to house the index.


Leveraged Index (LI) - by the 24 base-out states (July 30, 2003)

Discussion Thread

Posted 1:07 p.m., July 30, 2003 (#1) - tangotiger
  If someone wanted to come out with a clutch index, and didn't want to use the Linear Weights by the 24 base-out states to do it, then the LI is a good stand-in.

In essence, figure out every player's OBA and SLG by the 24 base-out states. Multiply each figure by that state's LI × lgPA. Add up your totals and divide by the sum of lgPA.

Compare that figure to his overall figure. Voila. Clutch Index.

(Again, using OBA and SLG is problematic, especially since they don't have the same denominator. And I would include SF in the SLG calculation. But like I said, to do it right, use LWTS by the 24 base-out states.)
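A minimal sketch of that recipe, using OBA and only two base-out states to keep it short. All inputs are hypothetical, and the function names are illustrative. The sketch assumes LI is scaled so that its lgPA-weighted league average is 1 (as it is in the example below); that is what makes dividing by the plain sum of lgPA put the leveraged figure on the same scale as the overall one.

```python
def leveraged_rate(player_rate: dict, li: dict, lg_pa: dict) -> float:
    """Sum of (player rate x LI x lgPA) over base-out states, divided by the sum of lgPA."""
    states = player_rate.keys()
    numerator = sum(player_rate[s] * li[s] * lg_pa[s] for s in states)
    return numerator / sum(lg_pa[s] for s in states)


def clutch_index(player_rate: dict, li: dict, lg_pa: dict, overall_rate: float) -> float:
    """Leverage-weighted figure minus the player's overall figure."""
    return leveraged_rate(player_rate, li, lg_pa) - overall_rate


# Hypothetical two-state example; a real version covers all 24 base-out states.
oba = {"empty, 0 out": 0.300, "3b, 2 out": 0.400}
li = {"empty, 0 out": 0.75, "3b, 2 out": 1.50}  # lgPA-weighted mean is exactly 1.0
lg_pa = {"empty, 0 out": 600, "3b, 2 out": 300}
print(round(clutch_index(oba, li, lg_pa, overall_rate=0.333), 3))  # 0.017
```

A positive value means the player did relatively better in the high-leverage states; as the post says, LWTS by the 24 base-out states would be the better input than OBA.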


Leveraged Index (LI) - by the 24 base-out states (July 30, 2003)

Discussion Thread

Posted 12:37 p.m., August 1, 2003 (#6) - tangotiger
  Probably because with a runner on 3b and 0 outs, he's almost assured of scoring (not much leverage there), while with 2 outs, he's got a 30% chance of scoring. So, there's a big swing possibility there.

If a reliever comes into the 9th inning with a 3-run lead, he's almost assured of winning the game (not much leverage there), but coming in with a 1-run lead, there's a big swing possibility of winning or losing the game.

Leverage is about swing possibilities, and nothing else.



Baseball Prospectus - Small sample size (July 30, 2003)

Discussion Thread

Posted 12:49 p.m., August 1, 2003 (#2) - tangotiger
  It would have been better to break down by GB and flyballs (and ignore pops and liners). Whether the 1B is there or not would not affect the flyball out rate, and vice versa for the CF and groundballs.

If you really wanted to do something cool, extend that for multiple years. How did Giambi's teams throughout the years do on groundballs when he was there, and when his backups were there? Problem is that you might not get enough "backup" games to get any meaningful results.

Like I said, it's a fun exercise, but limited by the sample size.



Calculating Relative Stats (July 30, 2003)

Discussion Thread

Posted 6:09 p.m., July 30, 2003 (#1) - tangotiger
  In that new McGwire example, with Coors affecting the medium/long flyballs, he gets 173 long flies and 82 HR.



Forecast 2003 - Interim Results (July 31, 2003)

Discussion Thread

Posted 10:10 a.m., July 31, 2003 (#1) - tangotiger
  The deltas were calculated as follows, and I'll take Bonds as an example:

Baseline: Bonds = 1.231, lg=0.760, comp=+0.471
Current: Bonds = 1.240, lg=0.757, comp=+0.483

delta = absolute value of (+.471 minus +.483) = .012

We can do it as a percentage as well, but I think this will do for now.
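The same delta as a small Python sketch (the function name is illustrative), using the Bonds figures above:

```python
def forecast_delta(base_player: float, base_lg: float, cur_player: float, cur_lg: float) -> float:
    """Absolute change in the player's margin over the league between two snapshots."""
    baseline_margin = base_player - base_lg  # e.g. 1.231 - 0.760 = +.471
    current_margin = cur_player - cur_lg     # e.g. 1.240 - 0.757 = +.483
    return abs(baseline_margin - current_margin)


print(round(forecast_delta(1.231, 0.760, 1.240, 0.757), 3))  # 0.012
```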


Forecast 2003 - Interim Results (July 31, 2003)

Discussion Thread

Posted 11:54 a.m., July 31, 2003 (#3) - tangotiger
  standard deviation, in the above order:

hitters: .094, .099, .115
pitchers: 1.00, .83, .83



10 runs = 1 win? (August 1, 2003)

Discussion Thread

Posted 4:57 p.m., August 7, 2003 (#2) - tangotiger (homepage)
  Every 10 marginal runs will add 1 marginal win.

So, you start with a team of 4.5 RS and 4.5 RA. What's their win%? .500.

Now, you have a team with 4.6 RS and 4.4 RA. What's their win%? If you use PythagenPat, it estimates .521. You can also try running the Tango Distribution (see above site), and you get .520.

Try 4.8 v 4.2, and you get .563 with PythagaPat and .561 with TD.

Try 5 v 4, and you get .604 with PP, .601 with TD, and .600 in real-life.

The equation W% = .500 + (RS-RA)/10, with RS and RA in runs per game, will lead to a similar answer to the above.

It is a nice shorthand, but for more rigorous work, I suggest the PP or TD processes.
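For comparison, a minimal sketch of both calculations on the examples above. PythagenPat's exponent is taken as (RS + RA per game) raised to z = 0.287; that constant is an assumption (commonly used values sit around .28 to .29), so the third decimal may not match the figures quoted above exactly. The Tango Distribution is not implemented here.

```python
def pythagenpat(rs: float, ra: float, z: float = 0.287) -> float:
    """Win% from runs scored/allowed per game, with exponent (RS + RA) ** z."""
    x = (rs + ra) ** z
    return rs ** x / (rs ** x + ra ** x)


def ten_run_rule(rs: float, ra: float) -> float:
    """Shorthand: every 10 runs of differential is worth about 1 win."""
    return 0.500 + (rs - ra) / 10


for rs, ra in [(4.5, 4.5), (4.6, 4.4), (4.8, 4.2), (5.0, 4.0)]:
    print(f"{rs} v {ra}: PythagenPat {pythagenpat(rs, ra):.3f}, shorthand {ten_run_rule(rs, ra):.3f}")
```

In this run-scoring range the two stay within a few thousandths of each other, which is why the shorthand works; the gap grows as run differentials get more extreme.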



John Jarvis SABR presentation on the IBB (August 1, 2003)

Discussion Thread

Posted 2:00 p.m., August 1, 2003 (#2) - tangotiger
  Excellent point on the "rolling-forward" concept.

I have my own method of handling the IBB issue, and it has similar variables to Jarvis'. I do not use SLG or BA, but rather a tailored version based on which of the 24 base-out states we are in (much less when looking at the IBB situation). I would definitely use the LWTS by the 24 base-out states rather than SLG and BA.

I'm also disturbed that Jarvis only looks at recent performance to establish the player's performance level, because the regression deemed that to be the best fit. What you should use is the best estimate of a player's true talent level. Maybe at a high group level, the 2-week performance does the job, but at an individual player's level, it definitely can't.

All in all, I'd have to give this thing a few more readings.


Tippett and DIPS (August 1, 2003)

Discussion Thread

Posted 12:58 a.m., August 2, 2003 (#30) - tangotiger
  Year-to-year correlation (r), pitchers, 1969-2002

minPA   HR/PA   BB/PA   K/PA   bipH/PA      n
  500    0.20    0.45   0.61      0.22   2814
  200    0.13    0.36   0.56      0.17   6690

bipH: H-HR
n: number of pitchers

It seems to me that if HR/PA and bipH/PA are pretty close, we'd expect (2b+3b)/PA to be close as well.

I don't think using /PA, /BIP, or /(AB-SO) will make much difference.


Tippett and DIPS (August 1, 2003)

Discussion Thread

Posted 9:46 p.m., August 2, 2003 (#41) - tangotiger
  I was a little surprised by my results, especially since the HR is so related to the park too.

However, Tippett, in a much larger study, shows the "r" of the HR to be much higher.

Lots more work to do.


Tippett and DIPS (August 1, 2003)

Discussion Thread

Posted 4:19 p.m., August 3, 2003 (#44) - tangotiger
  Please note that every sample size comes with its own error range.

As well, the different environments almost force you to start separating by era.

It could very well be that Tippett's findings and mine are consistent with being drawn from the same population, and the differences can be attributed to chance. I don't know that. But, as with everything, we should always report the margin of error.

Also note that we did not use the same denominator.

And we should also try to control for changes in home team.

I don't expect to do much work in the near future, but if I do, I'll report whatever I find.


Tippett and DIPS (August 1, 2003)

Discussion Thread

Posted 10:00 a.m., August 4, 2003 (#49) - tangotiger
  Voros is an excellent sabermetrician. Fanhome has a great group of guys who post consistently with the same handle (which is as close as you'll get to a signature).


Tippett and DIPS (August 1, 2003)

Discussion Thread

Posted 10:50 a.m., August 4, 2003 (#53) - tangotiger
  By the way, I once dared ask if Babe Ruth would have been an average hitter today, and I went through a rigorous, thorough analysis of how he would hit today. And based on the assumptions and analysis, none of which anyone questioned, I was forced to conclude, at that time, that Ruth would have been an average hitter today. My conclusions were questioned, but instead of pointing to where I went wrong, people came back with other arguments.

Afterwards I realized that my error was in not handling regression towards the mean (which I was not aware of at that time), the most important concept any sabermetrician should be aware of.

I'd hate to think what people would have called me for making the first statement that Ruth might have been an average hitter today, while not recognizing that I corrected myself in further research.

This is why we have peer review, after all, something that Voros was extremely open to, publishing virtually all of his findings. This is a far cry from some sabermetricians who will only publish their results (which I do from time to time).

Now, Tom Tippett was very thorough and honest with everything he's done, issuing corrections and addenda to his conclusions based on new material that he's been made aware of. He's opened himself up to peer review, and so we should try to further his findings.

So, can we get back to the matter at hand? Let's talk about the topic, rather than people's opinion of the topic, and people's opinion of people's opinion of the topic.


Tippett and DIPS (August 1, 2003)

Discussion Thread

Posted 12:11 p.m., August 4, 2003 (#56) - tangotiger
  When you post under a different handle than the one you normally use, the point is exactly the reason you are specifying: to protect your good name from the darker but still real thoughts that you have. If it's not cowardice, it's dishonest.

(If it's not cowardice or dishonesty, what in the world would you have to do to be a coward? It's like Reagan saying "no, that's not a threat... that's a promise". No, it's a threat, because there's nothing worse that you can do to make that "promise" a threat.)

If I want to say that Bill James "just doesn't get it" about linear weights, and that "Runs Created is dead", those are pretty harsh words, and I'll stand by them. For me to say that with a bag over my head... what does that do?

And before anyone asks, Tangotiger is my handle or nom de plume, one that I use with the same thought and respect that I would use with my real name. For privacy's sake, I don't give out my real name. (I figure Mark Twain and Dr. Seuss didn't have a problem with noms de plume, and those "names" are much more valuable and well-known than Samuel Clemens and Theodor Geisel.)


Tippett and DIPS (August 1, 2003)

Discussion Thread

Posted 12:28 p.m., August 4, 2003 (#57) - tangotiger(e-mail)
  And that's my last comment on "etiquette". If anyone else wants to talk about this issue, send me an email, and I'll open up a separate thread where those who want to continue can have their chance.


Tippett and DIPS (August 1, 2003)

Discussion Thread

Posted 3:42 p.m., August 4, 2003 (#67) - tangotiger
  I was thinking of calling the book "Tangotiger presents [The Book] by Mitchel Lichtman".

Baseball Prospectus doesn't put bylines to most of what they write in their own book. Maybe the "brand" of BP or the "brand" of Tangotiger is good enough. Maybe I can spin off magazines like Oprah and Rosie. Anyone want to design a "T" logo?

Seriously, I'll probably "come out" by that time... not that there's anything wrong with the way things are now.


Tippett and DIPS (August 1, 2003)

Discussion Thread

Posted 9:57 a.m., August 5, 2003 (#70) - tangotiger
  Triples, relative to doubles, are almost definitely a function of: speed, handedness, park, and OF fielding.

How do we know? Well, the age progression of triples/(2b+3b) is almost exactly the same as that of sb/(1b+bb). Triples are clearly an artifact of speed.

Other variables, like the batter's handedness (more balls hit to RF) and the park, would explain some of the individual variation.



Extended Pitch Count Estimator (August 4, 2003)

Discussion Thread

Posted 9:10 a.m., August 6, 2003 (#2) - tangotiger
  Bob, let's take it backwards. Let's say you construct a model where, on every pitch, a pitcher will do the following to a batter:
make contact for a fair ball: 90% of the time
others (fouls, swing and miss, or called ball/strike): 10%

I think it's easy to see that at some point, that 10% will eventually lead to a walk or a K. So, what value for "others" would I have to set to guarantee that the PA will end with a fair ball? Mathematically, I have no choice but to set "others" to 0%.

Now, this is not exactly how baseball works. Every count offers the pitcher/batter matchup different expectations for the batter to get a fair ball. And in fact, every individual pitcher/batter/count matchup has its own rates.

But, to the extent that I want the BIP rate to be 100%, I have to put the pitches/PA at 1. We also know that the league average is around 3.7 to 3.8 pitches/PA when the BIP rate is around 73 to 76%. So, we have that fixed point. We also have the two sample points provided by Randy Johnson (57%, close to 4 pitches/PA) and Brad Radke (81%, close to 3.5 pitches/PA).

So, given those 4 points (and some other behind-the-scenes work that I'm doing), those equations come out. It maxes out at 3.5 pitches / BIP, 5.5 / BB and 4.9 / K, though I suspect that a pitcher with very high BB and K rates would probably throw more pitches than I'm showing here (he will go to 3-2 more often than others). It may be that I also need a function for BB that is based on the BIP rate AND the K rate.

In any case, I think what we have here is a reasonable basis to further the research on estimating pitch counts.
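The estimator's equations aren't reproduced here, but the limiting case above (a 100% BIP rate forces exactly 1 pitch per PA) can be checked with a simple count model. A minimal sketch, assuming every pitch independently ends up in play, as a ball, as a strike (called or swinging), or as a foul, with fixed probabilities regardless of count. That constant-rate assumption is exactly what the post says real matchups violate, so treat this as an illustration, not the estimator itself; the function name is hypothetical.

```python
from functools import lru_cache

# Outcome tuples: (pitches still to come, P(ball in play), P(walk), P(strikeout)).
IN_PLAY = (0.0, 1.0, 0.0, 0.0)
WALK = (0.0, 0.0, 1.0, 0.0)
STRIKEOUT = (0.0, 0.0, 0.0, 1.0)


def pa_outcomes(p_inplay: float, p_ball: float, p_strike: float, p_foul: float) -> tuple:
    """Expected pitches per PA plus P(BIP), P(BB), P(K), starting from a 0-0 count."""
    if abs(p_inplay + p_ball + p_strike + p_foul - 1.0) > 1e-9:
        raise ValueError("per-pitch probabilities must sum to 1")
    if p_foul >= 1.0:
        raise ValueError("the PA never ends if every pitch is fouled off")

    @lru_cache(maxsize=None)
    def solve(balls: int, strikes: int) -> tuple:
        after_ball = WALK if balls == 3 else solve(balls + 1, strikes)
        after_strike = STRIKEOUT if strikes == 2 else solve(balls, strikes + 1)
        moves = [(p_inplay, IN_PLAY), (p_ball, after_ball), (p_strike, after_strike)]
        stay = 0.0
        if strikes < 2:
            moves.append((p_foul, after_strike))  # a foul counts as a strike before two strikes
        else:
            stay = p_foul  # with two strikes, a foul leaves the count unchanged
        totals = [1.0, 0.0, 0.0, 0.0]  # this pitch, plus whatever follows it
        for p, outcome in moves:
            for i in range(4):
                totals[i] += p * outcome[i]
        # Solve out the two-strike foul self-loop: X = totals + stay * X.
        return tuple(t / (1.0 - stay) for t in totals)

    return solve(0, 0)


print(pa_outcomes(1.0, 0.0, 0.0, 0.0))    # (1.0, 1.0, 0.0, 0.0): every pitch put in play
print(pa_outcomes(0.2, 0.35, 0.25, 0.2))  # hypothetical mix of pitch results
```

Lowering `p_inplay` below 1 raises pitches per PA and lets walks and strikeouts appear, which is the "others eventually lead to a walk or a K" point above.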


Extended Pitch Count Estimator (August 4, 2003)

Discussion Thread

Posted 1:05 p.m., August 6, 2003 (#4) - tangotiger
  Bob, what you are saying is that the penalty for the walk is so high that a pitcher would rather finally give up a fat one down the middle than give up a walk.

And that a pitcher and batter will work the plate to the point where the pitcher can throw at the corners and risk a ball, or the batter will wait for a fat one and risk a strike.

But don't forget there are also swinging strikes. If the pitcher has the batter 0-2, it's very possible that he swings and misses, and thus the strikeout.

What you are saying is that the batter has the chance of swinging and missing on 0-0 and 0-1, but that he cannot at 0-2. Is that because the price of the strikeout is so high that he just absolutely has to get the ball on the bat, rather than take a big cut and maybe get a solid hit on it?

In the "real world", there are a few pitchers (going back to the early 1900s) that I estimate had a 95% BIP rate. From that standpoint, the pitches / BIP checks in at 3.0. Even at a 99% BIP rate, I'm estimating 2.7 pitches / BIP.

So, from 99% BIP to 100%, I'm going from 2.7 to 1.0. I really have no reason to make it go to 1.5 or 2.2 or anything else.

I agree, it's an interesting scenario, but one which we probably don't need to worry about, as you are also saying.

(There were 2 pitchers in the beginnings of baseball with no walks or Ks and at least 150 PA, but they used different rules for balls and strikes.)



DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 3:36 p.m., August 5, 2003 (#1) - tangotiger
  Lowering the bar to at least 250 PA in consecutive years, you get the same order of results (all r are about .05 less than the above ones).

Event r
K 0.74
BB 0.61
1B 0.40
HR 0.30
XBH 0.22

1bBIP 0.18
2bBIP 0.17


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 4:21 p.m., August 5, 2003 (#6) - tangotiger
  You know what's even more weird? The year-to-year correlation for single/BIP and (2b+3b)/BIP for that second class was .18 and .17, right? The year-to-year correlation for (1b+2b+3b)/BIP was .15.

I think, though I'm not sure, that this must imply some negative relationship between 1b and 2b+3b. This may be due to the GB/FB tendency of the pitcher (a FB pitcher allows more extra-base hits and outs than a GB pitcher).

(For the 500 class, those numbers are .25, .21, .20)

As for what it says about DIPS, there's no change. The year-to-year r is .20, as has been reported by many people many times with very different data sets. It's still our best guess that if a pitcher has a (1b+2b+3b)/BIP rate of .320 and the league is .300, then the pitcher's "true" talent, based on the BIP, is .304 (80% regression towards the mean, or 1-r). This applies to pitchers with 500 to 1200 PAs.
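The regression step in that example, sketched with a hypothetical helper name: keep r (here .20) of the observed gap from the league mean and regress the remaining 1 - r (80%) away.

```python
def regress_to_mean(observed: float, league: float, r: float) -> float:
    """Keep fraction r of the observed deviation from the league mean."""
    return league + r * (observed - league)


print(round(regress_to_mean(0.320, 0.300, 0.20), 3))  # 0.304
```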


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 4:22 p.m., August 5, 2003 (#8) - tangotiger
  I think I do agree that fielding and park may play a much larger role on flyballs than ground balls, and therefore, we'll see the pitcher's influence relatively less.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 5:22 p.m., August 5, 2003 (#19) - tangotiger
  I really gotta go run, but here you go. I broke up the pitchers with at least 250 PA in both years into "GB" and "FB" pitchers.

I ran the correlation only for the xbhBIP category. The FB pitcher's year-to-year r was .10, while it was .19 for the GB pitchers. Seems to me that park and OF fielders play a big part here.

Ok... I'll do the same for 1bBIP: .13 for GB pitchers, and .15 for FB pitchers. Again, makes sense.

Sorry, but I don't have the breakdown by FBhits, GBhits, though that would be very useful.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 8:49 a.m., August 6, 2003 (#30) - tangotiger
  We are not trying to establish whether a pitcher has a skill, even though others and I say that when we look at the year-to-year correlation.

What we are really saying is "does this particular metric correlate well year-to-year.... and if it does NOT correlate well year-to-year, then we should not be using it as a basis to predict the next year's metric".

So, if we replace the "ability" talk with the "metric's persistence", I think we'd be more accurate.

So, regardless of the extent to which a pitcher has a skill at preventing hits on balls in park, we are saying that:

[Official quote]

the metric "hits per ball in park" has an r of about .20 among pitchers with 500 to 1200 PA, and therefore we need to regress that metric heavily (80% for the group, which may not apply to the individuals to the same degree) if you want to predict next year's metric.

Even having next year's metric still does not tell you about the pitcher's true underlying skill at preventing hits on balls in park. Just to the extent that we can measure this underlying skill, that's our best guess as to the expected outcome of that skill, with a [insert number] margin of error.

It may very well be that if we look at very specific breakdowns by zones, opponent, fielders, park, weather, etc., we CAN ascertain what a pitcher's skill is at preventing hits on balls in play (see: PZR). It's just that, for the moment, the metric called "hits per ball in park" does not do a good enough job of establishing the pitcher's skill with "hits per ball in park". (This would be similar to how ERA, earned runs per 9 innings, does not do a good enough job of establishing a pitcher's skill at allowing "earned runs per 9 innings".)

[End Official Quote]

************
I may be completely wrong, but the numerator is irrelevant to establishing the "strength" of the correlation. Triples/PA for a hitter, I believe, has an r over .50.

Think of it this way. Say I do: x = Triples/PA * 10 + .300, and then say newRate = x / PA. And I did a correlation year-to-year with either Triples/PA or x/PA.... I'm almost positive that my "r" will be identical.

It's the denominator that counts, not the numerator.
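A quick check of the rescaling part of that claim. Pearson's r is unchanged by any positive linear transformation of a variable (multiply by a positive constant, then add a constant), which covers the x = Triples/PA × 10 + .300 example. The triples rates below are hypothetical.

```python
import math


def pearson_r(xs: list, ys: list) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical triples/PA for five hitters in consecutive years.
year1 = [0.004, 0.010, 0.007, 0.012, 0.002]
year2 = [0.005, 0.008, 0.006, 0.011, 0.004]
rescaled1 = [t * 10 + 0.300 for t in year1]
rescaled2 = [t * 10 + 0.300 for t in year2]
print(pearson_r(year1, year2), pearson_r(rescaled1, rescaled2))  # same r, up to rounding
```

This only covers rescaling the same rate; comparing r across different events (singles vs. outs) is a separate question.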


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 10:47 a.m., August 6, 2003 (#32) - tangotiger
  Tom or Tango or Tangotiger is good.... I'm not old enough to be a mister.

Ok, I just ran the following test, and perhaps you can tell me what it means. I took 5 pitchers each with 1000 PA, and I randomly gave them, for each PA, a double, a single, or an out, at the rate of 0.1, 0.2, 0.7.

Therefore, we "know" what their true rates are. And we give them a full season to let their true rates manifest themselves.

Then, I did the same for year 2.

As an example, here are singles allowed, year-to-year, for the first 4 of the 20 pitchers in my group.

203,201

208,192

211,196

199,207

Now, since we know, absolutely know, that it's the same talent rate, then we should be able to explain the "r" based strictly on some statistical principle, probably standard deviation. [I'll let you insert that here.]

Anyway, for these 20 pitchers, here are their year-to-year r
2b: .18, 1b: .47, out: .11

Wouldn't we have expected the out, with the highest numerator, to have the highest r, based on your previous explanation?

Now, what I did for a second test was take the same 20 pitchers, but this time, change their talent rates in the second year. For example, allow a .10 doubles rate in the first year, and make it .08 in the second year for one pitcher, or .12 for another. In essence, I'm trying to change the talent rate of my pitchers year-to-year to try to get a lower "r".

Here were the results of that:
2b: .10, singles: .33, outs: .35

I'm not sure what this means, if anything. Perhaps having only 20 pitchers is really limiting, and maybe I should redo it with 50 or 100 pitchers.

I look forward to your comments...
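A sketch of the first test described above: every pitcher shares the true rates .1/.2/.7 (2b/1b/out) and gets 1000 PA per season. This is a reconstruction of the setup, not the code used in the post; the seed and the larger 1,000-pitcher run are assumptions for illustration. Because every simulated pitcher has identical talent, the two seasons are independent draws, so the expected year-to-year r is zero: with 20 pitchers any single run can land well away from zero by chance, and with many more pitchers it settles near zero.

```python
import math
import random


def pearson_r(xs: list, ys: list) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def simulate_season(pa: int, rates: dict, rng: random.Random) -> dict:
    """Draw each PA independently from the same true rates; return counts per event."""
    events = list(rates)
    draws = rng.choices(events, weights=[rates[e] for e in events], k=pa)
    return {e: draws.count(e) for e in events}


def year_to_year_r(n_pitchers: int, pa: int, rates: dict, seed: int = 2003) -> dict:
    rng = random.Random(seed)
    year1 = [simulate_season(pa, rates, rng) for _ in range(n_pitchers)]
    year2 = [simulate_season(pa, rates, rng) for _ in range(n_pitchers)]
    return {e: pearson_r([s[e] for s in year1], [s[e] for s in year2]) for e in rates}


RATES = {"2b": 0.1, "1b": 0.2, "out": 0.7}
print(year_to_year_r(20, 1000, RATES))    # 20 identical pitchers: r is pure noise
print(year_to_year_r(1000, 1000, RATES))  # many more pitchers: r settles near 0
```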


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 10:48 a.m., August 6, 2003 (#33) - tangotiger
  In my initial comment, "5 pitchers" should read "20 pitchers".


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 1:52 p.m., August 6, 2003 (#38) - tangotiger
  Very very interesting!

A couple of things. First, what you are showing is that, with a pitcher's singles and extra-base skills consistent year-to-year, the correlation among a group of 10,000 pitchers with exactly 1000 PAs each was .46 for singles and .28 for doubles.

These figures are virtually IDENTICAL to what I have presented at the top of this page. That is, GIVEN that a pitcher has a set skill, the best year-over-year r that you can hope for is .46 and .28.

More specifically, the year-over-year r that I have presented is consistent with a pitcher having a skill where the range is between .18 and .22 for singles and .09 and .11 for doubles.

Am I reading this right?

What if you extend this to a range of .15 to .25 for singles, and .05 to .10 for doubles?

Will the larger spread in talent among pitchers allow us to get an r to approach 1?

So, I guess what I'm saying is not that the low r is telling you that you've got little consistency, but rather that the low r is showing that you can only get little consistency, simply because the range of talent is so tight.

And that "tightness" is really what DIPS is all about.

This is tremendous stuff Erik! Keep it up.

Finally, can you also give the "r" for the out, the largest component of them all? I'm still not convinced. My guess is that the further you get from .500, the lower the "r". So, the "r" of the out (which is .2 from .500) should be slightly larger than that of the single (which is .3 the other way from .500).
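The figures Erik reports line up with the standard signal-to-noise (reliability) relationship, which is not something derived in this thread: if true rates have variance σ²_true across pitchers and each season adds binomial noise with average variance E[p(1-p)]/PA, the expected year-to-year r is σ²_true / (σ²_true + E[p(1-p)]/PA). A minimal sketch, assuming talents are uniform on the stated range, the same PA in both years, and true rates unchanged from year to year; the function name is illustrative.

```python
def expected_r(lo: float, hi: float, pa: int) -> float:
    """Expected year-to-year r when true rates are uniform on [lo, hi]."""
    mean = (lo + hi) / 2
    var_true = (hi - lo) ** 2 / 12                   # variance of a uniform distribution
    var_noise = (mean * (1 - mean) - var_true) / pa  # E[p(1-p)] / PA
    return var_true / (var_true + var_noise)


for label, lo, hi in [("singles .18-.22", 0.18, 0.22), ("doubles .09-.11", 0.09, 0.11),
                      ("singles .15-.25", 0.15, 0.25), ("doubles .05-.10", 0.05, 0.10)]:
    print(label, round(expected_r(lo, hi, 1000), 2))
```

The first two ranges give about .45 and .27, close to the .46 and .28 above. Widening the spread raises r (to roughly .84 and .75 for the wider ranges), but with a finite number of PA it never reaches 1.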


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 3:49 p.m., August 6, 2003 (#40) - tangotiger
  I think what we are prepared to say is that:
- given the spread of "true skill rates" of whatever metric you want, you can estimate the expected "r" year-to-year

- using the sample year-to-year results, you will get an "r" for those samples

COMPARING these two "r" values is what establishes the extent to which you can say that a skill exists (in that metric).

So, we can easily have a hits/BIP with an r of .2 and a BB/PA with an r of .7 and in both cases we can say "yes, a pitcher's skill is perfectly represented in those metrics".

It would be good if the Primate statisticians spoke up at this point to add clarity and conviction to what we are saying.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 10:29 p.m., August 6, 2003 (#42) - tangotiger
  To recap, the year-to-year r is dependent on:
1 - how many pitchers in the sample
2 - how many PAs per pitcher in year 1
3 - how many PAs per pitcher in year 2
4 - how much spread in the true rates there are among pitchers (expressed probably as a standard deviation)
5 - possibly how close the true rate is to .5
6 - the true rate being the same in year 1 and year 2

Given all that, the biggest factor in the K "r" being the highest and the XBH "r" being the lowest may be entirely due to #4. That is, the "r" is not explaining #6 anywhere near as much as we think it is.

Someone please slap me awake... it seems that there's about 10,563 Primates that need to give RossCW an apology?!?!?


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 7:31 a.m., August 7, 2003 (#45) - tangotiger
  I think, maybe, that simply the tightness of the h-hr / BIP (over a career) is what is being explained, and not the "persistence" of ability, based on the "r".

For those of us hoping that "r" was trying to find the signal, that's not what it's doing. The h-hr / BIP is too tight to find a signal.

So, we should use a heavily regressed h-hr / BIP, but not for the reason of "lack of control".

I think.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 11:12 a.m., August 7, 2003 (#48) - tangotiger
  I would think that you create a model where you have known fixed talents, with a range equivalent to what you think MLB has (however you do that, but you can try different reasonable scenarios). And figure out the year-to-year "r" based on this model, and the number of BIP these pitchers have. That essentially gives you the "upper boundary" of r, which may be something like .2 or .25 for hits on BIP.

If in actual life the MLB r is .18, well then, that's pretty strong evidence of persistence, right?

I think (again).


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 12:11 p.m., August 7, 2003 (#51) - tangotiger
  Erik, that should be no problem, but there are a couple of issues.

If you've got a pitcher with 800 BIP, chances are that he would be of a certain quality. So, you shouldn't expect a .350 BABIP in this class, based on selective sampling.

If you've got a pitcher with 100 BIP, chances are that the reverse would happen... you'll get lots of .400 BABIP, by luck, and the manager's had enough, and won't put him out there.

That is, I'd expect to find the mean to be different among the classes, and the distribution around them might be skewed based on selective sampling.

I don't know how the spread would be affected.

Send me an email, and I'll send you the file. Unless you need something else, I will give you a file that has:
BIP,1B,2B,3B
for every pitcher by year, 1972-1992, min 100 BIP, and I'll let you select the necessary classes.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 12:11 p.m., August 7, 2003 (#52) - tangotiger(e-mail)
  My email address.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 1:13 p.m., August 7, 2003 (#54) - tangotiger (homepage)
  Erik, if you haven't seen it, click on the above link.
It is the career records of pitchers, relative to their teammates, broken down by career BIP classes.

The skew is very apparent. We just don't know the reason (selective sampling, ability, or both).


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 3:17 p.m., August 7, 2003 (#56) - tangotiger
  Virtually exactly what we expected if we suspected selective sampling.

A couple of points: you should probably at least adjust for the year-to-year league changes in BABIP. Park does play a role, but it's not like pitchers at Dodger Stadium will get more BIP per season than at Fenway. We kinda expect to have 1 pitcher on each team with 900 BIP, one on each with 750 BIP, etc, etc.

What's interesting is that after 600 BIP, you are talking about guys with at least 30 starts. So, it's not like the manager will have suffered with a pitcher for 30 starts and then pulled the plug on him. Essentially, selective sampling should not be an issue at 600+ BIP.

Therefore, the effect we see from the 600 to 999 classes would probably be due to skill more than anything.

From 200 to 600, it's pretty stable, and that's probably also due to great relievers balancing out the starters who couldn't cut it after bad luck.

The impact that we are talking about is that the great pitchers will have a BABIP of .272 against the league average of .282. That's .01 hits / BIP, or 7 hits per 700 BIP. That's really what we'll be talking about, after the dust settles.

The range of skill is so tight among MLB pitchers that there's little to differentiate at this level (with the metrics currently at our disposal).

The conclusions DIPS is showing are still supported; it's just that the statistical justification using the "r" is not applicable to those conclusions (at least not to the extent that we first thought).


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 3:27 p.m., August 7, 2003 (#57) - tangotiger
  Tippett brought this up, and since no one else is going to say anything, I will.

The idea to use the team $H (or BABIP) in place of the player $H is severely flawed (though I have used this process many many times).

Because of what we now know about sample sizes affecting the correlation, it's probable that the reason that the team $H works better than the player $H might be simply due to the team $H being based on a much larger sample (4000 BIP to a pitcher's 500 BIP).

In fact, I would bet that if you randomly took any team $H, and compared that to the next year's pitcher $H, that it would do better than the current year's pitcher $H.

Therefore, if you want to do this "substitution" process to kind of mimic your team's fielders, you should find a pitcher on your team with a similar # of BIP. So, if you've got Steve Rogers with 700 BIP and a $H of .270 and Charlie Lea, with 650 BIP and a $H of .285, then use Lea's $H as your control. I think that would work out better.

If DIPS holds, then we'd expect that the pitcher and his control will have an equal "r" when compared to the pitcher's next year's $H.

If someone wants to do this, you should control for:
- both pitchers being on the same team in year x (each with at least 600 BIP, to kind of circumvent selective sampling issues, and with their BIP within, say, 10% of each other);
- the pitcher being studied also being on the same team in year x+1 (again with at least 600 BIP).

Anyone want to try?
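One way to sketch the matching step, assuming a hypothetical `PitcherSeason` record with pitcher, team, year, BIP and BABIP fields, and reading "within 10%" as relative to the studied pitcher's BIP. The function only builds the triples; the two correlations would then be computed with whatever stats routine you prefer.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    """Hypothetical record layout: one pitcher, one team, one season."""
    pitcher: str
    team: str
    year: int
    bip: int
    babip: float


def control_pairs(seasons, min_bip=600, max_bip_gap=0.10):
    """Yield (studied season x, control season x, studied season x+1) triples.

    "Within 10%" is read here as relative to the studied pitcher's BIP.
    """
    index = {(s.pitcher, s.year): s for s in seasons}
    for s in seasons:
        nxt = index.get((s.pitcher, s.year + 1))
        # Studied pitcher: enough BIP in year x, same team and enough BIP in x+1
        if s.bip < min_bip or nxt is None or nxt.team != s.team or nxt.bip < min_bip:
            continue
        for c in seasons:
            # Control: a different pitcher on the same team and year, similar BIP
            if (c.pitcher != s.pitcher and c.team == s.team and c.year == s.year
                    and c.bip >= min_bip
                    and abs(c.bip - s.bip) <= max_bip_gap * s.bip):
                yield s, c, nxt


# Made-up rows echoing the example above; the year-2 line is invented.
seasons = [
    PitcherSeason("Steve Rogers", "TeamA", 1, 700, 0.270),
    PitcherSeason("Charlie Lea", "TeamA", 1, 650, 0.285),
    PitcherSeason("Steve Rogers", "TeamA", 2, 690, 0.275),
]
pairs = list(control_pairs(seasons))
```

Across many such triples, the DIPS expectation stated above is that the correlation of (studied year x, studied year x+1) roughly equals that of (control year x, studied year x+1).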


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 6:31 p.m., August 7, 2003 (#59) - tangotiger
BIP range   Seasons   Observed SD   Expected SD if all random
200-299      1446      0.032        sqrt(.28*.72/250) = .028
300-399       812      0.0268       .024
400-499       592      0.0245       .023
500-599       507      0.0221       .019
600-699       579      0.0210       .018
700-799       454      0.0204       .016

That is, if the pitchers in the 500-599 group all had the same ability, we'd expect 1 SD = .019, while we observe .0221.

However, as Erik is showing us, to match the observed spread, the pitchers' true talent cannot all be the same (even though my .019 and .022 look so close). We are showing that the standard deviation of the true ability must be about .012, essentially across all samples. This is a great discovery!!

And this .012 is much higher than I would have expected. This means that 95% of the pitchers are within +/-.025 hits / BIP. At 700 BIP, that works out to +/- 17 hits. This is more than double what I would have expected.
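The arithmetic can be checked directly from the table, assuming (as its last column does) a league BABIP of .280, the bucket midpoint as the BIP count, and that observed variance equals true-talent variance plus binomial luck variance, the relation the thread later writes out explicitly.

```python
import math

# (BIP bucket midpoint, observed SD of BABIP) from the table above
BUCKETS = [(250, 0.0320), (350, 0.0268), (450, 0.0245),
           (550, 0.0221), (650, 0.0210), (750, 0.0204)]
LEAGUE_BABIP = 0.28


def luck_sd(bip, p=LEAGUE_BABIP):
    """Binomial SD of a BABIP over `bip` balls in play if everyone had rate p."""
    return math.sqrt(p * (1 - p) / bip)


def implied_true_sd(observed_sd, bip, p=LEAGUE_BABIP):
    """Spread of true talent implied by observed^2 = true^2 + luck^2."""
    excess = observed_sd ** 2 - luck_sd(bip, p) ** 2
    return math.sqrt(max(excess, 0.0))


for bip, obs in BUCKETS:
    print(f"{bip} BIP: luck {luck_sd(bip):.4f}, implied true {implied_true_sd(obs, bip):.4f}")

# 95% band (about 2 SD) of a .012 true spread, in hits over 700 BIP
print(round(2 * 0.012 * 700, 1))  # about 16.8 hits
```

On these inputs every bucket from 300 BIP up implies a true spread of roughly .011 to .012, while the 200-299 bucket comes out near .015. At the 450-BIP midpoint the binomial formula gives about .021 for the luck term.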

The problem is that even though we have this huge gap, we just can't measure it on an individual pitcher's basis with much reliability (until his career is almost over).

Lots to think about....


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 7:37 p.m., August 7, 2003 (#61) - tangotiger (homepage)
  2 things I forgot about: park and fielding.

Go to my site to get park factors. Divide by 2 to simulate half season at a park. Maybe randomly assign a park to your simulated pitchers.

Also assume about 1 sd = .008 hits / BIP for a team of fielders.

Run your stuff again. I think that'll cut your numbers down to half what they are showing.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 10:38 a.m., August 8, 2003 (#65) - tangotiger (homepage)
  Erik,

To simulate park, that's easy enough. Just go to the above link. We see that the stdev for park is .0085. Since they play half their games at home, the "seasonal" park adjustment would be .004.

We definitely have to simulate fielding, but the question is "how"? If I look at team-level UZR on a year-by-year basis (n=120 over 4 years), the stdev is about .0100 (but you need to regress somewhat). If I take it on a multi-year basis (1999-2002, n=30), the stdev is .0070. Since teams do turn over, I'd guess the answer lies somewhere in between, so I'd make that .008. (I'd guess that if you even just used ZR, or any other measure, you'd get similar results.)

If you were to run your simulation where you set the standard deviation of the park to .004 and the fielders to .008, we can figure out what's left over for the pitchers.

Now, you can try running your sim so that fielding is set to .006 or .010 or anything (reasonable) you want really. So, you can say that "if fielding stdev is .006, pitching stdev is .007... if fielding stdev is .008, pitching stdev is .005", or something along those lines.
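A minimal sketch of this kind of sim, under deliberately simplified assumptions: each pitcher gets independent normal draws for his own skill, his fielders, and his park (no shared team structure, no GB/FB split), and hits are binomial on 700 BIP around a .280 mean. You vary `pitch_sd` (and `field_sd`) until the simulated observed SD matches the real one. The closed-form function gives the same answer without Monte Carlo noise, because independent components and binomial luck add in variance.

```python
import math
import random


def simulate_observed_sd(pitch_sd, field_sd=0.008, park_sd=0.004,
                         bip=700, n_pitchers=2000, mean=0.28, seed=7):
    """Observed SD of one-season BABIP across simulated pitchers.

    Assumption: each pitcher's true rate is mean + independent normal draws
    for his own skill, his fielders, and his park; hits are binomial on `bip`.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_pitchers):
        p = (mean + rng.gauss(0, pitch_sd) + rng.gauss(0, field_sd)
             + rng.gauss(0, park_sd))
        hits = sum(rng.random() < p for _ in range(bip))
        rates.append(hits / bip)
    m = sum(rates) / len(rates)
    return math.sqrt(sum((r - m) ** 2 for r in rates) / (len(rates) - 1))


def analytic_observed_sd(pitch_sd, field_sd=0.008, park_sd=0.004,
                         bip=700, mean=0.28):
    """Same quantity in closed form: independent pieces add in variance."""
    luck_var = mean * (1 - mean) / bip
    return math.sqrt(pitch_sd ** 2 + field_sd ** 2 + park_sd ** 2 + luck_var)


# Scan candidate pitching spreads; pick the one whose output matches the real data.
for pitch_sd in (0.003, 0.005, 0.007, 0.009):
    print(pitch_sd, round(analytic_observed_sd(pitch_sd), 4))
```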

This is really exciting! We can finally come up with the proper "split" between fielding, pitching, and park.

My original guess would have been a 4/3/2 split between fielding/pitching/park.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 2:15 p.m., August 8, 2003 (#67) - tangotiger
  1) Both the park effect and the defensive deviations you mention are the observed standard deviations, correct? If so, I would think that the measured stdev is larger than the true stdev as in the general case. I think we can account for this somehow.

The link I have for the park effects on DER covers a 17-year period, or about 80,000 BIP per park. Feel free to regress by whatever your sim says to regress. That is, run your sim giving each team 80,000 BIP, and try to match the observed. I have to believe that you won't regress more than 5%.
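A quick check of the "won't regress more than 5%" hunch, assuming the observed park spread is the .0085 quoted earlier in the thread, a .280 BABIP, and that the only noise is binomial luck over 80,000 BIP:

```python
import math


def regression_fraction(observed_sd, bip, p=0.28):
    """Fraction by which the true SD falls short of the observed SD.

    Assumes observed^2 = true^2 + p(1-p)/bip, i.e. binomial luck only.
    """
    true_sd = math.sqrt(observed_sd ** 2 - p * (1 - p) / bip)
    return 1 - true_sd / observed_sd


print(f"{regression_fraction(0.0085, 80_000):.1%}")  # park spread over ~80,000 BIP
```

On those inputs the regression comes out under 2%, comfortably inside the 5% guess.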

2) I just read the UZR Primer by Mitchel Lichtman. However, he focuses on individual performance, not team performance. Do you have a good article relating to team UZR?

On my site, I have MGL's file by player, pos, team, year. The results I just published were based on this data.

3) It appears that UZR ignores certain outcomes (pop flies?) which would not give credit to pitchers who were able to induce lots of pop flies. I am worried this (or other effects) might give more credit to the defense than is due.

That's a fair point. For every one ground out (as opposed to ground ball), there are 0.7 flyouts and 0.3 line outs and pop outs (about evenly distributed). Again, if you want to set aside a certain percentage of BIP as fielder-independent, that's a good idea too.

4) Has anyone done a comparison of year-to-year correlation of pitchers who remain with the same team, versus pitchers who change teams? This seems like it might provide some insight into how much control a pitcher has.

Yes, and I think the Tippett article also examined that. If I remember, he said the year-to-year r of pitchers who switched teams was .09.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 3:43 p.m., August 9, 2003 (#75) - tangotiger
  I would say that the stdev for team fielding would be .008, as I noted earlier.

I suppose we can break down UZR by "IF" and "OF" and get stdev by that level, as an approximation for "GB" and "FB".

Then, we can use the GB and FB rates of the pitchers to figure out the extent to which the IF or OF is impacting them. (Sid Fernandez would be impacted by the OF or FB stdev more than the IF or GB stdev. So, you give Sid and Gooden et al the same OF stdev, but that stdev would apply to Sid more than anyone else.)

I'll guess that the stdev for GB and FB rates is .04. Virtually all pitchers are within .12 of the league average.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 8:54 p.m., August 10, 2003 (#77) - Tangotiger
  Ok, how about we split up the fielding factors by team by position.

Then, give each pitcher his own ball distribution.

THEN, you can figure out the effect the fielding has on the pitcher.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 10:54 a.m., August 11, 2003 (#78) - tangotiger
  Let me take a few steps back. We're almost to the point that if we start doing all this (accounting for fielding by position and accounting for ball distribution by pitcher), we are really just doing UZR and PZR, and therefore, no need to do this sim analysis.

Therefore, I suggest that to proceed in baby steps, that we assume that the pitchers have the same ball distribution, and that the fielders on the same team are equals as defenders.

Once we get those results in, we can start adding layers like taking into account ball distribution, and individualized fielding.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 2:28 p.m., August 11, 2003 (#80) - tangotiger
  Please note that I meant that each fielder on the same team would be "equals", but that each team of fielders would follow the .008 standard deviation that UZR says it is.

Like I said, start off with baby steps, and work your way up.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 1:57 p.m., August 12, 2003 (#83) - tangotiger
  Erik, SUPERB stuff! I think as a rule of thumb that on balls in play, fielding/pitching are 50/50, based on your analysis in Case 2.

Now, what if Case 1 is more representative? You asked how I got the ".008" as the true expected. In my post #65, I said the following:
If I look at team-level UZR, on a year-by-year basis (n=120 over 4 years), the stdev is about .0100 (but you need to regress somewhat).

Therefore, if you want to rerun to establish the "true rate" behind this "observed rate", remembering that we've got n=120, perhaps we will find that the true-rate standard deviation for fielding is .007 rather than .008. My guess is that if you rerun using .007 for fielding, you'll get .008 for pitching.

In any case, even if you stop here, I think you've added a tremendous amount of knowledge to this.

Our current best-guess is that fielding and pitching are (more or less) equally impactful on BIP.

The revelation about what an "r" really is is also incredibly important to us non-statisticians.

If you were to rewrite your research and analysis, I'd be glad to post it here, or send it to the "home page" of Primer.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 7:18 a.m., August 14, 2003 (#99) - Tangotiger
  To split GB/FB: I mentioned earlier that we can probably use a mean of .50, with a standard deviation of .04. I'll confirm that later.

I will redo the UZR observed calcs, splitting between IF and OF (to approximate GB/FB). I'll guess that we'll get a std dev of .015 observed for each.

For the park, for the IF/GB, I have to believe that the effect is almost all grass/turf. From that standpoint, you would do something like +.002 grass, and -.002 turf or something. If someone wants to look at the DER factors I put up, you can probably make a good guess at that. Maybe.

So, for the OF/FB factors, you'll probably get .006 or .007 for the standard dev.

So, as the next baby step, in addition to the steps Erik has taken, you randomly assign a pitcher a grass or turf park (based on the 1972-1992 teams), you randomly assign him a GB/FB tendency, and you randomly assign him an OF/FB park factor.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 7:19 a.m., August 14, 2003 (#100) - Tangotiger
And this is where Erik now has to split things in 2: based on what his GB/FB rate is, you'd have to give, say, your first pitcher 2000 GB and 2550 FB, and then use the appropriate park and fielding factors for each of GB and FB.
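A sketch of that split, assuming the GB rate is drawn from a normal distribution with mean .50 and SD .04 (the guesses from earlier in the thread), clipped to [0, 1]. The GB and FB BABIP baselines and the IF/OF adjustments (expressed here as additive hits per BIP) are hypothetical caller-supplied parameters; nothing below comes from real park or UZR data.

```python
import random


def split_bip(total_bip, gb_mean=0.50, gb_sd=0.04, rng=random):
    """Split a pitcher's BIP into (ground balls, fly balls).

    The GB rate is drawn from Normal(gb_mean, gb_sd), clipped to [0, 1].
    """
    gb_rate = min(max(rng.gauss(gb_mean, gb_sd), 0.0), 1.0)
    gb = round(total_bip * gb_rate)
    return gb, total_bip - gb


def expected_hits(gb, fb, gb_babip, fb_babip, if_adjust, of_adjust):
    """Expected hits when infield adjustments apply to GB and outfield ones to FB.

    All four rate arguments are hits per BIP; each adjustment would combine
    the park (e.g. grass/turf for GB) and the team's IF or OF fielding.
    """
    return gb * (gb_babip + if_adjust) + fb * (fb_babip + of_adjust)


rng = random.Random(2003)
gb, fb = split_bip(4550, rng=rng)
# Hypothetical inputs, for illustration only.
print(gb, fb, round(expected_hits(gb, fb, 0.29, 0.26, 0.002, -0.004), 1))
```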


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 10:56 a.m., August 14, 2003 (#103) - tangotiger
  Among the 752 pitchers with at least 1000 PA (average of 3,589), the standard deviation to their GB rates was .078.

Among the 183 pitchers with at least 5000 PA (average of 8,106), the standard deviation to their GB rates was .066.

Among the 36 pitchers with at least 10,000 PA (average of 12,705), the standard deviation to their GB rates was .058.

My guess is that if you were to run your "expected" to "observed" sim using these numbers, that you would get the "expected" stdev of .04.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 11:14 a.m., August 14, 2003 (#104) - tangotiger
  The standard deviation, observed, over 120 team-years:

whole team: .008
IF only: .010
OF only: .013

Take the rest of this with a grain of salt, since I had to make some assumptions. Anyway, by position, over 120 team-years, here are the stdevs:
2b/ss: .015
3b/lf/cf/rf: .023
1b: .030

Now, what can you do with this information, besides what we've talked about? Well, you can FINALLY answer the question: is fielding talent at a position independent on a team level? That is, do teams seeing that they have a bad SS counter that with a great 2B? Or, are the talents at the positions randomly distributed?

Well, once Erik or someone confirms what the "expected" stdev is based on these true rates at a position level, you can then see if using these values as independent variables will match the observed at either the IF/OF level or at the team level.

My guess is that teams DO treat positions rather independently.

Should be fun to find out...


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 11:31 a.m., August 14, 2003 (#105) - tangotiger
Actually, Arvin, I'm assuming "league average" for everything else. For example, I already published the DER park factors over the 21-year span. The standard deviation (50% home, 50% road) was .004. I'm assuming that over that many years and BIP, the observed and expected would come in at pretty much the same thing.

A pitcher is assigned a random draw from this DER park-factor distribution, and when that is applied to the pitcher's expected DER rate and his sample (say BIP=600), the result will match the observed DER rates (which I sent to Erik).

So, now we're extending that. We're saying that a pitcher will have a random GB rate, drawn from a distribution whose observed stdev is .06 and whose "true" stdev is probably .04. We split his BIP into 2. Instead of the DER park factor, we use the IF or OF DER park factor. Since the observed stdev at the IF/OF level is around .011 or so, the expected might be .006.

etc, etc...


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 9:50 a.m., August 15, 2003 (#110) - tangotiger (homepage)
  Excellent ball distribution data can be found at the above. Essentially, this is what I used, even though it doesn't exactly correspond with the same time period.

I did not park adjust any of the figures I supplied. I will rerun my standard deviations by pos to make sure I've done it correctly.

Yes, I use UZR for everything.

The .010 was the observed standard dev for n=120, and .008 was the "agreed expected true", which you can confirm using your process.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 10:41 a.m., August 15, 2003 (#111) - tangotiger
  Ok, reworking my numbers to match the Levitt numbers, this is what I get for standard deviations. n=120.

Both: .009
IF: .013
OF: .013

rf: .020
2b: .020
ss: .021
lf: .022
cf: .026
3b: .031
1b: .032

(Doing a weighted average of the above, and we get a value of .024. I think for ease, we should consider the standard deviation on a per-position basis to be the same and equal to .024. Erik, it's your time, so do whatever you figure you can handle.)

These standard deviations are all observed and need to be sim-ed or calculated to determine the "true rates".

There's something that looks strange with the Levitt numbers. For example, the BABIP against SS comes out to .066, while against CF it's .454.

I do know that the BABIP for GB and FB are more or less similar (about .030 off, with the value lower against OF). But the Levitt numbers show a BABIP of under .100 for IF and over .500 for OF. Do some balls hit into the OF count as GB? Is this what I'm missing?


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 11:15 a.m., August 15, 2003 (#112) - tangotiger
When I think about it, each of those positions needs to be regressed a different amount, since the number of opportunities differs by position. So, first, we'd have to do the sim process to get to the true rates for each position. THEN, we can do a weighted average if we want a uniform true rate to use.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 3:58 p.m., August 15, 2003 (#115) - tangotiger
  Arvin,

I don't know what you are talking about! Please re-explain.

What I am saying is that for the *park factors* DER, those splits were based on 21 years of data, comprising about 80,000 BIP per team. So, if I say that Fenway is +.020 hits / BIP compared to a non-Fenway park, my guess is that this observed difference will be pretty darn close to whatever "true" difference would produce it over 80,000 BIP. Are we talking about the same thing here?

As for UZR, just think about ZR or DER instead. We are simply talking about how many extra outs a fielder makes / BIP. The standard deviation, on the observed team-level data (n=120) is .010. Broken down by position, the observed standard deviation (n=840) is .024.

If you regress a certain amount, or calculate the "True" rate using this sim process, my guess is that the true standard deviation that produces those observed figures would be .008 for team fielding and .015 for positional fielding.

Mike: interesting. Can you provide the "plays,outs,hits in zone" (however you want to define it), split by GB/FB, by position for any year that you have it?


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 8:26 p.m., August 16, 2003 (#118) - Tangotiger
  .010 is the observed stdev, which we simmed (or mentally regressed) to .008.

Arvin, is that a statistical equation? Because it is brilliant and simple! Pythag move over, make way for the Arvin theorem.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 10:49 p.m., August 16, 2003 (#119) - Tangotiger
  Arvin's theorem is intriguing. For example, I mentioned that the observed stdev for IF and OF was .013, and for the team it's .009 (according to my post 111).

Let's see what happens with this new equation, and realizing that half the BIP are IF and half are OF (let's say).

Observed team ^ 2 = [(.013/2) ^ 2] + [(.013/2)^2] = .009 ^ 2

Wow!

How about if we use the .024 for each of the 7 positions? Following the same process, and we get: .009!

Holy moley!
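Arvin's relation is the standard rule for a weighted sum of independent pieces: the variance of the total is the sum of (share x SD) squared. A small helper, assuming equal shares of BIP per group (half IF and half OF, or 1/7 per position) as the post does:

```python
import math


def combined_sd(sds, shares):
    """SD of a share-weighted sum of independent components.

    variance = sum((share * sd) ** 2), the relation used above.
    """
    return math.sqrt(sum((share * sd) ** 2 for sd, share in zip(sds, shares)))


print(round(combined_sd([0.013, 0.013], [0.5, 0.5]), 4))   # IF and OF: about .0092
print(round(combined_sd([0.024] * 7, [1 / 7] * 7), 4))     # 7 positions: about .0091
# The by-position figures from post #111, still assuming equal 1/7 shares
by_pos = [0.020, 0.020, 0.021, 0.022, 0.026, 0.031, 0.032]
print(round(combined_sd(by_pos, [1 / 7] * 7), 4))          # about .0095
```

All three land close to the .009 observed at the team level, which is what independence across groups would predict.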

Now, if you want to really impress me, tell me how to get from the observed stdev to the true stdev. That is, how much do I regress towards the mean, given the sample size? Do I make it k/sqrt(n)? How do I know what to set k to?


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 11:17 p.m., August 17, 2003 (#121) - Tangotiger
  Ah, got it now. Tremendous stuff.

So, to go back to my team-level data first: I showed an observed stdev of .009 for my 120 teams, each of which has about 4500 BIP. In your equation above, is n=120, n=120x4500, or n=4500? If n=120, then how do you account for a team having 4500 BIP or 62 BIP?


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 9:45 a.m., August 18, 2003 (#123) - tangotiger
  Good stuff again.

So, we have

.090^2 = .28*.72/4500 + true^2

that makes the true std dev at the team level as: .090

Actually, even after only 450 BIP, the true stdev rate comes in at .087.

Am I doing this right?


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 9:49 a.m., August 18, 2003 (#124) - tangotiger
  Oops... that should be .009. Working it out again, and we get: .006

So, that's the fielding.

.012 ^2 = .006^2 + .004^2 + pitching^2

pitching = .010

So, are we saying that each pitcher has a .010 stdev, each team of fielders is .006?
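The same two steps in code, assuming a .280 league BABIP for the luck term, 4500 BIP per team-season, and the .012 total and .004 park figures used above:

```python
import math

LG_BABIP = 0.28


def true_sd(observed_sd, bip, p=LG_BABIP):
    """Remove binomial luck: true^2 = observed^2 - p(1-p)/bip."""
    return math.sqrt(observed_sd ** 2 - p * (1 - p) / bip)


fielding = true_sd(0.009, 4500)  # team fielding, ~4500 BIP per team-season
# trueDER^2 = pitching^2 + fielding^2 + park^2, solved for pitching
pitching = math.sqrt(0.012 ** 2 - fielding ** 2 - 0.004 ** 2)
print(round(fielding, 4), round(pitching, 4))
```

This prints about .006 for fielding and .0096 for pitching, matching the rounded .006 and .010 above.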


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 10:05 a.m., August 18, 2003 (#125) - tangotiger
  Continuing in the same vein:

true team fielding ^2 = true 1b fielding ^ 2 + true 2b fielding ^ 2...

.006 ^ 2 = [(t/7)^2] * 7

(That is, each position is on average getting 1/7th of the plays, and there are 7 positions. See post 115 for more info.)

t = .016 = true avg single fielding position
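Written out, assuming seven fielding positions that each get 1/7 of the plays and have independent, equal true spreads t:

```latex
\sigma_{\text{team}}^2 = \sum_{i=1}^{7}\left(\frac{t}{7}\right)^2 = \frac{t^2}{7}
\quad\Longrightarrow\quad
t = \sqrt{7}\,\sigma_{\text{team}} = \sqrt{7}\times .006 \approx .0159
```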

So...... the true standard deviation for a single position is about .016. The true standard deviation for pitchers is .010. So, on any given BIP, the fielder has more influence than the pitcher.

On a group of BIP, the pitcher has more influence than the team of fielders.

Anyway, since we know that the range of fielders' UZR runs is about +/- 30 runs (and since we know that their stdev is .016), I would guess that pitchers with a stdev of .010 would have a range of +/- 20 runs. That is probably our best guess as to the influence of the pitcher on BIP.

Just taking a wild guess, but if the range is +/- 20 runs, then 1 stdev is probably +/- 6 runs. So, we expect say 95% of pitchers to be +/- 12 runs.

Since our best interpretation of BABIP shows that a pitcher's skill is about +/- 8 runs, for 95% of them, then the BABIP is not a good enough metric to capture the real skill that a pitcher has on the influence of BIP.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 10:14 a.m., August 18, 2003 (#126) - tangotiger
  Sorry for the continuous posts, but I'm writing faster than I'm thinking. That last step you should ignore, as it uses different denominators.

Anyway, since we've established for fielders that 1 stdev is .016, and if they average about 650 plays each, that gives us 1 stdev = 10 plays per season, or about 8 runs. That's 1 stdev for fielding runs for an average fielding position.

For pitchers, 1 stdev is .010. The average full-time starter will have 700 BIP, and the average reliever will have 200 BIP. For the starter, .010 stdev on 700 BIP = 7 plays per season, or about 6 runs. That's 1 stdev for pitching runs for an average full-time starter. That means 95% of pitchers have a skill at preventing hits on BIP to the tune of 12 runs per 700 BIP.
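The conversion from a per-BIP SD to plays and runs, assuming roughly 0.8 runs per play (the ratio implied by "10 plays per season, or about 8 runs" above) and the opportunity counts the post uses:

```python
RUNS_PER_PLAY = 0.8  # assumption: ratio implied by "10 plays ... about 8 runs"


def sd_plays_and_runs(rate_sd, opportunities, runs_per_play=RUNS_PER_PLAY):
    """Convert a per-BIP standard deviation into plays and runs per season."""
    plays = rate_sd * opportunities
    return plays, plays * runs_per_play


fielder = sd_plays_and_runs(0.016, 650)   # average fielding position
starter = sd_plays_and_runs(0.010, 700)   # full-time starter
reliever = sd_plays_and_runs(0.010, 200)  # typical reliever
for label, (plays, runs) in [("fielder", fielder), ("starter", starter),
                             ("reliever", reliever)]:
    print(f"{label}: 1 SD = {plays:.1f} plays, {runs:.1f} runs; 2 SD = {2 * runs:.1f} runs")
```

For the starter this gives 7 plays and about 5.6 runs per SD; the post rounds that to 6 before doubling to 12.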

I believe that our current interpretation of BABIP is that 95% of pitchers have a skill to the tune of 8 runs per 700 BIP, but I'll have to look that up again.

Bottom line? Pitchers have the skill, not as much as an individual fielder (60/40 split on a single BIP), but they have more skill than a team of fielders (60/40 split the other way for a season of BIP). And the BABIP metric is not good enough to capture this skill.

Which is why we need PZR to find their skill.

Tremendous work by Erik and Arvin!!


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 1:50 p.m., August 18, 2003 (#127) - tangotiger
  Erik, Arvin, and anyone else who has contributed to this thread: I was thinking of doing a writeup of this entire thread as an article, hopefully citing everyone's work at the appropriate places. This has really been an eye-opener for me, and perhaps having a (hopefully logically ordered) detailed summary of the really incredible work by Erik and Arvin would be the reference point for DIPS going forward. (Arvin, I've already got Erik's email, so please send me your email address.) I don't know about anyone else, but I call this a sabermetric orgasm!


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 7:08 a.m., August 19, 2003 (#131) - Tangotiger
  I think you're right, but I've got to think about it for it to sink in. Makes sense though...


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 11:10 a.m., August 19, 2003 (#134) - tangotiger
  Yes, I think I am agreeing with you. Since our assumption is based on fielding talents on single fielders only, I think we can stick with .006 (though again, I don't see this being impacted to a significant degree if we look at SS to 1B throws, or 2B to SS DPs, etc).

So, what we are saying is that we have a 10/6/4 split between pitching/fielding/park, in that order. Luck plays a part, and that is dependent on the sample size. When n=1, it's almost all luck. When n=1 million, luck is not involved.

So, over 700 BIP, where we observed a .020, we have the following:

observed ^ 2 = .010 ^ 2 + .006 ^ 2 + .004 ^2 + luck ^2 = .020 ^ 2
solving for luck = .016

So, can we say that when a starter has 700 BIP, the influence on those BIP as a group can be broken down by:
luck : 44%
pitch: 28%
field: 17%
park : 11%
??
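Reproducing the breakdown, assuming the .010/.006/.004 true SDs above and the .020 observed at 700 BIP. The percentages in the list match each SD divided by the sum of the four SDs, which is what the code computes (shares of variance would come out differently). The last line compares the solved-for luck term with the pure binomial SD at 700 BIP and a .280 BABIP.

```python
import math

pitch, field, park = 0.010, 0.006, 0.004   # true SDs from the thread
observed = 0.020                           # observed SD at ~700 BIP
luck = math.sqrt(observed ** 2 - pitch ** 2 - field ** 2 - park ** 2)

parts = {"luck": luck, "pitch": pitch, "field": field, "park": park}
total = sum(parts.values())
# Each component's SD as a share of the summed SDs
shares = {name: round(100 * sd / total) for name, sd in parts.items()}
print(round(luck, 4), shares)

# Sanity check: pure binomial luck on 700 BIP at a .280 BABIP
binomial_luck = math.sqrt(0.28 * 0.72 / 700)
print(round(binomial_luck, 4))
```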

I have to admit that I've recently said, though I don't remember where, that I thought the split would be 40/30/20/10 with the order being luck,fielding,pitching,park.

What we are saying here is that pitching and not fielding is the larger determinant between the two. And perhaps before I read about DIPS I might have had the correct order.

I think it's still important that, yes, we need to separate the components (HR, BB, K) from the BIP, as Voros does. But the conclusions drawn from that do not stand on this reasoning.

I think our best conclusions would be as follows:
1 - pitching has more impact on BIP than does fielding
2 - luck has more impact than anything, over 700 BIP
3 - BABIP is not a good enough measure for the pitcher's skill

What would be interesting is that if MGL or Tippett or someone with pbp data gets around to implementing the PZR blueprint I published (the flip side to UZR), that we'll get closure on this subject. That is, we should be able to get the standard deviations on the pitcher's side that will support the data we are inferring here.

So, before we trample in any direction, it may be worthwhile to keep the case open, pending final data. After all, we may have made a serious miscalc somewhere.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 11:46 a.m., August 19, 2003 (#135) - tangotiger
  Someone asked me about the implication of all past DIPS work. I responded the following:

=====================
I'm not really sure of the impact. It's still a blur
to me as to what use to make of it.

What we are saying is that there are 2 components for
a pitcher: his non-fielding dependent skill (HR, BB, SO)
and his fielding-dependent skill.

We know very well how to estimate the former, and not
very well the latter. Since the BABIP figure is not
reliable for an individual pitcher, it's more accurate
to use say 50% lg, 40% team, 10% pitcher to estimate
his expected BABIP. But, that estimate will come with
a very wide margin for error.

The conclusion stands that you need to separate
things, and you can't rely on a pitcher's past BABIP
to predict the future (much like you wouldn't use his
ERA). Still outstanding is WHAT to use for BABIP.
I'll contend that PZR would be that measure. But,
that has yet to be implemented by anyone.
=====================


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 2:47 p.m., August 19, 2003 (#137) - tangotiger
  Actually, the observed should have been
600-699 579 0.0210
700-799 454 0.0204

So, at 700 BIP, I should have used .0207. Reworking, and we get a nearly perfect match.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 5:02 p.m., August 19, 2003 (#140) - tangotiger
  Arvin,

Please note that each individual fielding position has a true standard deviation of about .016, and an observed of .024, for n=120.

The .006 figure is the TEAM standard deviation for fielding.

So, on a player basis, it's .016 x 650 (or 10). On a team basis it's .006 x 4500 (or 27).


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 5:04 p.m., August 19, 2003 (#141) - tangotiger
  Also note that if each team of 7 fielders were independent and randomly chosen, you would get
team ^ 2 = fielder ^2 x 7

And, 10 ^ 2 x 7 = 700
27 ^ 2 = 729

Close enough.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 8:58 a.m., August 20, 2003 (#143) - tangotiger
  I just want to interject something to keep in mind. Remember, our equation is

trueDER ^ 2 = truePitch^2 + trueField^2 + truePark^2

Erik has provided trueDER from his sim, and Arvin has confirmed it with his "observed" equation, and that is .012. I've provided the truePark figure as .004. Dropping all the decimals, and our equation becomes

128 = truePitch^2 + trueField^2

Based on UZR, which I'll have to go over because I'm not sure I'm using the right numerator (Levitt's numbers might include HR), the observed single position UZR is around .025 and the observed team fielding UZR is around .010. So, our true UZR will be somewhere between .005 and .008, probably.

We're not even sure that UZR is the best thing to use, but it is the best thing available at the moment. (You could even use ZR, and I'm pretty sure you'll get a single position observed stdev of .025 for your regular players. This is easy to eyeball since the range of players is mostly around +/-.05 outs/BIP, so that would be 2 standard deviations.)

Anyway, so we've got something like
128 = truePitch^2 + [5 to 8]^2

So, when fielding = 8, pitching = 8.
When fielding = 5, pitching = 10.

etc, etc.

So, depending on how the fielding measure is determined and manipulated, a small change there will have a huge impact in the relative value between fielding and pitching.
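Here is a small sketch of the variance split above, in thousandths of a hit per BIP. It solves for the pitching SD implied by each fielding SD in the 5 to 8 range and shows how sensitive pitching is to the fielding figure.

```python
import math

# trueDER^2 = truePitch^2 + trueField^2 + truePark^2, in thousandths
true_der = 12    # .012
true_park = 4    # .004

remaining = true_der ** 2 - true_park ** 2   # 144 - 16 = 128

def true_pitch(true_field):
    """Pitching SD implied by a given fielding SD (same units)."""
    return math.sqrt(remaining - true_field ** 2)

for field in (5, 6, 7, 8):
    print(field, round(true_pitch(field), 1))
```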



Exactly How Full of S is OPS? (August 6, 2003)

Discussion Thread

Posted 3:08 p.m., August 7, 2003 (#5) - tangotiger
  I mentioned this in my OPS article, but that doesn't matter.

Value = (OBP - baseX) * 3 + (SLG - baseY) * 1
Value = 3*OBP + SLG - 3baseX - baseY
Value = 3*OBP + SLG - whatever

As you can see, it's irrelevant what baseX, baseY, or whatever equals. The DIFFERENCE among the players will remain exactly the same.

Try it out. Take a 400/500 player and a 300/600 player, and take whatever baseline for OBP and SLG you want. The difference between Value(player1) and Value(player2) will be exactly the same.
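Trying it out in code: the function below is the linear value from the post (3 parts OBP to 1 part SLG above arbitrary baselines). The baseline pairs are arbitrary examples; the gap between the two players comes out the same for all of them.

```python
def value(obp, slg, base_x, base_y):
    """Linear value from the post: 3 parts OBP to 1 part SLG, above baselines."""
    return (obp - base_x) * 3 + (slg - base_y) * 1

player1 = (0.400, 0.500)
player2 = (0.300, 0.600)

# Arbitrary baselines; the difference between players should not move.
baselines = [(0.0, 0.0), (0.300, 0.400), (0.330, 0.420)]
diffs = []
for base_x, base_y in baselines:
    d = value(*player1, base_x, base_y) - value(*player2, base_x, base_y)
    diffs.append(d)
    print(base_x, base_y, round(d, 3))
```

The baselines only contribute the constant "- 3baseX - baseY", which cancels when you subtract one player from another.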


Hoban - A player ranking (August 8, 2003)

Discussion Thread

Posted 1:33 p.m., August 8, 2003 (#4) - tangotiger
  Hoban was kind enough to reply back to my email, pointing him to UZR, which he is aware of.

I replied:
=========
In order to evaluate Jeter, Nomar, and ARod, you would be forced to use play-by-play data, since that would contain all the data available for the players that you are evaluating.

The less data you decide to use, the larger the margin of error being introduced.

If the intent is to compare ARod to Ozzie to Honus Wagner, it is not clear that using the same data would be the best way. What you can say is that, by using UZR, ARod is +10 runs relative to average, +/- 3 runs, and by using the basics, Honus Wagner is +12 runs relative to average, +/- 8 runs (or something along those lines).

While I see the point in using the same metric for all players, you would at least have to use the most advanced metric to baseline the more basic metrics against. The validation of the basic version must be done against the more advanced metrics, with a certain margin of error.

Just my 2 cents.
============


Hoban - A player ranking (August 8, 2003)

Discussion Thread

Posted 9:13 a.m., August 12, 2003 (#6) - tangotiger
  I posted this at Clutch:
Just because you compute 5*fld% + RF or whatever does NOT make one thing 5 times more important. This is the same kind of explanation that people give about OPS.

Because these things are on different scales, the multiplier does not and cannot imply importance.

What that multiplier does is streeeeetch out the range of performance.

If, for example, fld% stretches from .950 to .990, that's a .040 swing. If RF stretches from 4.00 to 6.00, that's a 2.00 swing.

By multiplying fld% by 5, you are increasing the swing from .040 to .200. The RF swing is still much much larger.

I don't know what Hoban's exact equation is, but, please keep this in mind when you think of "importance" and matching it to the "multiplier".
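A sketch of the "stretch" argument with the illustrative ranges from the post: even after multiplying fld% by 5, its swing is a tenth of the RF swing, so RF still drives most of the combined score.

```python
# Illustrative ranges from the post.
fld_low, fld_high = 0.950, 0.990
rf_low, rf_high = 4.00, 6.00

fld_swing = fld_high - fld_low        # .040
rf_swing = rf_high - rf_low           # 2.00
weighted_fld_swing = 5 * fld_swing    # .200 after the x5 multiplier

print(f"fld% swing x5: {weighted_fld_swing:.3f}, RF swing: {rf_swing:.2f}")
print(f"RF swing is {rf_swing / weighted_fld_swing:.0f}x the weighted fld% swing")
```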



Pitch Counts, estimated (August 8, 2003)

Discussion Thread

Posted 4:10 p.m., August 11, 2003 (#1) - tangotiger
  Based on the above estimate file, here is how many pitches were thrown per team per game, by decade.

decadeStart pitchesPerGame
1890 134
1900 129
1910 133
1920 135
1930 137
1940 137
1950 139
1960 139
1970 139
1980 140
1990 143
2000 144

As you can see, the counts are increasing by about 1 pitch per decade. So, I don't think we can say that it's harder for a pitcher to get a complete game these days because more pitches are thrown. I think it simply comes down to pitchers today being held to much tighter pitch limits per start, even though, on a per-season basis, today's pitchers have pitched as much as pitchers of any era other than the 70s.
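To put a number on "about 1 pitch per decade", here is an ordinary least-squares slope through the table. A straight-line fit is my choice of summary, not necessarily how the estimate was made; it comes out a little above 1 pitch per game per decade.

```python
# Pitches per game per team, by decade (from the table above).
decades = [1890, 1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000]
pitches = [134, 129, 133, 135, 137, 137, 139, 139, 139, 140, 143, 144]

# Ordinary least-squares slope, with x measured in decades since 1890.
n = len(decades)
x = [(d - decades[0]) / 10 for d in decades]
mean_x = sum(x) / n
mean_y = sum(pitches) / n
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, pitches)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
print(f"about {slope:.2f} extra pitches per game per decade")
```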


Pitch Counts, estimated (August 8, 2003)

Discussion Thread

Posted 3:16 p.m., August 13, 2003 (#3) - tangotiger
  It's fun for me too.

One thing that I didn't show was the number of pitchers, per year, with a pitch count of at least 4000. By far, since 1919, the 1969 to 1975 time period shows the largest number of pitchers at that level. From 1989 to the present, it's by far the lowest.

Number of starts has something to do with it. The increase in the number of pitchers in the bullpen might be another reason.

And I agree, it's not like those workhorses were getting injured like crazy in the 70s.

If it was me, I'd go back to the 1970s style of starter and reliever usage.


Pitch Counts, estimated (August 8, 2003)

Discussion Thread

Posted 7:05 a.m., August 14, 2003 (#5) - Tangotiger
  The earliest pitch counts I have are for Koufax, and those haven't changed.

However, why SHOULD they change, be it 1911 AL, or 1976 College? Think about it. The rule is 4 balls 3 strikes and 2-strike fouls. If you end up with 75% contacted balls, 15% Ks, and 10% walks, don't you think the ball-strike progression to get to those observed results would be similar regardless of league?

If Koufax, Feller, Walter Johnson, or RJ are all at 60% contacted balls, 30% Ks, 10% walks, again, could the approaches of the batter and pitcher be so different that the ball-strike progression for each pitcher would be completely different?

But, like I said, this is mostly theoretical. I'll be getting the Dodgers 1947-1964 data soon, so we'll see how that stacks up.


Pitch Counts, estimated (August 8, 2003)

Discussion Thread

Posted 10:27 a.m., August 14, 2003 (#6) - tangotiger
  Remembering that I did NOT use Koufax as part of my sample to establish my equations, here is how Koufax stacks up, through 1964:

Actual pitches thrown: 26,450
Estimated, xPCE: 26,785
Estimated, basic: 26,300

So, the xPCE is 1.3% too high, and the basic is 0.6% too low.

So, if I say that Steve Carlton averages say 4,300 pitches/season, I'm probably within 100 pitches/season of being accurate.

Of course, getting the pitch count totals for games of yesteryear would be great to validate against.
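The percentage errors quoted above, computed from the Koufax totals as a quick check:

```python
actual = 26_450
estimates = {"xPCE": 26_785, "basic": 26_300}

# Signed percentage error of each estimate relative to the actual count.
pcts = {}
for name, est in estimates.items():
    pcts[name] = (est - actual) / actual * 100
    print(f"{name}: {pcts[name]:+.1f}%")
```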


Pitch Counts, estimated (August 8, 2003)

Discussion Thread

Posted 3:55 p.m., August 14, 2003 (#7) - tangotiger
  By the way, I'm not saying that pitches are thrown at the same rate, whether on a pitches/PA or /IP or /game basis. What I am saying is that the function of pitches/PA is dependent almost entirely on the rate at which balls are put into play. So, if you have an era where most balls were put into play, then the pitches/PA would be lower than otherwise.



Psychological Impact of Losing an Easy Game (August 9, 2003)

Discussion Thread

Posted 10:34 a.m., August 10, 2003 (#2) - Tangotiger
  I believe that the link underlying that thread points to a study that does that.



BP - Sample size and park factors (August 11, 2003)

Discussion Thread

Posted 7:50 p.m., August 11, 2003 (#4) - Tangotiger
  Patriot, I agree with you.

Depending on what you're after, sometimes run PFs are what you want, and other times component PFs.


BP - Sample size and park factors (August 11, 2003)

Discussion Thread

Posted 10:43 p.m., August 11, 2003 (#6) - Tangotiger
  FJM: can you calculate the odds that a team of pitchers that gives up 10% of their hits as HR would give up 12.5% at home and 7.5% on road, given 400 hits in each, by random chance alone?


BP - Sample size and park factors (August 11, 2003)

Discussion Thread

Posted 1:41 p.m., August 12, 2003 (#8) - tangotiger
  So, combining the two, it's 1 chance in 460 of having those rates occur by luck, correct? Seems like something's up, especially if you go back to the history of Dodger Stadium where I presume that the split would usually be the other way.
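One way to frame the question asked of FJM, as a sketch: treat the true HR/H rate as exactly .10, count 400 hits at home and 400 on the road, use exact binomial tails, and assume home and road are independent. We don't know exactly how the 1-in-460 figure was computed, so this version need not reproduce it; it should land in the same general range.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, p = 400, 0.10                     # 400 hits, true HR rate of 10% per hit
p_home = 1 - binom_cdf(49, n, p)     # P(at least 50 HR), i.e. 12.5%+ at home
p_road = binom_cdf(30, n, p)         # P(at most 30 HR), i.e. 7.5% or less on road
joint = p_home * p_road              # assumes home and road are independent
print(f"home {p_home:.3f}, road {p_road:.3f}, about 1 in {1 / joint:.0f}")
```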


BP - Sample size and park factors (August 11, 2003)

Discussion Thread

Posted 7:38 p.m., August 12, 2003 (#13) - Tangotiger
  FJM: the 10% was simply a historical figure that I like to use.


Neyer - Angels (August 15, 2003)

Discussion Thread

Posted 1:43 p.m., August 15, 2003 (#2) - tangotiger
  That "32" should be "34".


Neyer - Angels (August 15, 2003)

Discussion Thread

Posted 3:48 p.m., August 15, 2003 (#6) - tangotiger (homepage)
  I agree you can't just say take the last 3 years, and ignore the rest.

The homepage is Erstad's br.com page. His career OPS+ (park-adjusted OPS) is 100, or league average.

If you were to weight his OPS on a 5,4,3,2,1,1,0.5 basis from 2002 back to his rookie year, his weighted OPS would be: 96.

So, that's pretty darn close to average. Also, don't forget all the little things he does, which I mentioned, that do not show up in OPS or the "boxscore".

Any way you cut it, our best guess is that Erstad was an average hitter entering 2003. He has 280 PAs in 2003.
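The 5,4,3,2,1,1,0.5 weighting is just a weighted mean with the most recent season first. A minimal sketch follows; the function name is my own, and the example line of OPS+ values is purely hypothetical, not Erstad's actual record.

```python
def weighted_average(values, weights):
    """Weighted mean; values and weights listed most recent season first."""
    if len(values) != len(weights):
        raise ValueError("need one weight per season")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Weights from the post: 2002 first, back to the rookie season.
WEIGHTS = [5, 4, 3, 2, 1, 1, 0.5]

# Hypothetical OPS+ line, for illustration only.
example_ops_plus = [90, 95, 100, 105, 110, 100, 100]
print(round(weighted_average(example_ops_plus, WEIGHTS), 1))
```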


Neyer - Angels (August 15, 2003)

Discussion Thread

Posted 5:04 p.m., August 15, 2003 (#8) - tangotiger
  Actually, just because you perform at +40 runs over average doesn't mean that is your talent level. It's more likely he is +30 runs over average.

Anyway, how can you figure it out at home? Assume there are 3.5 plays per game available to a CF, and the avg CF makes 3 outs on them (rate of .857). How much would a great and bad CF make? Let's guess .900 on the top end and .800 on the bottom end. So, essentially, a great defender will be +.05 hits / BIP better than average. In this case, that works out to about +.2 hits per game, or about 32 hits per season.

Each hit-to-out is worth .80 runs, and so, the top defender would be worth about +25 runs in this example.

If the great defender is +.06 hits / BIP, then, he would be +30 runs.

For all intents and purposes, I think that +30 is the upper boundary for a CF, and realistically, +20 is the upper boundary for a CF's career.
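The same back-of-the-envelope calculation as a function. The 162-game season is my assumption (it matches .2 hits/game becoming about 32 hits). Without rounding .175 hits/game up to .2, the results come out a bit lower than the figures in the post: about 23 runs at +.05 and about 27 runs at +.06.

```python
def cf_runs_above_average(extra_outs_per_bip, plays_per_game=3.5,
                          games=162, runs_per_hit=0.80):
    """Hits and runs saved per season by a CF who turns extra_outs_per_bip
    more of his chances into outs than average. games=162 is an assumption."""
    hits_per_game = extra_outs_per_bip * plays_per_game
    hits_per_season = hits_per_game * games
    return hits_per_season, hits_per_season * runs_per_hit

for delta in (0.05, 0.06):
    hits, runs = cf_runs_above_average(delta)
    print(f"+{delta:.2f}/BIP: {hits:.0f} hits, {runs:.0f} runs")
```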


Neyer - Angels (August 15, 2003)

Discussion Thread

Posted 3:53 p.m., August 18, 2003 (#11) - tangotiger
  No, each position should have its own converter, based on how many extra base hits are hit in his zone. I believe Chris Dial may have published this somewhere, or maybe it was MGL?

Pretty much, I think, it was between .75 and .85 for every position. So, to not complicate matters, I like to use .8.



Game-Calling Revisited - Chris Dial (August 16, 2003)

Discussion Thread

Posted 1:08 p.m., August 17, 2003 (#2) - Tangotiger
  No, it is based only on those parts that are mostly a catcher-pitcher relationship (or those things that do not also depend on the fielders). I don't think Chris should even have brought DIPS into play, or any of that other stuff.


New postseason odds (August 17, 2003)

Discussion Thread

Posted 11:28 a.m., August 19, 2003 (#6) - tangotiger
  It's worth noting that BP and Dackle offer very close odds for everything except the 2 central divisions.

The differences between approaches are:
BP: head-to-head matchups are not treated as binary (a win for one team is not forced to be a loss for the other)
Dackle: does not do a good job at valuing a team's true talent level

What does this mean? As long as the number of games remaining is large enough, the BP estimates are more reliable. However, as soon as you've got, say, 20 games left, the BP estimate would have to be discarded, and Dackle's estimates would take precedence. You can't allow, as BP does, for the Yanks and Redsox to both win the same game and hope it evens out over such a small number of remaining games.


Pankin - Walking Bonds - SABR presentation (August 17, 2003)

Discussion Thread

Posted 4:12 p.m., August 21, 2003 (#3) - tangotiger (homepage)
  Aurilia on 3rd, Grissom at 1st. 1 out, bottom of the 9th, tied at 1

Let's look at some win probabilities, assuming that you've got an average opponent, and you yourself are also average. (I'm guessing Smoltz was there, but I suppose we have a great pitcher for the Giants as well? Maybe not.)

Anyway, bottom 9th, tied, 1 out and:
men on 1b/3b: .829
bases loaded: .835

uhmmm, I said "don't walk?"... let me see. According to the link above, I'm saying don't walk any time you have a runner at 1b or 3b with 1 out. Kinda strange, so let's look into it some more.

If Bonds gets a hit, game over. If Bonds gets a regular walk, he barely gains anything (.006 wins). If he gets an out, the Giants win prob goes down to .643.

So, a hit adds +.171 wins, an out drops -.186 wins, and a walk adds .006 wins.

You know, that really doesn't make any sense. The win prob cannot be .829, it must be much lower. In this case, it's very easy to figure out.

When a hit wins the game, the win prob has to satisfy:
freq(H) * (1 - winprob) = freq(out) * (winprob - .643)
(assuming that the walk is almost irrelevant)

That sets our winprob at .762.

This is really strange, I must have programmed something seriously wrong.

Thanks for pointing this out to me...


Pankin - Walking Bonds - SABR presentation (August 17, 2003)

Discussion Thread

Posted 10:58 p.m., August 21, 2003 (#5) - Tangotiger
  This is really strange, and I'm going to post my thoughts on the matter tomorrow.

The win probs that I have listed are correct, and I have one "sure-fire" way of doing them, and a second "fail-safe" way to verify them. But I can't verify them for this 9th inning scenario.

As soon as Aaron gave me the situation (man on 1b/3b, 1 out, tied, bottom of 9th), this was such an easy "walk now" situation (since whether the runner is on 1b or 2b barely matters) that it really stunned me that I said "don't walk".

Anyway, I'll give more details tomorrow, and maybe one of the clever Primates can point out my flaw.


Pankin - Walking Bonds - SABR presentation (August 17, 2003)

Discussion Thread

Posted 10:09 a.m., August 22, 2003 (#7) - tangotiger
  Let's figure out how to calculate run expectancy (RE). Given the following:
- safe play occurs 33% of the time
- RE AFTER a safe play is 1.10 runs
- RE AFTER an out play is 0.30 runs

What is the RE BEFORE this PA?

RE = .333 x 1.10 + .667 x .30 = .5667

So, we can also say:
- LWTS value of safe play = 1.10 - .5667 = +.533
- LWTS value of out play = .30 - .5667 = -.2667

And
.333 x .533 = .667 x .2667
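The same RE arithmetic with exact fractions (using exactly 1/3 and 2/3 rather than .333 and .667). It shows the RE before the PA, the linear-weights value of each outcome, and that the probability-weighted values cancel to exactly zero.

```python
from fractions import Fraction as F

p_safe = F(1, 3)
p_out = 1 - p_safe
re_after_safe = F(110, 100)   # 1.10 runs
re_after_out = F(30, 100)     # 0.30 runs

# RE before the PA is the probability-weighted RE after it.
re_before = p_safe * re_after_safe + p_out * re_after_out

# Linear-weights value of each outcome = RE after - RE before.
lw_safe = re_after_safe - re_before
lw_out = re_after_out - re_before

print(float(re_before), float(lw_safe), float(lw_out))
print(p_safe * lw_safe + p_out * lw_out)   # weighted values balance out
```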

With me?

Ok, now let's try to do this with Win Expectancy.

What's the chance of scoring at least one run (and thus winning the game), with a man on 1b, 3b, 1 out, bottom 9th, tie game, assuming lg average opponents?

You have a 66% chance of scoring your run IN THIS INNING, and winning the game. The other 34% of the time, you go to extra innings, where you have a 50/50 shot at winning.

So,
WE (bottom 9th, tied, man on 1b/3b, 1 out) = .66 + .34x.50 = .830

Doing the same thing
WE (bottom 9th, tied, man on 1b/3b, 2 outs) = .28 + .72x.50 = .640

Ok, so those are our known true WE.

Now, doing the same as we started with RE:
- 26% chance of hit or RBOE = WE of 1.00
- 9% chance of walk = WE of .840
- 65% chance of out = WE of .640 (maybe less cause of DP)

So,
WE = .26 x 1.00 + .09 x .840 + .65 x .640 = .752

But, we expected .830

Can someone figure out where I'm going wrong?
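A sketch of both WE calculations using the post's numbers. The last line solves for the average WE after an out that would make the event breakdown match .830. It comes out around .76, far above the .640 of a plain "runners stay put, 2 outs" state, so outs must often still score the run (sac flies, grounders), as the follow-up post points out.

```python
def we_from_scoring(p_score_this_inning, p_win_extras=0.5):
    """WE when one run this inning wins; otherwise a coin flip in extras."""
    return p_score_this_inning + (1 - p_score_this_inning) * p_win_extras

we_1out = we_from_scoring(0.66)   # 1b/3b, 1 out
we_2out = we_from_scoring(0.28)   # 1b/3b, 2 outs

# Event breakdown from the post: hit/ROE ends the game, walk loads the bases.
p_hit, p_walk, p_out = 0.26, 0.09, 0.65
we_walk = 0.840
we_by_events = p_hit * 1.00 + p_walk * we_walk + p_out * we_2out

# Average WE after an out that would reconcile the two approaches.
implied_we_after_out = (we_1out - p_hit * 1.00 - p_walk * we_walk) / p_out

print(round(we_1out, 3), round(we_2out, 3),
      round(we_by_events, 3), round(implied_we_after_out, 3))
```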


Pankin - Walking Bonds - SABR presentation (August 17, 2003)

Discussion Thread

Posted 10:12 a.m., August 22, 2003 (#8) - tangotiger
  After I posted that, it finally hit me: sac flies! The WE can actually increase substantially, following an out, and my Bonds program looks like it did not consider the SF properly.


Pankin - Walking Bonds - SABR presentation (August 17, 2003)

Discussion Thread

Posted 6:57 p.m., August 22, 2003 (#10) - Tangotiger
  Yup, the win prob tables I published did consider the SF and grounders scoring the runners, etc.

But, my walk/don't walk Bonds did not balance that properly. Kinda embarrassing really. It's easy enough to fix, maybe 1 or 2 hours of work. Not sure when I'll do it though, but I'll target the 1st game of the playoffs.


Pankin - Walking Bonds - SABR presentation (August 17, 2003)

Discussion Thread

Posted 10:56 p.m., August 23, 2003 (#12) - Tangotiger
  Sorry, but I need the pbp event files to do that. I'll only be able to look at this in the off-season.



Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 7:15 a.m., August 19, 2003 (#2) - Tangotiger
  Greg, thanks for the kind words, as I wake up from a night with a crying baby. I don't think any of the stuff I did is a great advance... maybe I'm making a better hammer, a better tool, but UZR or DIPS is a building by comparison.

I think the next great advance would be to get "tools-based" analysis done properly. There's a ton of information in the heads of scouts that needs to be extracted and quantified in a more systematic and widespread way. Of course, this may already be done by others, and we just don't know about it.


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 9:01 a.m., August 19, 2003 (#4) - tangotiger
  I've actually been working on #1. If you send me an email, I'll let you know what I've done.

Your #2 and #3 are great ones. I think the #3 will be a huge advance if we can also incorporate scouting.


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 11:19 a.m., August 19, 2003 (#7) - tangotiger
  I meant more along the lines of, say, Brady Anderson or others with great 600 PA numbers, but not great 1800 PA numbers.

Essentially, and this happens in hockey too, you give a player a 4-year contract based on his first breakout year. And how did they know it was a breakout year? They were hoping/looking for it, and those 600 PAs confirmed it for them.

Unless something fundamental has changed about a hitter's or a pitcher's approach, a breakout year is virtually impossible to find. They exist, but I don't think you can find them until well after the fact. 1997 may have been a breakout year for Pedro, in that maybe everything finally came together for him at that time. But we only know that in 1999 or 2000, if we use only the numbers.

Was 1988 a breakout year for Mark Davis? Well, he did even better in 1989, and then phhft. If you take a systematic look, I would be surprised if you can find a breakout year, based only on the numbers.

Your visual tools might have spotted Pedro's breakout in 1997, and you might have realized that Mark Davis might have been getting lucky in 88/89.


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 12:33 p.m., August 19, 2003 (#9) - tangotiger
  It's interesting that we've now got 2 readers saying such a thing, that this advance is 20 years old.

Let me ask a question then: all you know is the following 2 bits of information
- Johnny Walker has an OBA of .380, with 600 PA
- the league OBA is .340

What is your best guess as to JW's true OBA talent level? That is, if he were to have 1 million PAs, what's your single best guess as to his true OBA level? Are his chances of really being .380+ equal to, more than, or less than 50%?


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 2:52 p.m., August 19, 2003 (#12) - tangotiger
  Ed, this is where we will have to disagree. Our best guess is that it is less than .380.

A player's performance numbers are a sample of his true talent. It is observed.

If I had 1000 such players, regression towards the mean would say that as a group, they would move towards .340. Therefore, if the group moves towards the mean, more than half have to move toward the mean (unless you think these moves are so asymmetric that we can't say such a thing).

Little Johnny Walker: .380 OBA, 50 PAs
League: .340 OBA

What's your best guess there?
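A small simulation of the "1000 such players" argument, under loudly stated assumptions. True OBA talent is drawn from a normal distribution centered on the .340 league mean with a hypothetical spread of .030. Each player gets 600 PA, with sampling noise approximated as normal. Then we look only at players who observed about .380. Under these assumptions their average true talent sits well below .380 (roughly the .368 guessed a few posts later), and most of them are true sub-.380 hitters.

```python
import random
import statistics

random.seed(1)

LEAGUE_OBA = 0.340
TRUE_SD = 0.030   # hypothetical spread of true OBA talent
PA = 600

selected_true = []
for _ in range(200_000):
    true_oba = random.gauss(LEAGUE_OBA, TRUE_SD)
    # Normal approximation to binomial sampling noise over 600 PA.
    noise_sd = (true_oba * (1 - true_oba) / PA) ** 0.5
    observed = true_oba + random.gauss(0, noise_sd)
    if 0.375 <= observed <= 0.385:   # players who "hit .380"
        selected_true.append(true_oba)

mean_true = statistics.mean(selected_true)
share_below = sum(t < 0.380 for t in selected_true) / len(selected_true)
print(len(selected_true), round(mean_true, 3), round(share_below, 2))
```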


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 3:31 p.m., August 19, 2003 (#15) - tangotiger
  Rally, yup it would have to be under .380. And you are right, the second part, the degree of movement would be dependent on the number of PAs.

For 600 PAs, I'd guess a 30% regression on OBA, or .368. For 50 PAs, I'd guess an 80% regression, or .348. Just some guesses, and a little ingenuity really. Each component has its own regression factor based on # of PAs.
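The regression step itself is a single line: move the observed rate a given fraction of the way toward the population mean. This sketch uses the guessed fractions above; later in this thread the fraction is tied to 1 - r, where r is the year-to-year correlation for that stat at that number of PAs.

```python
def regress(observed, population_mean, regression):
    """Move an observed rate `regression` of the way toward the population mean."""
    return observed - regression * (observed - population_mean)

print(round(regress(0.380, 0.340, 0.30), 3))   # 600 PA guess
print(round(regress(0.380, 0.340, 0.80), 3))   # 50 PA guess
```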


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 3:36 p.m., August 19, 2003 (#16) - tangotiger
  I agree that James did do a good job of it. But I'm not so sure he understood WHY he was doing it, nor do I think the stat fan really understood the implications of all this. Even fans on this site will quote you the 11-for-20 Shawn Green has against Bacardi Rum, as if that means something of any significance.

James also "invented" replacement level, so I'm not sure what Gary H was talking about with Woolner's advances on that. Comparing the two (James inventing replacement level and regression towards the mean, the extra knowledge that Woolner has added, and the extra knowledge about regression that we've all gained from the statisticians around), I think regression was the bigger advance.

Anyway, sorry for manipulating the topic. Any other advances, or advances-to-come?


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 8:50 p.m., August 19, 2003 (#21) - Tangotiger
  I get these insulting messages every now and then, so I suppose I should address them every now and then.

Off the top of my head, I linked to a BP article regarding sample size and the Redsox and BIP and I praised them for taking the time to take a full paragraph to explain the limitations.

Nate had an excellent piece on replacement level, and I praised that too.

Sheehan had an article on pitcher workloads, and I can't remember what I said, but I think I was complimentary.

So, that was probably in the last 2 or 3 months.

Over at Clutch, I linked to the Jack Morris reprint article of Joe Sheehan. I remember another link to Craig Biggio, and then another to Andruw Jones too.

I'm pretty sure I had a direct link back to the article.

Oh yeah, I think I linked to a Randy Johnson quote that Ryan Wilkins had, but I don't remember if I did link to them.

I've made I think 2 links to Keith Scherer on ball and strike counts, though I distinctly remember at least one.

Ok, so I didn't put a direct link in the title to Brooks Kieschnick, but I did reference it in my comment, and it was not even germane to BP. I just liked the idea of dual players. And same here. I just like the question that the BP reader had. Do I have to link to the whole chat?

And finally, I make some tongue-in-cheek comments about "unnamed authors" at BP with their TP series (no bylines). So, I intentionally didn't name Gary in my opening piece as a play on that. But, who cares anyway?

Now, is that satisfactory to you?

In terms of BP and Primer referencing each other, I have to say that there are 100 BP links from Primer to every 1 Primer link from BP. If you want to ask about policy, don't ask me.

Now that you've p-ssed me off, and taken 10 minutes out of my life, I'm going to play with my kid now. Why do I waste my time? Sheesh...


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 10:13 a.m., August 20, 2003 (#24) - tangotiger
  But the forecasts for t+1 are not the same thing as the best guess of the true values for individual players. How could they be? If there is measurement error or other random errors that lead us to believe that one set of 600 PA is only an estimate of a player's true value, we should believe the same thing about any set of 600 PA, right?

Actually, the way I worded the question, by saying "1 million PAs", was a way to say "don't worry about random variations that will exist in all PAs, as they will cancel out, at least to 3 significant digits, after 1 million PAs." I actually don't know if 1 million PAs will give me something less than +/-.0005 99.999% of the time, but whatever "1 million" should be, that's what I meant. I should have used the safe Austin Powers "100 billion".

Even if you knew that his true OBA was .368, that doesn't mean that in the next 600 PAs he will perform at .368, just that his performance will center around that, with some distribution around it. Just like flipping a coin.

Anyway, I'm using "true rate" and "future estimated rate, with five 9s, within .0005" interchangeably, even though, technically that is inaccurate.


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 12:00 p.m., August 20, 2003 (#26) - tangotiger
  I don't think you are wrong on anything.

In essence, the more data you have the more able you are able to discern the true ability from past performance.

I agree with this statement especially.

And for the regression towards the mean, I agree that the larger your group, the better you will be able to estimate the group's true mean by applying the appropriate regression factor, but as the group gets smaller and smaller (down to a group of 1), your confidence in that regression becomes smaller and smaller. While you may guess that that single player's OBA would be .368, that's really just an average estimate that is centered around .368, that could be between .310 and .440, with various probability rates for each point. In fact, even that exact point (.3680000000) is impossible.

The likelihood is that JW IS a below .380 true hitter, but I might only be [insert appropriate number here]% sure of that.

And like you mentioned, the more data you have (whether more "n", or more description of each "n"), the more reliable your estimate can be.


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 12:41 p.m., August 21, 2003 (#28) - tangotiger
  A player will regress 100% towards his true mean.

A group of players will regress [insert number]% toward the population mean, from which they were drawn.

Regression towards the mean is the second case, and not the first.

We are trying to estimate the group mean as best we can by looking at the observed mean of the group, the observed mean of the population, and regressing a certain amount based on other factors (correlation between the two samples of the population).

If we already knew the individual player's true mean, we wouldn't need regression towards the population mean. We already know his true mean.


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 11:01 p.m., August 21, 2003 (#31) - Tangotiger
  His 2003 performance is a sample of his true mean.

We can never know a player's true mean. We just know that every single day he is performing at his true talent level, which differs day-to-day, based on his conditioning, state of mind, etc.

I was just pointing out that regression is towards the population mean.

There's no need to think of regression towards his own mean, since, by definition, he will always play to his own mean.


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 7:38 a.m., August 22, 2003 (#33) - Tangotiger
  To answer David directly, yes it was silly of me to say that you regress 100% to your own mean, and I probably made that more confusing than it was. My post 31 hopefully makes that clearer.

As for the Carroll statement, I'm reading and reading it, and I'm not sure what he's after there. That there's a large luck component to getting injured, and that other than your personal history and maybe position, there's not much more to it than that? Probably.


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 9:59 a.m., August 22, 2003 (#35) - tangotiger
  9791 PA is not enough, and our best guess is that his true OBA is a little less than .482, probably regressed 5 to 10% towards the population mean, or .470 to .475 or so.

A player will always play to his true mean for every play, and this mean will be different play to play. As his sample number of plays approaches infinity, his average performance level in those plays will approach his average true talent level over that time span of plays.

So, I should never have said the "100% towards his own mean". I just meant the above paragraph.

So, a player himself does not regress towards the population mean, but we regress his sample performance towards the population mean to infer as a best guess what his average true talent level was over that time span.


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 4:03 p.m., August 22, 2003 (#37) - tangotiger
  The regression factor would be different for various events. For example, with OBA, the year-to-year r might be .70 for 600 PA, and therefore, you would regress 30% towards the mean. But, the year-to-year r might be .50 for BA, and so you regress 50% towards the mean. Each metric has its own regression factor.

My guess is that at 10,000 PAs, the OBA needs to be regressed between 3 and 10%, while the BA needs to be regressed between 5 and 15%, towards the population mean.

Rob, I'm looking forward to your results.


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 6:51 p.m., August 22, 2003 (#39) - Tangotiger
  We regress a player's sample performance mean towards the population mean as an estimate of his true talent mean, such that we minimize the error for the group.

A player's sample performance was done at discrete points over a certain amount of time (days, years, etc). His true talent level at those discrete points was not constant (since he is human... except maybe Bonds).

So, given how he performed, how you can account for his context, and how the population (i.e., the average player) does in that context, you're now ready to regress a group of players similar to your player a certain amount towards the mean.

*****
A true .400 player, over 10,000 PAs will have a stdev of .005. So, that true known .400 player will be between .390 and .410 95% of the time over 10,000 PAs (hopefully I'm doing this right, going from true to observed and not the other way around.... been a long day for me too).

Get 100 times more PA, and the standard deviation shrinks by a factor of 10, so that .005 becomes .0005. So, at 1 million PAs, your true talent and your performance levels converge (to within about +/- .001, 95% of the time).

I think.
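The binomial standard deviation behind these figures, as a sketch: it shrinks with the square root of the number of PAs, and the 95% range is roughly 1.96 standard deviations either side of the true rate.

```python
import math

def oba_sd(true_rate, pa):
    """Binomial standard deviation of observed OBA around a known true rate."""
    return math.sqrt(true_rate * (1 - true_rate) / pa)

for pa in (600, 10_000, 1_000_000):
    sd = oba_sd(0.400, pa)
    print(f"{pa:>9} PA: sd {sd:.4f}, 95% range +/- {1.96 * sd:.4f}")
```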


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 10:39 p.m., August 22, 2003 (#40) - Tangotiger
  To confirm the above numbers, I ran a sim, where I gave my true .400 OBA player 10,000 PAs each season for 600 seasons. The standard deviation was .00495.

Our expectation was sqrt(.4*.6/10000) = .00490

So, 10,000 PAs is not enough to say that performance ~ true talent.
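A sketch of a sim like the one described: a true .400 OBA player gets 10,000 PAs a season for 600 seasons, and the spread of his seasonal OBA is compared with sqrt(p(1-p)/n). The seed is arbitrary, so the simulated figure will wobble a little around .0049 rather than match the .00495 above exactly.

```python
import random
import statistics

random.seed(2003)

TRUE_OBA = 0.400
PA_PER_SEASON = 10_000
SEASONS = 600

season_obas = []
for _ in range(SEASONS):
    on_base = sum(random.random() < TRUE_OBA for _ in range(PA_PER_SEASON))
    season_obas.append(on_base / PA_PER_SEASON)

observed_sd = statistics.stdev(season_obas)
expected_sd = (TRUE_OBA * (1 - TRUE_OBA) / PA_PER_SEASON) ** 0.5
print(round(observed_sd, 5), round(expected_sd, 5))
```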


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 4:54 p.m., August 25, 2003 (#45) - tangotiger
  Yes! You can use anything you want really, as long as you specify your criteria not based on the numbers you are measuring. That is, is Sosa a power hitter because the variable you are studying says so about him?

But, yes, you can select RF who swing hard and are bulky and make that your representative population, and draw Sammy from there.


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 10:12 a.m., August 26, 2003 (#49) - tangotiger
  Rob, great stuff! I can confirm that between 500 and 2000 PA, those results are in line with empirical results of year-to-year r, with the regression towards the mean being 1-r. Great!


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 11:20 a.m., August 26, 2003 (#51) - tangotiger
  I think they only apply to OBA, if OBA is dependent only on the batter's skills. Therefore, I think you have to regress a little more towards the mean with OBA, a lot more with BABIP, etc, etc.

For SLG, it's more complicated. You have to have successes/trials, and SLG won't work that way.



Solving DIPS (August 20, 2003)

Discussion Thread

Posted 12:09 p.m., August 20, 2003 (#2) - tangotiger
  I'm actually exhausted Erik! I've been putting off doing some other baseball stuff for 2 weeks, and I'm really happy with the way this thread unfolded.

Feel free to run more sims, but I don't think they're necessary at this point. They might be valuable if you do more breakdowns, like with GB/FB and Lefty/Righty and by the 7 fielding positions, etc. I think Arvin's equation works on independent variables, but I don't think that would apply here.

But, I think your work and Arvin's should shake things up!


Solving DIPS (August 20, 2003)

Discussion Thread

Posted 4:04 p.m., August 20, 2003 (#4) - tangotiger
  It's an interesting thought, and I'll pass it along to the group.

One thing I did a few months ago was to clean up my website by ordering things, so that it's more useful. Of course, since then, I've added a few more articles, but I have not updated the index to point to them. Story of my life. My wife has been after me to update the pictures of my baby on our personal site. I'm 7 months behind on that too.

Many many times I've thought about doing a "best of" kind of deal, and putting things in one place. But like you are alluding to: time/money/work/family is a tough thing to balance.

I agree though that it's nice to have everything in one place, and I think that within 1 year, maybe less, I'll have consolidated everything I've done into something organized, if not in PDF/book format, at least in a "finalized" fashion.

Thanks for the idea!


Solving DIPS (August 20, 2003)

Discussion Thread

Posted 10:26 p.m., August 20, 2003 (#7) - Tangotiger (homepage)
  FJM: check out the above link. It lists the UZR for all players, min 120 games over 4 years. Maybe you can take that, bring up the threshold to 240 games or 300 games or something, and run your thingie again. I'd like to see the results against UZR.

I agree that there is greater variability at 1b,3b than ss,2b. I think my numbers bore that out (.022 or something for ss and .027 or something for 3b). It's reassuring that ZR showed something similar, but a bit higher (which we'd expect because ZR includes the park factor, and pitcher tendency/handedness effect, which UZR strips away).

Anyway, just eyeballing the UZR chart, things do look normal, but I agree that you would expect positions that don't tolerate bad defense to have a different skew. 3b is a neutral-type position, so we should expect no skew and a wide variance. At 1b we expect the skew opposite to SS.

Good stuff!

Andrew: thanks for the offer. I'm not sure what can be done. I usually work on impulse, and have a habit of leaving a lot of things unfinished.


Solving DIPS (August 20, 2003)

Discussion Thread

Posted 7:49 a.m., August 21, 2003 (#9) - Tangotiger(e-mail)
  I can send you the annual Team UZRs. Send me an email.

To translate runs into a rate stat, you divide by the number of plays per year at that position. For example, I think I set 1B at 2x162 and 3B at 4x162.

(Actually, I kind of fudge a little: if a SS makes 3 of the 21 outs on BIP, and there were 28 BIP, I give him 4 "plays". It kinda keeps things in line, since each BIP doesn't belong to any one fielder.)
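Here is a sketch of both steps. It assumes only the two denominators mentioned above (1B at 2x162 and 3B at 4x162 plays). The other positions' values aren't given here, so the dictionary is deliberately incomplete. The function names are hypothetical.

```python
GAMES = 162

# Plays per season used to turn runs into a rate (values from the post; others unknown).
PLAYS_PER_SEASON = {"1B": 2 * GAMES, "3B": 4 * GAMES}


def runs_per_play(runs, position):
    """Convert a fielder's runs saved into a per-play rate."""
    return runs / PLAYS_PER_SEASON[position]


def fudged_plays(fielder_outs, team_bip_outs, team_bip):
    """Scale a fielder's BIP outs by the team's BIP-per-out, since no BIP
    belongs to a single fielder (e.g. 3 of 21 outs on 28 BIP -> 4 plays)."""
    return fielder_outs * team_bip / team_bip_outs


print(fudged_plays(3, 21, 28))  # 4.0
print(round(runs_per_play(10, "3B"), 4))
```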


Solving DIPS (August 20, 2003)

Discussion Thread

Posted 10:54 a.m., August 21, 2003 (#11) - tangotiger
  Hmmm... the batter. From the perspective of the pitcher, the true variance of the batter, and any random element, would be zero (I think I'm saying that correctly). Even something as substantial as the park has a stdev of .004, barely making a dent in the equation.

I don't think it's an issue in this case.


Solving DIPS (August 20, 2003)

Discussion Thread

Posted 2:24 p.m., August 22, 2003 (#12) - tangotiger (homepage)
  Guys,

I just wanted to thank you all for this thread again. It's been a very big eye opener for me, and I enjoyed tremendously the work that Erik and Arvin especially put in, as well as the different perspectives of everyone who posted. This may have been the only DIPS thread where it was truly a pleasure to read everyone's posts.

I don't think I will be doing a summary of this summary. If someone would like to do it, feel free to jump in.

I've been trying to "bend the wand" for about a month now, but this great DIPS work really reeled me in. And other things that I've read on other topics around (like at battersbox and Clutch) have conspired to pull me in further.

Anyway, looks like the only way for me to stop procrastinating is to go cold turkey. So, after this weekend, I won't be stopping by for a while, or reading anything else online. If someone wants me to post some links in Primate Studies, I'll be glad to do so, but I won't offer any of my thoughts on the matter. I'll be back in time for the World Series in a limited capacity.

MGL and I have talked about maybe starting a site to preview our research, so maybe we'll have something worked out by then. You can join the group at the "homepage" link above to be on the mailing list.

Thanks again guys.... truly fun to talk with all of you.

Tom


Solving DIPS (August 20, 2003)

Discussion Thread

Posted 6:37 p.m., August 25, 2003 (#16) - Tangotiger
  Remember the equation:

True variance (DER) = True variance (pitching) + True variance (fielding) + True variance (park) + True variance (hitting) + True variance (fill in the blanks)

We know that the true variance is .012 for DER. My guess is that the true variance for hitting, from the perspective of the pitcher, is close to zero.

I'm pretty sure this is how we are supposed to look at it, but I'll defer to the statisticians.
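Because independent variances add, you can work out how much room is left for the other components. This is a minimal sketch under three assumptions. First, the .012 DER figure and the .004 park figure from the earlier post are both standard deviations, so they must be squared before subtracting. Second, the components are independent. Third, hitting contributes roughly zero, as guessed above.

```python
from math import sqrt


def remaining_sd(total_sd, known_sds):
    """SD left for the unlisted components, assuming independent components
    whose variances add."""
    leftover_var = total_sd ** 2 - sum(sd ** 2 for sd in known_sds)
    if leftover_var < 0:
        raise ValueError("known components exceed the total variance")
    return sqrt(leftover_var)


# Assumption: .012 (DER) and .004 (park) are SDs; hitting is taken as 0.
print(f"{remaining_sd(0.012, [0.004, 0.0]):.4f}")  # 0.0113
```

Taking out the park's share only moves the leftover SD from .012 to about .0113. That is the "barely a dent" point in numbers.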


Solving DIPS (August 20, 2003)

Discussion Thread

Posted 11:01 p.m., December 26, 2003 (#18) - tangotiger
  This article is this week's "Oprah's Book of the Week", and required reading for anyone who missed it.


Solving DIPS (August 20, 2003)

Discussion Thread

Posted 9:50 a.m., December 27, 2003 (#20) - tangotiger
  I'm drawn by the intelligence of the readers here... it's my vice. But, yes, I am once again (third time now?) wondering whether to take a break.



Making (some) sense of RBI (August 20, 2003)

Discussion Thread

Posted 3:09 p.m., August 20, 2003 (#1) - tangotiger (homepage)
  The battersbox article is at the homepage above.

Rereading it, I see that Elias Bureau does virtually the same thing as I do, though I'm not sure how they handle the HR issue.


Making (some) sense of RBI (August 20, 2003)

Discussion Thread

Posted 10:45 p.m., August 20, 2003 (#4) - Tangotiger
  (1) Where do you get the data for #1 and #2.

All Ray Kerby, my lord and saviour.... uhmmm, my saviour anyway. The query is easy. Put "basesit:outs" in the key field, and "n pa rbi hr" in the output field, and "1999-2002" in the years field. 2 minutes later, you look prolific.

(2) If it's already written up, what is the laborious process for controlling (b) and (c)? [Skip this if it's not written up]

It's not written up. For (b) you figure out the following. Say you have a guy with 600 PA, 150 H, 50 2b, 30 HR, 25 BB, etc, etc... how many RDI would this guy get if he were to get a normal number of opps in each of the 24 base out states? (You have to figure how many RDI a double with man on 1b/2b, 1 out gets for the league, etc, etc.) So, what that gives you is "if my player performed the same across the 24 base-out states, how many RDI would he have gotten?" How many did he get? The difference is his clutch. For (c) you have a little tougher time. How often did he have Raines at 1b and 1 out? How does a speedster like this do when a double is hit? etc, etc. Kinda complicated, but something along those lines. You may also want to check the Tom Ruane article on Joe Carter at www.baseballstuff.com

If one were to try and build upon this work and determine if there are skills for Ichiro's magic ...if the HR has a contextual value it still must be considered?

I'm only considering the base/out state first of all. If you hit a HR, regardless of the base/out state, that's 1 run. If you happen to hit a HR with 2 men on, that's 2 RDI. So, you'll get "credit" for the RDI with the timely HR. You just don't need to get the credit for driving yourself in in a "timely" situation, since driving yourself in has nothing to do with the base/out state.

I ask this not to be a smart ass, but instead to understand and or extrapolate my worldly view on baseball. If one were to accept that (1) some batters were to change their approach on the AB based on the dynamics of the game; (2) they have differing performance parameters based on their approach.

There's no question that every batter/pitcher matchup is unique based on the context (inning/score/base/out/park, etc, etc). Therefore, absolutely everyone changes their approach to some degree, each hoping to perform optimally, and most likely everything cancelling out. But not quite, and certainly not in all instances.

We do know that certain base/out situations, like certain counts, are "hitter's states" and "pitcher's states". So, batters can leverage say the bases loaded 0 outs situation. Maybe they overcompensate, etc. A look at the league totals at each base/out state will show you the direction that these matchups go.

I do believe in all that. I don't believe we can say with much reliability who does what though. We just know general tendencies as to what they are probably doing. Clutch ability exists, but is more elusive to find than a pitcher's skill at preventing a hit on BIP. You won't be able to find the numbers to state at a high enough statistical significance that a player is clutch, whether he is Eddie Murray or Manny Ramirez. In fact, once you look at Murray by the 8 base states (and count the SF as a regular out... VERY IMPORTANT!), you will see that Murray's entire clutch value is when the bases are loaded. You can probably show some significance there, but I don't think anywhere else.

I hope I answered your issue, even though I went off to some other place!


Making (some) sense of RBI (August 20, 2003)

Discussion Thread

Posted 10:47 p.m., August 20, 2003 (#5) - Tangotiger
  how many RDI would this guy get if he were to get a normal number of opps in each of the 24 base out states? (You have to figure how many RDI a double with man on 1b/2b, 1 out gets for the league, etc, etc.)

Sorry... I said that wrong. How many RDI would he get if he performed the same in all baseout states, given his actual opps in the baseout states?
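Here is a rough sketch of the (b) calculation as restated here. Apply the player's own context-free event rates to his actual PA in each base-out state, weight each event by the league's RDI for that event in that state, and compare the total with what he actually drove in.

Everything in it is hypothetical:
- Only four of the 24 states are shown.
- The league RDI values and the player's rates are invented.
- `expected_rdi` is an illustrative name.

Following the post, a HR counts only the runners on base, not the batter driving himself in.

```python
# Hypothetical league RDI per event, by (bases, outs). Real values would come from
# play-by-play data for all 24 base-out states. HR RDI = runners on base only.
LEAGUE_RDI = {
    ("___", 0): {"1B": 0.00, "2B": 0.00, "HR": 0, "BB": 0.0, "OUT": 0.00},
    ("1__", 1): {"1B": 0.01, "2B": 0.40, "HR": 1, "BB": 0.0, "OUT": 0.00},
    ("12_", 1): {"1B": 0.60, "2B": 1.30, "HR": 2, "BB": 0.0, "OUT": 0.02},
    ("__3", 0): {"1B": 1.00, "2B": 1.00, "HR": 1, "BB": 0.0, "OUT": 0.55},
}

# Hypothetical player: overall per-PA event rates (sum to 1) and actual PA by state.
player_rates = {"1B": 0.17, "2B": 0.05, "HR": 0.03, "BB": 0.08, "OUT": 0.67}
player_pa = {("___", 0): 150, ("1__", 1): 60, ("12_", 1): 25, ("__3", 0): 15}


def expected_rdi(rates, pa_by_state, league_rdi):
    """RDI the player would get if he performed the same in every base-out state,
    given his actual opportunities in each state."""
    total = 0.0
    for state, pa in pa_by_state.items():
        rdi_per_pa = sum(rate * league_rdi[state][event]
                         for event, rate in rates.items())
        total += pa * rdi_per_pa
    return total


actual_rdi = 21  # hypothetical
expected = expected_rdi(player_rates, player_pa, LEAGUE_RDI)
print(f"expected {expected:.1f}, actual {actual_rdi}, clutch {actual_rdi - expected:+.1f}")
```

Part (c), adjusting for who was on base (a Raines at 1b versus a slow runner), would need runner-level data and isn't attempted here.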


Making (some) sense of RBI (August 20, 2003)

Discussion Thread

Posted 9:15 a.m., August 21, 2003 (#6) - tangotiger (homepage)
  By the way, I would only take this RBI thing so far.

If you want to get serious about it, I suggest you click the homepage link above. That was written 2 years ago, but applicable all the time.



CF Rankings (August 22, 2003)

Discussion Thread

Posted 1:56 p.m., August 22, 2003 (#3) - tangotiger
  This is the number of putouts by OF, for each team. Feel free to come up with "league averages" (it sure seems as though you have a lot of NL teams under 1000).

TOR 1088
TEX 1066
TBA 1215
SEA 1172
SLN 1093
SFN 1148
SDN 1014
PIT 907
PHI 940
OAK 976
NYN 1046
NYA 1031
MON 989
MIN 1202
MIL 1083
LAN 1017
KCA 1097
HOU 926
FLO 1051
DET 1150
COL 1062
CLE 1054
CIN 1065
CHN 973
CHA 1081
BOS 983
BAL 1069
ATL 1057
ARI 979
ANA 1182
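For anyone who wants the league averages, here is a small sketch. It splits the list above by league using the 2003 alignment (16 NL teams, 14 AL teams) and averages each league. The team codes are the ones used in the list.

```python
OF_PUTOUTS = {
    "TOR": 1088, "TEX": 1066, "TBA": 1215, "SEA": 1172, "SLN": 1093,
    "SFN": 1148, "SDN": 1014, "PIT": 907, "PHI": 940, "OAK": 976,
    "NYN": 1046, "NYA": 1031, "MON": 989, "MIN": 1202, "MIL": 1083,
    "LAN": 1017, "KCA": 1097, "HOU": 926, "FLO": 1051, "DET": 1150,
    "COL": 1062, "CLE": 1054, "CIN": 1065, "CHN": 973, "CHA": 1081,
    "BOS": 983, "BAL": 1069, "ATL": 1057, "ARI": 979, "ANA": 1182,
}

# National League membership in 2003; every other team is American League.
NL = {"SLN", "SFN", "SDN", "PIT", "PHI", "NYN", "MON", "MIL",
      "LAN", "HOU", "FLO", "COL", "CIN", "CHN", "ATL", "ARI"}


def league_summary(putouts, members):
    """Average OF putouts and the number of teams under 1000 for one league."""
    values = [po for team, po in putouts.items() if team in members]
    return sum(values) / len(values), sum(po < 1000 for po in values)


al = set(OF_PUTOUTS) - NL
nl_avg, nl_under = league_summary(OF_PUTOUTS, NL)
al_avg, al_under = league_summary(OF_PUTOUTS, al)
print(f"NL: avg {nl_avg:.0f}, {nl_under} of {len(NL)} teams under 1000")
print(f"AL: avg {al_avg:.0f}, {al_under} of {len(al)} teams under 1000")
```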


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 3:32 p.m., August 25, 2003 (#2) - tangotiger
  Great perspective Patriot!

FWIW, using 2001 superLWTS, and setting 300 PA as the line between regulars and backups, I get, per 680 PA:
regulars: +6 overall, +6 batting, 0 fielding
backups: -22 overall, -21 batting, -1 fielding

The *players* that are replacement level (backups, or shades below backups) are *average* fielders.

There's no such thing as a replacement level fielder or replacement level hitter... there are replacement level *players*. A replacement level player turns out to be an average fielder.


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 4:30 p.m., August 25, 2003 (#7) - tangotiger
  In terms of "value" for Edgar, I always look at it this way: "How would a baseline player do, if he was in Edgar's shoes?" From that standpoint, your baseline player will have no fielding contributions, just like your baseline AL pitcher has no hitting contributions.

I agree that Edgar theoretically limits the way the team is set up, in that if they had a truly horrible fielder, they couldn't hide him at DH..... except, have you looked at the way teams use the DH? It's not reserved to just the bad fielders. You'll get decent fielders in there. Edgar being in the DH doesn't really affect the way teams do their business.


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 4:50 p.m., August 25, 2003 (#8) - tangotiger
  I agree with Patriot. BP should never have added it up as they do.

I mean, why stop there? Why not set the replacement level for batting, for stealing, for taking the extra base, for range, for throwing, for every facet of play? And then add it up.

The idea of replacement level is exactly what Patriot is saying: that a replacement level player = 0 wins = 0 (or 300K) dollars in salary. It's the minimum level of play in which you will get paid MLB dollars.

As I mentioned in my scales article a few months ago, *first* compare everything to average, and then, as a *final* step, compare to replacement. Do *not* have your intermediary steps do replacement as well.


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 10:52 a.m., August 26, 2003 (#12) - tangotiger
  I looked at 1999-2002 UZR. I selected, by year, all players with at least 81 "UZR games" (treat that as "full" games), including if they had 60 games at 2b and 30 games at SS. Those are my regulars.

Then, by position, I figured the regular's UZR/162. I did the same for the backups. Here are the results.

pos Regulars Backups diff (UZR runs/162; diff = Regulars - Backups)
3 0.5 -1.8 2.3
4 0.8 -1.9 2.7
5 2.0 -5.0 7.0
6 1.7 -6.0 7.7
7 0.9 -1.5 2.4
8 1.2 -3.9 5.1
9 0.6 -0.9 1.5

So, we see that the regulars are slightly above average fielding-wise, at about +1 relative to all players at their position, while the backups are about -3. That difference, about 4 runs, is how much better an average regular is than an average backup, fielding-wise.
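The +1 / -3 / 4-runs summary can be checked against the table with a plain unweighted average over the seven positions. That averaging is an assumption for this sketch. The post's figures are relative to all players at each position, which may weight the positions differently.

```python
# UZR runs per 162 games, from the table above: position -> (regulars, backups).
UZR_162 = {
    3: (0.5, -1.8), 4: (0.8, -1.9), 5: (2.0, -5.0), 6: (1.7, -6.0),
    7: (0.9, -1.5), 8: (1.2, -3.9), 9: (0.6, -0.9),
}

# Unweighted mean across positions (an assumption, see above).
regular_avg = sum(reg for reg, _ in UZR_162.values()) / len(UZR_162)
backup_avg = sum(bak for _, bak in UZR_162.values()) / len(UZR_162)
print(f"regulars {regular_avg:+.1f}, backups {backup_avg:+.1f}, "
      f"gap {regular_avg - backup_avg:.1f} runs per 162")
```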

If someone wants to repeat this exercise for hitters (I'd HIGHLY suggest using LWTS) by position, that would be nice.


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 9:02 p.m., August 29, 2003 (#15) - Tangotiger
  I don't know what "all BP" does. I'm just telling you what is on their site, and you can see the result by looking at Mike Schmidt.

Look at what they do, and not at what they say.


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 12:56 p.m., August 31, 2003 (#18) - Tangotiger
  You have to regress their observed performance to establish their true rates. At 900 PAs, you probably regress about 25%, so that -35 runs would come in at -26 runs.


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 12:58 p.m., August 31, 2003 (#19) - Tangotiger
  To put it another way, you selectively sampled your players by looking at their performance after the fact, and selecting on that. That's a no-no.

However, if you take ALL those players based on 2000-2002, AND THEN, tell me what their average performance was in 2003, then, that's the correct figure to use as your replacement level.

And, it'll probably come in at around -26 runs or so.


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 4:45 p.m., August 31, 2003 (#21) - Tangotiger
  You can't selectively sample your group after-the-fact on the metric that you are studying. To combat this selection issue, you regress. Otherwise, your sample is tainted.

Why did you choose a PA cutoff then? Why not select ALL players, and take the worst runs / PA of the bunch? If you have a guy whose rate is -60 runs / 600 PA, but he did this after only 25 PAs, then so be it. I agree, ridiculous.


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 9:40 p.m., August 31, 2003 (#23) - Tangotiger
  The reliability of a metric is always dependent on the number of trials, and not the number of successes.

If you flip a weighted coin, and you get 73 heads in 100 flips, is that 73 successes or is it 27 successes? 73 hits may be a success to a hitter, but 27 would be the success to a pitcher.


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 7:57 a.m., September 1, 2003 (#26) - Tangotiger
  Yes, it would apply. That is simply your best guess.

In the case of 100 PA, you'd regress probably 70% towards the mean. So, if he goes 100 for 100, and the league mean is .300, your best guess as to the true talent level that would produce such an observed rate is .510.
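The arithmetic is a straight linear shrink: take the gap between the observed value and the mean, and remove the regression fraction of it. This is a minimal sketch, and `regress` is a hypothetical helper. The second call assumes the mean in the earlier -35 runs example is 0 (i.e., average), which is how -35 comes in at roughly -26 with 25% regression.

```python
def regress(observed, mean, fraction):
    """Move an observed value `fraction` of the way toward the mean."""
    return observed - (observed - mean) * fraction


print(round(regress(1.000, 0.300, 0.70), 3))  # 100-for-100 at 100 PA -> 0.51
print(round(regress(-35, 0, 0.25), 2))        # -35 runs at ~900 PA -> -26.25
```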

Instead of a weighted coin, you can use an unbalanced die, where the weighting of the die changes on every roll but is skewed towards, say, landing on 4, 5 or 6. This would be like a human whose "true rate" changes PA-by-PA, but is centered around something.


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 4:51 p.m., September 3, 2003 (#29) - tangotiger
  I was thinking a bit about this. The problem is that we use regression towards the mean on rate stats, when I'm not sure that's entirely accurate, especially when you have distributions such as this.

So, I propose the following, with an example. Say the league mean is .300 and your 100 PA player is a .950 player. The regression towards the mean is set at .700 for a player with 100 PAs.

Let's break out our ratios.
.300: .300/.700 = .429
.950: .950/.050 = 19.000
.400: .400/.600 = .667

With a .400 player with 100 PA, we would normally do a regression as 70% towards .300, or .330. With our new-fangled ratio method, that would become
new ratio = .667 - (.667 - .429) * .700 = .500
new rate = .5 / (1+.5) = .333 (as opposed to our previous .330)

With your .950 player and 100 PA, that becomes
new ratio = 19 - (19-.429) * .700 = 6.00
new rate = 6 / (6+1) = .857

I don't know if that even makes mathematical sense, but I find that my trusty ratios always come through in the pinch.

(That 70% regression for a rate might translate to 73% for a ratio, or something.)
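The ratio version can be written as three small functions: convert the rate p to the ratio p/(1-p), regress that ratio toward the league ratio, and convert back. This is only a sketch of the proposal above, and the function names are hypothetical. It keeps the same 70% regression for the ratio even though, as noted, the right figure on that scale may be somewhat different.

```python
def to_ratio(rate):
    """Rate (successes per trial) -> ratio (successes per failure)."""
    return rate / (1 - rate)


def to_rate(ratio):
    """Ratio (successes per failure) -> rate (successes per trial)."""
    return ratio / (1 + ratio)


def regress_via_ratio(rate, league_rate, fraction):
    """Regress on the ratio scale, then convert back to a rate."""
    player, league = to_ratio(rate), to_ratio(league_rate)
    return to_rate(player - (player - league) * fraction)


print(round(regress_via_ratio(0.400, 0.300, 0.70), 3))  # 0.333
print(round(regress_via_ratio(0.950, 0.300, 0.70), 3))  # 0.857
```

One side effect is that a perfect 1.000 rate has no finite ratio, so this version can't handle the 100-for-100 case directly.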


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 4:55 p.m., September 3, 2003 (#30) - tangotiger
  Btw, your .980 player becomes a .936 player using this process. I think I may be onto something here. Maybe I should break out my stats books from 15 years ago.


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 1:53 p.m., September 8, 2003 (#33) - tangotiger
  Joe, that looks about right, though I can't comment on what the replacement level for fielding that is used for those golden years. Assuming it's set the same way, then yes.

As for pitchers, there's no double-counting going on, though they should have their own runs-per-win converter. I think you already do this.


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 3:49 p.m., September 9, 2003 (#34) - tangotiger (homepage)
  Michael: what you are reporting about what Gary told you is inconsistent with what Clay is reporting at the above link. BP is, to the best that I can tell, double-counting the replacement level. This, according to the WARP-3, makes Loaiza a very viable MVP candidate.

My note to BP from last month was left unanswered, and therefore, I will report a "no comment" from them.


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 3:52 p.m., September 9, 2003 (#35) - tangotiger
  To clarify, if WARP-3 did not double-count the replacement level for non-pitchers, it sees Loaiza as a viable MVP candidate.


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 10:10 a.m., October 31, 2003 (#38) - tangotiger (homepage)
  Clay says: ...an assumption that the ultimate replacement level team was the Cleveland Spiders of 1899, a combination of craptastic hitting, pitching, and fielding. That puts my "replacement level" player at a .130 win pct., significantly below the "freely available" threshold (which typically involves a .300-.350 win pct), but still above the "no contribution whatsoever" of Win Shares.

Is this reasonable? Is it reasonable that you can have crappy hitting AND crappy fielding from the same players at the MLB level, in today's day and age? Is this who you are trying to be better against?

The most reasonable baseline is that a MLB scrub non-pitcher is a bit over 2 wins / 162 GP worse than average (hitting and fielding). For pitchers, it's probably a bit under 3 wins / 27 full games. So a team of non-pitchers would be -2 x 9 = -18. A team of pitchers would be -3 x 6 = -18 wins as well. That's -36 wins from an average team of 81 wins, or 45 wins, or .278.
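Here is the baseline arithmetic from that paragraph as a sketch. The per-slot figures are the post's estimates, not derived here:
- about 2 wins below average per 162 games for each of 9 non-pitchers;
- about 3 wins below average per 27 full games for pitchers, and a season holds 162 / 27 = 6 such units.

```python
GAMES = 162
AVERAGE_WINS = GAMES / 2  # 81

hitters_below = 2 * 9               # 9 lineup slots, ~2 wins each
pitcher_units = GAMES // 27         # 6 units of 27 full games
pitchers_below = 3 * pitcher_units  # ~3 wins per unit

baseline_wins = AVERAGE_WINS - hitters_below - pitchers_below
print(baseline_wins, round(baseline_wins / GAMES, 3))  # 45.0 0.278
```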

You can present data, based on your varying assumptions, that'll put the baseline at somewhere between .250 and .350 for a team. If you want to set the replacement level to .130, this would mean you have a team scoring 2.90 runs and allowing 7.75. I find that completely unreasonable.

***********

When the Tigers and the Mets and the Spiders get brought up as examples, I have to remind the readers about (a) the difference between a sample performance and a true talent level, as well as (b) the non-random distribution of talent among teams.

Taking that last one (b): there are some players on the 62 Mets and 03 Tigers that would not have seen the light of day on any other team.

As for (a): the stats of players are samples... SAMPLES.... OBSERVATIONS... of their true talent level. You simply can't take a player's stats and assume that that's representative of their true talent level, and therefore base a theoretical team from those stats.

If you've got a theoretical team of .300 talent, they will NOT play .300. They will play between .200 and .400. So, if you knew (which is impossible) that you had a team that is expected to win .300 over 1 million games, it's quite possible that they will play .200 over 162 games.
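One quick way to see the spread is to treat each game as an independent coin flip with a fixed .300 chance of winning. That is a simplifying assumption, since real games vary with the opponent and the starter. The binomial standard deviation of a 162-game win pct is then about .036, so plus or minus 3 SD runs from roughly .19 to .41. The helper names are hypothetical, and the simulation is seeded only so it is reproducible.

```python
import random
from math import sqrt


def win_pct_sd(true_pct, games):
    """Binomial standard deviation of an observed win pct over `games` games."""
    return sqrt(true_pct * (1 - true_pct) / games)


def simulate_seasons(true_pct, games, seasons, seed=2003):
    """Observed win pct for each of `seasons` simulated seasons."""
    rng = random.Random(seed)
    return [sum(rng.random() < true_pct for _ in range(games)) / games
            for _ in range(seasons)]


sd = win_pct_sd(0.300, 162)
results = simulate_seasons(0.300, 162, 2000)
print(f"SD {sd:.3f}; simulated range {min(results):.3f} to {max(results):.3f}")
```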

So, if you've got the 62 Mets and 03 Tigers or the 99 Spiders, you must, absolutely must, regress their performance to some degree to establish the true talent level of that team of players.


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 10:26 a.m., October 31, 2003 (#39) - tangotiger
  Not to pick on Clay, since he's doing great work with translations, but his statement on Matsui:

...it sent me back to the drawing board with respect to Japanese translations, with Extra Special Attention paid to power. The result is a revised system specifically meant to deal with Japan, and not treating it like every other league in the States. If I had had these revisions in the spring, my forecast would have been more like .290/.375/.479 (22 HR) instead of the .290/.421/.567 (32 HR) we actually forecast - since his actual line was .287/.353/.435 (16 HR), that cuts more than half the error away.

PECOTA does the same thing, as does just about every regression equation out there. You can't include your samples to establish an equation, and then use those same samples to test against. All you are doing is best-fitting your samples, which is not necessarily predictive of data outside your samples.

If Clay did not include Matsui in his samples, then that's another story, and I'd have no problem with it... as long as the equations being developed had no knowledge of Matsui. If on the other hand, Matsui was included, then you can't talk about "cutting errors in half", since Matsui was part of the sample group you established the equations on.

As an example, John Jarvis shows, using regression, that the value of a double is .67 runs. This is laughable. And then, he shows the RMSE of all teams, and shows that a LWTS equation, with the .67 figure, comes out the best! Well, it was best-fitted to do so. The true test would be to best-fit say the 1974-1990 time period, and then test against the 1961-1973 time period and 1991-2003 time period.

For PECOTA, MLEs, and other translation systems, you should only use a certain percentage of the data, and then test it against the rest of the data.
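Here is a minimal sketch of that holdout idea: fit on one block of seasons (1974-1990, as suggested above) and measure the error on seasons the fit never saw. The data are synthetic, a made-up linear relationship plus noise with one row per season. The single-variable least-squares fit stands in for whatever regression is actually being tested; it is not Jarvis's or PECOTA's method.

```python
import random
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def rmse(a, b, xs, ys):
    """Root mean squared error of the line y = a + b*x on the given data."""
    return sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Synthetic seasons 1961-2003: (year, x, y) with y = 0.8*x + noise.
rng = random.Random(42)
seasons = []
for year in range(1961, 2004):
    x = rng.uniform(200, 300)
    seasons.append((year, x, 0.8 * x + rng.gauss(0, 5)))

fit_rows = [(x, y) for yr, x, y in seasons if 1974 <= yr <= 1990]
holdout_rows = [(x, y) for yr, x, y in seasons if not (1974 <= yr <= 1990)]

a, b = fit_line(*zip(*fit_rows))
print(f"slope {b:.3f}")
print(f"in-sample RMSE {rmse(a, b, *zip(*fit_rows)):.2f}")
print(f"out-of-sample RMSE {rmse(a, b, *zip(*holdout_rows)):.2f}")
```

Only the out-of-sample figure says anything about predictive power. The in-sample figure is, by construction, the best the fitted line can do on the data it was fitted to.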

(Forewarning: if you're going to comment that my comment is too "harsh" or I'm picking on anyone, then send me an email to that effect, and we can discuss it privately. I'm not going to debate this type of issue in an open forum any more.)


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 5:03 p.m., October 31, 2003 (#41) - tangotiger
  Well, John Jarvis then goes out and starts using that figure for other purposes.

As well, the regression value itself has a margin of error, which is ignored as well.


Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 5:17 p.m., November 4, 2003 (#45) - tangotiger
  We're slowly trudging along. The outline is all written, and the data is all pretty-well parsed for easy querying. The hard part is really managing our family (and for me work) lives with this project. And, I know that writing and presenting the report will take up 80% of the time. As for a "pay-for" website, we've discussed it, but still not sure yet if/when to implement that.


Empirical Win Probabilities (August 28, 2003)

Discussion Thread

Posted 10:02 a.m., August 29, 2003 (#5) - tangotiger
  I understand your issue with innings, but that can't be right, especially with men on base and with the score.

I would add that interaction you did with Innings to "SIT" and "Difruns". Even if it adds little to the overall predictive power, it will add *a lot* to the overall predictive power for the 9th inning of a tie game with men on base.

(Can I guess SIT2=8 will be multiplied by zero?)


Empirical Win Probabilities (August 28, 2003)

Discussion Thread

Posted 10:52 a.m., August 29, 2003 (#7) - tangotiger (homepage)
  Tremendous stuff Alan!

Again, doing only the work that you feel is worth your time, see if a model for any of the following interests you:

- 9th inning, score within 3 runs
- 9th inning, rest
- 8th inning, score within 3 runs
- 8th inning, rest
- 7th inning, score within 3 runs
- 7th inning, rest
- 1 thru 6, all

If you go to the homepage link, you will see that I have generated WE using a math model. Feel free to run your system against that if you want.

Again, what has been presented is excellent work, so thanks!


Empirical Win Probabilities (August 28, 2003)

Discussion Thread

Posted 9:43 a.m., August 31, 2003 (#11) - Tangotiger
  Cool, thanks. No need to email, I can generate this on my own.

It's worth pointing out that mine is math generated assuming that both teams are equals at all times, with no HFA.

I would expect discrepancies, especially in the later innings where the pitching talent would change drastically.

Good stuff again!!

With your permission, I will reproduce my chart, along with yours (and the empirical provided by Phil), side-by-side-by-side, so people can see how things compare.


Empirical Win Probabilities (August 28, 2003)

Discussion Thread

Posted 10:14 a.m., September 3, 2003 (#13) - tangotiger (homepage)
  Here is the win probability chart that shows my math model, Phil's empirical data, and Alan's function.


Empirical Win Probabilities (August 28, 2003)

Discussion Thread

Posted 10:20 a.m., September 3, 2003 (#14) - tangotiger
  The largest discrepancies between mine and Phil's real data are the following:
Inning HomeAway Score Base Out Tom Phil Alan
7 Away 0 2nd_3rd 0 0.279 0.167 0.288
7 Away 1 3rd 0 0.517 0.619 0.587
7 Away 1 2nd_3rd 2 0.665 0.768 0.631
7 Away 1 Loaded 1 0.500 0.608 0.533
7 Home 0 Loaded 0 0.826 0.967 0.830
8 Away 1 1st_3rd 0 0.453 0.340 0.574
8 Away 1 2nd_3rd 0 0.410 0.300 0.534
9 Away 1 2nd_3rd 1 0.552 0.448 0.696
9 Home -1 3rd 0 0.593 0.457 0.410
9 Home -1 2nd_3rd 0 0.741 0.628 0.509
9 Home -1 Loaded 0 0.766 0.614 0.536

You get oddball results with Phil's data because of the sample size issue. For this reason, I would not rely too much on that data.


Empirical Win Probabilities (August 28, 2003)

Discussion Thread

Posted 10:44 p.m., September 3, 2003 (#16) - Tangotiger
  I mean that I don't build in a home field advantage (HFA) in the sense of the home team winning about 54% of its games.

But, tied in the bottom of the 9th for the home team is far better than tied in the top of the 9th for the home team. After all, if the visiting team scores a run in the top of the 9th of a tied game, the home team can still win the game. But, the home team scoring in the bottom of the 9th guarantees the win.


Mike's Baseball Rants - Sac Flies (August 28, 2003)

Discussion Thread

Posted 12:11 p.m., November 4, 2003 (#12) - tangotiger
  If true, wouldn't that indicate they must be hitting FB at a higher rate in SF situations since clearly not every FB will lead to a successful SF?

If you have a batter that has 100 FB and 60 GB, and the league rate is to have a SF on 40% of all FB, then our above hitter will end up with 25 SF per 100 (FB+GB).

100 * .40 + 60 * 0 = 40 SF, in 160 outs, or 25% of outs are SF

If you have the reverse, 60 FB and 100 GB, then our batter is expected to have 15% of outs as SF.
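The expected SF share in that example is the fly-ball outs times the SF rate on fly balls, divided by all outs. This sketch treats the 40% SF-per-FB rate as the illustrative league figure from the post (man on 3b, less than 2 outs). The function name is hypothetical.

```python
def expected_sf_share(fb_outs, gb_outs, sf_rate_on_fb=0.40):
    """Share of a batter's outs expected to be SF, given only his FB/GB mix."""
    return fb_outs * sf_rate_on_fb / (fb_outs + gb_outs)


print(f"{expected_sf_share(100, 60):.0%}")  # 25%
print(f"{expected_sf_share(60, 100):.0%}")  # 15%
```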

I doubt you will find hitters that have a special ability in hitting SF, beyond what is known about their hitting profile. (long FB rate, FB/GB ratio, etc).

And you have the flip side with GB as well. It all balances out nicely, more or less.

The point is that the batter did not "sacrifice" himself. He alters his hitting approach to maximize his team's chances of winning. If that means he might make slightly more outs, so be it. That doesn't mean that, after the fact, once you know he made a FB out and the runner scored, you should remove that as an opportunity.

If you want to be "right" about it, remove the PA for all "men on 3b and less than 2 outs", regardless of the outcome (HR, H, SF, GB, etc).

With sac bunts, you *should* throw out all bunt ATTEMPTS where the batter TRIED to give himself up, regardless of whether he was successful or not. As it is, only successful SH are removed. This is another case where the result is irrelevant, and it should depend on the initial intent.

The sac bunt and the SF are not the same thing at all. In the former case, we know that the batter has the bat taken out of his hands, and into the manager's (like the IBB). In the SF, the batter changes his hitting approach (as they do for ALL 24 base/out situations).

Anyway, it's stupid to make the distinction in the official stats this way.

Just record what happened.

What I almost always do is throw out the IBB and Bunts from the batter's and pitcher's line, and track them separately, since the pitching/hitting approach are completely different.

If there was a preponderance of SF, where the batter would completely change his hitting approach to "force" a FB, then I would remove those as well. As it is, a SF is a lot more a regular PA than a sac PA.


Mike's Baseball Rants - Sac Flies (August 28, 2003)

Discussion Thread

Posted 1:44 p.m., November 4, 2003 (#14) - tangotiger
  Nothing is ever completely random. Just our ability to spot these things is dependent on the size of the sample. What you do is assume randomness to make life easy, but being aware that there's a margin of error in so doing.

******

I would count a "reached base on error" as a "safe" play in OBA, even though the official record gives one AB and no safe play for it.

A batter does "his job" by scoring the runner from 3B? Nope. The win probability in almost all cases says that the batter REDUCED the chances of his team winning. The obvious exception is when it's the winning run. Can't a batter do his job by moving a runner from 2b to 3b, while hitting the ball to the 1b or 2b? Or is the job only about scoring the run, and not moving the runners over?

All these things are so contextual, that you might as well just break out the batter's PA by the 24 base out states, instead of inventing rules as what constitutes a job.


Mike's Baseball Rants - Sac Flies (August 28, 2003)

Discussion Thread

Posted 1:59 p.m., November 4, 2003 (#15) - tangotiger
  In 2002, with men on 3b and less than 2 outs, this is what happened:

AVG : .329
AVG (but including SF as an AB): .278

airout/groundout ratio (without SF): 0.45
airout/groundout ratio (with SF): 1.00

The air/ground ratio for ALL situations is: 1.00.

So, what REALLY gives a more honest representation of what happened? Do you want to exclude the SF from the airout to ground out ratio? Nope, I don't think so. Do you want to exclude SF from AB, since they are not really failed AB, but only so-so failed AB? I don't think so either. The .278 is a lot more representative than the .329 is.

What it comes down to is that regardless of the batter/pitcher intent to the approach in the man on 3b and less than 2 outs, the results are better represented when counting the SF as an out in the air/ground ratio, and countint the SF as an out in AB.
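The two pairs of 2002 figures also pin down how many sacrifice flies were involved, without any extra data. A minimal sketch, assuming AVG = H/AB, that "including SF as an AB" simply adds SF to the denominator, and that the corrected "(with SF)" ratio of 1.00 from the follow-up post is the right one; the function names are hypothetical:

```python
def sf_per_ab(avg_without_sf: float, avg_with_sf: float) -> float:
    """SF per official AB implied by the two batting averages.

    H/AB = avg_without_sf and H/(AB + SF) = avg_with_sf,
    so (AB + SF)/AB = avg_without_sf / avg_with_sf.
    """
    return avg_without_sf / avg_with_sf - 1.0


def sf_per_groundout(ratio_without_sf: float, ratio_with_sf: float) -> float:
    """SF per groundout implied by the two airout/groundout ratios.

    AO/GO = ratio_without_sf and (AO + SF)/GO = ratio_with_sf,
    so SF/GO = ratio_with_sf - ratio_without_sf.
    """
    return ratio_with_sf - ratio_without_sf


# 2002, man on 3B and less than 2 outs (figures from the post above)
print(round(sf_per_ab(0.329, 0.278), 3))       # 0.183
print(round(sf_per_groundout(0.45, 1.00), 2))  # 0.55
```

So in this situation there is roughly one SF for every five or six official AB, and SF (0.55 per groundout) outnumber all other airouts combined (0.45 per groundout).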


Mike's Baseball Rants - Sac Flies (August 28, 2003)

Discussion Thread

Posted 2:01 p.m., November 4, 2003 (#16) - tangotiger
  That should obviously read as:
airout/groundout ratio (with SF): 1.00


Mike's Baseball Rants - Sac Flies (August 28, 2003)

Discussion Thread

Posted 7:51 a.m., November 5, 2003 (#18) - Tangotiger
  Why not remove singles where the runner is out trying to stretch into a double?

Why not remove singles where the runner is stranded on the bases, and never drove anyone in?

Why count the SF as an unsuccessful opp in OBA, but don't consider it in batting average?

Why treat a BB the same as a SF with batting average?

You are trying to separate the SF from the other outs, while not doing the same thing with the hits and other events.

Anyway, I'm bored already.


Mike's Baseball Rants - Sac Flies (August 28, 2003)

Discussion Thread

Posted 10:04 a.m., November 5, 2003 (#19) - tangotiger
  I don't think it is credible to say that a team is more likely to win with one out and a runner on third than they are with that runner scored and two outs.

I spoke too soon. I was thinking about some other study I ran.

Anyway, is it better to have the man on 3B and 0 outs, or the bases empty, 1 out, and a run scored (assuming average batters all around)?

If it's the home team and you have 0 outs:
- The SF is ALWAYS preferable if you are ahead.
- It is also preferable with the score tied in the 3rd and later innings (the closer to the 9th, the more preferable).
- It is also preferable being down by 1 run in the 7th and later innings

If you have 1 out:
- all the above applies, plus
- being down by as much as 6 in the 1st, 5 in the 2nd/3rd, 4 in the 4th, 3 in the 5th, 2 in the 6th
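The thresholds in the two lists can be written as a lookup. This is only a transcription of the post, not a win-expectancy model. `sf_preferable` and its `lead` convention are hypothetical, and it assumes that "down by as much as N" includes ties and smaller deficits, and that with 1 out the 7th-and-later threshold stays at a 1-run deficit (the post does not say otherwise).

```python
def sf_preferable(outs: int, inning: int, lead: int) -> bool:
    """True if, per the listed thresholds, the home team is better off trading
    the runner on 3B (with `outs` outs) for a run plus one more out.

    `lead` is the home team's lead before the play (negative when trailing).
    """
    if outs == 0:
        if inning >= 7:
            max_deficit = 1       # down by 1 in the 7th and later
        elif inning >= 3:
            max_deficit = 0       # tied in the 3rd and later
        else:
            max_deficit = -1      # must already be ahead
    elif outs == 1:
        # Assumption: "down by as much as N" covers any deficit from 0 to N.
        by_inning = {1: 6, 2: 5, 3: 5, 4: 4, 5: 3, 6: 2}
        max_deficit = by_inning.get(inning, 1)  # 7th and later: down by 1
    else:
        raise ValueError("the post only covers 0 or 1 out")
    return -lead <= max_deficit


print(sf_preferable(outs=0, inning=1, lead=0))   # False
print(sf_preferable(outs=1, inning=1, lead=-6))  # True
```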


Mike's Baseball Rants - Sac Flies (August 28, 2003)

Discussion Thread

Posted 5:09 p.m., November 5, 2003 (#21) - tangotiger
  Because on base percentage measures how often a batter gets on base per plate appearance. If you want to change AVG so

Finally! My definition of batting average is:

"number of times batter reaches (but not limited to) 1B safely on a contacted ball in play, without forcing a runner out" divided by "number of times batter contacts a ball in play"

I know that's not what the rules say, but that's me. I might change the "contact" to "non-bunt contact".
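That definition can be computed directly from play-by-play records. A sketch with a hypothetical record type: it treats reaching on an error as "safe" (as in the earlier OBA comment), leaves a fielder's choice that forces a runner out as a non-safe ball in play, counts a SF as a ball in play with the batter out, and offers the "non-bunt contact" variant as a flag. Whether a HR counts as a ball in play is left to whoever builds the records.

```python
from dataclasses import dataclass


@dataclass
class BallInPlayPA:
    """Hypothetical record of a PA in which the batter put the ball in play."""
    batter_safe: bool         # batter reached at least 1B safely (hit, error, ...)
    forced_runner_out: bool   # a runner was forced out on the play
    bunt: bool = False


def contact_avg(pas, exclude_bunts=False):
    """Safe (without forcing a runner out) divided by balls in play."""
    pool = [pa for pa in pas if not (exclude_bunts and pa.bunt)]
    if not pool:
        raise ValueError("no balls in play to evaluate")
    safe = sum(1 for pa in pool if pa.batter_safe and not pa.forced_runner_out)
    return safe / len(pool)


sample = [
    BallInPlayPA(batter_safe=True, forced_runner_out=False),             # single
    BallInPlayPA(batter_safe=True, forced_runner_out=False),             # reached on error
    BallInPlayPA(batter_safe=True, forced_runner_out=True),              # fielder's choice
    BallInPlayPA(batter_safe=False, forced_runner_out=False),            # sac fly
    BallInPlayPA(batter_safe=True, forced_runner_out=False, bunt=True),  # bunt single
]
print(contact_avg(sample))                      # 0.6
print(contact_avg(sample, exclude_bunts=True))  # 0.5
```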


Sabermetrics Crackpot Index (August 29, 2003)

Discussion Thread

Posted 9:50 a.m., August 29, 2003 (#1) - tangotiger
  Thanks to Andrew Clarke.


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 11:40 p.m., September 6, 2003 (#1) - Tangotiger
  I tried posting this at Batter's Box, but couldn't.

****
I just came across this.

Runs are created on a game-by-game basis (getting a runner on in April won't help you win a game in June). So, your run evaluator should be based on a game-by-game basis.

As for why I didn't test on a seasonal basis: as pointed out elsewhere, because teams cluster so tightly around the mean, virtually any half-decent run measure will be acceptable. All that means is that any deviations will be masked by the 90% of the teams that are close to the mean.

However, when I selected games with 3 HR each and grouped them together, that gave a few hundred games or so. So, instead of trying to select 100 teams with 180 HR or whatever, I've given you essentially a couple of teams that hit HR at the pace of Babe Ruth! (And of course, I also built such teams at the 0 HR level, 1, 2, 3, 4...)

If nothing else, the one major point of BaseRuns, the one thing to keep in mind at all times, is that the HR does not generate runs the same way that the other events do. The more baserunners you have (past some point), the less valuable the HR. That's the takeaway from BaseRuns: the HR does not have the ever-increasing value that all "multiplicative" methods say it does, or the always-stable value that all "linear" methods say it does. Its value increases up to a point (an OBA of around .350 to .400), AND THEN it diminishes.
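To see that shape, here is a sketch of BaseRuns, A*B/(B + C) + D, with A = non-HR baserunners, C = outs, D = HR. The B (advancement) term uses one commonly cited basic version, 1.02 * (1.4*TB - 0.6*H - 3*HR + 0.1*BB); those coefficients are an assumption, not taken from this thread, and the function names and batting lines are hypothetical. The value of a HR is found by adding one HR to a line and differencing.

```python
def base_runs(ab, h, doubles, triples, hr, bb):
    """BaseRuns = A*B/(B + C) + D for a batting line."""
    singles = h - doubles - triples - hr
    tb = singles + 2 * doubles + 3 * triples + 4 * hr
    a = h + bb - hr                                        # non-HR baserunners
    b = 1.02 * (1.4 * tb - 0.6 * h - 3.0 * hr + 0.1 * bb)  # advancement (assumed coefficients)
    c = ab - h                                             # outs
    d_runs = hr                                            # each HR scores exactly one run
    if b + c == 0:
        return float(d_runs)
    return a * b / (b + c) + d_runs


def marginal_hr(ab, h, doubles, triples, hr, bb):
    """Runs added by one extra home run (one more AB, H and HR)."""
    return (base_runs(ab + 1, h + 1, doubles, triples, hr + 1, bb)
            - base_runs(ab, h, doubles, triples, hr, bb))


def marginal_single(ab, h, doubles, triples, hr, bb):
    """Runs added by one extra single (one more AB and H)."""
    return (base_runs(ab + 1, h + 1, doubles, triples, hr, bb)
            - base_runs(ab, h, doubles, triples, hr, bb))


# Hypothetical lines, not real teams:
all_outs = (1000, 0, 0, 0, 0, 0)           # OBA of zero
typical = (5500, 1450, 290, 30, 170, 550)  # an ordinary-looking team line
no_outs = (100, 100, 0, 0, 0, 0)           # every AB a single: OBA of one

print(marginal_hr(*all_outs), marginal_single(*all_outs))  # 1.0, and close to 0
print(marginal_hr(*typical))                               # greater than 1
print(marginal_hr(*no_outs))                               # 1.0, up to float rounding
```

With no other baserunners the extra HR is worth exactly 1 run (and a single is worth almost nothing); with no outs it is again worth 1 run. In between, the A*B/(B + C) term pushes it above 1, which is the rise-then-fall described above.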

But, since no team actually exists at that level, then who cares?

But pitchers ARE their own teams, and you should care about that.

To evaluate hitters, custom-generated Linear Weights is probably the best thing to use.

Thanks for the interesting discussion!


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 9:50 p.m., September 7, 2003 (#4) - Tangotiger
  There are 3 reasons why Coleman, Willie, Raines, etc. score more runs per time on base than McCovey and his ilk:

1 - They are faster (this adds about +/- .04 runs / time on base, if I remember my research)

2 - They have better hitters behind them (#2 through cleanup, as opposed to #5 thru 7)

3 - They lead off more, meaning they get on base with 0 outs more often, meaning there are more PA opportunities to drive them in


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 9:52 p.m., September 7, 2003 (#5) - Tangotiger
  The average IBB is virtually win-neutral based on research I published a few months ago. I did not look to see whether the IBB to Bonds specifically are win-neutral as well, but they probably are.


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 9:58 a.m., September 8, 2003 (#7) - tangotiger (homepage)
  Ross, you can go to my site and look for the link on "Batting Order". In there, I have MGL's run expectancy matrix by batting order and league (but only for 1999, I think). I have much better data with more years, and I'll be doing a lot more with it sometime in the upcoming months, and that work will happen to address your issues here, which are all legitimate.

In fact, the reason I started that batting order thread at fanhome was because I believed that Rickey Henderson and Tim Raines were being ripped off: their skills were optimally suited for the leadoff spot, but no run evaluation method was giving them that credit. I.e., they are able to leverage their particular skills more in the leadoff spot than others would. This impact, for Rickey in particular, I think amounted to almost 1 win per season. You can reasonably add a whopping 10 to 15 wins to Rickey's career simply on the fact that his skills were ideally leveraged in the leadoff spot.

Considering that a Hall of Famer is about +30 to +40 wins above average for his career, this +10/15 thing is an enormous impact that is simply not quantified by any other sabermetrician (but is probably intuitively recognized by the average fan).


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 10:29 a.m., September 8, 2003 (#8) - tangotiger
  Ross, by the way, MGL's superLWTS *does* take into account the "taking the extra base" performance of players. You can check it out. If I remember right, Juan Pierre and Derek Jeter do quite well.


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 3:07 p.m., September 8, 2003 (#10) - tangotiger
  I think you are on the right track, but we all should separate runs from wins. Since each component has its own runs-to-wins conversion ratio, it makes little sense to compute a run evaluator using IBB and then convert those overall runs to wins.

What you want to do is figure out the win value for each component.

Now, you make a great point, and I'll reiterate here: the win-neutral value of the IBB is for the GIVEN PLAYER and not a league average player.

That is, if the win expectancy in a given state is .764 with Bonds at bat and pitching to him, and .761 with Bonds being IBB to face Santiago, then the IBB is worth NEGATIVE .003 wins ... and this is the important part... relative to Bonds himself. So, if Bonds is +.010 wins / PA above average in his "pitched-to" PAs, then in this particular IBB PA, he'd be +.007 wins / PA.

If we assume that managers sometimes walk Bonds when they shouldn't and sometimes walk him when they should, so that overall they are win-neutral PAs *to Bonds*, then you would do the following:

Compute Bonds' runs above average excluding IBB and convert to win above average. Say that works out to +80 runs or + 8 wins over 400 PA, or +.02 wins / PA.

Suppose that he gets 100 IBB. He gets credit for +.02 x 100 = 2 wins above the average player (or zero wins above himself).

His new wins above average is +10 wins over 500 PA (including the IBB).

Remember, the IBB is win-neutral relative to the player at bat, but not relative to the average player.
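The worked example reduces to crediting each IBB at the player's own pitched-to rate. A minimal sketch under the post's assumption that the IBB is win-neutral relative to the batter; the function name is hypothetical and the inputs are the illustrative figures above.

```python
def waa_with_ibb(pitched_to_waa: float, pitched_to_pa: int, ibb: int) -> float:
    """Wins above average including IBB, crediting each IBB at the
    player's own wins-above-average rate per pitched-to PA."""
    rate = pitched_to_waa / pitched_to_pa  # e.g. 8 wins / 400 PA = +.02 per PA
    return pitched_to_waa + rate * ibb     # IBB: zero wins above himself


print(round(waa_with_ibb(8.0, 400, 100), 3))  # 10.0 (over 500 PA)
```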

Great comment Colin!!


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 3:08 p.m., September 8, 2003 (#11) - tangotiger
  Colin, I'm rereading what you said, and you said it better than I did.


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 3:56 p.m., September 8, 2003 (#13) - tangotiger
  Suppose you have the bottom of the 9th, home team down by 1, man on 2b, 1 out. With every player in the game an average player, the chance of the home team winning is .296. Suppose though that with Barry Bonds at the plate, the chance that the home team will win is .370. (I didn't check what it is, so let's go with that.) So, do you walk him or not?

Well, the win expectancy for bottom of 9th, with men on 1b and 2b, and 1 out, and down by 1 is .351.

So, insofar as the visiting team is concerned, walking Bonds is worth -.019 wins to the home team.

But, to Bonds himself, the situation was a .296 one with an average batter at the plate, and after his PA it is a .351 one. That is worth +.055 wins for Bonds' IBB.

Because managers probably also walk Bonds in spots where that INCREASES the chance that Bonds' team wins, we can guess that, on average, the win expectancy before and after a Bonds PA that ends in an IBB is virtually the same.

It's a win-neutral event to the visiting team, but a huge win-gaining event for Bonds himself.
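The two perspectives differ only in which "before" state the post-IBB win expectancy is compared against. A small sketch using the post's numbers (the .370 is the post's assumed value); the function name is hypothetical.

```python
def ibb_win_values(we_avg_batter: float, we_with_batter: float, we_after_ibb: float):
    """Return (team view, batter view) of an IBB, in wins for the batting team."""
    # Team view: compare with this specific batter standing at the plate.
    team_view = we_after_ibb - we_with_batter
    # Batter view: compare with an average batter standing at the plate.
    batter_view = we_after_ibb - we_avg_batter
    return team_view, batter_view


team, batter = ibb_win_values(0.296, 0.370, 0.351)
print(round(team, 3), round(batter, 3))  # -0.019 0.055
```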


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 4:50 p.m., September 8, 2003 (#15) - tangotiger (homepage)
  Colin,

I'll be working on that in the upcoming months (my guess is that it is win-neutral by batter), but in the meantime, you may be interested in the above link.

Tom


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 6:49 p.m., September 8, 2003 (#17) - Tangotiger
  No, what he is saying IS consistent with what I am saying. The defense is better off walking Bonds (they gain say +.02 wins in the process).

But, from the perspective of Bonds, Bonds already gains +.20 wins just for being in the batter's box. By being handed 1B in that situation, his worth is now +.18 wins instead.

It's a question of which perspective you have, the offense, the defense, or the batter.


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 7:02 a.m., September 9, 2003 (#19) - Tangotiger
  Yes!

Just like Pedro would be less valuable if the opposing manager were allowed to have him replaced by the mop-up guy for one batter whenever Pedro lets a runner get on base. (Not THAT bad, because Bonds does get to go to 1B.)

How much impact is this? I don't know, but it might be a bit. I did publish the "Win Probability Added" a few months ago, and Bonds' numbers were NOT out of this world (though they were pretty incredible and tops in the league), for 1999-2002.


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 3:28 p.m., September 9, 2003 (#21) - tangotiger
  Be careful with your use of runs. Is the .125 absolute runs, or marginal runs?

The IBB has a marginal run value of .17 runs or so for the average player from the perspective of the team, and probably including Bonds. The win-value of the IBB is win-neutral from the perspective of the player, as discussed.

The NIBB has a marginal run value of about .32 runs for the average player, and probably for Bonds as well. Though my guess is that for Bonds, because he probably gets a lot more NIBB with 1B open, the walk is worth less (maybe .28 runs or something).

Now, suppose you have 2 equals, and we'll call them Pujols and Bonds. But one of them gets IBB a lot, and the other, not as much. In the cases where Bonds can do a lot of damage, he gets IBB. But Pujols gets pitched to, and as a result can create more wins than Bonds in the exact same situation.

That is, if we have that late and close situation where walking Bonds will have a win expectancy of .35 and facing him will be .37, they walk him. But Pujols, they face him, and he makes them pay... to the average win expectancy of .37.

Like it or not, Pujols has now impacted his overall PAs more than Bonds (assuming they were equals to begin with).

So, yes, it does make a difference, if managers approach players in non-optimal ways.


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 4:19 p.m., September 9, 2003 (#23) - tangotiger
  Robert is reporting marginal runs. In fact, you should ALWAYS talk about marginal runs in cases like this. ALWAYS.

****

Let me think about the rest of your post.


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 2:58 p.m., March 18, 2004 (#34) - tangotiger
  For pitchers, you want BaseRuns per out (akin to ERA).

For hitters, you want LWTS per PA.

The formula shouldn't change based on era, though I have not yet tested whether the best-fitted 1974-1990 BsR matches that of 1991-2003. I'm sure it would be quite close.

***

Also, be careful when using the equation with missing data. Any fudge factor you apply can ONLY be applied to the "B" component. As Patriot rightly pointed out, Clay Davenport did NOT do this for BsR, thereby making BsR look worse than it should have.

The idea behind BsR is very simple. As the OBA approaches zero, the run value of the HR approaches 1, and the run values of all other events approach 0. As the OBA approaches 1, the run value of every non-out event approaches 1, and the run cost of the out approaches infinity.

BsR is the only model that adheres to these known constraints. And, to boot, it's as accurate as anything out there in the "regular" MLB run environments.


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 10:12 p.m., March 18, 2004 (#37) - tangotiger
  Those are good points. The fudge should go wherever the data is lacking. If HBP is lacking, then of course you need to fudge the A as well.

How you fudge is not clear. I fudge as a function of PAs, or estimated PAs. I don't think that the HBP fudge should be a function of H+BB. IBB might be a function of BB. I guess you'd have to run a regression to figure those things out.



By The Numbers - Sept 7 (September 8, 2003)

Discussion Thread

Posted 3:31 p.m., September 9, 2003 (#8) - tangotiger
  Keith Woolner has some good data at BP from a few years ago. Do a search for "Lumina", and you should get it.


By The Numbers - Sept 7 (September 8, 2003)

Discussion Thread

Posted 3:35 p.m., September 9, 2003 (#9) - tangotiger (homepage)
  Actually, the data you want can be found at the above link from diamond-mind.com .


By The Numbers - Sept 7 (September 8, 2003)

Discussion Thread

Posted 5:25 p.m., September 17, 2003 (#13) - tangotiger
  That Jai Alai equation makes no sense from what I'm looking at.

Plug in a .600 team against a .500 team, and the result SHOULD be .600, but it is nowhere close to that.

As it stands now, the best method to use is the Odds Ratio method. Maybe I'll put up a Javascript program so that people can use it.
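In the meantime, here is a minimal sketch of the Odds Ratio method for two teams, assuming a .500 league so the league-average odds term drops out. It passes the sanity check above: a .600 team against a .500 team comes out at .600. The function name is hypothetical, and Python is used here rather than Javascript.

```python
def odds_ratio_win_prob(p_a: float, p_b: float) -> float:
    """Chance that team A beats team B, from their win percentages (.500 league)."""
    odds_a = p_a / (1.0 - p_a)
    odds_b = p_b / (1.0 - p_b)
    ratio = odds_a / odds_b  # A's odds of beating B
    return ratio / (1.0 + ratio)


print(round(odds_ratio_win_prob(0.600, 0.500), 3))  # 0.6
print(round(odds_ratio_win_prob(0.600, 0.400), 3))  # 0.692
```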



Livan Hernandez and Scouting (September 10, 2003)

Discussion Thread

Posted 2:33 p.m., September 10, 2003 (#2) - tangotiger
  Suppose that two pitchers named Orlando and Livan have been pitching rather so-so (to the eye and performance-wise).

The pitching coach comes up to both of them and exclaims "I know what you are doing wrong!" They each practice their new pitching mechanics, and the pitching coach is satisfied that they both have adjusted properly and should be more effective.

Over their next 7 starts each, facing a total of 200 batters each, Orlando shows no change in his performance numbers (still striking out 15% of his batters), but Livan does improve his numbers (say, from striking out 15% of his batters to striking out 25%).

You also have a third pitcher, say Pasqual, who always strikes out 15% of batters, but, without changing anything, struck out 25% of his next 200 batters.

Question to the Primate statisticians:
1 - what is your best guess as to Livan's true K rate (assume lg of 15% if you need that)?
2 - What is your best guess to Orlando's true K rate?
3 - What is your best guess to Pasqual's true K rate?

Please provide confidence levels and margin of error.
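One ingredient of any answer is the pure binomial noise in 200 batters. A minimal sketch, assuming each PA is an independent trial at the pitcher's true K rate; it says nothing about how much weight to give the pitching coach's change, which is the real question here. The function name is hypothetical.

```python
import math


def binomial_sd(rate: float, n: int) -> float:
    """SD of an observed rate over n independent trials at a true `rate`."""
    return math.sqrt(rate * (1.0 - rate) / n)


sd = binomial_sd(0.15, 200)  # a true 15% strikeout pitcher over 200 batters
z = (0.25 - 0.15) / sd       # how unusual 25% would be from luck alone
print(round(sd, 4), round(z, 2))  # 0.0252 3.96
```

A 25% showing is about 4 SD above what a true 15% pitcher would be expected to post by chance, for Livan and Pasqual alike; the data alone can't tell the two apart.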


Livan Hernandez and Scouting (September 10, 2003)

Discussion Thread

Posted 3:52 p.m., September 10, 2003 (#4) - tangotiger
  Interesting approach. I'll give you my thoughts tomorrow, as I'd like to see how others would approach this.

I do agree with your last statement, and it is for this reason that I *do* believe that scouts serve a purpose, the extent of which has yet to be established publicly (though it may have been established privately).


Accuracy of Run Estimators (September 12, 2003)

Discussion Thread

Posted 11:51 a.m., September 12, 2003 (#1) - tangotiger
  Patriot tried more best-fit equations for BsR and RC, and he is reporting the following:

For RC:
1st--24.89
2nd--23.02
3rd--22.60
BsR:
1st--22.66
2nd--22.48
3rd--22.44

So, RC vaults from worst to 3rd best, and BsR jumps to best.

Like I said, all we are doing is best-fitting. It doesn't prove anything.


Accuracy of Run Estimators (September 12, 2003)

Discussion Thread

Posted 1:39 p.m., September 12, 2003 (#4) - tangotiger
  Cool, good stuff!

What I wouldn't mind seeing (if not from Patriot, then from some aspiring sabermetrician) is using the 1974-1990 data as the "sample" data to fit all your equations. What's good about this is that I already give you the "plus 1 method" true values to fit against (at the bottom of article 1, or at the bottom of the page of article 3, which links to the BaseRuns addendum). You can limit it to the fields you've been using (ab,h,2b,3b,hr,bb,sb).

Once you've fitted all the equations against this data, you then apply them to the 1961-1973 and 1991-2002 data.

As Patriot is starting to show here, I would guess that BsR would come up with a better estimate than the best-fit linear equation, and probably anything else.

What's really cool about the time periods I am showing is that they each have their own peculiarities, and so should be a good test against extreme-type team-seasons.


Accuracy of Run Estimators (September 12, 2003)

Discussion Thread

Posted 3:24 p.m., September 12, 2003 (#7) - tangotiger (homepage)
  Great stuff again Patriot!!

You will find the "absolute" (along with the "marginal") event values of empirical data from 1974-1990 at the above link. The CS value is something like -.28 runs.


Accuracy of Run Estimators (September 12, 2003)

Discussion Thread

Posted 4:00 p.m., September 12, 2003 (#9) - tangotiger (homepage)
  You may be interested in the great work by Tom Ruane, who uses the "runs value-added" approach, on a PA-by-PA basis for all players from 1980 to 1999.

The next step is of course the Mills brothers' approach to WPA. That will come from me next year, unless someone else beats me to it.


Accuracy of Run Estimators (September 12, 2003)

Discussion Thread

Posted 11:07 p.m., September 12, 2003 (#13) - Tangotiger
  I agree with Robert's general sentiment. Let's get it right first, and let others worry about how accurate they need something.

As for the CS, please note the difference between an "absolute" method and a "marginal" method. When the out value is set to -.10 runs or thereabouts, you are employing an absolute method. When the out value is set to about -.27 runs, you are using a marginal method.

The same applies to CS. -.28 runs? Absolute. -.45 runs? Marginal. Check out article 2 of "How runs are created" for more on this.


Accuracy of Run Estimators (September 12, 2003)

Discussion Thread

Posted 10:15 p.m., September 13, 2003 (#18) - Tangotiger
  All of Patriot's testing shows that there is an enormous number of teams without a large distinction between them. Essentially, most of the teams are .320 to .340 OBA and .390 to .420 SLG (or whatever).

So, what his testing shows is that all these run estimators "work", not for any logical reason, but simply because every team in the sample group also has a matching team in the testing group (more or less).

However, to extend these things beyond your sample group, to pitchers like Gibson for example, you need to be grounded in logic. And for that, you need a non-linear interdependent model. And that can be generated using a math model or sim or custom-RE matrix (and looking at change-in-states). Or, you can use the thing that most closely matches what you really want: BaseRuns, or the custom-LWTS generated from BaseRuns.

Can you use ERP or XR or LWTS or RC or EqR or ....? Sure thing. As it turns out, while run creation is non-linear interdependent, you can assume a linear independent process and you will be pretty close (say within 3 runs / 600 PA for a hitter).

Much ado about nothing for most people. But if someone says that EqR, RC, or basic LWTS is fatally flawed, you can't argue with that either.

The proper thing would be for say Bill James or Clay Davenport to say "hey, this is how accurate it is... it won't work for these kinds of teams or players.... so, it's up to you to decide if this is good enough".


Accuracy of Run Estimators (September 12, 2003)

Discussion Thread

Posted 3:32 p.m., September 15, 2003 (#20) - tangotiger (homepage)
  I made a comment at battersbox.ca at the above link, and again 2 posts later.


Accuracy of Run Estimators (September 12, 2003)

Discussion Thread

Posted 7:19 a.m., September 17, 2003 (#22) - Tangotiger
  I have not done any of that work, but the pitcher thing is I think the most worthwhile to pursue.

The results of that will make clear the relevance of BsR compared to the others. And, since we've got pbp for the last 30 years, we'd be in great shape to get all the data we want at the pitcher level.



DIPS bookmarks (September 13, 2003)

Discussion Thread

Posted 11:32 p.m., September 14, 2003 (#6) - Tangotiger
  Charlie, I would guess it would not matter.

For example, I think MGL showed that there was a 2 to 3 run difference max / 600 PA with the quality of opposing pitchers for each batter. That is, 1 SD = 1 run / 600 PA. In there, it includes HR, BB, K. I would therefore guess that 1 SD would equal about .5 hits / 400 BIP, or 1 SD = .001.

1 SD for the park variation is .004, and the fielding is probably .007, and the pitching is .009.

So, I would guess that the hitting variation would be virtually insignificant.

Just an educated guess though...
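The guess follows from how SDs combine. A minimal sketch, assuming the sources are independent so their variances add; the SD values are the post's own estimates, and the function name is hypothetical.

```python
import math


def combined_sd(*sds: float) -> float:
    """SD of a sum of independent sources: variances add, SDs do not."""
    return math.sqrt(sum(s * s for s in sds))


without_hitting = combined_sd(0.004, 0.007, 0.009)      # park, fielding, pitching
with_hitting = combined_sd(0.004, 0.007, 0.009, 0.001)  # plus opposing hitters
print(round(without_hitting, 4), round(with_hitting, 4))  # 0.0121 0.0121
```

Adding a .001 source to a total near .012 changes it only in the fifth decimal place.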


DIPS bookmarks (September 13, 2003)

Discussion Thread

Posted 10:44 a.m., September 15, 2003 (#8) - tangotiger
  Remember, as long as the opposition hitting distribution is random, then we don't have to worry about it.

That is, you are trying to answer the following question:
"What is the true variance of the opposition hitting, over and above luck, that would produce the observed variance?"

If the observed variance is exactly as would be predicted by luck, then the true variance of the opposition hitting is zero, and we don't need to consider it as a variable. Don't forget that we are looking at the pitchers as a group, and we are not trying to pinpoint the effect on any single pitcher.

I think that's right.
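The question above is the usual decomposition: observed variance = true variance + luck variance. A minimal sketch, assuming binomial luck and the same number of balls in play (n) for every pitcher; the function name and inputs are hypothetical.

```python
import math


def true_variance(observed_var: float, mean_rate: float, n: int) -> float:
    """Variance left over after removing binomial luck (floored at zero)."""
    luck_var = mean_rate * (1.0 - mean_rate) / n
    return max(observed_var - luck_var, 0.0)


# Hypothetical: .300 average rate on 400 balls in play per pitcher.
print(true_variance(0.000525, 0.300, 400))             # essentially zero: all luck
print(math.sqrt(true_variance(0.000625, 0.300, 400)))  # about .010 true SD
```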



Clemens' turnaround? (September 15, 2003)

Discussion Thread

Posted 5:11 p.m., September 15, 2003 (#1) - tangotiger
  And of course, win-loss records are very heavily team-dependent. Don't we all know this already?


Clemens' turnaround? (September 15, 2003)

Discussion Thread

Posted 10:57 p.m., September 15, 2003 (#3) - Tangotiger
  I just used one of my quick things. I looked for all pitchers with a K/9IP and BB/9IP within .7 of Clemens', with between 600 and 900 IP over 4 years, at the age of 31-34.

Certainly not an exhaustive study, nor the best one. But I'm simply providing some data that Mr. Edes did not.

First, Edes was wrong in calling Clemens' 4 years "mediocre", since the other pitchers I've shown had quite good 4 years. Secondly, it wasn't such a great turnaround. Clemens was already starting from sky high, and he became very good, and then followed that up with sensational to great.

The other comps did not start so high, but they also continued to have a few more good years after those similar "mediocre" years.

And to say "100 years" as if Edes actually looked into it? All these comps were within the last 10 years.



Patriot: Baselines (September 17, 2003)

Discussion Thread

Posted 9:27 a.m., September 18, 2003 (#2) - tangotiger
  My personal preference is to present TWO figures,
the "Wins" and "Loss" or
"Runs scored" and "Runs Allowed" or
"Player x" and "Average"

and then let the reader decide how to manipulate the numbers.

If you want win differential, fine. If you want 2*wins - losses, fine too.

As Patriot noted, we each have our own objectives and questions to answer. Providing two numbers allows all those objectives to be answered individually, instead of having the Win Shares or TPR model imposed on us.


Patriot: Baselines (September 17, 2003)

Discussion Thread

Posted 10:50 a.m., September 18, 2003 (#4) - tangotiger
  I agree with Michael's assertion about the comprehensiveness and balance to the article.

The tiered approach is also excellent because it attempts to model reality, and I'm a big proponent of that line of thinking.

I also agree with the quote of Patriot that the "playing time issue" and the "negative/positive" paradox are not limited to a .500 baseline but apply to any baseline. This was well-described by Patriot.

In terms of paying someone money, you'd pay someone the minimum to perform the minimum. Kinda like a college graduate coming in as an intern at my company. Any performance above this marginal player gets money at a multiple of that marginal performance.

(You can of course get a non-linear relationship, but I don't believe in that, unless you factor in playoffs.)

Therefore, what a team pays is based on overall team performance. If you lose Vlad, you have a chaining process so that the team won't be as bad as replacing all his PAs with a schlub.

A team pays based on the marginal change to the overall team, but crediting that change to the variable that changed.

It's a fascinating topic, and as Patriot points out, there's no 1 right answer. From this standpoint, Pete Palmer and Bill James should read Patriot's article.


Patriot: Baselines (September 17, 2003)

Discussion Thread

Posted 8:28 a.m., September 19, 2003 (#9) - Tangotiger
  What if the true FAT line is .340? .330? .300?


Patriot: Baselines (September 17, 2003)

Discussion Thread

Posted 10:26 a.m., September 19, 2003 (#11) - tangotiger (homepage)
  I just want to point people who may not have seen it to the above link. It's my theoretical work (with some empirical data to support it) on the talent distribution in MLB and around the world.

I think it's easy to see that while the ideal line might be at 80% of MLB average, the non-uniform distribution of talent at the team and position level would make it a very non-stationary line.

Again, depending what you want, using an average baseline is perfectly fine. Going forward, as Patriot noted, all you want is a rate stat. You just need to know that this player is a "101" and that player is a "98" and it's irrelevant that the average is "100" or that the minor leaguer is a "75" or whatever. 101 is better than 98.

Going backwards, the 101 might have contributed 1.1 wins and 1.0 losses, while the 98 might have contributed 3.9 wins and 4.0 losses. Playing time is a consideration. While the 101 did contribute more than the opponent that he actually played against, the 101 contributed LESS than the opponent that he did NOT play against (because he was on the bench at the time). There's an opportunity cost in sitting down and not playing, and that's in letting someone else, presumably worse than you, play.
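The playing-time point can be made concrete by measuring the same two lines against two baselines. A sketch using the post's win/loss figures; the .400 baseline is a hypothetical below-average level chosen for illustration, not a number from the post, and the function name is hypothetical.

```python
def wins_above(wins: float, losses: float, baseline_pct: float) -> float:
    """Wins beyond what a baseline player would produce in the same decisions."""
    return wins - baseline_pct * (wins + losses)


for label, w, l in [("101", 1.1, 1.0), ("98", 3.9, 4.0)]:
    # Against average (.500) the 101 is ahead; against .400 the 98 is ahead.
    print(label, round(wins_above(w, l, 0.500), 2), round(wins_above(w, l, 0.400), 2))
# 101 0.05 0.26
# 98 -0.05 0.74
```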


Patriot: Baselines (September 17, 2003)

Discussion Thread

Posted 11:48 a.m., September 19, 2003 (#13) - tangotiger
  This assumes that the replacement line is fixed, whereas the likelihood is that the replacement line is centered around some point, say 80% of league average, with a distribution around it, of say 1 SD = 3%.

So, the question is: what is the probable true-talent distribution of the .345 player in 500 PA? You might say that it is centered at .355, with 1 SD = .020. We can never know for certain what his true talent level is, so the best you can do is come up with a distribution of what his true talent probably is.

Then, you ask the same question about the .355 player in 10 PA. Maybe it's centered at .370, with 1 SD = .050. You are less certain of his true talent level, and therefore your distribution is much wider than the first guy's.

Now, overlaid on these 2 distributions is the "replacement-level" true distribution. And again here, we don't have 1 fixed point. The true level might be .350, with 1 SD = .01.

Finally, the question you can ask is: what is the probability that the first player is above a replacement-level player? In essence, what is the chance that a .345 in 500 PA player (or a true .355 player, +/- .020 = 1 SD) would "win" against a .350 +/- .01 player?

And you ask the same question of the .370 +/- .050.
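If both distributions are assumed normal and independent (an assumption; the post only gives centers and SDs), the difference between them is also normal, and the "win" probability follows directly. A sketch with the post's illustrative numbers; the function name is hypothetical.

```python
import math


def prob_above(mu_x: float, sd_x: float, mu_y: float, sd_y: float) -> float:
    """P(X > Y) for independent normal X ~ N(mu_x, sd_x**2) and Y ~ N(mu_y, sd_y**2)."""
    # X - Y is normal with mean mu_x - mu_y and SD sqrt(sd_x**2 + sd_y**2).
    z = (mu_x - mu_y) / math.hypot(sd_x, sd_y)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


REPLACEMENT = (0.350, 0.010)  # true replacement level and its SD, from the post

p_500pa = prob_above(0.355, 0.020, *REPLACEMENT)  # the .345 hitter in 500 PA
p_10pa = prob_above(0.370, 0.050, *REPLACEMENT)   # the .355 hitter in 10 PA
print(round(p_500pa, 2), round(p_10pa, 2))        # 0.59 0.65
```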


Patriot: Baselines (September 17, 2003)

Discussion Thread

Posted 8:05 a.m., November 12, 2003 (#17) - tangotiger
  Bringing this forward for those who missed it.


Patriot: Baselines (September 17, 2003)

Discussion Thread

Posted 10:24 a.m., November 12, 2003 (#19) - tangotiger
  Herman had three times more value above his actual .500 opponent than did Myer

is meaningfully different from saying

Herman was three times the player that Buddy Myer was

The first one is a relative scale, and the second one is an absolute scale.

You cannot, just cannot, perform your division/multiplication on a relative scale and think it's going to give you anything meaningful.

Compare -1 Celsius to +1 Celsius. Compare -1 runs to +1 runs. Compare +.0001 runs to +1 run. Compare +1 runs to +10 runs.

Why in the world would you try to do +1/.0001 ? Or +1/-1?

Now, if you had one player being "101" and the other being "99" (where "100" is average), then you'd be on firmer ground.
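A quick way to see the problem: a ratio of values *above a baseline* depends entirely on where the baseline sits. The index values below are hypothetical (100 = average), not actual figures for Herman or Myer.

```python
def ratio_above(a: float, b: float, baseline: float) -> float:
    """Ratio of two players' values measured above a common baseline."""
    return (a - baseline) / (b - baseline)


herman, myer = 103.0, 101.0  # hypothetical index values, 100 = average

print(ratio_above(herman, myer, 100.0))           # 3.0   ("three times the value")
print(round(ratio_above(herman, myer, 80.0), 3))  # 1.095 (same players, lower baseline)
print(round(herman / myer, 3))                    # 1.02  (ratio on the absolute index)
```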


Patriot: Baselines (September 17, 2003)

Discussion Thread

Posted 11:19 a.m., November 12, 2003 (#22) - tangotiger
  I agree on the issue of Palmer's intent. Palmer is wrong about his intent, and James is wrong for blasting wins above average, because of Palmer's intent. Just because Palmer misused it doesn't mean that the whole framework is wrong.


Patriot: Baselines (September 17, 2003)

Discussion Thread

Posted 1:34 p.m., November 12, 2003 (#24) - tangotiger
  Patriot, I reread your article again. Just an excellent piece!

Thanks for the clarification from the Palmer quote. He makes perfect sense there. I'll guess that David's Palmer quote was probably some "rush statement" he made, similar to stuff James would say in ESPN Chats.


Fanhome's Dackle: World Series Odds (September 18, 2003)

Discussion Thread

Posted 12:05 p.m., September 18, 2003 (#8) - tangotiger
  Good job Joe!

Here's his list in "percentage" format. You'll note the juice adds an extra 38%. Ahh, to be a bookie.

New York 25.0%
Oakland 20.0%
San Francisco 20.0%
Atlanta 14.3%
Boston 14.3%
Minnesota 11.1%

Chicago 7.7%
Florida 6.3%
Houston 5.3%
Philadelphia 4.8%
Chicago 3.8%
Los Angeles 2.0%
Seattle 2.0%
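A minimal Python sketch of the bookkeeping behind "the juice." It assumes odds quoted as X-to-1 against, so the implied probability is 100/(X+1) percent (3-1 gives 25%), and it removes the juice by scaling every team down by the same factor. That proportional method is one common convention, not necessarily what anyone in this thread used. "Chicago" appears twice, as in the list above, so the data is kept as a list of pairs rather than a dict.

```python
# Implied probabilities (%) as listed above; "Chicago" appears twice there.
IMPLIED = [
    ("New York", 25.0), ("Oakland", 20.0), ("San Francisco", 20.0),
    ("Atlanta", 14.3), ("Boston", 14.3), ("Minnesota", 11.1),
    ("Chicago", 7.7), ("Florida", 6.3), ("Houston", 5.3),
    ("Philadelphia", 4.8), ("Chicago", 3.8), ("Los Angeles", 2.0),
    ("Seattle", 2.0),
]


def odds_to_pct(odds_against: float) -> float:
    """Convert odds of X-to-1 against into an implied probability in percent."""
    return 100.0 / (odds_against + 1.0)


def remove_juice(implied):
    """Return (overround in %, probabilities rescaled to sum to 100%)."""
    total = sum(p for _, p in implied)
    return total - 100.0, [(team, 100.0 * p / total) for team, p in implied]


overround, fair = remove_juice(IMPLIED)
print(f"overround: {overround:.1f}%")
for team, p in fair:
    print(f"{team:<14}{p:5.1f}%")
```

The rounded percentages in the list make the computed total slightly approximate.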


Fanhome's Dackle: World Series Odds (September 18, 2003)

Discussion Thread

Posted 12:22 p.m., September 18, 2003 (#10) - tangotiger
  We also should remember a few things that would change the odds drastically:

1 - Your top 3 starters have a higher % of innings in the playoffs than in the regular season

2 - Your top 2 relievers have a higher % of innings (and high-leverage innings) in the playoffs than in the regular season... I think Mariano Rivera has something like 80 innings in 87 Yankee playoff games

3 - Certain types of hitters, though we haven't established which ones, may be less able to optimize their abilities against a higher level of pitching

4 - The park affects every player (hitter or pitcher) differently. Over a season, the road parks balance all this out, more or less, but that's not necessarily the case in a short series (say, Ted Lilly at Fenway).

Because of the non-random nature of the contexts faced by the players, the "true talent" level of each team might vary greatly in the "playoff universe".

And finally, injuries.
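Points 1 and 2 amount to reweighting the same staff. A minimal Python sketch, using an innings-weighted runs-allowed average and a hypothetical staff (the RA9 figures and innings shares are made up for illustration), shows how shifting innings toward the top starters and top relievers can lower a team's effective runs allowed without any change in talent:

```python
def staff_ra9(ra9, innings_share):
    """Innings-weighted runs allowed per nine innings for a pitching staff."""
    total = sum(innings_share)
    return sum(r * s for r, s in zip(ra9, innings_share)) / total


# Hypothetical staff: 3 top starters, 2 back-end starters,
# top 2 relievers (combined), rest of the bullpen.
RA9 = [3.50, 3.80, 4.00, 5.00, 5.20, 2.80, 4.80]
REGULAR_SEASON = [0.15, 0.14, 0.13, 0.12, 0.11, 0.10, 0.25]
PLAYOFFS = [0.22, 0.21, 0.20, 0.03, 0.00, 0.16, 0.18]

print(f"regular season: {staff_ra9(RA9, REGULAR_SEASON):.2f}")
print(f"playoffs:       {staff_ra9(RA9, PLAYOFFS):.2f}")
```

A team with a deep but top-light staff gains less from this reshuffling than one with a few aces, which is one way the "playoff universe" talent ranking can differ from the regular-season one.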


Fanhome's Dackle: World Series Odds (September 18, 2003)

Discussion Thread

Posted 1:35 p.m., September 18, 2003 (#11) - tangotiger
  And by the way, kudos to Nate Silver for putting this up:

DISCLAIMER: Because this analysis does not take into account head-to-head matchups, it may be less reliable from this point in the season onward.

It's good to see analysts establishing the boundaries of their work for their readers.


Fanhome's Dackle: World Series Odds (September 18, 2003)

Discussion Thread

Posted 5:08 p.m., September 18, 2003 (#16) - tangotiger
  Primer policy has: Comments ... may be removed...if the comment ... really does nothing to move the conversation forward.

It's (unfortunately) rarely exercised, but I thought the volume (especially by me) was just too much. I'll be happy to send anyone the exchange.

In the words of the great Leslie Nielsen: "There's nothing to see here. Please move on."


Fanhome's Dackle: World Series Odds (September 18, 2003)

Discussion Thread

Posted 5:20 p.m., September 18, 2003 (#18) - tangotiger
  The big differences are all at the top:

Team        other list (%)   dackle (%)
YANKEES     23.2             16.6
GIANTS      15               21.9
BRAVES      13.8             24.8
ATHLETICS   12.5             14
REDSOX      11               7.8

I think the Yanks have a bias for whatever reason. The A's+Sox are 23.5% on the one side and 21.8% on the other. So, it comes down to the Giants+Braves being so much higher with dackle.

So, the question to ask first is:
Is AL/NL talent even enough that you can do what dackle is doing? Or are some NL teams getting the benefit of beating up on worse teams than AL teams do? Or are the AL teams so evenly matched that no one team can really stand out?

The other thing to remember is that if the Giants/Braves are really as good as suggested by their W/L record, compared to the Yanks/A's, etc., then it's no surprise that a team from the NL will be more likely to win than a team in the AL.


Fanhome's Dackle: World Series Odds (September 18, 2003)

Discussion Thread

Posted 10:18 a.m., September 19, 2003 (#20) - tangotiger
  I'm sure it does not.

I would think that the best way to do these odds is to use Diamond Mind Baseball or some similar game, where injuries, whether created randomly or based on past history, can be incorporated into the game.

Since pitcher usage changes in the playoffs, you can incorporate that as well.
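A toy Monte Carlo in the same spirit, as a minimal Python sketch: it estimates a team's chance of winning a best-of-seven given a per-game win probability, with a hypothetical injury chance before each game that lowers that probability for the rest of the series. The parameter names and the injury model are assumptions for illustration; this is not how Diamond Mind or any other game works internally.

```python
import random


def series_win_prob(p_game, wins_needed=4, injury_rate=0.0, injury_penalty=0.0,
                    trials=50_000, seed=1):
    """Monte Carlo estimate of the chance of winning a best-of-(2*wins_needed - 1) series.

    Before each game there is an injury_rate chance of a (hypothetical) injury
    that lowers the per-game win probability by injury_penalty for the rest
    of the series.
    """
    rng = random.Random(seed)
    series_won = 0
    for _ in range(trials):
        p, wins, losses = p_game, 0, 0
        while wins < wins_needed and losses < wins_needed:
            if rng.random() < injury_rate:
                p = max(0.0, p - injury_penalty)
            if rng.random() < p:
                wins += 1
            else:
                losses += 1
        if wins == wins_needed:
            series_won += 1
    return series_won / trials


print(series_win_prob(0.55))
print(series_win_prob(0.55, injury_rate=0.05, injury_penalty=0.05))
```

Playoff pitcher usage could be layered on the same way, by making p_game depend on who starts each game.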


Fanhome's Dackle: World Series Odds (September 18, 2003)

Discussion Thread

Posted 12:50 p.m., September 19, 2003 (#22) - tangotiger
  Can someone take the top 5 NL teams and top 5 AL teams, and see how they did against "same competition"?

That is, lump ATL, SF, et al. into "NL leaders", and find their combined record against each of their AL opponents. Then, weighting by the number of games the NL leaders played against each of those AL opponents, how did the 5 AL leaders, lumped together, do against the same opponents?

And vice-versa. W/L, RS/RA would be nice.

Anyone, anyone? Bueller?
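The requested calculation, sketched in Python under one reading of the request: both groups are compared only over the AL opponents they share, and each opponent is weighted by how many games the NL leaders played against it. The records below are hypothetical, for illustration only. For the "vice-versa" case, swap the roles of the two groups.

```python
def same_competition(leaders_vs, others_vs):
    """Compare two groups of teams against the same opponents.

    leaders_vs: opponent -> (wins, losses) for the lumped-together NL leaders.
    others_vs:  opponent -> (wins, losses) for the lumped-together AL leaders.
    Returns (NL leaders' pct, AL leaders' pct), both over the opponents the two
    groups share, with each opponent weighted by the NL leaders' games against it.
    """
    nl_wins = games_total = 0
    al_weighted = 0.0
    for opp, (wins, losses) in leaders_vs.items():
        if opp not in others_vs:
            continue  # skip opponents the AL leaders never faced
        games = wins + losses
        al_wins, al_losses = others_vs[opp]
        nl_wins += wins
        games_total += games
        al_weighted += games * al_wins / (al_wins + al_losses)
    return nl_wins / games_total, al_weighted / games_total


# Hypothetical records, for illustration only.
nl_leaders = {"Team X": (6, 3), "Team Y": (2, 1)}
al_leaders = {"Team X": (10, 9), "Team Y": (5, 5)}
print(same_competition(nl_leaders, al_leaders))
```

Runs scored and allowed could be handled the same way by swapping in RS/RA for W/L.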


Fanhome's Dackle: World Series Odds (September 18, 2003)

Discussion Thread

Posted 3:50 p.m., September 19, 2003 (#24) - tangotiger
  Just to add some data, NL teams are .544 when facing AL teams.

That is, an average NL team will win .544 of their games against an AL team. An average AL team will win .500 of their games against an average AL team.

RS/RA would have been better to use, but I don't have them.

One standard deviation is about .030, so this .044 edge (roughly 1.5 standard deviations) may be completely due to luck.
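The ".030" is the binomial standard deviation of a winning percentage. A quick Python check, assuming roughly 252 interleague games (14 AL teams times 18 games each; that count is an assumption here, not a figure from the post):

```python
from math import sqrt


def binomial_sd(p: float, n: int) -> float:
    """Standard deviation of a winning percentage over n games when the true rate is p."""
    return sqrt(p * (1.0 - p) / n)


N_GAMES = 252  # assumption: about 252 interleague games
sd = binomial_sd(0.5, N_GAMES)
z = (0.544 - 0.500) / sd
print(f"SD = {sd:.3f}, z = {z:.2f}")
```

A result around one and a half standard deviations from .500 is the kind of gap that luck alone produces fairly often.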



TheStar.com - Analyze this: NBA '04 (September 19, 2003)

Discussion Thread

Posted 11:06 a.m., September 19, 2003 (#3) - tangotiger
  Funny thing, Andrew. I've been doing the goalie stuff for a few years now. I've got a system in place that I think is pretty simple and logical. Same thing for plus/minus for players. I've got "sim scores" for players as well.

But, seeing as my time-management skills aren't up to speed yet, I'm not sure when I can deliver on this publicly, if ever.


TheStar.com - Analyze this: NBA '04 (September 19, 2003)

Discussion Thread

Posted 3:41 p.m., September 19, 2003 (#5) - tangotiger
  I agree that soccer won't translate as well, since you wouldn't get a large enough sample to do your testing on.

The way to do the plus/minus thing is virtually the same as you would do with strength-of-schedule. I would call this "strength-of-context" (SoC).

Jason Kidd + player1 + player2 + player3 + player4 + opp1 + opp2 + opp3 + opp4 + opp5 + Home/Away = 14 pts for, 11 pts allowed, over 30 minutes

You do this for every single combination of teammates and opponents. And for every player. It simply becomes a mathematical problem.

I agree you lose sample size at this level, but there are ways around it. (I wrote to a basketball exec asking him for the play-by-play files, offering to do it for free, but no dice. Funny, isn't it? I imagine if I'd told him I'd do it for $10,000, he would have taken me more seriously.)
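The post doesn't say how the system of equations is solved, so here is only a sketch of one standard approach: weighted least squares in Python. Each stint (a stretch with the same players on the floor) becomes one equation: home margin per minute = home term + sum of home players' ratings - sum of away players' ratings, weighted by minutes. The Kidd line above would be one stint with a +3 margin over 30 minutes. Because adding the same constant to every player changes no prediction, a tiny ridge penalty on the player ratings pins down that offset; the penalty is an assumption of this sketch, not something from the post.

```python
from itertools import chain


def solve_linear(m, b):
    """Solve m x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_plus_minus(stints, ridge=1e-6):
    """Fit per-minute player ratings plus a home-court term.

    stints: list of (home_lineup, away_lineup, home_margin, minutes).
    Builds the minute-weighted normal equations (A^T W A + ridge) x = A^T W y.
    """
    players = sorted(set(chain.from_iterable(h + a for h, a, _, _ in stints)))
    col = {p: i for i, p in enumerate(players)}
    k = len(players) + 1  # last column is the home-court term
    ata = [[0.0] * k for _ in range(k)]
    aty = [0.0] * k
    for home, away, margin, minutes in stints:
        row = [0.0] * k
        for p in home:
            row[col[p]] += 1.0
        for p in away:
            row[col[p]] -= 1.0
        row[-1] = 1.0
        y = margin / minutes
        for i in range(k):
            aty[i] += minutes * row[i] * y
            for j in range(k):
                ata[i][j] += minutes * row[i] * row[j]
    for i in range(len(players)):
        ata[i][i] += ridge  # penalize player ratings only, not the home term
    x = solve_linear(ata, aty)
    return dict(zip(players, x[:-1])), x[-1]


# Toy 1-on-1 "lineups" so the answer can be checked by hand.
stints = [
    (["A"], ["B"], 20.0, 10.0),
    (["C"], ["D"], 0.0, 10.0),
    (["A"], ["C"], 10.0, 10.0),
    (["B"], ["D"], -10.0, 10.0),
    (["B"], ["A"], -20.0, 10.0),
]
ratings, home = fit_plus_minus(stints)
print({p: round(r, 3) for p, r in ratings.items()}, round(home, 3))
```

With real data, lineups would hold five players per side and ratings could be multiplied by 48 to read as points per game. Thin samples are where a larger ridge value helps, at the cost of shrinking everyone toward average.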


Pitchers, MVP, Quality of opposing hitters (September 19, 2003)

Discussion Thread

Posted 3:33 p.m., September 19, 2003 (#2) - tangotiger
  Here's data to support your conclusion:

***

Looking at all hitters with at least 400 PA, the standard deviation of their opposing pitchers' OPS is .009.

Looking at all pitchers with at least 400 PA, the standard deviation of their opposing hitters' OPS is .017.

Concentrating only on those hitters and pitchers with 400 to 600 PAs, the standard deviations are .010 and .022, respectively.

Pitchers are much more likely to be influenced by their schedule than hitters are.
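A minimal Python sketch of the kind of calculation above: PA-weight each player's opponent OPS, keep players at or over the PA cutoff, and take the standard deviation across them. The post doesn't say how opponents were grouped or which standard deviation was used; the per-opponent (PA, OPS) pairs, the population SD, and the numbers below are assumptions for illustration.

```python
from statistics import pstdev


def opponent_quality_spread(players, min_pa=400):
    """Standard deviation, across qualifying players, of PA-weighted opponent OPS.

    players: dict of player -> list of (pa, opponent_ops) pairs, one per
    opponent faced. Players under min_pa total PA are skipped.
    """
    averages = []
    for faced in players.values():
        pa = sum(n for n, _ in faced)
        if pa >= min_pa:
            averages.append(sum(n * ops for n, ops in faced) / pa)
    return pstdev(averages)


# Hypothetical plate appearances and opponent OPS, for illustration only.
hitters = {
    "Hitter A": [(300, 0.750), (200, 0.800)],
    "Hitter B": [(400, 0.760), (100, 0.720)],
    "Hitter C": [(100, 0.700), (100, 0.700)],  # under 400 PA: skipped
}
print(round(opponent_quality_spread(hitters), 3))
```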


Pitchers, MVP, Quality of opposing hitters (September 19, 2003)

Discussion Thread

Posted 10:34 p.m., September 19, 2003 (#3) - Tangotiger
  By the way, the reason that pitchers will be more influenced is not necessarily the distribution of their opposing teams, but rather that there is a larger spread of true talent among hitters than among pitchers on a per PA basis.

If someone wanted to, they could figure out the stdev of OPS for hitters, and of OPS allowed for pitchers, with at least 400 PA. I don't think it will be twice as much for hitters, but it will probably be close to it.


Pitchers, MVP, Quality of opposing hitters (September 19, 2003)

Discussion Thread

Posted 3:33 p.m., September 20, 2003 (#6) - Tangotiger
  Charlie,

Suppose that pitchers did not have a home park, and instead randomly played at a park for each start. They won't pitch exactly once at each park. Some parks they won't pitch in, and others they might pitch 2 or 3 times in.

As long as the distribution of where they pitch can be explained by random chance, then we don't need to consider the park factor. I think that the Central Limit Theorem would apply (though don't quote me on that).

The same would be the case with their opposing hitters. As long as the distribution can be explained by random chance, the TRUE VARIANCE (which is what we are after) would be equal to zero. Now, I grant you that the mix of opposing hitters might not be due to random chance, and that there may be something at work here. However, I would guess that we are talking about a true variance of .001 to .002. I would be surprised if it's any higher than that.
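The "true variance" idea is usually written as var(true) = var(observed) - var(random), which holds when chance noise is independent of the true values (an assumption). A minimal Python sketch; for a binomial rate such as OBP, var(random) for a player with n PA is about p(1 - p)/n. The inputs below are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def true_sd(observed, random_var):
    """Estimate the true spread behind a set of observed values.

    observed:   one observed value per player (e.g. average opponent OPS).
    random_var: the variance each value would have from chance alone.
    Uses var(true) = var(observed) - var(random), floored at zero.
    """
    excess = pvariance(observed) - mean(random_var)
    return sqrt(max(0.0, excess))


# Hypothetical observed rates, each with a chance variance of .0001.
print(true_sd([0.300, 0.320, 0.340, 0.360], [0.0001] * 4))
```

When the observed spread is no larger than what chance alone would produce, the estimate floors at zero, which is the "TRUE VARIANCE would be equal to zero" case above.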


Pitchers, MVP, Quality of opposing hitters (September 19, 2003)

Discussion Thread

Posted 6:43 p.m., September 20, 2003 (#8) - Tangotiger
  My guess is that the #3 and #4 hitters have an LI of 1.05 to 1.10.

As for 2B/SS, I don't think you'll find much spread in talent there compared to 3B or CF, fielding-wise.

I agree that it is tough for a pitcher to keep up, but it is not unreasonable to think a great year from a pitcher is equal to an almost great year from a hitter. Pedro and Gooden and Maddux and RJ easily equal the greats, regardless of what other analysts say. Loaiza/Halladay? Maybe not, but they are in the top 10.


Sabermetrics >WIN SHARES bibliography (September 19, 2003)

Discussion Thread

Posted 10:30 p.m., September 19, 2003 (#4) - Tangotiger
  I think Sean Smith has the basketball win shares somewhere...


Sabermetrics >WIN SHARES bibliography (September 19, 2003)

Discussion Thread

Posted 11:35 a.m., September 29, 2003 (#11) - tangotiger
  Rally: if you want, I can post it up, though I think "studes" at baseballgraphs.com might want to post it.



Instructions for MVP (September 22, 2003)

Discussion Thread

Posted 4:19 p.m., September 22, 2003 (#3) - tangotiger
  The genius of the BBWAA is that by not making the rules stone-cold, it reopens the debate every single damn year, making this probably the most debated topic after Pete Rose.


Instructions for MVP (September 22, 2003)

Discussion Thread

Posted 12:40 p.m., September 29, 2003 (#14) - tangotiger
  Value in a loss:

It depends on what the objective of the player is. I look at the player's objective as "trying to help his team towards winning that game".

From that standpoint, having Ted Lilly pitch a 1-0 no-hit loss would qualify as having a lot of value. He kept his team in the game as much as he could. Therefore, I measure value from the "win probability added" perspective, where you measure the change in theoretical win expectancy after every discrete event, and attribute (somehow) the change to the players involved.
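A minimal Python sketch of the "win probability added" bookkeeping. Real win-expectancy values would come from a win expectancy table, which isn't given here, so the numbers below are hypothetical. Giving the full change to the batter and its negative to the pitcher is just the simplest way to do the "(somehow)" attribution; fielders and baserunners are ignored.

```python
from collections import defaultdict


def win_probability_added(events):
    """Credit each event's change in win expectancy to the batter and pitcher.

    events: list of (batter, pitcher, we_before, we_after), where we_* is the
    batting team's win expectancy before and after the event. The batter gets
    the change and the pitcher gets its negative.
    """
    credit = defaultdict(float)
    for batter, pitcher, before, after in events:
        delta = after - before
        credit[batter] += delta
        credit[pitcher] -= delta
    return dict(credit)


# Hypothetical win-expectancy values, for illustration only.
events = [
    ("Hitter A", "Lilly", 0.50, 0.45),     # out: good for the pitcher
    ("Hitter B", "Lilly", 0.45, 0.40),     # out
    ("Hitter C", "Reliever", 0.40, 0.55),  # hit: bad for the pitcher
]
print(win_probability_added(events))
```

Under this bookkeeping a pitcher can pile up positive value in a game his team loses, which is exactly the Lilly case.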

However, people can view it from the "playoff probability added" perspective, or "playoff probability added in games won", or whatever. Again, you can really define it however you want, since what value means is not clearly defined.

My only problem is peop