Individual Poster Page

See copyright notice at the bottom of this page.

June 26, 2002 - tangotiger (www) (e-mail)

In a study I did in March 2001 (which included the hitter's last year, but used a much larger sample of players), I found that hitters improve their walk ratio virtually every year, strike out the least at age 29, and get their best HR ratio at age 27. Their balls-in-play success goes down almost instantly, and their line drive power stays pretty flat for a long period of time. Their speed as measured by triples goes down instantly, while their speed as measured by SB peaks at 24 and then goes down at almost the same rate as the triples.
My intention is to eventually re-run that study but with the new information I've discovered recently regarding the last year effect.
A thread with all the data can be found here - http://baseball.fanhome.com/forums/showthread.php?threadid=662692#post1958322

Aging Patterns

June 26, 2002 - tangotiger (www) (e-mail)

As to your other question on different aging patterns for different types of players, I also took a look at this a while ago. My sample set was pretty small, so I wouldn't want to draw any strong conclusions from it, but the evidence was showing that all types of players age the same way. The Tim Raines class of runners would lose its abilities across the board (SB, HR, Hits) to the same extent that the Wade Boggs class of runners would. This is another area that I will be (eventually) looking at.

Aging Patterns

June 26, 2002 - tangotiger (www) (e-mail)

I usually only look at 1919 and later because I don't think that "power" is well-represented in the pre-1919 time period. Even HR are not a true representation of "power", but they're still pretty good to use.
In that March 2001 study (which, I'll reiterate, used a slightly different methodology), I found virtually no difference in the aging patterns of the various skill sets between 1919-1979 and 1979-1999.
In that study, I concluded the following: "...the historical averages match up very well with the recent period. While today's ballplayers may be better, and playing longer, the "curve" of their aging is the same. There is no age bias with today's regimen of training and medication. It affects all age groups the same."

Aging Patterns

June 26, 2002 - tangotiger (www) (e-mail)

jmac: yes, I agree that you have other forces at work. This is why I first presented that large chart breaking up the performances by number of years in the league (for players who debuted at age 25).
What I can do is present a similar chart for players who debuted at age 22, 23, etc., so that we can determine a more specific pattern. The only reason I did not do so is that it would be so overwhelming that the reader wouldn't know where to begin. As well, this kind of analysis suffers from sample size issues, and so the conclusions will not be reliable. Let me think about how I can best present such data.

Aging Patterns

June 27, 2002 - tangotiger (www) (e-mail)

In other words (and more simply), age is calculated as of Dec 31st.
I was wondering when someone would say that. Yes, I simply make Age = Year - YOB. Not only is it a snap to calculate, you also don't need to know the month the player was born.
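A minimal sketch of that age convention, for anyone wanting to reproduce it. The function name and the example player are hypothetical; only the formula Age = Year - YOB comes from the post.

```python
def season_age(season_year: int, birth_year: int) -> int:
    """Age as of Dec 31 of the season: year minus year of birth.

    No birth month is needed, which is the point of the convention.
    """
    return season_year - birth_year


# Hypothetical player born in 1975, playing the 2002 season.
print(season_age(2002, 1975))
```

Every player born in the same calendar year gets the same age for a given season, whether he was born in January or December.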
Do you know that batting averages ...follows a normal distribution?
I seem to recall looking at this a long time ago to determine how many single-hit and multi-hit games a player would have, if it did follow such a distribution. And it did.
Do you stick to that in strike years, when 300 PA were harder to come by?...
Yes, on all counts. It's not as if I can say that the reliability of 300 PA in 1982 is similar to 200 PA in 1981. 300 PA is 300 PA. In cases like this, my sample size goes down somewhat. My other option would be to limit my sample set so that 1981 is not part of the study. I can do this for pairs of seasons, but to the extent that I looked at this issue, I needed to have 5 and 10 and 15 consecutive years. Removing 1981 from such a study would drastically reduce my sample size. Your point is valid however.
I do think the league and park adjustments should be done, so that there is more confidence in the conclusions.
I agree. As my sample size goes down, these adjustments become much more important. As for followup articles, I think I'm going to have to make it a whole series of them because there are so many results that this dataset will give us.

Aging Patterns

June 28, 2002 - tangotiger (www) (e-mail)

One of the previous posters mentioned that players who play longer will have different aging curves than those who don't.
Working always from the same sample, I broke my main sample into three subsets. The first subset is those players who played between the ages of 24 (or earlier) and 33 (or later). This is a group of players that have at least 10 years of experience, and who have had the chance to play during the "traditional" peak years. The second subset is those players whose careers were over by the age of 31. These might be the guys who you think peaked earlier. The third subset is everyone else. That is, it's a group of players whose careers were over after the age of 31. These are just a whole bunch of different types of guys, but skewed slightly towards the older players.
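The three-way split can be sketched roughly as follows. This is only an illustration: the function name and labels are made up, and reading "over by the age of 31" as "last season played before age 31" is an assumption, since the exact cutoff is not spelled out.

```python
def career_bucket(first_age: int, last_age: int) -> str:
    """Assign a player to one of the three subsets by career span.

    first_age / last_age are the player's ages in his first and last
    seasons in the sample (hypothetical inputs).
    """
    # Long: played at 24 or earlier AND at 33 or later.
    if first_age <= 24 and last_age >= 33:
        return "long"
    # Short: career over by 31 (assumed here to mean last_age < 31).
    if last_age < 31:
        return "short"
    # Rest: everyone else.
    return "rest"


for span in [(22, 35), (23, 29), (26, 34)]:
    print(span, career_bucket(*span))
```

Because membership depends on when the career started and ended, each subset is selected on the very thing being measured, so any aging curve drawn from it carries that selection built in.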
Because of sample size issues, some of the results might look strange. In any case, here it is:

Age     #  Long       #  Short      #  Rest
19      2  0.565      1  0.810
20      5  0.687      4  0.871
21     25  0.816     15  0.921      3  0.936
22     61  0.857     36  0.939      9  1.025
23     99  0.930     81  0.999     16  1.007
24    140  0.936    131  0.987     21  0.972
25    140  0.982    167  0.994     79  0.929
26    140  0.989    191  0.994    119  0.973
27    140  1.000    181  1.000    166  0.991
28    140  0.987    141  0.994    185  0.989
29    140  0.981     80  0.966    203  1.000
30    140  0.979                  218  0.969
31    140  0.977                  162  0.980
32    119  0.954                  114  0.976
33     88  0.938                   80  0.951
34     58  0.930                   55  0.945
35     39  0.885                   35  0.930
36     27  0.887                   20  0.905
37     17  0.856                   11  0.937
38     11  0.842                    4  0.906
39      5  0.809                    1  1.014
40      4  0.790                    1  0.942
41      2  0.672
42      2  0.563

Hopefully, the formatting comes out here.
What do we see? As you would expect, those players with long careers centered around the expected peak years did just that. They had their best years at the 25-29 level, peaking at age 27. They had a bit of a jump prior to that. Then they had a slowly declining phase to age 34, after which they plummeted. However, don't forget we specifically selected our subset for players who played between the ages of 24 and 33. Therefore, we should not be surprised to see demarcation points close to these ages. Furthermore, the peak point was still age 27. The slowly declining phase is a result of the selective sampling.
The second set of players is far more interesting. These are players who, for whatever reason, had their careers over by the age of 31. These players did not have the traditional aging curve. Essentially, they stayed at a peak level between the ages of 23 and 29. Is it possible that there is a class of players that don't get to the next level? That perhaps, there is a class of players that peaks at age 23 and stays there? Or is this again a bias in our sample? Because we specifically chose players whose careers ended prior to age 31, is this exactly what we should expect to see? This is a much more likely explanation. Management did not give the players a chance to show their stuff, and simply cut them before their truly good seasons could be shown.
The last set of players has a bias to be an older type of player, and the results show this. These players peaked between the ages of 26 to 32, peaking at age 29.
Selectively choosing your sample set leads to many biases.

Aging Patterns

July 1, 2002 - tangotiger (www) (e-mail)

Do you feel the Linear Weights Ratio is a much better measure of offensive performance than OPS? If so, what evidence do you have? If not, may I suggest you use OPS+ in your future studies?
On June 26, I posted the following on fanhome:
"[OPS]'s extraordinarily useful and practical because:
- it's readily available
- it's made up of the two most important rate stats we have
- it's highly correlated to runs scored
- it can be used in research when you have the power of sample size that masks its deficiencies
It is NOT useful because:
- you can not count on it for game-level decisions
- you can not count on it to evaluate players of weird profiles
- it does not properly weight all the events
So, depending what you are trying to do, OPS is either a godsend or a bane.
The reason I hate it is that people use it for the exact reasons that it is not useful."
In sum, OPS is used as a stand-in when you don't have something better. LWR is something better.
I'm not sure why I need to show "better" evidence to use LWR over OPS. All the deficiencies of OPS are taken care of in LWR. I can use LWR to convert it into Runs Created or Runs over average, or really anything, in a simple one-step process. LWR is the best "rate" measure we have. (If you want more discussion on LWR, you can check out my site at http://www.geocities.com/tmasc/lwr.html which will give you the full formula, as well as a link that discusses LWR.)
******** Not only will this take care of the adjustments for park and year which you indicate are necessary, but it will make it easier to incorporate your findings with other studies that are OPS+ based...
I have the data to calculate the park/year adjustments, I just didn't want to add another layer of complexity. (If someone wanted to reproduce the above research, they could. If I added park and year adjustments, they couldn't.) As I indicated, I'll add that layer the next time. All studies that are OPS+ based are flawed for the reason that they rely on an indicator that has deficiencies that are circumventable.

Felipe Alou: Is He Afraid of the Walk?

November 13, 2002 - tangotiger (www) (e-mail)

I know that Walker hated the way Felipe would talk to him about hitting approach.
The first poster is doing exactly what I said we shouldn't: looking at team totals.
However, the first poster is correct that Alou does have a choice from within the 15 hitters which ones to play. But Alou was not dealt a good hand. If you're given the bottom of the barrel, you should expect to have low walk totals period.
The team was selected by the GM, and therefore, the low team walk totals is more an indication of the type of team that the GM has selected.
I'm going to continue the analysis tomorrow, looking at it from another angle. I don't know what the results will be, so I'll report whether they favor Alou or not.

Banner Years

October 31, 2002 - tangotiger (www)

Good comments, guys. I actually meant to address these two issues, and I'm glad you brought them up.
Age: definitely should be looked at, but I can tell you that there is no big age bias, even with the 149 group. I will do the breakdown, hopefully before the end of the day.
Banner selection: one of the considerations I had was that I did not want to select players that were say 110-110-110-140, because of the "regression towards the mean" issue I brought up. That is, even though you've got a guy who you think is a 110 level for three years, he might actually be 107 or 113, etc. The closer you are to 100, the more likely it is that this is a 100 player. Furthermore, by introducing all players, then I get into trouble with losing players. While it is unlikely that you will lose a player from the pool who is 100-100-100, it is very possible that you might lose a player who is 80-80, thereby introducing a bias. Of course, this also depends on position.
Having said that, I was thinking about running through the data anyway, and see what happens. And with the much larger sample size this would allow me, I can select 35% above "previous 3 years" to really highlight the banner years. I'll try to get to this next week.

Banner Years

October 31, 2002 - tangotiger (www)

Walt, good comment at the end, and this is exactly what I did with the HR study I linked to. And rather than seeing a "retention", we see essentially that what the player did the year after the banner year was repeated the year after that as well.
Again, what I am talking about is not "retention", even though I used that term. We are presuming that a player's performance level is a sample of his true talent level. Therefore, by selecting 130-100-100, I am choosing those players that had a great year followed by 2 average years. This does not imply that this player had an injury or something that forced him to go down to 100. The more years I tack on, the smaller my sample size. You are correct that I can simply show year1, select on years 2 through 4 (whether 100-100-130, or 130-100-100), and then look at year 5. My guess is that year 5 production will be only slightly different than year 1 production, age notwithstanding. This is a good idea, and I will run that next week as well.
Walt: any comment about the Hank Aaron issue?

Banner Years

October 31, 2002 - tangotiger (www)

MGL, yes, I agree with almost everything you said. Two points:
1 - yes, not my best writing work, as I wrote it in 30 minutes, but what is it that is unclear? Was it the weighting thing at the end? It basically means that you put more weight on the most recent year, and you have some weight to regress towards the mean. Or was it something else?
2 - As for the 149,149,149, which I selected for, the 4th year was 142. However, you then say that this group is actually a "147" group. Not! Because my group is "fairly large", I would say that this selected group of 149,149,149 is a 142 player. And, if I looked at the 5th year, I would bet that this group would also exhibit 142. I would also bet that the year prior to 149 would also be 142. I would say *every* year around the 149 years would be 142. Do you agree? (Age of course is an issue if I start going crazy and start to consider pre-24 and post-36, etc years.)
However, for a single player, if I had a 149,149,149,142 player, since I didn't select such a player, then I would have to guess that he is a 147 player.
I think we are on the same page, but I'm not sure.
*** As for parks and changing teams, etc, yes that is always a problem. It's "possible" that the park may play an influence in the selection of my players, but I doubt it. The banner year was 25% above the base years, and so, while playing at Coors does increase the chances that he will be selected in the banner year, I don't think this is the case. I'll look into it though.
*** By the way, the more I look into this, this is just like MGL's hot/cold streak study. While he is looking at 15-day periods, I am looking at 3-year periods. We are (or will in my case) looking at the pre-selected and post-selected period, and we are (or might/will in my case) finding that those two values pretty much match, regardless of the intervening period.

Banner Years

October 31, 2002 - tangotiger (www)

First off, I'm not trying to capture ALL banner years, just some of them. As well, I am not suggesting at all that 149,149,149 is banner performance. I am using that type of player to show that a 149,149,149 player is not in fact a 149 player but a 142 player.
So, when you look at a 130,100,100 player, a player that certainly had a banner year, we should treat the 130 with some hesitation, since, as we've seen, this performance was "lucky" in some respect.
****
Anyway, I've re-run, so that we have "x", 149,149,149, "y". That is, how did the players with 3 great years do just before the "banner 3 years", and just after? Here are the results
Year 1   Year 2   Year 3   Year 4   Year 5
1.42     1.49     1.49     1.49     1.41
This population of players had 593 strings.
Now, if we break it down by age (in Year 5), this is what we get:
Age      1     2     3     4     5      n
34+    1.46  1.51  1.51  1.48  1.37   173
30-33  1.44  1.49  1.49  1.47  1.41   229
29-    1.36  1.46  1.49  1.51  1.45   191
Again, as you can see, the "3 selected years", were pretty constant around that 149 level. The before/after years are consistent with the age grouping. But, in all cases, the before year was less than the selected period, even for the old guys.
There is also about an annualized 2% change in performance level between year 1 and year 5, which is also consistent with my findings in aging patterns previously done.
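As a rough check on that annualized figure, the Year 1 and Year 5 values of each age group can be turned into a yearly rate. This is a sketch under assumptions: the post does not say how the 2% was computed, so compounding over the four years between Year 1 and Year 5 is a guess, and the function name is made up.

```python
def annualized_change(start: float, end: float, years: int) -> float:
    """Compound yearly rate of change that takes `start` to `end`."""
    return (end / start) ** (1 / years) - 1


# Year 1 and Year 5 levels from the age breakdown (4 years apart).
groups = {
    "34+": (1.46, 1.37),
    "30-33": (1.44, 1.41),
    "29-": (1.36, 1.45),
}

for label, (year1, year5) in groups.items():
    print(f"{label}: {annualized_change(year1, year5, 4):+.1%} per year")
```

On this reading the oldest and youngest groups move by somewhat under 2% a year in opposite directions, and the middle group barely moves.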
So, the "true talent level" is year 1 and year 5, and everything in-between is "lucky".

Banner Years

October 31, 2002 - tangotiger (www)

MGL, sorry for the bugaboo.
To go back to your question, let me amplify. The 149 performance is regressed 14% towards the mean to match the "expected probable" true talent level of 142. So, generally speaking, we should regress all 3 year performances by 14%.
Now, of the three remaining components (year x, year x-1, year x-2), we weight the most recent season (x) as 38%, and the other two as 24% each.
As a shorthand, rather than remembering kooky percentages, you can apply integer weights of "3" for "x", "2" for "x-1","x-2", and "1" for "mean". Maybe I should have skipped this part, as it's probably more confusing than it should be.
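For what it's worth, the integer shorthand works out like this. The function name is hypothetical, and the mean of 100 (league average) is an assumption taken from this post. The 3/2/2/1 weights are 37.5%, 25%, 25% and 12.5%, which is close to, but not exactly, the 38/24/24/14 split.

```python
def weighted_estimate(x, x_minus_1, x_minus_2, mean=100.0):
    """Estimate talent level from three seasons plus the mean.

    Integer weights: 3 for the most recent season (x), 2 each for
    x-1 and x-2, and 1 for the mean.
    """
    weights = (3, 2, 2, 1)
    values = (x, x_minus_1, x_minus_2, mean)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# The 149,149,149 player from the study.
print(weighted_estimate(149, 149, 149))  # 142.875
```

That lands within a point of the 142 the group actually showed in the following year.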

Banner Years

November 1, 2002 - tangotiger (www)

Good job, MGL!
The mean of the players who played in those 5 year spans, with at least 300 PA, is 115%. Now, this may sound like a lot, but don't forget, we have a lot of repeating players in there (like Aaron).
I don't think that the regress towards the mean would regress to 115%, but I'd like to hear from the statistics-oriented fellows about their thoughts on this matter. I would guess at this point that the Aaron situation comes up, and I should identify unique players only.

Banner Years

November 1, 2002 - tangotiger (www)

Since age is an issue, and I can easily control for it, I will re-run using that.
As well, the "mean" of the players is 115%. If we look only at one age group for the 5 year period (say ages 26-30), we see that each year they average 115%. If we select any other time period like 24-28, you also get similar results. And of course, no player could possibly exist more than once in each age group. Therefore, the mean is 115%.
Therefore, I should probably select players that center around 115%, and that center around the 27 age group. I'll get to this next week.

Banner Years

November 1, 2002 - tangotiger (www)

MGL, maybe you missed my last post, but if I only look at one 5-yr period, say ages 24-28, then of course Aaron can only exist once in this string. And, the players in this group are 115% of league average. Now, if I select some other age group, the unique players in that group are also 115%.
However, if I decide to combine the two groups, I might have two Aarons, and two Ruths, etc. I don't see why I would want to remove one of them from the groups.
I think it would be easier to keep all the age groups separate (24-28, 25-29, 26-30, etc, etc) and report on each one separately. This removes the conflicting players, but addresses the Aaron issue. However, I don't see the problem in then combining these three age groups afterwards, AND KEEPING the mean at 115%.
Or maybe I'm missing something?

Banner Years

November 2, 2002 - tangotiger (www)

Contrarian: I've already admitted my shortcomings in many areas, including statistics. I've taken enough that I can follow conversations, but that's as far as I would take it. I also know enough to apply the basics. This is no news to people who've been reading me, and any of my comments should be taken like that.
I am always interested to hear from Walt Davis, and frankly I just missed his second post (the way Primer regenerates the site, there is a lag, and Walt's post got sandwiched in-between).
I have no problems with people criticizing my approach, or my comments, or anything I do. It would be nice though if you would provide an email address so that we can correspond privately, and you can elaborate further.

Banner Years

November 4, 2002 - tangotiger (www)

Sancho, thanks for the links. The first one I had not seen, and is a not bad one. As for Albert, I'm frankly disappointed. There's a long list of math professors who have tackled baseball issues, and really either miss something, or write so dry that I miss something. (Of course, there's an even longer list of sabermetricians who miss some math issues as well.)

Banner Years

November 5, 2002 - tangotiger (www) (e-mail)

Shaun, I agree age should be taken into account, and I'm currently working on this. I should have something to show as soon as I get the time (which these days is not too much).
As for contract status, certainly this would have an impact. However, by having an aggregate of players, this impact should not be noticeable too much. And of course, since my data is from 1919 onwards, there's an even smaller population which would even be affected by this at all.
As for learning and improving, etc. This is the issue. Is it the case that the player is learning and improving, or is it simply random chance that the player happens to have a banner year. Hopefully, with the new data I have, we'll have a better answer.

Banner Years

November 7, 2002 - tangotiger (www) (e-mail)

MGL, no. F James specifically said that these 149,149,149 players would not regress, except for aging. In fact, they do regress to 142.
This group of players will regress towards THEIR mean, I agree. In fact, they will regress 100% towards their mean. But since we don't know what their mean is (without looking at other non-sampled years), we take the next best thing: the mean of the population they were drawn from. This mean is in fact 115%. Therefore, given the number of years (3), the number of players (I don't remember, let's say 100), and the number of PAs (let's say 500 / player / year), the best players will regress 7/34 (20%) towards the mean of the population they were drawn from. Different years, different # of players, and different # of PAs will regress differently.
Now, I know little of statistics, and perhaps Walt Davis or Ben V can put this matter to rest.
I'll be back in a week or two with detailed data, broken down by age.

Banner Years

November 8, 2002 - tangotiger (www) (e-mail)

F James: I think I wrote this already, but it might have gotten lost in all of MGL's explanation: the year before the 149,149,149 string was 141 and the year after the string is 142. Subsequent years drop off slightly from 142, and in fact match what you would expect from normal aging. (This will become more clear when I do the breakdown by age... eventually, whenever that is.)
Essentially, MGL's point boils down to: whatever period you take, however many players you take, however many PAs that performance makes up, you have to regress to some degree. The amount to regress is related to the variables I just mentioned. By choosing 1 day, we are regressing almost 100%; by choosing 5 years of performance between the ages of 25 and 29, where in each of those years the player has 1 googol PAs, you regress close to 0%. Everything in-between is subject to more analysis.
Given my sample of 3 years of 600+ players of about 500 PAs, the regression of the 149 player is 20% towards the mean of 115 to achieve the true talent level of 142 (more or less).
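The arithmetic behind those regression figures can be sketched in a couple of lines. The function names are hypothetical; the inputs (149 observed, 142 true talent, population means of 115 or 100) are the numbers quoted in this thread.

```python
def regress(observed, population_mean, fraction):
    """Pull an observed level `fraction` of the way toward the mean."""
    return observed - fraction * (observed - population_mean)


def implied_fraction(observed, true_level, population_mean):
    """Share of the gap to the mean that was given back."""
    return (observed - true_level) / (observed - population_mean)


# Regressing 149 by 20% toward a population mean of 115.
print(round(regress(149, 115, 0.20), 1))

# Working backwards from 149 -> 142 for each candidate mean.
print(round(implied_fraction(149, 142, 115), 3))  # 7/34
print(round(implied_fraction(149, 142, 100), 3))  # 7/49
```

The same drop from 149 to 142 reads as about 20% regression if the mean is 115 and about 14% if the mean is 100, which is why the choice of mean matters so much in this discussion.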

Let's Contract Two Different Teams

July 12, 2002 - tangotiger (www) (e-mail)

Proofreader guy: you know, I read and reread and re-reread my article, and it amazes me what I miss. How about "here" for "hear", and "marker" for "market"? Competitif is french for "competitive", so I don't think I can use the french excuse.
Common Sense: do you think that if Steinbrenner reduces his payroll from 140 million to 90 million that he will give that 50 million$ of savings to you? In fact, don't you think that now that he set up the YankeeNets that it will be very easy for Steinbrenner to claim much less revenue because the YankeeNets corporation owns the Cable rights, and not Steinbrenner?
If teams claim that they can't play in the same playing field as the Yankees, then either level the playing field by introducing teams into a lucrative market to siphon off some of that revenue, or take some of that Cable money, or realign the two leagues by market size. Let the Yankees and Mets and Red Sox and Braves and Dodgers spend themselves crazy. Let the A's and Expos and Royals and Twins spend smart.
To think that by controlling player salaries that you will get an outcome that is different from today is ludicrous. Nothing is going to change. In 5 years, you'll be right back to where you started.

Let's Contract Two Different Teams

July 14, 2002 - tangotiger (www) (e-mail)

There's no question that we are introducing accountants into the fold with the owners' plan. As if lawyers aren't bad enough. How many white collar solutions do we have to introduce to "solve" the problem?
Just re-align based on market size. 4 divisions of 8 teams. The top team of each division goes forward, while the 2nd and 3rd place go into a wild-card system where the 2nd place of Division 1 plays 3rd place of Division 4, etc. There's no need to force a socialistic solution. Just change dance partners.
There's no need to overhaul anything. If you want to overhaul, then disband the league, and do it right.

Let's Contract Two Different Teams

July 15, 2002 - tangotiger (www) (e-mail)

Common sense: it seems that you've been getting more and more common sense. How much longer before we get Common Sense the third?
Seriously, when I say to "contract" the New York teams, I intended it to be in a humorous note. But the point of contracting the teams is to reposition the power that is highly concentrated in the New York teams. Since Steinbrenner is consolidating and hiding his power and revenue in a second enterprise (that exists only because of the first), it is unlikely that he will reduce the market value of his interests.
Why would 29 intelligent men buy out a franchise that has limited value (Expos) when they can buy out a franchise that has substantial value (Yankees)? Steinbrenner used the system to its fullest; he capitalized on it with the unanticipated TV value that has created the great divide. Everyone has his price. So, buy out Steinbrenner at fair market price, and redeploy the value of the Yankees by siphoning away the cable and TV value, and selling the rest of the team to an interested buyer. That is, buy Steinbrenner's TV and cable rights away from Steinbrenner.
If that is too hard or too expensive to do (as if maintaining the status quo does not have its own expenses), then just take the "barnstorming" idea to something more palatable. Put the Yanks, Mets, Red Sox, Dodgers, Braves, Orioles, Rangers, and Cubs in the "Division 1" league. Put the Expos, Pirates, Brewers, Reds, A's, Marlins, Devil Rays, and Blue Jays into the "Division 4" league.
What would happen? Well, all those Division 1 teams will soon realize that they can't hope to buy their way in because they've got too much competition for too few spots. They'll have to be smart. The Division 4 teams will realize that with just a little effort and smarts, they'll have a decent chance to make a run for the playoffs.
Once in the playoffs, anything can happen (especially if you make the first round 5 games instead of 7).
Without spending a single dollar on either side, we can reshape the entire competitive balance by simply changing divisions.
And what's more shocking: that I say to redistribute the wealth of the Yankees to the poor teams, or Selig redistributing the wealth of the Expos to the rich teams?

Let's Contract Two Different Teams

July 16, 2002 - tangotiger (www) (e-mail)

Willy Loman is all in favor of the American dream. And I didn't say to steal it from the Boss, but buy it back from him. MLB made a huge error in not securing the TV rights the way the NFL did. Now they've got to pay for it. Literally. Once they do that, the chips will fall into place. But to restrict player salaries through non-American ways? I don't think so.

Let's Contract Two Different Teams

July 24, 2002 - tangotiger (www) (e-mail)

I think the soccer relegation/promotion idea is viable. But I question the 30 teams/league decision. The disparity will still exist. Why not have a 12 team premier league, 24 team division-1, etc, etc. Which just brings us back to my proposal of having leagues segregated by market size, but having ALL of them play for the World Series. By having each league have its own championship, the fans will question the legitimacy of any except the World Series.

Let's Contract Two Different Teams

July 24, 2002 - tangotiger (www) (e-mail)

As for pay for performance, why not simply limit contracts to 1 year? And make everyone a free agent? That would make it truly free market. You'd end up paying rotisserie style prices (about 15% to your top player), because of the abundance of supply. So, a team with a 60 million$ payroll will pay say Mike Piazza 9 million$. A-Rod would have a tough time getting more than 15 million$.
So, we have a mechanism that can severely limit top players' earning potential. All owners have to do is declare everyone a free agent, and no more guaranteed contracts. Too hard. It's like going to the Playboy mansion and being told you have a chance with 1 girl, and 1 night only. The owners want control, and they want to feel empowered. It'll cost you.

Let's Contract Two Different Teams

July 31, 2002 - tangotiger (www) (e-mail)

It would turn exactly into a rotisserie style system. The top guy would get at most 20% of the payroll. In any case, it doesn't matter how much the #1 guy gets. It's the overall payroll that matters. Players will be willing to sign for below market in some cases, simply because they don't want to be left out.
Teams will have their budgets before the bidding starts, and they won't try to run up prices, because of all the other fish in the sea.
Owners need help controlling themselves, and this is the best way. And if they overpay? So what, it'll only be for 1 year. You won't have all those 5 year contracts guaranteed to worry about.

Forecasting 2003

February 13, 2003 - tangotiger (www) (e-mail)

I still have not decided how to "rank". With only 32 players, using differentials or RMSE might not be the most appropriate (esp with the Bonds thing). I could create "classes of differentials" (consider each class to be 1 SD of error, and max out at 3 SDs or something like that). Or I might use differentials, while capping the individual differential at 3 SDs. Really, it's not important. I'm going to present the full data, and the reader is free to analyze and interpret the data as well.
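To make the two options concrete, here is a rough sketch of plain RMSE next to a differential capped at 3 SDs. Everything here is illustrative: the function names and the sample numbers are made up, and which SD to use (supplied here by the caller) is an open assumption.

```python
import math


def rmse(forecasts, actuals):
    """Root mean squared error; large misses are penalized heavily."""
    squared = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared) / len(squared))


def capped_mean_abs_diff(forecasts, actuals, sd, max_sds=3.0):
    """Mean absolute differential, with each miss capped at max_sds * sd."""
    cap = max_sds * sd
    diffs = [min(abs(f - a), cap) for f, a in zip(forecasts, actuals)]
    return sum(diffs) / len(diffs)


# Made-up OPS forecasts for three players; the third is a huge miss.
forecasts = [0.800, 0.900, 1.000]
actuals = [0.800, 0.900, 1.400]

print(round(rmse(forecasts, actuals), 3))
print(round(capped_mean_abs_diff(forecasts, actuals, sd=0.050), 3))
```

With the cap, one wild season can cost a forecaster at most 3 SDs, so a single outlier does not decide the ranking by itself.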

Forecasting 2003

February 13, 2003 - tangotiger (www) (e-mail)

...retrospectively, using the same methodology, at last year or other prior years?
I guess surprises are out of the question around here! Voros has looked at the various forecasters for the year 2000 hitters. I was going to also add in the "baseline forecast" to his list to see how that stacks up. Stay tuned in a couple of weeks.
When do fantasy drafts usually occur? The last weekend in March?

Forecasting 2003

February 13, 2003 - tangotiger (www) (e-mail)

I probably should have said this in the article.
If I were to ask the Primer readers to estimate 200 players' ERA or OPS, I'd get a smattering of responses. By limiting it to something reasonable (32) I hope to get decent participation, while at the same time getting reasonable (though not conclusive) results from the forecasters. This is similar to what the WSJ does with using the top 10 picks from the brokerages. The intent is not to prove anything. I also selected those 32 players who showed the most deviations, and therefore, we'd expect the forecasters and the Primer readers to have little agreement on these.
I have also asked the forecasters to participate in a second parallel study, where they would submit the projections for a large number of players. I've only received a positive response from 2 of them. This is essentially what Voros did with his study, except he did the hard work by compiling everything himself. I can understand that the forecasters don't want to give everything away (which is why it was easy to ask them for only 32). I hope though that by the end of the season, they'll give me their list, so that I can save some work. So, you'll get the study that you are looking for, plus the other readers will have some fun (I hope) as well.
I hope this answers your concern.

Forecasting 2003

February 13, 2003 - tangotiger (www) (e-mail)

Yes, included with the ballot for the 32 players will be your estimate of MLB OPS and ERA (which will default to the 2002 level if you don't choose anything).
This is critical because if a forecaster underestimates all his projections, it doesn't matter, as long as you only use his system. Therefore, that's not a bad thing.
Really, I wanted to ask everyone to submit their OPS/lgOPS, but that loses too much meaning.
Great question!

Forecasting 2003

February 13, 2003 - tangotiger (www) (e-mail)

By the way, if anyone has a systematic forecasting system, then send me an email. It could really be based on anything, like
- weighted or unweighted recent performance
- lefty/righty splits
- gb/fb tendencies
- comparable players
- age 
- height/weight
- position
- regression towards mean
- injury history
You don't have to tell me how your engine processes everything, just what the engine considers. I can then throw you into the systematic forecaster pool. Thanks...

Forecasting 2003

February 17, 2003 - tangotiger (www) (e-mail)

Just to reiterate (or maybe iterate, since I was not very clear), the point is not to figure out who has the best forecasting system, but rather if a systematic forecasting system is any better than a baseline or back of the envelope (card) system.
What the WSJ study shows is not that the Lehman brothers have a better forecasting system (hard to say with only 10 stocks) but rather that the mom&pop do better using a baseline (the S&P500 index) than in paying off the professionals.
To determine which professionals are better, you need far more than just 10 sample points, and the WSJ also does this by looking at all stock picks. This would be part of a second parallel study if I get a decent participation from the forecasters as well, similar to what Voros did in the 2000 link I provided. However, given that I've chosen 30 players who have very inconsistent performances, I think it might show something about the forecasters, but will be far from conclusive. (If I had chosen the 30 most consistent players, my guess is that all systematic forecasters would come up with very very similar estimates. I've removed the Colorado and the inexperienced players from the study, and there again, some forecasting systems might be better with those players.)

Forecasting 2003

February 18, 2003 - tangotiger (www) (e-mail)

Erstad: good call!
The next set of players that missed making the cut were, in order: Renteria, Erstad, Beltran, Sosa, Javy Lopez, Mark Loretta, Vina, Giles, Magglio Ordonez, Garret Anderson, Ben Molina.

Forecasting 2003

February 19, 2003 - tangotiger (www) (e-mail)

Minks: no, you would only supply the unadjusted OPS. The only reason to supply lgOPS and lgERA would be to establish your basis. Suppose that you miss all your OPS projections by 50 points, but that you also projected the lgOPS to be off by 50 too. Then, this scores 100% (in my book). A person using the results of such a projection will be perfectly happy (as long as he uses only this projection).
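The "miss by 50 points, but also miss the league by 50" example can be sketched as follows. This is a minimal illustration, assuming the league adjustment is a simple subtraction of the league-level miss; the function name and the sample numbers are hypothetical, and the post does not spell out the exact scoring formula.

```python
def league_adjusted_miss(proj_ops, proj_lg_ops, actual_ops, actual_lg_ops):
    """Miss in OPS points after removing the forecaster's league-level miss."""
    raw_miss = proj_ops - actual_ops
    league_miss = proj_lg_ops - actual_lg_ops
    return raw_miss - league_miss

# OPS is given in integer points (850 means .850) to keep the arithmetic exact.
# The player is projected 50 points too high, and so is the league.
print(league_adjusted_miss(850, 800, 800, 750))  # 0
```

A forecaster whose player and league misses cancel this way scores a perfect zero, which matches the "100% in my book" reading.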
David: the back-of-the-card forecasters are just like the mom&pop investor. They each have access to their own private information and public information and intuition, and combine all their data into some sort of target price for a stock. The collection of all these investors makes up the market. You can benefit from this "wisdom" by buying the S&P500 index (SPY). The systematic forecasters follow a rigid, repeatable process, like the various brokerage houses, like Lehman and Smith Barney. The baseline is the monkey throwing darts at a stock chart. So, whether I am comparing apples or oranges I don't really care (for this study). I'm trying to put this study on the same plane that the WSJ puts its study in.
A second parallel study, looking at extended picks that the systematic forecasters provide (which the WSJ also does when selecting their best analysts), might satisfy the fruit requirements.

Forecasting 2003

February 21, 2003 - tangotiger (www) (e-mail)

Vinay, excellent. I did not know about this. We are essentially after the same goal, but where they have 27 humans projecting 125 players, I'm hoping to get the reverse (100+ humans projecting 32 players).
What is very interesting to me, and which matches the stock market with its S&P 500 index, is that the collective wisdom of the market matches the top forecaster, with all of his intricacies.
The "missing big or getting big" projections of Wilton can, I think, probably be attributed to a lack of regression towards the mean in that system. I'd have to look at the data more carefully though. Because we are dealing with sample performances, you should expect a few guys to have seasons that are out of the norm, and therefore a system like STATS or Palmer will miss the outliers in exchange for doing better on the large population. Silver's PECOTA should give the readers the best of both worlds.

Forecasting 2003

February 24, 2003 - tangotiger (www) (e-mail)

We are trying to forecast a player's performance for the upcoming year. This performance is a combination of a player's expected true talent level, context in which that talent will manifest itself, and luck.
ERA has more luck (from the pitcher's perspective) than other measures. The point of this forecast is to try to predict a player's performance numbers, with the reader trying to do as little as possible.

The 2003 Projections

May 6, 2003 - tangotiger (www) (e-mail)

I didn't think about that sort of thing when making my projections; they were more seat-of-the-pants than that, and I assume that they were for most people.
I hope this is the case, as this is what I was hoping for. Can you get 100+ baseball fans to make seat-of-the-pants calls on extreme players, average them, and come up with something decent? We'll see in a few months...

Crucial Situations

December 3, 2002 - tangotiger (www) (e-mail)

Really? Hmmm, are you using an old version of Netscape? What's your browser version?

Crucial Situations

December 4, 2002 - tangotiger (www) (e-mail)

...but the shading doesn't print for me. Just pages of empty grids.
Hmmm... maybe I should put text and color? I'll see what I can do about that.
In some innings, does the blue shading bleed into the -4/+4 run columns (or even further)?
Good question! I was thinking about that, but since I used the same program as for Bonds, I limited to -3/+3. Maybe next time I'll expand to something larger.
...although the column headings would be better if repeated every half-inning, not just every inning.
Thanks for the suggestion! My artistic skills are not what even an average person has, so any suggestions for formatting improvements are appreciated. I'll do this next time as well.
But is there anything here that isn't intuitive?
I was a bit surprised by how much the leverage changes as soon as you get one guy on base, especially in the late innings.
...but can there really be an argument for pinch hitting for guys (other than your pitcher) in the 3rd or 4th inning. I mean, you'd run through your bench, pretty fast.
I mentioned at the end that that was not what I was suggesting. Though I would consider this if my batter was Ordonez, and Piazza had the day off.
...like what to do when you're in a particular colored box, so the practical value can be perceived. In contrast, your earlier, similar piece on when to walk Bonds seemed eminently practical, ...
There's really no end to this WE stuff. Eventually, I will be producing charts for the SB break-even points, when to bring in your reliever, should you go for the DP or try the runner at home, should you test the RF's arm, etc, etc. Any suggestions you can offer would be appreciated as well.
What are your definitions of 'Very high-leverage', 'High Leverage', etc.?
It gets a bit dry (series of math equations), but I just picked some arbitrary thresholds to try to easily distinguish the various situations. I could have put +.054 wins and +.013 wins, etc, but who the heck knows what that means?

Crucial Situations

December 4, 2002 - tangotiger (www) (e-mail)

I'll do my best.
This is what you do:
1 - Determine the WE for every inning/game/base/out for an average team. I've provided a subset of that in the initial link.
2 - Assume that your "great pitcher" or "great hitter" or whatever is going to come in for 1 PA. What is the expected WE following this player's PA? (I used a player whose component stats translate to a .750 win%)
3 - Take the difference between the two. That is the impact in wins of a "typical super-great" player for 1 PA.
The biggest swing, in this example, is about +.07 wins, and that occurs in the bottom of the 9th, home team up by 1, and you have men on 2b and 3b and 1 out. That is, if you bring in say Pedro or RJ or Mo or Bonds or Thome or Giambi for ONE SINGLE PA, he will have an effect of .07 wins (assuming these guys are .750 players) over an average player.
How much is +.07 wins? Well, the typical star is +6 wins in 600 PA (+.01 wins per PA). If you bring in Giambi IN THIS PARTICULAR SITUATION 100 times, he'll have as much impact as he would playing full time.
Now, now, you won't have this situation 100 times, and not having Giambi regularly in the lineup might even mean you might have this situation zero times, who knows. But this is the magnitude of the impact.
So, while Theo Epstein and Bill James are saying that tied games in late innings are very important (AND THEY ARE!), my research shows that having a great pitcher pitching has even more of an impact when the fielding team is up by 1.
Anyway, the thresholds I used are .01 / .02 / .04. Just made them up to try to get a balance to the chart. Well, I used the .01 because that's what a great player is worth randomly. And .04 because that would make it 4 PAs in a game. So, given the choice to hit Piazza 4 times randomly, or once in the "very high-leverage" situation, it's a wash. Of course, if that situation doesn't come up, well, you lose on the deal.
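The thresholds and the per-PA arithmetic above can be sketched in a few lines of Python. The bucket labels other than "very high" and "high", and the choice to treat each threshold as inclusive, are assumptions made for illustration; the chart itself may draw the lines differently.

```python
STAR_WINS_PER_PA = 6 / 600  # "+6 wins in 600 PA", about .01 wins per PA

def leverage_class(swing_wins):
    """Bucket a one-PA win swing using the .01 / .02 / .04 thresholds."""
    if swing_wins >= 0.04:
        return "very high"
    if swing_wins >= 0.02:
        return "high"
    if swing_wins >= 0.01:
        return "medium"  # hypothetical label
    return "low"         # hypothetical label

# The bottom-of-the-9th, up-by-1, men on 2b and 3b, 1 out example.
print(leverage_class(0.07))
# Number of "random" star PAs that one such PA is worth.
print(round(0.07 / STAR_WINS_PER_PA))
```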
Is that enough detail? Too much?

Crucial Situations

December 4, 2002 - tangotiger (www) (e-mail)

Chris, you got it!
If you followed my "Runs Created" series, it shows that the "run environment" (really WIN environment) already exists when the batter/pitcher matchup comes up. That is, the runners on base already have a built-in chance of scoring, given the environment they are playing under.
So, if you then introduce a great player into the mix at that point in time, the entire environment changes. Now, the chances of winning change (sometimes drastically). With 1 out, more damage can be done (not only with the runner on base, but with the batter getting on base). You bring Bonds as a PH with 1 out, not only is the guy on base likely to score, but Bonds will now put himself in a position to extend the inning.

Crucial Situations

December 4, 2002 - tangotiger (www) (e-mail)

Oliver, I really don't know. You'd really have to compare how teams should make their choices optimally against how they really make their choices. And you'd have to break it down by the kinds of choices as well (steals, sacs, taking extra base, throwing to wrong base, bringing in the wrong reliever, batting order, etc, etc, etc). It's gotta be a few wins at least. I don't know, 5? 6?
In the business world, I would perform a cost/benefit analysis. But, the reason I'm doing all this baseball stuff is so that I can get away from doing these boring dry cost/benefit reports! Please don't make baseball like a job for me!!

Crucial Situations

December 5, 2002 - tangotiger (www) (e-mail)

Here's a printer-friendlier version
http://groups.yahoo.com/group/tangotiger/files/crucialpa.pdf
Still no text, though.

Bruce, Lee, and the Goose

December 17, 2002 - tangotiger (www) (e-mail)

Vinay, the starter is almost exactly 1.00. I looked at two starters, one who was good and went long (Blyleven), and one who went short and was not so good (Knepper). Bert was .99 and Bob was .98.
As for historical, I have the LI for the 20 pitchers with the most relief games from 1974-1990:
pitcherid   Leverage Index
suttb001    1.90
smitl001    1.76
fingr001    1.75
gossr001    1.72
rearj001    1.69
smitd001    1.52
orosj001    1.48
laveg001    1.46
quisd001    1.45
mintg001    1.45
tekuk001    1.41
garbg001    1.38
lyles101    1.35
campb001    1.31
stanb001    1.30
martt001    1.24
leffc001    1.20
hernw001    1.18
baird001    1.10
andel001    1.03
This is the LI only while as a reliever.
Sorry, but my data is limited to the pbp provided by Retrosheet.
I agree that doing the LI by year, and then doing the multiplication by year would add the "timeliness" aspect as well. I might do it for one of these guys, maybe Quiz.

Bruce, Lee, and the Goose

December 17, 2002 - tangotiger (www) (e-mail)

Thanks... good question. I haven't calculated it yet, but I have another tool (which for lack of a better name I call the Tango Distribution), which shows the expected runs per game distribution, given the runs per game of a team. Using this, I can figure out the win% for any two teams, broken down by run differential. Surprisingly (to me), there is little difference between a .400, .500, and .600 team in terms of "number of close games". I would suspect that I could extend this to "number of crucial situations" as well, and therefore, expect that most teams face a similar number of crucial situations. The more you get away from being a .500 team, the fewer the crucial situations. I'm not sure what the relationship is between team win% and crucial situations (yet). I'll keep this in mind next time I'm working on this. Great question!

Bruce, Lee, and the Goose

December 17, 2002 - tangotiger (www) (e-mail)

Craig: Mark Eichhorn, in 1986, was 1.32. If you look at the list of 20 players I listed in my followups, you will note that Bob Stanley is 1.30. I think this is probably what you'd find with your multi-inning non-closing firemen. Eichhorn did have 10 saves, and finished half his games, so you might be careful about how you extrapolate his usage to other relievers. In any case, his 157 relief innings is equivalent to 207 typical innings. The guys below 1.00 in LI are the true mop-up guys.
Devin: Your point is valid regarding what it would take. Since the HOF is a self-defining institution, I don't see how I can answer that question with any basis. Writers, like fans, are flying by the seat of their pants in trying to establish the potential impact a reliever has.
Pete Palmer and Bill James try to answer this question by using a combination of SV, GF, and G to come up with a reasonable estimate. I'm offering the same type of solution from a different angle. Therefore, I think it is irrelevant how we think they impact, and how they've changed the way the game is played and managed. The fact is that the impact of the best relievers, while real, is not substantial enough to catapult them to the levels of the superstars. And the best of the lot is good enough to put them in line with star pitchers who lack longevity. This is why relievers are paid the way they are. GMs may have figured out their true value already.
However, your point is just as valid, and that the HOF may not simply be about "overall value". And perhaps relievers do deserve a special spot. I don't know, and I think that the writers also don't know.
As for my sims, I'll run a couple more, like for Quiz and Reardon and Bob Stanley, using their LI. I'll let you know what shows up.
Charles: thank you! I had a lot of fun doing this piece! I just wish I could devote more time to this.

Bruce, Lee, and the Goose

December 17, 2002 - tangotiger (www) (e-mail)

No, you didn't miss it. I explained it in another article.
If you go to the first line of the article, and click on the link, you get a general explanation of how I determined the leverage of the situations. If you then go into the comments section, in one of the December 4 comments, I elaborate on how I derived the leverage values. Hope that's good enough?
I apologize for making each of these win expectancy articles links to links to links. They're all related, and it's very tough (for me) to write it adequately, without making it a mathfest.

Bruce, Lee, and the Goose

December 17, 2002 - tangotiger (www) (e-mail)

If ever I get pbp prior to 1999, and I get the World Series pbp, I'd *love* to look at Mariano Rivera. He may turn out to be a borderline candidate like Goose or Lee Smith, based on regular season numbers. But when you add in his tremendous playoff performance, that may be enough to get him over.
In fact, I am surprised how little play playoff heroics get. In the NHL, they have the same problem. The NHL and NBA are *all* about the playoffs. But the awards and HOF, etc, are mostly about the regular season. Rarely do you see the two combined. I believe in soccer, they combine all games, regardless of "league". Pele has 1,241 (or whatever) goals, with no split.

Bruce, Lee, and the Goose

December 17, 2002 - tangotiger (www) (e-mail)

Thanks Colin.
Yes, your point is very valid, and Vinay also brought up the issue in his comment. Essentially, the next breakdown is to look at each PA within the context of the leverage of the situation (which is what the Mills Brothers and Doug Drinen did). So, while Bruce Sutter may be 1.90 overall, he'd have say 30% at a 4.0 leverage, and 40% at a 1.50 leverage, and 30% at a 0.3 leverage. And maybe during the 4.0 leverage situations, that's when he was his best (whether because his manager used him during his peak years, or he rose to the occasion, or by luck), and therefore, he was even more valuable.
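As a quick check on the hypothetical Sutter split above, the overall LI is just the PA-weighted average of the bucket LIs. The function name is illustrative, and the shares are the "say" numbers from the post rather than measured data.

```python
def overall_li(buckets):
    """Weighted average LI from (share_of_PAs, LI) pairs; shares are normalized."""
    total_share = sum(share for share, _ in buckets)
    return sum(share * li for share, li in buckets) / total_share

# 30% of PAs at 4.0, 40% at 1.50, 30% at 0.3
sutter_split = [(0.30, 4.0), (0.40, 1.5), (0.30, 0.3)]
print(round(overall_li(sutter_split), 2))
```

The result lands within a hundredth of the 1.90 overall figure, so the example split is internally consistent.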
It's possible that there is some impact here, especially with the relief wild card.
It requires some upfront work on my part to get the whole thing set up. I'll see if I can devote some time to this. Perhaps after XMAS.

Bruce, Lee, and the Goose

December 17, 2002 - tangotiger (www) (e-mail)

I think this is the exercise that Vinay did, but in response to Joe's point, let's go through it step-by-step.
Let's assume that Gossage has an ERA+ of 126. Let's assume that he had 251 IP as a starter, and 1558 as a reliever. Let's assume that as a starter, his ERA+ was 100, and as a reliever it was 130. Fair enough?
Now, his LI as a starter was 0.99. His LI as a reliever was 1.72. As we see in the above paragraph, he pitched his best as a reliever. So, if we take his 1558 innings and multiply by 1.72, that gives us his "adjusted typical" innings. Do the same for the 251 x .99. Good so far?
Now, weight the 130 ERA+ by 1558x1.72 and the 100 ERA+ by 251x.99. You end up getting an ERA+ of 127. That's compared to the initial value of 126.
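The weighting in the Gossage example can be reproduced with a short sketch. It follows the post in averaging ERA+ linearly by innings, which is a simplification; the function name and tuple layout are illustrative.

```python
def weighted_era_plus(roles, use_leverage=True):
    """Innings-weighted ERA+ over (era_plus, innings, li) roles.

    With use_leverage=True each role's innings are multiplied by its LI.
    """
    weights = [ip * (li if use_leverage else 1.0) for _, ip, li in roles]
    total = sum(w * era for w, (era, _, _) in zip(weights, roles))
    return total / sum(weights)

# (ERA+, IP, LI) for the assumed starter and reliever portions of the career.
gossage = [(100, 251, 0.99), (130, 1558, 1.72)]
print(round(weighted_era_plus(gossage, use_leverage=False)))  # plain innings weighting
print(round(weighted_era_plus(gossage)))                      # leveraged innings weighting
```

The two printed values correspond to the 126 and 127 quoted in the text.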
The point is that because very few of Goose's innings came as a starter, the change won't affect much. If this was Eckersley, then that's a different story.
That said, while the impact is small in this case, we should still do the breakdown as I mentioned in the previous post, so that we are leveraging each particular PA, and not applying an overall leverage on the sums of the PA.

Bruce, Lee, and the Goose

December 18, 2002 - tangotiger (www) (e-mail)

Good point.
There are two things to consider with "leverage". You can take the position of what was the leverage of the situation, assuming that the pitcher will pitch to the end of the inning. So, if it's top of 9th, 1 out, man on 1B, up by 1, the leverage is not that particular situation, but rather that particular situation as the starting point, until the end of the inning. It could be that that particular PA may have a leverage of "4", but the "starting from that PA to end of inning" may have a leverage of "2".
Furthermore, you can also take the point of view that if a reliever gets himself into a jam that the manager is "bringing him in" to get himself out. That is, after every PA, the manager is deciding whether to bring in his existing pitcher, or bring in a new pitcher.
Remember, my point of view is crucial PAs. So, PA by PA, what is the leverage. I don't know if it's the pitcher or the fielders that caused the change in leverage. And really, I don't care. What I care about is how often did he face a high-leverage situation.
It is important that you don't make a stat do what it wasn't designed to do.
If I were designing a model to decide when is the optimal point in the game to bring in a reliever, such that he will pitch to end of inning, I would have different leverage numbers. And if I design a model, such that my pitcher will pitch to end of inning, plus one more full inning, I'd have again, different leverage numbers.
All these methods are good, within the context of their design assumptions.

Bruce, Lee, and the Goose

December 19, 2002 - tangotiger (www) (e-mail)

To add to the point about "how often do relievers cause high-leverage PAs for themselves": Bruce Sutter comes into a high-leverage situation, and he keeps it high-leverage. Fat Rojas comes into a high-leverage situation, and turns it into a low-leverage situation (by giving up 3 run HRs).
So, there are various reasons for turning one type of leverage situation into another. It's not just a "if he's bad, then..." kind of deal. It's a lot more intricate than that.

Bruce, Lee, and the Goose

December 19, 2002 - tangotiger (www) (e-mail)

Gossage gave up .5 more walks than Gooden, but .7 fewer non-HR hits. Gooden's run environment was slightly higher than Gossage's. The actual runs allowed by relievers are a little suspect because of the "accountability" issue. Not that it should be ignored, but you do have to account for it.
On a "rate" basis, of pitchers born since 1950, I have Guidry, Cone, Rijo, Sutter, Blyleven, Gooden, Gossage all being "equivalent". In terms of IP or leveraged-IP, clearly Blyleven is the one that stands out here. By the way, John Smoltz is also in this group.
Gossage is borderline, in my view.

Bruce, Lee, and the Goose

December 19, 2002 - tangotiger (www) (e-mail)

Paul, welcome! I don't think I've seen you around here? I'd love for Retrosheet to get more PBP, and I'd love to run the 73 Hiller, and the Franco career through their paces.
Walt, my inclination is to say that they are in the pen because they are not Roger Clemens or Greg Maddux. However, they are David Cone or Ron Guidry, and those guys were pretty darned good. I don't have a separate standard for catchers or anyone else. I look at it as how many wins did they contribute over some baseline. If you are a catcher, and you only play 120 games, and you are done by 34, then I don't have different standards. Not to say I'm right or anything.
You make a valid point that relievers can be considered similar to catchers (can't play long enough in a season or a career). So, you have to first resolve why they have shorter careers (because of the position, or the quality of the players there). Then you have to resolve if you want to have a different standard.
It's a tough call no matter what your perspective is.

Bruce, Lee, and the Goose

December 20, 2002 - tangotiger (www) (e-mail)

If I remember my post, Eichhorn had 157 real innings, and 203 leveraged innings. It may be that that's as much mileage as you can get out of a reliever. That, because of all his warm up tosses, etc, etc, you won't get more than that. Then, he's got to do that for 15 years.
However, I don't know if this is a physical limitation, as it is for catchers (who play 130 games instead of 150, and who play 13 years instead of 18, e.g.). If the relievers are physically limited to 200 leveraged innings instead of 250 for starters, and 12 years instead of 16 for starters (just examples), then it may be fair to consider the relievers to have lower standards, like catchers.
However, this should be studied to the extent that catchers' careers have been, before we pronounce sentence.
Even after all this though, people can still choose to not lower the standards for the C/RP.

Bruce, Lee, and the Goose

December 26, 2002 - tangotiger (www) (e-mail)

If there's anyone still out there, Eric Gagne's LI last year was 1.83, and Smoltzie was 1.79.

Are Managers Optimizing Their Best Relievers?

December 31, 2002 - tangotiger (www) (e-mail)

But first, I'd like to suggest that the most optimal use of the best relievers would generally be as a starter.
Agreed.
Why don't we use the same thinking for relievers? Why is the 9th inning any more important than the 1st?
If you bring in Mariano Rivera with a 6-run lead 50 times, you won't change the outcome of the game compared to bringing in an average pitcher.
If you bring in Mo 50 times with a 1-run lead, the Yanks will win a few more games than if you brought in an average pitcher.
If there's a one-run game, aren't each of the starters' six innings just as vital as the closer's ninth?
I'm not taking anything away from the starters. Their LI is about 1.00.
In fact, I would even argue that the first 7 innings of pitching are MORE important than the 9th because the score after the 7th (and often the 6th) influences the choice of relievers the opposing manager will use.
7 innings of LI of 1.00 is 7 leveraged innings. 2 innings of LI of 2.00 is 4 leveraged innings. Yes, the first 7 are more important, or at least, they have more impact to the final outcome of the game.

Are Managers Optimizing Their Best Relievers?

December 31, 2002 - tangotiger (www) (e-mail)

Is it reasonable to conclude that Rivera's LI could be somewhat deflated due to the fact that the Yankees have been consistent winners over the last few years...
I believe I mentioned that as a possibility that the Yanks pay (earn) this price.
I don't recall any mention in the previous article of a correlation between overall team LI and team W-L records (though I would expect really bad teams to have the lowest LI's for their relievers).
On my to do list. I should be able to come up with the LI, on a team-by-team, year-by-year basis, from 1974-1990. I expect the LI to peak with teams at .500, and slowly degrade the more the team's win% is from .500 (on either side).
It would also be fun to see the converse of this study, i.e. what is the xFIP for pitchers with the highest LI?
Also on my to-do list. I just ran a preliminary report for 1974-1990, and Todd Worrell actually tops the list at 1.97. Bruce Sutter is second at 1.90. The top of the list is all the usual suspects. The first name that I didn't recognize was Victor Cruz at 1.58. Next was Steve Foucault at 1.50.
Among "middle-relievers", Tim Burke was 1.54. He's a favorite of mine, and it certainly looks like he was used prominently. Paul asked earlier, and John Hiller was 1.62. Mike Marshall was 1.51.
Among pitchers with at least 2000 PA, Dave Tomlin was 0.73, and worst of the bunch.

Are Managers Optimizing Their Best Relievers?

December 31, 2002 - tangotiger (www) (e-mail)

Thanks, I'm enjoying this as well!
The problem with the "out" is that sometimes an out increases your WE (win expectancy), say a flyball with a man on 3b, in a tie game in the 9th inning. Strictly speaking, you have to look at the change in WE for every possible event, and then come up with the variance (and the frequency of those events). In essence, how much swing potential in winning does a particular game state provide? That's the question to answer.
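A rough sketch of that "swing potential" idea: weight each possible event's change in WE by how often it happens, and take the variance. The event probabilities and WE changes below are made up purely to show the calculation, and how this spread gets scaled into a published index is not stated in the post.

```python
def we_swing_variance(events):
    """Probability-weighted variance of the WE change over (prob, delta_we) events."""
    total_p = sum(p for p, _ in events)
    mean = sum(p * d for p, d in events) / total_p
    return sum(p * (d - mean) ** 2 for p, d in events) / total_p

# Made-up game state: (probability, change in win expectancy) for each outcome.
example_state = [(0.70, -0.05), (0.20, 0.10), (0.10, 0.15)]
print(we_swing_variance(example_state))
```

A state where every event barely moves the WE gives a variance near zero, while a state where outcomes push the WE far in either direction gives a large one.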
I'd love a faster computer, as I'm running this on a 650 MHz (but 512 RAM). Sometimes, I have to run stuff overnight.

Are Managers Optimizing Their Best Relievers?

January 1, 2003 - tangotiger (www) (e-mail)

I will give a performance breakdown for Shuey and Stanton, among crucial, normal, and non-crucial situations. Look for this in a few days. We'll see if they can "handle" the pressure...

Are Managers Optimizing Their Best Relievers?

January 2, 2003 - tangotiger (www) (e-mail)

What we are after is *not* to maximize a pitcher's LI, but rather to maximize their leveraged-innings (LI x IP). LI of 1.00 with 120 IP will have the same win impact as 1.50 LI with 80 IP to a reliever. Of course, it's not that simple, as you have to take the totality of your starters and relievers, and maximize the leveraged innings for the good pitchers, and minimize the leveraged innings for the bad pitchers, such that all innings are accounted for. You have other constraints as well, with respect to the tiredness of a pitcher's arm, etc.
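The leveraged-innings equivalence is simple enough to show directly. This is only the LI x IP product from the paragraph above; the function name is illustrative.

```python
def leveraged_innings(li, ip):
    """Leveraged innings: Leverage Index times innings pitched."""
    return li * ip

print(leveraged_innings(1.00, 120))  # workload at average leverage
print(leveraged_innings(1.50, 80))   # fewer innings at higher leverage
```

Both calls give the same total, which is the sense in which the two usage patterns have the same win impact.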
Mark Eichhorn, for example, had 200 leveraged innings (LI of about 1.3) in his great year. That is an excellent total.

Are Managers Optimizing Their Best Relievers?

January 2, 2003 - tangotiger (www) (e-mail)

All things equal, you are better off having your pitcher as a starter.
Your considerations would be to take someone like Urbina and Wetteland, and determine their level of effectiveness as a starter or reliever.
Say that as a starter, their performance would be a win% of .600. And as a reliever, they would be .650. You know that you can get say 160 leveraged-innings as a reliever, or 220 leveraged-innings as a starter. What do you do?
Compared to a baseline level of .450 (the effective level of rejigging your whole pitching lineup), you get 160/9 * (.650-.450) = +3.6 wins as a reliever or 220/9 * (.600-.450) = +3.7 wins as a starter. Essentially, a wash.
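The starter-versus-reliever comparison can be written as a one-line function. The name and the default baseline argument are illustrative; the inputs are the example numbers from the paragraph above.

```python
def wins_above_baseline(leveraged_ip, win_pct, baseline=0.450):
    """Wins over baseline: leveraged innings, in 9-inning games, times the win% edge."""
    return leveraged_ip / 9 * (win_pct - baseline)

print(round(wins_above_baseline(160, 0.650), 1))  # reliever role
print(round(wins_above_baseline(220, 0.600), 1))  # starter role
```

Changing the assumed win% in either role, or the leveraged innings available, shifts the balance quickly, which is why the decision depends on the individual pitcher.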
So, you really have to go into it deeply, determine the effectiveness level of all your pitchers based on the starter/reliever role, determine how you can best optimize your leverageable innings, and come up with your plan. It's not so easy, especially considering injuries throw a wrench in your whole plan. Unless you are the Yankees.

Are Managers Optimizing Their Best Relievers?

January 2, 2003 - tangotiger (www) (e-mail)

Shuey and Stanton breakdown
The leverage classes were broken up into high-leverage (LI of 2 or greater), low-leverage (LI of 0.5 or less), and the rest.
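The three leverage classes can be sketched as a small bucketing function. The cutoffs come from the line above; the label "normal" for the middle class and the sample LI values are assumptions for illustration.

```python
from collections import Counter

def leverage_bucket(li):
    """Classify one PA by its LI: high (>= 2), low (<= 0.5), else normal."""
    if li >= 2.0:
        return "high"
    if li <= 0.5:
        return "low"
    return "normal"

# Made-up LI values for a handful of PAs, just to show the grouping.
sample_lis = [3.1, 2.0, 1.2, 0.5, 0.2, 0.9]
print(Counter(leverage_bucket(li) for li in sample_lis))
```

Rate stats such as $H or K rate would then be computed separately within each bucket.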
$H is non-Hr hits per ball in play. All the others should be self-explanatory.
Paul Shuey? He was at his best in high-leverage situations. Mike Stanton? He was by far at his best in high-leverage situations. Note the small sample of PAs. Note also that it's easier to get more WP in high-leverage situations, since high-leverage situations occur more often with men on base. In any case, Shuey's WP rate wasn't so high, relative to his other situations.
I think there are some interesting DIPS numbers in there as well. In the high-leverage situations, each pitcher gave up fewer hits / ball in play, and got fewer Ks as well. Almost as if the pitcher had to bear down in the high-leverage situation, and therefore, has a different pitching approach, thereby lowering his K rate, and improving his $H rate. We may in fact find that pitchers DO control the hits/ball in play A LOT. And it may simply be the fact that once you reach the majors, the pitchers are similar in this regard overall.

Are Managers Optimizing Their Best Relievers?

January 2, 2003 - tangotiger (www) (e-mail)

FJM: yes that is correct. The second guy was on a hotter seat, and that's what LI is reflecting. As I mentioned on another thread, LI is not about rewarding a player, but classifying each PA.
Note that a manager is choosing to bring back the same reliever. If he had chosen to replace the reliever after 2 hits with another reliever, we'd have no problem saying that the replacement was on a hot seat.
It doesn't matter who sets the fire. We are capturing the existence of the fire, and we are capturing that the manager is letting someone pitch in that fire.
Doug Drinen's reliever reports work based on when the reliever enters and exits the inning. This metric works great in other areas, for other purposes. Eventually, I'll probably create an LI for this as well.

Are Managers Optimizing Their Best Relievers?

January 3, 2003 - tangotiger (www) (e-mail)

Well, I provided the LI for 10 top relievers of 99-02, as well as the historical LI for all pitchers in the 74-90 time period (see Clutch hits).
As for biased, again, there is no bias. It's a reflection of the game state for each PA. I know what you are saying about say John Franco or Mel Rojas being arsonists.
But it's not like there is a giant in a land of pygmies, even Mariano, that we should be concerned about. In the 74-90 time period, Clemens and Gooden are probably the giants. Their LI are 0.96 and 1.03. Hershiser was 1.03. Ryan was 1.05 and Blyleven was 0.98.

Are Managers Optimizing Their Best Relievers?

January 4, 2003 - tangotiger (www) (e-mail)

FJM: Again, I don't know how much effect it has, but I suspect a little. I'll find out eventually.
But again, remember the purpose of leveraged PAs. It's about describing the level of fire during that PA, regardless of whether that fire was arson or not. The manager is bringing back Mel Rojas, the arsonist, for the next PA.
As mentioned in another article, I can also create leveraged appearances, whereby I only note the fire level when the reliever is first brought in. This I will also do eventually. (Drinen essentially did this already.)
It's important to realize that a stat is constructed to answer a specific question, and it should not necessarily be used to answer other questions. Nor is it a shortcoming of the stat if it can't answer this new question.

Are Managers Optimizing Their Best Relievers?

January 4, 2003 - tangotiger (www) (e-mail)

If you page up to my Jan 2 comment, you will see a link to Paul Shuey and Mike Stanton, and how they performed in the various leverage situations. Paul Shuey, and especially Stanton, have excelled in high-leverage situations, when given the chance. The sample size is small, so who knows.
I was surprised with Percy too. I thought he was better, but his K,BB,HR numbers don't compare with the best, though he would have come in the 11-20 list.
As for more analysis, I would love to do it. But my time is really constrained. I want to do an analysis on a team-by-team year-by-year basis for the last 4 years, and within that, show how each pitcher performed in the high-leverage and low-leverage situations. There is really so much I want to do, I don't know where to begin.
Right now, I'm taking a break from relievers and concentrating on baserunners.

Are Managers Optimizing Their Best Relievers?

January 6, 2003 - tangotiger (www) (e-mail)

David, thanks much! I'm actually pulling a lot of different concepts into all this, so it's rewarding to me as well.
As for 2002 PBP, astrosdaily.net has it, so I'm fine there. What I need is *time*. Can you help me there?

OPS: Begone!

May 20, 2003 - tangotiger (www) (e-mail)

The A's have an additional point that by being able to work the count longer, a team can "choose" their opposing pitchers to the point where the average opposing pitcher is worse than by random chance.
They "choose" the pitcher by forcing their opponent to bring in the 10th reliever, because they wore out the starter. While this is certainly conceivable, you would need a whole team of such batters for this to work. As well, there's no guarantee that your team will benefit from it, since your opposition's next opponent might reap the benefits.
In the end, we are talking about a max .20 run difference/GP (see a previous Clutch hit for calculation), if the whole team is like this, and they are the ones who get the benefit. I fail to see how jumping the OBA to 3x from 1.8x would capture this. The "extra pitches" is not a function of OBA, but of (BB+K)/PA. By jumping the number from 1.8 to 3, you are capturing only part of this effect (BB/PA), in a whole bunch of other noise (H,HR,outs). This extra 1.2 is sort of trying to rise above the noise to find the BB/PA. If this is what the A's are trying to do, I don't think they're doing it in the best way. It's hard to comment further, without having the specifics (like James / Todd Walker comment as the best #2 hitter). From what we think they are trying to do, they are wrong.

OPS: Begone!

May 20, 2003 - tangotiger (www)

The "additional point" thing is what I'm capturing. It doesn't matter if you do: 3*(OBA-.3)+(SLG-.35) OR 3*OBA+SLG-1.25
It's the same thing.
==========
As for the "wearing out the starter", Ted is correct in his approach. If you have a team of players whose "true talent level" was .333/.400, this team would score about 4.5 runs per game. However, because these guys all work the count, they have a synergistic effect in tiring out the starter, and bringing in the 10th man. These guys, because they feed off each other in this manner, will end up with .343/.405 numbers (let's say). Now, all of a sudden, this team of talent of .333 with the synergy effect, acts just like a team of .343 with no synergy effect.
This extra effect the A's are capturing inside the OBA, by overweighting that metric. However, there's no reason to rely on such a noise-filled metric, when what you want is (BB+K)/PA or (pitches/PA). Because of the amount of noise, to try to capture the little extra pitches/PA in the OBA, you have to severely overvalue the OBA to find it.

OPS: Begone!

May 20, 2003 - tangotiger (www) (e-mail)

The other reason for using "3" for OPS is if you are actively looking for those types of players. If you really really want guys with high OBA, then you would overweight OBA. You would do this because maybe you feel that it's a better predictor of future production. Or you feel that you need to get the players to toe the company line, or whatever. Guys like Vlad, Nomar, and Soriano would not be properly appreciated in such a system.

OPS: Begone!

May 20, 2003 - tangotiger (www) (e-mail)

This is how SLOB*k and SLOB*PA*k (where k is some constant to make things add up nicely) look for 6 equivalent players from that last chart:
SLOB*k	SLOB*PA*k
81	76
81	78
79	79
77	79
74	78
70	76
SLOB by itself works ok, except at the real extreme. SLOB*PA works much better. SLOB*PA is essentially Runs Created, and we already know that BaseRuns is more logical/accurate than Runs Created.
The best one in this group remains static Linear Weights. The best one "on the market" right now is BaseRuns-generated custom Linear Weights.

OPS: Begone!

May 20, 2003 - tangotiger (www) (e-mail)

Rob, you know what, you are right! I goofed.
While I was using outs as my baseline in the last chart, I should have used PA instead. Each player on the team should have the same number of PAs, not outs. Let me re-run the chart, and I'll publish the update on my site.
Good catch!
(Vinay, you are right about RC = SLOB*AB, and not PA as I mentioned in my last post.)
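A quick check of that identity, as a sketch only. It assumes the basic Runs Created formula, RC = (H+BB)*TB/(AB+BB), and a simplified OBA = (H+BB)/(AB+BB) that ignores HBP and SF. The stat line is made up for illustration.

```python
# Basic Runs Created vs. SLOB*AB, using a made-up stat line.
# Assumption: simplified OBA = (H + BB) / (AB + BB), ignoring HBP and SF.
ab, h, bb, tb = 550, 160, 70, 270

oba = (h + bb) / (ab + bb)
slg = tb / ab
slob = oba * slg  # SLOB = SLG times OBA

rc_basic = (h + bb) * tb / (ab + bb)   # basic Runs Created
rc_from_slob_ab = slob * ab            # matches rc_basic
rc_from_slob_pa = slob * (ab + bb)     # does not match: too high by PA/AB

print(round(rc_basic, 2), round(rc_from_slob_ab, 2), round(rc_from_slob_pa, 2))
```

With these definitions the AB version is algebraically identical to basic RC, while the PA version overshoots by the ratio (AB+BB)/AB.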

OPS: Begone!

May 20, 2003 - tangotiger (www) (e-mail)

For the last example, I should have been more careful.
What happens is that I should fix the team outs to something. In my example, I actually fixed it to each player making the same number of outs (440) which is wrong.
Anyway, what I did now (see link) was start with the team outs (3960), and, making sure each player had the same number of PAs, find the 8 typical guys and the 1 variable guy that would produce 3960 outs.
Things actually change. The Best-Fit becomes 1.64 (and not 1.75). I suspect that the best-fit will fall somewhere between 1.5 and 2.0, and for ease, probably use 1.5.
(Static) Linear Weights now looks less good than originally. I like this change, as it shows that the component values should change if the underlying environment also changes.
Custom Linear Weights wouldn't have this issue. Though at this point, I don't want to pronounce that custom LWTS will see all these guys as the same. It would definitely see all the teams as the same (just like BaseRuns). I think there will be some differences among these players though through custom LWTS. I'm not sure how much difference though.
Great catch again, Rob!

OPS: Begone!

May 21, 2003 - tangotiger (www) (e-mail)

See link for the values I used. For the categories I didn't use, I set them to "zero". It's not too important for what I am trying to do though.
As for the other question, you are asking if you can only know one thing, OBA or SLG, which one correlates to run scoring the best? I seem to remember Dan Werr doing a correlation study a month or 2 ago that showed the r to be pretty even between the two. That doesn't mean they are "equally important", especially if you have both.
As well, the coefficient itself (1.56, 1.64 or whatever) doesn't specify the level of importance. If you made lilSLG = 1/4S + 2/4D + 3/4T + HR, all divided by AB, what do you think would happen? The best-fit would be 1.64*OBA + 4*lilSLG. That doesn't make lilSLG twice as important as OBA, now, does it?
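To make the lilSLG point concrete, here is a small sketch. The stat line and the OBA are hypothetical; the only thing being shown is that rescaling a variable rescales its coefficient without changing the estimate.

```python
# Hypothetical stat line: singles, doubles, triples, home runs, at-bats.
s, d, t, hr, ab = 100, 30, 5, 25, 550
oba = 0.360  # assumed value, for illustration only

slg = (s + 2 * d + 3 * t + 4 * hr) / ab
lil_slg = (s / 4 + 2 * d / 4 + 3 * t / 4 + hr) / ab  # exactly SLG / 4

with_slg = 1.64 * oba + 1 * slg
with_lil_slg = 1.64 * oba + 4 * lil_slg  # same number, bigger coefficient

print(round(with_slg, 4), round(with_lil_slg, 4))
```

The coefficient went from 1 to 4 only because the units of the variable shrank by a factor of 4.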

OPS: Begone!

May 21, 2003 - tangotiger (www) (e-mail)

I agree that it would be a rush to judgement to make any conclusions without having all the information.
While you can conclude that using 3*OBA+SLG is a poor way to evaluate current run production, it's not so clear if you want to use that equation to try to evaluate future run production (or for other secondary reasons). And you certainly can't indict someone or some organization overall. Sample size! You need a lot more evidence.
I also agree that being able to work with a group of people, respecting their views, regardless of what it is, as long as they respect your views as well, is very important. Respect, courtesy, professionalism. Isn't that the police motto?
However, I'll note that in the ESPN chat, Bill James said: "Baldelli's a lot of fun. In my office we were making fun of some scout who compared him to Joe DiMaggio, but when you see him play you realize what people are reacting to. Of course, he doesn't have DiMaggio's entire package, but he does have more than half of it." I kinda didn't like the first part, which left me with the impression that the stat-heads and the scouts clash behind each other's backs. But, this was a throwaway sentence, so who knows what James meant.
Finally, as for anyone's ability to deal with people, I'm not sure that you can necessarily say that DePodesta is good or bad, nor could you say that with me, or Voros, or anyone else, unless you deal with these people on different issues in different settings (or you have some second-hand knowledge... definitely not third-hand or worse). I don't think that an executive is a better people-person, or can deal with people, than a non-exec.
I agree that arrogance is a turn-off to most people, and that's something that a speaker should be conscious of. Mike Gimbel, who I've had occasion to e-mail from time-to-time, seems like a pleasant enough fellow. But I've heard from many many people that he is insufferable. That by itself, truth or perception, will keep Gimbel out of MLB, in my view.

OPS: Begone!

May 21, 2003 - tangotiger (www) (e-mail)

I agree with your comment on the corporate world (as I've been here for...geez, almost 13 years... my "corporate world" anniversary will be in 1 month).
Rather, I'm talking about the ability to be persuasive when dealing with people who have dissenting or at least ambivalent viewpoints, which at the very least involves some combination of:
That sentence alone is interesting to read!
But to really do all of those things well is fairly unusual, and I would guess that among the pool of the 15 or 50 or 500 or whatever leading analysts, there's a lot more differentiation in terms of interpersonal ability than technical ability.
That's an interesting thought too. I'm not sure if there is more differentiation in one or the other, or how you would qualify/quantify all that. And even if the differentiation is more in one category, the impact of that differentiation might not be as much as the other category.
Did it just feel like we had an OBA v SLG discussion? (More differentiation in SLG, but more impact with OBA differences.)
As with everything, there's degrees of impact to everything, and it's rather pointless to label them black/white (not that that's what I think anyone is doing here). Even if you have a terribly insufferable analyst, his work might be of such quality that it tips the scales towards good. Even if you would be able to classify DePodesta as a mediocre sabermetrician (and I'm not doing that), the rest of his skills might be so strong, that he can make an impact with his research, while others might not (even with better "stuff").
The fact that a successful organization has him employed, and he is highly regarded by other successful people, even though his experience is not as vast as other baseball execs, must show that his total package is something to respect highly. He's a mover and a shaker, and he gets things moving and shaking in generally the right direction.

OPS: Begone!

May 21, 2003 - tangotiger

I think that you should give the benefit of the doubt when you can. I've heard nothing but good (in fact great) things about DePodesta, so, without him actually saying anything, I give him that benefit.
Now, I can interpret the 3 thing as being "you know, I've got this great formula, and you know what, this correlates highly to 3*OBA+SLG. I don't use 3OPS, I have my own, but as it turns out, it's close to 3OPS. BaseRuns, which I don't use, is close to 1.6OPS. I'm sure Tango/David don't use 1.6OPS, but their equation is close to that".
I don't think that explanation is unreasonable, is it?

OPS: Begone!

May 22, 2003 - tangotiger (www) (e-mail)

David, I agree that the 1.64 value is a little suspect since it is based only on those 6 players that I happened to construct. I mentioned that 1.50 to 2.00 would be the correct value, if you were to look for it.
I've used the plus-1 method in the past, and I find I can minimize the runs error by using 1.83 as the coefficient for OBA. That is, 1.83*OBA+SLG. I think that as long as you use something between 1.5 and 2.0, you'll be ok, or at least better than not. I suppose if you really wanted to find the best-fit via the "plus 1" method, you'd look at 200 regular hitters, and figure it out that way.
(For the uninitiated, the "plus 1" method was described in the "Runs Really Created" series last year. Check out the archives.)

OPS: Begone!

May 22, 2003 - tangotiger (www) (e-mail)

Interesting. You know, I'm pretty sure I never include the IBB, but it was several months ago when I did that 1.8 thing. Interesting results though. I suppose we should compare it to the full-blown BsR version in that case.

OPS: Begone!

June 2, 2003 - tangotiger

3*(OBA-x)+(SLG-y)
This works out to 3OBA+SLG-(3x+y) which works out to 3OBA+SLG-k
Therefore, it is irrelevant what "k", "x", or "y" is. Whatever numbers you choose won't affect the ranking of the players, or the degree of their rankings, relative to each other, than if you simply used 3OBA+SLG
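A small sketch of that point. The three (OBA, SLG) pairs are hypothetical, and x = .3, y = .35 are just one choice of offsets; any other pair would behave the same way.

```python
# Hypothetical (OBA, SLG) pairs for three players.
players = {"A": (0.400, 0.450), "B": (0.340, 0.560), "C": (0.320, 0.480)}

def raw(oba, slg):
    return 3 * oba + slg

def centered(oba, slg, x=0.3, y=0.35):
    # Same formula with constants subtracted: equals raw() minus (3x + y).
    return 3 * (oba - x) + (slg - y)

rank_raw = sorted(players, key=lambda p: raw(*players[p]), reverse=True)
rank_centered = sorted(players, key=lambda p: centered(*players[p]), reverse=True)

# The gap between any two players is also unchanged.
gap_raw = raw(*players["A"]) - raw(*players["B"])
gap_centered = centered(*players["A"]) - centered(*players["B"])

print(rank_raw, rank_centered, round(gap_raw, 3), round(gap_centered, 3))
```

Every player's number shifts by the same k = 3x + y, so both the order and the spacing stay put.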

OPS: Begone! Part 2

May 27, 2003 - tangotiger (www)

Nick, very well said, and I especially liked this
...because he generates extra PA at (mostly) his teammates ability levels, not at his own. It would have taken me a paragraph to explain this, but you said it perfectly in half a sentence.
As for the batting average thing, I suppose that's another myth. It's pretty clear that given two guys with the same OBA and SLG, you want the guy with the LOWER BA (though in reality, we're not talking about much difference).
I suppose if you really needed to quantify it, probably something like 3*OBA+2*SLG-BA (I really don't know, but it would be of some form like that). I'll bow out of any discussion on trying to find the best-fit equation using OBA,SLG,BA. I already don't have much use for OPS, and I know I won't like OPSMB!

OPS: Begone! Part 2

May 27, 2003 - tangotiger (www) (e-mail)

Jason, interesting thought.
I just tried with a weird environment (OBA/SLG of .393/.493), and in this case, the higher the BA, the more runs scored. I then tried the other way, with .289/.351, and this time the LOWER the BA, the more runs scored.
The "break-even" point seems to be about .360/.450. That is, at that level, the change in batting average (and I checked from .200 to .340) made zero change to the run production of the team.
Great call!

OPS: Begone! Part 2

May 27, 2003 - tangotiger (www) (e-mail)

"Key" situation is another topic entirely.
Click the above link, select your "key" situation, and plug in the numbers (on a /PA or /600PA basis). That'll tell you which guy you want.
If by key you mean inning/score as well as base/out, then you need another tool to evaluate it.

OPS: Begone! Part 2

May 27, 2003 - tangotiger (www) (e-mail)

I just want to make it clear: do not, absolutely do NOT, rely on OBA/SLG/AVG to make game decisions.
You must break it down to your components, and you must apply those components against the context being faced (base/out states, inning/score/base/out game state, game/pitcher state, etc, etc).
OPS is quick and dirty and has no place in game decisions. Relying on it for some cases will make you rely on it for most cases, and sometimes all cases. That's a bad habit to start. OPS, begone!

OPS: Begone! Part 2

May 27, 2003 - tangotiger

Every game context produces different "win potential" for H, HR, BB, outs, SB, sacs. The values between those components are not static. In a completely "run potential" world, you would never call for an IBB or a sac. But in a "win potential" world, there are many many times that you need to call for the IBB or sac.
OPS, if left to its own devices, would become the de facto mechanism to evaluate game situations, when in fact its purpose is to gloss over player evaluations. I don't believe in taking baby steps, and the long path to get the job done. I also don't believe that we should hand hold the manager for 20 years to lead him to the proper tools.
Give them the right tools for the right job, and let them decide if they want it. If Felipe Alou says that looking at OPS is b.s. to decide whether to walk Bonds, I'm going to agree with him. Should I say that OPS is less b.s. than using BA? A rose by any other name...

OPS: Begone! Part 2

May 30, 2003 - tangotiger

Yes, what you want is win-based LWTS (or a sim). And I would guess that a manager will be able to be right (using only his experience) more often than using just OPS, in a tight in-game decision.

OPS: Begone! Part 2

May 30, 2003 - tangotiger (www)

1) How can injecting more than 40 extra bases into the same number of plate appearances or outs produce a negative result?
40 extra bases on hits, but 100 less bases on walks.

OPS: Begone! Part 2

May 30, 2003 - tangotiger (www)

The differences between the top guy and the bottom guy: the bottom guy has 100 more walks, 19.7 more HR, and 119.7 fewer singles.
Everything else is the same.
Straight static LWTS says that works out to +33, +28, -56 = +5, or some such.
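The arithmetic can be sketched as follows. The per-event weights are not given in the post; the values here (.33 per walk, 1.40 per HR, .47 per single) are assumptions chosen to be consistent with the +33, +28, -56 quoted above.

```python
# Assumed static linear weights (runs per event); not taken from the post.
weights = {"BB": 0.33, "HR": 1.40, "1B": 0.47}

# Bottom guy minus top guy, per the post.
diff = {"BB": 100, "HR": 19.7, "1B": -119.7}

parts = {event: weights[event] * diff[event] for event in diff}
total = sum(parts.values())

print({event: round(value, 1) for event, value in parts.items()}, round(total, 1))
```

Under these assumed weights the rounded components add to +5 and the unrounded total is a bit over +4, which fits the "or some such".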

OPS: Begone! Part 2

May 30, 2003 - tangotiger

RC has its own problems, magnified substantially when the HR/H or HR/PA becomes out of whack. RC does not model run scoring at all: it just got lucky that it looks like it models it. If you've got a computer, there's zero reason to use RC, when you've got BsR (unless you want to propose a model that's better).
I don't really care about the different denominators. The whole thing of OPS centers around: more good, less bad. The more walks, the more hits, the more TB, the less outs, the better the number. There's nothing inherent in OPS that ensures that the balance is proper. It's just plain old luck that for the run environment of MLB, that it works out that way.
Believe me, if the run environment was half what it is today, or double what it is, there'd be some other "quick" estimator that would get lucky to model run creation.
Sorry for the rant.

How are Runs Really Created

August 12, 2002 - tangotiger (www) (e-mail)

Devin, excellent points.
...If the results of the RC formula didn't correspond roughly to actual runs, James wouldn't be using it.
As I mentioned, as long as you are using typical teams in the .300 to .400 OBA range, and as long as the HR/game hit is around the norm, then RC works fine as something useful.
The problem is when you try to extend that to Barry Bonds types of teams (not that they exist) or Pedro Martinez types of teams (and they exist plenty, as Pedro, when on the mound, is his own team).
My point is to make sure that just because the results of Runs Created works on a particular set of samples doesn't mean that you can extend that methodology to other types of things you may be doing.
There's a reason RC fails, and it's in its treatment of the HR.
2) Okay, my common sense has a problem with a run value system that has events with the same outcome (walk, HBP, interference) having different run values.
Let's take a real simple example: a regular walk v IBB. Since an IBB walk occurs almost always with first base open, then an IBB has zero "moving over" value. Since the IBB is given out much more with 2 outs than with 0 outs, the "run scoring" value of the IBB is much less than a regular walk.
So, based on the frequency of when the events happen, and the effect of each event, the values can change drastically.
As for a regular walk v HBP, HBP occur in more or less random fashion. A walk occurs with more frequency with 2 outs than 0 outs, and with more frequency with no runners on 1B than expected in random fashion. The effect of these two things reduce the "moving over" value of the walk and the "run scoring potential" of the walk.
If you are thirsty for more, I've published PRELIMINARY results on the run values of various hitting events by the 24 base-out states. (I should be publishing an updated table in a few weeks.) From there you will see there are virtually no differences between the walk, IBB, and HBP, as you'd expect.
http://www.tangotiger.net/lwtsrobo.html
Thanks, Tom

How are Runs Really Created

August 13, 2002 - tangotiger (www) (e-mail)

Rob, your question on base-out differences in run value can be found here http://www.geocities.com/tmasc/lwtsrobo.html
I looked at the batting order differences of run values, and there was a long thread posted on fanhome. It is not easily digestible, and someday I'll write an article on the discoveries there. But yes, as you'd expect the leadoff hitter's HR value was 1.30 while the #5 hitter was somewhere around 1.47.
John Warren: the steal is an interesting point. The run value of the SB is very independent of the run environment, as the additive value of the SB is around .17 to .21 for the most part. The CS however changes HIGHLY, as the out is the most dependent on the run environment. The break-even point is therefore much lower with Pedro, and more steals should be attempted against him.
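A sketch of the break-even logic for the steal. The SB value of .19 sits in the middle of the .17 to .21 range above; the two CS costs are hypothetical stand-ins for an average run environment and a Pedro-like one, not measured values.

```python
def break_even(sb_value, cs_cost):
    # Success rate p where p * sb_value - (1 - p) * cs_cost == 0.
    return cs_cost / (sb_value + cs_cost)

sb_value = 0.19          # roughly constant across run environments
cs_cost_average = 0.45   # hypothetical cost of a CS, average environment
cs_cost_pedro = 0.30     # hypothetical cost of a CS, low-scoring environment

print(round(break_even(sb_value, cs_cost_average), 3))
print(round(break_even(sb_value, cs_cost_pedro), 3))
```

Because the gain stays about the same while the cost of the out shrinks, the required success rate drops in the low-scoring environment.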
Mike: I've previously published charts on win expectancy which I have to update in the near future. There's no doubt that win expectancy is really the most important aspect of analysis since that's what we are after. Again, for those thirsty for more, you can consult my preliminary chart on WE here: http://www.geocities.com/tmasc/we.htm . Again, where this comes most into play is the IBB. While the run value of a regular walk is .30 runs and the run value of the IBB is .17 runs, the win values are far different. Because the IBB occurs in game situations where it is "controlled" to minimize the impact of win/loss, then its win value would also decrease.
Thanks for all your great comments.

How are Runs Really Created

August 13, 2002 - tangotiger (www) (e-mail)

GIDP: it's worth around -.45 runs. I was thinking of breaking up the "outs" PA into "outs 1, outs 2, outs 3", but decided against it. Maybe I will fix that.
Jason: what I am presenting is how runs are really created. It's the building blocks to whatever it is you want answered. From this, you can generate win expectancy tables, if you like, or the more detailed run values by the 24 base-out states. You can then further extend this to a 24x9 run values that ALSO includes batting order. And from that standpoint, you can evaluate the #9 v Bonds with the bases loaded.
These other run evaluators give no option to do this simply because they are the end to the means. They were built to answer a specific question, and therefore are not very extendable. Play-by-play analysis is very extendable.

How are Runs Really Created

August 14, 2002 - tangotiger (www) (e-mail)

Linear Regression
There are certain things that must be understood about linear regression and using it to determine the relationship between hitting events and runs scored.
First, a little background on linear regression. If you have two things, say, the price of a stock and the earnings per share, you can probably find a relationship between these two variables. The higher the earnings, the higher the price of the stock. You will end up with a formula like P = m times E + b, where P is price, E is earnings, b is some constant and m is the slope. The price of a stock, and runs in baseball, is influenced by more than one variable. You end up with an equation that says y = m1a1 + m2a2 + m3a3 + ... + b. Linear regression lets you input the independent variables a1, a2, a3..., the dependent variable y, and solve for m1, m2, m3..., and b.
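For the one-variable case, P = m times E + b, the fit can be sketched in a few lines of plain Python. The earnings and price figures are made up and sit exactly on a line so the result is easy to check; the multi-variable case works on the same least-squares idea but needs a matrix solve.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m * x + b with one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Made-up data lying exactly on P = 15 * E + 2.
earnings = [1.0, 2.0, 3.0, 4.0]
price = [17.0, 32.0, 47.0, 62.0]

m, b = fit_line(earnings, price)
print(m, b)
```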
Here are 4 major problems with using this in baseball:
1 - Linear regression is LINEAR. Linear as in a straight line. While there is a somewhat linear relationship between runs and singles, doubles, triples, and walks, there is NOT a linear relationship between runs and HR, or runs and everything else like SB, WP, BK, etc. Baseball is non-linear.
2 - The independent variables are not independent. There is an interdependence between all these variables. A walk is only worth what it is because of the other things that happen. Linear regression attempts to "freeze" all the other variables when calculating the value of the unfrozen variable. As your run environment increases however, we know that the values of these variables change. Baseball is interdependent.
3 - Even if you assume for ease that run creation is linear and independent (a safe assumption for very controlled environments), what sample data will you use to run your regression against? Most people will use team season totals, which is an aggregate of individual games, which is an aggregate of individual innings. If you want to run a proper regression analysis, at the very least run it on a game or inning level. Your sample size will explode to something much more reliable.
4 - Not accounting for all the variables. Triples have a strong relationship to speed. If you don't have SB in your sets of variables, the regression analysis will award more weight to the triples as a stand-in (because of its relationship to steals). It is possible, based on some samples, that the value of a triple could exceed the value of the HR! What other variables are you not accounting for?
Arvid - Let me get back to your post. The purpose of this article is to explain the building blocks of run creation at the team level. I have not shown how to extrapolate this to individual players. The end-result is not to end up with linear values for each hitting event, since these linear values only apply to a given run environment. We need to determine the linear values for EVERY run environment! As I said, the value of a single in Pedro's run environment is far less than a single in an average pitcher's run environment.
I am interested in the pieces of how runs get created, an actual model. I am not interested in a formula that estimates runs based on whatever variables that ONLY works for a given run environment. Runs Created and Linear Weights work fine for that. BaseRuns is the key, and I will present this hopefully by the end of this month.
Michael - The building blocks of run creation does lie in run expectancy tables for the 24 base-out states. I am not introducing anything new here, but rather showing how we should extend this to other run environments. I have not read Curve Ball. Please clarify your post further so that I can properly answer you.
Rob - Are you asking me what would a player's run value be using a context-neutral approach (i.e., the final weighted average values I presented) compared to a context-specific approach (i.e., the specific values by the 24 base-out states)? If this is the case, the answer is about +/- 10 runs at the extremes. I looked at this last year, with regards to Ichiro. You can find that article here http://www.geocities.com/tmasc/lwbymob.htm though I only looked at the 8 base states. If this is not what you are talking about, please clarify further.

How are Runs Really Created

August 14, 2002 - tangotiger (www) (e-mail)

Michael: I agree that the most easily digestible measure of run creation is one that is context-neutral, and therefore, I am not adding anything new here, except more perfect values to use (and adding values to the obscure events like RBOE or BK).
My interest lies "under the hood", and the how and the why.
The important point that I'm also trying to get across is that even if you stick to a linear context-neutral measure like linear weights, that you should use a custom version, based on the run environment. It really makes no sense to apply the same formula to Mel Rojas as to Pedro Martinez. We only do this, because it's easy for us. And if we keep doing it, we will forget to question why we do it. Runs Created, as great as it was then, is an example of this. It completely fails us at the extreme player level.
I think I am in basic agreement with your point of view.
Rob: OUCH! First of all, I did look at the batting order about 2 years ago, and there was an effect of something like 15-20 runs for Rickey Henderson in the leadoff spot. That is, putting a player whose skillset is uniquely qualified for the batting spot that has the most variability (which is Rickey to a tee), in his best season, had I think a variability of close to 20 runs (against putting Rickey, say, in the #5 spot). The #2 hitter also showed great variability, and I concluded elsewhere that in certain (many!) situations, your best hitter should bat #2.
With the MVP/Ichiro thread, I showed that batting great with men on base, or being given a lot of men on base will add 10 runs. Give both, and you're close to 20 runs as well.
I really don't need to run a simulator to determine all this though. This is a simple problem of determining the frequency of facing the 24 base-out states, and your success in those same states.
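In rough terms, that calculation is a frequency-weighted sum. The sketch below uses only three of the 24 base-out states, and every number in it (state frequencies, runs added per PA, 600 PA) is invented purely to show the mechanics.

```python
# All figures are hypothetical; a real version would cover all 24 states.
runs_per_pa = {            # batter's runs added per PA in each state
    "bases empty, 0 out": 0.02,
    "man on 1B, 1 out": 0.05,
    "bases loaded, 2 out": 0.10,
}
league_freq = {            # share of PAs a typical batter gets in each state
    "bases empty, 0 out": 0.5,
    "man on 1B, 1 out": 0.3,
    "bases loaded, 2 out": 0.2,
}
batter_freq = {            # share of PAs this batter actually got
    "bases empty, 0 out": 0.4,
    "man on 1B, 1 out": 0.3,
    "bases loaded, 2 out": 0.3,
}

def runs_added(freq, value, pa):
    return pa * sum(freq[state] * value[state] for state in freq)

neutral = runs_added(league_freq, runs_per_pa, 600)
actual = runs_added(batter_freq, runs_per_pa, 600)
print(round(neutral, 1), round(actual, 1), round(actual - neutral, 1))
```

The gap between the two totals is the part of the batter's value that comes from the mix of situations he faced rather than from how he hit in them.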
I wouldn't be surprised if you have a player who is ideally qualified for a particular spot (say Ichiro for #2, though I don't know that), who faces more high-leverage situations than normal, who is one of the best hitters in the game and who performs far above his "neutral" performance level would add 30 more runs than if placed in a "neutral" spot and performing at his normal high level. This is of course a rarity, and I would guess in practical terms that 1 standard deviation would be +/- 4 runs.
This issue however is very interesting to look at, but it would be something that I would have to prioritize in with the other equally interesting things I'm looking at.

How are Runs Really Created

August 14, 2002 - tangotiger (www) (e-mail)

Michael, I would not look at SF actual run output to determine anything since 6000 PA is not a very small sample.
Anyway, I once ran sims where I had a team of 9 .333 OBA guys, and another team with 8 .300 OBA guys and 1 .600 OBA guy. Overall, both groups are the same. I also made the SLG average about 30 or 40% higher for each player.
I then moved this Bonds type player through the batting order.
From what I remember, I did not notice much difference between the 9 equal guys and the Bonds + bad team.
I'll have to redo that study now that I have better data available. It is again another interesting question that I must look at.

How are Runs Really Created

August 14, 2002 - tangotiger

I meant IS a small sample.

How are Runs Really Created

August 15, 2002 - tangotiger (www) (e-mail)

Michael, I guess I didn't make myself very clear, since what you replied is exactly what I said.
"Anyway, I once ran sims where I had a team of 9 .333 OBA guys, and another team with 8 .300 OBA guys and 1 .600 OBA guy. Overall, both groups are the same. I also made the SLG average about 30 or 40% higher for each player."
So, the 9 equals of .333 OBA had a weighted team average of .333 OBA. The 8 equals of .300 OBA plus the Bonds-like .600 OBA would also have a weighted team average of .333. So, the first team has the Bonds magic spread around. We are talking about two equal teams in terms of overall talent, except that the spread is far different.
As I mentioned, I don't remember seeing any noticeable difference. It might have been maybe 2% difference (say 15 runs over a season) only to the extent that you'd be able to optimize the batting order so that the .600 guy could do the most damage. I will redo the study at some point in the future though to get more accurate results.
Here is a link to the results of the study I did last year. Please take it as preliminary and crude. Spreading the Bonds magic
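
As a quick check of the arithmetic behind the two lineups described above, here is a minimal sketch. It assumes every batting spot gets the same number of plate appearances, which is a simplification (real lineups give the top spots more), and the function name is only illustrative.

```python
# Equal-weight check of the two hypothetical lineups.
# Assumption: every lineup spot gets the same number of plate appearances.
balanced = [0.333] * 9
star_plus_eight = [0.300] * 8 + [0.600]

def team_oba(lineup):
    """Unweighted mean OBA of a nine-man lineup."""
    return sum(lineup) / len(lineup)

print(round(team_oba(balanced), 3))         # 0.333
print(round(team_oba(star_plus_eight), 3))  # 0.333
```

Both lineups come out at the same team OBA, so any difference in runs scored would have to come from how the talent is distributed and where the .600 hitter bats.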

How are Runs Really Created

August 15, 2002 - tangotiger (www) (e-mail)

tango I think your crude Bonds analysis goofed up in exactly the types of ways you intended to prevent with your article.
I was doing my best to avoid lone gunmen types like Bonds. I did that analysis ONLY to show the effect of runs at a team level, with having either 9 guys equals, or 8 guys equals, and 1 outlier, even though overall, they have the same stats. I did not want to talk about the "run environment" because...
The problems I see are that as you so elegantly noted walks are valuable only because others drive you in. By using just OBA you missed that completely as most of Bonds exceptional value is in the walks.
...because Bonds doesn't get to partake in his own run environment. Bonds's run environment, the chances that the runners ahead of him will score, and the chances that he himself will score is derived by all the other batters. You can't measure Bonds value of moving runners over, if those chances include partly Bonds' effect.
So, I was hoping that everyone would overlook this, because the Bonds effect to the run environment is outside the scope here. However, since you brought it up, what you have to do, in this case, is establish a run environment for each batting spot for this particular team, such that if Bonds is the #3 hitter, then the run environment of the #2 hitter includes Bonds, but the run environment of the #3 hitter should "assume" an "average" type of ballplayer.
I went into this in great and deathly detail in the batting order thread on fanhome. I really want to avoid talking about that here, because we are going to get away from the basics too fast.
Your point is well-taken and accurate.
It would be interesting to analyze all of the line-ups that have been tried to see which would be the most effective based on the run environment concept, and of course see if you could find a better one.
The run environment concept applies to the basic building blocks of run creation, and I did apply this to the above mentioned thread on batting order.
The correct and proper way to do what you are suggesting is to use the proper model (a simulator) to go through all the variations. The run environment concept, with its building blocks of run creation, will however greatly reduce the number of player combinations you need to examine.
For those interested in the batting order thread, drop me a line, and I'll point you there. tom@tangotiger.net

How are Runs Really Created

August 15, 2002 - tangotiger (www) (e-mail)

new: you pretty much have got it, except near the end. The run environment is established by the overall offense + pitching + fielding. You CAN create the run expectancy tables and all that with a little programming. You can also extend this into win expectancy tables, which is where the real fun and learning experience lies.
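
A minimal sketch of the "little programming" involved in building a run expectancy table from play-by-play data. The input format here is hypothetical (it is not the layout of any particular data source): each half-inning is an ordered list of (state, runs_on_play) pairs, where the state is the base-out state before the play. The run expectancy of a state is taken to be the average number of runs scored from that point to the end of the half-inning.

```python
from collections import defaultdict

def run_expectancy(half_innings):
    """Average runs scored from each base-out state to the end of the half-inning.

    half_innings: list of half-innings; each is an ordered list of
    (state, runs_on_play) tuples, where state is the base-out state
    *before* the play. The string encoding (e.g. "1:1--" for one out,
    runner on first) is a hypothetical choice for this sketch.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for plays in half_innings:
        # Runs still to come in this half-inning, as seen from each play.
        runs_remaining = sum(runs for _, runs in plays)
        for state, runs_on_play in plays:
            total[state] += runs_remaining
            count[state] += 1
            runs_remaining -= runs_on_play
    return {state: total[state] / count[state] for state in total}

# Two made-up half-innings, for illustration only.
sample = [
    [("0:---", 0), ("0:1--", 0), ("1:1--", 2), ("1:---", 0), ("2:---", 0)],
    [("0:---", 0), ("1:---", 0), ("2:---", 0)],
]
re_table = run_expectancy(sample)
print(re_table["0:---"])  # 1.0
print(re_table["1:1--"])  # 2.0
```

With real data the same loop would run over every half-inning in the sample, and a win expectancy table would follow the same pattern with inning and score added to the state.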

How are Runs Really Created

August 16, 2002 - tangotiger (www) (e-mail)

Voros: BaseRuns does not fall into the trap that RC does nor LWTS. You will find it an appropriate measure, though you lose the great additive advantages that LWTS affords you. Readers of fanhome know what I am talking about here. For the others, please bear with me until the end of the month.
Walt: your terrific dissection deserves a generous response. I will in due course. I do want to make three specific points in the meantime though: 1 - the linear models that are presented with regards to baseball are almost always to the power of 1, and therefore that was my basis for my statement
2 - John Jarvis did a regression analysis on I believe the 1976-2000 TEAM SEASONAL totals and came away with a regression value of .62 (or something) for a double, and .87 (or so) for a triple. Those values are nonsensical in reality. It doesn't matter that his r-squared was 90% or that the standard error was very low. It's wrong. I've done regression analysis on team totals by era, and the results also were strange in some cases.
3 - But I'm really not seeing what this has to do with how slopes change by run environment. Modeling that would suggest other possibilities like a series of dummy variables representing different run-scoring eras or a multi-level random effects model.
Yes, we've tried that, but it doesn't work. As I've shown, each element would have to have its own best-fit linear or parabolic or whatever equation, with respect to the run environment. And the run environment itself would have to be known before the fact. Since we are attempting to determine what is the actual run environment without knowing the number of runs scored, we're stuck. This is where BaseRuns comes in. An elegant, simple and accurate equation.
I will reply to your lengthy post soon. Thanks...

How are Runs Really Created

August 16, 2002 - tangotiger (www) (e-mail)

Not exactly. Linear regression is linear because it's "linear in the parameters." There are many ways to model non-linearities among the variables using multiple regression.
Thanks for clarifying some points. I should then say that baseball is virtually linear in the parameters, but is non-linear to its environment.
In regression, a coefficient gives you the impact of adding that particular variable to the model, after having removed all the influence of the other variables from both the dependent variable and the independent variable in question (aka "statistical control"). I don't see any inherent problem with doing that here, but perhaps I'm missing something.
The problem is that if you freeze say all the hits, HR, etc, but leave the walk to be the independent variable in question, its value is dependent on the values of hits and HR. So, you freeze hits and HR at say 10 and 2, then the value of 1 walk might be .30, the value of 2 walks might average .32, the value of 3 walks might average .34. Furthermore, if you then freeze the hits at 11 and the HR at 1, all these values change. So, exactly what is the value of the walk?
But I'm really not seeing what this has to do with how slopes change by run environment. Modeling that would suggest other possibilities like a series of dummy variables representing different run-scoring eras or a multi-level random effects model.
Yes. But that's really really hard.
This is a good point, but of course this has nothing to do with the appropriateness or inappropriateness of multiple regression, but rather with what the proper unit of analysis is.
Yes, my third statement was exactly this point.
...That's a lot of conditional RE tables. :-)
Yes, the 24 basic states is the smallest number of states that you should accept. 24 x 9 to include the batters would be better. The point about the fielders etc. is that they should be factored into the RE tables before the game, so that you have a customized set of 24 x 9 RE tables based on the actual 9 hitters, the pitcher, and the fielders.
However, if things like HBP and interference are truly random, omitting them from the model will not bias the coefficients for variables included in the model.
Things like HBP may be an indication of poor or wild pitching and therefore before we omit anything, we have to determine if they are truly random. Interference I'm sure we can ignore.
All this aside, chances are none of this will have much impact. Baseball scoring is not all that variable, most of the important variables have been identified, etc. Chances are the best we can hope for is minor improvement in the level of error. The proof's in the pudding there and I hope a future article will compare the accuracy of your method to the existing ones.
John Jarvis has gone through the exercise of comparing the various estimators, so I don't need to rehash that.
As I said in the article, as long as you adhere to typical MLB teams who play at the typical OBA levels, then really any run estimator will "work". That's because at a very narrow given specific run environment, what you say is correct, and there is not much variability.
For a "team" like Pedro, this does not apply whatsoever. And Voros is correct that while there are no 9 Bonds hitters, there are effectively 9 Bonds hitters when a really bad pitcher is on the mound. This pitcher would provide his opposition with a Bonds environment.
This is why it is important to understand the building block of run creation, and its high dependence to the run environment (which itself is determined by the various offensive events working together in a non-linear interdependent fashion).
Great comments Walt, and I hope that my lack of knowledge on specific statistical concepts did not take away from the comments I have presented. Thanks.

How are Runs Really Created

August 17, 2002 - tangotiger (www) (e-mail)

Here are the results of a basic linear regression, using team totals from 1969-1999 (808 teams). The second line is the standard error.
outs     1b      2b      3b      hr      bb      k       sb      cs
(0.11)   0.51    0.72    1.10    1.47    0.34    (0.10)  0.21    (0.19)
0.00     0.01    0.03    0.08    0.03    0.01    0.01    0.03    0.07
The r-squared is 95.5%. I still wouldn't take those numbers. They are nice guidelines. Very good ones in fact. But when we have access to play-by-play that tells us exactly what each event, on average, is worth, what does looking at the aggregated seasonal line tell us?
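
To illustrate how coefficients like these are applied, here is a minimal sketch. It makes several assumptions that the post does not state: parenthesized values are read as negatives, no intercept is applied because none was reported, and "outs" and "k" are treated as separate counts. The team line is made up.

```python
# Coefficients copied from the 1969-1999 team-level regression above.
# Assumptions: parentheses mean negative values, and there is no intercept.
WEIGHTS = {
    "outs": -0.11, "1b": 0.51, "2b": 0.72, "3b": 1.10, "hr": 1.47,
    "bb": 0.34, "k": -0.10, "sb": 0.21, "cs": -0.19,
}

def estimated_runs(line):
    """Linear run estimate: each event count times its coefficient, summed."""
    return sum(WEIGHTS[event] * n for event, n in line.items())

# Hypothetical team season line, for illustration only.
team = {"outs": 3100, "1b": 1000, "2b": 280, "3b": 30, "hr": 160,
        "bb": 550, "k": 1000, "sb": 100, "cs": 50}
print(round(estimated_runs(team), 1))
```

Every event carries the same value no matter what else happened around it, which is exactly the property that play-by-play run values do not share.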

Related Web Pages and Articles

How are Runs Really Created by Tangotiger

How are Runs Really Created - Second Installment

August 20, 2002 - tangotiger (www) (e-mail)

I've gotten a few response emails with some nice remarks, but I guess there's not much controversy in what I'm saying.

How are Runs Really Created - Second Installment

August 25, 2002 - tangotiger (www) (e-mail)

Tango, don't let the lack of response deter you from completing this project. It is important to get all this down on paper in one place. Combined, it will become an important source for proper RC understanding.
David, I agree with your sentiment. I have a half-dozen other projects that I don't yet know the answer to, and that therefore interest me more (things like when to actually walk Barry Bonds specifically, based on score, inning, men on base, outs, and batting order). However, I've read so many run evaluator articles that I hope to put a stop to the gobbledygook-type approaches, and steer the search in the right direction.
Your BaseRuns is using the only right approach, by definition. As I've mentioned, the only thing left in understanding run creation is the score rate, and how to calculate it.
Tango, trust me, it's not yawning -- it's digesting. I am thoroughly enjoying the work.
Thanks, and hopefully the next article in the series will be as satisfying.
...it was the position of several well-qualified analysts that models don't matter--all that matters is accuracy in the range of interest. To them, the range of interest was the 3 to 6 R/G range of real MLB teams. To me, and to you, the range of interest is 0 to infinity R/G.
Yes, the shortcut way, while easier, and sometimes even more accurate, does not lend itself to extrapolating beyond what it was designed for. Since we are living in the time of Pedro and Bonds, maybe we care more about the extreme types.
I hope you delve into the subject of how to apply a successful team run formula to individuals. I believe that this area has NOT to this point been analyzed properly, and I am curious as to what you have come up with.
This area also needs a lot of work, but I will present what I have nonetheless (probably in the 4th installment).
*** A few others have responded by email, and I appreciate any feedback (positive or negative).

How are Runs Really Created - Second Installment

August 26, 2002 - tangotiger (www) (e-mail)

Thanks Ben. To answer your question, there are 24 base-out states to consider. Depending on when the K occurs, its value would be different from that of a regular out.
You can check out a chart that breaks down the various run values by the 24 base-out states here: Run values by the 24 base-out states.
The K also has an extra wrinkle in that the batter can be safe after a K, or other events can happen afterwards. I've chosen to include them as part of the K, but in reality the effect is very tiny.

Related Web Pages and Articles

How are Runs Really Created - Third Installment

September 16, 2002 - tangotiger (www) (e-mail)

Actually, the linear weights that I use can determine either absolute runs scored, or runs above average.
As described in the previous article, the reconciliation between the two methods is simply subtracting .16 runs per out (for 1974-1990).
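
A minimal sketch of that reconciliation, read here as "runs above average = absolute runs minus .16 per out used". The function name and the 5-run example are only illustrative.

```python
RUNS_PER_OUT = 0.16  # 1974-1990 figure quoted above

def runs_above_average(absolute_runs, outs):
    """Convert an absolute-runs figure to runs above average."""
    return absolute_runs - RUNS_PER_OUT * outs

# Over 27 outs the average rate works out to 0.16 * 27 = 4.32 runs,
# so 5 runs in 27 outs is about 0.68 runs above average.
print(round(runs_above_average(5.0, 27), 2))  # 0.68
```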

How are Runs Really Created - Third Installment

September 16, 2002 - tangotiger (www) (e-mail)

That is, the number of home runs hit isn't independent of other run-generating events. It's not that more runners are scoring per home run when a large number of home runs are hit, but rather, that there are more (run-generating) singles, doubles and triples in these games as well.
*** First of all, this is not true. Generally speaking, those events ARE independent of the HR. I ran further studies that controlled for those events (for example, looked only at games with 2 to 4 walks, 6 to 8 singles, etc, and separated them by the HR class) so that there was virtually no difference in those events. The results were the same.
Why the hell should I care about extreme outcomes?
*** In extreme examples, you can't hide the shortcomings of your models or estimates. They stick out like a sore thumb. And note, in my extreme examples, the dataset went only so far as Barry Bonds' 01/02. So, it was not "unrealistic" extreme, but realistic extremes.
If I'm a major league GM, how does BaseRuns help me to build a better team? Does it have better predictive value than runs created?
*** As I said, almost all run evaluators are similar in the .300 to .400 OBA range. This will help you determine the true value of those extreme players, the ones GMs are trying to figure out whether they are overpaying.
Better predictive value? Neither this system nor Runs Created says anything about predictive value. Voros, MGL and a few others do a good job there.
Does it do a better job of explaining run creation in "realistic extreme" environments such as Barry Bonds 01/02 or the Deadball Era or Coors Field?
Yes. This series of articles explains a team of Barry Bonds, a team of Pedros (i.e., Pedro himself), and virtually any run environment, regardless of whether that run environment is due to the hitter, the pitcher, the fielders, or the park. If Runs Created says that the run value of a HR is LESS than 1 for dead ball (an impossibility) pitchers, or worth more than 2, close to 3 runs (!), for Barry Bonds run environment, what does that tell you?

How are Runs Really Created - Third Installment

September 17, 2002 - tangotiger (www) (e-mail)

There is some correlation. Here's the data. But since I've shown the breakdown by OBA (where there is HIGH correlation by definition between the # of singles, doubles and OBA), and I broke it down by HR (where the correlation that does exist has very little impact overall), I don't know what it is that you are after.
S      D      T      HR     HBP    BB
6.3    1.4    0.2    -      0.2    3.1
6.3    1.5    0.2    1.0    0.2    3.3
6.4    1.6    0.2    2.0    0.2    3.4
6.5    1.6    0.2    3.0    0.2    3.6
6.4    1.6    0.2    4.0    0.2    3.7
6.7    1.7    0.2    5.0    0.3    3.8
7.3    1.7    0.3    6.0    0.2    4.2
7.6    2.1    0.3    7.0    0.4    4.4
As you can see, the most important variable here is the HR. It has by far the most impact on how many runs should be scored, *in this grouping of data*.
And the impact that it does have is nowhere near as high as Runs Created would say it is. Its impact, as shown in the article, is virtually exactly what BaseRuns says it should be.
If BaseRuns is 50% more accurate at dealing with extreme cases, and 1% less accurate at dealing with realistic cases (I don't know that it is), that seems to me like one step forward and two back.
Again, the point of the articles is to paint a picture as to how runs are created. BaseRuns is the first step in trying to figure out what the score rate should be. I'm sure there'll be better ones to come around. But the basic model is correct by definition. What Runs Created, static Linear Weights, et al do is to ignore the model, and instead fit their formula to the sample data they have on hand.
What is the question you're trying to answer?
How are runs really created.
What are the implications of your research?
The implication is that by ignoring the actual basis of how runs are scored (that the HR has an absolute minimum value of 1, and caps off at somewhere below 2, and that all events should converge to 1 as the OBA converges to 1), you are fixing your formula to reduce the RMSE. While you might (and should!) get better results by fixing your formula against known sample data, you are deceiving the reader as to how runs are really created.
The implication is that the value of the HR does not have an ever-increasing value. There is a law of diminishing returns for the HR specifically.
I apologize in advance for my confrontational tenor,
I would appreciate if the confrontation aspect is reduced slightly. Thanks. I'd prefer the debate center on the merits of the data and interpretation of the data.
but your advocacy of BaseRuns comes across as almost cultish, based on a series of assumptions it conflates with Truth
Again, except for BaseRuns' definition of the score rate, everything I've said is truth. What assumptions are you referring to?
We start off with a point of fact that runs = BR x scoreRate + HR. Now, we're trying to figure out what the score rate is. David's B/(B+C) seems too simple, but in actual fact, this ends up conforming to reality. There's a problem at the very high end, and that's where we should try to look for better answers. But the runs = BR x scoreRate + HR must hold.
, without regard for the world around it.
BaseRuns is the only model that accounts for the world around it.
But how about one Barry Bonds and eight mortals? What I'd like to see is a comparison of the systems within an actual major league context, not a simulation that you've designed to produce an outcome that is preordained to be favorable to your cause.
Preordained? Would you believe me if I told you that I wrote the first 2 articles BEFORE I ran the data in the third article? I was happy to use my sim, but then I decided to run against real data. I was the biggest skeptic of BaseRuns when David first introduced it to me. There's no bigger skeptic of "new math" than me, be it DIPS, BaseRuns, or Win Shares.
As for 1 Barry and 8 mortals, that would require the use of a sim, because of the problem I mentioned regarding the pond and the aquarium. There are other factors, specifically, the batting spot you put Barry in. He has a different effect if batted 1st than 5th. I intend to look at the batting order effect at some point, but I'd be happy to share anything specific you would like to know.
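
The identity discussed above, runs = BR x scoreRate + HR with a score rate of B/(B+C), can be sketched as follows. This is a structural sketch only: C is taken to be outs, as in the usual statement of BaseRuns, and the weighting that produces B is left to the caller. The parameter names are illustrative.

```python
def base_runs(baserunners, advancement, outs, home_runs):
    """runs = BR x scoreRate + HR, with scoreRate = B / (B + C).

    baserunners: runners reaching base, excluding home runs (BR)
    advancement: the "B" component (its weighting is left to the caller)
    outs:        the "C" component (assumed here to be outs)
    home_runs:   HR, each of which always scores the batter
    """
    if advancement + outs == 0:
        score_rate = 0.0
    else:
        score_rate = advancement / (advancement + outs)
    return baserunners * score_rate + home_runs

# With nobody on base, a home run is worth exactly 1 run.
print(base_runs(0, 1.7, 27, 1))  # 1.0
# With no outs made (an OBA of 1.000), every baserunner scores.
print(base_runs(10, 8.0, 0, 0))  # 10.0
```

The two printed cases are the boundary conditions named in the discussion: the HR can never be worth less than 1, and the score rate converges to 1 as outs vanish.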

How are Runs Really Created - Third Installment
September 17, 2002 - tangotiger (www) (e-mail)
is Base Runs designed to provide new information
At the risk of repeating myself, it is designed to present how runs are really created. It's up to the reader to decide how valuable it is to know this.
As I mentioned on a few occasions, if all you look at are players and teams with an OBA around .300 to .400, it really doesn't matter what you use.
But, if you are interested in extreme examples, like Pedro, or a high run scoring environment, BaseRuns value comes through in that it doesn't give you the shortcuts the other run estimators rely on to be accurate.
I agree, if you are not bothered by back of the envelope calculations, and you don't care about extreme situations, then stick with basic RC or static LWTS. They'll serve your purpose. I've said as much in past articles.
I am presenting a framework to understand how runs are created, and that at some point the marginal value of the HR decreases, while other run evaluators never consider this.
And if you are looking at college or high school ball, then BaseRuns becomes much more valuable.

How are Runs Really Created - Third Installment
September 17, 2002 - tangotiger (www) (e-mail)
As for how accurate BaseRuns is in "real-life" situations, here's the data behind the "by OBA" chart. The "R" is actual Runs scored.
oba      R        BsR      LWTS     RC
0.030    0.12     0.10     (1.93)   0.03     ... BsR better
0.077    0.41     0.31     (1.18)   0.20     ... BsR better
0.124    0.61     0.63     (0.37)   0.51     ... BsR better
0.176    1.17     1.22     0.63     1.09     ... BsR better
0.224    1.88     1.97     1.67     1.86     ... RC better
0.275    2.83     2.99     2.89     2.93     ... LWTS better
0.324    4.04     4.21     4.19     4.21     ... LWTS better
0.371    5.31     5.58     5.51     5.65     ... LWTS better
0.421    6.87     7.28     6.95     7.42     ... LWTS better
0.468    8.64     9.18     8.41     9.33     ... LWTS better
0.515    10.34    11.33    9.89     11.49    ... LWTS better
0.566    12.10    13.91    11.78    13.84    ... LWTS better
As you can see, when the OBA is between .275 and .375, all three measures are very very similar. But for the Pedros and Thomes and Bonds of the world, things are different.
Notice also that LWTS is better at the "high-end". This is because LWTS benefits from having the HR value fixed at 1.40, and it doesn't fall into the RC trap.
If I present a similar table broken by HR class, BsR will take over in *all* respects. This is why I say that David's BsR is the first step. Clearly, it still falls into a similar trap as RC in that it overvalues each event (but not as much as RC does). The search is to find out how to better represent the interaction.

How are Runs Really Created - Third Installment
September 17, 2002 - tangotiger (www) (e-mail)
Ugh, just trying to turn off those italics. Sorry about all that.

How are Runs Really Created - Third Installment
September 17, 2002 - tangotiger (www) (e-mail)
Are those italics ever going to die?
...this as the sort of tough love you'd encounter in defending a dissertation. It is clear that all of this makes sense to you, but it is not so transparent to a well-informed audience.
*** Yes, this is very clear to me. Since I don't have the natural honed gift of Bill James in writing, I'll do my best to convey my message better.
I also think that you're inviting somewhat more ... aggressive feedback when you say things like "Runs Created is dead, BaseRuns is the now",
*** That's ok. I say these things with basis. I've gone to great lengths to show different scenarios, etc.
...or invoke (incorrectly) something like the Heisenberg Uncertainty Principle. I mean, you're talking the talk.
*** I didn't mean to suggest that the Heisenberg Uncertainty Principle was at work here, nor that my example was one of Heisenberg. The specific quote I took was one where it's hard to distinguish between what is being observed, without interacting with the system you are observing. Barry Bonds does not interact with himself, only with his teammates. But by throwing Bonds into the mix, you are changing the relative values of the teammates you are trying to study.
2. Thank you for presenting the table of correlations.
*** Sure. I'd be glad to show more detailed data. Just the forum here is not very appealing for it. Email me if you want more.
3. The fundamental point is that there's no "Holy Grail" here. Runs are created by the particular combination of batting and baserunning events in a particular inning of a particular game.
*** The search is for the holy grail, and BaseRuns is *not* it.
Any attempt to generalize these unique sequences of events into something universally applicable has got to make approximations and assumptions.
*** You don't want to make them universally applicable. The holy grail reference was in reference to things to come. You have to understand all the contexts, the base-out situation, the pitcher/batter matchup, the runners, the fielders, the park. I don't expect that we will end up with 1 formula for all that. I do expect that we will get a series of principles that will follow all that. The work that I have done all leads to this.
Linear weights cuts a different corner than BaseRuns does, by focusing on data at the season level rather than at the game level. You assert that focusing on data at the game level is True, without presenting evidence either as to the utility of this approach
*** The game is the unit since the interaction of the events occur at the game level. You make a more accurate point that the interaction occurs at the inning level, and this is true. If I had the data, I would have presented at the inning level.
(what would Billy Beane do with it?),
*** I'm sure he's a smart guy. But not all these things have to be applied by GMs. My audience is myself, and people who think like me. Maybe there's not many people out there like that, that's fine.
or to its aesthetic purity. Why focus on the game level, rather than at the inning level? Why try and take all of the context out of run creation at all?
*** BaseRuns tries to (wrongly) take the context out. LWTS, as I do them, forces all the context right back in.
4. My point in criticising your experimental design is that the coefficients you use in BaseRuns were derived based on data gathered at the game level, and that you then use game-level data in order to test its superiority.
*** As discussed, the interactions occur at the inning level. Game-level is the best I had.
If you tested the various systems based on season-level data, linear weights would triumph, because that's how it is derived.
*** Not necessarily so. Dynamic Custom Linear Weights would always triumph over static linear weights. Assuming you meant static linear weights, this is probably true, but not necessarily. But I agree generally with this statement. The point however is that the single from last week does nothing to determine the impact of run scoring tomorrow. It might have some predictive value, but it has no impact on it. This is why inning-view (or game-view if you are also looking at wins and lineup construction) is the correct view.
5. In the OBA chart you present above, BaseRuns is considerably less accurate over the entire normal range of OBA's. Missing by an additional .25 runs per game would amount to about 40 runs or 4-5 games over the course of a season. That is not "very very similar"; you have made a substantial trade-off here!
*** If I present the chart by HR, or by OPS, BaseRuns would triumph over the normal range. It depends what data it is that you are using, the context that you are presenting. Again, the strength of BaseRuns is how it handles the HR. Its weakness (relatively speaking to itself, but not to the other evaluators) is the rest of the components. This is why I don't support BsR as the end-all and be-all, but as the first step. As I've mentioned a few times, the search is on for a better score rate. If you have a run model that doesn't adhere to something as fundamental as R = BR x scoreRate + HR, what are you supposed to do with this?
*** Here is the data by the OPS class (grouped by .100)
opsClass R BsR LWTS RC
0.055 - 0.03 (1.76) 0.02 ... RC is better
0.160 0.22 0.18 (0.75) 0.17 ... BsR is better
0.258 0.47 0.50 0.11 0.48 ... RC is better
0.358 0.93 1.02 1.01 0.97 ... RC is better
0.454 1.62 1.72 1.94 1.65 ... RC is better
0.552 2.49 2.60 2.90 2.52 ... RC is better
0.651 3.47 3.63 3.88 3.59 ... RC is better
0.748 4.62 4.79 4.82 4.83 ... BsR is better
0.846 5.85 6.08 5.72 6.27 ... LWTS is better
0.945 7.17 7.48 6.59 7.92 ... BsR is better
1.043 8.60 9.00 7.42 9.78 ... BsR is better
1.141 10.11 10.53 8.18 11.85 ... BsR is better
1.239 11.57 12.25 9.06 14.14 ... BsR is better
1.338 13.16 13.89 9.76 16.66 ... BsR is better
1.443 15.78 16.15 10.78 19.62 ... BsR is better
(You'll note that when RC is better, it is barely better. As the OPS rises, BsR is far better.)
And here's broken down by the HR class
HR R BsR LWTS RC
- 3.08 3.06 3.79 3.03 ... BsR is better
1 4.62 4.62 4.44 4.66 ... BsR is better
2 6.12 6.12 5.00 6.41 ... BsR is better
3 7.65 7.65 5.62 8.37 ... BsR is better
4 9.03 9.00 6.07 10.29 ... BsR is better
5 10.55 10.49 6.73 12.45 ... BsR is better
6 12.33 12.32 7.52 15.35 ... BsR is better
7 16.22 14.32 8.34 18.27 ... BsR is better
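
One way to summarize the HR-class table above is the mean absolute error of each estimator against actual runs. The sketch below copies the rows from that table and makes two assumptions: the "-" entry for the zero-HR class is read as 0, and every class is weighted equally rather than by the number of games in it (the game counts are not given).

```python
# Rows copied from the HR-class table: (HR, R, BsR, LWTS, RC).
# Assumption: the "-" entry for the zero-HR class is read as 0.
ROWS = [
    (0, 3.08, 3.06, 3.79, 3.03),
    (1, 4.62, 4.62, 4.44, 4.66),
    (2, 6.12, 6.12, 5.00, 6.41),
    (3, 7.65, 7.65, 5.62, 8.37),
    (4, 9.03, 9.00, 6.07, 10.29),
    (5, 10.55, 10.49, 6.73, 12.45),
    (6, 12.33, 12.32, 7.52, 15.35),
    (7, 16.22, 14.32, 8.34, 18.27),
]

def mean_abs_error(column):
    """Unweighted mean absolute error of one estimator column vs actual R."""
    return sum(abs(row[column] - row[1]) for row in ROWS) / len(ROWS)

errors = {
    "BsR": mean_abs_error(2),
    "LWTS": mean_abs_error(3),
    "RC": mean_abs_error(4),
}
best = min(errors, key=errors.get)
print(best)  # BsR
```

Weighting by games per class would change the size of the errors, since the high-HR classes are rare, so the unweighted figures are only a rough summary.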

How are Runs Really Created - Third Installment
September 17, 2002 - tangotiger (www) (e-mail)

At the risk of being overly reductive, if a model like this doesn't have a use for baseball management then I'm not sure what the point of the research is.
*** The point of the research is to enlighten people as to how runs are really created. It doesn't have to have an application beyond that. However, if you want to properly value a player, you should value him on how he really creates runs. And BaseRuns helps in that regard for the extreme players.
BR could help with strategic questions concerning lineup selection or how to efficiently run your offense against a top notch pitcher.
*** That's possible, but I would not rely on BsR for that. Personally, I would use BsR to generate custom linear weights values, and THEN I'd use linear weights to assist in answering those questions. This is what I do, and I am very very confident in the results I get from that.
...so I'm curious if you have a sense of how using BaseRuns should change the approaches we've all been using. And I apologize if this is a point you feel you've hammered home in your previous articles, because if it is I'm not sure I've understood it.
*** I don't think I've really addressed this issue. The approach is to get away from the "typical" run estimators, because they don't model reality. To quote someone's "Equivalent Runs" or "XRuns" or "Runs Created" almost makes it seem as if those estimators are accurate. They may yield accurate results in some or most cases, but the calculations used to derive those results are not correct. Suppose that we know that 3 = 6 x .33 + 1. But, I come out and say, well, you know 3 is also equal to (6+1) x .429. I may end up getting the same answer using the same data, but the way I combined the data is wrong. But since most players and most teams do not deviate much from the norm, then, really who cares? It all works out.
However, I care about the extremes, about Pedro, Bonds, Thome, et al. And just because something works "on average" doesn't mean it works in the extreme.
So, to get back to your question, BaseRuns should change your approach as to how you view how runs are created, and should force you to question when you see a run evaluator that "works".
For low-level, or game-level actions, a custom set of LWTS or RE or WE charts is what you want (and I've provided some links above throughout the article).
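
The "3 = 6 x .33 + 1" versus "(6+1) x .429" example above can be sketched in a few lines. The function names are illustrative, the four-HR case is made up, and holding the score rate at .33 while adding home runs is itself a simplification.

```python
def structural(baserunners, score_rate, home_runs):
    """Runners score at the score rate; home runs always score."""
    return baserunners * score_rate + home_runs

def shortcut(baserunners, home_runs, factor=0.429):
    """Lumps home runs in with baserunners and scales by a fitted factor."""
    return (baserunners + home_runs) * factor

# The calibration point from the text: both land near 3 runs.
print(round(structural(6, 0.33, 1), 2), round(shortcut(6, 1), 2))  # 2.98 3.0
# Same baserunners, but four home runs (a made-up extreme).
print(round(structural(6, 0.33, 4), 2), round(shortcut(6, 4), 2))  # 5.98 4.29
```

The two forms agree at the point where the shortcut was fitted and drift apart once the inputs move away from it, which is the concern with extreme players.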
============== Rob, here is the full "B" component I use. Just a caution: you DON'T need to have all this data. But I have this data, and this is what I am using. You will recognize most from the Retrosheet event files. If you want me to clarify some of the items, let me know. Note: because of "partial innings", you have to be very very careful (which is why I have that last entry). The short answer is that the RE chart at the bottom of the 9th inning of a tied game is DIFFERENT from the RE chart at any other point in the game. Again, if you need the long answer, let me know.
To all: Again, adding each of these components beyond the basics adds very very little to the accuracy of the run construction. But, for completeness, I am providing it.
0.73	Single
1.95	Double
3.13	Triple
1.69	HR
0.05	Walk
(0.48)	IBB
0.16	HBP
0.80	Error
0.28	Interference
1.43	OtherSafe
0.73	Sac
(0.06)	Strikeout
(0.00)	Out
0.81	SB
(1.19)	CS
(0.51)	Pickoff
(0.35)	PickoffError
1.05	Balk
1.17	PB
1.17	WP
0.56	DefensiveIndiff
(1.06)	OtherAdvance
0.00	FoulError
(1.49)	implied outs
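As a rough sketch of how a "B" value could be computed from these coefficients, the weights can be kept in a lookup table and multiplied by event counts. Two assumptions are mine, not from the post: parenthesised values are read as negative, and "implied outs" is treated as one more count supplied by the caller. The names B_WEIGHTS, b_component and the key spellings are illustrative only.

```python
# Coefficients as listed in the post. Parenthesised values are read as
# negative here, which is an assumption about the notation.
B_WEIGHTS = {
    "Single": 0.73, "Double": 1.95, "Triple": 3.13, "HR": 1.69,
    "Walk": 0.05, "IBB": -0.48, "HBP": 0.16, "Error": 0.80,
    "Interference": 0.28, "OtherSafe": 1.43, "Sac": 0.73,
    "Strikeout": -0.06, "Out": 0.00, "SB": 0.81, "CS": -1.19,
    "Pickoff": -0.51, "PickoffError": -0.35, "Balk": 1.05,
    "PB": 1.17, "WP": 1.17, "DefensiveIndiff": 0.56,
    "OtherAdvance": -1.06, "FoulError": 0.00, "ImpliedOuts": -1.49,
}


def b_component(counts):
    """Weighted sum of event counts; an unknown event name raises KeyError."""
    return sum(B_WEIGHTS[event] * n for event, n in counts.items())


# Hypothetical counts, only to show the call shape.
print(b_component({"Single": 100, "HR": 20}))
```

Events left out of the counts contribute nothing, which fits the caution above that not all of this data is needed.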

How are Runs Really Created - Third Installment

September 17, 2002 - tangotiger (www) (e-mail)

Italics: yes, this was all my fault. I did not have a proper closing italic tag, and it left everything subsequent in italics. I threw in a whole bunch of closing italic tags in my previous post just to make sure that there was no nesting going on, to close it off, and that seemed to work. Sorry about all that...

How are Runs Really Created - Third Installment

September 17, 2002 - tangotiger (www) (e-mail)

Arvid:
1. It seems almost oxymoronic that BaseRuns doesn't do particularly well relative to different levels of OBP, but does do very well relative to different levels of OPS. This suggests to me that there is some sort of interaction between the "getting on base" element and the "moving runners along" element that has been lost in the attempt to segregate those two things from one another. For one thing, the probabilities of particular batting outcomes aren't independent of the bases occupied during a given plate appearance.
*** As mentioned, BaseRuns is the first step at the score rate. It does very well with high OPS and HR classes, simply because it handles the HR properly. Further improvement is called for in cases where no HR are hit. This is why I mention that the search is on for a better score rate.
2. I suppose I'm still somewhat put off by the implications that BaseRuns is a "true" or "real" or otherwise aesthetically pure measure of run creation. Even if you look at data on an inning-by-inning basis, it is still an approximation:
BB-1B-1B-HR-K-K-K produces 4 runs, whereas HR-K-K-1B-1B-BB-K normally produces 1.
*** The model assumes somewhat random distribution of events. It does not purport otherwise. You can pick any single example, and any model will be wrong. You need the sample size behind it.
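The two sequences quoted above can be checked with a toy inning scorer. This is only a sketch under simplifying assumptions that are not from the post: every single advances each runner exactly one base, a walk only forces runners, a strikeout is an out, and the inning stops at three outs. The function name is made up.

```python
def runs_in_inning(events):
    """Score one inning under the simple advancement rules described above."""
    bases = [False, False, False]  # first, second, third
    outs = 0
    runs = 0
    for event in events:
        if outs == 3:
            break  # inning is over; remaining events never happen
        if event == "K":
            outs += 1
        elif event == "HR":
            runs += 1 + sum(bases)  # batter plus everyone on base
            bases = [False, False, False]
        elif event == "1B":
            runs += int(bases[2])  # runner on third scores
            bases = [True, bases[0], bases[1]]
        elif event == "BB":
            if all(bases):
                runs += 1  # bases loaded: walk forces in a run
            elif bases[0] and bases[1]:
                bases[2] = True
            elif bases[0]:
                bases[1] = True
            bases[0] = True
        else:
            raise ValueError(f"unknown event: {event}")
    return runs


print(runs_in_inning(["BB", "1B", "1B", "HR", "K", "K", "K"]))  # 4
print(runs_in_inning(["HR", "K", "K", "1B", "1B", "BB", "K"]))  # 1
```

Same seven events, different order, different run totals. A formula built from event counts cannot see the order, which is why it needs sample size behind it.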
That, in aggregating data to the season level, unusual and random sequences tend to get lost in the noise, is as much an advantage as a disadvantage.
*** No need to aggregate by team though, since this introduces a bias. By aggregating on other terms, as I've done, you get "better" data.
Paul:
is sufficient to justify replacing existing estimators when measuring typical examples within the controlled set of major league baseball teams
*** As I said, if all you care about is the typical example, then the typical evaluators are all you need.
but you can hardly claim to be surprised to meet some challenges when you throw down a gauntlet like that.
*** I have no problem arguing against RC, since it does not have a basis in logic. Its basis is gobbledygook math that is fitted to the narrow sample data, and its flaws are exposed when taking it out of its environment. This is also true of static LWTS. This is not the case with BsR or with custom LWTS.
because as Davenport himself acknowledged, for most situations in actual MLB you'll do just fine using 1.83 (or even 2), but that if your particular interest is in studying extreme examples, you need a custom exponent
*** I didn't know Clay said this, but this is exactly my position as well.
I've found Tango's tone to be far more modest in subsequent posts on this discussion
*** I must be getting old these last few hours. I'll try to be more "O'Reilly Factor" from time-to-time.
Arvid:
BaseRuns is simply a refinement of linear weights
*** BsR allows the generation of custom LWTS. There's no other relationship between the two.
I would suggest that he discuss the offensive performance of the 2001 San Francisco Giants in terms of BaseRuns.
*** For that you need custom LWTS by batting order by the 24 base-out states. BsR is not appropriate, except to help in establishing the baseline custom values.

How are Runs Really Created - Third Installment

September 18, 2002 - tangotiger (www) (e-mail)

Paul:
No, I only meant modest on a relative scale, where 0 is the amount of arrogance to be found in an average Tango article. For God's sake, there's never a need to turn on Fox News.
Ah-hahaha... the Linear Weights Arrogance Tango Scale! I love it! As for Fox News, they are extremely biased. PBS, CNN maybe, and 60 minutes are really the only good ones out there. Seriously, watch BBC, or other world news and you get such a different perspective on the world. Did you watch those 3 Arabic-American kids from Florida on Larry King 2 nights ago? They were extremely believable, and given the choice between them and that lady, I'd choose them. Of course, the American media was all over them before the King appearance, and since then? Exactly.
You know what else they said when asked if they would sue her? No! They said no! How un-American is that??
Brian:
Excellent summary overall. I agree with almost everything, except
might be to attempt to separate the run scoring, or the moving over (driving in) components
I have done this in Article 2, under the "building blocks of run creation". The separating into components is what you need to do to get custom Linear Weights components.
I'll have to think about your "8.2" concept. Sounds interesting.
I would like to also see the data shown grouped by number of triples in a game, number of walks in a game, number of doubles in a game
Sure, no problem. I'll try to get that done by this weekend (I usually run my research while my newborn is asleep, which is not often these days!).
It seems that if there is any inaccuracy here, it is likely in ... the relative weights assigned to the impact of the individual events
No, that is not possible. Those numbers were generated such that when using the plus 1 method it yields the exact LWTS coefficients determined by the play-by-play data. Therefore, the inaccuracy would be that we can't simply have such a simple "B" equation.
============
I will post the complete BsR equations that I used by the end of today. What I provided to Rob above was only the "B" equation. I neglected to also include the "Baserunner" portion as well.

How are Runs Really Created - Third Installment

September 18, 2002 - tangotiger (www) (e-mail)

2. Somewhat less accuracy in normal run-scoring environments.
This may very well be true, but that is only because the other measures (except LWTS) are "cheating" to get there. They ignore the constraint that a HR is worth at least 1 run, they ignore the constraint that you can't score more runs than you have runners, and so that gives them enough wiggle room to force-fit coefficients to the sample data they have to get the lowest RMSE possible.
Static LWTS values are derived from the pbp and therefore do no cheating. Well, they cheat in that the values can only be applied to the data they were generated from, the typical run scoring environment.
BaseRuns may be 1% less accurate in the typical environments but "50%" more accurate in the extremes. You (?) said that this is 1 step forward, 2 steps back. From my standpoint, this is 2 steps forward, 1 step back.
I don't like that the accuracy of the other formulas is fitted to the typical data, *especially since almost everyone then takes that formula out of that environment and applies it to Pedro, Barry, and Thome*. That little disclaimer is always ignored.
Anyway, to repeat: if all you care about is the typical, use the typical. If you want to know how the events interact with each other to produce runs in various run environments, then you need to use R = BR x scoreRate + HR. For now, BaseRuns is it.
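A minimal sketch of the R = BR x scoreRate + HR form, showing why it respects the two constraints mentioned above. BR is read here as baserunners other than the home-run hitters themselves, which is my reading rather than something stated in this post. The function name and the numbers are hypothetical.

```python
def runs_from_components(baserunners, score_rate, home_runs):
    """R = BR x scoreRate + HR, with the score rate held between 0 and 1."""
    if not 0.0 <= score_rate <= 1.0:
        raise ValueError("score_rate must be between 0 and 1")
    return baserunners * score_rate + home_runs


# Hypothetical inputs: 12 baserunners and 2 home runs.
for rate in (0.0, 0.3, 1.0):
    print(rate, runs_from_components(12, rate, 2))
```

With the score rate bounded by 0 and 1, the result can never fall below the number of home runs and can never exceed baserunners plus home runs, whatever the run environment.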

How are Runs Really Created - Third Installment

September 18, 2002 - tangotiger (www) (e-mail)

I think you are onto something here
Thus total outs are being divided by 4.5. If you have 27 outs in a game, that would mean you are counting 6 of them, which it seems to me might be roughly the number of outs made per game with men on base in your data set.
That is not possible. About 45% of all PAs occur with men on base. 65% of all PAs are outs. Therefore, # of outs with MOB is .45 x .65 x 39 = 11
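The arithmetic in this exchange, spelled out as a quick check. The 45%, 65%, 39, 27 and 4.5 figures are from the posts; reading 39 as plate appearances per game is my interpretation, and the multiplication assumes the out rate with men on base equals the overall out rate, as the reply does.

```python
pa_per_game = 39          # read here as plate appearances per game
share_with_men_on = 0.45  # share of PAs with men on base
share_outs = 0.65         # share of PAs that are outs

outs_with_men_on = share_with_men_on * share_outs * pa_per_game
outs_counted_by_divisor = 27 / 4.5

print(round(outs_with_men_on, 1))   # 11.4
print(outs_counted_by_divisor)      # 6.0
```

So the divisor of 4.5 implies about 6 outs per game, roughly half of the 11 or so outs made with men on base.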
However, I did notice a very interesting relationship between the B component values and the LWTS values in the past. I haven't been able to quantify well yet though. I'm sure you are on the right path.

How are Runs Really Created - Third Installment

September 18, 2002 - tangotiger (www) (e-mail)

btw, the number I derived was 4.25. Not sure what to do with it yet.

How are Runs Really Created - Third Installment

September 19, 2002 - tangotiger (www) (e-mail)

If we go back to article 2, and the definition of the score rate (or just using common sense), we have:
% of runners scoring = (runners who score) / (runners who score + those who don't)
This is the score rate.
Runners who score is represented by the "B" equation. Though, as mentioned astutely in the earlier post, we should strip out the 4.25 (or whatever constant) to represent this actually.
The "outs" portion, the "C", of the score rate represents those runners who don't score, namely those left on base, and those outs on base.
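The score rate definition above can be written as a small function. This is a sketch working from raw counts rather than from the fitted "B" and "C" equations. The argument names and the example numbers are invented for illustration.

```python
def score_rate(runners_scored, left_on_base, outs_on_base):
    """Share of runners who score: scored / (scored + those who don't)."""
    not_scored = left_on_base + outs_on_base
    total = runners_scored + not_scored
    if total == 0:
        return 0.0  # no runners at all, so nothing to rate
    return runners_scored / total


# Hypothetical game: 4 runners scored, 7 left on base, 1 out on base.
print(round(score_rate(4, 7, 1), 3))  # 0.333
```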

How are Runs Really Created - Third Installment

September 23, 2002 - tangotiger (www) (e-mail)

Give me another week please on posting the formula. I'll have to write a whole article on it, as it's not as simple as I thought I could make it.

How are Runs Really Created - Third Installment

September 30, 2002 - tangotiger (www) (e-mail)

I've added a BaseRuns article, which is an addendum to the RC series. I apologize for not making it better, but I'd rather get it out there than let it sit on my back burner.
BaseRuns Addendum

Age	#	Long	#	Short	#	Rest
19	2	0.565	1	0.810
20	5	0.687	4	0.871
21	25	0.816	15	0.921	3	0.936
22	61	0.857	36	0.939	9	1.025
23	99	0.930	81	0.999	16	1.007
24	140	0.936	131	0.987	21	0.972
25	140	0.982	167	0.994	79	0.929
26	140	0.989	191	0.994	119	0.973
27	140	1.000	181	1.000	166	0.991
28	140	0.987	141	0.994	185	0.989
29	140	0.981	80	0.966	203	1.000
30	140	0.979			218	0.969
31	140	0.977			162	0.980
32	119	0.954			114	0.976
33	88	0.938			80	0.951
34	58	0.930			55	0.945
35	39	0.885			35	0.930
36	27	0.887			20	0.905
37	17	0.856			11	0.937
38	11	0.842			4	0.906
39	5	0.809			1	1.014
40	4	0.790			1	0.942
41	2	0.672
42	2	0.563