Aging Patterns
June 26, 2002 - tangotiger
(www)
(e-mail)
In a study I did in March 2001 (which included the hitter's last year, but used a much larger sample of players), I found the following: hitters improve their walk ratio virtually every year; they strike out the least at age 29; they get their best HR ratio at age 27; their balls-in-play success goes down almost instantly; their line drive power stays pretty flat for a long period of time; their speed as measured by triples goes down instantly; and their speed as measured by SB peaks at 24 and goes down at almost the same rate as the triples.
My intention is to eventually re-run that study but with the new information I've discovered recently regarding the last year effect.
A thread with all the data can be found here - http://baseball.fanhome.com/forums/showthread.php?threadid=662692#post1958322
Aging Patterns
June 26, 2002 - tangotiger
(www)
(e-mail)
As to your other question on different aging patterns for different types of players, I also took a look at this a while ago. My sample set was pretty small, so I wouldn't want to draw any strong conclusions from it, but the evidence showed that all types of players age the same way. The Tim Raines class of runners would lose its abilities across the board (SB, HR, Hits) to the same extent that the Wade Boggs class of runners would. This is another area that I will (eventually) be looking at.
Aging Patterns
June 26, 2002 - tangotiger
(www)
(e-mail)
I usually only look at 1919 and later because I don't think that "power" is well-represented in the pre-1919 time period. Even HR are not a true representation of "power", but they're still pretty good to use.
In that March 2001 study (which I'll reiterate used a slightly different methodology), I found virtually no difference between the aging patterns of the various skill sets between 1919-1979 and 1979-1999.
In that study, I concluded the following: "...the historical averages match up very well with the recent period. While today's ballplayers may be better, and playing longer, the "curve" of their aging is the same. There is no age bias with today's regimen of training and medication. It affects all age groups the same."
Aging Patterns
June 26, 2002 - tangotiger
(www)
(e-mail)
jmac: yes, I agree that you have other forces at work. This is why I first presented that large chart breaking up the performances by number of years in the league (for players who debuted at age 25).
What I can do is present a similar chart for players who debuted at age 22, 23, etc., so that we can determine a more specific pattern. The only reason I did not do so is that it would be so overwhelming that the reader wouldn't know where to begin. As well, this kind of analysis suffers from sample size issues, and so conclusions will not be reliable. Let me think about how I can best present such data.
Aging Patterns
June 27, 2002 - tangotiger
(www)
(e-mail)
In other words (and more simply), age is calculated as of Dec 31st.
I was wondering when someone would say that. Yes, I simply make Age = Year - YOB. Not only is it a snap to calculate, you also don't need to know the month the player was born.
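A minimal sketch of that age convention, in Python. The function names are mine, not from the study, and the second function is only there to illustrate that the shortcut equals a player's exact age on Dec 31, whatever his birthday.

```python
from datetime import date

def season_age(season_year: int, birth_year: int) -> int:
    """Seasonal age: Age = Year - YOB, i.e. the player's age as of Dec 31."""
    return season_year - birth_year

def exact_age(on: date, born: date) -> int:
    """Exact age in whole years on a given date (needs the full birth date)."""
    had_birthday = (on.month, on.day) >= (born.month, born.day)
    return on.year - born.year - (0 if had_birthday else 1)

# Hypothetical player born in 1975: any birthday in the year gives the same
# Dec 31 age, so the birth month is not needed.
for born in (date(1975, 1, 1), date(1975, 7, 15), date(1975, 12, 31)):
    assert exact_age(date(2002, 12, 31), born) == season_age(2002, 1975)
```

The trade-off is that a player's seasonal age can be a year higher than his actual age for much of the season.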
Do you know that batting averages ...follows a normal distribution?
I seem to recall looking at this a long time ago to determine how many single-hit and multi-hit games a player would have, if it did follow such a distribution. And it did.
Do you stick to that in strike years, when 300 PA were harder to come by?...
Yes, on all counts. It's not as if I can say that the reliability of 300 PA in 1982 is similar to 200 PA in 1981. 300 PA is 300 PA. In cases like this, my sample size goes down somewhat. My other option would be to limit my sample set so that 1981 is not part of the study. I can do this for pairs of seasons, but to the extent that I looked at this issue, I needed to have 5 and 10 and 15 consecutive years. Removing 1981 from such a study would drastically reduce my sample size. Your point is valid, however.
I do think the league and park adjustments should be done, so that there is more confidence in the conclusions.
I agree. As my sample size goes down, these adjustments become much more important. As for followup articles, I think I'm going to have to make it a whole series of them because there are so many results that this dataset will give us.
Aging Patterns
June 28, 2002 - tangotiger
(www)
(e-mail)
One of the previous posters mentioned that players who play longer will have different aging curves than those who don't.
Working always from the same sample, I broke my main sample into three subsets. The first subset is those players who played between the ages of 24 (or earlier) and 33 (or later). This is a group of players who have at least 10 years of experience, and who have had the chance to play during the "traditional" peak years. The second subset is those players whose careers were over by the age of 31. These might be the guys who you think peaked earlier. The third subset is everyone else, which means it's a group of players whose careers were over after the age of 31. These are just a whole bunch of different types of guys, but skewed slightly towards the older players.
Because of sample size issues, some of the results might look strange. In any case, here it is:
Age     #   Long     #  Short     #   Rest
19      2  0.565     1  0.810
20      5  0.687     4  0.871
21     25  0.816    15  0.921     3  0.936
22     61  0.857    36  0.939     9  1.025
23     99  0.930    81  0.999    16  1.007
24    140  0.936   131  0.987    21  0.972
25    140  0.982   167  0.994    79  0.929
26    140  0.989   191  0.994   119  0.973
27    140  1.000   181  1.000   166  0.991
28    140  0.987   141  0.994   185  0.989
29    140  0.981    80  0.966   203  1.000
30    140  0.979                218  0.969
31    140  0.977                162  0.980
32    119  0.954                114  0.976
33     88  0.938                 80  0.951
34     58  0.930                 55  0.945
35     39  0.885                 35  0.930
36     27  0.887                 20  0.905
37     17  0.856                 11  0.937
38     11  0.842                  4  0.906
39      5  0.809                  1  1.014
40      4  0.790                  1  0.942
41      2  0.672
42      2  0.563
Hopefully, the formatting comes out here.
What do we see? As you would expect, those players with long careers centered around the expected peak years did just that. They had their best years at the 25-29 level, peaking at age 27. They had a bit of a jump prior to that. Then they had a slowly declining phase to age 34, after which they plummeted. However, don't forget we specifically selected our subset for players who played between the ages of 24 and 33. Therefore, we should not be surprised to see demarcation points close to these ages. Furthermore, the peak point was still age 27. The slowly declining phase is a result of the selective sampling.
The second set of players is far more interesting. These are players who, for whatever reason, had their careers over by the age of 31. These players did not have the traditional aging curve. Essentially, they stayed at a peak level between the ages of 23 and 29. Is it possible that there is a class of players that don't get to the next level? That perhaps there is a class of players that peaks at age 23 and stays there? Or is this again a bias in our sample? That because we specifically chose players whose careers ended prior to age 31, this is exactly what we should expect to see? This is a much more likely explanation. Management did not give the players a chance to show their stuff, and simply cut them before their truly good seasons could be shown.
The last set of players has a bias to be an older type of player, and the results show this. These players peaked between the ages of 26 to 32, peaking at age 29.
Selectively choosing your sample set leads to many biases.
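As a rough illustration of how the peak ages quoted above can be read off the table, here is a small Python sketch. The figures are copied from the table for the ages around each peak; reading the second pair of numbers at ages 30 and up as the "Rest" column is my assumption, and the `min_players` cutoff is a hypothetical guard against the tiny samples at the extremes.

```python
# age -> (number of players, relative performance), copied from the table.
LONG = {24: (140, 0.936), 25: (140, 0.982), 26: (140, 0.989), 27: (140, 1.000),
        28: (140, 0.987), 29: (140, 0.981), 30: (140, 0.979)}
SHORT = {23: (81, 0.999), 24: (131, 0.987), 25: (167, 0.994), 26: (191, 0.994),
         27: (181, 1.000), 28: (141, 0.994), 29: (80, 0.966)}
REST = {26: (119, 0.973), 27: (166, 0.991), 28: (185, 0.989), 29: (203, 1.000),
        30: (218, 0.969), 31: (162, 0.980), 32: (114, 0.976)}

def peak_age(curve, min_players=50):
    """Age with the highest relative performance, ignoring thin samples."""
    eligible = {age: perf for age, (n, perf) in curve.items() if n >= min_players}
    return max(eligible, key=eligible.get)

for name, curve in (("Long", LONG), ("Short", SHORT), ("Rest", REST)):
    print(name, peak_age(curve))
```

Note that the "Short" group is nearly flat from 23 to 28, so its nominal peak says much less than the other two.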
Aging Patterns
July 1, 2002 - tangotiger
(www)
(e-mail)
Do you feel the Linear Weights Ratio is a much better measure of offensive performance than OPS? If so, what evidence do you have? If not, may I suggest you use OPS+ in your future studies?
On June 26, I posted the following on fanhome:
"[OPS]'s extraordinarily useful and practical because:
- it's readily available
- it's made up of the two most important rate stats we have
- it's highly correlated to runs scored
- it can be used in research when you have the power of sample size that masks its deficiencies
It is NOT useful because:
- you can not count on it for game-level decisions
- you can not count on it to evaluate players of weird profiles
- it does not properly weight all the events
So, depending what you are trying to do, OPS is either a godsend or a bane.
The reason I hate it is that people use it for the exact reasons that it is not useful."
In sum, OPS is used as a stand-in when you don't have something better. LWR is something better.
I'm not sure why I need to show "better" evidence to use LWR over OPS. All the deficiencies of OPS are taken care of in LWR. I can use LWR to convert it into Runs Created or Runs over average, or really anything, in a simple one-step process. LWR is the best "rate" measure we have. (If you want more discussion on LWR, you can check out my site at http://www.geocities.com/tmasc/lwr.html which will give you the full formula, as well as a link that discusses LWR.)
******** Not only will this take care of the adjustments for park and year which you indicate are necessary, but it will make it easier to incorporate your findings with other studies that are OPS+ based...
I have the data to calculate the park/year adjustments, I just didn't want to add another layer of complexity. (If someone wanted to reproduce the above research, they could. If I added park and year adjustments, they couldn't.) As I indicated, I'll add that layer the next time. All studies that are OPS+ based are flawed for the reason that they rely on an indicator that has deficiencies that are circumventable.
Felipe Alou: Is He Afraid of the Walk?
November 13, 2002 - tangotiger
(www)
(e-mail)
I know that Walker hated the way Felipe would talk to him about hitting approach.
The first poster is doing exactly what I said we shouldn't: looking at team totals.
However, the first poster is correct that Alou does have a choice from within the 15 hitters which ones to play. But Alou was not dealt a good hand. If you're given the bottom of the barrel, you should expect to have low walk totals, period.
The team was selected by the GM, and therefore the low team walk totals are more an indication of the type of team that the GM has selected.
I'm going to continue the analysis tomorrow, looking at it from another angle. I don't know what the results will be, so I'll report whether they favor Alou or not.
Banner Years
October 31, 2002 - tangotiger
(www)
Good comments, guys. I actually meant to address these two issues, and I'm glad you brought them up.
Age: definitely should be looked at, but I can tell you that there is no big age bias, even with the 149 group. I will do the breakdown, hopefully before the end of the day.
Banner selection: one of the considerations I had was that I did not want to select players that were say 110-110-110-140, because of the "regression towards the mean" issue I brought up. That is, even though you've got a guy who you think is a 110 level for three years, he might actually be 107 or 113, etc. The closer you are to 100, the more likely it is that this is a 100 player. Furthermore, by introducing all players, then I get into trouble with losing players. While it is unlikely that you will lose a player from the pool who is 100-100-100, it is very possible that you might lose a player who is 80-80, thereby introducing a bias. Of course, this also depends on position.
Having said that, I was thinking about running through the data anyway, and see what happens. And with the much larger sample size this would allow me, I can select 35% above "previous 3 years" to really highlight the banner years. I'll try to get to this next week.
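To make the selection rule concrete, here is a hedged Python sketch of flagging a season that is some percentage above the "previous 3 years". Taking the base as the simple mean of the three prior seasons is my assumption, and the function and parameter names are hypothetical; the actual study may have defined the base differently.

```python
def find_banner_years(seasons, threshold=1.35, base_years=3):
    """Return indexes of seasons at least `threshold` times the mean of the
    preceding `base_years` seasons (values are % of league average)."""
    hits = []
    for i in range(base_years, len(seasons)):
        base = sum(seasons[i - base_years:i]) / base_years
        if seasons[i] >= threshold * base:
            hits.append(i)
    return hits

# A 100-100-100-140 string qualifies at 35%; 110-110-110-140 only at 25%.
print(find_banner_years([100, 100, 100, 140, 100]))
print(find_banner_years([110, 110, 110, 140]))
print(find_banner_years([110, 110, 110, 140], threshold=1.25))
```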
Banner Years
October 31, 2002 - tangotiger
(www)
Walt, good comment at the end, and this is exactly what I did with the HR study I linked to. And rather than seeing a "retention", we see essentially that what the player did the year after the banner year was repeated the year after that as well.
Again, what I am talking about is not "retention", even though I used that term. We are presuming that a player's performance level is a sample of his true talent level. Therefore, by selecting 130-100-100, I am choosing those players that had a great year followed by 2 average years. This does not imply that this player had an injury or something that forced him to go down to 100. The more years I tack on, the smaller my sample size. You are correct that I can simply show year1, select on years 2 through 4 (whether 100-100-130, or 130-100-100), and then look at year 5. My guess is that year 5 production will be only slightly different than year 1 production, age notwithstanding. This is a good idea, and I will run that next week as well.
Walt: any comment about the Hank Aaron issue?
Banner Years
October 31, 2002 - tangotiger
(www)
MGL, yes, I agree with almost everything you said. Two points:
1 - yes, not my best writing work, as I wrote it in 30 minutes, but what is it that is unclear? Was it the weighting thing at the end? It basically means that you put more weight on the most recent year, and you have some weight to regress towards the mean. Or was it something else?
2 - As for the 149,149,149, which I selected for, the 4th year was 142. However, you then say that this group is actually a "147" group . Not! Because my group is "fairly large", then I would say that this selected group of 149,149,149 is a 142 player. And, if I looked at the 5th year, I would bet that this group would also exhibit 142. I would also bet that the year prior to 149 would also be 142. I would say *every* year around the 149 years would be 142. Do you agree? (Age of course is an issue if I start going crazy and start to consider pre-24 and post-36, etc years.)
However, for a single player, if I had a 149,149,149,142 player, since I didn't select such a player, then I would have to guess that he is a 147 player.
I think we are on the same page, but I'm not sure.
*** As for parks and changing teams, etc, yes that is always a problem. It's "possible" that the park may play an influence in the selection of my players, but I doubt it. The banner year was 25% above the base years, and so, while playing at Coors does increase the chances that he will be selected in the banner year, I don't think this is the case. I'll look into it though.
*** By the way, the more I look into this, this is just like MGL's hot/cold streak study. While he is looking at 15-day periods, I am looking at 3-year periods. We are (or will in my case) looking at the pre-selected and post-selected period, and we are (or might/will in my case) finding that those two values pretty much match, regardless of the intervening period.
Banner Years
October 31, 2002 - tangotiger
(www)
First off, I'm not trying to capture ALL banner years, just some of them. As well, I am not suggesting at all that 149,149,149 is banner performance. I am using that type of player to show that a 149,149,149 player is not in fact a 149 player but a 142 player.
So, when you look at a 130,100,100 player, a player that certainly had a banner year, we should treat the 130 with some hesitation, since, as we've seen, this performance was "lucky" in some respect.
****
Anyway, I've re-run, so that we have "x", 149,149,149, "y". That is, how did the players with 3 great years do just before the "banner 3 years", and just after? Here are the results
Year 1  1.42
Year 2  1.49
Year 3  1.49
Year 4  1.49
Year 5  1.41
This population of players had 593 strings.
Now, if we break it down by age (in Year 5), this is what we get:
Age       1     2     3     4     5     n
34+    1.46  1.51  1.51  1.48  1.37   173
30-33  1.44  1.49  1.49  1.47  1.41   229
29-    1.36  1.46  1.49  1.51  1.45   191
Again, as you can see, the "3 selected years", were pretty constant around that 149 level. The before/after years are consistent with the age grouping. But, in all cases, the before year was less than the selected period, even for the old guys.
There is also about an annualized 2% change in performance level between year 1 and year 5, which is also consistent with my findings in aging patterns previously done.
So, the "true talent level" is year 1 and year 5, and everything in-between is "lucky".
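The annualized change between year 1 and year 5 can be checked with a short Python sketch. It uses the age-group figures above; treating the change as compounding over the four intervening seasons is my assumption (a simple average of the total change gives nearly the same answer).

```python
def annualized_change(first: float, last: float, years: int) -> float:
    """Compound yearly rate that takes `first` to `last` over `years` seasons."""
    return (last / first) ** (1.0 / years) - 1.0

# (Year 1, Year 5) performance by age group, from the breakdown above.
GROUPS = {"34+": (1.46, 1.37), "30-33": (1.44, 1.41), "29-": (1.36, 1.45)}

for label, (y1, y5) in GROUPS.items():
    rate = annualized_change(y1, y5, years=4)
    print(f"{label:>5}: {rate:+.1%} per year")
```

The oldest and youngest groups move by a bit more than 1.5% a year in opposite directions, while the 30-33 group is close to flat.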
Banner Years
October 31, 2002 - tangotiger
(www)
MGL, sorry for the bugaboo.
To go back to your question, let me amplify. The 149 performance is regressed 14% towards the mean to match the "expected probable" true talent level of 142. So, generally speaking, we should regress all 3 year performances by 14%.
Now, of the three remaining components (year x, year x-1, year x-2), we weight the most recent season (x) as 38%, and the other two equally at 24% each.
As a shorthand, rather than remembering kooky percentages, you can apply integer weights of "3" for "x", "2" for "x-1","x-2", and "1" for "mean". Maybe I should have skipped this part, as it's probably more confusing than it should be.
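The weighting scheme can be written out as a small Python sketch. The function name is hypothetical, and the default mean of 100 (league average) is my reading of the 149-to-142 example rather than something stated outright.

```python
def project(x, x1, x2, mean=100.0, weights=(3, 2, 2, 1)):
    """Weighted estimate of true talent from the last three seasons.

    x is the most recent season, x1 and x2 the two before it; the last
    weight is the pull towards the mean.
    """
    w_x, w_x1, w_x2, w_mean = weights
    total = w_x * x + w_x1 * x1 + w_x2 * x2 + w_mean * mean
    return total / sum(weights)

# Integer shorthand (3, 2, 2, 1) versus the percentages (38, 24, 24, 14).
print(project(149, 149, 149))
print(project(149, 149, 149, weights=(38, 24, 24, 14)))
# A banner year of 130 after two seasons at 100.
print(project(130, 100, 100))
```

Both weightings put a 149,149,149 player at roughly 142 to 143, so the integer version loses very little.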
Banner Years
November 1, 2002 - tangotiger
(www)
Good job, MGL!
The mean of the players who played in those 5 year spans, with at least 300 PA, is 115%. Now, this may sound like a lot, but don't forget, we have a lot of repeating players in there (like Aaron).
I don't think that the regress towards the mean would regress to 115%, but I'd like to hear from the statistics-oriented fellows about their thoughts on this matter. I would guess at this point that the Aaron situation comes up, and I should identify unique players only.
Banner Years
November 1, 2002 - tangotiger
(www)
Since age is an issue, and I can easily control for it, I will re-run using that.
As well, the "mean" of the players is 115%. If we look only at one age group for the 5 year period (say ages 26-30), we see that each year they average 115%. If we select any other time period like 24-28, we also get similar results. And of course, no player could possibly exist more than once in each age group. Therefore, the mean is 115%.
Therefore, I should probably select players that center around 115%, and that center around the 27 age group. I'll get to this next week.
Banner Years
November 1, 2002 - tangotiger
(www)
MGL, maybe you missed my last post, but if I only look at one 5-yr period, say ages 24-28, then of course Aaron can only exist once in this string. And, the players in this group are 115% of league average. Now, if I select some other age group, the unique players in that group are also 115%.
However, if I decide to combine the two groups, I might have two Aarons, and two Ruths, etc. I don't see why I would want to remove one of them from the groups.
I think it would be easier to keep all the age groups separate (24-28, 25-29, 26-30, etc, etc) and report on each one separately. This removes the conflicting players, but addresses the Aaron issue. However, I don't see the problem in then combining these three age groups afterwards, AND KEEPING the mean at 115%.
Or maybe I'm missing something?
Banner Years
November 2, 2002 - tangotiger
(www)
Contrarian: I've already admitted my shortcomings in many areas, including statistics. I've taken enough that I can follow conversations, but that's as far as I would take it. I also know enough to apply the basics. This is no news to people who've been reading me, and any of my comments should be taken like that.
I am always interested to hear from Walt Davis, and frankly I just missed his second post (the way Primer regenerates the site, there is a lag, and Walt's post got sandwiched in-between).
I have no problems with people criticizing my approach, or my comments, or anything I do. It would be nice though if you would provide an email address so that we can correspond privately, and you can elaborate further.
Banner Years
November 4, 2002 - tangotiger
(www)
Sancho, thanks for the links. The first one I had not seen, and is a not bad one. As for Albert, I'm frankly disappointed. There's a long list of math professors who have tackled baseball issues, and really either miss something, or write so dry that I miss something. (Of course, there's an even longer list of sabermetricians who miss some math issues as well.)
Banner Years
November 5, 2002 - tangotiger
(www)
(e-mail)
Shaun, I agree age should be taken into account, and I'm currently working on this. I should have something to show as soon as I get the time (which these days is not too much).
As for contract status, certainly this would have an impact. However, by having an aggregate of players, this impact should not be noticeable too much. And of course, since my data is from 1919 onwards, there's an even smaller population which would even be affected by this at all.
As for learning and improving, etc. This is the issue. Is it the case that the player is learning and improving, or is it simply random chance that the player happens to have a banner year. Hopefully, with the new data I have, we'll have a better answer.
Banner Years
November 7, 2002 - tangotiger
(www)
(e-mail)
MGL, no. F James specifically said that these 149,149,149 players would not regress, except for aging. In fact, they do regress to 142.
This group of players will regress towards THEIR mean, I agree. In fact, they will regress 100% towards their mean. But since we don't know what their mean is (without looking at other non-sampled years), we take the next best thing: the mean of the population they were drawn from. This mean is in fact 115%. Therefore, given the number of years (3), the number of players (I don't remember, let's say 100), and the number of PAs (let's say 500 / player / year), the best players will regress 7/34 (20%) towards the mean of the population they were drawn from. Different years, different # of players, and different # of PAs will regress differently.
Now, I know little of statistics, and perhaps Walt Davis or Ben V can put this matter to rest.
I'll be back in a week or two with detailed data, broken down by age.
Banner Years
November 8, 2002 - tangotiger
(www)
(e-mail)
F James: I think I wrote this already, but it might have gotten lost in all of MGL's explanation: the year before the 149,149,149 string was 141 and the year after the string was 142. Subsequent years drop off slightly from 142, and in fact match what you would expect from normal aging. (This will become more clear when I do the breakdown by age... eventually, whenever that is.)
Essentially, MGL's point boils down to: whatever period you take, however many players you take, however many PAs that performance makes up, you have to regress to some degree. The amount to regress is related to the variables I just mentioned. By choosing 1 day, we are regressing almost 100%; by choosing 5 years of performance between the ages of 25 and 29, where in each of those years the player has 1 googol PAs, you regress close to 0%. Everything in-between is subject to more analysis.
Given my sample of 3 years of 600+ players of about 500 PAs, the regression of the 149 player is 20% towards the mean of 115 to achieve the true talent level of 142 (more or less).
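The arithmetic behind the 20% figure can be sketched in a few lines of Python. The function names are hypothetical; the numbers (149 observed, 142 in the surrounding years, 115 population mean) are the ones quoted above.

```python
def regression_fraction(observed, true_level, population_mean):
    """Share of the gap to the population mean that the group gave back."""
    return (observed - true_level) / (observed - population_mean)

def regress(observed, population_mean, fraction):
    """Pull an observed level towards the population mean by `fraction`."""
    return observed - fraction * (observed - population_mean)

# 149 observed, 142 afterwards, population mean of 115: 7/34 of the gap.
print(regression_fraction(149, 142, 115))
# Applying a flat 20% regression towards 115 lands close to 142.
print(regress(149, 115, 0.20))
# Against a mean of 100 instead, the same drop is 7/49, about 14%.
print(regression_fraction(149, 142, 100))
```

The choice of mean matters: the same 149-to-142 drop reads as about 14% regression towards 100 but about 20% towards 115.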
Let's Contract Two Different Teams
July 12, 2002 - tangotiger
(www)
(e-mail)
Proofreader guy: you know, I read and reread and re-reread my article, and it amazes me what I miss. How about "here" for "hear", and "marker" for "market"? Competitif is French for "competitive", so I don't think I can use the French excuse.
Common Sense: do you think that if Steinbrenner reduces his payroll from 140 million to 90 million that he will give that 50 million$ of savings to you? In fact, don't you think that now that he set up the YankeeNets that it will be very easy for Steinbrenner to claim much less revenue because the YankeeNets corporation owns the Cable rights, and not Steinbrenner?
If teams claim that they can't play on the same playing field as the Yankees, then either level the playing field by introducing teams into a lucrative market to siphon off some of that revenue, or take some of that Cable money, or realign the two leagues by market size. Let the Yankees and Mets and Red Sox and Braves and Dodgers spend themselves crazy. Let the A's and Expos and Royals and Twins spend smart.
To think that by controlling player salaries that you will get an outcome that is different from today is ludicrous. Nothing is going to change. In 5 years, you'll be right back to where you started.
Let's Contract Two Different Teams
July 14, 2002 - tangotiger
(www)
(e-mail)
There's no question that we are introducing accountants into the fold with the owners' plan. As if lawyers aren't bad enough. How many white collar solutions do we have to introduce to "solve" the problem?
Just re-align based on market size. 4 divisions of 8 teams. The top team of each division goes forward, while the 2nd and 3rd place teams go into a wild-card system where the 2nd place team of Division 1 plays the 3rd place team of Division 4, etc. There's no need to force a socialistic solution. Just change dance partners.
There's no need to overhaul anything. If you want to overhaul, then disband the league, and do it right.
Let's Contract Two Different Teams
July 15, 2002 - tangotiger
(www)
(e-mail)
Common sense: it seems that you've been getting more and more common sense. How much longer before we get Common Sense the third?
Seriously, when I say to "contract" the New York teams, I intended it to be in a humorous note. But the point of contracting the teams is to reposition the power that is highly concentrated in the New York teams. Since Steinbrenner is consolidating and hiding his power and revenue in a second enterprise (that exists only because of the first), it is unlikely that he will reduce the market value of his interests.
Why would 29 intelligent men buy out a franchise that has limited value (Expos) when they can buy out a franchise that has substantial value (Yankees)? Steinbrenner used the system to its fullest; he capitalized on it with the unanticipated TV value that has created the great divide. Everyone has his price. So, buy out Steinbrenner at fair market price, and redeploy the value of the Yankees by siphoning away the cable and TV value, and selling the rest of the team to an interested buyer. That is, buy Steinbrenner's TV and cable rights away from Steinbrenner.
If that is too hard or too expensive to do (as if maintaining the status quo does not have its own expenses), then just take the "barnstorming" idea to something more palatable. Put the Yanks, Mets, Red Sox, Dodgers, Braves, Orioles, Rangers, and Cubs in the "Division 1" league. Put the Expos, Pirates, Brewers, Reds, A's, Marlins, Devil Rays, and Blue Jays into the "Division 4" league.
What would happen? Well, all those Division 1 teams will soon realize that they can't hope to buy their way in because they've got too much competition for too few spots. They'll have to be smart. The Division 4 teams will realize that with just a little effort and smarts, they'll have a decent chance to make a run for the playoffs.
Once in the playoffs, anything can happen (especially if you make the first round 5 games instead of 7).
Without spending a single dollar on either side, we can reshape the entire competitive balance by simply changing divisions.
And what's more shocking: that I say to redistribute the wealth of the Yankees to the poor teams, or Selig redistributing the wealth of the Expos to the rich teams?
Let's Contract Two Different Teams
July 16, 2002 - tangotiger
(www)
(e-mail)
Willy Loman is all in favor of the American dream. And I didn't say to steal it from the Boss, but buy it back from him. MLB made a huge error in not securing the TV rights the way the NFL did. Now they've got to pay for it. Literally. Once they do that, the chips will fall into place. But to restrict player salaries through non-American ways? I don't think so.
Let's Contract Two Different Teams
July 24, 2002 - tangotiger
(www)
(e-mail)
I think the soccer relegation/promotion idea is viable. But I question the 30 teams/league decision. The disparity will still exist. Why not have a 12 team premier league, 24 team division-1, etc, etc. Which just brings us back to my proposal of having leagues segregated by market size, but having ALL of them play for the World Series. By having each league have its own championship, the fans will question the legitimacy of any except the World Series.
Let's Contract Two Different Teams
July 24, 2002 - tangotiger
(www)
(e-mail)
As for pay for performance, why not simply limit contracts to 1 year? And make everyone a free agent? That would make it truly free market. You'd end up paying rotisserie style prices (about 15% to your top player), because of the abundance of supply. So, a team with a 60 million$ payroll will pay say Mike Piazza 9 million$. A-Rod would have a tough time getting more than 15 million$.
So, we have a mechanism that can severely limit top players' earning potential. All owners have to do is declare everyone a free agent, and no more guaranteed contracts. Too hard. It's like going to the Playboy mansion and being told you have a chance with 1 girl, and 1 night only. The owners want control, and they want to feel empowered. It'll cost you.
Let's Contract Two Different Teams
July 31, 2002 - tangotiger
(www)
(e-mail)
It would turn exactly into a rotisserie style system. The top guy would get at most 20% of the payroll. In any case, it doesn't matter how much the #1 guy gets. It's the overall payroll that matters. Players will be willing to sign for below market in some cases, simply because they don't want to be left out.
Teams will have their budgets before the bidding starts, and they won't try to run up prices, because of all the other fish in the sea.
Owners need help controlling themselves, and this is the best way. And if they overpay? So what, it'll only be for 1 year. You won't have all those 5 year contracts guaranteed to worry about.
Forecasting 2003
February 13, 2003 - tangotiger
(www)
(e-mail)
I still have not decided how to "rank". With only 32 players, using differentials or RMSE might not be the most appropriate (esp with the Bonds thing). I could create "classes of differentials" (consider each class to be 1 SD of error, and max out at 3 SDs or something like that). Or I might use differentials, while capping the individual differential at 3 SDs. Really, it's not important. I'm going to present the full data, and the reader is free to analyze and interpret the data as well.
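For illustration only, here is a Python sketch of the two ranking ideas mentioned: RMSE on the raw differentials, and RMSE after capping each differential at 3 SDs. The sample differentials and the SD of 0.040 are made up, and supplying the SD from outside (rather than estimating it from the 32 players) is my assumption.

```python
import math

def rmse(diffs):
    """Root mean squared error of forecast-minus-actual differentials."""
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def cap_differentials(diffs, sd, max_sds=3.0):
    """Clamp each differential to +/- max_sds * sd so one outlier cannot dominate."""
    limit = max_sds * sd
    return [max(-limit, min(limit, d)) for d in diffs]

# Hypothetical OPS differentials; the last one is an outlier-sized miss.
diffs = [0.010, -0.035, 0.020, 0.250]
capped = cap_differentials(diffs, sd=0.040)
print(round(rmse(diffs), 3), round(rmse(capped), 3))
```

With so few players, a single uncapped miss can decide the ranking, which is the concern the capping is meant to address.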
Forecasting 2003
February 13, 2003 - tangotiger
(www)
(e-mail)
...retrospectively, using the same methodology, at last year or other prior years?
I guess surprises are out of the question around here! Voros has looked at the various forecasters for the year 2000 hitters . I was going to also add in the "baseline forecast" to his list to see how that stacks up. Stay tuned in a couple of weeks.
When do fantasy drafts usually occur? The last weekend in March?
Forecasting 2003
February 13, 2003 - tangotiger
(www)
(e-mail)
I probably should have said this in the article.
If I were to ask the Primer readers to estimate 200 players' ERA or OPS, I'd get a smattering of responses. By limiting it to something reasonable (32), I hope to get decent participation, while at the same time getting reasonable (though not conclusive) results from the forecasters. This is similar to what the WSJ does with using the top 10 picks from the brokerages. The intent is not to prove anything. I also selected those 32 players who showed the most deviations, and therefore, we'd expect the forecasters and the Primer readers to have little agreement on these.
I have also asked the forecasters to participate in a second parallel study, where they would submit the projections for a large number of players. I've only received a positive response from 2 of them. This is essentially what Voros did with his study, except he did the hard work by compiling everything himself. I can understand that the forecasters don't want to give everything away (which is why it was easy to ask them for only 32). I hope though that by the end of the season, they'll give me their list, so that I can save some work. So, you'll get the study that you are looking for, plus the other readers will have some fun (I hope) as well.
I hope this answers your concern.
Forecasting 2003
February 13, 2003 - tangotiger
(www)
(e-mail)
Yes, included with the ballot for the 32 players will be your estimate of MLB OPS and ERA (which will default to the 2002 level if you don't choose anything).
This is critical because if a forecaster underestimates all his projections, it doesn't matter, as long as you only use his system. Therefore, that's not a bad thing.
Really, I wanted to ask everyone to submit their OPS/lgOPS, but that loses too much meaning.
Great question!
Forecasting 2003
February 13, 2003 - tangotiger
(www)
(e-mail)
By the way, if anyone has a systematic forecasting system, then send me an email. It could really be based on anything, like
- weighted or unweighted recent performance
- lefty/righty splits
- gb/fb tendencies
- comparable players
- age
- height/weight
- position
- regression towards mean
- injury history
You don't have to tell me how your engine processes everything, but just what/how does the engine consider. I can then throw you into the systematic forecaster pool. Thanks...
Forecasting 2003
February 17, 2003 - tangotiger
(www)
(e-mail)
Just to reiterate (or maybe iterate, since I was not very clear), the point is not to figure out who has the best forecasting system, but rather if a systematic forecasting system is any better than a baseline or back of the envelope (card) system.
What the WSJ study shows is not that the Lehman brothers have a better forecasting system (hard to say with only 10 stocks) but rather that the mom&pop do better using a baseline (the S&P500 index) than in paying off the professionals.
To determine which professionals are better, you need far more than just 10 sample points, and the WSJ also does this by looking at all stock picks. This would be part of a second parallel study if I get a decent participation from the forecasters as well, similar to what Voros did in the 2000 link I provided. However, given that I've chosen 30 players who have very inconsistent performances, I think it might show something about the forecasters, but will be far from conclusive. (If I had chosen the 30 most consistent players, my guess is that all systematic forecasters would come up with very very similar estimates. I've removed the Colorado and the inexperienced players from the study, and there again, some forecasting systems might be better with those players.)
Forecasting 2003
February 18, 2003 - tangotiger
(www)
(e-mail)
Erstad: good call!
The next set of players that missed making the cut were, in order: Renteria, Erstad, Beltran, Sosa, Javy Lopez, Mark Loretta, Vina, Giles, Magglio Ordonez, Garret Anderson, Ben Molina.
Forecasting 2003
February 19, 2003 - tangotiger
(www)
(e-mail)
Minks: no, you would only supply the unadjusted OPS. The only reason to supply lgOPS and lgERA would be to establish your basis. Suppose that you miss all your OPS projections by 50 points, but that you also projected the lgOPS to be off by 50 too. Then, this scores 100% (in my book). A person using the results of such a projection will be perfectly happy (as long as he uses only this projection).
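A minimal sketch of that scoring idea in Python, for anyone who wants to see the arithmetic. The function name and the sample OPS values are made up for illustration; the only thing taken from the comment above is that a miss on the player is offset by the same miss on the league.

```python
def league_adjusted_miss(proj_ops, actual_ops, proj_lg_ops, actual_lg_ops):
    """Player miss minus league miss, in OPS points (1 point = .001).

    Hypothetical helper: a result of 0 means the forecaster was off by the
    same amount on the player as on the league.
    """
    player_miss = proj_ops - actual_ops
    league_miss = proj_lg_ops - actual_lg_ops
    return round((player_miss - league_miss) * 1000)


# Forecaster is 50 points low on the player AND 50 points low on the league.
print(league_adjusted_miss(0.800, 0.850, 0.700, 0.750))

# Forecaster is 50 points low on the player but exactly right on the league.
print(league_adjusted_miss(0.800, 0.850, 0.750, 0.750))
```

The first call nets out to zero, which is the "scores 100%" case; the second leaves the full 50-point miss.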
David: the back-of-the-card forecasters are just like the mom&pop investor. They each have access to their own private information and public information and intuition, and combine all their data into some sort of target price for a stock. The collection of all these investors makes up the market. You can benefit from this "wisdom" by buying the S&P500 index (SPY). The systematic forecasters follow a rigid, repeatable process, like the various brokerage houses, like Lehman and Smith Barney. The baseline is the monkey throwing darts at a stock chart. So, whether I am comparing apples or oranges I don't really care (for this study). I'm trying to put this study on the same plane that the WSJ puts its study in.
A second parallel study, looking at extended picks that the systematic forecasters provide (which the WSJ also does when selecting their best analysts), might satisfy the fruit requirements.
Forecasting 2003
February 21, 2003 - tangotiger
(www)
(e-mail)
Vinay, excellent. I did not know about this. We are essentially after the same goal, but where they have 27 humans projecting 125 players, I'm hoping to get the reverse (100+ humans projecting 32 players).
What is very interesting to me, and which matches the stock market with its S&P 500 index, is that the collective wisdom of the market matches the top forecaster, with all of his intricacies.
The "missing big or getting big" projections of Wilton are, I think, probably attributable to a lack of regression towards the mean in that system. I'd have to look at the data more carefully though. Because we are dealing with sample performances, you should expect a few guys to have seasons that are out of the norm, and therefore a system like STATS or Palmer will miss the outliers at the gain of the large population. Silver's PECOTA should give the readers the best of both worlds.
Forecasting 2003
February 24, 2003 - tangotiger
(www)
(e-mail)
We are trying to forecast a player's performance for the upcoming year. This performance is a combination of a player's expected true talent level, context in which that talent will manifest itself, and luck.
ERA has more luck (from the pitcher's perspective) than other measures. The point of this forecast is to try to predict a player's performance numbers, with the reader trying to do as little as possible.
The 2003 Projections
May 6, 2003 - tangotiger
(www)
(e-mail)
I didn't think about that sort of thing when making my projections; they were more seat-of-the-pants than that, and I assume that they were for most people.
I hope this is the case, as this is what I was hoping for. Can you get 100+ baseball fans to make seat-of-the-pants calls on extreme players, average them, and come up with something decent? We'll see in a few months...
Crucial Situations
December 3, 2002 - tangotiger
(www)
(e-mail)
Really? Hmmm, are you using an old version of Netscape? What's your browser version?
Crucial Situations
December 4, 2002 - tangotiger
(www)
(e-mail)
...but the shading doesn't print for me. Just pages of empty grids.
Hmmm... maybe I should put text and color? I'll see what I can do about that.
In some innings, does the blue shading bleed into the -4/+4 run columns (or even further)?
Good question! I was thinking about that, but since I used the same program as for Bonds, I limited to -3/+3. Maybe next time I'll expand to something larger.
...although the column headings would be better if repeated every half-inning, not just every inning.
Thanks for the suggestion! My artistic skills are not what even an average person has, so any formatting improvements suggestions are appreciated. I'll do this next time as well.
But is there anything here that isn't intuitive?
I was a bit surprised by how much the leverage changes as soon as you get one guy on base, especially the late innings.
...but can there really be an argument for pinch hitting for guys (other than your pitcher) in the 3rd or 4th inning. I mean, you'd run through your bench, pretty fast.
I mentioned at the end that that was not what I was suggesting. Though I would consider this if my batter was Ordonez, and Piazza had the day off.
...like what to do when you're in a particular colored box, so the practical value can be perceived. In contrast, your earlier, similar piece on when to walk Bonds seemed eminently practical, ...
There's really no end to this WE stuff. Eventually, I will be producing charts for the SB break-even points, when to bring in your reliever, should you go for the DP or try the runner at home, should you test the RF's arm, etc, etc. Any suggestions you can offer would be appreciated as well.
What are your definitions of 'Very high-leverage', 'High Leverage', etc.?
It gets a bit dry (series of math equations), but I just picked some arbitrary thresholds to try to distinguish easily the various situations. I could have put +.054 wins and +.013 wins, etc, but who the heck knows what that means?
Crucial Situations
December 4, 2002 - tangotiger
(www)
(e-mail)
I'll do my best.
This is what you do:
1 - Determine the WE for every inning/game/base/out for an average team. I've provided a subset of that in the initial link.
2 - Assume that your "great pitcher" or "great hitter" or whatever is going to come in for 1 PA. What is the expected WE following this player's PA? (I used a player whose component stats translate to a .750 win%)
3 - Take the difference between the two. That is the impact in wins of a "typical super-great" player for 1 PA.
The biggest swing, in this example, is about +.07 wins, and that occurs in the bottom of the 9th, home team up by 1, and you have men on 2b and 3b and 1 out. That is, if you bring in say Pedro or RJ or Mo or Bonds or Thome or Giambi for ONE SINGLE PA, he will have an effect of .07 wins (assuming these guys are .750 players) over an average player.
How much is +.07 wins? Well, the typical star is +6 wins in 600 PA (+.01 wins per PA). If you bring in Giambi IN THIS PARTICULAR SITUATION 100 times, he'll have as much impact as playing full time.
Now, now, you won't have this situation 100 times, and not having Giambi regularly in the lineup might even mean you might have this situation zero times, who knows. But this is the magnitude of the impact.
So, while Theo Epstein and Bill James are saying that tied games in late innings are very important (AND THEY ARE!), my research shows that being up by 1, for the fielding team, is where having a great pitcher pitching will have more of an impact.
Anyway, the thresholds I used are .01 / .02 / .04. Just made them up to try to get a balance to the chart. Well, I used the .01 because that's what a great player is worth randomly. And .04 cause that would make it 4 PAs in a game. So, given the choice to hit Piazza 4 times randomly, or once in the "very high-leverage" situation, it's a wash. Of course, if that situation doesn't come up, well, you lose on the deal.
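To put the arithmetic above in one place, here is a rough Python sketch. The .01 / .02 / .04 cutoffs, the +6 wins per 600 PA, and the +.07 swing come from this comment; the function names, the tier labels below the top class, and the choice to treat each cutoff as inclusive are assumptions made for the sketch.

```python
STAR_WINS_PER_PA = 6 / 600  # "+6 wins in 600 PA" works out to +.01 wins per PA


def leverage_tier(swing_in_wins):
    """Bucket the one-PA win swing of a .750 player using .01 / .02 / .04.

    Labels other than "very high" are placeholders, and the inclusive
    comparisons are an assumption.
    """
    if swing_in_wins >= 0.04:
        return "very high"
    if swing_in_wins >= 0.02:
        return "high"
    if swing_in_wins >= 0.01:
        return "medium"
    return "low"


def random_pa_equivalent(swing_in_wins):
    """How many ordinary (random-situation) PAs one PA in this spot is worth."""
    return swing_in_wins / STAR_WINS_PER_PA


biggest_swing = 0.07
print(leverage_tier(biggest_swing))
print(round(random_pa_equivalent(biggest_swing), 1))  # about 7 ordinary PAs
print(round(100 * biggest_swing, 1))  # about 7 wins over 100 such PAs
```

At the .04 cutoff the equivalence comes out to 4 ordinary PAs, which is the "4 PAs in a game" reasoning behind that threshold.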
Is that enough detail? Too much?
Crucial Situations
December 4, 2002 - tangotiger
(www)
(e-mail)
Chris, you got it!
If you followed my "Runs Created" series, it shows that the "run environment" (really WIN environment) already exists when the batter/pitcher matchup comes up. That is, the runners on base already have a built-in chance of scoring, given the environment they are playing under.
So, if you then introduce a great player into the mix at that point in time, the entire environment changes. Now, the chances of winning change (sometimes drastically). With 1 out, more damage can be done (not only with the runner on base, but with the batter getting on base). You bring Bonds as a PH with 1 out, not only is the guy on base likely to score, but Bonds will now put himself in a position to extend the inning.
Crucial Situations
December 4, 2002 - tangotiger
(www)
(e-mail)
Oliver, I really don't know. You'd really have to compare how teams should make their choices optimally against how they really make their choices. And you'd have to break it down by the kinds of choices as well (steals, sacs, taking extra base, throwing to wrong base, bringing in the wrong reliever, batting order, etc, etc, etc). It's gotta be a few wins at least. I don't know, 5? 6?
In the business world, I would perform a cost/benefit analysis. But, the reason I'm doing all this baseball stuff is so that I can get away from doing these boring dry cost/benefit reports! Please don't make baseball like a job for me!!
Crucial Situations
December 5, 2002 - tangotiger
(www)
(e-mail)
Here's a printer-friendlier version
http://groups.yahoo.com/group/tangotiger/files/crucialpa.pdf
Still no text, though.
Bruce, Lee, and the Goose
December 17, 2002 - tangotiger
(www)
(e-mail)
Vinay, the starter is almost exactly 1.00. I looked at two starters, one who was good and went long (Blyleven), and one who went short and was not so good (Knepper). Bert was .99 and Bob was .98.
As for historical, I have the LI for the 20 pitchers with the most relief games from 1974-1990:
pitcherid  Leverage Index
suttb001   1.90
smitl001   1.76
fingr001   1.75
gossr001   1.72
rearj001   1.69
smitd001   1.52
orosj001   1.48
laveg001   1.46
quisd001   1.45
mintg001   1.45
tekuk001   1.41
garbg001   1.38
lyles101   1.35
campb001   1.31
stanb001   1.30
martt001   1.24
leffc001   1.20
hernw001   1.18
baird001   1.10
andel001   1.03
This is the LI only while as a reliever.
Sorry, but my data is limited to the pbp provided by Retrosheet.
I agree that doing the LI by year, and then doing the multiplication by year would add the "timeliness" aspect as well. I might do it for one of these guys, maybe Quiz.
Bruce, Lee, and the Goose
December 17, 2002 - tangotiger
(www)
(e-mail)
Thanks... good question. I haven't calculated it yet, but I have another tool (which for lack of a better name I call the Tango Distribution), which shows the expected runs per game distribution, given the runs per game of a team. Using this, I can figure out the win% for any two teams, broken down by run differential. Surprisingly (to me), there is little difference between a .400, .500, and .600 team in terms of "number of close games". I would suspect that I could extend this to "number of crucial situations" as well, and therefore, expect that most teams face a similar number of crucial situations. The more you get away from being a .500 team, the less the number of crucial situations. I'm not sure what the relationship is between team win% and crucial situations (yet). I'll keep this in mind next time I'm working on this. Great question!
Bruce, Lee, and the Goose
December 17, 2002 - tangotiger
(www)
(e-mail)
Craig: Mark Eichhorn, in 1986, was 1.32. If you look at the list of 20 players I listed in my followups, you will note that Bob Stanley is 1.30. I think this is probably what you'd find with your multi-inning non-closing firemen. Eichhorn did have 10 saves, and finished half his games, so you might be careful about how you extrapolate his usage to other relievers. In any case, his 157 relief innings is equivalent to 207 typical innings. The guys below 1.00 in LI are the true mop-up guys.
Devin: Your point is valid regarding what it would take. Since the HOF is a self-defining institution, I don't see how I can answer that question with any basis. Writers, like fans, are flying by the seat of their pants in trying to establish the potential impact a reliever has.
Pete Palmer and Bill James try to answer this question by using a combination of SV, GF, and G to come up with a reasonable estimate. I'm offering the same type of solution from a different angle. Therefore, I think it is irrelevant how we think they impact, and how they've changed the way the game is played and managed. The fact is that the impact of the best relievers, while real, is not substantial enough to catapult them to the levels of the superstars. And the best of the lot is good enough to put them in line with star pitchers who lack longevity. This is why relievers are paid the way they are. GMs may have figured out their true value already.
However, your point is just as valid, and that the HOF may not simply be about "overall value". And perhaps relievers do deserve a special spot. I don't know, and I think that the writers also don't know.
As for my sims, I'll run a couple more, like for Quiz and Reardon and Bob Stanley, using their LI. I'll let you know what shows up.
Charles: thank you! I had a lot of fun doing this piece! I just wish I could devote more time to this.
Bruce, Lee, and the Goose
December 17, 2002 - tangotiger
(www)
(e-mail)
No, you didn't miss it. I explained it in another article.
If you go to the first line of the article, and click on the link, you get a general explanation of how I determined the leverage of the situations. If you then go into the comments section, in one of the December 4 comments, I elaborate on how I derived the leverage values. Hope that's good enough?
I apologize for making each of these win expectancy articles links to links to links. They're all related, and it's very tough (for me) to write it adequately, without making it a mathfest.
Bruce, Lee, and the Goose
December 17, 2002 - tangotiger
(www)
(e-mail)
If ever I get pbp prior to 1999, and I get the World Series pbp, I'd *love* to look at Mariano Rivera. He may turn out to be a borderline candidate like Goose or Lee Smith, based on regular season numbers. But when you add in his tremendous playoff performance, that may be enough to get him over.
In fact, I am surprised how little play playoff heroics get. In the NHL, they have the same problem. The NHL and NBA are *all* about the playoffs. But the awards and HOF, etc, is mostly about the regular season. Rarely do you see the two combined. I believe in soccer, they combine all games, regardless of "league". Pele has 1,241 (or whatever) goals, with no split.
Bruce, Lee, and the Goose
December 17, 2002 - tangotiger
(www)
(e-mail)
Thanks Colin.
Yes, your point is very valid, and Vinay also brought up the issue in his comment. Essentially, the next breakdown is to look at each PA within the context of the leverage of the situation (which is what the Mills Brothers and Doug Drinen did). So, while Bruce Sutter may be 1.90 overall, he'd have say 30% at a 4.0 leverage, and 40% at a 1.50 leverage, and 30% at a 0.3 leverage. And maybe during the 4.0 leverage situations, that's when he was his best (whether because his manager used him during his peak years, or he rose to the occasion, or by luck), and therefore, he was even more valuable.
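As a quick check on that hypothetical breakdown, here is a short Python sketch that rolls the per-bucket leverage back up into one overall LI. The 30% / 40% / 30% split is the made-up example from the paragraph above, not real Sutter data, and the function name is invented for illustration.

```python
def overall_li(buckets):
    """Share-weighted average leverage.

    buckets: list of (share_of_PA, leverage) pairs; shares should sum to 1.
    """
    total_share = sum(share for share, _ in buckets)
    return sum(share * li for share, li in buckets) / total_share


# The illustrative split from the text: 30% at 4.0, 40% at 1.50, 30% at 0.3.
sutter_example = [(0.30, 4.0), (0.40, 1.50), (0.30, 0.3)]
print(round(overall_li(sutter_example), 2))  # 1.89
```

So the example split lands right around the 1.90 overall figure, while hiding the fact that nearly a third of the PAs sit at a leverage of 4.0.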
It's possible that there is some impact here, especially with the relief wild card.
It requires some upfront work on my part to get the whole thing set up. I'll see if I can devote some time to this. Perhaps after XMAS.
Bruce, Lee, and the Goose
December 17, 2002 - tangotiger
(www)
(e-mail)
I think this is the exercise that Vinay did, but in response to Joe's point, let's go through it step-by-step.
Let's assume that Gossage has an ERA+ of 126. Let's assume that he had 251 IP as a starter, and 1558 as a reliever. Let's assume that as a starter, his ERA+ was 100, and as a reliever it was 130. Fair enough?
Now, his LI as a starter was 0.99. His LI as a reliever was 1.72. As we see in the above paragraph, he pitched his best as a reliever. So, if we take his 1558 innings and multiply by 1.72, that gives us his "adjusted typical" innings. Do the same for the 251 x .99. Good so far?
Now, weight the 130 ERA+ by 1558x1.72 and the 100 ERA+ by 251x.99. You end up getting an ERA+ of 127. That's compared to the initial value of 126.
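The weighting in the last few paragraphs can be reproduced with a few lines of Python. The ERA+, innings and LI figures are the assumed values from this comment; the function name and tuple layout are invented for the sketch.

```python
def weighted_era_plus(roles):
    """Average ERA+ across roles, weighting each role by IP x LI.

    roles: list of (era_plus, innings, leverage_index) tuples.
    """
    weights = [ip * li for _, ip, li in roles]
    total = sum(era * w for (era, _, _), w in zip(roles, weights))
    return total / sum(weights)


# (ERA+, IP, LI): reliever first, then starter, as assumed in the text.
gossage = [(130, 1558, 1.72), (100, 251, 0.99)]

# Setting every LI to 1.0 gives the plain innings-weighted figure.
unleveraged = [(era, ip, 1.0) for era, ip, _ in gossage]

print(round(weighted_era_plus(unleveraged)))  # 126
print(round(weighted_era_plus(gossage)))      # 127
```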
The point is that because very little of Goose's innings came as a starter, the change won't affect much. If this was Eckersley, then that's a different story.
That said, while the impact is small in this case, we should still do the breakdown as I mentioned in the previous post, so that we are leveraging each particular PA, and not applying an overall leverage on the sums of the PA.
Bruce, Lee, and the Goose
December 18, 2002 - tangotiger
(www)
(e-mail)
Good point.
There are two things to consider with "leverage". You can take the position of what was the leverage of the situation, assuming that the pitcher will pitch to the end of the inning. So, if it's top of 9th, 1 out, man on 1B, up by 1, the leverage is not that particular situation, but rather that particular situation as the starting point, until the end of the inning. It could be that that particular PA may have a leverage of "4", but the "starting from that PA to end of inning" may have a leverage of "2".
Furthermore, you can also take the point of view that if a reliever gets himself into a jam that the manager is "bringing him in" to get himself out. That is, after every PA, the manager is deciding whether to bring in his existing pitcher, or bring in a new pitcher.
Remember, my point of view is crucial PAs. So, PA by PA, what is the leverage. I don't know if it's the pitcher or the fielders that caused the change in leverage. And really, I don't care. What I care about is how often did he face a high-leverage situation.
It is important that you don't make a stat do what it wasn't designed to do.
If I were designing a model to decide when is the optimal point in the game to bring in a reliever, such that he will pitch to end of inning, I would have different leverage numbers. And if I design a model, such that my pitcher will pitch to end of inning, plus one more full inning, I'd have again, different leverage numbers.
All these methods are good, within the context of their design assumptions.
Bruce, Lee, and the Goose
December 19, 2002 - tangotiger
(www)
(e-mail)
To add to the point about "how often do relievers cause high-leverage PAs for themselves": Bruce Sutter comes into a high-leverage situation, and he keeps it high-leverage. Fat Rojas comes into a high-leverage situation, and turns it into a low-leverage situation (by giving up 3 run HRs).
So, there are various reasons as to turning a type of leverage situation into another type of leverage situation. It's not just a "if he's bad, then..." kind of deal. It's a lot more intricate than that.
Bruce, Lee, and the Goose
December 19, 2002 - tangotiger
(www)
(e-mail)
Gossage gave up .5 more walks than Gooden, but .7 fewer non-HR hits. Gooden's run environment was slightly higher than Gossage's. The actual runs allowed by relievers are a little suspect because of the "accountability" issue. Not that it should be ignored, but you just have to account for it.
On a "rate" basis, of pitchers born since 1950, I have Guidry, Cone, Rijo, Sutter, Blyleven, Gooden, Gossage all being "equivalent". In terms of IP or leveraged-IP, clearly Blyleven is the one that stands out here. By the way, John Smoltz is also in this group.
Gossage is borderline, in my view.
Bruce, Lee, and the Goose
December 19, 2002 - tangotiger
(www)
(e-mail)
Paul, welcome! I don't think I've seen you around here? I'd love for Retrosheet to get more PBP, and I'd love to run the 73 Hiller, and the Franco career through their paces.
Walt, my inclination is to say that they are in the pen because they are not Roger Clemens or Greg Maddux. However, they are David Cone or Ron Guidry, and those guys were pretty darned good. I don't have a separate standard for catchers or anyone else. I look at it as how many wins did they contribute over some baseline. If you are a catcher, and you only play 120 games, and you are done by 34, then I don't have different standards. Not to say I'm right or anything.
You make a valid point that relievers can be considered similar to catchers (can't play long enough in a season or a career). So, you have to first resolve why they have shorter careers (because of the position, or the quality of the players there). Then you have to resolve if you want to have a different standard.
It's a tough call no matter what your perspective is.
Bruce, Lee, and the Goose
December 20, 2002 - tangotiger
(www)
(e-mail)
If I remember my post, Eichhorn had 157 real innings, and 203 leveraged innings. It may be that that's as much mileage as you can get out of a reliever. That, because of all his warm up tosses, etc, etc, you won't get more than that. Then, he's got to do that for 15 years.
However, I don't know if this is a physical limitation, as it is for catchers (who play 130 games instead of 150, and who play 13 years instead of 18, e.g.). If the relievers are physically limited to 200 leveraged innings instead of 250 for starters, and 12 years instead of 16 for starters (just examples), then it may be fair to consider the relievers to have lower standards, like catchers.
However, this should be studied to the extent that catchers' careers have been, before we pronounce sentence.
Even after all this though, people can still choose to not lower the standards for the C/RP.
Bruce, Lee, and the Goose
December 26, 2002 - tangotiger
(www)
(e-mail)
If there's anyone still out there, Eric Gagne's LI last year was 1.83, and Smoltzie was 1.79.
Are Managers Optimizing Their Best Relievers?
December 31, 2002 - tangotiger
(www)
(e-mail)
But first, I'd like to suggest that the most optimal use of the best relievers would generally be as a starter.
Agreed.
Why don't we use the same thinking for relievers? Why is the 9th inning any more important than the 1st?
If you bring in Mariano Rivera with a 6-run lead 50 times, you won't change the outcome of the game any more than if you had brought in an average pitcher.
If you bring in Mo 50 times with a 1-run lead, the Yanks will win a few more games than if you brought in an average pitcher.
If there's a one-run game, aren't each of the starters' six innings just as vital as the closer's ninth?
I'm not taking anything away from the starters. Their LI is about 1.00.
In fact, I would even argue that the first 7 innings of pitching are MORE important than the 9th because the score after the 7th (and often the 6th) influences the choice of relievers the opposing manager will use.
7 innings of LI of 1.00 is 7 leveraged innings. 2 innings of LI of 2.00 is 4 leveraged innings. Yes, the first 7 are more important, or at least, they have more impact to the final outcome of the game.
Are Managers Optimizing Their Best Relievers?
December 31, 2002 - tangotiger
(www)
(e-mail)
Is it reasonable to conclude that Rivera's LI could be somewhat deflated due to the fact that the Yankees have been consistent winners over the last few years...
I believe I mentioned that as a possibility that the Yanks pay (earn) this price.
I don't recall any mention in the previous article of a correlation between overall team LI and team W-L records (though I would expect really bad teams to have the lowest LI's for their relievers).
On my to do list. I should be able to come up with the LI, on a team-by-team, year-by-year basis, from 1974-1990. I expect the LI to peak with teams at .500, and slowly degrade the more the team's win% is from .500 (on either side).
It would also be fun to see the converse of this study, i.e. what is the xFIP for pitchers with the highest LI?
Also on my to-do list. I just ran a preliminary report for 1974-1990, and Todd Worrell actually tops the list at 1.97. Bruce Sutter is second at 1.90. The top of the list is all the usual suspects. The first name that I didn't recognize was Victor Cruz at 1.58. Next was Steve Foucault at 1.50.
Among "middle-relievers", Tim Burke was 1.54. He's a favorite of mine, and it certainly looks like he was used prominently. Paul asked earlier, and John Hiller was 1.62. Mike Marshall was 1.51.
Among pitchers with at least 2000 PA, Dave Tomlin was 0.73, and worst of the bunch.
Are Managers Optimizing Their Best Relievers?
December 31, 2002 - tangotiger
(www)
(e-mail)
Thanks, I'm enjoying this as well!
The problem with the "out" is that sometimes an out increases your WE (win expectancy), say a flyball with a man on 3b, in a tie game in the 9th inning. Strictly speaking, you have to look at the change in WE for every possible event, and then come up with the variance (and the frequency of those events). In essence, how much swing potential in winning does a particular game state provide? That's the question to answer.
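One way to read "swing potential" is as the probability-weighted spread of the WE change over all the events that can follow a game state. The Python sketch below does just that. To be clear, the exact formula isn't spelled out in this comment, so the standard-deviation form, the function name and the event numbers are all assumptions for illustration.

```python
from math import sqrt


def we_swing(outcomes):
    """Probability-weighted standard deviation of the change in WE.

    outcomes: list of (probability, change_in_WE) pairs for one game state.
    This is an assumed formulation, not a confirmed definition of leverage.
    """
    total_p = sum(p for p, _ in outcomes)
    mean = sum(p * d for p, d in outcomes) / total_p
    var = sum(p * (d - mean) ** 2 for p, d in outcomes) / total_p
    return sqrt(var)


# Made-up event tables: (probability, change in win expectancy).
quiet_state = [(0.7, -0.01), (0.3, +0.02)]  # e.g. a blowout
tense_state = [(0.7, -0.10), (0.3, +0.25)]  # e.g. late and close

print(we_swing(quiet_state) < we_swing(tense_state))  # True
```

Note that an out with a positive WE change (the flyball with a man on 3b) fits in naturally: it is just another row in the event table with a positive second value.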
I'd love a faster computer, as I'm running this on a 650 MHz (but 512 RAM). Sometimes, I have to run stuff overnight.
Are Managers Optimizing Their Best Relievers?
January 1, 2003 - tangotiger
(www)
(e-mail)
I will give a performance breakdown for Shuey and Stanton, among crucial, normal, and non-crucial situations. Look for this in a few days. We'll see if they can "handle" the pressure...
Are Managers Optimizing Their Best Relievers?
January 2, 2003 - tangotiger
(www)
(e-mail)
What we are after is *not* to maximize a pitcher's LI, but rather to maximize their leveraged-innings (LI x IP). LI of 1.00 with 120 IP will have the same win impact as 1.50 LI with 80 IP to a reliever. Of course, it's not that simple, as you have to take the totality of your starters and relievers, and maximize the leveraged innings for the good pitchers, and minimize the leveraged innings for the bad pitchers, such that all innings are accounted for. You have other constraints as well, with respect to the tiredness of a pitcher's arm, etc.
Mark Eichhorn, for example, had 200 leveraged innings (LI of about 1.3) in his great year. That is an excellent total.
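In code, leveraged innings is nothing more than a product. A tiny Python sketch, with a made-up function name, using the 120 IP / 80 IP comparison from this comment and the 1.32 LI and 157 relief innings quoted for the 1986 season in the earlier thread:

```python
def leveraged_innings(li, ip):
    """Leveraged innings = Leverage Index x innings pitched."""
    return li * ip


# Same win impact: 1.00 LI over 120 IP versus 1.50 LI over 80 IP.
print(leveraged_innings(1.00, 120))  # 120.0
print(leveraged_innings(1.50, 80))   # 120.0

# 1986 figures quoted earlier: LI of 1.32 over 157 relief innings.
print(round(leveraged_innings(1.32, 157)))  # 207
```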
Are Managers Optimizing Their Best Relievers?
January 2, 2003 - tangotiger
(www)
(e-mail)
All things equal, you are better off having your pitcher as a starter.
Your considerations would be to take someone like Urbina and Wetteland, and determine their level of effectiveness as a starter or reliever.
Say that as a starter, their performance would be a win% of .600. And as a reliever, they would be .650. You know that you can get say 160 leveraged-innings as a reliever, or 220 leveraged-innings as a starter. What do you do?
Compared to a baseline level of .450 (the effective level of rejigging your whole pitching lineup), you get 160/9 * (.650-.450)= +3.6 wins as a reliever or 220/9 * (.600-.450) = +3.7 wins. Essentially, a wash.
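The same comparison as a small Python sketch. The .450 baseline, the leveraged-innings totals and the win% figures are the example numbers from this comment; the function name is made up.

```python
def wins_above_baseline(leveraged_ip, win_pct, baseline=0.450):
    """Wins over baseline: (leveraged innings / 9) x (win% - baseline win%)."""
    return leveraged_ip / 9 * (win_pct - baseline)


as_reliever = wins_above_baseline(160, 0.650)
as_starter = wins_above_baseline(220, 0.600)

print(round(as_reliever, 1))  # 3.6
print(round(as_starter, 1))   # 3.7
```

Nudging either win% by a few points flips the answer, which is why the role-specific effectiveness estimates matter so much here.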
So, you really have to go into it deeply, determine the effectiveness level of all your pitchers based on the starter/reliever role, determine how you can best optimize your leverageable innings, and come up with your plan. It's not so easy, especially considering injuries throw a wrench in your whole plan. Unless you are the Yankees.
Are Managers Optimizing Their Best Relievers?
January 2, 2003 - tangotiger
(www)
(e-mail)
The leverage classes were broken up into high-leverage (LI of 2 or greater), low-leverage (LI of 0.5 or less), and the rest.
$H is non-HR hits per ball in play. All the others should be self-explanatory.
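For anyone wanting to split their own PA data the same way, the three leverage classes above can be sketched in Python as follows. The cutoffs (2 or greater, 0.5 or less) are the ones stated in this comment; the function name, the class labels and the sample LI values are invented for illustration.

```python
from collections import Counter


def leverage_class(li):
    """High-leverage is LI >= 2, low-leverage is LI <= 0.5, else the rest."""
    if li >= 2.0:
        return "high"
    if li <= 0.5:
        return "low"
    return "rest"


# Made-up per-PA leverage values, just to show the bucketing.
sample_li = [0.2, 0.5, 0.9, 1.4, 2.0, 3.1]
counts = Counter(leverage_class(li) for li in sample_li)
print(counts["high"], counts["rest"], counts["low"])  # 2 2 2
```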
Paul Shuey? He was at his best in high-leverage situations. Mike Stanton? He was by far his best in high-leverage situations. Note the small sample of PAs. Note also that it's easier to get more WP in high-leverage situations, since high-leverage situations occur more often with men on base. In any case, Shuey's WP rate wasn't so high, relative to his other situations.
I think there's some interesting DIPS numbers in there as well. With the leverage situations different, each pitcher gave up fewer hits / ball in play, and fewer Ks as well. Almost as if the pitcher had to bear down in the high-leverage situation, and therefore, has a different pitching approach, thereby lowering his K rate, and improving his $H rate. We may in fact find that pitchers DO control the hits/ball in play A LOT. And it may simply be the fact that once you reach the majors, the pitchers are similar in this regard overall.
Are Managers Optimizing Their Best Relievers?
January 2, 2003 - tangotiger
(www)
(e-mail)
FJM: yes that is correct. The second guy was on a hotter seat, and that's what LI is reflecting. As I mentioned on another thread, LI is not about rewarding a player, but classifying each PA.
Note that a manager is choosing to bring back the same reliever. If he had chosen to replace the reliever after 2 hits with another reliever, we'd have no problem saying that the replacement was on a hot seat.
It doesn't matter who sets the fire. We are capturing the existence of the fire, and we are capturing that the manager is letting someone pitch in that fire.
Doug Drinen's reliever reports work based on when the reliever enters and exits the inning. This metric works great in other areas, for other purposes. Eventually, I'll probably create an LI for this as well.
Are Managers Optimizing Their Best Relievers?
January 3, 2003 - tangotiger
(www)
(e-mail)
Well, I provided the LI for 10 top relievers of 99-02, as well as the historical LI for all pitchers in the 74-90 time period (see Clutch hits).
As for biased, again, there is no bias. It's a reflection of the game state for each PA. I know what you are saying about say John Franco or Mel Rojas being arsonists.
But it's not like there is a giant in a land of pygmies, even Mariano, that we should be concerned about. In the 74-90 time period, Clemens and Gooden are probably the giants. Their LI are 0.96 and 1.03. Hershiser was 1.03. Ryan was 1.05 and Blyleven was 0.98.
Are Managers Optimizing Their Best Relievers?
January 4, 2003 - tangotiger
(www)
(e-mail)
FJM: Again, I don't know how much effect it has, but I suspect a little. I'll find out eventually.
But again, remember the purpose of leveraged PAs. It's about describing the level of fire during that PA, regardless of whether that fire was arson or not. The manager is bringing back Mel Rojas, the arsonist, for the next PA.
As mentioned in another article, I can also create leveraged appearances, whereby I only note the fire level when the reliever is first brought in. This I will also do eventually. (Drinen essentially did this already.)
It's important to realize that a stat is constructed to answer a specific question, and it should not necessarily be used to answer other questions. Nor is it a shortcoming of the stat if it can't answer this new question.
Are Managers Optimizing Their Best Relievers?
January 4, 2003 - tangotiger
(www)
(e-mail)
If you page up to my Jan 2 comment, you will see a link to Paul Shuey and Mike Stanton, and how they performed in the various leverage situations. Paul Shuey, and especially Stanton, have excelled in high-leverage situations, when given the chance. The sample size is small, so who knows.
I was surprised with Percy too. I thought he was better, but his K,BB,HR numbers don't compare with the best, though he would have come in the 11-20 list.
As for more analysis, I would love to do it. But my time is really constrained. I want to do an analysis on a team-by-team year-by-year basis for the last 4 years, and within that, show how each pitcher performed in the high-leverage and low-leverage situations. There is really so much I want to do, I don't know where to begin.
Right now, I'm taking a break from relievers and concentrating on baserunners.
Are Managers Optimizing Their Best Relievers?
January 6, 2003 - tangotiger
(www)
(e-mail)
David, thanks much! I'm actually bringing a lot of different concepts into all this, so it's rewarding to me as well.
As for 2002 PBP, astrosdaily.net has it, so I'm fine there. What I need is *time*. Can you help me there?
OPS: Begone!
May 20, 2003 - tangotiger
(www)
(e-mail)
The A's have an additional point that by being able to work the count longer, a team can "choose" their opposing pitchers to the point where the average opposing pitcher is worse than by random chance.
They "choose" the pitcher by forcing their opponent to bring in the 10th reliever, because they wore out the starter. While this is certainly conceivable, you would need a whole team of such batters for this to work. As well, there's no guarantee that your team will benefit from it, since your opposition's next opponent might reap the benefits.
In the end, we are talking about a max .20 run difference/GP (see a previous Clutch hit for calculation), if the whole team is like this, and they are the ones who get the benefit. I fail to see how jumping the OBA to 3x from 1.8x would capture this. The "extra pitches" is not a function of OBA, but of (BB+K)/PA. By jumping the number from 1.8 to 3, you are capturing only part of this effect (BB/PA), in a whole bunch of other noise (H,HR,outs). This extra 1.2 is sort of trying to rise above the noise to find the BB/PA. If this is what the A's are trying to do, I don't think they're doing it in the best way. It's hard to comment further, without having the specifics (like James / Todd Walker comment as the best #2 hitter). From what we think they are trying to do, they are wrong.
OPS: Begone!
May 20, 2003 - tangotiger
(www)
The "additional point" thing is what I'm capturing. It doesn't matter if you do: 3*(OBA-.3)+(SLG-.35) OR 3*OBA+SLG-1.25
It's the same thing.
==========
As for the "wearing out the starter", Ted is correct in his approach. If you have a team of players whose "true talent level" was .333/.400, this team would score about 4.5 runs per game. However, because these guys all work the count, they have a synergistic effect in tiring out the starter, and bringing in the 10th man. These guys, because they feed off each other in this manner, will end up with .343/.405 numbers (let's say). Now, all of a sudden, this team of talent of .333 with the synergy effect, acts just like a team of .343 with no synergy effect.
This extra effect the A's are capturing inside the OBA, by overweighting that metric. However, there's no reason to rely on such a noise-filled metric, when what you want is (BB+K)/PA or (pitches/PA). Because of the amount of noise, to try to capture the little extra pitches/PA in the OBA, you have to severely overvalue the OBA to find it.
OPS: Begone!
May 20, 2003 - tangotiger
(www)
(e-mail)
The other reason for using "3" for OPS is if you are actively looking for those types of players. If you really really want guys with high OBA, then you would overweight OBA. You would do this because maybe you feel that it's a better predictor of future production. Or you feel that you need to get the players to toe the company line, or whatever. Guys like Vlad, Nomar, and Soriano would not be properly appreciated in such a system.
OPS: Begone!
May 20, 2003 - tangotiger
(www)
(e-mail)
This is how SLOB*k and SLOB*PA*k (where k is some constant to make things add up nicely) for 6 equivalent players from that last chart look:
SLOB*k   SLOB*PA*k
81       76
81       78
79       79
77       79
74       78
70       76
SLOB by itself works ok, except at the real extreme. SLOB*PA works much better. SLOB*PA is essentially Runs Created, and we already know that BaseRuns is more logical/accurate than Runs Created.
The best one in this group remains static Linear Weights. The best one "on the market" right now is BaseRuns-generated custom Linear Weights.
OPS: Begone!
May 20, 2003 - tangotiger
(www)
(e-mail)
Rob, you know what, you are right! I goofed.
While I was using outs as my baseline in the last chart, I should have used PA instead. Each player on the team should have the same number of PAs, not outs. Let me re-run the chart, and I'll publish the update on my site.
Good catch!
(Vinay, you are right about RC = SLOB*AB, and not PA as I mentioned in my last post.)
OPS: Begone!
May 20, 2003 - tangotiger
(www)
(e-mail)
For the last example, I should have been more careful.
What happens is that I should fix the team outs to something. In my example, I actually fixed it to each player making the same number of outs (440) which is wrong.
Anyway, what I now did (see link) was start with the team outs (3960), and, making sure each player had the same number of PAs, find the 8 typical guys and the 1 variable guy that would produce 3960 outs.
Things actually change. The Best-Fit becomes 1.64 (and not 1.75). I suspect that the best-fit will fall somewhere between 1.5 and 2.0, and for ease, probably use 1.5.
(Static) Linear Weights now looks less good than originally. I like this change, as it shows that the component values should change if the underlying environment also changes.
Custom Linear Weights wouldn't have this issue. Though at this point, I don't want to pronounce that custom LWTS will see all these guys as the same. It would definitely see all the teams as the same (just like BaseRuns). I think there will be some differences among these players though through custom LWTS. I'm not sure how much difference though.
Great catch again, Rob!
OPS: Begone!
May 21, 2003 - tangotiger
(www)
(e-mail)
See link for the values I used. For the categories I didn't use, I set them to "zero". It's not too important for what I am trying to do though.
As for the other question, you are asking if you can only know one thing, OBA or SLG, which one correlates to run scoring the best? I seem to remember Dan Werr doing a correlation study a month or 2 ago that showed the r to be pretty even between the two. That doesn't mean they are "equally important", especially if you have both.
As well, the coefficient itself (1.56, 1.64 or whatever) doesn't specify the level of importance. If you made lilSLG = 1/4S + 2/4D + 3/4T + HR, all divided by AB, what do you think would happen? The best-fit would be 1.64*OBA + 4*lilSLG. That doesn't make lilSLG twice as important as OBA, now, does it?
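To make the lilSLG point concrete, here is a small sketch. The stat line and the .360 OBA are made up for illustration; only the 1.64 coefficient and the lilSLG definition come from the post above. Since lilSLG is just SLG divided by 4, putting a 4 in front of it gives back exactly the same number.

```python
# Hypothetical stat line; these counts and the OBA are invented for illustration.
singles, doubles, triples, hr, ab = 100, 30, 5, 25, 550
oba = 0.360

# Ordinary slugging average.
slg = (1 * singles + 2 * doubles + 3 * triples + 4 * hr) / ab

# lilSLG as defined in the post: 1/4 S + 2/4 D + 3/4 T + HR, all divided by AB.
lil_slg = (singles / 4 + 2 * doubles / 4 + 3 * triples / 4 + hr) / ab

# Same estimator written two ways; only the scale of the second term's input changed.
score_slg = 1.64 * oba + slg
score_lil = 1.64 * oba + 4 * lil_slg

print(round(score_slg, 6), round(score_lil, 6))
```

The coefficient went from 1 to 4 purely because the stat was rescaled, which is why the size of a coefficient by itself says nothing about importance.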
OPS: Begone!
May 21, 2003 - tangotiger
(www)
(e-mail)
I agree that it would be a rush to judgement to make any conclusions without having all the information.
While you can conclude that using 3*OBA+SLG is a poor way to evaluate current run production, it's not so clear if you want to use that equation to try to evaluate future run production (or for other secondary reasons). And you certainly can't indict someone or some organization overall. Sample size! You need a lot more evidence.
I also agree that being able to work with a group of people, respecting their views, regardless of what it is, as long as they respect your views as well, is very important. Respect, courtesy, professionalism. Isn't that the police motto?
However, I'll note that in the ESPN chat, Bill James said: Baldelli's a lot of fun. In my office we were making fun of some scout who compared him to Joe DiMaggio, but when you see him play you realize what people are reacting to. Of course, he doesn't have DiMaggio's entire package, but he does have more than half of it. I kinda didn't like the first part, which left me with the impression that the stat-heads and the scouts clash behind each other's backs. But, this was a throwaway sentence, so who knows what James meant.
Finally, as for anyone's ability to deal with people, I'm not sure that you can necessarily say that DePodesta is good or bad, nor could you say that with me, or Voros, or anyone else, unless you deal with these people on different issues in different settings (or you have some second-hand knowledge... definitely not third-hand or worse). I don't think that an executive is a better people-person, or can deal with people, than a non-exec.
I agree that arrogance is a turn-off to most people, and that's something that a speaker should be conscious of. Mike Gimbel, who I've had occasion to e-mail from time-to-time, seems like a pleasant enough fellow. But I've heard from many many people that he is insufferable. That by itself, truth or perception, will keep Gimbel out of MLB, in my view.
OPS: Begone!
May 21, 2003 - tangotiger
(www)
(e-mail)
I agree with your comment on the corporate world (as I've been here for...geez, almost 13 years... my "corporate world" anniversary will be in 1 month).
Rather, I'm talking about the ability to be persuasive when dealing with people who have dissenting or at least ambivalent viewpoints, which at the very least involves some combination of:
That sentence alone is interesting to read!
But to really do all of those things well is fairly unusual, and I would guess that among the pool of the 15 or 50 or 500 or whatever leading analysts, there's a lot more differentiation in terms of interpersonal ability than technical ability.
That's an interesting thought too. I'm not sure if there is more differentiation in one or the other, or how you would qualify/quantify all that. And even if the differentiation is more in one category, the impact of that differentiation might not be as much as the other category.
Did it just feel like we had an OBA v SLG discussion? (More differentiation in SLG, but more impact with OBA differences.)
As with everything, there's degrees of impact to everything, and it's rather pointless to label them black/white (not that that's what I think anyone is doing here). Even if you have a terribly insufferable analyst, his work might be of such quality that it tips the scales towards good. Even if you would be able to classify DePodesta as a mediocre sabermetrician (and I'm not doing that), the rest of his skills might be so strong, that he can make an impact with his research, while others might not (even with better "stuff").
The fact that a successful organization has him employed, and he is highly regarded by other successful people, even though his experience is not as vast as other baseball execs, must show that his total package is something to respect highly. He's a mover and a shaker, and he gets things moving and shaking in generally the right direction.
OPS: Begone!
May 21, 2003 - tangotiger
I think that you should give the benefit of the doubt when you can. I've heard nothing but good (in fact great) things about DePodesta, so, without him actually saying anything, I give him that benefit.
Now, I can interpret the 3 thing as being "you know, I've got this great formula, and you know what, this correlates highly to 3*OBA+SLG. I don't use 3OPS, I have my own, but as it turns out, it's close to 3OPS. BaseRuns, which I don't use, is close to 1.6OPS. I'm sure Tango/David don't use 1.6OPS, but their equation is close to that".
I don't think that explanation is unreasonable, is it?
OPS: Begone!
May 22, 2003 - tangotiger
(www)
(e-mail)
David, I agree that the 1.64 value is a little suspect since it is based only on those 6 players that I happened to construct. I mentioned that 1.50 to 2.00 would be the correct value, if you were to look for it.
I've used the plus-1 method in the past, and I find I can minimize the runs error by using 1.83 as the coefficient for OBA. That is, 1.83*OBA+SLG. I think that as long as you use something between 1.5 and 2.0, you'll be ok, or at least better than not. I suppose if you really wanted to find the best-fit via the "plus 1" method, you'd look at 200 regular hitters, and figure it out that way.
(For the uninitiated, the "plus 1" method was described in the "Runs Really Created" series last year. Check out the archives.)
OPS: Begone!
May 22, 2003 - tangotiger
(www)
(e-mail)
Interesting. You know, I'm pretty sure I never include the IBB, but it was several months ago when I did that 1.8 thing. Interesting results though. I suppose we should compare it to the full-blown BsR version in that case.
OPS: Begone!
June 2, 2003 - tangotiger
3*(OBA-x)+(SLG-y)
This works out to 3OBA+SLG-(3x+y) which works out to 3OBA+SLG-k
Therefore, it is irrelevant what "k", "x", or "y" is. Whatever numbers you choose won't change the ranking of the players, or the size of the gaps between them, compared with simply using 3OBA+SLG.
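A quick way to check this is to score a few players both ways. The three (OBA, SLG) lines below are made up; the x=.3 and y=.35 are the values from the earlier comment, which give k=1.25.

```python
# Hypothetical (OBA, SLG) lines for three made-up players.
players = {"A": (0.400, 0.450), "B": (0.340, 0.520), "C": (0.320, 0.400)}

def raw(oba, slg):
    # Plain 3*OBA + SLG.
    return 3 * oba + slg

def shifted(oba, slg, x=0.3, y=0.35):
    # 3*(OBA - x) + (SLG - y), which is raw(oba, slg) minus k = 3x + y.
    return 3 * (oba - x) + (slg - y)

rank_raw = sorted(players, key=lambda p: raw(*players[p]), reverse=True)
rank_shifted = sorted(players, key=lambda p: shifted(*players[p]), reverse=True)

# The difference between the two scores is the same constant for every player.
gaps = [raw(*v) - shifted(*v) for v in players.values()]

print(rank_raw, rank_shifted, [round(g, 6) for g in gaps])
```

Every player loses the same 1.25, so the order and the spacing between players are untouched.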
OPS: Begone! Part 2
May 27, 2003 - tangotiger
(www)
Nick, very well said, and I especially liked this
...because he generates extra PA at (mostly) his teammates ability levels, not at his own.
It would have taken me a paragraph to explain this, but you said it perfectly in half a sentence.
As for the batting average thing, I suppose that's another myth. It's pretty clear that given two guys with the same OBA and SLG, you want the guy with the LOWER BA (though in reality, we're not talking about much difference).
I suppose if you really needed to quantify it, probably something like 3*OBA+2*SLG-BA (I really don't know, but it would be of some form like that). I'll bow out of any discussion on trying to find the best-fit equation using OBA,SLG,BA. I already don't have much use for OPS, and I know I won't like OPSMB!
OPS: Begone! Part 2
May 27, 2003 - tangotiger
(www)
(e-mail)
Jason, interesting thought.
I just tried with a weird environment (OBA/SLG of .393/.493), and in this case, the higher the BA, the more runs scored. I then tried the other way, with .289/.351, and this time the LOWER the BA, the more runs scored.
The "break-even" point seems to be about .360/.450. That is, at that level, the change in batting average (and I checked from .200 to .340) made zero change to the run production of the team.
Great call!
OPS: Begone! Part 2
May 27, 2003 - tangotiger
(www)
(e-mail)
"Key" situation is another topic entirely.
Click the above link, select your "key" situation, and plug in the numbers (on a /PA or /600PA basis). That'll tell you which guy you want.
If by key you mean inning/score as well as base/out, then you need another tool to evaluate it.
OPS: Begone! Part 2
May 27, 2003 - tangotiger
(www)
(e-mail)
I just want to make it clear: do not, absolutely do NOT, rely on OBA/SLG/AVG to make game decisions.
You must break it down to your components, and you must apply those components against the context being faced (base/out states, inning/score/base/out game state, game/pitcher state, etc, etc).
OPS is quick and dirty and has no place in game decisions. Relying on it for some cases will make you rely on it for most cases, and sometimes all cases. That's a bad habit to start. OPS, begone!
OPS: Begone! Part 2
May 27, 2003 - tangotiger
Every game context produces different "win potential" for H, HR, BB, outs, SB, sacs. The values between those components are not static. In a completely "run potential" world, you would never call for an IBB or a sac. But in a "win potential" world, there are many many times that you need to call for the IBB or sac.
OPS, if left to its own devices, would become the de facto mechanism to evaluate game situations, when in fact its purpose is to gloss over player evaluations. I don't believe in taking baby steps, and the long path to get the job done. I also don't believe that we should hand hold the manager for 20 years to lead him to the proper tools.
Give them the right tools for the right job, and let them decide if they want it. If Felipe Alou says that looking at OPS is b.s. to decide whether to walk Bonds, I'm going to agree with him. Should I say that OPS is less b.s. than using BA? A rose by any other name...
OPS: Begone! Part 2
May 30, 2003 - tangotiger
Yes, what you want is win-based LWTS (or a sim). And I would guess that a manager will be able to be right (using only his experience) more often than using just OPS, in a tight in-game decision.
OPS: Begone! Part 2
May 30, 2003 - tangotiger
(www)
1) How can injecting more than 40 extra bases into the same number of plate appearances or outs produce a negative result?
40 extra bases on hits, but 100 less bases on walks.
OPS: Begone! Part 2
May 30, 2003 - tangotiger
(www)
The differences between the top guy and the bottom guy, the bottom guy has:
100 more walks
19.7 more HR
119.7 less singles
everything else is the same.
Straight static LWTS says that works out to +33, +28, -56 = +5, or some such.
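The arithmetic behind those three numbers can be sketched as follows. The weights here (.33 for a walk, 1.40 for a HR, .47 for a single) are assumptions, picked as typical static linear weights that reproduce the rounded figures; they are not necessarily the exact values used for the chart.

```python
# Assumed static linear weights (runs per event); illustrative, not the chart's exact values.
weights = {"BB": 0.33, "HR": 1.40, "1B": 0.47}

# Bottom guy minus top guy, as listed in the post.
diff = {"BB": 100, "HR": 19.7, "1B": -119.7}

# Run value of each component difference, then the net.
parts = {event: weights[event] * diff[event] for event in diff}
total = sum(parts.values())

for event, runs in parts.items():
    print(event, round(runs, 1))
print("net", round(total, 1))
```

With these assumed weights the net comes out a bit over +4 runs, in line with the "+5, or some such" from adding the rounded pieces.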
OPS: Begone! Part 2
May 30, 2003 - tangotiger
RC has its own problems, magnified substantially when the HR/H or HR/PA becomes out of whack. RC does not model run scoring at all: it just got lucky that it looks like it models it. If you've got a computer, there's zero reason to use RC, when you've got BsR (unless you want to propose a model that's better).
I don't really care about the different denominators. The whole thing of OPS centers around: more good, less bad. The more walks, the more hits, the more TB, the less outs, the better the number. There's nothing inherent in OPS that ensures that the balance is proper. It's just plain old luck that for the run environment of MLB, that it works out that way.
Believe me, if the run environment was half what it is today, or double what it is, there'd be some other "quick" estimator that would get lucky to model run creation.
Sorry for the rant.
How are Runs Really Created
August 12, 2002 - tangotiger
(www)
(e-mail)
Devin, excellent points.
...If the results of the RC formula didn't correspond roughly to actual runs, James wouldn't be using it.
As I mentioned, as long as you are using typical teams in the .300 to .400 OBA range, and as long as the HR/game hit is around the norm, then RC works fine as something useful.
The problem is when you try to extend that to Barry Bonds types of teams (not that they exist) or Pedro Martinez types of teams (and they exist plenty, as Pedro, when on the mound, is his own team).
My point is to make sure that just because the results of Runs Created works on a particular set of samples doesn't mean that you can extend that methodology to other types of things you may be doing.
There's a reason RC fails, and it's in its treatment of the HR.
2) Okay, my common sense has a problem with a run value system that has events with the same outcome (walk, HBP, interference) having different run values.
Let's take a real simple example: a regular walk v IBB. Since an IBB walk occurs almost always with first base open, then an IBB has zero "moving over" value. Since the IBB is given out much more with 2 outs than with 0 outs, the "run scoring" value of the IBB is much less than a regular walk.
So, based on the frequency of when the events happen, and the effect of each event, the values can change drastically.
As for a regular walk v HBP, HBP occur in more or less random fashion. A walk occurs with more frequency with 2 outs than 0 outs, and with more frequency with no runners on 1B than expected in random fashion. The effect of these two things is to reduce the "moving over" value of the walk and the "run scoring potential" of the walk.
If you are thirsty for more, I've published PRELIMINARY results on the run values of various hitting events by the 24 base-out states. (I should be publishing an updated table in a few weeks.) From there you will see there are virtually no differences between the walk, IBB, and HBP, as you'd expect.
http://www.tangotiger.net/lwtsrobo.html
Thanks, Tom
How are Runs Really Created
August 13, 2002 - tangotiger
(www)
(e-mail)
Rob, your question on base-out differences in run value can be found here http://www.geocities.com/tmasc/lwtsrobo.html
I looked at the batting order differences of run values, and there was a long thread posted on fanhome. It is not easily digestible, and someday I'll write an article on the discoveries there. But yes, as you'd expect the leadoff hitter's HR value was 1.30 while the #5 hitter was somewhere around 1.47.
John Warren: the steal is an interesting point. The run value of the SB is very independent of the run environment, as the additive value of the SB is around .17 to .21 for the most part. The CS however changes HIGHLY, as the out is the most dependent on the run environment. The break-even point is therefore much lower with Pedro, and more steals should be attempted against him.
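As a rough sketch of why the break-even point drops: a steal attempt breaks even when the success rate p satisfies p*SB = (1-p)*CS, so p = CS/(SB+CS). The .19 SB value is in the range quoted above; the two CS costs (.45 in an average environment, .30 in a Pedro-like one) are assumptions for illustration only.

```python
def break_even(sb_value, cs_cost):
    # Success rate at which the expected gain from stealing is zero:
    # p * sb_value = (1 - p) * cs_cost  ->  p = cs_cost / (sb_value + cs_cost)
    return cs_cost / (sb_value + cs_cost)

SB_VALUE = 0.19          # within the .17 to .21 range quoted in the post
CS_COST_AVERAGE = 0.45   # assumed cost of a caught stealing, average run environment
CS_COST_LOW = 0.30       # assumed cost in a low-scoring (Pedro-like) environment

be_average = break_even(SB_VALUE, CS_COST_AVERAGE)
be_low = break_even(SB_VALUE, CS_COST_LOW)

print(round(be_average, 3), round(be_low, 3))
```

Since the SB value barely moves while the cost of the out shrinks in a low-run environment, the required success rate falls.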
Mike: I've previously published charts on win expectancy which I have to update in the near future. There's no doubt that win expectancy is really the most important aspect of analysis since that's what we are after. Again, for those thirsty for more, you can consult my preliminary chart on WE here: http://www.geocities.com/tmasc/we.htm . Again, where this comes most into play is the IBB. While the run value of a regular walk is .30 runs and the run value of the IBB is .17 runs, the win values are far different. Because the IBB occurs in game situations where it is "controlled" to minimize the impact of win/loss, then its win value would also decrease.
Thanks for all your great comments.
How are Runs Really Created
August 13, 2002 - tangotiger
(www)
(e-mail)
GIDP: it's worth around -.45 runs. I was thinking of breaking up the "outs" PA into "outs 1, outs 2, outs 3", but decided against it. Maybe I will fix that.
Jason: what I am presenting is how runs are really created. It's the building blocks to whatever it is you want answered. From this, you can generate win expectancy tables, if you like, or the more detailed run values by the 24 base-out states. You can then further extend this to a 24x9 run values that ALSO includes batting order. And from that standpoint, you can evaluate the #9 v Bonds with the bases loaded.
These other run evaluators give no option to do this simply because they are the end to the means. They were built to answer a specific question, and therefore are not very extendable. Play-by-play analysis is very extendable.
How are Runs Really Created
August 14, 2002 - tangotiger
(www)
(e-mail)
Linear Regression
There are certain things that must be understood about linear regression and using it to determine the relationship between hitting events and runs scored.
First, a little background on linear regression. If you have two things, say, the price of a stock and the earnings per share, you can probably find a relationship between these two variables. The higher the earnings, the higher the price of the stock. You will end up with a formula like P = m times E + b, where P is price, E is earnings, b is some constant and m is the slope. The price of a stock, and runs in baseball, is influenced by more than one variable. You end up with an equation that says y = m1a1 + m2a2 + m3a3 + ... + b. Linear regression lets you input the independent variables a1, a2, a3..., the dependent variable y, and solve for m1, m2, m3..., and b.
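For the single-variable case, the fit can be sketched in a few lines of plain Python. The earnings and price figures are invented and lie exactly on P = 15*E + 2, so the fitted slope and constant are easy to check by eye; real data would of course scatter around the line.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = m*x + b with one independent variable.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Made-up data: earnings per share and stock price, constructed as P = 15*E + 2.
earnings = [1.0, 2.0, 3.0, 4.0]
prices = [17.0, 32.0, 47.0, 62.0]

m, b = fit_line(earnings, prices)
print(m, b)
```

The multi-variable version solves for m1, m2, m3... and b at once, but the idea is the same: pick the coefficients that minimize the squared error against y.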
Here are 4 major problems with using this in baseball:
1 - Linear regression is LINEAR. Linear as in a straight line. While there is a somewhat linear relationship between runs and singles, doubles, triples, and walks, there is NOT a linear relationship between runs and HR, or runs and everything else like SB, WP, BK, etc. Baseball is non-linear.
2 - The independent variables are not independent. There is an interdependence between all these variables. A walk is only worth what it is because of the other things that happen. Linear regression attempts to "freeze" all the other variables when calculating the value of the unfrozen variable. As your run environment increases however, we know that the values of these variables change. Baseball is interdependent.
3 - Even if you assume for ease that run creation is linear and independent (a safe assumption for very controlled environments), what sample data will you use to run your regression against? Most people will use team season totals, which is an aggregate of individual games, which is an aggregate of individual innings. If you want to run a proper regression analysis, at the very least run it on a game or inning level. Your sample size will explode to something much more reliable.
4 - Not accounting for all the variables. Triples have a strong relationship to speed. If you don't have SB in your sets of variables, the regression analysis will award more weight to the triples as a stand-in (because of its relationship to steals). It is possible, based on some samples, that the value of a triple could exceed the value of the HR! What other variables are you not accounting for?
Arvid - Let me get back to your post. The purpose of this article is to explain the building blocks of run creation at the team level. I have not shown how to extrapolate this to individual players. The end-result is not to end up with linear values for each hitting event, since these linear values only apply to a given run environment. We need to determine the linear values for EVERY run environment! As I said, the value of a single in Pedro's run environment is far less than a single in an average pitcher's run environment.
I am interested in the pieces of how runs get created, an actual model. I am not interested in a formula that estimates runs based on whatever variables that ONLY works for a given run environment. Runs Created and Linear Weights work fine for that. BaseRuns is the key, and I will present this hopefully by the end of this month.
Michael - The building blocks of run creation do lie in run expectancy tables for the 24 base-out states. I am not introducing anything new here, but rather showing how we should extend this to other run environments. I have not read Curve Ball. Please clarify your post further so that I can properly answer you.
Rob - Are you asking me what would a player's run value be using a context-neutral approach (i.e., the final weighted average values I presented) compared to a context-specific approach (i.e., the specific values by the 24 base-out states)? If this is the case, the answer is about +/- 10 runs at the extremes. I looked at this last year, with regards to Ichiro. You can find that article here http://www.geocities.com/tmasc/lwbymob.htm though I only looked at the 8 base states. If this is not what you are talking about, please clarify further.
How are Runs Really Created
August 14, 2002 - tangotiger
(www)
(e-mail)
Michael: I agree that easily the most digestible measure of run creating is one that is context-neutral, and therefore, I am not adding anything new here, except more perfect values to use (and adding values to the obscure events like RBOE or BK).
My interest lies "under the hood", and the how and the why.
The important point that I'm also trying to get across is that even if you stick to a linear context-neutral measure like linear weights, that you should use a custom version, based on the run environment. It really makes no sense to apply the same formula to Mel Rojas as to Pedro Martinez. We only do this, because it's easy for us. And if we keep doing it, we will forget to question why we do it. Runs Created, as great as it was then, is an example of this. It completely fails us at the extreme player level.
I think I am in basic agreement with your point of view.
Rob: OUCH! First of all, I did look at the batting order about 2 years ago, and there was an effect of something like 15-20 runs for Rickey Henderson in the leadoff spot. That is, putting a player whose skillset is uniquely qualified for the batting spot that has the most variability (which is Rickey to a tee), in his best season, had I think a variability of close to 20 runs (against putting Rickey say in the #5 spot). The #2 hitter also showed great variability, and I concluded elsewhere that in certain (many!) situations, your best hitter should bat #2.
With the MVP/Ichiro thread, I showed that batting great with men on base, or being given a lot of men on base will add 10 runs. Give both, and you're close to 20 runs as well.
I really don't need to run a simulator to determine all this though. This is a simple problem of determining the frequency of facing the 24 base-out states, and your success in those same states.
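In sketch form, that calculation is just frequency times success, summed over the states. Everything below is hypothetical: only three of the 24 states are shown, and the PA counts and runs-per-PA figures are invented to show the shape of the bookkeeping.

```python
# state -> (plate appearances in that state, runs added per PA relative to average)
# All numbers are hypothetical, and only 3 of the 24 base-out states are shown.
states = {
    "bases empty, 0 out": (250, 0.00),
    "runner on 2nd, 1 out": (60, 0.05),
    "bases loaded, 2 out": (20, 0.10),
}

# Context-specific runs: how often the state is faced times success in that state.
extra_runs = sum(pa * rate for pa, rate in states.values())
print(round(extra_runs, 2))
```

A player who sees more of the high-value states, or who hits better in them, picks up runs either way, which is why the two effects can stack.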
I wouldn't be surprised if you have a player who is ideally qualified for a particular spot (say Ichiro for #2, though I don't know that), who faces more than normal high-leverage situations, who is one of the best hitters in the game and who performs far above his "neutral" performance level would add 30 more runs than if placed in a "neutral" spot and performing at his normal high level. This is of course a rarity, and I would guess in practical terms that 1 standard deviation would be +/- 4 runs.
This issue however is very interesting to look at, but it would be something that I would have to prioritize in with the other equally interesting things I'm looking at.
How are Runs Really Created
August 14, 2002 - tangotiger
(www)
(e-mail)
Michael, I would not look at SF actual run output to determine anything since 6000 PA is not a very small sample.
Anyway, I once ran sims where I had a team of 9 .333 OBA guys, and another team with 8 .300 OBA guys and 1 .600 OBA guy. Overall, both groups are the same. I also made the SLG average about 30 or 40% higher for each player.
I then moved this Bonds type player through the batting order.
From what I remember, I did not notice much difference between the 9 equal guys and the Bonds + bad team.
I'll have to redo that study now that I have better data available. It is again another interesting question that I must look at.
How are Runs Really Created
August 14, 2002 - tangotiger
I meant IS a small sample.
How are Runs Really Created
August 15, 2002 - tangotiger
(www)
(e-mail)
Michael, I guess I didn't make myself very clear, since what you replied is exactly what I said.
"Anyway, I once ran sims where I had a team of 9 .333 OBA guys, and another team with 8 .300 OBA guys and 1 .600 OBA guy. Overall, both groups are the same. I also made the SLG average about 30 or 40% higher for each player."
So, the 9 equals of .333 OBA had a weighted team average of .333 OBA. The 8 equals of .300 OBA plus the Bonds-like .600 OBA would have a team weighted average of .333. So, the first team has the Bonds magic spread around. We are talking about two equal teams in terms of overall talent, except that the spread is far different.
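The team averages are easy to verify, assuming every lineup spot gets the same number of PAs (a simplification, since spots higher in the order actually bat more often):

```python
def team_oba(obas):
    # Simple average; assumes each batter gets an equal share of the team's PAs.
    return sum(obas) / len(obas)

team_equal = [0.333] * 9              # nine identical .333 OBA hitters
team_bonds = [0.300] * 8 + [0.600]    # eight .300 hitters plus one .600 hitter

print(round(team_oba(team_equal), 3), round(team_oba(team_bonds), 3))
```

Both lineups come out to the same team OBA, so any difference in runs scored between them comes from how the talent is distributed, not how much of it there is.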
As I mentioned, I don't remember seeing any noticeable difference. It might have been maybe 2% difference (say 15 runs over a season) only to the extent that you'd be able to optimize the batting order so that the .600 guy could do the most damage. I will redo the study at some point in the future though to get more accurate results.
Here is a link to the results of the study I did last year. Please take it as preliminary and crude. Spreading the Bonds magic
How are Runs Really Created
August 15, 2002 - tangotiger
(www)
(e-mail)
"tango I think your crude Bonds analysis goofed up in exactly the types of ways you intended to prevent with your article."
I was doing my best to avoid lone gunmen types like Bonds. I did that analysis ONLY to show the effect on runs at a team level of having either 9 equal guys, or 8 equal guys and 1 outlier, even though overall they have the same stats. I did not want to talk about the "run environment" because...
"The problems I see are that as you so elegantly noted walks are valuable only because others drive you in. By using just OBA you missed that completely as most of Bonds exceptional value is in the walks."
...because Bonds doesn't get to partake in his own run environment. Bonds's run environment, that is, the chances that the runners ahead of him will score and the chances that he himself will score, is derived from all the other batters. You can't measure Bonds's value in moving runners over if those chances partly include Bonds's own effect.
So, I was hoping that everyone would overlook this, because the Bonds effect on the run environment is outside the scope here. However, since you brought it up, what you have to do in this case is establish a run environment for each batting spot for this particular team, such that if Bonds is the #3 hitter, then the run environment of the #2 hitter includes Bonds, but the run environment of the #3 hitter should "assume" an "average" type of ballplayer.
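As a bare-bones illustration of that idea (the function name and the .333 "average" figure are made up, and this only builds the lineup each spot would be evaluated against, not the run values themselves):

```python
def environment_lineup(lineup_oba, slot, average_oba):
    # Lineup used to build the run environment for the hitter in `slot`:
    # everyone else keeps his real OBA, while the hitter being evaluated is
    # swapped for an "average" player so he does not shape his own environment.
    env = list(lineup_oba)  # copy, so the real lineup is left untouched
    env[slot] = average_oba
    return env

# Bonds-type .600 hitter batting third (index 2); .333 stands in for "average".
lineup = [0.300, 0.300, 0.600] + [0.300] * 6
env_for_number_2 = environment_lineup(lineup, 1, 0.333)  # still includes Bonds
env_for_number_3 = environment_lineup(lineup, 2, 0.333)  # Bonds replaced
```

Each of those lineups would then be fed to whatever model you use to get the chances of scoring from each base.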
I went into this in great and deathly detail in the batting order thread on fanhome. I really want to avoid talking about that here, because we are going to get away from the basics too fast.
Your point is well-taken and accurate.
"It would be interesting to analyze all of the line-ups that have been tried to see which would be the most effective based on the run environment concept, and of course see if you could find a better one."
The run environment concept applies to the basic building blocks of run creation, and I did apply this in the above-mentioned thread on batting order.
The correct and proper way to do what you are suggesting is to use the proper model (a simulator) to go through all the variations. The run environment concept with its building blocks of run creation ho