Tango on Baseball Archives

© Tangotiger


After Sabre-school Special, Sept 17 (September 17, 2003)

I've got a month's worth of mail to get through, so here goes...



DER

To be technically accurate, DER is a measure of a team's DEFENSE, but not its FIELDING, since DER is made up of pitching, fielding, and park. You've adjusted out the Park effect, and so you are left with pitching/fielding skills.

Unless you use ZR (which I DO recommend), I would split the difference between pitching/fielding. So, say you have a DER of .275, the league is .300, and your park is .290. This is what I would do:

1 - Park effect, over 162 games, is .005 (the park is .010 below the league, but only half the games are played there). NewDER = .275 + .005 = .280.

2 - pitching + fielding = .300 - .280 = .020

3 - pitching = fielding = .020/2 = .010

You COULD use ZR to show that maybe fielding = .015 or something, and then pitching would be .005.
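
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `split_der` and the `fielding_share` parameter are made up here, and halving the park gap assumes that half of the 162 games are played at home, which is my reading of the ".005" figure.

```python
def split_der(team_der, league_der, park_der, fielding_share=0.5):
    """Park-adjust a team DER, then split the remaining gap to the league.

    fielding_share is hypothetical: 0.5 is the "split the difference"
    default, and a ZR-informed value (say 0.75) can replace it.
    """
    # Assumption: only home games are played in the park, so half the gap applies.
    park_effect = (league_der - park_der) / 2
    adjusted_der = team_der + park_effect

    # Positive gap means the team is below league average after the park fix.
    gap = league_der - adjusted_der
    fielding = gap * fielding_share
    pitching = gap - fielding
    return adjusted_der, pitching, fielding


adjusted, pitching, fielding = split_der(0.275, 0.300, 0.290)
print(round(adjusted, 3), round(pitching, 3), round(fielding, 3))
```

With the default 50/50 split this reproduces the .280, .010, and .010 from the steps above. Passing `fielding_share=0.75` gives the ZR-informed split of .015 fielding and .005 pitching.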

Think of ZR as equivalent to BA or OBA. All opps and successes are treated the same (1) even though we know that a HR is worth more than a single.

So, you can have Erstad converting 90% of balls in his zone into outs, and the same for Bernie, but if Yanks pitchers allow all their balls in play to go to the "easy zone", then that's not really fair to Erstad. That's where UZR comes in. But ZR is as valuable as OBA. So, it's a great first step, better than any other first step available.

Hockey goalies have a similar problem. For the longest time, they only used "Goals Against Average", or goals/60 minutes. Then, we've got save percentage, or saves / shots faced. Now, we just need the next step, which is the "quality" of the shot faced. Teams do keep track of this, but as is the desire in the NHL, they don't release everything to the public.

GB and FB pitchers, again

I'm saying that, including ALL contacted balls, the GB has the advantage by .1 runs per contacted ball.

If you look only at balls that require fielders, I believe you get a wash. The hit rate on FB that stay in the park is about 30 or 40 points LOWER than the hit rate on GB. However, the XBH/H is much higher on FB. On the other hand, the GIDP/BIP is much higher on GB.

All in all, from a DIPS perspective, we have a wash. The difference therefore is entirely due to the HR.

Odds Ratio Method

Specifically, if I have a team that scores 4 R/G in a 4.5 R/G context, can I use the odds ratio method to estimate/predict how many runs they would score in a 3 R/G context? If so, how?

As luck would have it, you can do 4/4.5*3 (about 2.67 R/G) to get your answer as a reasonable shortcut.

But, what you really want to do is

1 - extrapolate back where the "4", "4.5" and "3" come from. So, maybe a team that scores 4 runs will have 600 AB, 160 H, 28 2B, 4 3B, 17 HR, 60 BB, etc. And you'd do it for all 3 "environments".

2 - THEN, using the Ben V-L matchup method or Odds Ratio method, you figure out the new component values.

3 - THEN, convert that back into runs.
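
As a rough sketch of the two routes above, the Python below shows the shortcut and one common reading of the odds ratio idea applied to a single component (hits per AB). The function names are invented for illustration, and the two league hit rates are hypothetical placeholders, since the real values would come from step 1. Converting the translated components back into runs (step 3) is not shown.

```python
def shortcut_runs(team_rpg, context_rpg, new_context_rpg):
    """Scale the team's runs per game by the ratio of the two contexts."""
    return team_rpg / context_rpg * new_context_rpg


def odds(rate):
    """Convert a probability-like rate into odds."""
    return rate / (1.0 - rate)


def odds_ratio_translate(team_rate, context_rate, new_context_rate):
    """Carry a team's rate to a new context, holding its odds ratio fixed."""
    new_odds = odds(team_rate) / odds(context_rate) * odds(new_context_rate)
    return new_odds / (1.0 + new_odds)


# The shortcut from the text: 4 R/G in a 4.5 R/G context, moved to 3 R/G.
print(round(shortcut_runs(4.0, 4.5, 3.0), 2))

# One component: 160 H in 600 AB, as in the example line above.
team_hit_rate = 160 / 600
HIT_RATE_IN_4_5_CONTEXT = 0.280  # hypothetical league value
HIT_RATE_IN_3_0_CONTEXT = 0.240  # hypothetical league value
print(round(odds_ratio_translate(team_hit_rate,
                                 HIT_RATE_IN_4_5_CONTEXT,
                                 HIT_RATE_IN_3_0_CONTEXT), 3))
```

A team that exactly matches its context comes out exactly matching the new context, which is the sanity check you would want from any method like this.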

Sacs: Better to be on 1B and 0 outs, or 2B and 1 out?

If you go here: RE9902event.html

you will see that getting a single with bases empty and 0 outs caused .39 runs to be added, while a double with bases empty and 1 out caused .40 runs.

Essentially, treat it the same. HOWEVER, you have 1 less out to work with in the second case. The chance of scoring exactly 1 run is about the same, but the chance of scoring at least 1 run is far greater in the first case. And you should be looking at the "at least" category.

Tango Distribution

Yes, the equation is fitted to empirical data. I think I first used Woolner's data (which I think was from 1901 to 2000), and then I used my own data (I don't remember from when, but I think 1974-1990). The "control value", which is the brains of the whole thing, was .761 with the Woolner data and .766 with my data.

I also ran my simulator to try to break my equation (getting teams to score runs at rates like 1 run per game and 18 runs per game, etc), and I couldn't. While that control value was not static (going from .70 to .85 or something), it produced results that were close enough using only the .76 figure.

From that standpoint, it's a big success.

However, when trying to do matchups between 2 teams, I had to use a figure of .852. This is because the run scoring is not random (maybe parks with 2 teams always playing in the same one, or relievers being brought in, etc).

***

First you supply the rpi (runs per inning), and I figure out the distribution of 0,1,2,3... runs scored for one inning. Using that, I then figure out the distribution of 0,1,2,3... runs scored for one game.

I treat the rpi as an independent variable. This is obviously not true, but works for my purposes. And it fits the data pretty well.
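
The inning-to-game step can be sketched as below. The post does not state the equation itself, so the per-inning formula here is an assumption: it is the form in which the distribution is usually written (a zero-run probability of 1/(1 + c*rpi), then a geometric tail), with `c` playing the role of the control value. The function names and the `max_runs` cutoff are also made up for illustration. The game distribution just adds up nine innings treated as independent, as described above.

```python
def inning_distribution(rpi, control=0.766, max_runs=30):
    """P(0), P(1), ... runs in one inning (assumed form, see lead-in)."""
    p_zero = 1.0 / (1.0 + control * rpi)
    decay = 1.0 - control * p_zero  # ratio between successive run totals
    probs = [p_zero, (1.0 - p_zero) * (1.0 - decay)]
    for _ in range(2, max_runs + 1):
        probs.append(probs[-1] * decay)
    return probs


def game_distribution(rpi, innings=9, control=0.766, max_runs=30):
    """P(0), P(1), ... runs in a game, treating innings as independent."""
    inning = inning_distribution(rpi, control, max_runs)
    game = [1.0] + [0.0] * max_runs  # before any inning: 0 runs for sure
    for _ in range(innings):
        updated = [0.0] * (max_runs + 1)
        for so_far, p_so_far in enumerate(game):
            if p_so_far == 0.0:
                continue
            for scored, p_scored in enumerate(inning):
                if so_far + scored > max_runs:
                    break  # totals past the cutoff are dropped
                updated[so_far + scored] += p_so_far * p_scored
        game = updated
    return game


game = game_distribution(0.5)  # 0.5 runs per inning, i.e. 4.5 R/G
for runs in range(6):
    print(runs, round(game[runs], 3))
```

One property worth noting about this assumed form: the mean of the inning distribution comes back out as the rpi you put in, whatever control value you use, so changing the control value changes the shape and not the scoring level.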

You can try to control for it by using the "control value" that I supply. This control value lets you change the shape of the distribution. So, if you are more interested in runs in an inning, and not runs in a game, and not face-to-face win%, then you can alter this figure to conform to the distribution you want.

Say, for example, that when looking only at the distribution of runs by inning (because of the non-randomness of it), the best fit to the empirical data is .679. Then that's what you use.

However, if you look at runs by game, you can find that .766 is a better fit.

Need to worry about head-to-head matchups? .852 is the best fit.

I never tested the first one to see where my best fit would lie. For the second one, the best fit is somewhere between .76 and .77. For the last one, .852 was the best fit I found, but further testing shows that I can improve on this figure.

UZR, again

There are a few issues to consider here:

1 - we have the UZR model

2 - we have the STATS version of UZR

3 - we have the MGL version of UZR

4 - we have the Tippett version of UZR

5 - we have actual play-by-play data

6 - we have biased play-by-play data

So, for #5 specifically, there is no reason to try to estimate the number of left handed batters that Jeter may have faced if we know exactly what that number is. There is also no reason to estimate the GB/FB tendency of Jeter's pitchers or the batters he sees, since we know all that as well. From that standpoint, trying to estimate something that we know is not acceptable. Same for number of times Jeter had the DP in effect.

Consistent use of data to estimate known data does not decrease the margin of error. You're not even propagating the same type of error. You are just magnifying the error.

For #6 specifically, the location of a ball hit, the speed of the ball, the trajectory of that ball, etc. are all subject to scorer bias. It is VERY possible that this bias is so high, and the margin of error so great, that it would be better to *not* use the data. And in fact, Tom Tippett does not use speed of ball, because he doesn't trust the scorers.

So, to back up from the beginning:

#1 - The UZR model is sound because it conforms to how we best perceive fielding to work (see last ASSS thread on this).

#2 - The STATS version of the UZR is dependent completely on the hit-location. Now, since we don't know how reliable it is, and since they don't tell us how reliable it is, you are now free to dismiss all those ratings. This is true of any metric where a reliability figure is not supplied.

#3 - The MGL version of UZR is even more problematic. While it does a few things perfectly (like knowing exactly how many lefties, groundballers, and DP situations Jeter actually did face), it *also* introduces metrics based on data whose reliability has not been assessed.

Now, if I were MGL, I would have produced the UZR in stages so that the reader would be able to accept up to a certain point, and reject after a certain point. After all, I find it hard to believe that Scott Rolen can be that good, when I have an independent way to show that the most runs a fielder can save is about 20 to 25 runs from average. So, I share anyone's skepticism about a system that shows +45 or +50 runs saved on fielding.

#4 - The Tippett version of UZR is similar to MGL's except he supplements his ratings with other data in a non-systematic way. Kind of like a reality check. The problem is that what he is doing is not necessarily reproducible.

Win Shares, where to find

I believe that STATS will give it to you gratis if you ask them.

You can contact Joe Dimino (check the Primer Authors page) who has a Win Shares Excel, and probably has all that.

You can also try www.baseballgraphs.com, where they might be able to help you out. It might even be a good project for them to do, actually.

Win Expectancy

You should find this link useful.

Win Expectancy

The important links in there are the main one, as well as the one in post 13. There are a few other posts in there that show that you can generate what you need with an equation.

I have not decided when/if to release my full chart.



--posted by TangoTiger at 11:05 AM EDT