Park Factor Thoughts

Considerations

First off, you should decide whether you are talking about value or ability. Are you trying to strip out all the noise of the park, so that you can establish a player's pure true talent in some context-neutral environment (ability)? Or, are you trying to establish how much impact the player actually had, while understanding the context he actually played in (value)?

For value, it's relatively easy, and its purpose is not to find a park factor per se, but to decide how to split up the offense/defense impact on winning. If both teams score 10 runs per game, whether at Coors or at the Astrodome, who cares what happens at the Big O? The run environment is 10 rpg, and that's what you need to know. Now, if you want to fairly separate offense from defense, you would take it one step further, but not much more than that.

Now, for ability, it's much more convoluted, and more in line with what MGL is saying. I wrote a list of factors to consider on the old fanhome, and of course, that's gone. I'll try to remember them all:

1 - static conditions: the dimensions of the park. If the park hasn't changed, use 100 years of data if you have it, because 300 feet is 300 feet.

2 - dynamic conditions: the weather, the cut of the grass, the wetness of the field, the wind, etc. Use multiple years if these things are predictable-dynamic, but a single year if they are unpredictable-dynamic. If the groundskeeper is the same guy, and he cuts the grass kinda the same way, then use multiple years. The wind patterns probably change drastically, so for those you should use a single year.

3 - other parks: park factors are relative to other parks. The Big O could have been a hitter's park 10 years ago and be a pitcher's park today, even if the park is exactly the same and the climate hasn't changed, simply because other parks have become hitter's parks, or new hitter's parks have been introduced.

(Note: because of this, regression toward the mean is not necessarily toward 1.00. You should look at the same parks year to year; in some years the mean could be 1.02 or .98, etc., because the other parks that are not part of the analysis also make up the park factor.)

4 - the tendencies of hitters/pitchers: if you have a team filled with lefties or flyballers or whatever, that introduces a bias. If a park is optimal for flyballers and your team is filled with flyballers, the park factor will not be accurate. The sample of your players should be representative of your population. For extreme parks, like Fenway and Coors, I'm sure they are not. To what degree, I'm not sure.

5 - the quality of hitters/pitchers: Barry Bonds is great no matter where he hits, and he might not be hampered as much as someone else by playing in a pitcher's park. So, he hits the ball 390 feet instead of 400 feet. If ReyRey hits the ball 320 feet instead of 340, that's a big difference. Again, I don't know the degree of impact here, but it is a consideration.

6 - the game context: different game states (score/inning/base/out) result in different hitting/pitching approaches. Again, probably a small impact, but it needs to be determined.
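One reading of the note under item 3 is: regress each park toward the mean of the parks you are actually analyzing, not toward 1.00. A minimal Python sketch of that reading follows. The park names, the one-year factors, and the 0.5 reliability weight are all hypothetical; in practice the weight would depend on how many seasons and opportunities stand behind each factor.

```python
def regress_toward(observed: float, target: float, reliability: float) -> float:
    """Shrink an observed park factor toward `target`.

    `reliability` (0 to 1) is the share of the observed deviation we keep.
    It is an assumed input here, not something this sketch estimates.
    """
    return target + reliability * (observed - target)


# Hypothetical one-year park factors for the parks in the analysis.
observed = {"Park A": 1.10, "Park B": 0.97, "Park C": 0.99}

# Regress toward the mean of these same parks (1.02 here), not toward 1.00,
# because parks outside the analysis also shape every factor.
target = sum(observed.values()) / len(observed)
regressed = {park: regress_toward(pf, target, reliability=0.5)
             for park, pf in observed.items()}

for park, pf in regressed.items():
    print(f"{park}: {observed[park]:.3f} -> {pf:.3f}")
```

Regressing toward 1.00 instead would pull all three parks down relative to this result, even though the set as a whole plays at 1.02.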

So, your "park factor" is made up of several factors, each of which needs to be analyzed on its own. For some, you can use multiple years; for others, you need single years, etc.
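To illustrate "multiple years for some, single years for others", here is a minimal Python sketch that picks which seasons to pool for each kind of component and then computes a simple home/road ratio (HR per ball in stadium at home divided by the same rate in road games). The home/road ratio, the window rules, and every name and number in the code are assumptions for illustration. It only shows the choice of window; actually isolating each component from the others is the hard part and is not attempted here.

```python
from dataclasses import dataclass


@dataclass
class Season:
    year: int
    home_hr: int    # HR by both teams in this park
    home_opps: int  # balls in stadium (AB-K) in this park
    road_hr: int    # HR by both teams in the same club's road games
    road_opps: int  # balls in stadium (AB-K) in those road games
    reconfigured: bool = False  # True if the dimensions changed before this season


def seasons_for(seasons: list[Season], kind: str, years: int = 3) -> list[Season]:
    """Choose which seasons to pool for one component of the park factor."""
    ordered = sorted(seasons, key=lambda s: s.year)
    if kind == "static":
        # Every season since the last change in dimensions: 300 feet is 300 feet.
        start = max((i for i, s in enumerate(ordered) if s.reconfigured), default=0)
        return ordered[start:]
    if kind == "predictable":    # e.g. same groundskeeper, same grass cut
        return ordered[-years:]
    if kind == "unpredictable":  # e.g. wind patterns
        return ordered[-1:]
    raise ValueError(f"unknown component kind: {kind!r}")


def pooled_factor(seasons: list[Season]) -> float:
    """Home HR rate over road HR rate, pooling counts across the chosen seasons."""
    home_rate = sum(s.home_hr for s in seasons) / sum(s.home_opps for s in seasons)
    road_rate = sum(s.road_hr for s in seasons) / sum(s.road_opps for s in seasons)
    return home_rate / road_rate


history = [  # hypothetical counts
    Season(2001, 100, 5000, 80, 5000),
    Season(2002, 110, 5000, 80, 5000, reconfigured=True),
    Season(2003, 120, 5000, 100, 5000),
    Season(2004, 90, 5000, 100, 5000),
]
for kind in ("static", "predictable", "unpredictable"):
    print(kind, round(pooled_factor(seasons_for(history, kind, years=2)), 3))
```

Pooling the raw counts, rather than averaging each season's ratio, keeps a short or lopsided season from counting as much as a full one.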

Practical Example

Mark McGwire hits 30 HR in the 1978 Astrodome, while the Astros' John Olerud hits 5 HR in the same number of opportunities. The average Astro hits 5 HR. The average Astro playing at Coors hits 10 HR. How many HR should Mark McGwire hit at Coors? 60?

Proponents of park factors want you to believe that each of Mark's 30 HR should somehow split into 2, making 60 HR at Coors. Ridiculous, isn't it?

Really, what they are trying to do, and what we want done, is to figure out how many of those "warning track" Astrodome hits/outs would have been HR at Coors. After all, none of the Astrodome groundballs (GB) would have made it out of Coors either. But they don't have that data, so they use HR as a stand-in. If Mark has 6 times as many HR in the Astrodome as the average Astro, then he must have 6 times as many warning-track balls. Yes, I know, ridiculous.
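The multiplicative adjustment being criticized here is a one-liner, which is part of its appeal. A minimal Python sketch (the function name is hypothetical) makes its hidden assumption explicit: scaling a player's HR by the ratio of the average Astro's HR at Coors to his HR in the Astrodome treats the player's warning-track flies as proportional to his home runs.

```python
def multiplicative_hr(player_hr: float, home_avg_hr: float, coors_avg_hr: float) -> float:
    """Naive park adjustment: scale the player's HR by the average-hitter HR ratio.

    Implicitly assumes the player's near-miss fly balls scale with his HR,
    which is exactly the assumption questioned in the text.
    """
    return player_hr * coors_avg_hr / home_avg_hr


print(multiplicative_hr(30, 5, 10))  # McGwire -> 60.0
print(multiplicative_hr(5, 5, 10))   # Olerud  -> 10.0
```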

Anyway, let's add some numbers here, and let's look only at balls in stadium (which is really AB-K).


Player  AB-K  1B  2B  3B  HR  Outs
Astro    250  50  20   5   5   160
JohnO    250  50  40   5   5   140
MarkM    250  30  20   0  30   170

Now, let's break down all those balls in stadium into 3 categories: GB, short FB, long FB. Really, we only care about Category 3, since that is where your "potential HRs" come from.

Let's say that all singles are either GB or short FB (unless you are Rickey Henderson). Let's say that 70% of 2B are long FB, as are 90% of 3B and (obviously) 100% of HR.

Let's say that we estimate the average Astro's outs break down among the 3 categories as 50%, 35%, 15%. I'll guess that Olerud is 45/30/25, and that McGwire is 35/25/40.

Ok, so what does all that give us? Here are the estimated category 3 balls in stadium for our players:


Astro  47.5 = 20*.70 + 5*.90 +  5 + 160*.15
JohnO  72.5 = 40*.70 + 5*.90 +  5 + 140*.25
MarkM 112.0 = 20*.70 + 0*.90 + 30 + 170*.40

Breaking this down into HR/non-HR, this gives us:
Astro  5 - 42.5
JohnO  5 - 67.5
MarkM 30 - 82.0

So, this means that of the average Astro's 47.5 long flies, 5 were HR and 42.5 were non-HR. How about at Coors? Assuming that the batter does not alter his swing, there would also be 47.5 long flies. This time, as mentioned above, there would be 10 HR. So Coors converts 5 of those 42.5 non-HR into HR (the other 5 HR would have gone out in any ballpark). So, THAT is our adjustment factor: we add 5 HR per 42.5 Astro non-HR long flies.


Here is the expected HR total at Coors for these players:
Astro 10.0 (5 + 5/42.5 * 42.5)
JohnO 12.9 (5 + 5/42.5 * 67.5)
MarkM 39.6 (30 + 5/42.5 * 82.0)
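The long-fly arithmetic can be reproduced in a few lines of Python. This is a minimal sketch, assuming the shares stated in the text (70% of 2B, 90% of 3B, all HR, plus each player's guessed long-fly share of outs: 15% for the average Astro, 25% for Olerud, 40% for McGwire) and the same 5-HR Coors boost spread over the average Astro's non-HR long flies. The names `LONG_FLY_SHARE`, `PLAYERS`, `long_flies`, and `coors_hr` are made up for this example.

```python
# Share of each hit type assumed to be a long fly (Category 3), per the text.
LONG_FLY_SHARE = {"1b": 0.0, "2b": 0.70, "3b": 0.90, "hr": 1.0}

# name: (1b, 2b, 3b, hr, outs, share of outs that are long flies)
PLAYERS = {
    "Astro": (50, 20, 5, 5, 160, 0.15),
    "JohnO": (50, 40, 5, 5, 140, 0.25),
    "MarkM": (30, 20, 0, 30, 170, 0.40),
}


def long_flies(singles, doubles, triples, hr, outs, out_share):
    """Estimated Category 3 balls in stadium (long flies, including HR)."""
    return (singles * LONG_FLY_SHARE["1b"] + doubles * LONG_FLY_SHARE["2b"]
            + triples * LONG_FLY_SHARE["3b"] + hr * LONG_FLY_SHARE["hr"]
            + outs * out_share)


# Coors adds 5 HR (10 - 5) to the average Astro's non-HR long flies.
astro = PLAYERS["Astro"]
astro_non_hr = long_flies(*astro) - astro[3]
CONVERSION = (10 - 5) / astro_non_hr  # extra Coors HR per non-HR long fly


def coors_hr(row):
    """HR at Coors: the player's own HR plus converted non-HR long flies.

    Assumes the batter does not alter his swing at Coors.
    """
    hr = row[3]
    return hr + CONVERSION * (long_flies(*row) - hr)


for name, row in PLAYERS.items():
    print(f"{name}: {long_flies(*row):.1f} long flies, {coors_hr(row):.1f} HR at Coors")
# Astro: 47.5 long flies, 10.0 HR at Coors
# JohnO: 72.5 long flies, 12.9 HR at Coors
# MarkM: 112.0 long flies, 39.6 HR at Coors
```

Note that the Coors boost only ever touches a player's non-HR long flies, so a hitter who already converts most of his long flies into HR gains relatively little.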

Now, not all long flies are the same. Some long flies are longer than others, etc. And I'm making a lot of assumptions.

But the most important thing is that the "extra" HR created by Coors are NOT a function of a player's HR but a function of his fly balls. Based on this example, and since we don't have all the data we need, it looks like the best way to adjust is NOT as a multiplicative effect on HR, but as an ADDITIVE effect on OPPORTUNITIES (and in our case, the only data we have for that is AB-K).
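With AB-K as the only available measure of opportunities, the additive approach gives every hitter the same number of extra HR per ball in stadium, whatever his Astrodome HR total. A minimal Python sketch, assuming the Coors boost is the average Astro's extra 5 HR spread over his 250 AB-K (the function name is hypothetical):

```python
def additive_hr(player_hr: float, player_opps: float,
                home_avg_hr: float, coors_avg_hr: float, avg_opps: float) -> float:
    """Add a fixed number of extra HR per opportunity (AB-K).

    The boost depends on how many balls the player puts in the stadium,
    not on how many HR he already hits.
    """
    extra_per_opp = (coors_avg_hr - home_avg_hr) / avg_opps
    return player_hr + extra_per_opp * player_opps


for name, hr in (("JohnO", 5), ("MarkM", 30)):
    print(f"{name}: {additive_hr(hr, 250, 5, 10, 250):.1f} HR at Coors")
# JohnO: 10.0 HR at Coors
# MarkM: 35.0 HR at Coors
```

Compared with the 60 HR the multiplicative method gives McGwire, this lands at 35. The long-fly breakdown refines the same idea by counting fly balls, rather than all balls in stadium, as the opportunities.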