Tango on Baseball Archives

© Tangotiger

BP - Sample size and park factors (August 11, 2003)

It's music to my ears when I hear analysts saying the words "small sample size". Here's what the unnamed author says regarding park factors:

Dodger Stadium suppresses run scoring by nearly 20%, yet Nomo has been a notably (and consistently) better on the road...it's possible that the Sample Size Goblin is simply rearing his ugly head ...possible that Dodger Stadium is ill-constructed to fit Nomo's individual strengths and weaknesses. After all, Park Factors measure run scoring, but they don't measure the components that go into run scoring

First of all, Park Factors DO measure the components. I'm almost positive that BP has the component park factors in the back of their book. In any case, they exist, but they are not very popular. And, as I've preached many, many times for at least 2 years now, and as is being reiterated here, no one can possibly say that Busch Stadium affected Vince Coleman, Willie McGee and Jack Clark the same way. From that standpoint, park factors (run or component) have issues, though how big is still to be determined.

Anyway, onto the sample size issue:


2002-03        IP    H   HR   BB   SO   PA  BIP  bipH     $H  SO/BB  contact/PA  HR/PA
Home      190.667  164   28   91  174  827  534   136  0.255   1.91       0.680  0.034
Away      194.667  142   14   81  155  807  557   128  0.230   1.91       0.708  0.017

Just combining Nomo's numbers, and adding columns for PA (IP x 3 + H + BB), BIP (PA - BB - SO - HR), and bipH (H - HR), we calculate $H (bipH/BIP), SO/BB, contact/PA (1 - [BB + SO]/PA), and HR/PA.
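For anyone who wants to check the derived columns, here is a minimal sketch that applies the formulas above to the raw totals in the table. The names `raw`, `derive`, and `derived` are made up for illustration, and rounding IP x 3 to whole outs is an assumption (the .667 looks like a rounded two-thirds of an inning).

```python
# Nomo's 2002-03 raw totals, copied from the table in the post.
raw = {
    "Home": {"IP": 190.667, "H": 164, "HR": 28, "BB": 91, "SO": 174},
    "Away": {"IP": 194.667, "H": 142, "HR": 14, "BB": 81, "SO": 155},
}

def derive(line):
    """Return the derived columns, using the formulas stated in the post."""
    # Assumption: round IP x 3 to whole outs, since .667 is a rounded 2/3.
    pa = round(line["IP"] * 3) + line["H"] + line["BB"]
    bip = pa - line["BB"] - line["SO"] - line["HR"]   # balls in play
    bip_h = line["H"] - line["HR"]                    # hits on balls in play
    return {
        "PA": pa,
        "BIP": bip,
        "bipH": bip_h,
        "$H": bip_h / bip,
        "SO/BB": line["SO"] / line["BB"],
        "contact/PA": 1 - (line["BB"] + line["SO"]) / pa,
        "HR/PA": line["HR"] / pa,
    }

derived = {split: derive(line) for split, line in raw.items()}

for split, cols in derived.items():
    print(split, {name: round(value, 3) for name, value in cols.items()})
```

Rounded to the precision shown in the table, the computed values should line up with the PA through HR/PA columns.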

The SO/BB numbers are a match. The contact/PA shows a .028 difference. Is that a lot? Well, 1 standard deviation, with 800 PA, is about .016. So, that's less than 2 standard deviations. So, even though hitters are making greater contact with Nomo on the road, it could be luck. How about the $H? That's 1.3 standard deviations away. HR/PA? That's 2.74 standard deviations apart. If we were to look at pitchers like Nomo, and how they were affected by Dodger Stadium, we might find that he's even further away than that.
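A rough sketch of where figures like these can come from, using the binomial standard deviation sqrt(p(1-p)/n). The baseline rates and sample sizes below (0.70 over 800 PA, 0.24 over 545 BIP, 0.03 over 800 PA) are assumptions picked because they land near the numbers quoted above; the post does not say exactly which values were used.

```python
import math

def binomial_sd(p, n):
    """Standard deviation of an observed rate with true rate p over n trials."""
    return math.sqrt(p * (1 - p) / n)

# Observed home/away gaps, computed from the counts in the table.
gap_contact = (1 - (81 + 155) / 807) - (1 - (91 + 174) / 827)
gap_h = 136 / 534 - 128 / 557
gap_hr = 28 / 827 - 14 / 807

# Assumed baselines (not stated in the post): rounded rate, rounded sample size.
sd_contact = binomial_sd(0.70, 800)
sd_h = binomial_sd(0.24, 545)
sd_hr = binomial_sd(0.03, 800)

# Each gap expressed in units of one standard deviation.
z_contact = gap_contact / sd_contact
z_h = gap_h / sd_h
z_hr = gap_hr / sd_hr

print(f"contact/PA: sd={sd_contact:.4f}, gap={gap_contact:.3f}, ratio={z_contact:.2f}")
print(f"$H:         sd={sd_h:.4f}, gap={gap_h:.3f}, ratio={z_h:.2f}")
print(f"HR/PA:      sd={sd_hr:.4f}, gap={gap_hr:.3f}, ratio={z_hr:.2f}")
```

This follows the post in measuring each gap against the standard deviation of a single sample of about 800 PA. A test on the difference between two such samples would use a standard deviation about 1.4 times larger, so the ratios here are on the generous side.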

So, out of everything in his performance line, I would concentrate on his HR/PA or HR/contact rates. Are pitchers of Nomo's type affected particularly differently by Dodger Stadium when it comes to HR? Is it simply luck? Or, is there something inherent with Nomo himself?

--posted by TangoTiger at 03:22 PM EDT


Posted 5:43 p.m., August 11, 2003 (#1) - Michael
  I'm almost positive that BP has the component park factors in the back of their book.

Good thing you said almost, because BP books do not have component park factors in the back of their books. (At least not the years I have).

Posted 7:21 p.m., August 11, 2003 (#2) - FJM
  The Stats (now Stats/Sporting News) green book has component factors. They vary too much from year to year to be given a lot of credence, but they're better than nothing. They also show 3 year averages.

Posted 7:34 p.m., August 11, 2003 (#3) - Patriot
  Tango, I'm not sure I get your point about run PFs having issues with how they treat various players. Obviously PFs of all kinds have sample size issues, selective sampling issues, interaction between the various parks issues...but the only reason I can see for using a run based PF is if you are after value. And if it's value, I don't care that McGee doesn't benefit as much as Clark, I just care what value the runs had. Of course, then you get into the old BJ argument about using the team RPG instead of the PF...and then you get into all the various value definitions, etc.

Sorry for the hijack.

Posted 7:50 p.m., August 11, 2003 (#4) - Tangotiger
  Patriot, I agree with you.

Depending what you want, sometimes run PF are what you want, and other times component PF.

Posted 9:07 p.m., August 11, 2003 (#5) - FJM
  There is something weird going on at Dodger Stadium this year. 11.7% (103/883) of all hits by the Dodgers and their opponents have left the yard. That compares to only 7.4% (67/907) on the road.

It is true Nomo has been victimized the most, with 18.9% (14/74) at home and just 8.2% (4/49) on the road. But it's affecting the rest of the Dodger staff too: 12.4% (43/348) home, 7.7% (30/391) away.

The Dodger hitters are not affected as much: 10.0% (46/461) home, 7.1% (33/467) road.

Whatever is going on here, it's not a sample size issue.

Posted 10:43 p.m., August 11, 2003 (#6) - Tangotiger
  FJM: can you calculate the odds that a team of pitchers that gives up 10% of their hits as HR would give up 12.5% at home and 7.5% on road, given 400 hits in each, by random chance alone?

Posted 12:45 p.m., August 12, 2003 (#7) - FJM
  Using BINOMDIST in Excel, the probability of 50 or more HR given 400 hits and a true 10% rate is 4.3%. The probability of 30 or fewer is 5.2%. When you include Nomo and the Dodgers hitters the probability drops to around 1.5% either way.
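The same tail probabilities can be checked without Excel. Below is a minimal sketch of the exact binomial sums for Tango's hypothetical (400 hits, true 10% HR rate); the function names are made up for illustration. It prints both "50 or more" and "more than 50" because the cumulative form of BINOMDIST returns P(X <= k), which makes it easy to be off by one on the upper tail.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(k, n, p):
    """P(X >= k) for a binomial(n, p) count."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def prob_at_most(k, n, p):
    """P(X <= k) for a binomial(n, p) count."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

n_hits, hr_rate = 400, 0.10   # Tango's hypothetical, not actual Dodger totals

print("P(50 or more HR):  ", prob_at_least(50, n_hits, hr_rate))
print("P(more than 50 HR):", prob_at_least(51, n_hits, hr_rate))
print("P(30 or fewer HR): ", prob_at_most(30, n_hits, hr_rate))
```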

Posted 1:41 p.m., August 12, 2003 (#8) - tangotiger
  So, combining the two, it's 1 chance in 460 of having those rates occur by luck, correct? Seems like something's up, especially if you go back to the history of Dodger Stadium where I presume that the split would usually be the other way.

Posted 5:49 p.m., August 12, 2003 (#9) - FJM
  It's not quite that simple. We came up with our presumed 10% rate by combining the Home and Road data. So if the Home HR % turns out to be much higher than 10%, then the Road HR % must be much lower. In other words, the two results are not independent. Multiplying the probabilities together is appropriate only if they are independent.
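One standard way to handle that dependence is to test the home/road gap directly against a pooled rate, rather than multiplying two separate tail probabilities. The sketch below is a pooled two-proportion z-test with a normal approximation; it is a swapped-in technique, not something computed in this thread, and the inputs are the hypothetical 50 of 400 at home versus 30 of 400 on the road.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)            # rate estimated from both samples
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))   # normal approximation
    return z, p_two_sided

# Hypothetical split: 12.5% of 400 hits at home, 7.5% of 400 on the road.
z, p_value = two_proportion_z(50, 400, 30, 400)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

Because the pooled rate is built from both samples, a high home rate and a low road rate are treated as one event instead of two independent ones.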

As it turns out, HR rates were somewhat higher in Dodger Stadium in 2002 as well, although the difference was much less than it is this year. The Dodgers pitchers had 13.4% of their hits leave the park at home vs. 11.8% away. For Dodger hitters it was 10.9% home, 10.3% away. Putting them together we have 12.1% home, 11.0% away. So HR Rates have fallen just slightly in 2003 in L.A. while the Road % is down 33%!

Posted 5:53 p.m., August 12, 2003 (#10) - KJOK(e-mail)
  For the years 1969, 1972-1992, & 1999-2002, hits and HR's at Dodger Stadium and in away games:

Home
Hits - 34,827
Home Runs - 3,103
Rate of HR/Hits = 8.9%

Away
Hits - 35,842
Home Runs - 3,264
Rate of HR/Hits = 9.1%

Posted 6:00 p.m., August 12, 2003 (#11) - KJOK(e-mail)
  Darn typo - Away HR's should be 3,274, which is still 9.1%....

Posted 6:12 p.m., August 12, 2003 (#12) - KJOK(e-mail)
  And for just the period 1999-2002:

Home
Hits - 5,379
HRs - 729
Rate - 13.55%

Away
Hits - 5,887
HRs - 747
Rate - 12.69%

Maybe the park is depressing singles, doubles & triples even more than it normally does - perhaps due to the TYPE of pitchers LA has? (Just a thought...)

Posted 7:38 p.m., August 12, 2003 (#13) - Tangotiger
  FJM: the 10% was simply a historical figure that I like to use.

Posted 4:10 p.m., August 15, 2003 (#14) - Hey, FJM
  What book do I need to buy to learn the purpose, application, and execution of those neat statistical concepts?