Tango on Baseball Archives

Road Warriors (September 4, 2003)

Home and Away records of contending teams. Sample size?
--posted by TangoTiger at 01:57 PM EDT

Posted 2:40 p.m., September 4, 2003 (#1) - bob mong
Here is some more data: teams whose home-field advantage (HFA) was at least 8 wins in either direction.

Team, HFA

2002
Twins, +14
Red Sox, -9
Anaheim, +9
White Sox, +13
Royals, +12
Diamondbacks, +12
Expos, +15 (they didn't play in PR in 2002)
Astros, +10

2001
Twins, +9
White Sox, +11
Braves, -8
Phillies, +8
Cardinals, +15
Cubs, +8
Giants, +8

2000
Mets, +16
Giants, +13
Diamondbacks, +9

This isn't complete for all teams in these years.

Posted 3:08 p.m., September 4, 2003 (#2) - Andrew Edwards
The near-total absence of year-to-year consistency in Bob's numbers suggests to me that there are serious sample size concerns.

That's not to say that the insight isn't relevant - some teams could have a serious playoff problem driven by a poor road record - but I think we'd need to aggregate over a few years before we could make any strong statements.
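To put a rough number on the sample-size worry, here is a minimal sketch. It assumes that HFA means home wins minus road wins (the thread does not define it), that a team plays 81 home and 81 road games, and that every game is an independent coin flip with a fixed win probability. None of those assumptions comes from Bob's data; they only show how much spread chance alone could produce.

```python
import math

def hfa_noise_sd(home_games=81, road_games=81, p=0.5):
    """Standard deviation of (home wins - road wins) when every game is an
    independent coin flip with win probability p (an assumption, not a fit)."""
    var_home = home_games * p * (1 - p)   # binomial variance of home wins
    var_road = road_games * p * (1 - p)   # binomial variance of road wins
    return math.sqrt(var_home + var_road) # variances add for a difference

one_year = hfa_noise_sd()
# Averaging n independent seasons shrinks the noise by sqrt(n).
seven_year_avg = one_year / math.sqrt(7)
print(round(one_year, 2), round(seven_year_avg, 2))
```

Under these assumptions the one-season noise is a bit over six wins, so a single-season HFA of +8 or -8 is not far outside what luck could produce, while a multi-year average is considerably tighter.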

Posted 4:26 p.m., September 4, 2003 (#3) - Greg Tamer(e-mail)
I don't think you can use a team's season road record to predict how they will perform in the playoffs against a few particular teams. There are too many other variables that need to be considered, that, in my opinion, are more important than how the team did overall on the road. These variables include current starting pitchers, current relief pitchers, current hitters, the weather (or lack thereof), how the other team is doing with injuries and streaks, their pitching and hitting, matchups, and how that team did in the particular stadium in which they will play, and not overall throughout the league.

Posted 4:35 p.m., September 4, 2003 (#4) - bob mong(e-mail) (homepage)
I just updated the blog entry, giving data for every team for every season between 1996 and 2002.

The most surprising to me was Florida - anybody have any ideas why they have such a large home-field advantage (especially considering that Tampa Bay does not)?

Also, it appears that home-field advantage in the AL is different than HFA in the NL. I have no idea why, other than the outliers in the NL (Colorado, Florida).

Comments?

Oh, and thanks for posting this, Tango :)

Posted 4:42 p.m., September 4, 2003 (#5) - bob mong
"I don't think you can use a team's season road record to predict how they will perform in the playoffs against a few particular teams."

I dunno. I agree that the sample size is very small, and the results are inconclusive. But...

Imagine Florida, which has a 7-year HFA of 12 wins (3-year average: also 12 wins), playing Atlanta, which has a 7-year HFA of 5 wins (3-year HFA: 1 win), in the playoffs.

Both of those teams have similar HFA in 2003 to their historical HFA (Florida 2003 HFA: +15, Atlanta 2003 HFA: +5).

I would think that home-field advantage would be very important for Florida, as well as for any team playing against Florida.

Posted 7:10 p.m., September 4, 2003 (#6) - Dirk K
If you have the time and data, you may want to attempt to correct for the unbalanced schedule.

Quite a few pairs of opponents play a 9-game season series (at least the AL West does against non-division opponents; it may work out differently for the other divisions that I pay less attention to). That's typically 1 (3-game) home series and 2 road series, or vice versa. If you get the good opponents more on the road and the bad ones at home...

Plus, of course, interleague play is mostly a single series.

With enough data, I would do something like BP does with their postseason odds report. I don't know enough about how they compute Dav W%, but we could at least start with normal Pythagorean %.

Then produce, for each team:
Home Strength of Schedule (weighted avg of home opp Pyth%)
Home Pyth (using RS/RA at home only)
Road SOS
Road Pyth

Home Stathead advantage = (Home Pyth - Home SOS) - (Road Pyth - Road SOS)
Home Luck = (Home Record - Home Pyth) - (Road Record - Road Pyth)
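The two formulas above can be sketched as follows. This is only an illustration under stated assumptions: it uses the basic Pythagorean formula with exponent 2, treats "Record" as winning percentage, and takes strength of schedule as a games-weighted average of opponent Pyth%. The dictionary layout, function names, and sample numbers are invented for the example.

```python
def pyth(rs, ra, exponent=2.0):
    """Basic Pythagorean winning percentage from runs scored and allowed."""
    return rs ** exponent / (rs ** exponent + ra ** exponent)

def weighted_sos(opponents):
    """Games-weighted average of opponent Pyth%.
    opponents: list of (games_played, opponent_pyth) pairs."""
    total_games = sum(g for g, _ in opponents)
    return sum(g * p for g, p in opponents) / total_games

def home_splits(home, road):
    """home/road: dicts with keys w, l, rs, ra, opps (hypothetical layout).
    Returns (home_stathead_advantage, home_luck), both in winning-pct units."""
    parts = {}
    for name, s in (("home", home), ("road", road)):
        parts[name] = {
            "record": s["w"] / (s["w"] + s["l"]),
            "pyth": pyth(s["rs"], s["ra"]),
            "sos": weighted_sos(s["opps"]),
        }
    h, r = parts["home"], parts["road"]
    stathead = (h["pyth"] - h["sos"]) - (r["pyth"] - r["sos"])
    luck = (h["record"] - h["pyth"]) - (r["record"] - r["pyth"])
    return stathead, luck

# Invented numbers, purely to show the call shape.
home = {"w": 50, "l": 31, "rs": 420, "ra": 360, "opps": [(40, 0.52), (41, 0.48)]}
road = {"w": 40, "l": 41, "rs": 380, "ra": 390, "opps": [(41, 0.50), (40, 0.50)]}
print(home_splits(home, road))
```

Multiplying either result by the number of home games would convert it to wins, if that is the scale you prefer.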

Whether you think Home Luck is part of HFA or not is up to you (at least some of it is going to be explained by walk-off/extra-inning games, where knowing exactly how many runs you need makes strategy a little easier).

I started writing something more complicated, using the team's own Pyth and SOS via log5 to predict home wins, but then realized that I was confounding my input and output data.
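For readers who have not met log5: it estimates the chance that one team beats another from their two winning percentages. A minimal sketch of the standard formula follows; the function name is made up for illustration, and this is not the abandoned calculation itself.

```python
def log5(a, b):
    """Estimated probability that a team with winning pct a beats a team
    with winning pct b, using the standard log5 formula."""
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))

print(log5(0.6, 0.5))  # against a .500 opponent, a team keeps its own pct
```

The confounding problem arises because a team's own Pyth already includes its home games, so feeding it into log5 to predict those same home games uses the output as an input.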

Posted 11:24 p.m., September 4, 2003 (#7) - Alan Jordan
I know I'm going to get myself in trouble for saying this, but here goes.

I wouldn't recommend using the Pyth or any variation of the Pyth in your formulas unless you just don't have game-level data. If all you have is seasonal total data, then go ahead and use the Pyth and ignore the rest of this message. If you have game-level data and want to use the Pyth, then first remove the effects of home-field advantage, parks, and starting pitchers (possibly plate umps, if you want) and then use the Pyth.

The reason is that the Pyth, like any function that uses seasonal totals, is inefficient and in some cases biased when data from different distributions are mixed together. Any function that uses seasonal total data, whether it's runs scored, runs allowed, HRs, GIDPs, whatever, treats all of the runs scored etc. as if they are equal, i.e., as if they come from the same distribution. Games played in high-scoring parks such as Mile High in Colorado produce higher runs scored and allowed than games played in the Dodgers' home park. This should be adjusted for before you plug the runs into the Pyth. It's complicated and it requires game-level data, but it's better than just taking a run estimator with season total data and plugging the estimated runs scored and runs allowed directly into the Pyth.
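One simple way to picture the adjustment, as a rough sketch only: scale each game's runs by a run factor for the park it was played in before summing, then apply the Pyth to the adjusted totals. The park factors below are invented, the function name is hypothetical, and a serious version would also handle home-field advantage and starting pitchers as described above.

```python
def park_adjusted_pyth(games, exponent=2.0):
    """games: iterable of (runs_scored, runs_allowed, park_factor), where
    park_factor is a hypothetical run multiplier for the park that game
    was played in (1.0 = neutral, above 1.0 = high-scoring park)."""
    games = list(games)  # allow generators; we iterate twice
    rs = sum(scored / pf for scored, _, pf in games)
    ra = sum(allowed / pf for _, allowed, pf in games)
    return rs ** exponent / (rs ** exponent + ra ** exponent)

# Invented game log: a blowout win in a hitters' park, a loss in a neutral one.
sample = [(10, 4, 1.25), (2, 5, 1.0)]
raw = 12 ** 2 / (12 ** 2 + 9 ** 2)  # Pyth on unadjusted season totals
print(raw, park_adjusted_pyth(sample))
```

In this toy log the unadjusted totals give extra weight to the inflated runs from the hitters' park, which is the mixing-of-distributions problem in miniature.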

Again, if all you have is season total data, go ahead, you're pretty much stuck with that.

Posted 8:28 p.m., September 5, 2003 (#8) - Dirk K
#7 -- you're right, of course, about the park factors.

I don't know that I'd worry about the rest of it, though. Home field advantage is what we're trying to measure, not an input.

Trying to account for starting pitchers is probably also confounding -- the different pitchers on a team don't have the same number of home and road games, and they don't face the same opponents.

Posted 9:04 p.m., September 5, 2003 (#9) - Alan Jordan
The park factors are probably the biggest problem in terms of bias or imprecision (inefficiency, in stat jargon) of the estimates. But adding starting pitchers to your model along with teams, opponents, and home-field advantage, while a huge pain in the ass, increases the precision of the estimates and allows you to make predictions conditional on the starting pitcher, which might be a little more realistic. Of course, you have to assume a rotation for each team.

For simplicity's sake, there's not much harm in ignoring the starting pitchers under the assumption that they are randomly spread throughout the season.