Tango on Baseball Archives

© Tangotiger

Baseball Musings: Defense Archives (December 5, 2003)

I'll start another thread for the Pinto model. He's presenting position-by-position individual totals.

Go down to the 1B section. I absolutely *love* the way he presents the intermediary data.
--posted by TangoTiger at 10:39 PM EDT


Posted 11:32 p.m., December 5, 2003 (#1) - Tangotiger (homepage)
  I also want to direct the readers to a rather interesting (and at one point, orgasmic, for me anyway) discussion on fielding at fanhome.

Posted 4:34 p.m., December 6, 2003 (#2) - Alan Jordan
  What's wrong with the way that he presents the intermediate data?

Posted 5:01 p.m., December 6, 2003 (#3) - Michael Humphreys
  Alan,

Perhaps I should let Tango answer, but I believe he was playing it straight--he "loves" the "intermediate" data at first base because it *disaggregates* the ratings into ground ball, line-drive, fly ball and pop-up ratings (or something like that).

I've written to David that disaggregation might be very helpful at the other infield positions. Grounders are skill plays, line-drives might be subject to a lot of luck, and flyball/popups are subject to ball-hogging. David's data could provide excellent evidence for the degree to which line-drive "range" is luck, or pop-up range is ball-hogging.

These factors may explain why the *aggregate* rating for Chavez at third is only average, and why Jeter and Soriano don't rate so badly--my guess is that Jeter and Soriano knew that Bernie didn't have the range and compensated by aggressively using their speed to go after short flies in center. Having written it that way, I'm making it sound like a good thing, and I suppose it is, but I wonder how much of it was just taking chances that Bernie (or an average centerfielder) might take anyway.

Posted 6:04 p.m., December 6, 2003 (#4) - Alan Jordan
  Never mind, my misunderstanding. I liked it too.

Posted 8:12 p.m., December 6, 2003 (#5) - Tangotiger
  To close this off, yup, I was being sincere, and not sarcastic.

Posted 2:43 a.m., December 7, 2003 (#6) - Charles Saeger(e-mail)
  One reason Jeter probably rates so badly is that the "subtract 0.200" rule serves to exaggerate fielding Win Share differences. James takes a rating of 20 on a 0-100 scale as a .200 DWP, when it isn't -- in the case of fielding (as opposed to hitting), the team rating determines the fielder's effectiveness. Runs Created is a team-independent method of offense determination (well, mostly), so it is possible to say safely what a player's OWP is. For range, when divvying up the claim points he's just assuming a team is average until they actually stake claims on those Win Shares. (Were James to use something like CAD or DFT, which make full estimates of fielding skill before Claim Points, that would be different. I'm not saying what James uses is wrong, because I happen to like looking at a player's skills from about a zillion different angles, but he's using it wrong here.)

I took a look at players when they change teams, and as best I can tell, dropping out that 0.200 makes little difference in year-to-year consistency when changing from a good team to a bad team and vice versa.

Posted 11:54 a.m., December 8, 2003 (#7) - tangotiger
  Want to guess what a backup fielder's value is?

What I did was take all of David's data by position, and separated them into two pools: regulars and backups. To qualify as a regular, he had to be in the top 30 at his position in "expected outs". That's essentially a "playing time" component. Everyone else at that position was considered a backup. (You do have a problem with multi-position players, but, we can take care of that in a future study.)

Here is the total for SS:
regular expected outs: 12,372
backup expected outs: 3,249
total SS expected outs: 15,620

So, based on this, we can say that the regular shortstops played 79% of the time, and the backups 21%. So, to put this in a "162 game context", the 30 regular shortstops played .79 x 162 x 30, or 3849 games, and the backups played 1011 full games.

How many outs did the backups actually record? 3225, or a total of 24 fewer than expected if they were average. Seeing they played 1011 full games, to put that into a 162 game context, we get -24 / 1011 x 162 = -4 outs.

So, the backup SS is worth -4 outs compared to the average SS. How do all the positions do?
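
The shortstop arithmetic above can be sketched in a few lines of Python. This is only an illustration: the variable names are made up here, and the total is taken as the sum of the two pools rather than the rounded figure quoted in the post.

```python
# Shortstop totals quoted in the post above.
regular_expected_outs = 12_372
backup_expected_outs = 3_249
backup_actual_outs = 3_225

TEAMS = 30
GAMES_PER_SEASON = 162

# Share of shortstop playing time that went to backups (about 21%).
total_expected = regular_expected_outs + backup_expected_outs
backup_share = backup_expected_outs / total_expected

# Convert that share into "full games" across all 30 teams (about 1011).
backup_full_games = backup_share * GAMES_PER_SEASON * TEAMS

# Outs relative to average, scaled to one 162-game season.
outs_vs_average = backup_actual_outs - backup_expected_outs
outs_per_162 = outs_vs_average / backup_full_games * GAMES_PER_SEASON

print(f"backup share of playing time: {backup_share:.0%}")
print(f"backup full games: {backup_full_games:.0f}")
print(f"backup outs vs. average per 162 games: {outs_per_162:.1f}")
```

The same three steps (share of expected outs, full games, outs per 162) apply to any other position once its totals are plugged in.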

Pos  Outs/162
3    (4)
4    (0)
5    (2)
6    (4)
7     4
8    (6)
9     4

You'll notice that the backup LF and RF are BETTER fielders than their regulars. This makes sense to some level, since you've got great hitters/bad fielders at those positions. As well, I wouldn't be surprised that some CF are also playing at LF and RF (i.e., double-counting the regular CF as a backup LF/RF).

In any case, what happens if we add all that up? A team of backup fielders is worth a total of -8 outs compared to an average team. So, -8 outs per team, or -6 runs per team over a season. That's about -1 run per position.

Essentially, a team of backup fielders is worth around the same as a team of regular fielders.
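
As a rough check on the team total, here is a small Python sketch that sums the per-position figures from the list above. The runs-per-out factor of 0.8 is an assumption: it is not stated in this post, but it matches the "40 outs per season, or 32 runs" conversion used elsewhere in the thread.

```python
# Backup outs per 162 games by scorebook position (3 = 1B ... 9 = RF),
# from the list above; parentheses in the post mean negative.
backup_outs_per_162 = {3: -4, 4: 0, 5: -2, 6: -4, 7: 4, 8: -6, 9: 4}

# ASSUMPTION: roughly 0.8 runs per out; this post does not state the factor.
RUNS_PER_OUT = 0.8

team_outs = sum(backup_outs_per_162.values())
team_runs = team_outs * RUNS_PER_OUT
runs_per_position = team_runs / len(backup_outs_per_162)

print(f"team of backups: {team_outs} outs, {team_runs:.1f} runs")
print(f"per position: {runs_per_position:.1f} runs")
```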

Posted 12:10 p.m., December 8, 2003 (#8) - tangotiger
  Note: because David's list does not include all players, the backups list might actually be worse. If I look at the regulars, they come out to being +5 better than average in almost 75% of the playing time. If we include the missing players, the playing time of the regulars probably drops to around 72 or 73%. In order for everything to add up, +5 for 72% of playing time for the regulars matches up to -18 outs for the 28% of playing time for the backups. I'm a little bothered that the few players that didn't make David's list could have that much of an effect. If you take that number, that works out to -14 runs per team, or -2 runs per position that the backup fielder is worth compared to the average.

Posted 12:14 p.m., December 8, 2003 (#9) - tangotiger
  Finally, putting the regular pool at 45 players per position, here's what the list looks like:

Pos  ExpectedOuts  ActualOuts  %PT   Outs/162
3    1290          1281        0.14  (2)
4    1929          1924        0.12  (1)
5    1656          1622        0.14  (8)
6    1174          1149        0.08  (11)
7    1314          1325        0.15   3
8    1599          1548        0.13  (13)
9    1623          1644        0.17   4

That's a total of -29 outs for a team of backup fielders, or about -3 runs per fielder per 162 GP.
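
The Outs/162 column can be approximately reproduced from the other three columns. The sketch below is an illustration only: it assumes the same method as the shortstop example earlier in the thread (playing-time share times 162 games times 30 teams gives full games), and because the playing-time shares in the table are rounded to two decimals, the results can differ from the posted column by up to about half an out.

```python
# (position, expected outs, actual outs, share of playing time) for the
# backup pools, copied from the table above.
rows = [
    (3, 1290, 1281, 0.14),
    (4, 1929, 1924, 0.12),
    (5, 1656, 1622, 0.14),
    (6, 1174, 1149, 0.08),
    (7, 1314, 1325, 0.15),
    (8, 1599, 1548, 0.13),
    (9, 1623, 1644, 0.17),
]

TEAMS = 30
GAMES_PER_SEASON = 162


def outs_per_162(expected, actual, pt_share):
    """Outs above or below average, scaled to one full 162-game season."""
    full_games = pt_share * GAMES_PER_SEASON * TEAMS
    return (actual - expected) / full_games * GAMES_PER_SEASON


per_position = {pos: outs_per_162(e, a, pt) for pos, e, a, pt in rows}
team_total = sum(per_position.values())

for pos, value in per_position.items():
    print(f"{pos}: {value:+.1f}")
print(f"team of backups: {team_total:+.1f} outs per 162 games")
```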

Posted 12:42 p.m., December 8, 2003 (#10) - Michael Humphreys
  Tango,

Fantastic analysis. And the result makes a lot of sense. Let's face it--there simply is not the same "shortage" of adequate fielders as there is for adequate hitters. The talent distribution that Bill James brought to everyone's attention is not nearly as skewed for fielding as it is for hitting.

Having said that, it is interesting, and in a way consistent with James's Defensive Spectrum, that there may be a slight "shortage" of fielding skill at SS and CF (though you'd expect the 2B and 3B results to be reversed).

By the way, DRA ratings in the 1974-2001 study (which only covers full-time players) were on average somewhere between 0 and +3. Can't remember offhand--but they were only very, very slightly better than league average.

Posted 2:08 p.m., December 8, 2003 (#11) - tangotiger
  I performed a similar analysis using 1999-2002 UZR last year, and the results were similar there as well. The typical backup fielder is only a couple of runs worse than the typical starting fielder.

**************

Here's another one you may like. For each player, I estimated his "BIP" as actual outs divided by 0.7. How does that make any sense? Well, the average DER is around 0.700 for a team. So, giving each player the responsibility, on average, to convert 70% of his BIP into an out, we can get his BIP. It's a useful measure for what I'm about to do...

... Now that we know each player's opportunity context (BIP), we can figure out how much 1 standard deviation works out to. Taking Andruw Jones as the example, he had 362 expected outs or 517 "expected assigned" BIP. One standard deviation would come out to .020 outs / BIP (sqrt(.3*.7/517)). Andruw Jones is +28 outs per 517 BIP, or 2.7 standard deviations ((28/517) / .020).
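
Here is a minimal Python sketch of the Andruw Jones calculation, treating each ball in play as a binomial trial with a 0.7 chance of becoming an out. The variable names are illustrative and not part of Pinto's model.

```python
import math

LEAGUE_DER = 0.7           # average share of balls in play turned into outs

expected_outs = 362        # Andruw Jones, from the post above
outs_above_average = 28

# "Expected assigned" balls in play.
bip = expected_outs / LEAGUE_DER

# Binomial standard deviation of the out rate, in outs per BIP.
sd_per_bip = math.sqrt(LEAGUE_DER * (1 - LEAGUE_DER) / bip)

# How many standard deviations above average he was.
z_score = (outs_above_average / bip) / sd_per_bip

print(f"BIP: {bip:.0f}")
print(f"1 SD: {sd_per_bip:.3f} outs per BIP")
print(f"z-score: {z_score:.1f}")
```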

How do all the 640 player-positions do in Pinto's model? 435 of them were within 1 SD, or 68%. That number should appeal to many people: in a normal distribution, 68.3% of the population falls within 1 SD of the mean. 102 player-positions were to the right of +1 SD, and 103 were to the left of -1 SD. What you've got is a fairly normal distribution of fielding talent, and NOT at all any kind of pyramid-shaped distribution of fielding talent.
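
The comparison with a normal distribution can be checked directly. The sketch below uses only the counts quoted above and the standard library's error function; it is a quick sanity check, not a formal test of normality.

```python
import math

# Counts of player-positions from the post above.
within_1_sd = 435
right_of_1_sd = 102
left_of_1_sd = 103

total = within_1_sd + right_of_1_sd + left_of_1_sd
observed_share = within_1_sd / total

# Share of a normal distribution that lies within 1 SD of the mean.
normal_share = math.erf(1 / math.sqrt(2))

print(f"observed within 1 SD: {observed_share:.1%} of {total}")
print(f"normal distribution:  {normal_share:.1%}")
```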

Seeing that we can model fairly accurately the talent distribution of fielding, it becomes a snap to figure out how good a fielder can be at fielding.

The leader in BIP was Tejada at 807, with a group of 2B and SS at around the 720 BIP mark. Essentially, you can say that you've got 800 BIP. At that level, 1 SD is .016 outs / BIP. With 95% of the players being within 2 SD, and I think 99% being within 3 SD (someone can correct me), you can essentially say that you've got an upper boundary of 3 x .016 outs / BIP as to how much a fielder can be better than average. That's .05 outs / BIP. With 800 BIP, that's 40 outs per season, or 32 runs. So, 30 runs is really about the most a fielder can contribute, with two-thirds expected to be within 10 runs. (When I repeat this at the position level, it goes from a low of 23 for 1B/LF/RF to a high of 30 for SS/2B.)
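
The upper-bound reasoning can be sketched the same way. In the Python below, the 0.8 runs-per-out factor is taken from the "40 outs per season, or 32 runs" conversion in the post, and the helper name `share_within` is made up for illustration. Without the intermediate rounding to .05 outs / BIP, the cap comes out slightly under the posted 40 outs and 32 runs.

```python
import math

LEAGUE_DER = 0.7
RUNS_PER_OUT = 0.8         # implied by "40 outs per season, or 32 runs"
bip = 800                  # roughly the heaviest workload in the data


def share_within(k):
    """Share of a normal distribution within k SD of the mean."""
    return math.erf(k / math.sqrt(2))


sd_per_bip = math.sqrt(LEAGUE_DER * (1 - LEAGUE_DER) / bip)
cap_rate = 3 * sd_per_bip              # outs per BIP at +3 SD
cap_outs = cap_rate * bip
cap_runs = cap_outs * RUNS_PER_OUT

print(f"1 SD: {sd_per_bip:.3f} outs per BIP")
print(f"+3 SD: {cap_rate:.3f} outs per BIP, "
      f"{cap_outs:.0f} outs, {cap_runs:.0f} runs")
print(f"within 2 SD: {share_within(2):.1%}, within 3 SD: {share_within(3):.1%}")
```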

Could we have gotten that .05 outs/BIP in other ways? Sure thing. Bring up the ZR chart for any position. You will note that the league leaders and league trailers are around +/- .05 outs / BIP.

So, what you've got is a real hard cap here as to how much a fielder can add.

Posted 2:16 p.m., December 8, 2003 (#12) - tangotiger
  Limiting it to the 210 regulars (30 per position), we have 146 players within 1 SD (69.5%), 35 to the right of +1 SD, and 29 to the left of -1 SD. Regulars, backups, whatever. The distribution of fielding talent is virtually the same (though the mean of the regulars is ever so slightly better).

Posted 3:10 p.m., December 8, 2003 (#13) - Michael Humphreys
  Beyond fantastic.

I'm relieved that the numbers match up pretty well with DRA. The standard deviation in runs saved at almost all the positions (except first base) was about a dozen, not ten, so there's still a little too much variation, but not much.

The max of +30 runs is also consistent with DRA. The only exceptions in the DRA sample (beyond a handful of "fluke" single-season values) appear to be Andruw (the ball hog, no doubt, at least from 1998-2001) and Barfield, whose range number is about +20 in his (three or four consecutive) good years, but whose arm ratings take him well above that, to about +30.

Posted 3:12 p.m., December 8, 2003 (#14) - David Smyth
  nice work.

---"(though the mean of the regulars is ever so slightly better)."

Could this be because the sample size of playing time for the regulars is larger, and that after you regress to the mean based on playing time the 2 groups would be almost the same?

Posted 11:13 p.m., December 8, 2003 (#15) - Charles Saeger(e-mail)
  +30 runs would lead the league in CAD in most years as well, and in DFT.

---"The talent distribution that Bill James brought to everyone's attention is not nearly as skewed for fielding as it is for hitting."

One thing that I have noticed (and I wrote about it in a BBBA once upon a time) is that the gap in terms of runs between the best and worst fielders at any given position is about the same, save for maybe catcher, where teams value the ability to catch more than differences in that ability. The reason I have always assumed for this is that teams will move a crappy shortstop or center fielder, thereby keeping those positions responsible for the most runs from having a really lousy performer afield. It's something to keep in mind at a less-demanding position, primarily first base (because if you're a really good left fielder, you play center field).

Posted 8:25 p.m., January 14, 2004 (#16) - ColinM(e-mail)
  Hey guys,

I wasn't sure where to post this, but here seems like a good spot. I was wondering if a couple of you could help me out a bit?

One of the things that I spend a lot of time doing is evaluating historical seasons. Comparing the top players in a given year, over time, etc... Of course the most difficult part of it all is trying to come up with a good estimate for defensive value. But now it seems like there are finally a few really solid systems out there that do a pretty good job of measuring this. The problem is, most of them aren't public.

Right now I have a database with Win Shares and can get Davenport's numbers from the net. Would any of you with your own systems be willing to share the results? Not the methods, just the final numbers would be great. Charles Saeger and Michael Humphreys, I've read the descriptions of your methods and would love to see the numbers if you wouldn't mind. Same for anyone else who has a good system and is willing to share.

If any of you have this stuff already published would you be able to post a link? Or email me if you'd rather. Thanks for any help.

Posted 11:26 p.m., January 14, 2004 (#17) - studes(e-mail) (homepage)
  Well, I don't have anything for you, Colin. But are you willing to share your database?????

Posted 1:21 p.m., January 15, 2004 (#18) - ColinM
  studes,

I'd love to share the DB, how could I ask for other people's stuff and not share mine! The thing is, I don't know if I can. As I'm sure you know, STATS sells a digital update to the Win Shares ebook that contains WS for every player. If I share a database with the same info am I infringing on this? OTOH the formula for WS is published. If I were to use this formula to calculate all of the historical WS myself, why can't I share the work I've done?

Probably I'm just being paranoid, but if any of the legal types that hang out here know the answers, let me know.

Posted 2:43 p.m., January 15, 2004 (#19) - studes (homepage)
  Colin, check out the copyright of the Win Shares book. I think that as long as you don't sell the database (i.e., use it for commercial purposes), you can do whatever you want with the info. I went through this with a Stats salesman already, as you can imagine.

Posted 5:16 p.m., January 15, 2004 (#20) - Silver King
  I enjoy similar diversions to what Colin mentions, and am therefore also interested in 'best available' defensive info. I'm not very interested in Win Shares though, figuring Davenport's is at least as good (and I like the idea of the historical adjustments for comparing players across time, even though it's untransparent (to me)).

I'd _love_ to eventually have Michael Humphreys' results available, based on what I read of people's excitement over his innovation. And of course, for the most recent times, MGL's UZR.

Posted 6:20 p.m., January 15, 2004 (#21) - Michael Humphreys
  Silver King,

Post your e-mail and I'll send you the results. The editors at baseballprimer have apparently been unable to get the full results posted. I'd put everything into a simple Excel spreadsheet that I had hoped could just be linked without formatting (as has been done with several UZR files), but maybe there's some other technical difficulty.

Posted 7:22 p.m., January 15, 2004 (#22) - Tangotiger
  Michael,

If you like, I can post it at Primate Studies in the morning.

Tom

Posted 8:02 p.m., January 15, 2004 (#23) - studes(e-mail) (homepage)
  Michael/Tango: if it's not posted on Primate Studies, please send me a copy of the Excel spreadsheet, too. Thanks.

Regarding Davenport's numbers, is there some way to get them other than looking up each player one by one? Seems like a lot of work to me.

BTW, how's the book coming, Michael?

Posted 11:14 p.m., January 15, 2004 (#24) - Michael Humphreys
  Studes,

The book is coming along slowly because I'm having to cut and paste a lot of data from public sources (Retrosheet, etc.) so there won't be any restrictions on use.

Tango,

I'll send you the file.