Pitching and Defense
M1421: Sabermetrics, Scouting, and the Science of Baseball
MIT ESP – HSSP Summer 2008

Wins and Losses

• Wins and losses are the measure most commonly used to judge pitchers

• A pitcher’s record depends heavily on the quality of his offense and bullpen, and on luck

• For example, Chris Young went 9-8 last year with a 3.12 ERA, while Tim Wakefield went 17-12 with a 4.76 ERA. The Padres scored 3.76 runs per game when Young started; the Red Sox scored 5.37 for Wakefield.

Earned Run Average (ERA)

• ERA is much better than wins and losses, since it reflects only what happens while the pitcher is on the mound

• However, it is still far from perfect, because it depends heavily on how a pitcher performs with runners on base, something over which most pitchers have little control

• In 2005, Jarrod Washburn allowed 184 hits, 51 walks, and 19 home runs, and struck out 94 hitters in 177.3 innings, posting a 3.20 ERA and getting a $40 million contract from the Seattle Mariners. In 2006, he allowed 198 hits, 55
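ERA itself is simple arithmetic: earned runs per nine innings. The one trap is innings notation. The 177.3 innings above means 177 and 1/3, which box scores conventionally print as "177.1", with the digit after the point counting outs rather than tenths. A minimal Python sketch; the function names are hypothetical, and the 63 earned runs in the example are illustrative input, not a figure from the slide.

```python
def innings_to_decimal(ip: str) -> float:
    """Convert box-score innings notation to true innings (hypothetical helper).

    The digit after the point counts outs, not tenths:
    "177.1" means 177 and 1/3 innings, "6.2" means 6 and 2/3.
    """
    whole, _, outs = ip.partition(".")
    if outs not in ("", "0", "1", "2"):
        raise ValueError(f"not valid innings notation: {ip!r}")
    return int(whole) + int(outs or "0") / 3


def era(earned_runs: int, innings: float) -> float:
    """Earned runs allowed per nine innings pitched."""
    return 9 * earned_runs / innings


# Illustrative input: 63 earned runs over 177 1/3 innings.
print(round(era(63, innings_to_decimal("177.1")), 2))  # 3.2
```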

Component ERA (ERC)

• Component ERA addresses this problem by estimating what a pitcher’s ERA should have been, based on his component statistics, such as hits, walks, and strikeouts

• We can evaluate pitchers just as we do hitters, using the BaseRuns formula to compute how many runs they should have allowed

• Washburn’s 2005 ERC was 4.18, much closer to his 2006 ERA
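BaseRuns estimates runs as A × B / (B + C) + D, where A is baserunners other than home runs, B is an advancement factor, C is outs, and D is home runs. Below is a Python sketch using one commonly published basic set of coefficients for B. Treat those coefficients, the use of 3 × IP as the out count, and the total-bases-allowed input (rarely published for pitchers, so it usually has to be estimated) as assumptions; nothing here is guaranteed to reproduce the 4.18 figure above, and the function names are hypothetical.

```python
def base_runs(h: float, bb: float, hr: float, tb: float, outs: float) -> float:
    """Estimated runs from the basic BaseRuns form A * B / (B + C) + D."""
    a = h + bb - hr                                       # baserunners other than HR
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02   # advancement factor (assumed coefficients)
    c = outs                                              # outs
    d = hr                                                # home runs always score
    if b + c == 0:
        return float(d)
    return a * b / (b + c) + d


def component_era(h: float, bb: float, hr: float, tb: float, ip: float) -> float:
    """Runs a pitcher 'should have' allowed, scaled to nine innings.

    Assumes outs = 3 * IP and that tb (total bases allowed) is an estimate.
    """
    runs = base_runs(h, bb, hr, tb, outs=3 * ip)
    return 9 * runs / ip
```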

DIPS/FIP

• Unfortunately, component ERA is still far from perfect, because a pitcher’s hit total is heavily influenced by luck and by the fielders behind him

• Almost a decade ago, Voros McCracken found that batting average on balls in play (BABIP) was very inconsistent from year to year: the best pitchers in baseball allowed roughly the same BABIP as the worst

• He invented a formula, DIPS,
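One widely used descendant of McCracken’s DIPS idea is Tom Tango’s Fielding Independent Pitching (FIP), which keeps only the outcomes a pitcher controls most directly: FIP = (13·HR + 3·(BB + HBP) − 2·K) / IP + C. The constant C is chosen each season so that league-average FIP matches league-average ERA; the 3.10 default below is an assumed placeholder, not an official value. A short Python sketch:

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float = 3.10) -> float:
    """Fielding Independent Pitching on an ERA-like scale.

    `constant` should be set each season so that league-average FIP equals
    league-average ERA; 3.10 here is only an assumed placeholder.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Washburn, 2005, from the ERA slide (hit batters not listed, so assumed 0).
print(round(fip(hr=19, bb=51, hbp=0, k=94, ip=177 + 1/3), 2))  # 4.3
```

With C = 3.10 and no hit batters, Washburn’s 2005 line comes out near 4.30, much closer to his ERC than to his 3.20 ERA.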

Pitchf/x

• A new system that has come online in the past two years promises to drive a great deal of innovation in pitching analysis

• Pitchf/x measures the speed, location, and break of every pitch

• Eventually, we will be able to better understand what makes a good pitch and to identify true changes in talent level more quickly

• Extensions of the system could also help us evaluate fielding and hitting

DEFENSIVE EVALUATION

• We want to be able to evaluate a (position) player’s defensive skills/talent/value in such a way that we can combine it with his offensive (and baserunning) value to get a total value for the player.

• The best unit for evaluating any of a player’s skills is runs, since runs are the currency of baseball and can easily be converted into estimated wins.

• So, no matter what method or metric we use to evaluate defense, we ultimately want to present it as runs above or below some baseline. A good baseline for defense is league average for that position.

• And of course, when we present a metric (a statistic or formula that measures something tangible) in runs, we need to specify the denominator: per how many games, plate appearances (PA), defensive chances, and so on.
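A minimal sketch of both conversions: scaling a runs total to a per-150-games rate from defensive innings, then dividing by a runs-per-win factor. The figure of about 10 runs per win is a common rule of thumb, used here as an assumption; the right value shifts with the run environment. Function names are hypothetical.

```python
RUNS_PER_WIN = 10.0  # assumed rule of thumb; varies with the run environment


def scale_to_games(runs: float, innings: float, games: float = 150.0) -> float:
    """Express a runs total as a rate per `games` full nine-inning games in the field."""
    games_played = innings / 9.0
    return runs * games / games_played


def runs_to_wins(runs: float, runs_per_win: float = RUNS_PER_WIN) -> float:
    """Convert runs above/below average into estimated wins."""
    return runs / runs_per_win


# A fielder who was +5 runs in 675 innings (75 full games):
rate = scale_to_games(5, 675)    # 10.0 runs per 150 games
print(rate, runs_to_wins(rate))  # 10.0 1.0
```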

“You got it! No, you got it!”

THE DEFENSIVE SPECTRUM

DH → 1B → LF/RF → 3B → CF → 2B → SS → C

• Each position in the spectrum requires more skill than all the positions to its left and less skill than all the positions to its right.

• Because of this, as you move to the right in the spectrum, there are fewer and fewer players that can play each position.

• Partly because of this (and partly for other reasons), players generally hit worse as you move to the right.

• If a player moves from one position to a position to its left, he will be “better” at the new position.

• As players age and their defensive skills deteriorate, they tend to move to the left. Sometimes a team realizes that a young player cannot play his position adequately and moves him to the left.

• Even though catchers are at the far right of the spectrum (and very few people can play the position adequately), moving a catcher to the left will probably not “cause” him to improve defensively. Catcher is a unique position, and you can argue that it does not even belong on the spectrum. Catchers are, however, the worst hitters, at least in recent times (since around 1990).

• Players rarely move to the right in the spectrum, and if they do, it is usually a bad move.

FIELDING METRICS

Fielding Percentage → Range Factor → Range (“Poor Man’s” ZR) → ZR → UZR

(DER, WOWY)

Player #1: So, what is your fielding percentage this year?

Player #2: .978. What is your Range Factor?

Player #1: 3.48 per 9 innings. How about your Zone Rating?

Player #2: It’s .708. How is your UZR?

Player #1: Good. +6 per 150. How’s your WOWY?

Player #2: What the he** are you talking about?

FIELDING PERCENTAGE (FP or FA)

“Oh, crud, I made an error. There goes my fielding percentage!

Maybe I should use a glove next time.”

FIELDING PERCENTAGE

• (Putouts + Assists) divided by Total Chances

• Equivalently, one minus the fraction of a fielder’s total chances that end in errors.

• Total chances = Putouts + Assists + Errors

• FP tells us how good a fielder’s “hands” are. It tells us how “error prone” a fielder is. It tells us nothing about how good a fielder’s range is.

• It is also subject to the whims of the official scorers.

• Range is more important than “hands.”

• All we really care about is how many plays a fielder makes. After all, there is virtually no difference between an error and a hit. If a fielder gets to a ball and commits an error, it is essentially the same thing as if he never got to the ball in the first place.

• If we wanted to (and we probably don’t, since FP is such a bad measure of fielding talent/value), we could convert FP into runs.

• How could we do that?
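One way to answer the question: compare a fielder’s errors with the number an average fielder (league FP at the position) would make on the same number of chances, then multiply the difference by a run value per error. The 0.75 runs per error below is an assumed placeholder, not an established constant, and the function names are hypothetical.

```python
def fielding_percentage(po: int, a: int, e: int) -> float:
    """(PO + A) / (PO + A + E)."""
    return (po + a) / (po + a + e)


def fp_runs_above_average(po: int, a: int, e: int, league_fp: float,
                          run_value_per_error: float = 0.75) -> float:
    """Runs saved (+) or cost (-) by making fewer (more) errors than an average
    fielder would on the same number of chances.

    run_value_per_error is an assumed placeholder, not an established constant.
    """
    chances = po + a + e
    expected_errors = chances * (1 - league_fp)
    return (expected_errors - e) * run_value_per_error


print(round(fielding_percentage(200, 400, 10), 3))           # 0.984
print(round(fp_runs_above_average(200, 400, 10, 0.980), 2))  # 1.65
```

Note that this rewards only sure hands: a fielder who never reaches a ball never errs on it either, which is exactly the blind spot described above.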

RANGE FACTOR

“Oh, yeah, I got range! That’s right!”

RANGE FACTOR

• Range Factor (RF) = (Putouts + Assists) divided by innings played, usually expressed per nine innings: 9 × (PO + A) / innings.

• Better than FP because it gives credit to a fielder for every play he makes and does not penalize a fielder if he makes a lot of errors but gets to a lot of balls that other fielders might not get to.

• It will penalize players who do not make a lot of errors but who don’t get to a lot of balls, as well it should.

Problems with RF

• It does not take into consideration the number of opportunities.

• It treats putouts the same as assists for infielders.

• We can also convert RF into runs saved or cost, or runs above/below average.

• How would we do that?
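The same approach works for RF: compare the fielder’s plays with what an average fielder’s RF would produce over his innings, and multiply by a run value per play. Again, 0.75 runs per play is an assumed placeholder and the names are hypothetical; the result inherits both of RF’s problems listed above.

```python
def range_factor(po: int, a: int, innings: float) -> float:
    """Plays made (PO + A) per nine defensive innings."""
    return 9 * (po + a) / innings


def rf_runs_above_average(po: int, a: int, innings: float, league_rf: float,
                          run_value_per_play: float = 0.75) -> float:
    """Plays above an average fielder's RF over the same innings, times an
    assumed run value per play. Ignores opportunities, like RF itself."""
    expected_plays = league_rf * innings / 9
    return (po + a - expected_plays) * run_value_per_play


print(range_factor(150, 300, 1000))                            # 4.05
print(round(rf_runs_above_average(150, 300, 1000, 3.96), 2))   # 7.5
```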

DAVID GASSKO’S “RANGE,” OR A “POOR MAN’S ZR”

“I am so poor, I can’t even afford a Zone Rating.

I got to get out of the minor leagues and start making some money.”

DAVID GASSKO’S “RANGE,” OR A “POOR MAN’S ZR”

• It determines each fielder’s estimated opportunities by the following:

• It computes the number of balls in play to the outfield and to the infield, based on a team’s pitchers’ outs recorded (3 × IP) minus strikeouts (K), minus outs on the bases, minus double plays (DP).

• If G/F ratios are available for a team or for its pitchers, it separates those BIP into infield (ground balls) and outfield opportunities (fly balls and line drives to the OF). If not, these can be estimated from league averages.

• If the numbers of batters faced (TBF) by left- and right-handed pitchers are known, we can use them to apportion the IF and OF opportunities to each fielder. If not, these can be estimated as well.

• Now that we know, or at least can estimate, each fielder’s opportunities, we can divide each outfielder’s successful plays (catching a fly ball or line drive) by his opportunities, and divide each infielder’s successful plays (turning a ground ball into an out) by his opportunities. We ignore infield pop-ups and line drives.

How do we figure out each fielder’s successful plays?

• For outfielders, it is simple and clean. All putouts are, by definition, successful catches.

• For infielders, other than at 1B, we use assists and ignore putouts, although some assists are on relay throws and some putouts are ground balls without a throw.

• For the first baseman, many ground-ball outs are unassisted putouts (no throw). To make sure we don’t miss those, we can take his total putouts and subtract the assists of the other infielders.

• The final number is successful batted-ball plays (not including pop-ups and line drives for infielders) divided by opportunities. An error is treated like a hit: simply a missed opportunity. This system penalizes a player for making an error or not getting to a ball that was presumably hit in his general location, and gives credit for the opposite.

• That is exactly what a defensive evaluation system is supposed to do!

What are the weaknesses of Range?

• It only estimates, rather than counts, each fielder’s successful plays and opportunities, so it includes some noise.

• It is not really sure how many ground ball outs a first baseman makes.

• It does not know how hard a ball is hit, where it is hit, where the fielder might be playing (based on the batter, outs, and baserunners), and other things.

• These things tend to “even out” in the long run, so over several years, Range is a very good indicator of defensive talent/value.

• How can we turn Range into runs saved or cost?
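A rough Python sketch of the Range steps above, plus one answer to the runs question. Two assumptions to flag: the sketch adds non-HR hits allowed to the batted-ball-out count, so that balls that fell in also count as opportunities (in keeping with treating hits and errors as missed chances), and it leaves the left/right apportioning of opportunities to individual fielders as an input. The 0.75 run value per play and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    """Hypothetical container for the team totals Range needs."""
    ip: float           # team innings pitched (true decimal, e.g. 1450.0)
    k: int              # strikeouts
    outs_on_base: int   # caught stealing, pickoffs, other outs on the bases
    dp: int             # double plays (each adds an out without an extra batted ball)
    hits_in_play: int   # hits allowed minus home runs (assumption, see above)
    gb_share: float     # fraction of balls in play that are ground balls


def estimated_bip(team: TeamSeason) -> tuple[float, float]:
    """Split estimated balls in play into (infield, outfield) opportunities."""
    batted_ball_outs = 3 * team.ip - team.k - team.outs_on_base - team.dp
    bip = batted_ball_outs + team.hits_in_play
    infield = bip * team.gb_share
    return infield, bip - infield


def first_base_grounder_outs(po_1b: int, a_2b: int, a_ss: int, a_3b: int) -> int:
    """1B putouts minus the other infielders' assists ~ his own ground-ball outs."""
    return po_1b - (a_2b + a_ss + a_3b)


def range_runs(plays: float, opportunities: float, league_rate: float,
               run_value_per_play: float = 0.75) -> float:
    """Runs above (+) or below (-) an average fielder given the same opportunities."""
    return (plays - opportunities * league_rate) * run_value_per_play


team = TeamSeason(ip=1450, k=1000, outs_on_base=50, dp=150,
                  hits_in_play=1300, gb_share=0.45)
print(estimated_bip(team))  # (2002.5, 2447.5)
```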

Zone Rating (ZR)

“I’m in the Zone baby!

Maybe next time I’ll be out of the Zone – baby!”

Zone Rating (ZR)

• Uses hit location data.

• Records how many fly balls and line drives are hit in each outfielder’s “zone” and how many of these are turned into outs.

• Does the same thing for infielders and ground balls.

• A fielder’s “zone” is defined as the area of the field around him in which fielders at his position, on average, turn at least 50% of batted balls into outs.

• Revised Zone Rating (RZR) also keeps track of plays made outside of a fielder’s zone.

Weaknesses

• It does not distinguish among balls by location or speed, whether they are hit within a player’s zone or outside it.

• It arbitrarily creates a “zone” for a player.

• It treats an error the same as a hit.

• Like most of the systems so far, it does not account for the position of the fielders.

Again, how can we turn ZR into “runs”?
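A sketch of the zone idea: define a position’s zone from league data as the areas where fielders at that position convert at least 50% of batted balls, measure the fielder’s conversion rate on balls hit into that zone (counting out-of-zone plays separately, as RZR does), and turn the difference from league average into runs. Area labels, function names, and the 0.75 run value per play are hypothetical.

```python
def define_zone(league_balls: dict[str, int], league_plays: dict[str, int],
                threshold: float = 0.5) -> set[str]:
    """Areas where fielders at the position convert at least `threshold` of balls."""
    return {area for area, n in league_balls.items()
            if n > 0 and league_plays.get(area, 0) / n >= threshold}


def zone_rating(balls: list[tuple[str, bool]], zone: set[str]) -> tuple[float, int]:
    """balls: (area, made_by_this_fielder) for batted balls hit while he was in the field.

    Returns (in-zone conversion rate, number of out-of-zone plays made).
    """
    in_zone = [made for area, made in balls if area in zone]
    ooz_plays = sum(1 for area, made in balls if area not in zone and made)
    rate = sum(in_zone) / len(in_zone) if in_zone else float("nan")
    return rate, ooz_plays


def zr_runs(rate: float, league_rate: float, balls_in_zone: int,
            run_value_per_play: float = 0.75) -> float:
    """Runs above (+) or below (-) average on balls in the zone (assumed run value)."""
    return (rate - league_rate) * balls_in_zone * run_value_per_play


# Hypothetical area labels and league counts.
league_balls = {"hole": 100, "up_middle": 100, "shallow_lf": 100}
league_plays = {"hole": 90, "up_middle": 55, "shallow_lf": 20}
zone = define_zone(league_balls, league_plays)   # {"hole", "up_middle"}
balls = [("hole", True), ("hole", True), ("up_middle", False),
         ("up_middle", True), ("shallow_lf", True)]
print(zone_rating(balls, zone))                  # (0.75, 1)
```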

Defensive Efficiency Rating (DER)

“There is no “I” in “teim!”

Or is there?

Anyone want to argue with me?

I didn’t think so!”

Defensive Efficiency Rating (DER)

• This is used only at the team level.

• We can easily figure out how many balls in play (BIP) a team allows.

• DER = outs on non-HR balls in play divided by non-HR BIP.
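DER needs only team totals. A minimal sketch, assuming non-HR balls in play are approximated as PA minus walks, hit batters, strikeouts, and home runs, and that a ball in play is an out unless it went for a hit or (optionally) an error; sacrifices and catcher’s interference get no special treatment. The function name and the numbers in the example are made up.

```python
def der(pa: int, bb: int, hbp: int, k: int, h: int, hr: int, roe: int = 0) -> float:
    """Share of non-HR balls in play that the defense turned into outs."""
    bip = pa - bb - hbp - k - hr          # non-HR balls in play (approximation)
    outs_on_bip = bip - (h - hr) - roe    # those that were not hits or errors
    return outs_on_bip / bip


print(round(der(pa=6200, bb=500, hbp=50, k=1100, h=1400, hr=150), 3))  # 0.716
```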

Weaknesses

• It does not distinguish between ground balls, line drives, and fly balls, and each of these has different out percentages. (If you have batted ball type data or pitcher G/F ratio, you can adjust for this.)

• It does not account for how hard each ball is hit, the exact location, or the position of the fielders – similar to the weaknesses of Range.

• It is only useful for teams and not for individual players, so it is limited as a projection tool.

Without and With You (WOWY)

• Compares the rate at which a fielder makes plays with the rate at which everyone else at that position makes plays, with the same pitchers and in the same park.

• It is “clean.”

• It is easy for everyone to understand.

• A fielder cannot blame anyone or anything else for his bad defense, or give anyone or anything else the credit for his good defense.

• It can be used for other things that are hard to evaluate, like catcher defense, stolen bases against pitchers, double plays, or first basemen’s “scoops.”

• It may suffer from sample size issues. The more you try to control for the context, the smaller the sample size.
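A WOWY sketch in Python: for each context (here a hypothetical (pitcher, park) key), compare the fielder’s plays with what the other fielders at the position converted in that same context, scaled to his chances. Contexts where nobody else played have no “without you” sample and are skipped, which is exactly where the sample-size problem shows up. The data layout and names are assumptions.

```python
from collections import defaultdict


def wowy(chances):
    """chances: iterable of (fielder, context, made_play) at one position.

    Returns {fielder: plays above what the others converted in the same contexts}.
    """
    totals = defaultdict(lambda: [0, 0])       # context -> [chances, plays], all fielders
    by_fielder = defaultdict(lambda: [0, 0])   # (fielder, context) -> [chances, plays]
    for fielder, ctx, made in chances:
        totals[ctx][0] += 1
        totals[ctx][1] += int(made)
        by_fielder[(fielder, ctx)][0] += 1
        by_fielder[(fielder, ctx)][1] += int(made)

    result = defaultdict(float)
    for (fielder, ctx), (n, made) in by_fielder.items():
        others_n = totals[ctx][0] - n
        if others_n == 0:
            continue  # no "without you" sample in this context
        others_rate = (totals[ctx][1] - made) / others_n
        result[fielder] += made - n * others_rate
    return dict(result)


# Hypothetical contexts: (pitcher, park).
data = [("A", ("P1", "park1"), True), ("A", ("P1", "park1"), True),
        ("B", ("P1", "park1"), False), ("B", ("P1", "park1"), True),
        ("C", ("P2", "park1"), True)]
print(wowy(data))  # {'A': 1.0, 'B': -1.0}
```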

Ultimate Zone Rating (UZR)

“You’ve heard of ‘Ultimate Frisbee’?

(Sound of crickets.)

O.K., maybe you haven’t. Maybe I’m stuck in the ’70s. Or maybe you guys are just too darn young!”

Ultimate Zone Rating (UZR)

• Uses detailed hit type and location data.

• Accounts for the position of the fielders.

• Uses park factors.

• Uses the exact hit value for each batted ball.

• Treats errors differently than hits.

• Creates “buckets” based on:

– Type, location, and speed of batted ball.

– Handedness of batters.

– G/F tendency of pitchers.

– Baserunners and outs.

– Park factors for the infield and for each of the three outfield fields (LF, CF, RF).

• For each bucket, calculates the percentage of plays made by every fielder.

• Uses that to compute each fielder’s UZR in runs.
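A much-simplified sketch of the bucket idea, not Mitchel Lichtman’s actual UZR implementation: for each bucket, estimate how often fielders at the position turn the ball into an out (p). A fielder who makes the play earns (1 − p) times the bucket’s run value; when the ball falls for a hit, the fielder at the position is charged p times that value; if another position makes the play, nobody at this position is credited or charged. Errors, park factors, and the makeup of each bucket are left out, and the per-bucket run value (hit minus out) is an assumed input. Names are hypothetical.

```python
from collections import defaultdict


def uzr_sketch(balls, position):
    """balls: list of (bucket, fielder_at_position, caught_by_position_or_None, run_value).

    run_value is the assumed run difference between a hit and an out in that bucket.
    Returns {fielder: runs above (+) or below (-) average}.
    """
    seen = defaultdict(int)
    made = defaultdict(int)
    for bucket, _, caught_by, _ in balls:
        seen[bucket] += 1
        made[bucket] += caught_by == position

    runs = defaultdict(float)
    for bucket, fielder, caught_by, rv in balls:
        p = made[bucket] / seen[bucket]      # share converted by this position
        if caught_by == position:
            runs[fielder] += (1 - p) * rv    # credit for the part an average fielder misses
        elif caught_by is None:
            runs[fielder] -= p * rv          # charge for the share an average fielder converts
    return dict(runs)


# Hypothetical bucket "gb_hole_hard" with an assumed 0.8-run hit/out difference.
balls = [("gb_hole_hard", "X", "SS", 0.8), ("gb_hole_hard", "Y", None, 0.8),
         ("gb_hole_hard", "X", "SS", 0.8), ("gb_hole_hard", "Y", "2B", 0.8)]
print(uzr_sketch(balls, "SS"))  # {'X': 0.8, 'Y': -0.4}
```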

Weaknesses

None!

2008 Red Sox

The Future of Defensive Evaluation

• Hit f/x

• Tracks the position of all fielders before the ball is hit.

• Exact landing location, speed off the bat, and hang time (for fly balls and line drives) of all batted balls.

• Given the exact characteristics of a batted ball and the exact position of a fielder, we can calculate exactly how often an average fielder does or does not successfully field any given batted ball.