Tuesday, October 10, 2006

The Rant

I didn't want to do it. I was saving this up for the offseason. The time, however, has arrived. Baseball is the ultimate game of statistics. It's a series of individual matchups between pitcher and batter that can be nearly completely explained in numbers. Hits, walks, doubles, homers, GIDPs, HBPs, RBI, runs, wild pitches, strikes, balls, pop ups, line drives, errors, etc. It's all out there and somebody is keeping track of it.

When I was growing up, the measure of a hitter boiled down to a few things that were easy to measure and easy to understand. Batting average. It didn't get much simpler than that. The best hitters had the highest batting averages. Number of hits divided by number of at bats. Simple math for even a kid to figure out and it looked great on the back of a baseball card. Home runs. Those were the power hitters. 50 was a magic number back then. RBI. The clutch hitters. They didn't necessarily need the home runs or the high batting average, they got the job done when it mattered. They were the guys you wanted up with the game on the line. These were the numbers that stood the test of time and were cited as evidence of greatness for decades. Pitchers had similar numbers. Wins and losses. Those were the guys that knew how to get the job done. They might give up some runs here or there, but when it mattered they could buckle down and get the win. ERA. These guys didn't even let you score. The best pitchers always had the best record and the best ERA. Hall of Fame credentials? 3000 hits. 500 Home runs. 300 wins. Everybody knew.

Those are the numbers I grew up with. Those were the things I kept track of on baseball cards.

And then a funny thing happened. Some really smart people started looking at numbers. Bill James is widely considered the godfather of it, but he wasn't the first. It started in the 1920s and 1930s, but remained on the fringe. They started coming up with new ways of quantifying a player's contribution to his team. Terms like On Base Percentage (OBP), Slugging Percentage (SLG), Strikeout rate (K/9 IP), Walk rate (BB/9 IP), and a whole host of others started getting tossed around. They used math and computers to analyze teams and see what made them good, bad, or in between. They started tracking statistics to see which ones predicted future performance and which ones seemed to be random from year to year. They made hypotheses and tested them. They turned the analysis of baseball into a science.

The backlash. You're telling me a bunch of number crunching geeks know more about baseball than legends that have been part of the game for decades? Not necessarily. It's just that the human mind is easily fooled by perceptions and habits. Teams won championships and lost heartbreakers and made comebacks and folded under pressure for a long, long time. Clutch hitters came and went. Managers made the same decisions with the same rationale for a century. A .300 hitter was great. A pitcher with 20 wins was great. That's the way it was and that's the way it was going to be.

Change is hard. It's difficult to admit to yourself that the way that you used to see things just isn't right. I was wrong? Yep. But I've listened to some of the smartest men in baseball tell me that. They can't all be wrong, can they? I think a little quote from The Matrix is apt here. "You take the blue pill - the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill - you stay in Wonderland and I show you how deep the rabbit hole goes."

So how far exactly does this rabbit hole go? Let's start with hitters. Batting average has probably been the most widely cited statistic in the past 100 years. But why? It's supposed to tell you how good somebody is at getting hits. What's so special about a hit? It gets you on base. It moves the runners along. Wouldn't it be interesting to look at On Base Percentage as being more meaningful? I mean a walk is almost as good as a hit right? Well, not exactly. Sometimes hits go for extra bases. Sometimes hits drive in a runner from 2nd base. I hear you, but let's leave the extra base hits for later. Back to batting average. Why doesn't it matter? Well, let's go back to that .300 hitter. He's been lauded for years. But what has he really done? Is getting a single 3 times out of 10 at bats really a great performance? I'm going to say no. Not when there are other players that can get a "hit" slightly less frequently but make a much, much bigger impact on the offense. Rationally I think we can all agree that a single, double, triple, and home run are not equally valuable. So why should we treat them equally? On Base Percentage simply takes the number of times a player reaches base (Hits + Walks + Hit By Pitch) and divides it by the number of times a player came up to the plate (At Bats + Walks + Hit By Pitch + Sac Flies) and it simply gives a measure of how frequently a player was successful in reaching base, or in more important terms how often he avoids making an out.
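To make the OBP formula above concrete, here is a minimal sketch in Python. The function name and the sample season line are made up for illustration; only the formula itself comes from the paragraph above.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Times reached base divided by the plate appearances that count for OBP."""
    times_on_base = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sac_flies
    if chances == 0:
        raise ValueError("no plate appearances to measure")
    return times_on_base / chances


# Hypothetical season line, not a real player.
obp = on_base_percentage(hits=150, walks=60, hit_by_pitch=5,
                         at_bats=520, sac_flies=5)
print(f"OBP: {obp:.3f}")
```

Note that the denominator is bigger than at bats alone. That is the whole point: a walk counts as a success here, where batting average pretends it never happened.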

Yeah, yeah, OBP is important. But so what? It still doesn't take into account lots of what hitters do. Well, if they don't just get on base, what do they do? They hit the ball hard or soft. This brings us to Slugging Percentage. What is it? Simply put it's the number of total bases a player gets divided by number of at bats. That's (Singles + 2xDoubles + 3xTriples + 4xHomers) divided by At Bats. It is basically a measure of what a player does when he gets a hit, with the understanding that not all hits are created equal. Not surprisingly it favors players that hit the ball hard.
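The same kind of sketch works for Slugging Percentage. Again, the function name and the sample numbers are invented for illustration; the total bases formula is the one described above.

```python
def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Total bases divided by at bats."""
    if at_bats == 0:
        raise ValueError("no at bats to measure")
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Hypothetical season line, not a real player.
slg = slugging_percentage(singles=100, doubles=30, triples=5,
                          home_runs=20, at_bats=520)
print(f"SLG: {slg:.3f}")
```

Walks don't show up anywhere in this one. SLG only cares about what happens on hits, which is why it makes a natural partner for OBP.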

OK, I get the point. But what about things like RBI? The run producers. The guys that win games. Let's start with a basic premise I think everybody can agree on. When a run is scored (aside from a home run), credit goes to 2 players. One player got on base and the other knocked him home. The relative credit for any run depends on the situation. Driving in a runner from 3rd with nobody out is just a tad easier than driving in a runner from 1st with 2 outs, correct? Another thing to keep in mind is that different hitters will have a different number of RBI opportunities over a season. Alex Rodriguez hit cleanup for the best lineup in the majors this past season. He had guys on base all the time. In fact, in the span of 572 at bats this year he led the majors with 534 runners on base. ARod managed to drive in those runners 86 times. He also knocked himself in 35 times on HRs. Curtis Granderson? He didn't have quite the same luxury as ARod. He got to bat leadoff for the free swinging Tigers. How many RBI chances did he have? Curtis managed to have 596 at bats this year and only came to the plate with 338 runners on base. He managed to knock them in 49 times (plus himself 19 times on HRs). Alex Rodriguez finished the season with nearly twice as many RBI as Curtis Granderson, yet they were quite close in the rate at which they knocked in the runners that were on base for them (roughly 16% versus 14.5%). Interesting, wouldn't you say?
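If you want to check that comparison yourself, here is a quick sketch using only the numbers quoted above. The function and variable names are mine, and "runners driven in divided by runners on base" is just the simplest way to express the idea, not an official stat.

```python
def drive_in_rate(runners_driven_in, runners_on_base):
    """Share of the runners on base that the hitter knocked in."""
    return runners_driven_in / runners_on_base


# Figures quoted in the paragraph above (2006 season).
arod_rate = drive_in_rate(runners_driven_in=86, runners_on_base=534)
granderson_rate = drive_in_rate(runners_driven_in=49, runners_on_base=338)

# Total RBI = runners driven in + the hitter himself on home runs.
arod_rbi = 86 + 35
granderson_rbi = 49 + 19

print(f"Rodriguez:  {arod_rbi} RBI, drove in {arod_rate:.1%} of runners")
print(f"Granderson: {granderson_rbi} RBI, drove in {granderson_rate:.1%} of runners")
```

The RBI totals are miles apart, but the rates differ by less than 2 percentage points. Most of the gap is opportunity and home runs, not some special knack for driving runners in.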

Let's take a look at some examples of players over 600 plate appearances. Player A will be the .315 hitter with a little bit of power. Player B will be more in the .275 range with his batting average, but draws some walks and hits for some power.

Player A (in 600 plate appearances)
570 at bats
30 walks
144 singles
18 home runs
18 doubles

Player B (in 600 plate appearances)
530 at bats
70 walks
85 singles
24 home runs
36 doubles

Player A hit .316 with a little bit of pop. Player B hit .274 with more walks and somewhat more power. But let's take a look at what they look like in terms of on base percentage and slugging percentage.

Player A: OBP - .350, SLG - .442
Player B: OBP - .358, SLG - .477
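For anybody who wants to check the math, here is a short sketch that recomputes all three numbers from the two stat lines above. The dictionary layout and function name are mine. Neither made-up player has any triples, hit by pitches, or sac flies, so those terms are left out.

```python
players = {
    "A": {"at_bats": 570, "walks": 30, "singles": 144, "doubles": 18, "home_runs": 18},
    "B": {"at_bats": 530, "walks": 70, "singles": 85, "doubles": 36, "home_runs": 24},
}


def slash_line(p):
    """Return (batting average, OBP, SLG) for one stat line."""
    hits = p["singles"] + p["doubles"] + p["home_runs"]
    total_bases = p["singles"] + 2 * p["doubles"] + 4 * p["home_runs"]
    avg = hits / p["at_bats"]
    # No HBP or sac flies in this example, so plate appearances = AB + BB.
    obp = (hits + p["walks"]) / (p["at_bats"] + p["walks"])
    slg = total_bases / p["at_bats"]
    return avg, obp, slg


results = {name: slash_line(p) for name, p in players.items()}
for name, (avg, obp, slg) in results.items():
    print(f"Player {name}: BA {avg:.3f}, OBP {obp:.3f}, SLG {slg:.3f}")
```

Player B gives up about 40 points of batting average and still comes out ahead in both of the other columns.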

Interesting, huh? Player A is a much better hitter based on batting average. Player B has a little bit more power and draws a few more walks, but it more than makes up for the batting average difference in both OBP and SLG. But what does that mean? Who cares? The ultimate goal of finding new statistics to analyze baseball performance is to better understand a player's contribution to his team. When it comes to hitting, how many more runs does he help his team score than somebody else would?

Let's step back to team statistics for a moment, shall we? Wouldn't it be interesting to see a comparison of Team Batting Average and compare it to Team On Base Percentage and see how it relates to how many runs teams score? What follows is a list of the top 5 and bottom 5 teams in MLB in runs scored this year along with their team ranks in OBP and BA.

Rank in Runs Scored - TEAM -----Rank in OBP - Rank in BA
1) Yankees - 1st - 2nd
2) Indians - 3rd - 4th
3) White Sox - 8th - 5th
4) Phillies - 6th - 18th
5) Braves - 15th - 13th
26) Padres - 20th - 23rd
27) Brewers - 25th - 27th
28) Cubs - 29th - 17th
29) Pirates - 26th - 22nd
30) Devil Rays - 30th - 30th

Pretty close correlation, but not perfect. Most teams rank about the same in OBP and BA. The only outliers were the Cubs, who were an average team in BA, but 2nd to last in OBP. And they ended up with the 3rd worst offense in the league. The Phillies were the other way. Their batting average was worse than the Cubs, but their OBP was 6th best in the league and helped power them to the 4th best offense in the majors. But OBP can't be the be all and end all, can it? Funny you ask. OBP and SLG measure two different parts of offensive production: how often a hitter gets on base and how much damage he does when he hits. Why not just combine them? That's where we get OPS (On Base Percentage Plus Slugging Percentage). Simply add the 2 together to get a more complete picture of a player's offensive production. Let's do the same exercise as last time, except compare team rank in runs scored with team rank in OPS.

Rank in Runs Scored - TEAM --Rank in OPS
1) Yankees - 1st
2) Indians - 4th
3) White Sox - 3rd
4) Phillies - 5th
5) Braves - 6th
26) Padres - 23rd
27) Brewers - 24th
28) Cubs - 27th
29) Pirates - 30th
30) Devil Rays - 29th

Pretty cool, huh? The top 5 offenses in the majors all finished in the top 6 in OPS. The bottom 5 offenses in the majors all finished in the bottom 8 in OPS. It's actually a phenomenal correlation and it holds up year after year. In 2006, only 1 team scored more than 900 runs (Yankees) and they finished #1 in OPS. Another 12 teams scored at least 800 runs on the season and they all ranked from #2-13 in OPS. Another 15 teams scored at least 700 runs and they ranked from #14-28 in OPS. Only 2 teams failed to score 700 runs and they ranked #29 and #30 in the league in OPS. There isn't a single outlier from that. Did you know that only one offense scored more than 900 runs last year? Yep, the Red Sox finished #1 in the majors in OPS and runs in 2005.
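One rough way to put a number on "how close" is to take the 10 teams listed in the two tables and average the gap between each team's runs rank and its rank in each stat. The sketch below does that. Two assumptions are baked in: the first rank column of the earlier table is read as OBP and the second as BA, which is what the Cubs and Phillies discussion implies, and "average rank gap" is just a back of the envelope measure I picked, not a proper correlation over all 30 teams.

```python
# (team, runs rank, OBP rank, BA rank, OPS rank) for the 10 teams listed above.
# Assumption: in the first table the first rank column is OBP, the second is BA.
teams = [
    ("Yankees",     1,  1,  2,  1),
    ("Indians",     2,  3,  4,  4),
    ("White Sox",   3,  8,  5,  3),
    ("Phillies",    4,  6, 18,  5),
    ("Braves",      5, 15, 13,  6),
    ("Padres",     26, 20, 23, 23),
    ("Brewers",    27, 25, 27, 24),
    ("Cubs",       28, 29, 17, 27),
    ("Pirates",    29, 26, 22, 30),
    ("Devil Rays", 30, 30, 30, 29),
]


def average_rank_gap(column):
    """Mean absolute difference between runs rank and the rank in `column`."""
    gaps = [abs(row[1] - row[column]) for row in teams]
    return sum(gaps) / len(gaps)


gap_obp = average_rank_gap(2)
gap_ba = average_rank_gap(3)
gap_ops = average_rank_gap(4)

print(f"Average gap vs runs rank - BA: {gap_ba:.1f}, OBP: {gap_obp:.1f}, OPS: {gap_ops:.1f}")
```

For these 10 teams the BA rank misses the runs rank by almost 5 spots on average, OBP by 3, and OPS by a little over 1. It's only the top and bottom of the league, so take it as an illustration rather than proof.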

Enough about offense for now. There are far, far, far more things you can get into. Turns out that a better measure than OPS is to give OBP some extra weight (multiply it by something like 1.2) and then add that to SLG, because it turns out that OBP is a little more important than SLG. But it's all semantics once you accept that the things you used to think were important aren't any more. For example, you could take things like OPS and compare it to the home park a player plays in and adjust for a "park factor" that accounts for the difference between pre-humidor Coors Field and Comerica Park. Things like "Runs Created" or "Equivalent Average" are all variations of the same theme of taking into account every result of a player's at bats and quantifying them into how much they help his team.

Pitching you ask? What more could I need to know than wins, losses, ERA, saves, and strikeouts? That's how pitchers have gotten into the Hall of Fame since it was invented. That's how announcers have told me who the great ones were and that's how they got voted into the All Star game.

Let's start with Wins and Losses. I think it's pretty easy to accept the idea that a good deal of a pitcher's record can be accounted for by the team he plays on. If he's getting 9 runs a game in run support, he's going to win a lot of games. If he plays for the Pirates and his team is getting shut out all the time, it might be hard to win a game. It's also important to remember there is a huge variation within a team. Justin Verlander ranked #2 in the majors in run support this year at 6.8 runs per game while his teammate Nate Robertson ranked #67 in the majors at 4.5 runs per game. Small wonder that Verlander went 17-9 and Robertson was 13-13 despite nearly identical ERAs.

ERAs? What's wrong with them? Not much. It's a good stat for measuring how effectively a pitcher prevented runs from scoring. But it brings me to another major point about statistics. There are 2 kinds of stats in my eyes. The first kind quantify what happened. The second kind are more of an indicator of an innate talent and they predict future performance better than the first kind. How does this relate to ERA? Well, the ultimate goal is to prevent runs from scoring, right? How do you do that? Preventing hitters from reaching base would be a good start. There is a simple measure of how many batters a pitcher allows to reach base that is referred to as WHIP and it simply means (Walks + Hits)/(Innings Pitched). Would you believe that WHIP is a better predictor of a pitcher's future ERA than his ERA itself is? Yep. As you could imagine, there are lots of little bits of luck here and there over the course of a season that can impact ERA. WHIP is less impacted by luck and more indicative of a skill. I actually think ERA is a pretty good stat, but it's not the best that is out there and needs to be taken in context.
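WHIP is about as simple as a pitching stat gets. Here is a minimal sketch of the formula above. The function name and the sample pitcher are invented, and the function assumes innings are passed as a real fraction (186 and a third innings as 186 + 1/3, not the box score shorthand 186.1).

```python
def whip(walks, hits, innings_pitched):
    """Walks plus hits allowed per inning pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings pitched must be positive")
    return (walks + hits) / innings_pitched


# Hypothetical pitcher, not a real season line.
sample_whip = whip(walks=50, hits=180, innings_pitched=200)
print(f"WHIP: {sample_whip:.2f}")
```

Runs don't appear anywhere in this one. It only counts how many guys reached base, which is part of why it bounces around less than ERA does.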

If you really want to start looking at pitching stats, take a wander into the land of DIPS. That's Defense Independent Pitching Stats. It operates under the assumption that to rate a pitcher you should only credit him for things he can control. For example, did the pitcher make a good pitch if the hitter hits a rocket line drive right at the 3rd baseman who happens to catch it? There is fairly good evidence that the only things a pitcher can control are his Strikeout Rate, Walk Rate, and HR allowed Rate. There is little to no ability to influence the opponents' batting average on balls in play (BABIP). In other words, the pitcher controls what happens when the defense isn't involved on K's, BB's, and HR's. But all pitchers are the same on every other result from pop up to line drive to grounder and the only influence is a combination of luck and the skill of the fielders behind the pitcher. Now don't get me wrong, it isn't quite perfect. But it's damn close. And it really opens a can of worms when you look at traditional baseball thinking.

I could go on about things like clutch hitters and why that is an antiquated notion, but I'll save that for another day. You can take the blue pill and go back to believing whatever you want or you can take the red pill and see how far the rabbit hole goes. The choice is yours. It took me a while to come around on this myself.
