This post has long been in the works. Not necessarily in the physical sense, but in the mental sense. Ever since reading my first Baseball Prospectus article more than 5 years ago I have longed to have a better understanding of the REAL numbers behind baseball. It's frustrating living in a world full of outdated stats which don't always tell the whole story. None are more frustrating than RBI. I have come up with a way to better compare RBI numbers. I'm going to share this stat with you.
It seems as if almost every year when the discussion for MVP comes up, each candidate's RBI numbers are compared and a winner is determined by whoever has the most. This seems unfair to me. Not every player has the same opportunities to drive in runs. Why penalize a player for having ~5 fewer RBI when he had the opportunity to drive in 1000 runs, whereas the player who had ~5 more RBI had the opportunity to drive in 1150 runs? These players are clearly not being compared on an even plane. Think about it like this: When deciding a Cy Young Award winner, would you take into account the fact that one pitcher started 3 fewer games? I would. Now, this is slightly different because in a pitching comparison you may take into account the fact that the better pitcher was able to start more games, either by not being injured or not being in the minors, etc. When comparing RBI opportunities, however, both players may have an equal number of games played and even plate appearances; one of the players simply had an opportunity for more RBI because of his position in the batting order and the success of his teammates.
In order to alleviate the problem that different RBI opportunities create, I propose a new stat. I'm going to call it RBI Efficiency, at least for now. It is a stat that can show how successful a player is at driving in runs compared to the opportunities that he has to drive in runs. This number can be shown as a decimal, but I prefer to show it as a percentage. This percentage literally says that player X will drive in a run XX% of the time he has the opportunity to drive in a run.
The calculation of this stat is actually fairly simple, but I don't believe anyone has come up with it yet due to a lack of statistical information being readily available until recently. www.Baseball-Reference.com has a link to batting split-stats for every major league player, broken down by year. This information is vital to calculate RBI-e. By looking at how many plate appearances each player had in different RBI situations, and then weighing how successful they were versus how successful they could have been, you can come up with their RBI-e.
Later in this post I will break down the 2010 Royals RBI-e, as well as a few other players in the AL for the 2010 season, but for now I will break down how this stat is calculated.
Every time a player comes to bat, he has an opportunity to drive in at least 1 run. This is true no matter the game situation, no matter the outs, and no matter the inning. If the bases are empty, the player has the "opportunity" to drive in 1 run. That is to say, in a perfect (offensive) world, each player would come to bat and hit a solo home run. The goal of baseball is to score runs, and this is the simplest, most efficient way of doing so. Since that player has the "opportunity" to drive in 1 run, all of his RBI with the bases empty are measured against the number of plate appearances he had. For instance, in 2006, Mark Teahen had 234 plate appearances with the bases empty. He had 8 RBI in such situations; therefore, when calculating his RBI-e you start with his efficiency being 8/234. In reality, a very small percentage of runs are scored this way, and so a player's true efficiency will not be well reflected by his efficiency with the bases empty.
As we continue to determine a player's RBI-e, I find it helpful to create a small chart that looks something like this:
         PA    RBI   e
empty    234   8     8/234
1 on     xxx   xx    xx/xxx
2 on     xxx   xx    xx/xxx
full     xx    xx    xx/xx
total:   xxx/xxxx    e: xx.xx%
Continuing with our calculation of Mark Teahen's RBI-e in 2006, he had 146 plate appearances with 1 man on. In those situations (with a man on 1st, 2nd, or 3rd) he had the opportunity to drive in 292 runs (again, think of ultimate success as a home run, in this case, a 2-run shot). He drove in 35 runs. His efficiency with 1 man on was 35/292, giving him a running total of 43/526.
Mark had 51 plate appearances with 2 men on. He could have hit 51 three-run home runs, meaning he had the "opportunity" to drive in 153 runs. He drove in 22 runs. This brings his running total to 65/679. Mark had 8 plate appearances with the bases loaded. He had the opportunity for 8 grand slams, for a total of 32 RBI. He drove in 4 runs for an "e" of 4/32.
Mark Teahen's RBI-e in 2006 was 69/711, 0.0970, or 9.70%. This is to say Mark Teahen drove in 9.70% of the runs that he had the opportunity to drive in. If we take a look back at our chart from earlier, and finish filling it out, it looks like this:
         PA    RBI   e
empty    234   8     8/234
1 on     146   35    35/292
2 on     51    22    22/153
full     8     4     4/32
total:   69/711   0.0970   RBI-e=9.70%
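For anyone who would rather let a computer fill in the chart, here is a minimal Python sketch of the same arithmetic. The names (Split, rbi_efficiency) are made up for illustration and don't come from any stats package, and the numbers are typed in by hand from the chart above rather than pulled from Baseball-Reference.

```python
from dataclasses import dataclass


@dataclass
class Split:
    """One row of the chart: a base state and what the hitter did in it."""
    runners_on: int   # 0 = bases empty, 3 = bases loaded
    pa: int           # plate appearances in this base state
    rbi: int          # RBI credited in this base state

    @property
    def opportunities(self) -> int:
        # A home run is treated as the best case, so each plate appearance
        # is worth (runners on base + the batter himself) potential RBI.
        return self.pa * (self.runners_on + 1)


def rbi_efficiency(splits: list[Split]) -> float:
    """Total RBI divided by total RBI opportunities, as a fraction."""
    total_rbi = sum(s.rbi for s in splits)
    total_opps = sum(s.opportunities for s in splits)
    return total_rbi / total_opps


# Mark Teahen, 2006, using the figures from the chart above.
teahen_2006 = [
    Split(runners_on=0, pa=234, rbi=8),
    Split(runners_on=1, pa=146, rbi=35),
    Split(runners_on=2, pa=51, rbi=22),
    Split(runners_on=3, pa=8, rbi=4),
]

print(f"RBI-e = {rbi_efficiency(teahen_2006):.2%}")  # prints "RBI-e = 9.70%"
```

Swapping in another player's four split lines is all it should take to get his RBI-e.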
No stat has meaning without understanding how different players compare, but before I break down other players, I'd like to discuss what this stat is meant to do, and its flaws. This stat is NOT meant to determine how well a player did compared to how well an average player might do. To better understand what I mean by this, consider the following. I lump together all of a player's plate appearances with one man on. This may seem counterintuitive at first, because a player is expected to drive in many more runs with a man on third and no outs than he is expected to drive in with a man on first and 2 outs. However, this stat is not meant to weigh individual expectancies, it is simply a tool to help measure different players' efficiency when it comes to driving in runs. You may argue that this puts certain players at a disadvantage, especially those that do not hit home runs very often. I would agree, however, as I discussed earlier, in a perfect world, a home run would be the result of every at bat, and so by putting players who do not hit home runs at a disadvantage, we are simply viewing them in a truer sense of their value when compared to other players.
That being said, let's look at a breakdown of the 2010 Royals. There were 10 players who drove in at least 30 runs in 2010. Those players were:
78 Billy Butler
78 Yuniesky Betancourt
77 Jose Guillen
56 Alberto Callaspo
51 Scott Podsednik
43 Wilson Betemit
39 Mitch Maier
37 Jason Kendall
37 David DeJesus
32 Mike Aviles
When you look at RBI totals alone, Butler, Betancourt and Guillen were clearly the standouts. Let's take a look at their RBI-e now.
8.27% Wilson Betemit
8.02% Jose Guillen
7.88% Yuniesky Betancourt
6.98% Billy Butler
5.99% David DeJesus
5.75% Alberto Callaspo
5.71% Mitch Maier
5.65% Scott Podsednik
4.60% Mike Aviles
4.70% Jason Kendall
As you can see, the names at the bottom of the list probably don't surprise you, except maybe Aviles. However, Wilson Betemit being at the top came as a bit of a surprise to me. Betemit drove in 43 runs last season, but he only had the opportunity to drive in 520, fewer than half as many as Billy Butler had the opportunity to drive in. Wilson Betemit did a better job of utilizing his RBI opportunities than Butler did, and, theoretically, if Betemit had as many opportunities as Butler, he would (should) have driven in more runs. Butler's efficiency is simply not as good as it could have been. In order to gain more perspective let's take a look at how Butler has been since reaching the majors in 2007.
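To put a rough number on that "would (should)", here is a small Python sketch that scales Betemit's RBI up to Butler's workload. It assumes Betemit's RBI-e would hold perfectly steady over more than twice the opportunities, which is a big assumption. The function name is made up for illustration, and 1117 is Butler's 2010 opportunity total as quoted later in this post.

```python
def projected_rbi(rbi: int, opportunities: int, target_opportunities: int) -> float:
    """Scale a player's RBI to a different number of opportunities,
    assuming his RBI-e stays exactly the same."""
    return rbi / opportunities * target_opportunities


# Betemit in 2010: 43 RBI in 520 opportunities.
# Butler in 2010: 1117 opportunities (figure quoted later in the post).
betemit_at_butler_volume = projected_rbi(43, 520, 1117)

print(f"{betemit_at_butler_volume:.1f}")  # prints "92.4"
```

That works out to about 92 RBI, against the 78 Butler actually drove in.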
Billy Butler RBI-e
2007 8.74%
2008 7.30%
2009 8.62%
2010 6.98%
As you can see, these numbers bounce around a little bit; he seems to have had some success his rookie year, regressed a bit in his sophomore year and then taken another step forward in 2009. However, last year he regressed a bit again, at least when looking at his ability to drive in runs. Let's take a look at another fairly young player with a similar number of plate appearances in his career.
Evan Longoria RBI-e
2008 9.80%
2009 10.03%
2010 9.23%
Clearly, Longoria does a better job of utilizing his opportunities to drive in runs. In fact, in 2010, Longoria had the opportunity to drive in 1127 runs. Butler had the opportunity to drive in 1117 runs. Butler drove in 78, while Longoria drove in 104. Now that we have a better understanding of efficiency, we can determine that while Butler drove in 26 fewer runs, he had only 10 fewer "opportunities" to drive in a run. When you combine both of these numbers you can see that he was simply not as good as Longoria at driving in runs.
Just for fun, let's take a look at Miguel Cabrera's 2010 season (2010 AL RBI leader). Miguel had the "opportunity" to drive in 1127 runs, which is the exact same number that Evan Longoria could have driven in. However, Miggy drove in 126 runs, while Longoria drove in 104. Miggy's RBI-e, then, was 11.18%. He was clearly better than Longoria.
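Here is the same three-way comparison as a short Python sketch, using only the RBI and opportunity totals quoted above. The variable and function names are just placeholders for illustration.

```python
# (RBI, opportunities) for 2010, as quoted in this post.
players_2010 = {
    "Billy Butler": (78, 1117),
    "Evan Longoria": (104, 1127),
    "Miguel Cabrera": (126, 1127),
}


def rbi_e(rbi: int, opportunities: int) -> float:
    """RBI-e as a fraction: runs driven in per run available."""
    return rbi / opportunities


# Rank from most to least efficient.
ranked = sorted(
    players_2010.items(),
    key=lambda item: rbi_e(*item[1]),
    reverse=True,
)

for name, (rbi, opps) in ranked:
    print(f"{name:<15} {rbi:>3}/{opps}  {rbi_e(rbi, opps):.2%}")
```

With nearly identical opportunity totals, the ranking by RBI-e lines up with the ranking by raw RBI here; the stat earns its keep when the opportunity totals are far apart.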
I hope this stat will be useful for helping to determine not only players' seasons as they compare to other players, but also as they compare to their own careers. This stat is not meant to be an end-all, be-all kind of stat, and I doubt you'll find it on the back of a baseball card anytime soon. But it can be a useful tool when comparing different players, especially when looking at players who bat in different spots in the lineup and who play for teams with largely different levels of runs scored. Thanks for reading.
"For instance, in 2006, Mark Teahen had 234 plate appearances with the bases loaded." I think you meant bases empty? That confused me for about a minute lol.
I think this stat is really cool! :D It is interesting that no one made it before, but I do think there is a big difference between a man on 1st & 2 outs and a man on 3rd & no outs. But it's still cool, thanks for telling me about it! :)
You wanted me to help edit your blogpost, right? Well here goes...
"Why penalize a player for having ~5 less RBI..." shouldn't it be fewer, not less?
"However, when comparing RBI opportunities, both players may have an equal number of games played and even plate appearances, however, one..." OK first of all it's weird to use "however" twice in one sentence, & the second one needs a semicolon.
"He had 8 RBI in such situations, therefore, when..." you need a semicolon here too.
"In reality, a very small percentage or runs..." you mean of, not or, right?
"PA RBI e
empty 234 8 8/234
1 on xxx xx xx/xxx
2 on xxx xx xx/xxx
full xx xx xx/xx
total: xxx/xxxx e: xx.xx%"
^this whole thing is extremely hard to read. maybe insert a table (if possible) or change to a monospace font like courier so it lines up like a grid? same goes for all the other charts.
"8.27% Wilson Betemit
8.02% Jose Guillen
7.88% Yuniesky Betancourt
6.98% Billy Butler
5.99% David DeJesus
5.75% Alberto Callaspo
5.71% Mitch Maier
5.65% Scott Podsednik
4.60% Mike Aviles
4.70% Jason Kendall"
is this supposed to be in order from greatest to least? is Jason Kendall out of order or is there a typo?
"As you can see, these numbers bounce around a little bit, he..." should be a dash or semicolon instead of a comma...as you might have noticed, incorrect comma usage is one of my pet peeves :)
"Now that we have a better understanding of efficiency, we can determine that while Butler drove in 26 fewer runs, he had 100 less 'opportunities' to..." again, should be fewer not less
hope that helps :) haha
I was just searching for this type of stat and came across this article. Great post. I can't believe this information (RBI/opps) isn't easier to find.
I have a running argument with my college teammates about RBI being worthless in their current form. So much of it depends on the team you are on, or the spot in which you hit in the lineup. This is not to mention the speed/skill of the base runners around you. For the sake of simplicity let's say that Player A triple slashed .300/.400/.500 hitting in the 3rd spot for the 2012 Mariners while Player B triple slashed .250/.350/.450 in the middle of the 2012 Tigers lineup. Good chance that Player B will end up with more RBI than Player A even though he was a worse hitter. Just a few of the reasons that RBI-e is the only factor that should be taken into account.
I was wondering if some work has been done using Tango's weights with RBI efficiency. This is important because not all "one runner on" situations are equal. Great work on this post!