Using Pitcher Hitting to Build a Time Machine



Details of how the calculations were done

After I posted the article on Nolan Ryan, there was some good discussion in the Reader Posts section of the Bill James site. The Ryan methodology wasn't 100% serious, just the product of curiosity and the availability of Retrosheet data. I knew going in that even if the average batting line in 2020 looks sort of like Dave Kingman's, the two are not really equivalent. Dave Kingman struck out nearly 30% of the time while mostly facing pitchers who threw 85-88 MPH, with an occasional Nolan Ryan or Tom Seaver mixed in. Today's hitters face average velocity around 93 MPH. In addition, they are much more likely to face a fresh reliever than a starter working his 3rd or 4th time through the order. Kingman facing today's pitching would probably strike out around 40 percent of the time, almost as badly as he and the other batters in that study did against Nolan Ryan.

Fine, but how can we tell how much better the batters of today are? Guy123 in that thread brings up pitcher hitting. I've known Guy online for many years, from TangoTiger.com and Baseball-reference.com, and this is not the first time we've discussed this idea. But I've gone further with the research this time around. Basically, pitchers are not selected to play because of their hitting ability. No manager has ever pondered replacing his struggling #5 starter and said, "well, he's giving up too many runs, but he's a much better hitter than most pitchers, so we'll give him a few more tries." In the early years of baseball, pitchers hit much better than they do today (well, than they did in 2019 - for all I know, the last pitcher to swing a bat has already done so). Over time, the declining pitcher hitting stats do not represent pitchers getting worse as hitters. In fact, they are the same. The everyman. The constant. The stats only change because of changes around them - the pitchers they face have become a lot tougher to hit.

When I first heard about this theory - that we could actually use the change in pitcher hitting to quantify pitcher improvement over time - I was skeptical. What about the DH? Maybe the theory held through 1972, but after that half the pitchers don't get to hit. Furthermore, they don't hit much in the minor leagues. In recent years in the minors, pitchers only went to the plate if A) they were in AA or AAA and B) they played for a National League farm club against another National League farm club.

Wouldn't that imply a huge barrier to pitcher hitting, and make pitchers worse hitters regardless of the competition they faced? A reasonable example would be a pitcher who was a good hitter in high school, playing short or the outfield when he wasn't on the mound, and who then focused 100% on pitching through three years of college. Let's say this guy is not a top prospect who rockets his way through the minors. He spends five years in the minor league system of an AL team before being traded to a National League team. He then gets the call and a shot in the starting rotation. All of a sudden, he is facing major league pitching as a batter, despite not having had any reason to swing a bat for the last eight years!

But also consider the case of Jon Lester. He may or may not have been a good hitter in high school; I have no idea. He signed with the Red Sox out of high school and didn't bat at all in the minors from 2002 to 2006. When he made the majors, he had David Ortiz batting in the lineup instead of him most of the time; he only swung the bat in at most a few interleague games each year. He was hopeless at the plate, 0 for 36 through nine seasons. When he was 31, the Cubs signed him to be their ace and to lead the rebuilt team to a world championship, a plan that actually worked. Now he would have to bat 60 times a year instead of 4. He continued to struggle for a while. He was 0 for 30 as a Cub, extending the hitless streak that started his career to 66 at bats, when he finally got a lucky single. From that point forward he was a .135 hitter, which was slightly better than the overall pitcher average from 2015 to 2019. He even cracked a homer per year from 2017 to 2019. That's 12 years off from hitting, and a half-season crash course in trying not to embarrass himself out there.

Anyway, the data don't show an acceleration of the decline in pitcher hitting after the DH. In 1972, pitchers hit .146/.185/.184. As recently as 2007 they hit .146/.178/.188. Most of the decline in pitcher hitting happened before the 1960s or within the last 10 years; the overall pitcher batting line for 2011 looks very similar to the one from 1963. From 2011 to 2019 the pitcher batting average dropped from .141 to .128, and it was only .115 in 2018. Over the same span, the pitcher-batter strikeout rate increased from 38 percent to 49 percent. This is probably just due to pitchers' improved abilities on the mound. The average fastball was 93.1 MPH in 2019, a full mile per hour faster than eight years earlier. In addition, more pitchers are optimizing their spin rates and have better information about which pitches work best together. The result: more swings and misses, and an especially difficult task for the weakest batters in the game - the pitchers themselves.

So, if we can accept that pitchers as batters represent a usable baseline, here are the next steps I used to build a Flux Capacitor that can tell me how any batter at any point in time would do against the pitchers and game conditions of other eras. First, I broke the batting lines into component rates that are independent of each other (a code sketch of the decomposition follows the list). You won't find slugging percentage here, as that is a combination of different skills.

  1. HBP - I didn't do anything with this, as it's an infrequent occurrence. I just assumed a HBP in one era is a HBP in any other.
  2. BB/PA - Does this PA end in a walk?
  3. SO/AB - If not a walk, does the batter strike out?
  4. HR/(AB-SO) - If he makes contact, does the ball leave the yard?
  5. (H-HR)/(AB-SO-HR), or BABIP - If the ball is put in play, does he end up with a hit?
  6. (2B+3B)/(H-HR) - If he gets a hit, is it an extra base hit?
  7. 3B/(2B+3B) - If it's an extra base hit, does the batter make it to third?
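
To make the decomposition concrete, here is a minimal Python sketch. The function name and dictionary keys are my own illustration; HBP is handled separately per item 1, and sacrifices are ignored for simplicity.

    def component_rates(pa, ab, h, doubles, triples, hr, bb, so):
        contact = ab - so                                # at bats ending in contact
        return {
            "bb_rate":  bb / pa,                         # 2. BB/PA
            "so_rate":  so / ab,                         # 3. SO/AB
            "hr_con":   hr / contact,                    # 4. HR/(AB-SO)
            "babip":    (h - hr) / (contact - hr),       # 5. (H-HR)/(AB-SO-HR)
            "xbh_rate": (doubles + triples) / (h - hr),  # 6. (2B+3B)/(H-HR)
            "3b_rate":  triples / (doubles + triples),   # 7. 3B/(2B+3B)
        }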

Now I can transform these rates using the odds ratio to see how a player would do in a different environment, then take the transformed rates and, step by step, recombine them into a new stat line (a sketch of the recombination follows).
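
The recombination just inverts the decomposition, working down the list; a minimal sketch, assuming PA and HBP carry over unchanged and leaving fractional counts unrounded:

    def recombine(pa, hbp, r):
        bb = r["bb_rate"] * pa                   # 2. walks from BB/PA
        ab = pa - bb - hbp                       # ignoring sacrifices, as above
        so = r["so_rate"] * ab                   # 3. strikeouts from SO/AB
        contact = ab - so
        hr = r["hr_con"] * contact               # 4. homers from HR/(AB-SO)
        h = hr + r["babip"] * (contact - hr)     # 5. add hits on balls in play
        xbh = r["xbh_rate"] * (h - hr)           # 6. doubles plus triples
        triples = r["3b_rate"] * xbh             # 7. triples among extra base hits
        return {"AB": ab, "H": h, "2B": xbh - triples, "3B": triples,
                "HR": hr, "BB": bb, "SO": so}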

Here's an example of the odds ratio. Say a batter strikes out 20% of the time in a league where pitchers, as batters, strike out 35% of the time. How would he do in a league where pitchers strike out 45% of the time?

New K rate = (.2 * .45 / .35) / ((1-.2)*(1-.45)/(1-.35) + (.2 * .45 / .35)) = .28

Note: Formula corrected, originally published with an error.
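
In code, the translation is a small helper; this sketch just reproduces the example above:

    def odds_ratio(player, new_league, old_league):
        # Scale the player's odds by the ratio of the two league environments.
        num = player * new_league / old_league
        den = (1 - player) * (1 - new_league) / (1 - old_league)
        return num / (num + den)

    print(odds_ratio(0.20, 0.45, 0.35))   # 0.2753, which rounds to .28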

As alternatives, you could use the additive or multiplicative methods. For additive, the player is moving to a league where pitchers strike out 10 percentage points more often, so he now strikes out 30%. For multiplicative, you would get .2 * .45 / .35 = .26. The odds ratio will give you something in between those two methods, but unlike them it will never give you a nonsensical result (like a negative strikeout rate, or one greater than 1). I tried several historical examples of groups of batters facing a pitcher with a very high strikeout rate - far greater than the league average. In addition to Ryan, I looked at Bob Feller, Dazzy Vance, and Sam McDowell. In each case, the result comes in somewhere between the additive and multiplicative models.
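
A quick numeric check of that claim on the strikeout example, reusing the odds_ratio helper from the sketch above:

    p, old, new = 0.20, 0.35, 0.45
    additive = p + (new - old)          # 0.30
    multiplicative = p * new / old      # about 0.257
    between = odds_ratio(p, new, old)   # about 0.275, between the other two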

I did run into a problem with using pitcher-batter rates for timelining. It works reasonably well with strikeout rates, walk rates, and BABIP. But the big problem is home runs. Pitchers had a HR/CON (home runs per contact) rate of .011 in 2019, one of the highest rates ever. In 1920, the rate was .0034. So let's plug Babe Ruth in here. Babe hit 54 home runs on 378 contacts, a rate of .143. Plug that into the odds ratio and we get .352. Let's say his 80 strikeouts turn into 200, so he only makes contact 258 times. That's still 91 home runs. It's not as bad as "Ruth out-homered all the AL teams in 1920, so put him in 2019 and he'd hit at least 308, or one more than the Minnesota Twins," but it's not good, so I made some edits. The league-leading home run totals have not changed all that much over the last 100 years, though in some years it is easier to hit homers than in others. 1987 was a famous outlier year, and beginning in mid-season 2015, home run rates changed dramatically, with a lot of attention paid to changes in the baseball. I ended up using an edited rate of .0065 for 1920. The edited rates end up not showing any surprise high home run seasons: translating every batter into 2019, Johnny Mize ends up with 70, a nice boost from the 51 he hit in 1947, but the other three in the top four are Bonds, McGwire, and Sosa.
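
Working through the Ruth numbers with the same odds_ratio helper (the roughly-57 figure under the edited rate is my own arithmetic from the numbers above, not a total reported in the study):

    ruth = 54 / 378                            # .143 HR per contact in 1920
    raw = odds_ratio(ruth, 0.011, 0.0034)      # .352; times 258 contacts is ~91 HR
    edited = odds_ratio(ruth, 0.011, 0.0065)   # .221; times 258 contacts is ~57 HR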

For the other rates, I generally used a smoothing process, since year-to-year rates can be quite volatile. For example, the rate I used for 2010 is a 1-2-3-2-1 weighted average of the 2008-2012 league rates:

rate(2010) = (rate(2008) + 2*rate(2009) + 3*rate(2010) + 2*rate(2011) + rate(2012)) / 9
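
The same smoothing as a sketch; the function name and the year-to-rate mapping are my own illustration:

    def smoothed(rates, year):
        # rates maps a year to that year's league rate; the weights sum to 9.
        window = [(year - 2, 1), (year - 1, 2), (year, 3), (year + 1, 2), (year + 2, 1)]
        return sum(rates[y] * w for y, w in window) / 9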

To be continued - the results

It will take me some time, but I'll post some results. If you leave a request in the thread I'll add to the Bill James Reader Posts, I may run it. I have some other ideas, but really, I've opened up so many possibilities to post that it's hard to know where to focus. I will tell you that no matter how I do it, Bonds is the best hitter ever to play the game. That's taking his stats as recorded; I don't have a method to take out the steroids.

Some ideas I have:

  • Ruth vs Williams
  • Mantle vs DiMaggio
  • Ripken vs Wagner
  • Could Rabbit have played at all today?
  • Then how could Ty Cobb be the second best offensive player ever?


