– The K Zone –

Exploring the Crossover Effect, by Ian Joffe

February 25th, 2019

It is a well-documented fact that Joey Votto is one of my favorite baseball players. I wrote my very first opinion article about how good he really was, and I have drafted him in fantasy baseball for several years in a row. However, this year my seemingly everlasting love for the Reds’ first baseman hit a snag. Votto’s home run power plummeted in 2018 to 12 total bombs, his lowest full-season total ever, yet he maintained his regularly high average exit velocity (88.1 mph) and launch angle (13.3 degrees). His line drive rate (31.4%) also remained exceptional. My first thought was that Votto was having a lot of near misses: balls that were hit hard but died on the warning track. But the Statcast data contested that theory too, as his barrel rate of only 6.7% matched his low home run total.

So, Votto had his normal high average exit velocity and strong launch angle, yet he was rarely getting barrels, which are defined as a combination of the two. My theory became that he was still hitting balls hard and still hitting balls high, but in 2018 those types of hits did not coincide on the same at-bats. He had a lot of soft flyouts and a lot of hard groundouts, but few well-hit balls angled for the stands. At first thought, one would think those two events (hitting balls hard and hitting balls high) are independent. In other words, doing one does not make the other more likely on any given at-bat. If this were the case, then Votto would be a victim of bad luck. One could expect his hard hits to coincide with his high hits at a normal rate again next season, and we could treat 2018’s lack of overlapping hard and high hits like a low BABIP, which will regress towards a mean. However, it is also possible that the two events are dependent, and that certain types of players are better at doing both at once than others. In that case, it is possible that Votto has experienced a legitimate decline in his skill level.

To test whether the events were independent or not, I examined data from 332 hitters who had at least 150 balls in play in 2018. The goal was to examine how often their hard hits and high hits actually coincided, versus how often they should have coincided if the two were independent, and to test whether those numbers differed by a meaningful margin. For this study, I looked at a statistic that I am calling crossover (CR), which is defined as a batted ball hit with at least 99 mph of exit velocity and at least 22 degrees of launch angle. It’s similar to a barrel, but a little less complicated. Barrels did not work for my purpose because their required launch angle differs based on exit velocity. The numbers 99 and 22 are admittedly somewhat arbitrary, but were decided upon by looking at where the distribution of home runs started to accelerate. Crossover rate, or CR%, is defined as crossovers divided by crossover opportunities. A player’s crossover opportunities, in turn, are the sum of his hard hit balls and high hit balls, minus crossovers (so that each crossover only counts as one batted ball). The league average CR% was 13.1%, and Joey Gallo led the league at 43.7%, more than 10 points higher than the next best, Tyler Austin at 32.5%. From there, a right-skewed distribution starts:
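As a rough sketch of the definition above, here is how CR% could be computed in Python. The `BattedBall` class, the `crossover_rate` function, and the sample batted balls are all made up for illustration; only the 99 mph and 22 degree thresholds come from this article.

```python
from dataclasses import dataclass

HARD_HIT_MPH = 99.0  # exit-velocity threshold used in this article
HIGH_HIT_DEG = 22.0  # launch-angle threshold used in this article


@dataclass
class BattedBall:
    exit_velocity: float  # mph
    launch_angle: float   # degrees


def crossover_rate(balls):
    """Return (crossovers, opportunities, CR% in percent) for a list of batted balls."""
    hard = sum(b.exit_velocity >= HARD_HIT_MPH for b in balls)
    high = sum(b.launch_angle >= HIGH_HIT_DEG for b in balls)
    crossovers = sum(
        b.exit_velocity >= HARD_HIT_MPH and b.launch_angle >= HIGH_HIT_DEG
        for b in balls
    )
    # hard + high counts every crossover twice, so subtract it once.
    opportunities = hard + high - crossovers
    rate = 100.0 * crossovers / opportunities if opportunities else 0.0
    return crossovers, opportunities, rate


# Hypothetical batted balls, invented purely for illustration.
sample = [
    BattedBall(104.2, 27.0),  # hard and high: a crossover
    BattedBall(101.5, 5.0),   # hard only
    BattedBall(88.0, 35.0),   # high only
    BattedBall(92.3, 10.0),   # neither
    BattedBall(99.0, 22.0),   # exactly on both thresholds: a crossover
]
crossovers, opportunities, cr_pct = crossover_rate(sample)
print(crossovers, opportunities, cr_pct)
```

In this toy sample there are three hard hits and three high hits, two of which overlap, so there are four opportunities and two crossovers.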

[Figure: distribution of CR% across qualifying 2018 hitters]

Next, I made a formula to determine the expected crossover rate of every player based on his hard hit rate and high hit rate. A player’s total expected crossovers (xCR) is the product of his hard hit rate and his high hit rate, times his number of balls in play. To find expected crossover rate (xCR%), put xCR over the sum of hard hits and high hits minus xCR, just as with the observed CR%. My final statistic was CRd, or crossover differential. CRd is defined as CR% minus xCR%, times 100 (to make it more readable). A positive CRd indicates that a player had more crossovers than expected, and a negative CRd indicates that a player had fewer crossovers than expected. A CRd of 0.0 means that the player’s crossover rate is the same as the expected rate. Here is the distribution of CRd:
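The formulas above can be sketched in a few lines of Python. The function name and the player counts below are hypothetical; the one real idea is that, if hard hits and high hits were independent, the chance of both happening on one batted ball would simply be the product of the two rates.

```python
def crossover_differential(balls_in_play, hard_hits, high_hits, crossovers):
    """Return (CR%, xCR%, CRd). Assumes at least one hard or high hit."""
    hard_rate = hard_hits / balls_in_play
    high_rate = high_hits / balls_in_play
    # Under independence, P(hard and high) = P(hard) * P(high).
    expected = hard_rate * high_rate * balls_in_play  # xCR
    # Both rates are already expressed in percent here, so their
    # difference is the CRd in points with no extra "times 100" step.
    cr_pct = 100.0 * crossovers / (hard_hits + high_hits - crossovers)
    xcr_pct = 100.0 * expected / (hard_hits + high_hits - expected)
    return cr_pct, xcr_pct, cr_pct - xcr_pct


# Hypothetical season line, invented for illustration:
# 400 balls in play, 120 hard hits, 100 high hits, 40 crossovers.
cr_pct, xcr_pct, crd = crossover_differential(400, 120, 100, 40)
print(round(cr_pct, 1), round(xcr_pct, 1), round(crd, 1))
```

For this made-up hitter, independence would predict 30 crossovers, so 40 actual crossovers yields a positive CRd.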

[Figure: distribution of CRd across qualifying 2018 hitters]

Interestingly, the league average value was -2.3. The league leader in CRd was, once again, Joey Gallo with an astronomical 18.1, with Tyler Austin next at 10.5. After Austin came a new name, Matt Joyce, at 10.4. At the bottom of the charts was Yuli Gurriel, at -13.0, followed by Jose Bautista at -12.6.

The next step was to determine whether having a higher CRd than expected meant that a player was lucky or that a player was skilled. To do this, I analyzed how consistent CRd was between the two halves of the season. If a player’s first half CRd was predictive of his second half CRd, it could be legitimate skill. If it was not, CRd is more likely due to luck. Here is the scatter plot comparing the two halves for players who had sufficient balls in play in each:

[Figure: scatter plot of first-half CRd versus second-half CRd]

From that plot, it looks like there is a very significant correlation between crossover differentials in each half. The statistics back the eye test up, as the graph produces a resounding r value of 0.51 and a p-value just over 10^-12, meaning a correlation this strong would essentially never appear if crossover differential were entirely luck. In fact, this makes crossover differential seem like even more of a controllable, intentional skill than a peripheral like hard hit rate itself, which has an r value of 0.33.
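For anyone who wants to run the same kind of check, here is a minimal sketch of a half-to-half correlation in plain Python. The CRd values below are invented and will not reproduce the 0.51 figure, and `pearson_r` and `t_statistic` are illustrative helper names. In practice, a library routine such as `scipy.stats.pearsonr` returns both the r value and the p-value directly.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two lists of equal length with at least 3 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical first-half and second-half CRd values for five players.
first_half = [-4.0, -2.0, 0.0, 2.0, 4.0]
second_half = [-2.0, -4.0, 2.0, 0.0, 4.0]

r = pearson_r(first_half, second_half)
t = t_statistic(r, len(first_half))
print(round(r, 2), round(t, 2))
```

The larger the t statistic for a given number of players, the smaller the p-value, which is why a modest-looking r can still be overwhelmingly significant across a few hundred hitters.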

If half-to-half correlation is strong, I would expect the year-to-year correlation to be even stronger, due to the larger sample. My assumption was correct:

[Figure: scatter plot of 2017 CRd versus 2018 CRd]

This chart churned out a correlation coefficient of 0.61 and another near-zero p-value. Interestingly, there was a lot less variation in 2017 than in 2018, and no extreme upper outliers. I can’t explain exactly why that is, but I can confirm that players like Gallo, who led the league in 2018, also did so in 2017, just with a lower overall number. To build on the case for the high stability of CRd, look at how close most players’ 2018 numbers were to their 2017 numbers:

Change in CRd (in either direction)    Percent frequency
0-1                                    24%
1-2                                    22%
2-3                                    14%
3-4                                    15%
4-5                                    9%
5+                                     16%

Almost a quarter of players differed by less than one point of CRd between the two years, and 60% of players deviated by less than 3 points. That’s a very low deviation between years, especially compared to very volatile statistics like batting average. To be honest, these results are the opposite of what I expected. I thought that hard hits and high hits would be independent of one another, and that any gap between actual and expected crossovers would be up to luck. I thought that the statistic would regress to a league average, not a career average. But it appears that my initial hypothesis was wrong. CRd is a very stable peripheral that is grounded heavily in the skill to do two important things at the same time.
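A frequency table like the one above could be built along these lines. This is only a sketch: the `change_buckets` function and players B, C, and D are hypothetical, while player A uses Joey Votto's 2017 and 2018 CRd values from this article.

```python
from collections import Counter


def change_buckets(crd_prev, crd_curr, top=5):
    """Percent of players in each 1-point bucket of absolute CRd change."""
    labels = [f"{i}-{i + 1}" for i in range(top)] + [f"{top}+"]
    # Only players with a CRd in both seasons can be compared.
    shared = crd_prev.keys() & crd_curr.keys()
    if not shared:
        raise ValueError("no players appear in both seasons")
    counts = Counter()
    for player in shared:
        change = abs(crd_curr[player] - crd_prev[player])
        # Changes of `top` points or more all land in the last bucket.
        counts[labels[min(int(change), top)]] += 1
    return {label: 100.0 * counts[label] / len(shared) for label in labels}


# Player A uses Votto's CRd values from the article; B, C, D are invented.
crd_2017 = {"A": 7.68, "B": -2.0, "C": 1.0, "D": 0.0}
crd_2018 = {"A": 4.31, "B": -1.5, "C": 7.5, "D": 1.2}

table = change_buckets(crd_2017, crd_2018)
for label, pct in table.items():
    print(label, f"{pct:.0f}%")
```

Taking the absolute value is what makes the table count changes in either direction, so a 3-point gain and a 3-point drop fall into the same row.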

Let’s get back to my friend Joey Votto. My original expectation was that he was getting unlucky by having his hard hits and high hits fall on different at-bats. I was wrong for two reasons. First, crossover rate is not up to luck. Second, his 2018 CR% was actually higher than his xCR%, 16.6% to 12.2%, so even if it were luck, that would not explain his drop in power. Instead, we have to look at Votto’s case through what we do know: that crossover differential is based in skill, meaning that if a player keeps the same skills, he should keep a similar CRd. Votto dropped 3.37 points in CRd between 2017 and 2018, from 7.68 to 4.31. That drop puts him in the bottom 20% of the league in year-to-year CRd change, which is a convincing argument that his skills have legitimately declined. Votto is still an incredibly valuable MLB and fantasy asset due to OBP alone, but he is 35 years old, and I sadly must admit that it’s possible we will never see his old power totals again. In fact, based on what I have found in this article, I would not bet that he will hit for power again.

I used this set of stats to analyze Joey Votto, but you could, of course, just as easily apply it to any player. For your convenience, I have taken all the statistics invented for this article and written them into the following Google Sheet files:
2018 Stats
2017 Stats
Stats Glossary

Enjoy!

If you liked this article, please follow The K Zone on Twitter and be the first to know when more original research, opinion, and interviews come out!

Sources:
Baseball Savant
Fangraphs
Baseball Reference
MLB.com

Photographs Attributed to:
US Presswire
MLB.com
