This article is part of our The Z Files series.
The break is over; it's time to get back to work. There's no better way to start the new year than with some math, specifically correlation studies. Today, we're going to look at several pitching metrics to discern what's in and out of a pitcher's control.
Methodology
Correlation is one of my favorite analytical devices for its combination of power and simplicity. It looks at two sets of data and determines how closely they are aligned. If every pair of data points follows the same rising straight-line relationship, the correlation is 1. If there is no relationship at all, the correlation is 0. If the two sets move in exact opposition, with one rising as the other falls, it is -1.
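For anyone who wants to see the arithmetic, here is a minimal sketch of the standard Pearson correlation coefficient, which is presumably the flavor of correlation used here (the article doesn't name it). The function name `pearson` and all of the sample values are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalized because the 1/n terms cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented numbers, not real pitching data.
year_one = [1.0, 2.0, 3.0, 4.0, 5.0]
same_direction = [2.0, 4.0, 6.0, 8.0, 10.0]  # rises in lockstep with year_one
opposite = [10.0, 8.0, 6.0, 4.0, 2.0]        # falls as year_one rises
no_pattern = [4.0, 1.0, 5.0, 1.0, 4.0]       # no linear relationship

r_same = pearson(year_one, same_direction)
r_opposite = pearson(year_one, opposite)
r_none = pearson(year_one, no_pattern)
print(round(r_same, 2), round(r_opposite, 2), round(r_none, 2))
```

The three toy series land at the three reference points described above: 1, -1 and 0.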
The objective of this study is to determine the aspects of performance over which the pitcher exerts the most and least control. Those he influences most will display a higher correlation over the sample, while those driven more by happenstance will be lower.
Metrics for which the pitcher has the most effect will be closer from year to year. Those left to chance will have no connection in consecutive seasons.
The study will investigate metrics over the last three seasons. The data set consists of pitchers tossing at least 50 innings in all three years.
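The setup can be sketched roughly as follows. This is only an illustration of the idea, not the actual workflow behind the tables: the pitcher names, innings and K% values are invented, and the helpers `qualified` and `year_to_year` are hypothetical names.

```python
from math import sqrt

MIN_IP = 50  # innings threshold stated in the methodology

# Hypothetical data: pitcher -> {season: (innings, K%)}. All values invented.
seasons = {
    "Pitcher A": {2021: (180.0, 28.1), 2022: (175.0, 27.0), 2023: (190.0, 26.5)},
    "Pitcher B": {2021: (60.0, 19.5), 2022: (140.0, 21.0), 2023: (120.0, 20.2)},
    "Pitcher C": {2021: (150.0, 23.0), 2022: (30.0, 24.5), 2023: (160.0, 22.0)},
    "Pitcher D": {2021: (70.0, 31.0), 2022: (65.0, 29.5), 2023: (72.0, 30.4)},
    "Pitcher E": {2021: (200.0, 17.0), 2022: (185.0, 18.2)},
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def qualified(data, years, min_ip=MIN_IP):
    """Pitchers who reached min_ip innings in every one of the given seasons."""
    return sorted(
        name for name, by_year in data.items()
        if all(y in by_year and by_year[y][0] >= min_ip for y in years)
    )

def year_to_year(data, names, first, second):
    """Correlate each pitcher's metric in one season with the same metric in another."""
    xs = [data[name][first][1] for name in names]
    ys = [data[name][second][1] for name in names]
    return pearson(xs, ys)

pool = qualified(seasons, (2021, 2022, 2023))
results = {
    (a, b): year_to_year(seasons, pool, a, b)
    for a, b in [(2021, 2022), (2022, 2023), (2021, 2023)]
}
for (a, b), r in results.items():
    print(f"{a} to {b}: {r:.2f}")
```

Pitcher C (30 innings in 2022) and Pitcher E (no 2023 season) drop out of the pool, which mirrors the 50-innings-in-all-three-years filter.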
Those well versed in statistical analysis can read more into the correlation coefficients. I'm more concerned with which are highest, and which are lowest. The exact level of correlation isn't relevant for the scope of this project. To help frame the results, the high end is in the .75 to .80 range while the low end is between .00 and .15.
The following metrics will be analyzed. They were all collected on Fangraphs. If the stat or formula isn't common knowledge, the source is provided.
- IP
- ERA
- WHIP
- K%
- BB%
- K-BB%
- Line Drive%
- Groundball%
- Flyball%
- GB/FB
- Home Run per Flyball
- Batting Average on Balls In Play
- Left On Base Rate
- Average Exit Velocity (Statcast)
- Barrel% (SC)
- HardHit% (SC)
- Soft% (Baseball Info Solutions)
- Medium% (BIS)
- Hard% (BIS)
IP, ERA and WHIP
Comparison | IP | ERA | WHIP |
---|---|---|---|
2021 to 2022 | 0.70 | 0.21 | 0.31 |
2022 to 2023 | 0.59 | 0.20 | 0.18 |
2021 to 2023 | 0.60 | 0.16 | 0.19 |
Innings were included out of curiosity, but they also lend some perspective. The more relevant takeaway is the surface ratios exhibit a lot of variance from year to year. I'm surprised WHIP isn't a bit more predictable than ERA.
K%, BB% and K-BB%
Comparison | K% | BB% | K-BB% |
---|---|---|---|
2021 to 2022 | 0.70 | 0.66 | 0.61 |
2022 to 2023 | 0.69 | 0.45 | 0.51 |
2021 to 2023 | 0.63 | 0.44 | 0.46 |
Strikeouts and walks are considered to be a pitcher's basal skills, so it's not surprising their correlations are on the high end of the spectrum. Straying from 1.00 is more than variance; it can also represent improving or declining skills. Even so, in terms of this study, the higher the correlation, the more impact the pitcher exerts on the result.
Line Drive%, Groundball%, Flyball% and GB/FB
Comparison | LD% | GB% | FB% | GB/FB |
---|---|---|---|---|
2021 to 2022 | 0.30 | 0.75 | 0.74 | 0.81 |
2022 to 2023 | 0.04 | 0.76 | 0.74 | 0.78 |
2021 to 2023 | 0.11 | 0.65 | 0.65 | 0.66 |
This set of data was one of the impetuses for this research. A piece dissecting BABIP is in the hopper. It uses component BABIP to generate an xBABIP. In order to apply the concept to projections, a pitcher's LD%, GB% and FB% need to be projected. Based on this data, a pitcher has greater influence on his GB% and FB%. It can be argued that the projected LD% should be regressed to the league mean while the GB% and FB% can be based on the pitcher's history.
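One simple way to picture that regression is to shrink a pitcher's observed rate toward the league mean, keeping only the share of his deviation implied by the year-to-year correlation. To be clear, this is a common textbook shortcut and an assumption on my part here, not necessarily how the upcoming xBABIP piece handles it. The league means and the pitcher's observed rates below are placeholders, not real figures; only the two correlations come from the 2022 to 2023 row of the table.

```python
def regress_to_mean(observed, league_mean, r):
    """Keep a fraction r of the pitcher's deviation from the league mean."""
    return league_mean + r * (observed - league_mean)

# Placeholder league means (assumed for illustration only).
LEAGUE_LD = 20.0
LEAGUE_GB = 43.0

# Year-to-year correlations from the 2022 to 2023 row of the table above.
R_LD = 0.04
R_GB = 0.76

# Invented pitcher who allowed a lot of liners and a lot of grounders.
observed_ld = 24.0
observed_gb = 51.0

projected_ld = regress_to_mean(observed_ld, LEAGUE_LD, R_LD)
projected_gb = regress_to_mean(observed_gb, LEAGUE_GB, R_GB)
print(f"LD%: {observed_ld} -> {projected_ld:.1f}")
print(f"GB%: {observed_gb} -> {projected_gb:.1f}")
```

With a correlation near zero, the projected LD% collapses almost entirely to the league mean, while the GB% projection keeps most of what the pitcher actually did.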
Home Run per Flyball, Batting Average on Balls In Play and Left On Base Rate
Comparison | HR/FB | BABIP | LOB% |
---|---|---|---|
2021 to 2022 | 0.13 | 0.09 | 0.05 |
2022 to 2023 | -0.01 | 0.05 | 0.15 |
2021 to 2023 | 0.06 | 0.17 | 0.03 |
These are the metrics commonly known to be the most random, so it was a relief that the correlations agreed, if only to validate the crude yet effective approach. It is known better pitchers can carry a left on base mark around 78 percent while the league average is about 72 percent, so pitchers have some responsibility for their level. Reliever BABIPs are often well under league average -- as will be demonstrated next time, that is a product of their batted ball distribution. Still, the primary takeaway is pitchers have significantly more control over the quality of the pitch than the outcome. And yet, awards are still based on outcomes. But I digress.
Barrel%, Average Exit Velocity and HardHit%
Comparison | Barrel% | EV | HardHit% |
---|---|---|---|
2021 to 2022 | 0.33 | 0.51 | 0.35 |
2022 to 2023 | 0.39 | 0.55 | 0.39 |
2021 to 2023 | 0.18 | 0.44 | 0.30 |
These Statcast metrics are now conveniently available on Fangraphs. For those unaware of the formal definitions:
- Barrel: Batted-ball event whose comparable hit types (in terms of exit velocity and launch angle) have led to a minimum .500 batting average and 1.500 slugging percentage
- Barrel%: Barrels per batted ball event
- HardHit%: Percentage of contact with a minimum exit velocity of 95 mph
Barrel% is a leading indicator of home runs, while HardHit% correlates well with BABIP.
HR/FB studied above only has FB in its denominator, while Barrel% incorporates all batted ball types. This is why a pitcher has more control over Barrel%, since he has a greater influence on the batted ball types.
HardHit% can possibly help explain why a pitcher can sport a BABIP above or below that which is expected. The data suggests the pitcher has some control here, which corroborates the BABIP data shown earlier.
HardHit% is more relevant than average exit velocity, since 95 mph and above is when exit velocity and BABIP are most closely correlated.
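A quick sketch shows why the two measures can disagree. The exit velocities are invented and the function names are hypothetical; the only number taken from the definitions above is the 95 mph threshold.

```python
HARD_HIT_MPH = 95.0  # threshold from the HardHit% definition

def average_ev(exit_velocities):
    """Mean exit velocity across batted ball events."""
    return sum(exit_velocities) / len(exit_velocities)

def hard_hit_pct(exit_velocities, threshold=HARD_HIT_MPH):
    """Percentage of batted balls at or above the hard-hit threshold."""
    hard = sum(1 for ev in exit_velocities if ev >= threshold)
    return 100.0 * hard / len(exit_velocities)

# Two invented pitchers with the same average exit velocity.
pitcher_x = [88.0, 89.0, 90.0, 91.0, 92.0]     # everything clustered near 90 mph
pitcher_y = [70.0, 75.0, 100.0, 102.0, 103.0]  # a mix of dribblers and rockets

for name, evs in [("X", pitcher_x), ("Y", pitcher_y)]:
    print(name, average_ev(evs), hard_hit_pct(evs))
```

Both pitchers average 90 mph, but Pitcher X allows no hard-hit balls while Pitcher Y allows them on three of five batted balls, which is the information the average hides.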
Soft%, Medium% and Hard%
Comparison | Soft% | Med% | Hard% |
---|---|---|---|
2021 to 2022 | 0.31 | 0.07 | 0.30 |
2022 to 2023 | 0.39 | 0.13 | 0.32 |
2021 to 2023 | 0.26 | 0.01 | 0.18 |
These are the batted ball distributions that have been available on Fangraphs for years. I'm sure the process has changed since I used to be part of the team taping games and sending them in to the Baseball Info Solutions office for employees to watch and subjectively gauge batted ball type. The ratios are often different from the Statcast data. There really isn't a lot to glean from this, but it's data many of us have relied upon for years (pre-Statcast) so I thought I'd include it.
Actually, there is one observation. Medium contact produces the lowest BABIP. A medium hit grounder can be fielded with only the fastest runners able to beat it out. A medium hit fly is the old can of corn. Hard contact clearly generates the highest BABIP. However, perhaps counter-intuitively, softly hit contact results in a BABIP a bit higher than medium. Batters can more frequently beat out a soft groundball, while soft flyballs more often fall in front of outfielders. The fact that medium struck balls are barely in a pitcher's control reinforces the perceived lack of control over BABIP. If inducing medium hit contact were a skill, it could explain carrying a lower-than-expected BABIP.
Conclusions
A major application of this research will be discussed in the BABIP piece teased earlier. To be honest, there aren't any major revelations, but the data helps debunk many of the notions that are presented but not often backed by evidence. Many assert pitcher skill fuels BABIP, LOB% and HR/FB. OK, maybe, but nowhere near the extent many intuit.
Similarly, the extent to which a pitcher can induce weak contact is overblown. We want it to be true, but the data suggests otherwise.
On the other hand, strikeouts and walks are very much skills. I know, duh, but keeping that in mind can help resist drafting a pitcher because he "always seems to post a low ERA and WHIP." Usually, they do... until they don't. When one in eight pitchers overperforms his xERA for three straight years, it isn't luck; it's probability.
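The one-in-eight figure is presumably just coin-flip arithmetic. Here is a minimal sketch, assuming each season is an independent 50-50 shot at beating xERA; that assumption is a simplification for illustration and is not spelled out in the text.

```python
from fractions import Fraction

# Assumption: a pitcher with no special skill beats his xERA half the time,
# and each season is independent of the others.
p_beat_one_season = Fraction(1, 2)
seasons_in_a_row = 3

p_three_straight = p_beat_one_season ** seasons_in_a_row
print(p_three_straight)         # exact fraction
print(float(p_three_straight))  # same value as a decimal
```

Under that assumption, a three-year run of beating xERA is expected from a fair share of the pool by chance alone.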