[Math] Baseball, batting average, and probability

probability, statistics

A baseball player's batting average is equivalent to the probability he will get a hit in any given at-bat (at-bats don't include walks, HBP, sacrifices, and a few other exceptions). So for a specific player with a specific batting average, the probability that he will get a hit against an unknown pitcher is exactly equivalent to his batting average.

Similar to the AVG statistic for hitters, pitchers have a statistic called Batting Average Against (BAA). This statistic is calculated in the exact same way as hitters except it's done for a pitcher. It's equivalent to Hits divided by At-Bats of opposing batmen (na na na na). So for a specific pitcher with a specific BAA, the probability that he will allow a hit against an unknown batter is exactly equivalent to his batting average against.

Intuitively, it seems obvious to me that a batter, no matter his personal batting average, is more likely to get a hit against a pitcher with a high BAA, and less likely to get a hit against a pitcher with a low BAA. Additionally, a pitcher, no matter his own BAA, is more likely to allow a hit when facing a batter with a high AVG than when facing a batter with a low AVG.

So the question is, given a specific batter with a specific AVG and a specific pitcher with a specific BAA, how do we calculate the probability that that specific batter will earn a Hit against that specific pitcher?

EDIT: It's fair to assume we're talking about MLB, and we have an overwhelming wealth of extra information. Assume we're talking about a batter with thousands of at-bats recorded, a pitcher with thousands of batters faced, and we know all the information about the average league AVG, average league BAA, etc., but this specific batter and this specific pitcher have never faced each other. How would we calculate the probability of a Hit?

EDIT2: Let's not get bogged down with vsLHP, vsRHP, RISP, and other statistics. These are merely statistics that can be used to give a more accurate probability. The method for calculating the probability should remain basically the same. Let's just suppose we have Batter A who has an AVG of .300, and the average BAA of the pitchers he's faced (weighted to account for facing some pitchers more frequently etc) is .250. And we have Pitcher B who has a BAA of .225, and the average AVG of the batters he's faced (again, weighted) is .250. Batter A has never faced Pitcher B before, but both have thousands of At-Bats/Batters-Faced. How do we calculate the probability of a Hit versus an Out?

Best Answer

Basically you can't, unless you know a lot more information. For example, suppose we have three batters $a,b,c$ with averages $0.75, 0.5, 0.25$ and three pitchers $A,B,C$ with BAAs $0.75, 0.5, 0.25$. If each batter has $600$ at-bats, $200$ against each pitcher, then you are filling in a matrix of hits whose row and column totals are fixed: $$\begin{array}{c|ccc|c} & A & B & C & \text{Total} \\ \hline a & & & & 450 \\ b & & & & 300 \\ c & & & & 150 \\ \hline \text{Total} & 450 & 300 & 150 & \end{array}$$ You have six equations in nine unknowns, so there is a lot of freedom. Each cell is only required to lie between $0$ and $200$, as long as the totals come out right.
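
To make the counting concrete, here is one way to write the constraints. The symbols $h_{ij}$, $r_i$ and $c_j$ are introduced here for illustration and are not part of the original answer: $h_{ij}$ is the number of hits by batter $i$ off pitcher $j$, $r_i$ is batter $i$'s total hits, and $c_j$ is the total hits allowed by pitcher $j$.

$$\sum_{j \in \{A,B,C\}} h_{ij} = r_i, \qquad \sum_{i \in \{a,b,c\}} h_{ij} = c_j, \qquad 0 \le h_{ij} \le 200, \qquad (r_a, r_b, r_c) = (c_A, c_B, c_C) = (450, 300, 150).$$

Because the row totals and the column totals both add up to $900$, one of the six equations is implied by the other five. That leaves $9 - 5 = 4$ free parameters, which is why the batting averages and BAAs alone cannot pin down any individual matchup.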

Added: compare these two sets of hits: in each case, each pitcher pitches to each batter 200 times. The batting averages and BAAs are the same.

$$\begin{array}{c|ccc|c} & A & B & C & \text{Total} \\ \hline a & 200 & 150 & 100 & 450 \\ b & 150 & 100 & 50 & 300 \\ c & 100 & 50 & 0 & 150 \\ \hline \text{Total} & 450 & 300 & 150 & \end{array}$$

$$\begin{array}{c|ccc|c} & A & B & C & \text{Total} \\ \hline a & 200 & 200 & 50 & 450 \\ b & 200 & 100 & 0 & 300 \\ c & 50 & 0 & 100 & 150 \\ \hline \text{Total} & 450 & 300 & 150 & \end{array}$$

The first approximates the intuitive view you seem to have. The second is consistent with exactly the same batting averages and BAAs, yet the worst batter is hitting $0.500$ against the best pitcher.
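
For anyone who wants to check the arithmetic, the following is a minimal Python sketch that recomputes the averages from the two hit tables. The names `table_1`, `table_2`, `AT_BATS` and `marginal_averages` are made up for this illustration; the numbers are the ones from the tables in this answer.

```python
AT_BATS = 200  # at-bats per batter/pitcher pairing, as assumed in the answer

# Hits per matchup: rows are batters a, b, c; columns are pitchers A, B, C.
table_1 = [[200, 150, 100],
           [150, 100, 50],
           [100, 50, 0]]
table_2 = [[200, 200, 50],
           [200, 100, 0],
           [50, 0, 100]]


def marginal_averages(hits):
    """Return (batting averages per batter, BAAs per pitcher) for a hit table."""
    batter_avg = [sum(row) / (AT_BATS * len(row)) for row in hits]
    pitcher_baa = [sum(col) / (AT_BATS * len(col)) for col in zip(*hits)]
    return batter_avg, pitcher_baa


for name, hits in (("table 1", table_1), ("table 2", table_2)):
    avg, baa = marginal_averages(hits)
    # Matchup of the worst batter (c) against the best pitcher (C).
    worst_vs_best = hits[2][2] / AT_BATS
    print(name, "AVG:", avg, "BAA:", baa, "c vs C:", worst_vs_best)
```

Both tables give the same AVG and BAA lists, while the matchup of batter $c$ against pitcher $C$ comes out at $0$ in the first table and $0.5$ in the second.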
