If a tennis match was a single large set, how many games would give the same accuracy?

games, inference, probability

Tennis has a peculiar three-tier scoring system, and I wonder whether this has any statistical benefit when a match is viewed as an experiment to determine the better player.

For those unfamiliar, in normal rules a game is won by the first to 4 points, so long as you have a 2 point lead (i.e. if it is 4-2 you win, but 4-3 you need 1 more point, and keep going until one player is 2 ahead).

A set is then a collection of games, and a set is won by the first to 6 games, again having to win by 2, except that this time a special tie-breaker game is played at 6-6 instead of carrying on (except in the final set at Wimbledon etc…)

The match is won by first to 2 or 3 sets depending on the competition.

Now, tennis is also odd in that games are unfair. On any given point the server has a huge advantage, so the serve alternates between the players every game.

In a tie-breaker game the serve alternates after every point, and it is the first to 7 points, again with a 2 point lead.

Let's assume that player A has a probability of winning a point of $p_s$ when serving and $p_r$ when receiving.

The question is this, suppose we

A) just played tennis as a big "best of N games" match, how many games would give the same accuracy as normal best of 5 sets tennis?

B) just played tennis as a big tiebreaker game, how many points would give the same accuracy as normal best of 5 sets tennis?

Obviously these answers will depend upon the $p_s$ and $p_r$ values themselves, so it would also be good to know

C) What is the expected number of games and points played in normal tennis, assuming constant $p_s$, $p_r$?


Defining "Accuracy"

If we assume that the skill of both players stays constant, then if they played for an infinite length of time, one or the other player would win almost surely, regardless of the format of play. This player is the "correct" winner. I'm pretty sure that the correct winner is the player for whom $p_r+p_s > 1$.

A better format of play is one that produces the correct winner more often for the same number of points played, or conversely produces the correct winner with equal probability in fewer points played.

Best Answer

If you play games to $4$ points, where you have to win by $2$, you can assume the players play $6$ points. If no player wins by $2$, then the score is tied $3-3$, and then you play pairs of points until one player wins both. This means that the chance to win a game to $4$ points, when your chance to win each point is $p$, is

$$p^6 + 6p^5(1-p) + 15p^4(1-p)^2 + 20 p^3(1-p)^3 \frac{p^2}{p^2 + (1-p)^2}.$$

In top level men's play, $p$ might be about $0.65$ for the server. (It would be $0.66$ if men didn't ease off on the second serve.) According to this formula, the chance to hold serve is about $82.96\%$.
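As a quick numerical check of the formula above, here is a minimal Python sketch. The function name `game_win` is my own label for illustration and does not come from the answer.

```python
def game_win(p):
    """Chance to win a game to 4 points (win by 2) when each point is won with probability p."""
    q = 1.0 - p
    # Win at least 4 of the first 6 points: 6-0, 5-1 or 4-2.
    outright = p**6 + 6 * p**5 * q + 15 * p**4 * q**2
    # Tied 3-3 after 6 points, then win both points of a pair before the opponent does.
    from_deuce = 20 * p**3 * q**3 * p**2 / (p**2 + q**2)
    return outright + from_deuce


print(f"{game_win(0.65):.4f}")  # prints 0.8296
```

The same function with $p = p_r$ gives the chance to break serve, which is used for sets further down.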

Suppose you are playing a tiebreaker to $7$ points. You can assume that the points are played in pairs where each player serves one of each pair. Who serves first doesn't matter. You can assume the players play $12$ points. If they are tied at that point, then they play pairs until one player wins both points of a pair, which means the conditional chance to win is $p_sp_r/(p_sp_r + (1-p_s)(1-p_r))$. If I calculate correctly, the chance to win a tiebreaker to $7$ points is

$$ 6 p_r^6 p_s + 90 p_r^5 p_s^2 - 105 p_r^6 p_s^2 + 300 p_r^4 p_s^3 - 840 p_r^5 p_s^3 + 560 p_r^6 p_s^3 + 300 p_r^3 p_s^4 - 1575 p_r^4 p_s^4 + 2520 p_r^5 p_s^4 - 1260 p_r^6 p_s^4 + 90 p_r^2 p_s^5 - 840 p_r^3 p_s^5 + 2520 p_r^4 p_s^5 - 3024 p_r^5 p_s^5 + 1260 p_r^6 p_s^5 + 6 p_r p_s^6 - 105 p_r^2 p_s^6 + 560 p_r^3 p_s^6 - 1260 p_r^4 p_s^6 + 1260 p_r^5 p_s^6 - 462 p_r^6 p_s^6 + \frac{p_r p_s}{p_r p_s + (1-p_r)(1-p_s)}(p_r^6 + 36 p_r^5 p_s - 42 p_r^6 p_s + 225 p_r^4 p_s^2 - 630 p_r^5 p_s^2 + 420 p_r^6 p_s^2 + 400 p_r^3 p_s^3 - 2100 p_r^4 p_s^3 + 3360 p_r^5 p_s^3 - 1680 p_r^6 p_s^3 + 225 p_r^2 p_s^4 - 2100 p_r^3 p_s^4 + 6300 p_r^4 p_s^4 - 7560 p_r^5 p_s^4 + 3150 p_r^6 p_s^4 + 36 p_r p_s^5 - 630 p_r^2 p_s^5 + 3360 p_r^3 p_s^5 - 7560 p_r^4 p_s^5 + 7560 p_r^5 p_s^5 - 2772 p_r^6 p_s^5 + p_s^6 - 42 p_r p_s^6 + 420 p_r^2 p_s^6 - 1680 p_r^3 p_s^6 + 3150 p_r^4 p_s^6 - 2772 p_r^5 p_s^6 + 924 p_r^6 p_s^6)$$

If $p_s=0.65, p_r=0.36$ then the chance to win the tie-breaker is about $51.67\%$.
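Rather than typing in the expanded polynomial, the same quantity can be computed by summing over how many of the first $12$ points are won on serve and on return. The sketch below assumes exactly the pairing argument described above; the names `binom_pmf` and `paired_race_win` are mine, not the answer's.

```python
from math import comb


def binom_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials with success chance p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def paired_race_win(p_s, p_r, target=7):
    """Chance to win a race to `target` (win by 2) when serve alternates within each pair."""
    n = target - 1  # each player serves n of the first 2n points
    on_serve = [binom_pmf(n, i, p_s) for i in range(n + 1)]
    on_return = [binom_pmf(n, j, p_r) for j in range(n + 1)]
    outright = 0.0  # more than n of the first 2n points won
    tied = 0.0      # exactly n of the first 2n points won
    for i in range(n + 1):
        for j in range(n + 1):
            if i + j > n:
                outright += on_serve[i] * on_return[j]
            elif i + j == n:
                tied += on_serve[i] * on_return[j]
    # From a tie, pairs are played until one player wins both points of a pair.
    win_pair = p_s * p_r
    lose_pair = (1.0 - p_s) * (1.0 - p_r)
    return outright + tied * win_pair / (win_pair + lose_pair)


print(f"{paired_race_win(0.65, 0.36):.4f}")  # compare with the 51.67% quoted in the text
```

The `target` parameter is there so the same routine can be reused for longer races, such as one giant tie-breaker.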

Next, consider a set. It doesn't matter who serves first, which is convenient because otherwise we would have to consider winning the set while having the serve next versus winning the set without keeping the serve. To win a set to $6$ games, you can imagine that $10$ games are played first. If the score is tied $5-5$ then play $2$ more games. If those do not determine the winner, then play a tie-breaker, or in the fifth set just repeat playing pairs of games. Let $p_h$ be the probability of holding serve, and let $p_b$ be the probability of breaking your opponent's serve, which may be calculated above from the probability to win a game. The chance to win a set without a tiebreak follows the same basic formula as the chance to win a tie-breaker, except that we are playing to $6$ games instead of to $7$ points, and we replace $p_s$ by $p_h$ and $p_r$ by $p_b$.

The conditional chance to win a fifth set (a set with no tie-breaker) with $p_s=0.65$ and $p_r=0.36$ is $53.59\%$.

The chance to win a set with a tie-breaker with $p_s=0.65$ and $p_r=0.36$ is $53.30\%$.

The chance to win a best of $5$ sets match, with no tie-breaker in the fifth set, with $p_s=0.65$ and $p_r=0.36$ is $56.28\%$.
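The three figures above can be reproduced by chaining the game, set and match steps. This is a sketch under the assumptions stated in the answer (constant $p_s$ and $p_r$, tie-breaker sets for the first four sets, no tie-breaker in the fifth); all function names are illustrative choices of mine.

```python
from math import comb


def game_win(p):
    """Chance to win a game to 4 points (win by 2) with per-point probability p."""
    q = 1.0 - p
    return p**6 + 6 * p**5 * q + 15 * p**4 * q**2 + 20 * p**3 * q**3 * p**2 / (p**2 + q**2)


def tally(p_a, p_b, n):
    """Play n trials at chance p_a and n at chance p_b; return (P(more than n wins), P(exactly n wins))."""
    pmf_a = [comb(n, i) * p_a**i * (1 - p_a) ** (n - i) for i in range(n + 1)]
    pmf_b = [comb(n, j) * p_b**j * (1 - p_b) ** (n - j) for j in range(n + 1)]
    ahead = sum(pmf_a[i] * pmf_b[j] for i in range(n + 1) for j in range(n + 1) if i + j > n)
    level = sum(pmf_a[i] * pmf_b[n - i] for i in range(n + 1))
    return ahead, level


def tiebreak_win(p_s, p_r):
    """Tie-breaker to 7 points: 12 points, then pairs until one player wins both."""
    ahead, level = tally(p_s, p_r, 6)
    return ahead + level * p_s * p_r / (p_s * p_r + (1 - p_s) * (1 - p_r))


def set_win(p_s, p_r, tiebreak=True):
    """Set to 6 games: 10 games, then 2 more at 5-5, then a tie-breaker or more pairs of games."""
    p_h, p_b = game_win(p_s), game_win(p_r)  # hold and break probabilities
    ahead, level = tally(p_h, p_b, 5)
    both, neither = p_h * p_b, (1 - p_h) * (1 - p_b)
    if tiebreak:
        after_five_all = both + (1 - both - neither) * tiebreak_win(p_s, p_r)
    else:
        after_five_all = both / (both + neither)
    return ahead + level * after_five_all


def match_win(p_s, p_r):
    """Best of 5 sets, with no tie-breaker in the fifth set."""
    s, f = set_win(p_s, p_r, True), set_win(p_s, p_r, False)
    q = 1 - s
    return s**3 + 3 * s**3 * q + 6 * s**2 * q**2 * f


print(f"fifth set  {set_win(0.65, 0.36, tiebreak=False):.4f}")
print(f"normal set {set_win(0.65, 0.36, tiebreak=True):.4f}")
print(f"match      {match_win(0.65, 0.36):.4f}")
```

The three printed values should line up with the $53.59\%$, $53.30\%$ and $56.28\%$ given above.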

So, for these win rates, how many games would there have to be in one set for it to have the same discriminatory power? With $p_s=0.65, p_r=0.36$, you win a set to $24$ games with the usual tie-breaker $56.22\%$ of the time, and you win a set to $25$ games with a tie-breaker possible $56.34\%$ of the time. With no tie-breaker, the chance to win a normal match falls between the chances for sets of length $23$ and $24$. If you simply play one big tie-breaker, the chance to win a tie-breaker of length $113$ is $56.27\%$ and of length $114$ is $56.29\%$.
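The search for the giant tie-breaker can be sketched as a simple loop over lengths. The code below assumes that "length" means the number of points needed to win, takes the $56.28\%$ match figure from the text as a fixed goal, and uses hypothetical names (`paired_race_win`, `shortest_equivalent_tiebreak`) of my own.

```python
from math import comb

MATCH_WIN = 0.5628  # best-of-5 figure quoted in the text for p_s = 0.65, p_r = 0.36


def paired_race_win(p_s, p_r, target):
    """Chance to win a tie-breaker to `target` points (win by 2), serve alternating in pairs."""
    n = target - 1  # each player serves n of the first 2n points
    on_serve = [comb(n, i) * p_s**i * (1 - p_s) ** (n - i) for i in range(n + 1)]
    on_return = [comb(n, j) * p_r**j * (1 - p_r) ** (n - j) for j in range(n + 1)]
    outright = sum(
        on_serve[i] * on_return[j]
        for i in range(n + 1)
        for j in range(n + 1)
        if i + j > n
    )
    tied = sum(on_serve[i] * on_return[n - i] for i in range(n + 1))
    win_pair, lose_pair = p_s * p_r, (1 - p_s) * (1 - p_r)
    return outright + tied * win_pair / (win_pair + lose_pair)


def shortest_equivalent_tiebreak(p_s, p_r, goal, max_target=500):
    """Smallest tie-breaker length whose win chance reaches `goal`, or None if none is found."""
    for target in range(1, max_target + 1):
        if paired_race_win(p_s, p_r, target) >= goal:
            return target
    return None


length = shortest_equivalent_tiebreak(0.65, 0.36, MATCH_WIN)
print(length, f"{paired_race_win(0.65, 0.36, length):.4f}")
```

Going by the figures quoted above, the loop should stop in the neighbourhood of $113$ to $114$ points.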

This suggests that playing one giant set is not more efficient than a best-of-5-sets match, but playing one giant tie-breaker would be more efficient, at least for closely matched competitors who have an advantage serving.


Here is an excerpt from my March 2013 GammonVillage column, "Game, Set, and Match." I considered coin flips with a fixed advantage ($51\%$) and asked whether it is more efficient to play one big match or a series of shorter matches:

... If a best of three is less efficient than a single long match, we might expect a best of five to be worse. You win a best of five $13$ point matches with probability $57.51\%$, very close to the chance to win a single match to $45$. The average number of matches in a best of five is $4.115$, so the average number of games is $4.115 \times 21.96 = 90.37$. Of course this is more than the maximum number of games possible in a match to $45$, and the average is $82.35$. It looks like a longer series of matches is even less efficient.

How about another level, a best of three series of best of three matches to $13$? Since each series would be like a match to $29$, this series of series would be like a best of three matches to $29$, only less efficient, and one long match would be better than that. So, one long match would be more efficient than a series of series.

What makes a series of matches less efficient than one long match? Consider these as statistical tests for collecting evidence to decide which player is stronger. In a best of three matches, you can lose a series with scores of $13-7 ~~ 12-13 ~~ 11-13$. This means you would win $36$ games to your opponent's $33$, but your opponent would win the series. If you toss a coin and get $36$ heads and $33$ tails, you have evidence that heads is more likely than tails, not that tails is more likely than heads. So, a best of three matches is inefficient because it wastes information. A series of matches requires more data on average because it sometimes awards victory to the player who has won fewer games.
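The coin-flip figures in the excerpt can be checked with a short Python sketch. It assumes, as the excerpt does, independent games won with probability $0.51$ and matches decided by the first player to reach the target score, with no win-by-two rule; `race_stats` is a name I made up for illustration. A best of five is treated as a race to $3$ match wins.

```python
from math import comb


def race_stats(p, n):
    """Win probability and expected number of trials in a first-to-n race with per-trial chance p."""
    q = 1.0 - p
    win = 0.0
    length = 0.0
    for k in range(n):  # k is the loser's final score
        ways = comb(n - 1 + k, k)  # orderings in which the winner takes the last trial
        p_win = ways * p**n * q**k
        p_lose = ways * q**n * p**k
        win += p_win
        length += (n + k) * (p_win + p_lose)
    return win, length


match_win, games_per_match = race_stats(0.51, 13)
series_win, matches_per_series = race_stats(match_win, 3)  # best of five matches
long_win, long_games = race_stats(0.51, 45)                # one long match to 45

print(f"best of five: win {series_win:.4f}, matches {matches_per_series:.3f}, "
      f"games {matches_per_series * games_per_match:.2f}")
print(f"match to 45:  win {long_win:.4f}, games {long_games:.2f}")
```

Multiplying the average number of matches by the average match length follows the arithmetic in the excerpt.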