With the ATP tour finals tennis tournament going on right now, I got to thinking about round robin tournaments. I learned upon searching that in graph theory the tournament models a round robin, in which each player plays every other player exactly once, and either wins or loses (no draws). But after some searching I was not able to find answers to the questions I posed in the title.
Suppose there are $n$ players in a tournament. The result of a tournament is described as a graph in which each pair of vertices is connected by exactly one directed edge, where an edge going from vertex $i$ to vertex $j$ indicates that player $i$ beat player $j$. Call $g$ the number of edges in the graph, which reflects the number of games played in the tournament. Of course, $g = n(n-1)/2$, the triangular number of $n-1$ (imagine the cells above the diagonal of a tournament crosstable). Clearly, there are $2^g$ possible results of a tournament.
Suppose every player in the tournament is equally skilled, so that every result is equally likely. For a given result, we can count the wins of each player and rank the win counts in a non-decreasing sequence $s = (s_1,\ldots,s_n)$ where $0 \leq s_1 \leq \cdots \leq s_n$. My first question is: what in general would be the probability distribution for $s$? Relatedly, what would be the expectation for each $s_i$?
To get an intuition for this last question, I simulated 10,000 instances of a round robin tournament of 100 equally skilled players. The plot below shows the average of $s$, where the x axis reflects the index $i$ and the y axis the number of wins $s_i$.
Now let the number of winners of the tournament be denoted $ w = |\{s_i : s_i = \max s\}|$. What is the probability that $w=1$, that is, that the tournament has an outright winner? In general, what is the probability distribution of $w$?
Using the data from the same simulation as above, I plotted the relative frequencies of $w$ values:
.
Does anyone know of any results that bear on these questions? When I thought of these questions, I figured the answers would be standard results, but after a moderate amount of looking I haven't come across them.
Best Answer
Here is a heuristic argument for the mean of the order statistics $s_1,s_2,\ldots,s_n$.
Let's define a random variable $X_{ij}$ as follows
$$X_{ij}=\begin{cases} 1 & \text{, if player $i$ beats player $j$,}\\ 0 & \text{, if player $i$ loses to player $j$.}\end{cases}$$
Then, if we assume all the players are equally skilled, the $X_{ij}\sim \text{Ber}(0.5)$, i.e. they are Bernoulli random variables with probability of success $0.5$. However, they are not independent. Indeed, we can more precisely state that
$$\begin{eqnarray}X_{ij}\sim\text{Ber}(0.5) & \;\; , & \;\; \text{ for } i<j \\ X_{ij}=1-X_{ji} & \;\; , &\;\; \text{ for } i>j \\ X_{ii}=0 & \;\; , &\;\; \text{ players don't play against themselves } \\ \end{eqnarray} $$
This defines an $n\times n$ matrix whose components are the individual match results. The rowsums corresponds to the total amount of wins, the colomn sums to the total amount of losses.
$$W_i=\sum_{j=1}^n X_{ij} \;\; \text{ and } \;\; L_j=\sum_{i=1}^n X_{ij}$$
Of course, the losses and wins of a player count up to the total amount of matches played
$$W_i+L_i=n-1 \; .$$
We are interested in the order statistics $s_1=W_{(1)},s_2=W_{(2)},\ldots,s_n=W_{(n)}$. $W_{(i)}$ is the i-th element of the ascendingly sorted vector of $(W_1,W_2,\ldots,W_n)$.
Solving this problem is a pretty hard nut to crack. So, I resorted to a heuristic argument that only covers the expectation of the order statistics. What is easy to find is the marginal distribution of a player $i$. These will all be the same and binomial: $W_i\sim \text{Bin}(n-1,0.5)$.
Now, I reasoned that since an individual player could rank anywhere in the hierarchy, the possible outcomes of $W_i$ encode precisely that. Moreover, one could interpret the quantiles of the distribution as follows: if player $i$ ends up to be the player that wins most matches, this means that he is somewhere in the $1/n$th upper quantile of the distribution. If he loses the most matches, he'll be in the $1/n$th lower quantile. Etc, for other quantiles. So, I reasoned that essentially, the function for the means of the $s_i$ is just the quantile distribution of an individual player's wins.
So I did a small simulation in R and found that this indeed gives good results. I simulated 10000 roundrobin tournaments of 100 players. (Don't mind the probably inefficient code.)
Then I plotted the mean of the $s_1,s_2,\ldots,s_n$ against the quantiles of the binomial (I added a shift of 0.5 continuity correction) and also the normal approximation of the binomial.
This is the result:
You almost don't see the difference between the normal approximation and the simulated result. (Although in the tails the difference will be larger)
Here's also the comparison between the simulated results and the binomial quantiles:
If you round the simulation results to the nearest integer, the difference is zero.
A more able statistician than me can probably make this rigorous and add error margins to this result. But I thought it was already neat enough to post this answer.
EDIT : I wanted to add some additional findings. For small values of $n$, the above approximations will be abysmally bad. But, in those cases, it is not to hard to compute by hand the distributions of $(s_1,s_2,\ldots,s_n)$.
Take the case $n=3$ for instance. There's really only two situations possible, being $(1,1,1)$ and $(0,1,2)$. By looking at the matrices I defined before, it's not difficult to see that you can realize the first situation in $1/4$ cases and the second in $3/4$ cases. From there you can compute means, variance, all you want.
Take the case $n=4$. Then you have 4 different possibilities: $(0,1,2,3)$ with probability $4!/2^6$, $(1,1,1,3)$ with probability $8/2^6$, $(0,2,2,2)$ with probability $8/2^6$ and finally $(1,1,2,2)$ with probability $4!/2^6$. You can check that all the realization are really partitions of the number $6=4(4-1)/2$ with the restriction that no element of the partition is larger than 3, plus additional restrictions. The mean values are easily computed to be $(0.5,1.125,1.875,2.5)$. And indeed the simulation agrees with this result.
I wouldn't be surprised if the problem can be recast into the theory of random partitions.
EDIT2 I wrote a little program that computes the exact probability distribution. But it's not very efficient memory or time wise. I had done the computation by hand for a 5 player tournament, and then I tried for a 6 player but decided it would be wiser to program the rules. The result is this
I tried to run it for $n=7$ (which is about 64 times more computations and more storage space needed than for $n=6$) but I did not have the patience to wait for the end result. So, Monte Carlo simulation is more efficient to determine round robin tournament results. Or my heuristic if you look at tournaments with very high numbers of participants.