Probability of getting into the favorite PhD

order-statisticsprobability

Suppose there are $n$ potential grad students and $n$ Universities. Each University has one scholarship to do a PhD. Each student has a strict ranking of Universities so that each University is comparable and she is not indifferent between any two Universities. The preferences of each student are independent and generated uniformly at random.

Students are ordered according to their height (or some other irrelevant attribute) and can choose their favourite University to study in based on this order. Universities accept the first student who applies to them and their decision is final and irrevocable.

My first question is: for the $k$ tallest applicant, what is the probability that they end up in their $j$-th best University? I'm thinking for the k-th tallest applicant, the chance of getting into her top choice is $\frac{n+1-k}{n}$.

Second question: Suppose now that, if a student gets her first choice, she completes her PhD with probability 1, if she gets her second choice, she does with probability 1/2 and so on, so in general, if a student gets her k choice, she finishes her PhD with probability $\frac{1}{2^{k-1}}$. What is the expected number of students that fail to complete their PhD? Simulations tell me it is $0.38 n$, but I would like to understand why.

Best Answer

Using the same notation as OskarSzarowicz's answer, let $p_{k,j}$ denote the probability that $k$-th tallest student gets to their $j$-th most preferred university (both $k$ and $j$ start at $1$). This probability can be expressed in a closed form by realizing that:

  • The previous $(k-1)$ students have "used up" $(k-1)$ universities in total and thus in order for the current student to be only accepted by their $j$-th choice, each of the first $(j-1)$ universities had to come this pool of $(k-1)$ universities. That can happen in $\binom{k-1}{j-1}(j-1)!$ ways.
  • The next university in the list must be a non-eliminated one, which means $(n+1-k)$ choices.
  • The remaining $(n-j)$ on their list can be chosen arbitrarily; giving us $(n-j)!$ possibilities.

Putting it all together and dividing by the total number of possible preference lists ($n!$) yields:

$$p_{k,j}=\frac{1}{n!}\binom{k-1}{j-1}(j-1)!(n+1-k)(n-j)!$$

This rather unwieldy expression can be rearranged a bit by combining the factorials into another binomial coefficient and moving the extra $j$ into the other one which pulls out one extra $k$:

$$p_{k,j}=\frac{(n+1-k)}{k}\binom{k}{j}/\binom{n}{j}$$

While this expression might not look really simpler than the previous one, it will come handy when it comes to calculating the expected number of students who are going to fail. It will actually be easier to consider a slightly more general problem, where student's probability of succeeding at their $j$-th choice is equal to $x^{j-1}$ (so the original problem corresponds to $x=\frac{1}{2}$). The expected number of students succeeding is equal to the double sum $$S = \sum_{k=1}^n \sum_{j=1}^k p_{k,j}x^{j-1}$$

Exchanging the other of summation allows us to pull out some terms in front of the inner sum and keep only one binomial coefficient in it (one whose bottom part is independent of the summation variable): $$S=\sum_{j=1}^n \left(x^{j-1}/\binom{n}{j}\sum_{k=j}^n \frac{(n+1-k)}{k}\binom{k}{j}\right)$$

For the inner sum, we have $$I(n,j)=\sum_{k=j}^n \frac{(n+1-k)}{k}\binom{k}{j}= (n+1)\sum_{k=j}^n \frac{1}{k}\binom{k}{j} - \sum_{k=j}^n \binom{k}{j}$$

The first sum can be also written as $$\sum_{k=j}^n \frac{1}{k}\binom{k}{j} = \frac{1}{j}\sum_{k=j}^n \binom{k-1}{j-1}=\frac{1}{j}\sum_{k=j-1}^{n-1} \binom{k}{j-1}$$

Now we can use the combinatorial identity $\sum_{k=j}^n \binom{k}{j} = \binom{n+1}{j+1}$ to obtain $$I(n,j)=\frac{n+1}{j}\binom{n}{j} - \binom{n+1}{j+1} = (n+1)\left(\frac{1}{j} - \frac{1}{j+1}\right)\binom{n}{j}$$

Returning back to our expression for $S$ and cancelling out the binomial coefficients, we get

$$S=(n+1)\sum_{j=1}^n \left(\frac{1}{j} - \frac{1}{j+1}\right)x^{j-1}$$

or, moving the $(n+1)$ factor to the other side and rearranging the terms a bit:

$$\frac{S}{n+1}=\frac{1}{x} - \frac{x^{n-1}}{n+1} + \frac{x-1}{x^2}\sum_{j=1}^n \frac{x^{j}}{j}$$

Unfortunately, this is the point where we might have ran out of luck when it comes to exact calculations for fixed $n$; the latest sum does not have a closed-form expression (at least as far as I know). It is a truncated form of the Taylor expansion of $-\log(1-x)$, though, so if we are only interested in the asymptotic behaviour, we can get the following result:

$$S^* = \lim_{n\to\infty}\frac{S}{n+1} = \frac{1}{x} + \frac{1-x}{x^2}\ln(1-x)$$

Note that this limit would be the same if we divided by $n$ rather than $(n+1)$ since we are going to infinity. Thus, the fraction of students who fail their tests tends to $(1-S^*)$. For the specific choice of $x=\frac{1}{2}$, we get $1-S^*=\ln{4}-1\approx 0.386294$.

Related Question