Find the probability that the 2nd and 3rd order statistics are compared in the QuickSelect algorithm

algorithmsprobabilityrecursionrecursive algorithms

A description of QuickSelect: In the selection problem, we have a list of numbers and want to find the ith order statistic. That's the ith smallest value, which is the value such that i-1 other elements of the list are smaller. In the QuickSelect algorithm we pick one at random. We then use this to compare it with all the other numbers, sorting them into two groups: Those smaller, and those larger. We count up the smaller group and if the size of this group is i-1 then we terminate the algorithm and return this random element.

Otherwise we have two cases, either the smaller element set has more or less than i-1 elements. In the former case we simply repeat the procedure on the set of smaller numbers.

In the latter case we know the order statistic we want is in the set of larger numbers. Suppose the random element we selected has k elements less than or equal to it. We are therefore throwing away k elements in order to continue the search only in the set of larger numbers. Therefore we are seeking, within that set, its (n-k)th order statistic. So with those parameters, we repeat the search procedure on the set of larger elements.

For example if the list of numbers were [56,4,32,11,3,4,5,76,19] and we want the 8th order statistic. That means we want a number such that 7 other numbers are smaller. We can see that this is the number 56, but we'll follow the algorithm to watch it work. We would pick a random index, say 3 which corresponds to the number 32. We partition the list into two smaller lists based on their comparison with 32 so we get

[4,11,3,4,5,19]

and

[56,76]

Since the size of the smaller set is 6, we know that we need to look to the set of larger numbers. So we look for the (8-7)th = 1st order statistic of [56,76]. That is the value with 0 elements smaller than it. If we picked index 1 at random, we'd partition the list into the set of smaller numbers

[]

and the larger

[76]

And since the set of smaller elements has size 0, this is the value we're seeking, so we terminate the algorithm and return the answer 56.


My question: What is the probability that, in a list of length n, the 2nd and 3rd elements are compared at some point?


My attempts: Two elements are compared at the initial stage of the algorithm if and only if one of them is the randomly selected value. After that, they are compared in the next recursive call to the algorithm if: They were not randomly selected, and the ith order statistic was also not selected (otherwise the algorithm would terminate before making the recursive call), and: Either one is chosen as the random element in the recursive call or they are compared at some later stage.

This suggests to me a recursive relation to define the probability on a list of size n.

$$P(n) = \frac 2 n + \frac{n-3}{n}(…???…)$$

The problem in expressing the recursive part is that we don't know the size of the subproblem. That depends on which number was randomly selected, and whether the order statistic we're seeking is bigger or smaller than that.

And of course, even if I could figure this part out I'm not sure how I'd solve the recursive relation for the probability, since this seems quite complicated.

Best Answer

The second and third element are compared exactly if one of them is chosen before an element between them and the $i$-th element (including the latter) is chosen. If $i=1$, the probability for this to happen is $\frac23$; if $i\gt3$ the probability is $\frac2{i-1}$. If $i=2$ or $i=3$, the second and third element are compared with probability $1$.