Probability – Solving QR Interview Problems with Draws from iid U(0,1) Distribution

intervieworder-statisticsprobabilityuniform distribution

This is for QR at two well know trading firms (think jane street, HRT, Citadel, Jump …)(not BB bank).

Question prompt:

Given n iid Uniform distributed r.v.s. $x_i$ ~ U(0,1). $x_1$ is drawn first, then $x_2$, then $x_3$

Let $r_1$, $r_2$, $r_3$,…, $r_n$ be the order of $x_1$, $x_2$, $x_3$, …, $x_n$ after all n draws have been made.

You must guess $r_i$ when $x_i$ is drawn (so basically you need to guess $r_i$ when you have only seen $x_1$ ~ $x_{i-1}$ and not $x_{i+1}$ ~$x_n$)

If you correctly guessed all rns correctly, you win the game. What is your best strategy and what's the probability of you winning with this strategy?

My attempt:

When n = 2,

if $x_1$ < 0.5 : guess $r_1$ = 1

else: guess $r_1$ = 2
enter image description here

=> probability of winning = area shaded = 3/4

However, for n = 3, I'm kinda stuck. I've tried using the Expected number of draws that are less than the current $x_i$ to help make the guess. The way to calculate that is $x_i$*(n-1) + number of $x_k$ < $x_i$ for k = 1 ~ i-1. If this expected number is 2, we can make our guess that $r_i$ = 2 + 1 = 3. But!!! this expected number is usually not an integer. I thought about rounding the expected number but couldn't really wrap my head around why we can use this method. Also, this is kinda hard to generalize to larger n. Any thoughts on how to approach this problem? My gut feeling tells me that there must be an easier way to deal with this, such as the geometric approach I used for n = 2.

Thanks in advance!

Best Answer

If the question was just about guessing the position of the first-seen value $x_1$ among the $n$ then it would not be too difficult:

  • you would guess $r_1$ to be the value $\hat r_1$ which maximises the probability that $\hat r_1-1$ of the other values are less than $x_1$, i.e. that maximises ${n-1 \choose \hat r_1-1}x_1^{\hat r_1-1}(1-x_1)^{n-1 -(\hat r_1-1)}$.
  • This gives $\hat r_1 = \lceil n\, x_1\rceil$.
  • So for example if $n=10$ and $x_1=0.234$ then you would guess $\hat r_1 =\lceil 10 \times\, 0.234 \rceil = 3$ and this would turn out to be correct with probability ${9 \choose 2}0.234^{2}0.766^{7}\approx 0.305$.
  • For $n=2$ this gives essentially the same strategy as you identified: guess $\lceil 2x_1\rceil$ which is $1$ when $0<x_1 \le 0.5$ and is $2$ when $0.5 < x_1 \le 1$. Clearly you can be certain about $r_2$ having seen $x_1$ and $x_2$ so the overall probability of success is $\int\limits_0^{1/2} {1 \choose 0}x^0(1-x)^1dx + \int\limits_{1/2}^1 {1 \choose 1}x^1(1-x)^0dx = \frac34$, as you found.

This approach should work for individual later guesses too. So when you see $x_k$ and know there are already $j_k$ observations less than $x_k$, your best guess would be to let $\hat r_k = j_k +\lceil (n-k+1)\, x_k\rceil$ and you would be correct with probability ${n-k \choose \hat r_k-j_k-1}x_k^{\hat r_k-j_k-1}(1-x_k)^{n-k -(\hat r_k-j_k-1)}$.

It seems a naive though possibly reasonable thought that making the individual best guess at each stage may also be the best combined strategy for the $n$ guesses. Actually calculating the probability of overall success looks difficult, and simulation might be faster and less prone to error.

Related Question