Probability – Solving QR Interview Problems with Draws from iid U(0,1) Distribution


This came up in QR (quant researcher) interviews at two well-known trading firms (think Jane Street, HRT, Citadel, Jump …), not at a bulge-bracket bank.

Question prompt:

Given $n$ iid random variables $x_i \sim U(0,1)$, drawn in order: $x_1$ first, then $x_2$, then $x_3$, …

Let $r_1, r_2, r_3, \dots, r_n$ be the ranks of $x_1, x_2, x_3, \dots, x_n$ after all $n$ draws have been made (rank 1 being the smallest).

You must guess $r_i$ at the moment $x_i$ is drawn, i.e. having seen $x_1, \dots, x_i$ but not $x_{i+1}, \dots, x_n$.

If you guess every $r_i$ correctly, you win the game. What is your best strategy, and what is the probability of winning with this strategy?

My attempt:

When n = 2,

if $x_1$ < 0.5 : guess $r_1$ = 1

else: guess $r_1$ = 2

=> probability of winning = area shaded = 3/4
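A quick Monte Carlo check of the $n = 2$ strategy above, as a minimal R sketch (the number of trials and the seed are arbitrary choices made here, not part of the question):

```r
# Simulate the n = 2 threshold rule: guess r_1 = 1 if x_1 < 0.5, else r_1 = 2.
# Once x_2 is seen, r_2 is known exactly, so the game is won iff r_1 is right.
set.seed(1)
trials <- 1e5
x1 <- runif(trials)
x2 <- runif(trials)
guess_r1 <- ifelse(x1 < 0.5, 1, 2)
true_r1  <- ifelse(x1 < x2, 1, 2)    # rank of x_1 among {x_1, x_2}
win_rate_n2 <- mean(guess_r1 == true_r1)
win_rate_n2                          # should be close to 3/4
```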

However, for n = 3 I'm kind of stuck. I've tried using the expected number of draws that are less than the current $x_i$ to make the guess. That expectation is $x_i(n-i)$ (for the $n-i$ draws still to come) plus the number of $x_k < x_i$ for $k = 1, \dots, i-1$. If this expected number is 2, we can guess $r_i = 2 + 1 = 3$. But this expected number is usually not an integer. I thought about rounding it, but couldn't see why that would be justified. It is also hard to generalize to larger n. Any thoughts on how to approach this problem? My gut feeling tells me there must be an easier way, like the geometric approach I used for n = 2.
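A small R sketch of why rounding the expected count can mislead. The numbers are hypothetical, chosen only for illustration: $m = 2$ draws still unseen, current value $x = 0.3$, and $j = 0$ earlier draws below it, so the number of unseen draws below $x$ is Binomial$(m, x)$.

```r
# Hypothetical illustration: m unseen draws, current value x, j earlier draws below x.
m <- 2
x <- 0.3
j <- 0
expected_below <- j + m * x                  # 0.6, not an integer
guess_round <- j + round(m * x) + 1          # rank obtained by rounding the mean
pmf <- dbinom(0:m, size = m, prob = x)       # P(exactly b unseen draws below x), b = 0..m
guess_mode <- j + which.max(pmf)             # which.max gives (most likely b) + 1
p_round <- pmf[guess_round - j]              # probability the rounded guess is right
p_mode  <- pmf[guess_mode - j]               # probability the most-likely guess is right
c(guess_round = guess_round, p_round = p_round,
  guess_mode = guess_mode, p_mode = p_mode)
```

Here rounding the mean picks rank 2, which is correct with probability 0.42, while the most likely count picks rank 1, correct with probability 0.49. The best single guess maximises the binomial probability rather than rounding its mean.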

Thanks in advance!

Best Answer

If the question was just about guessing the position of the first-seen value $x_1$ among the $n$ then it would not be too difficult:

you would guess $r_1$ to be the value $\hat r_1$ which maximises the probability that $\hat r_1-1$ of the other values are less than $x_1$, i.e. that maximises ${n-1 \choose \hat r_1-1}x_1^{\hat r_1-1}(1-x_1)^{n-1 -(\hat r_1-1)}$.
This gives $\hat r_1 = \lceil n\, x_1\rceil$.
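One way to see the step from the maximisation to the ceiling (a sketch, writing $b = \hat r_1 - 1$ for the number of other values below $x_1$ and $p(b) = {n-1 \choose b}x_1^{b}(1-x_1)^{n-1-b}$ for the term being maximised) is the ratio of consecutive terms:

$$
\frac{p(b+1)}{p(b)} = \frac{n-1-b}{b+1}\cdot\frac{x_1}{1-x_1} \ge 1
\iff (n-1-b)\,x_1 \ge (b+1)(1-x_1)
\iff b+1 \le n\,x_1 .
$$

So $p(b)$ increases up to $b = \lfloor n x_1 \rfloor$ and decreases after it, giving $\hat r_1 = \lfloor n x_1\rfloor + 1 = \lceil n x_1 \rceil$ whenever $n x_1$ is not an integer (when it is, two adjacent ranks tie).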
So for example if $n=10$ and $x_1=0.234$ then you would guess $\hat r_1 =\lceil 10 \times\, 0.234 \rceil = 3$ and this would turn out to be correct with probability ${9 \choose 2}0.234^{2}0.766^{7}\approx 0.305$.
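The $n = 10$, $x_1 = 0.234$ example can be checked numerically with a short R sketch that compares the closed-form guess with a brute-force search over the binomial probabilities:

```r
n  <- 10
x1 <- 0.234
r1_hat <- ceiling(n * x1)     # closed-form guess
# Brute force: rank = 1 + (most likely number of the other n - 1 values below x1);
# which.max over b = 0..n-1 returns the index b + 1, which is exactly that rank.
r1_brute <- which.max(dbinom(0:(n - 1), size = n - 1, prob = x1))
p_correct <- dbinom(r1_hat - 1, size = n - 1, prob = x1)   # about 0.305
c(r1_hat = r1_hat, r1_brute = r1_brute, p_correct = p_correct)
```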
For $n=2$ this gives essentially the same strategy as you identified: guess $\lceil 2x_1\rceil$ which is $1$ when $0<x_1 \le 0.5$ and is $2$ when $0.5 < x_1 \le 1$. Clearly you can be certain about $r_2$ having seen $x_1$ and $x_2$ so the overall probability of success is $\int\limits_0^{1/2} {1 \choose 0}x^0(1-x)^1dx + \int\limits_{1/2}^1 {1 \choose 1}x^1(1-x)^0dx = \frac34$, as you found.
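The two integrals for $n = 2$ can also be evaluated numerically in R; this is just a check of the arithmetic above using base R's `integrate`:

```r
# Exact n = 2 win probability by numerical integration over x_1:
# guess r_1 = 1 on (0, 1/2] (correct if x_2 > x_1, probability 1 - x_1),
# guess r_1 = 2 on (1/2, 1) (correct if x_2 < x_1, probability x_1).
p_win_n2 <- integrate(function(x) 1 - x, lower = 0,   upper = 0.5)$value +
            integrate(function(x) x,     lower = 0.5, upper = 1)$value
p_win_n2   # 0.75
```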

This approach should work for individual later guesses too. So when you see $x_k$ and know there are already $j_k$ observations less than $x_k$, your best guess would be to let $\hat r_k = j_k +\lceil (n-k+1)\, x_k\rceil$ and you would be correct with probability ${n-k \choose \hat r_k-j_k-1}x_k^{\hat r_k-j_k-1}(1-x_k)^{n-k -(\hat r_k-j_k-1)}$.
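The stage-wise rule can be written as a small R function. This is a sketch; the name `greedy_guess` and its arguments are hypothetical, introduced here for illustration. It returns $\hat r_k = j_k + \lceil (n-k+1)\, x_k\rceil$ together with the probability that this single guess is correct.

```r
# Hypothetical helper: stage-wise best guess for r_k and its conditional probability.
# x_seen holds x_1, ..., x_k (the value just drawn is last); n is the total number of draws.
greedy_guess <- function(x_seen, n) {
  k   <- length(x_seen)
  x_k <- x_seen[k]
  j_k <- sum(x_seen[-k] < x_k)            # earlier draws already below x_k
  m   <- n - k                            # draws not yet seen
  r_hat <- j_k + ceiling((m + 1) * x_k)   # j_k + ceiling((n - k + 1) x_k)
  # P(exactly r_hat - j_k - 1 of the m unseen draws fall below x_k)
  prob <- dbinom(r_hat - j_k - 1, size = m, prob = x_k)
  list(guess = r_hat, prob = prob)
}

greedy_guess(0.234, n = 10)        # first draw: guess 3, probability about 0.305
greedy_guess(c(0.9, 0.5), n = 4)   # second draw: guess 2, probability 0.5
```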

It seems a naive, though possibly reasonable, idea that making the individually best guess at each stage is also the best combined strategy for the $n$ guesses. Actually calculating the overall probability of success looks difficult, and simulation might be faster and less prone to error.
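A minimal Monte Carlo sketch in R of the overall success probability when every guess uses the stage-wise rule above. The function name `simulate_greedy` and the default number of trials are hypothetical choices; the simulation only estimates how well this particular rule does and says nothing about whether some other strategy could do better.

```r
# Hypothetical helper: Monte Carlo estimate of P(all n guesses correct)
# when every guess uses r_hat_k = j_k + ceiling((n - k + 1) * x_k).
simulate_greedy <- function(n, trials = 10000) {
  wins <- 0
  for (t in seq_len(trials)) {
    x <- runif(n)                   # x_1, ..., x_n in draw order
    true_rank <- rank(x)            # r_k, with 1 = smallest of all n draws
    all_correct <- TRUE
    for (k in seq_len(n)) {
      j_k <- sum(x[seq_len(k - 1)] < x[k])         # earlier draws below x_k
      guess <- j_k + ceiling((n - k + 1) * x[k])   # stage-wise best guess
      if (guess != true_rank[k]) {
        all_correct <- FALSE
        break                       # one wrong guess loses the game
      }
    }
    if (all_correct) wins <- wins + 1
  }
  wins / trials
}

set.seed(1)
simulate_greedy(2)   # should be close to 3/4
```

Calling, say, `simulate_greedy(3)` or `simulate_greedy(5)` gives estimates for larger $n$ under the same rule.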

Related Solutions

Solved – Why is Expected value of a random variable equal to the mean

In my comment, I said that an observation $X_i$ inherits all of the probability properties of the population from which it was sampled. Of course, no single observation can exhibit all of these properties by itself, but if we take a large sample from a population, we can infer much of the probability information that's in the population.

In particular, if we take the sample mean (average) $\bar X$ of all of the elements of a large sample, then $\bar X$ will be near $\mu.$ Because $Var(\bar X) = \sigma^2/n,$ we know that the variability of $\bar X$ will be small, giving an idea of how near the sample mean $\bar X$ will actually be to the population mean $\mu.$
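A short R sketch of the $Var(\bar X) = \sigma^2/n$ claim for $\mathsf{Unif}(0,1)$ samples. The sample size of 100 and the 5,000 replications are arbitrary choices made here for illustration.

```r
set.seed(123)
n_obs <- 100                  # size of each sample (illustrative choice)
reps  <- 5000                 # number of independent samples
xbars <- replicate(reps, mean(runif(n_obs)))   # one sample mean per sample
var_xbar_sim    <- var(xbars)                  # simulated Var(X-bar)
var_xbar_theory <- (1/12) / n_obs              # sigma^2 / n with sigma^2 = 1/12
c(simulated = var_xbar_sim, theory = var_xbar_theory)
```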

If we look at the population of points randomly placed in the interval $(0,1),$ then the population has the distribution $\mathsf{Unif}(0,1)$ with population mean $\mu=1/2$ and population variance $\sigma^2 = 1/12.$ Also about 25% of the points will lie between $3/4$ and $1.$
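The three population facts for $\mathsf{Unif}(0,1)$ can be computed directly in R (a small check using base R's `integrate` and `punif`):

```r
mu     <- integrate(function(x) x, 0, 1)$value             # population mean, 1/2
sigma2 <- integrate(function(x) (x - 1/2)^2, 0, 1)$value   # population variance, 1/12
p_top  <- punif(1) - punif(3/4)                            # P(3/4 < X < 1), 1/4
c(mu = mu, sigma2 = sigma2, p_top = p_top)
```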

As an experiment, I will use R to take a sample of $n = 10,000$ values from this distribution. Then let's see what the mean of that large sample is, and what proportion of the points in the sample actually do lie between $3/4$ and $1.$

```r
x <- runif(10000)       # 10,000 draws from Unif(0,1)
mean(x)
## [1] 0.5008642        # sample mean is very close to population mean 1/2
mean(x > 3/4 & x < 1)
## [1] 0.248            # very nearly 25% of observations between 3/4 and 1
var(x); 1/12
## [1] 0.08267011       # sample variance; nearly the population variance 1/12
## [1] 0.08333333       # ... exactly 1/12
```

We see that $\bar X = 0.500864,$ very near 1/2. Also, 24.8% of the sampled values lie in $(3/4, 1).$ (Showing how the variance of $\bar X$ works would require a messier simulation, which I will skip for now.)

A histogram of the 10,000 values is shown below; the position of $\bar X$ is indicated by the vertical black line near $1/2,$ and the vertical red lines have about a quarter of the observations between them.

Probability – Proving Independence of Two Random Variables with F-Distribution

Your answer to (a) is correct.

For part (b), the two random variables are $F$-distributed by construction. We could prove that they're independent by establishing joint independence of $Y_1, Y_2, X_3$.

I believe you can do this via the joint moment-generating function. We have $$ M_{Y_1,Y_2,X_3}(s,t,u) = \mathbb{E}[\exp(sY_1+tY_2+uX_3)]=\mathbb{E}\big[\exp(uX_3)\, \mathbb{E}[\exp(sY_1+tY_2)\mid X_3]\big] $$ We can drop the conditioning on $X_3$ in the inner expectation, as $(Y_1, Y_2)$ are constructed from $(X_1,X_2)$ only, and $(X_1,X_2)$ is independent of $X_3$. This gives $$ \mathbb{E}[\exp(sY_1+tY_2)\mid X_3]=\mathbb{E}[\exp(sY_1+tY_2)]=M_{Y_1,Y_2}(s,t)=M_{Y_1}(s)M_{Y_2}(t), $$ by independence of $Y_1,Y_2$.

Putting it all together, we have $$ M_{Y_1,Y_2,X_3}(s,t,u)=\mathbb{E}[\exp(uX_3) M_{Y_1}(s)M_{Y_2}(t)] =M_{Y_1}(s)M_{Y_2}(t)M_{X_3}(u) $$ and we have joint independence.
