Probability – Painting n Balls from 2n Balls Red and Guessing the Red Ball

game-theory, pr.probability

The game

Lucy has $2n$ distinct white balls numbered $1$ through $2n$. Lucy picks $n$ of the balls in any way she likes and paints them red. Lucy then gift-wraps all the balls so that it is impossible to tell a ball's color without opening its wrapping. Lucy does this before offering any of the balls to Alice.

Alice would like to get one of the red balls. Alice doesn't know which balls are painted red. Lucy offers Alice the balls one by one, starting from ball $1$ and continuing up to ball $2n$. Upon being offered each ball, Alice may either take it or pass. If Alice takes the ball, she opens the wrapping. If the ball is red, Alice keeps it and the game ends; otherwise, Alice continues to be offered the remaining balls (unless all balls have been offered). If Alice passes, the ball remains wrapped, so Alice does not learn its color, and she continues to be offered the remaining balls. Note that it is possible for Alice not to get any red ball at all, for example if she chooses never to take a ball.
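The rules above are easy to mechanize. Here is a minimal sketch (the names `play_game` and `strategy` are my own, not from the question) that plays one round given Lucy's coloring and a callback for Alice's decisions:

```python
def play_game(colors, strategy):
    """Play one round of the game.

    colors   -- list of booleans, one per ball, True meaning red
    strategy -- callback strategy(i, opened) -> bool; given the index i of
                the ball currently offered and a dict of the colors Alice
                has already seen, returns True to take the ball.

    Returns (got_red, number_of_balls_taken).
    """
    opened = {}  # index -> color of each ball Alice has opened so far
    taken = 0
    for i, is_red in enumerate(colors):
        if strategy(i, opened):
            taken += 1
            opened[i] = is_red
            if is_red:
                return True, taken  # Alice found a red ball; game ends
    return False, taken  # all balls offered, no red ball obtained
```

A strategy here only ever learns the colors of balls Alice actually opened, matching the rule that passed balls stay wrapped.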

Alice's goal is to

  1. Obtain a red ball with probability at least $1 - 1/n$ (that is, it is okay for her not to get a red ball, as long as the probability of that is at most $1/n$), and
  2. Minimize the expected number of balls Alice takes.

Lucy's goal is the complete opposite of Alice's, that is:

  1. If it is possible for her to ensure that Alice cannot obtain a red ball with probability at least $1 - 1/n$, she will do so.
  2. Otherwise, maximize the expected number of balls Alice takes.

Question

What is the optimal strategy for Alice? What is the expected number of
balls that Alice would take in the optimal case?

Example

When $n = 2$, Alice's goal is to obtain a red ball with probability at least $1/2$. Here's an example strategy, which is provably optimal:

Decide which one of the four balls to open at random, and open that
and only that ball. Clearly the probability that the opened ball is
red is $1/2$, and the expected number of balls opened is $1$.

My work

Let $X$ be the expected number of balls Alice would take if both players play optimally.

Upper bound on X

I can prove an $O(\log n)$ upper bound on $X$ by letting Alice select $\log_2 n$ balls at random to open. The probability of Alice not finding any red ball is at most $2^{-\log_2 n} = 1/n$. Lucy would then paint the last $n$ balls red, and we have $X = O(\log n)$.
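The failure probability of this strategy can be computed exactly rather than just bounded: if Alice opens $k = \lceil \log_2 n \rceil$ distinct random balls out of $2n$ with $n$ red, the chance all of them are white is $\binom{n}{k}/\binom{2n}{k} \le 2^{-k} \le 1/n$. A sketch (the helper name `failure_probability` is my own):

```python
from math import comb, log2, ceil

def failure_probability(n):
    """Exact probability that k = ceil(log2 n) distinct uniformly random
    balls, out of 2n balls of which n are red, are all white.
    Equals C(n, k) / C(2n, k), which is at most 2^{-k} <= 1/n."""
    k = ceil(log2(n))
    return comb(n, k) / comb(2 * n, k)

for n in (4, 16, 256):
    p = failure_probability(n)
    assert p <= 1 / n  # the claimed 1/n bound holds, in fact with slack
    print(n, p)
```

The slack comes from sampling without replacement: each successive white draw makes the next white draw strictly less likely than $1/2$.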

Lower bound on X

Edit: As domotorp pointed out in chat, the proof of my lower bound in the pdf below is wrong, because I wrongly assumed that $E(x)$ and $E(y)$ were independent. I don't see any immediate fix, so I will try to work on this again.

I proved an $\Omega(\log \log n)$ lower bound on $X$ in this pdf. I used different terminology in the pdf: instead of $2n$ balls with $n$ colored red, we have $2n$ bins, $n$ of which have a ball inside.

Motivation

In the distributed systems setting, a rather common approach to guaranteeing that an algorithm is fault tolerant is to exploit a bound on the number of possible failures. An algorithm in one of my supervisor's papers uses a cheap subroutine that runs very fast when the number of failures is very small. To accommodate a larger number of failures, he runs this subroutine multiple times: if, with high probability (at least $1 - 1/n$), one of these runs encounters very few failures, then the algorithm is much faster than the previously known one. At the moment the paper uses the $O(\log n)$ upper bound, but we wondered whether this bound is optimal.

A very nice implication is that if a strategy with expectation lower than $O(\log n)$ is found for this game, then the complexity of the algorithm in the paper will decrease correspondingly. However, I strongly suspect that proving $\Omega(\log n)$ is more likely than proving $o(\log n)$.

Best Answer

Here is a sketch of an argument by my colleague Jim Roche of an $\Omega(\log n/\log \log n)$ lower bound. The basic idea is that Lucy chooses randomly between two strategies, one of which puts all the red balls early (thereby forcing Alice to open some number of the early boxes), and the other of which puts a lot of the red balls late (thereby forcing Alice to open some number of the late boxes).

Specifically, with probability 1/2, Lucy distributes the $n$ red balls randomly among the first $n(1 + 1/\log_2 n)$ boxes. With probability 1/2, she distributes $n/\log_2 n$ red balls randomly among the first $n(1 + 1/\log_2 n)$ boxes, and puts the remaining red balls at the very end. We now give a lower bound on how many boxes Alice expects to open under any randomized strategy that succeeds with probability at least $1-1/n$.
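Lucy's mixed strategy is concrete enough to write down directly. Here is a sketch that samples a placement from the two-part mixture, keeping the dependencies between boxes exact (the name `lucy_mixed_strategy` and the rounding of the fractional counts are my own choices):

```python
import random
from math import log2

def lucy_mixed_strategy(n):
    """Sample a red-ball placement (a set of 0-based box indices) from the
    answer's mixture: with probability 1/2 all n red balls go among the
    first n(1 + 1/log2 n) boxes; otherwise only n/log2 n go early and the
    remaining red balls occupy the very last boxes."""
    m = round(n * (1 + 1 / log2(n)))  # the first n(1 + 1/log2 n) boxes
    if random.random() < 0.5:
        # Case 1: all n red balls among the first m boxes.
        return set(random.sample(range(m), n))
    # Case 2: only n/log2 n red balls early, the rest at the very end.
    early = round(n / log2(n))
    reds = set(random.sample(range(m), early))
    reds.update(range(2 * n - (n - early), 2 * n))
    return reds
```

Note that the late block starts at index $2n - (n - n/\log_2 n) = n(1 + 1/\log_2 n)$, so the early and late regions never overlap and the placement always uses exactly $n$ red balls.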

The argument that follows is slightly imprecise because it treats the probabilities that the boxes contain red balls as independent, whereas they are actually dependent since Lucy is constrained to paint exactly $n$ balls red. However, the error terms are negligible, and the argument is clearer if we ignore the dependencies.

For any constant $C$, define $P_C$ to be the probability that Alice chooses at most $(C\ln n)/\ln\log_2 n$ of the first $n(1+1/\log_2 n)$ boxes. Then we must have $${1\over n} \ge \Pr\{\hbox{Alice fails}\} \ge {P_C\over 2} \left({1\over\log_2 n} \right)^{(C\ln n)/\ln\log_2 n} = {P_C\over 2}\, n^{-C}, $$ so $$P_C \le {2\over n}\exp (C\ln n) = 2n^{C-1}. $$ In particular, for $C=1/2$, the probability that Alice chooses at most $(\ln n)/(2\ln \log_2 n)$ of the first $n(1+1/\log_2 n)$ boxes is at most $2/\sqrt n$. Therefore, with probability approaching $1$ as $n\to\infty$, Alice must examine at least $(\ln n)/(2 \ln \log_2 n)$ of the first $n(1+1/\log_2 n)$ boxes.

But remember that with probability 1/2, Lucy places only $n/\log_2 n$ red balls among the first $n(1+1/\log_2 n)$ boxes. If she does so, then the probability that Alice's first $(\ln n)/(2 \ln \log_2 n)$ openings uncover a red ball is upper-bounded by the union bound $$\left({\ln n\over 2\ln\log_2 n}\right)\left({1\over \log_2 n}\right),$$ which approaches zero as $n\to\infty$. Therefore, with probability at least $1/2 -\epsilon$, Alice must examine at least $(1/2 - \epsilon)(\ln n)/\ln \log_2 n$ boxes. This establishes the $\Omega(\log n/\log \log n)$ bound.
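That the union bound really vanishes can be checked numerically; a sketch (the helper name `union_bound` is mine) evaluating the expression for a few values of $n$:

```python
from math import log, log2

def union_bound(n):
    """The answer's union bound: the chance that Alice's first
    (ln n)/(2 ln log2 n) openings hit a red ball, in the sparse case
    where each early box is red with probability about 1/log2 n."""
    return (log(n) / (2 * log(log2(n)))) / log2(n)

for n in (10**3, 10**6, 10**12):
    print(n, union_bound(n))  # decreases toward 0 as n grows
```

The decay is slow (roughly $1/\ln\log n$ after cancellation), which is consistent with the bound being $\Omega(\log n/\log\log n)$ rather than the conjectured $\Omega(\log n)$.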
