Finding the best threshold for bounding the error probability via Chernoff bounds (biased coin examples)

epsilon-delta, probability, upper-lower-bounds

Suppose we have two biased coins which we want to distinguish:
$$
\begin{cases}
c_1: P(H) = 1/2 + \epsilon\\
c_2: P(H) = 1/2 - \epsilon\\
\end{cases}
$$

We can define the random variable $X$ that counts the number of heads in $n$ trials, where $X_i$ is the indicator that the $i$-th flip comes up heads:
$$X=\sum_{i=1}^n X_i$$

To identify coin $c_2$ (with high probability), we can bound the probability of the error event $X \geq n/2$ using the additive Chernoff bound, finally obtaining the lower bound: $$n > \frac{1}{2 \epsilon^2}\cdot \log \frac{1}{\delta}$$
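(To spell out the Chernoff step, using the additive/Hoeffding form $P(X \geq \mathbb{E}[X] + t) \leq e^{-2t^2/n}$: under $c_2$, $\mathbb{E}[X] = n(1/2 - \epsilon)$ and the error event $X \geq n/2$ is a deviation of $t = \epsilon n$, so)
$$
P\left(X \geq \tfrac{n}{2} \mid c_2\right) \leq e^{-2\epsilon^2 n} < \delta
\iff
n > \frac{1}{2\epsilon^2}\cdot \log\frac{1}{\delta}.
$$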

Now suppose that the biases are not equal in magnitude, that is, we have:
$$
\begin{cases}
c_1: P(H) = 1/2 + \epsilon_1 \\
c_2: P(H) = 1/2 - \epsilon_2
\end{cases}
$$

Defining $X$ as above, the decision rule now uses a general threshold: declare the coin to be $c_2$ when
$$X=\sum_{i=1}^n X_i < \lambda \cdot n, \qquad \lambda \in (0, 1).$$

For this latter case, my intuition was to set $\lambda = (1+\epsilon_1-\epsilon_2)/2$ (i.e., the midpoint of the interval $(1/2-\epsilon_2, \ 1/2+\epsilon_1)$), obtaining the lower bound:
$$n > \frac{2}{(\epsilon_1 + \epsilon_2)^2}\cdot \log \frac{2}{\delta}$$
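(One way to arrive at this bound: both means $n(1/2+\epsilon_1)$ and $n(1/2-\epsilon_2)$ lie at distance $n(\epsilon_1+\epsilon_2)/2$ from the threshold $\lambda n$, so each of the two error probabilities is at most $e^{-n(\epsilon_1+\epsilon_2)^2/2}$; requiring each to be at most $\delta/2$ gives)
$$
e^{-n(\epsilon_1+\epsilon_2)^2/2} < \frac{\delta}{2}
\iff
n > \frac{2}{(\epsilon_1+\epsilon_2)^2}\cdot \log\frac{2}{\delta}.
$$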

My question is: how can we analytically find the $\lambda$ that distinguishes between the two coins (with high probability) using the minimum number of trials?

Or, in other words, how can we prove that a given $\lambda$ is the best threshold for the error events bounded via Chernoff, i.e., the one that gives the minimum number of trials needed? (Note that in the first example, $\lambda = 1/2$.)

Does it make sense to pose the problem in this way?

Thank you!

Best Answer

I assume that the goal is to select $\lambda$ to define a classifier that takes a value of $X$ and says that you have the $1/2-\epsilon_2$-biased coin if $X<n\lambda$ and the $1/2+\epsilon_1$-biased coin if $X \geq n\lambda$, where $\lambda \in (1/2-\epsilon_2,1/2+\epsilon_1)$ is a number decided before doing any flipping. (Or you could go the other way when $X=n\lambda$; it doesn't matter.)
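As an aside, here is a minimal Monte Carlo sketch of this decision rule in Python; the particular values of $n$, $\epsilon_1$, $\epsilon_2$ (and the uniform choice of coin) are only illustrative, not part of the argument:

```python
import random

# Monte Carlo sketch of the threshold classifier described above.
# n, eps1, eps2 and the number of repetitions are illustrative choices.
n, eps1, eps2 = 500, 0.10, 0.05
lam = (1 + eps1 - eps2) / 2              # midpoint threshold

def classify(p_heads):
    """Flip the chosen coin n times and guess its bias from the threshold rule."""
    x = sum(random.random() < p_heads for _ in range(n))
    return 0.5 + eps1 if x >= n * lam else 0.5 - eps2

reps = 20_000
errors = 0
for _ in range(reps):
    p = random.choice([0.5 + eps1, 0.5 - eps2])   # coin picked uniformly at random
    errors += classify(p) != p

print("empirical error rate:", errors / reps)
```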

In this case, you want to be wrong with probability at most $\delta$. To know what that probability is, you need a probability distribution for which coin you pick to start flipping. If you pick a coin uniformly at random (and only flip that coin), then you are wrong with probability

$$\frac{P(X \geq n \lambda \mid p=1/2-\epsilon_2)+P(X< n \lambda \mid p=1/2+\epsilon_1)}{2}$$

where $p$ is the probability of getting heads. The Chernoff bound then says that this is less than $\frac{e^{-2(1/2-\epsilon_2-\lambda)^2 n}+e^{-2(1/2+\epsilon_1-\lambda)^2 n}}{2}$. Jensen's inequality then tells you that this bound is minimized when $\lambda=\frac{1+\epsilon_1-\epsilon_2}{2}$. More generally, Jensen's inequality tells you that the Chernoff upper bound is minimized when $\lambda$ is the average of $1/2+\epsilon_1$ and $1/2-\epsilon_2$ weighted by your prior, whatever that prior was. Note that here we are technically minimizing the bound, not necessarily the failure probability itself, though given how intuitive the result is, I would be surprised if the answer were different for the actual failure probability.
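If it helps, here is a quick numerical sanity check of that minimization (again with illustrative parameter values, and checking only the bound, not the true error probability):

```python
import numpy as np

# Evaluate the averaged Chernoff/Hoeffding bound
#   B(lam) = ( exp(-2n(1/2 - eps2 - lam)^2) + exp(-2n(1/2 + eps1 - lam)^2) ) / 2
# on a grid of thresholds and compare the grid minimizer with the midpoint.
n, eps1, eps2 = 500, 0.10, 0.05
lam = np.linspace(0.5 - eps2, 0.5 + eps1, 10_001)
bound = 0.5 * (np.exp(-2 * n * (0.5 - eps2 - lam) ** 2)
               + np.exp(-2 * n * (0.5 + eps1 - lam) ** 2))

print("argmin over the grid    :", lam[np.argmin(bound)])
print("midpoint (1+eps1-eps2)/2:", (1 + eps1 - eps2) / 2)
```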

Regarding further generalization, I am not sure what the right way to proceed would look like if you also wanted to incorporate lower bounds on the failure probabilities; most likely something fairly similar, because the lower bounds behave rather similarly. I'm also not sure what it would look like if you accumulated a table of statistics from flipping both coins and counting the number of heads observed for each. Obviously this is not as good as always flipping the more biased of the two coins, but since you don't know which one that is, it might still be better to switch back and forth, provided you can come up with a classifier based on this data and estimate its error probability.
