[Math] Convex combination iid Bernoulli random variables

pr.probability

I'm looking for a lower bound for the probability that an arbitrary convex combination of iid Bernoulli (p) random variables is at least p. My guess is p/k (for some constant k; k must be at least e, as noted by Matt below), but I'm happy with any positive lower bound that depends only on p.

For example, if p is slightly above 1/2, and the convex combination is simply the average of two variables, then the probability is slightly above 1/4 which is approximately p/2.

Best Answer

To complement my other answer, I will show

Proposition 1 Let $\xi_k$ be a finite number of iid Bernoulli random variables of expectation $p > 1/2$, and let $a_k > 0$ be real numbers. Then ${\bf P}( \sum_k a_k \xi_k \geq p \sum_k a_k) \gg 1$.

By replacing $\xi_k$ with $1-\xi_k$ and $p$ with $1-p$, this is equivalent to

Proposition 2 Let $\xi_k$ be a finite number of iid Bernoulli random variables of expectation $p < 1/2$, and let $a_k > 0$ be real numbers. Then ${\bf P}( \sum_k a_k \xi_k \leq p \sum_k a_k) \gg 1$.

Let's prove Proposition 2. I found a number of arguments to treat various special cases, which when combined together was able to cover the general case as follows.

We will need a large absolute constant $C_0$.

Case 1: (high multiplicity) For every integer $n$, the number of $a_k$ in the dyadic interval $(2^{n-1}, 2^n]$ is either zero, or at least $C_0/p$.

Roughly speaking this is the regime where the central limit theorem applies. In this case we can proceed by the Berry-Esseen inequality, which lets one estimate $$ {\bf P}( \sum_k a_k \xi_k \geq p \sum_k a_k) = \frac{1}{2} + O( \frac{p \sum_k a_k^3}{(p \sum_k a_k^2)^{3/2}} ).$$ If, for each $n$, we let $c_n$ be the number of $k$ for which $a_k \in (2^{n-1},2^n]$, we can write the right-hand side as $$ \frac{1}{2} + O( p^{-1/2} \frac{\sum_n 2^{3n} c_n}{(\sum_n 2^{2n} c_n)^{3/2}} ).$$ By hypothesis, each $c_n$ is either zero or at least $C_0/p$, so we can bound $$ 2^{3n} c_n \leq (C_0/p)^{-1/2} 2^{2n} c_n (\sum_{n'} 2^{2n'} c_{n'})^{1/2} $$ so the previous expression becomes $\frac{1}{2} + O( C_0^{-1/2} )$, and the claim follows for $C_0$ large enough.

Case 2: (low multiplicity) For every integer $n$, the number of $a_k$ in the dyadic interval $(2^{n-1}, 2^n]$ is at most $C_0/p$.

Here the Bernoulli variables are mostly zero and one can proceed using the first and second moment methods, after first applying a dyadic decomposition.

Let $n_0$ be the integer such that $p \sum_k a_k \in (2^{n_0-1}, 2^{n_0}]$. It will suffice to show that $$ {\bf P}( \sum_k a_k \xi_k \leq 2^{n_0-1} ) \gg_{C_0} 1.$$

The number of $k$ with $a_k > 2^{n_0-C_0}$ is at most $2^{C_0} / p$, and hence $$ {\bf P}( \sum_{k: a_k > 2^{n_0-C_0}} a_k \xi_k = 0 ) \geq (1-p)^{2^{C_0}/p} \gg_{C_0} 1.\qquad(1)$$ Next, for any $m \geq C_0$, we consider the random variable $$ \sum_{k: 2^{n_0-m-1} < a_k \leq 2^{n_0-m}} a_k \xi_k.$$ There are at most $C_0/p$ elements in this sum, so the second moment of this variable can be computed as $$ {\bf E} ( \sum_{k: 2^{n_0-m-1} < a_k \leq 2^{n_0-m}} a_k \xi_k )^2 \ll C_0^2 2^{2n_0-2m}.$$ By Markov's inequality we thus have $$ {\bf P} ( \sum_{k: 2^{n_0-m-1} < a_k \leq 2^{n_0-m}} a_k \xi_k \geq 2^{n_0-m/2} ) \ll C_0^2 2^{-m}.\qquad(2)$$

Applying the union bound to (2) for all $m \geq C_0$ and combining with (1) (using independence), we obtain the claim (if $C_0$ is large enough).

Case 3: (general case)

In this case we can partition the $k$ into two classes ${\mathcal K}_1$, ${\mathcal K}_2$, one of which is in the low multiplicity case and one of which is in the high multiplicity case. By the preceding cases we have

$${\bf P}( \sum_{k \in {\mathcal K}_1} a_k \xi_k \leq p \sum_{k \in {\mathcal K}_1} a_k) \gg 1$$ and $${\bf P}( \sum_{k \in {\mathcal K}_2} a_k \xi_k \leq p \sum_{k \in {\mathcal K}_2} a_k) \gg_{C_0} 1$$ so by independence we conclude $${\bf P}( \sum_k a_k \xi_k \leq p \sum_k a_k) \gg_{C_0} 1$$ as required.

Related Question