Solved – Variance of the Poisson Binomial Distribution

distributionspoisson-binomial-distributionvariance

Consider a sequence of $n$ independent Bernoulli trials drawn from a list of biases $p_1,p_2,…,p_n\in[0,1]$, respectively. We set the random variable $X$ to be the sum of these trials. On wikipedia, the distribution of $X$ is called the Poisson binomial distribution. We define the sample mean and sample variance of our list of Bernoulli biases as
$$
\bar{p}=\frac{1}{N}\sum_{i=1}^n p_i
$$
and
$$
\sigma_p^2
=\frac{1}{N}\sum_{i=1}^N(p_i-\bar{p})
=\frac{1}{N}\sum_{i=1}^N p_i^2 – \bar{p}^2.
$$

Since the trials are independent, it is easy to compute that
$$
\mathbb{E}[X]
= \sum_{i=1}^n p_i = N\bar{p}
$$
and
\begin{align*}
\mathbb{Var}[X]
&= \sum_{i=1}^n p_i(1-p_i) \\
&= N\bar{p} – N(\sigma_p^2+\bar{p}^2) \\
&= N\bar{p}(1-\bar{p}) – N\sigma_p^2.
\end{align*}

The expectation value of $X$ is not surprising. Also, when $\sigma_p^2=0$ we must have $\bar{p}=p_1=\cdots=p_n$ and so $X$ is binomially distributed, which matches $\mathbb{Var}[X]$ computed above.

My confusion is this: why does the variance of $X$ go down as the sample variance $\sigma_p^2$ goes up (with $\bar{p}$ and $N$ fixed)? I find this very counter-intuitive, and would appreciate an explanation. I would expect with a greater variance of biases, there would be a broader distribution of possible sums of the result…

Best Answer

Think of the case where $n=2$. If $p_1 = p_2 = 0.5$, this maximizes the variance of X. If $p_1 = 0$ and $p_2 = 1$, then $X=1$ and there is no variance.

Good bounds for the Poisson binomial distribution

Consider the following lemma:

Given $0 < p_1 \le \dots \le p_N < 1$ with $N \ge \frac{2}{p_1}$, define $\mu := \frac{1}{N}\sum_{k=1}^N p_k$. For $m \le N+1 -\frac{2}{p_1}$, the tail of the Poisson binomial distribution with parameters $N$, $\{p_k\}_{k=1}^N$ is bounded by the binomial distribution with parameters $N$ and $\mu$: $$\sum_{A \in \mathcal{F}_m}\prod_{i \in A}p_i\prod_{j\in A^c}(1-p_j) \le \binom{N}{m}\mu^m(1-\mu)^{N-m}.$$

The basic idea for the proof is that a Poisson binomial distribution with parameters $p_2,\dots,p_{N-1}, (p_1+p_N)/2, (p_1+p_N)/2$ has a heavier tail than the distribution with parameters $p_1 \le \dots \le p_N$. Repeatedly applying this and accounting for edge cases will give the limits on $N$ and $m$ where the statement holds. The statement I prove and the proof are at the end of this post.

If you can bound the mean of the success probabilities, directly, then this lemma will give you a tight bound. Otherwise, suppose you have lower bounds on the $p_i$. Then the 'worst-case' distribution is when each of the $p_i$ are at the lower bound (this maximizes the probability that there are few successes). Let $\mu$ be the mean of the bounds on $p_1,\dots,p_N$. Using the lemma we get that the binomial distribution with parameters $N$ and $\mu$ upper bounds the tail where there are few 'successes'. If you want the other tail, use $1-p_i$ for your success probabilities (you will want to use the upper bounds on $\{p_i\}_{i=1}^N$ as the worst-case). Since the Poisson binomial distribution includes the binomial distribution as a special case, this approximation is as tight as your bounds on the $p_i$. From here you can use an appropriate bound for the tail of the Binomial distribution, such as Hoeffding's inequality.

Bounds on the tail of the binomial distribution

The wikipedia article for binomial distributions has a very detailed section on this. Hoeffding's inequality gives: $$Pr\left(\sum_{n=1}^N X_n \le k\right) \le \exp\left(-2 (N\mu-k)^2/N \right)$$ using the success probabilities $1-p_i$ in the binomial approximation will give you the other tail.

Interaction between success probabilities and the CDF

To clarify - the CDF always integrates to 1, so I don't think we can call it 'decreasing' with respect to parameters. If you restrict yourself to the tail where 'many' successes occur, then intuitively we know that as the $p_i$ decrease, the probability that 'many' successes occur should also decrease. I would suggest using the binomial approximation above to give yourself a tight upper bound, and then find a value of $n$ such that the probability that there are $n$ successes is a decreasing function of $p_i$ by taking the derivative. From this point you should be able to prove that for each $k \ge n$, the probability there are $k$ successes is also a decreasing function (using unimodality of the binomial distribution), and hence the probability that there are at least $n$ successes is a decreasing function of $p_i$.

Formal Statement and Proof

This is a technical lemma in a paper which is under review. I'll update this post with bibliographic information if it gets accepted and/or if it is put on arXiv. If you find mistakes or have ideas on how to make the presentation clearer, please let me know!

Claim: Given $0 < p_1 \le \dots \le p_K < 1$ with $K \ge \frac{2}{p_1}$, define $\mu := \frac{1}{K}\sum_{k=1}^K p_k$. For $m \le K+1 -\frac{2}{p_1}$, the tail of the Poisson Binomial distribution with parameters $K$, $\{p_k\}_{k=1}^K$ is bounded by the Binomial distribution with parameters $K$ and $\mu$: $$\sum_{A \in \mathcal{F}_m}\prod_{i \in A}p_i\prod_{j\in A^c}(1-p_j) \le \binom{K}{m}\mu^m(1-\mu)^{K-m}$$

Proof: We prove this by considering a sequence of Poisson Binomial distributions $f_0,f_1,\dots$. The parameters of the $n+1$th distribution are $K$ and $$\{p_k^{n+1}\}_{k=1}^K = \left\{\frac{p^{n}_1+p^{n}_K}{2}, \{p^{n}_k\}_{k=2}^{K-1},\frac{p^{n}_1+p^{n}_K}{2}\right\},$$ that is, the largest and smallest event probabilities of distribution $n$ are averaged to form distribution $n+1$. One can quickly see that the mean $\mu^n =\frac{1}{K} \sum_{k=1}^K p_k^n$ is constant for all $n$, and that the sequence of distributions converge to the Binomial distribution with parameters $K$ and $\mu^0$. Next we show that the tail of the sequence of distributions becomes heavier as $n$ increases. For ease of notation, we define $\gamma_n = (p_1^n + p_K^n)/2$ and $\epsilon_n = \gamma_n-p_1^n = p_K^n-\gamma_n$.

First consider $m=0$. We have: \begin{align*} f_n(0) & = \prod_{k=1}^K(1-p_k^n) = \prod_{k=2}^{K-1}(1-p_k^{n})((1-\gamma_n)^2-\epsilon_n^2)\\ &\le (1-\gamma_n)^2\prod_{k=2}^{K-1}(1-p_k^n) = f_{n+1}(0) \end{align*} For $m = 1$, we have: \begin{align*} f_n(1) &= (p_1^n(1-p_K^n)+p_K^n(1-p_1^n))\prod_{k=2}^{K-1}(1-p_k^n)\\ &\qquad+ (1-p_1^n)(1-p_K^n)\sum_{k=2}^K \frac{p_k^n}{1-p_k^n}\prod_{k=2}^{K-1}(1-p_k^n)\\ &= \left(2\gamma_n(1-\gamma_n) +(1-\gamma_n)^2 \sum_{k=2}^K \frac{p_k^n}{1-p_k^n}\right)\prod_{k=2}^{K-1}(1-p_k^n)\\ &\qquad+ \epsilon_n^2\left(2-\sum_{k=2}^{K-1}\frac{p_k^n}{1-p_k^n}\right)\prod_{k=2}^{K-1}(1-p_k^n)\\ &\le f_{n+1}(1) + \epsilon_n^2\prod_{k=2}^{K-1}(1-p_k^n)\left(2-(K-2)\frac{p^n_1}{1-p^n_1}\right)\\ &\le f_{n+1}(1) + \epsilon_n^2\prod_{k=2}^{K-1}(1-p_k^n)\left(\frac{2 - Kp_1^n}{1-p^n_1}\right)\le f_{n+1}(1)\\ \end{align*} The first and second equalities follow from the definitions of $f_n(1)$, $\gamma_n$ and $\epsilon_n$. The first inequality follows since $\frac{p^n_1}{1-p^n_1} \le \frac{p^n_k}{1-p^n_k}$, and the final inequalities since $Kp_1^n \ge 2$.

Now we consider the case $m \ge 2$. Define $\mathcal{F}_{m-2}$ as the $\binom{K-2}{m-2}$ sets with $m-2$ unique elements from the set $\mathcal{K} = \{2,\dots,K-2\}$. For an event $A\in \mathcal{F}_{m-2}$, let $A^c = \mathcal{K}\setminus A$ and define the constants \begin{align*} c^n_0(A) &= \prod_{k\in A}p^n_k\prod_{k'\in A^c}(1-p^n_{k'})\\ c^n_1(A) &= \sum_{\ell \in A^c} \frac{p^n_\ell}{1-p^n_\ell}\\ c^n_2(A) &= \sum_{\ell \in A^c} \frac{p^n_\ell}{1-p^n_\ell}\sum_{k\neq\ell \in A^c}\frac{p^n_k}{1-p^n_k}.\\ \end{align*}

The probability that event $A \in \mathcal{F}_{m-2}$ occurs and there are $m$ successful trials out of $K$ is proportional to \begin{align*} g(p_1^n,p_K^n,c^n_1(A),c^n_2(A)) &:=p_1^np_K^n + c^n_1(A)(p_1^n(1-p_K^n) + p_K^n(1-p_1^n))\\ &\qquad + c^n_2(A)(1-p_1^n)(1-p_K^n)\\ & = g(\gamma_n,\gamma_n, c^n_1(A),c^n_2(A))\\ &\qquad + \epsilon^2(-1 + 2c^n_1(A) - c^n_2(A)), \end{align*} where the second equality follows from nearly the same algebra as was used for the case $m=1$. For $m \le K+1 - \frac{2}{p_1}$, we have that $p_1 \ge 2/(K-m+1)$, and using the definition of $c_1^n(A)$ and $c_2^n(A)$, \begin{align*} -1 + 2c^n_1(A) -c^n_2(A) &= -1 + \sum_{\ell \in A^c} \frac{p^n_\ell}{1-p^n_\ell}\left(2 - \sum_{k\neq\ell \in A^c}\frac{p^n_k}{1-p^n_k}\right)\\ &\le -1 + \sum_{\ell \in A^c} \frac{p^n_\ell}{1-p^n_\ell}\left(2 - \frac{(K-m-1)2}{(K-m-1)}\right)\\ & \le 0. \end{align*} The first inequality is due to the fact that $\frac{x}{1-x}$ is an increasing function of $x$ for $x \in (0,1)$, and the second is due to the condition on $m$. The Poisson binomial distribution can be expressed for $m\ge 2$ as \begin{align*} f_n(m) &= \sum_{A \in \mathcal{F}_{m-2}} c^n_0(A) g(p_1^n,p_K^n, c^n_1(A), c^n_2(A))\\ &= f_{n+1}(m) + \epsilon^2\sum_{A \in \mathcal{F}_{m-2}} c^n_0(A)(-1 + 2c^n_1(A) -c^n_2(A)), \end{align*} which implies that for $2 \le m \le K+1 - \frac{2}{p_1}$, we have $f_n(m) \le f_{n+1}(m)$. Since the limiting distribution is the Binomial distribution with parameters $K$ and $\mu$, this implies that for for $2 \le m \le K+1 - \frac{2}{p_1}$, the original probability mass function $f_0(m)$ is smaller than the binomial with parameters $K$ and $\mu$, and the proof is complete.

Best Answer

Related Solutions

Solved – Poisson Binomial Distribution

Good bounds for the Poisson binomial distribution

Bounds on the tail of the binomial distribution

Interaction between success probabilities and the CDF

Formal Statement and Proof

Related Question