Conditions for Binomial Distribution

Tags: binomial-distribution, distributions, poisson-binomial-distribution

It is known that if $X$ is the sum of $n$ independent and identically distributed Bernoulli random variables, then $X$ follows a Binomial distribution.

What about the converse: can a sum of dependent and/or non-identically distributed Bernoulli random variables still follow a Binomial distribution?

Best Answer

Yes, certainly. By this I mean a sum of dependent but not identically distributed Bernoulli variables can have a Binomial distribution.

Let $\mathbf X$ refer to a random vector of zeros and ones of length $n,$ so that there are $2^n$ possible values of $\mathbf X.$ For each such possible vector $\mathbf x$ let $p_{\mathbf x}$ be the probability that $\mathbf X = \mathbf x.$

Observe

  • The components of $\mathbf X$ (written $X_i,$ $i=1,2,\ldots, n$) are Bernoulli variables with probabilities $p_i.$ These probabilities are the sums of the $p_{\mathbf x}$ over all $\mathbf x$ for which $x_i=1.$

  • If there is a number $q$ for which the sum of $\mathbf X$ has a Binomial distribution, then (by the definition) the sum of all $p_{\mathbf x}$ where $\mathbf x$ has exactly $k$ components equal to $1$ must equal the Binomial probability: $$\sum_{\mathbf x:\, |\mathbf x|=k}p_{\mathbf x} = \binom{n}{k}q^k(1-q)^{n-k}.$$

The latter constitutes $n+1$ linear constraints on the $2^n$ values $p_{\mathbf x},$ so the set of solutions (when nonempty) has dimension at least $2^n - (n+1).$ The requirement that not all of the $X_i$ have the same distribution translates to the complement of a finite set of linear constraints: we need $p_i \ne p_j$ for at least one pair $i \ne j.$ Finally, all the $p_{\mathbf x}$ must lie within the interval $[0,1].$

Generically, then, given $q$ there is a space of solutions of dimension $2^n-(n+1)$ if there is any solution at all.
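
As a quick illustration (not part of the argument itself), here is a minimal Python sketch of how one might check both bullet points numerically for any candidate joint distribution; the helper names `marginals`, `sum_distribution`, and `is_binomial` are hypothetical, not standard library functions.

```python
# A sketch, assuming the joint pmf is given as a dict mapping each binary
# tuple x of length n to its probability p_x.
from itertools import product
from math import comb, isclose

def marginals(p, n):
    """Marginal success probabilities p_i: sum of p_x over all x with x_i = 1."""
    return [sum(p[x] for x in product((0, 1), repeat=n) if x[i] == 1)
            for i in range(n)]

def sum_distribution(p, n):
    """P(X_1 + ... + X_n = k) for k = 0, 1, ..., n."""
    return [sum(p[x] for x in product((0, 1), repeat=n) if sum(x) == k)
            for k in range(n + 1)]

def is_binomial(p, n, q, tol=1e-12):
    """Check the n + 1 linear constraints against the Binomial(n, q) pmf."""
    return all(isclose(pk, comb(n, k) * q**k * (1 - q)**(n - k), abs_tol=tol)
               for k, pk in enumerate(sum_distribution(p, n)))
```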

Rather than analyze this in more detail, let's move on to the simplest possible example, where $n=2.$ ($n=1$ won't work because $2^1-(1+1)=0$ is not flexible enough.) Here is a tabulation of the possibilities.

$$\begin{array}{ccccc} x_1 & x_2 & p_{\mathbf x} & x_1+x_2 & \text{Binomial probability}\\ \hline 0 & 0 & p_{00} & 0 & (1-q)^2\\ 0 & 1 & p_{01} & 1 & (1-q)q\\ 1 & 0 & p_{10} & 1 & q(1-q) \\ 1 & 1 & p_{11} & 2 & q^2 \end{array}$$

Collecting lines according to the value of $x_1+x_2$ and equating the probabilities $p$ with the Binomial probability gives three equations,

$$\begin{aligned} p_{00} &= (1-q)^2\\ p_{01}+p_{10} &= 2q(1-q)\\ p_{11} &= q^2. \end{aligned}$$

Evidently $q$ determines $p_{00}$ (top line) and $p_{11}$ (bottom line) directly, leaving one more equation

$$p_{01} + p_{10} = 2q(1-q).$$

We seek solutions for which all the $p_{*}$ lie in $[0,1].$ They can be parameterized by a number $t$ (the value of $p_{01}$) between $0$ and $2q(1-q).$ The full set of solutions therefore is

$$p = (p_{00}, p_{01}, p_{10}, p_{11}) = ((1-q)^2,\ t,\ 2q(1-q)-t,\ q^2).$$

Moreover, for any $q$ in the interval $(0,1)$ there are infinitely many solutions. This is our space of $2^2 - (2+1) = 1$ dimensions, parameterized by $t$ and contingent on the choice of $q,$ thereby giving a two-parameter family of possibilities. Whenever $t \ne q(1-q),$ the marginal probabilities $\Pr(X_1=1) = p_{10}+p_{11}$ and $\Pr(X_2=1) = p_{01}+p_{11}$ differ, so the two Bernoulli variables are not identically distributed (and, by the closing remark below, necessarily dependent).
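
Continuing the sketch above (and reusing its hypothetical helpers), one can construct a member of this family for particular values of $q$ and $t$ and confirm by direct enumeration that the sum is Binomial$(2,q)$ while the marginals differ:

```python
# One member of the one-parameter family for q = 0.4; any t in [0, 2q(1-q)]
# with t != q(1-q) gives non-identically distributed components.
q, t = 0.4, 0.1

p = {(0, 0): (1 - q)**2,
     (0, 1): t,
     (1, 0): 2*q*(1 - q) - t,
     (1, 1): q**2}

print(marginals(p, 2))          # approx. [0.54, 0.26]: not identically distributed
print(sum_distribution(p, 2))   # approx. [0.36, 0.48, 0.16]: the Binomial(2, 0.4) pmf
print(is_binomial(p, 2, q))     # True
```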

The general case (for arbitrarily large $n$) works the same way but has many more solutions.


I will close by remarking that when the $X_i$ are independent, a quick analysis of their characteristic functions shows that their sum can be Binomial$(n,q)$ for some $q$ only when the $X_i$ are identically distributed. Thus, dependence is essential in the preceding construction.
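
To illustrate that remark in the special case $n=2$ (a symbolic sketch, not the characteristic-function argument itself, assuming the sympy library): a Binomial$(2,q)$ pmf satisfies $P(1)^2 = 4P(0)P(2),$ and for the sum of two independent Bernoulli variables the discrepancy in that identity is exactly $(p_1-p_2)^2,$ which vanishes only when $p_1 = p_2.$

```python
# Special case n = 2 of the closing remark, checked symbolically with sympy.
import sympy as sp

p1, p2 = sp.symbols('p1 p2', nonnegative=True)

# pmf of the sum of two *independent* Bernoulli(p1), Bernoulli(p2) variables
P0 = (1 - p1) * (1 - p2)
P1 = p1 * (1 - p2) + p2 * (1 - p1)
P2 = p1 * p2

# A Binomial(2, q) pmf satisfies P1**2 = 4*P0*P2; for independent Bernoullis
# the discrepancy factors as (p1 - p2)**2, so it vanishes only when p1 = p2.
print(sp.factor(P1**2 - 4 * P0 * P2))   # -> (p1 - p2)**2
```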
