# Conditions for Binomial Distribution


It is known that if $$X$$ is the sum of $$n$$ independent and identical Bernoulli random variables, $$X$$ follows a Binomial distribution.

How about the reverse: can a sum of dependent and/or non-identical Bernoulli random variables still follow a Binomial distribution?

Yes, certainly. By this I mean a sum of dependent but not identically distributed Bernoulli variables can have a Binomial distribution.

Let $$\mathbf X$$ refer to a random vector of zeros and ones of length $$n,$$ so that there are $$2^n$$ possible values of $$\mathbf X.$$ For each such possible vector $$\mathbf x$$ let $$p_{\mathbf x}$$ be the probability that $$\mathbf X = \mathbf x.$$

Observe

• The components of $$\mathbf X$$ (written $$X_i,$$ $$i=1,2,\ldots, n$$) are Bernoulli variables with probabilities $$p_i.$$ These probabilities are the sums of the $$p_{\mathbf x}$$ over all $$\mathbf x$$ for which $$x_i=1.$$

• If there is a number $$q$$ for which the sum of the components of $$\mathbf X$$ has a Binomial$$(n,q)$$ distribution, then (by definition), for each $$k=0,1,\ldots,n,$$ the sum of all $$p_{\mathbf x}$$ where $$\mathbf x$$ has exactly $$k$$ components equal to $$1$$ (written $$|\mathbf x|=k$$) must equal the Binomial probability: $$\sum_{\mathbf x:\, |\mathbf x|=k}p_{\mathbf x} = \binom{n}{k}q^k(1-q)^{n-k}.$$

The latter constitutes $$n+1$$ linear constraints on the $$2^n$$ values of the $$p_{\mathbf x},$$ so its solution set has dimension at least $$2^n - (n+1).$$ (Summing these constraints over $$k$$ shows that the $$p_{\mathbf x}$$ automatically total $$1.$$) Requiring that not all of the $$X_i$$ have the same distribution means, in terms of the former, that $$p_i\ne p_j$$ for at least one pair $$i\ne j$$: this excludes only the linear subspace where $$p_1=p_2=\cdots=p_n.$$ Finally, all $$p_{\mathbf x}$$ must lie within the interval $$[0,1].$$

Generically, then, given $$q$$ there is a space of solutions of dimension $$2^n-(n+1)$$ if there is any solution at all.

Rather than analyze this in more detail, let's move on to the simplest possible example, where $$n=2.$$ ($$n=1$$ won't work because $$2^1-(1+1)=0$$ is not flexible enough.) Here is a tabulation of the possibilities.

$$\begin{array}{ccccl} x_1 & x_2 & p_{\mathbf x} & x_1+x_2 & \text{Binomial probability}\\ \hline 0 & 0 & p_{00} & 0 & (1-q)^2\\ 0 & 1 & p_{01} & 1 & (1-q)q\\ 1 & 0 & p_{10} & 1 & q(1-q) \\ 1 & 1 & p_{11} & 2 & q^2 \end{array}$$

Collecting lines according to the value of $$x_1+x_2$$ and equating the probabilities $$p$$ with the Binomial probability gives three equations,

$$\begin{aligned} p_{00} &= (1-q)^2\\ p_{01}+p_{10} &= 2q(1-q)\\ p_{11} &= q^2. \end{aligned}$$

Evidently $$q$$ determines $$p_{00}$$ (top line) and $$p_{11}$$ (bottom line) directly, leaving one more equation

$$p_{01} + p_{10} = 2q(1-q).$$

We seek solutions for which the $$p_{*}$$ lie in $$[0,1].$$ They can be parameterized by a number $$t$$ between $$0$$ and $$2q(1-q).$$ The full set of solutions therefore is

$$p = (p_{00}, p_{01}, p_{10}, p_{11}) = ((1-q)^2,\ t,\ 2q(1-q)-t,\ q^2).$$

Moreover, for any $$q$$ in the interval $$(0,1)$$ there are infinitely many solutions. This is our space of $$2^2 - (2+1) = 1$$ dimensions, parameterized by $$t$$ and contingent on the choice of $$q,$$ thereby giving a two-parameter family of possibilities.
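To make the $$n=2$$ family concrete, here is a minimal Python sketch that builds the joint probabilities for a given $$q$$ and $$t$$ and checks them exactly with rational arithmetic. The function names `joint_pmf` and `describe` are illustrative choices, not part of any library. The marginals work out to $$p_1 = p_{10}+p_{11} = q^2 + 2q(1-q) - t$$ and $$p_2 = p_{01}+p_{11} = q^2 + t,$$ so they differ exactly when $$t \ne q(1-q);$$ the single value $$t = q(1-q)$$ recovers the independent, identically distributed case.

```python
from fractions import Fraction


def joint_pmf(q, t):
    """Return the n = 2 joint pmf keyed by (x1, x2) for parameters q and t."""
    q, t = Fraction(q), Fraction(t)
    if not 0 <= t <= 2 * q * (1 - q):
        raise ValueError("t must lie between 0 and 2q(1-q)")
    return {
        (0, 0): (1 - q) ** 2,
        (0, 1): t,
        (1, 0): 2 * q * (1 - q) - t,
        (1, 1): q ** 2,
    }


def describe(pmf):
    """Return the marginals p1, p2, the pmf of x1 + x2, and an independence flag."""
    p1 = pmf[(1, 0)] + pmf[(1, 1)]
    p2 = pmf[(0, 1)] + pmf[(1, 1)]
    sum_pmf = [pmf[(0, 0)], pmf[(0, 1)] + pmf[(1, 0)], pmf[(1, 1)]]
    # For two binary variables, independence is equivalent to p11 == p1 * p2.
    independent = pmf[(1, 1)] == p1 * p2
    return p1, p2, sum_pmf, independent


q = Fraction(1, 2)
p1, p2, sum_pmf, independent = describe(joint_pmf(q, Fraction(1, 8)))
binomial = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]
print(p1, p2, sum_pmf == binomial, independent)  # 5/8 3/8 True False
```

With $$q = 1/2$$ and $$t = 1/8$$ the two components have different success probabilities ($$5/8$$ and $$3/8$$) and are dependent, yet their sum is exactly Binomial$$(2, 1/2).$$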

The general case (for arbitrarily large $$n$$) works the same way but has many more solutions.
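One way to see the extra freedom for larger $$n$$ is to split each Binomial probability arbitrarily among the $$\binom{n}{k}$$ vectors having exactly $$k$$ ones. Each group then contributes $$\binom{n}{k}-1$$ free parameters, for a total of $$2^n-(n+1).$$ The sketch below does this with random integer weights; the function names and the weighting scheme are assumptions made for illustration, and any positive weights would serve.

```python
import itertools
import random
from fractions import Fraction
from math import comb


def random_joint_with_binomial_sum(n, q, rng):
    """Build a joint pmf on {0,1}^n whose component sum is Binomial(n, q)."""
    q = Fraction(q)
    pmf = {}
    for k in range(n + 1):
        # Total mass that the vectors with exactly k ones must share.
        mass = comb(n, k) * q ** k * (1 - q) ** (n - k)
        vectors = [x for x in itertools.product((0, 1), repeat=n) if sum(x) == k]
        # Arbitrary positive weights decide how that mass is split.
        weights = [rng.randint(1, 10) for _ in vectors]
        total = sum(weights)
        for x, w in zip(vectors, weights):
            pmf[x] = mass * Fraction(w, total)
    return pmf


def sum_distribution(pmf, n):
    """Return the pmf of the number of ones as a list indexed by k."""
    dist = [Fraction(0)] * (n + 1)
    for x, p in pmf.items():
        dist[sum(x)] += p
    return dist


def marginals(pmf, n):
    """Return the success probabilities p_1, ..., p_n of the components."""
    return [sum(p for x, p in pmf.items() if x[i] == 1) for i in range(n)]


n, q = 3, Fraction(1, 3)
pmf = random_joint_with_binomial_sum(n, q, random.Random(0))
target = [comb(n, k) * q ** k * (1 - q) ** (n - k) for k in range(n + 1)]
print(sum_distribution(pmf, n) == target)  # the sum is Binomial(n, q) by construction
print(marginals(pmf, n))                   # typically unequal for random weights
```

Because the arithmetic uses `Fraction`, the comparison with the Binomial probabilities is exact rather than approximate. The marginals always add up to $$nq,$$ the mean of the sum, but nothing forces them to be equal to one another.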

I will close by remarking that when the $$X_i$$ are independent, a quick analysis of their characteristic functions shows that their sum can be Binomial$$(n,q)$$ for some $$q$$ only when the $$X_i$$ are identically distributed. Thus, dependence is essential in the preceding construction.
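For completeness, here is one way that characteristic-function argument can be sketched (a reconstruction under the assumption $$0 \lt q \lt 1,$$ not necessarily the route the remark had in mind). Write $$z = e^{\mathrm{i}\omega}$$ for the argument of the characteristic function. If the $$X_j$$ are independent Bernoulli$$(p_j)$$ variables whose sum is Binomial$$(n,q),$$ then

$$\prod_{j=1}^n \left(1 - p_j + p_j z\right) = \left(1 - q + q z\right)^n.$$

Both sides are polynomials in $$z$$ that agree everywhere on the unit circle, so they agree identically. The right-hand side has degree $$n$$ and a single root, $$z = -(1-q)/q,$$ of multiplicity $$n.$$ Matching degrees forces every $$p_j \gt 0,$$ and then each factor on the left contributes the root $$z = -(1-p_j)/p_j.$$ Matching roots gives

$$\frac{1-p_j}{p_j} = \frac{1-q}{q} \quad\text{for every } j,$$

which implies $$p_j = q$$ for all $$j.$$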