Bernoulli Distribution – What Is the Distribution of a Sum of Identically Distributed Bernoulli Random Variables with Same Pair Correlation?


What is the distribution of a sum of $n$ Bernoulli random variables, each having success probability $p$, where each pair is correlated with correlation coefficient $\rho$?

$$Y = \sum_{i=1}^n X_i$$
$$ X_i \sim \mathsf{Bernoulli}(p),\quad \operatorname{corr}(X_i, X_j) = \rho \ \text{ for all } i \neq j$$

If the $X_i$ are mutually independent (which implies $\rho=0$), then it is a binomial distribution. Is there a closed-form expression for the probability mass function when $\rho > 0$?

Is it perhaps a beta-binomial? I cannot convince myself either way.
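A note on the $\rho = 0$ case: for Bernoulli variables, zero correlation only gives pairwise independence, and that alone does not make the sum binomial. The classic XOR construction below (an illustrative example, not from the question) has three Bernoulli(1/2) variables that are pairwise uncorrelated, yet their sum never equals 1. Exact arithmetic with `Fraction` avoids any rounding doubt.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Joint law: X1, X2 iid Bernoulli(1/2) and X3 = X1 XOR X2; each of the
# four outcomes has probability 1/4.
outcomes = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]
w = Fraction(1, 4)

def cov(i, j):
    """Exact covariance of X_i and X_j under the joint law above."""
    ei = sum(w * x[i] for x in outcomes)
    ej = sum(w * x[j] for x in outcomes)
    return sum(w * x[i] * x[j] for x in outcomes) - ei * ej

# Every marginal is Bernoulli(1/2) and every pair is uncorrelated (rho = 0) ...
marginals = [sum(w * x[i] for x in outcomes) for i in range(3)]
pairwise_covs = [cov(i, j) for i in range(3) for j in range(i + 1, 3)]

# ... but the distribution of Y = X1 + X2 + X3 is not Binomial(3, 1/2).
sum_pmf = [sum((w for x in outcomes if sum(x) == k), Fraction(0)) for k in range(4)]
binom_pmf = [Fraction(comb(3, k), 8) for k in range(4)]

print("marginals:", marginals)
print("pairwise covariances:", pairwise_covs)
print("P(Y = k):", sum_pmf)
print("Binomial(3, 1/2):", binom_pmf)
```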

Best Answer

Have you seen this paper: Kadane (2016), "Sums of Possibly Associated Bernoulli Variables: The Conway-Maxwell-Binomial Distribution"?

This paper shows that the conditions assumed in your question, i.e. $n$ random variables that are marginally Bernoulli with the same probability of success $p$ and the same pairwise correlation $\rho$ between all pairs, do not fully specify the distribution of their sum.
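As a concrete illustration (a sketch, not taken from the paper), the two exchangeable models below both have Bernoulli($p$) marginals and pairwise correlation $\rho$, yet give different distributions for the sum. The beta-binomial uses the standard moment matching $\alpha = p(1/\rho - 1)$, $\beta = (1-p)(1/\rho - 1)$, valid for $0 < \rho < 1$. The second is the correlated binomial with only pairwise (Bahadur-type) terms, which is a valid pmf only when every term is nonnegative. The function names are illustrative. For exchangeable Bernoulli variables, $E[Y] = np$ and $\operatorname{Var}(Y) = np(1-p)[1 + (n-1)\rho]$ pin down $p$ and $\rho$, so checking those two moments checks the constraints.

```python
import math

def beta_binomial_pmf(n, p, rho):
    """Beta-binomial pmf with mean n*p and pairwise correlation rho (0 < rho < 1)."""
    # alpha + beta = 1/rho - 1 makes the intra-class correlation 1/(alpha+beta+1) = rho.
    s = 1.0 / rho - 1.0
    a, b = p * s, (1.0 - p) * s

    def log_beta(x, y):
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

    return [math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))
            for k in range(n + 1)]

def correlated_binomial_pmf(n, p, rho):
    """Pairwise-only correlated binomial (zero higher-order correlations).
    Can produce negative values for large rho; check before using."""
    c = rho / (2.0 * p * (1.0 - p))
    return [math.comb(n, k) * p**k * (1 - p)**(n - k)
            * (1.0 + c * ((k - n * p)**2 + k * (2 * p - 1) - n * p**2))
            for k in range(n + 1)]

def mean_var(pmf):
    """Mean and variance of a pmf given as a list indexed by k = 0..n."""
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean)**2 * q for k, q in enumerate(pmf))
    return mean, var

n, p, rho = 5, 0.3, 0.1
bb = beta_binomial_pmf(n, p, rho)
cb = correlated_binomial_pmf(n, p, rho)
target_var = n * p * (1 - p) * (1 + (n - 1) * rho)

for name, pmf in [("beta-binomial", bb), ("correlated binomial", cb)]:
    m, v = mean_var(pmf)
    print(f"{name}: mean={m:.4f} var={v:.4f} (target var {target_var:.4f})")
print("max |difference| between the two pmfs:", max(abs(x - y) for x, y in zip(bb, cb)))
```

Both models match the same $p$ and $\rho$, so the beta-binomial is one admissible answer for $\rho > 0$, but not the only one.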

To be more specific, in Section 2.3 of the paper the author assumes "zero higher order additive interaction" (Darroch, 1974):

[Equation image from the paper not reproduced: the probability mass function $P\{W = k\}$ of this model.]

where $P\{W = k\} = P\{\sum_{i=1}^{m} X_i = k\}$, with the paper's $m$ playing the role of your $n$. The model is also called the correlated binomial model.
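If the image showed the usual form of the correlated binomial model (an assumption, since the image is not reproduced here), that pmf can be derived by keeping only the pairwise terms of Bahadur's expansion of the joint distribution, with common $p$ and common correlation $\rho$, and then summing over the $\binom{m}{k}$ arrangements with $W = k$:

$$P\{X_1 = x_1, \dots, X_m = x_m\} = \prod_{i=1}^{m} p^{x_i}(1-p)^{1-x_i}\Big[1 + \rho \sum_{i<j} z_i z_j\Big], \qquad z_i = \frac{x_i - p}{\sqrt{p(1-p)}}$$

With $k = \sum_i x_i$,

$$\sum_{i<j} z_i z_j = \frac{1}{2}\Big[\Big(\sum_i z_i\Big)^2 - \sum_i z_i^2\Big] = \frac{(k - mp)^2 + k(2p - 1) - mp^2}{2p(1-p)},$$

so

$$P\{W = k\} = \binom{m}{k} p^k (1-p)^{m-k}\left[1 + \frac{\rho}{2p(1-p)}\Big((k - mp)^2 + k(2p-1) - mp^2\Big)\right].$$

The bracket can become negative for larger $\rho$: when $mp$ is an integer it equals $1 - m\rho/2$ at $k = mp$, so this model is only valid for a restricted range of $\rho$.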

Here is also a brief summary of the first sections of the paper that you may find helpful for modeling the sum:

Propositions 1, 2 and 3 give reasons for not using correlation as the measure of dependence and for modeling the sum directly rather than assuming a marginal distribution. Sections 2.1 and 2.2 present distributional models with these two characteristics: they have some notion of dependence, though not necessarily correlation, and they allow for symmetric dependence. (Proposition 1 states that, under the conditions it lists, correlation is unsuitable as a measure of dependence because it is bounded below by $-1/(m-1)$.)
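Under the question's conditions (common $p$, common pairwise correlation $\rho$), the bound in Proposition 1 follows from $\operatorname{Var}(W) = mp(1-p) + m(m-1)\rho\,p(1-p) = mp(1-p)[1 + (m-1)\rho] \ge 0$. A quick exact check, using an illustrative exchangeable construction rather than anything from the paper, is that the bound is attained when $W$ is forced to equal a fixed $k$ with $0 < k < m$: put equal probability on every 0/1 vector with exactly $k$ ones. The helper name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def pairwise_corr_when_sum_fixed(m, k):
    """Uniform distribution over all 0/1 vectors of length m with exactly k ones.
    Returns (P(X_1 = 1), corr(X_1, X_2)), computed exactly by enumeration."""
    support = [x for x in product((0, 1), repeat=m) if sum(x) == k]
    w = Fraction(1, len(support))
    p = sum(w * x[0] for x in support)            # P(X_1 = 1) = k/m
    p11 = sum(w * x[0] * x[1] for x in support)   # P(X_1 = 1, X_2 = 1)
    return p, (p11 - p * p) / (p * (1 - p))

for m in range(3, 7):
    p, corr = pairwise_corr_when_sum_fixed(m, 1)
    print(f"m={m}: p={p}, corr={corr}, -1/(m-1)={Fraction(-1, m - 1)}")
```

Because $W$ is constant here, $\operatorname{Var}(W) = 0$ and the correlation sits exactly at $-1/(m-1)$, whatever $k$ is.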

Section 3 presents the author's proposed model, the Conway-Maxwell-Binomial distribution, which models the sum directly using a notion of dependence that allows for both positive and negative association.
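A minimal sketch of a Conway-Maxwell-Binomial pmf, assuming the parameterization $P\{W = k\} \propto \binom{m}{k}^{\nu} p^k (1-p)^{m-k}$ with the normalizing constant computed by direct summation; check Section 3 for the paper's exact form and parameter names. Under this form $\nu = 1$ recovers the binomial. At $p = 1/2$, the variance comparison shows $\nu < 1$ spreading mass toward the extremes (more variance than the binomial, i.e. positive association) and $\nu > 1$ concentrating it (negative association). The function names are illustrative.

```python
import math

def cmb_pmf(m, p, nu):
    """Assumed Conway-Maxwell-Binomial pmf:
    P{W = k} proportional to C(m, k)**nu * p**k * (1 - p)**(m - k)."""
    weights = [math.comb(m, k) ** nu * p ** k * (1 - p) ** (m - k) for k in range(m + 1)]
    total = sum(weights)  # normalizing constant, summed directly
    return [w / total for w in weights]

def variance(pmf):
    """Variance of a pmf given as a list indexed by k = 0..m."""
    mean = sum(k * q for k, q in enumerate(pmf))
    return sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

m, p = 10, 0.5
print("binomial variance:", m * p * (1 - p))
for nu in (0.5, 1.0, 2.0):
    print(f"nu={nu}: variance={variance(cmb_pmf(m, p, nu)):.4f}")
```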