Solved – Expected value of a product of non-independent Bernoulli random variables (correlations are known)

bernoulli-distribution, correlation, expected-value, joint-distribution

I've asked a question about getting the joint probability distribution for $N$ Bernoulli random variables, given the expected value of each one ($E[X_i]=p_i$) and their correlations ($\rho_{12},\rho_{13},\dots$). Someone kindly directed me to this paper, and it's been really helpful, but I still can't figure out one last point:

Given that I know the correlation matrix for the $N$ Bernoulli random variables, how can I get the expected value of the product of all of them:
$$
E\left[\prod_{j=1}^N X_j \right]=?
$$

For two Bernoulli random variables, I know that the expected value of the product is:
$$
E\left[X_1 \cdot X_2\right]=\sigma_{12}+p_1 \cdot p_2
$$ where $\sigma_{12}$ is the covariance of $X_1,X_2$ and $p_j=E[X_j]$. However, when working with three random variables, I can't find a way. My gut tells me that it should be a function of the covariances, but I don't know for sure.
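For concreteness, here is a quick numerical check of that two-variable identity (a Python sketch; the marginals and correlation are made-up numbers, using $\sigma_{12}=\rho_{12}\sqrt{p_1(1-p_1)p_2(1-p_2)}$ for Bernoulli variables):

```python
import numpy as np

# Made-up marginals and correlation for two Bernoulli variables.
p1, p2, rho12 = 0.3, 0.6, 0.25

# Covariance from the correlation: sigma_12 = rho_12 * sd(X1) * sd(X2).
sigma12 = rho12 * np.sqrt(p1 * (1 - p1) * p2 * (1 - p2))

# Expected value of the product: E[X1*X2] = sigma_12 + p1*p2.
e12 = sigma12 + p1 * p2
print(e12)  # ≈ 0.2361
```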

Can you point me in the right direction?

Best Answer

Ordinarily, bivariate relationships do not determine multivariate ones, so we ought to expect that you cannot compute this expectation in terms of just the first two moments. The following describes a method to produce loads of counterexamples and exhibits an explicit one for four correlated Bernoulli variables.


Generally, consider $k$ Bernoulli variables $X_i, i=1,\ldots, k$. Arranging the $2^k$ possible values of $(X_1, X_2, \ldots, X_k)$ in rows produces a $2^k \times k$ matrix with columns I will call $x_1, \ldots, x_k$. Let the corresponding probabilities of those $2^k$ $k$-tuples be given by the column vector $p=(p_1, p_2, \ldots, p_{2^k})$. Then the expectations of the $X_i$ are computed in the usual way as a sum of values times probabilities; viz.,

$$\mathbb{E}(X_i) = p^\prime \cdot x_i.$$

Similarly, the second (non-central) moments are found for $i\ne j$ as

$$\mathbb{E}(X_iX_j) = p^\prime \cdot (x_i x_j).$$

The adjunction of two vectors within parentheses denotes their term-by-term product (which is another vector of the same length). Of course when $i=j$, $(x_ix_j) = (x_ix_i) = x_i$ implies that the second moments $\mathbb{E}(X_i^2) = \mathbb{E}(X_i)$ are already determined.
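To make this concrete, here is a minimal sketch in Python (the probability vector $p$ is made up for illustration) that enumerates the $2^k$ rows and computes the first and second moments as exactly these dot products:

```python
import itertools
import numpy as np

k = 3
# Rows of the 2^k x k matrix: all k-tuples in {0,1}^k, in binary order.
X = np.array(list(itertools.product([0, 1], repeat=k)))

# A made-up probability vector over the 2^k outcomes (non-negative, sums to 1).
rng = np.random.default_rng(0)
p = rng.random(2 ** k)
p /= p.sum()

# First moments: E[X_i] = p' . x_i
first = p @ X

# Second moments: E[X_i X_j] = p' . (x_i x_j), the term-by-term column products.
second = {(i, j): p @ (X[:, i] * X[:, j])
          for i in range(k) for j in range(i + 1, k)}
print(first)
print(second)
```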

Because $p$ represents a probability distribution, it must be the case that

$$p \ge 0$$

(meaning all components of $p$ are non-negative) and

$$p^\prime \cdot \mathbf{1} = 1$$

(where $\mathbf{1}$ is a $2^k$-vector of ones).

We can collect all the foregoing information by forming a $2^k \times (1 + k + \binom{k}{2})$ matrix $\mathbb{X}$ whose columns are $\mathbf{1}$, the $x_i$, and the $(x_ix_j)$ for $1 \le i \lt j \le k$. Corresponding to these columns are the numbers $1$, $\mathbb{E}(X_i)$, and $\mathbb{E}(X_iX_j)$. Putting these numbers into a vector $\mu$ gives the linear relationships

$$p^\prime \mathbb{X} = \mu^\prime.$$

The problem of finding such a vector $p$ subject to the linear constraints $p \ge 0$ is the first step of linear programming: the solutions are the feasible ones. In general either there is no solution or, when there is, there will be an entire manifold of them of dimension at least $2^k - (1 + k + \binom{k}{2})$. When $k\ge 3$, we can therefore expect there to be infinitely many distributions on $(X_1,\ldots, X_k)$ that reproduce the specified moment vector $\mu$. Now the expectation $\mathbb{E}(X_1X_2\cdots X_k)$ is simply the probability of $(1,1,\ldots, 1)$. So if we can find two of them that differ on the outcome $(1,1,\ldots, 1)$, we will have a counterexample.

I am unable to construct a counterexample for $k=3$, but they are abundant for $k=4$. For example, suppose all the $X_i$ are Bernoulli$(1/2)$ variables and $\mathbb{E}(X_iX_j) = 3/14$ for $i\ne j$. (If you prefer, $\rho_{ij} = -1/7$, since the covariance is $3/14 - 1/4 = -1/28$ and each variance is $1/4$.) Arrange the $2^k=16$ values in the usual binary order from $(0,0,0,0), (0,0,0,1), \ldots, (1,1,1,1)$. Then (for example) the distributions

$$p^{(1)} = (1,0,0,2,0,2,2,0,0,2,2,0,2,0,0,1)/14$$

and

$$p^{(2)} = (1,0,0,2,0,2,2,0,1,1,1,1,1,1,1,0)/14$$

reproduce all moments through order $2$ but give different expectations for the product: $1/14$ for $p^{(1)}$ and $0$ for $p^{(2)}$.

Here is the array $\mathbb{X}$ adjoined with the two probability distributions, shown as a table:

$$\begin{array}{cccccccccccccc}
 & 1 & x_1 & x_2 & x_3 & x_4 & x_3x_4 & x_2x_4 & x_2x_3 & x_1x_4 & x_1x_3 & x_1x_2 & p^{(1)} & p^{(2)} \\
 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{14} & \frac{1}{14} \\
 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & \frac{1}{7} & \frac{1}{7} \\
 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & \frac{1}{7} & \frac{1}{7} \\
 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & \frac{1}{7} & \frac{1}{7} \\
 & 1 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{14} \\
 & 1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & \frac{1}{7} & \frac{1}{14} \\
 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & \frac{1}{7} & \frac{1}{14} \\
 & 1 & 1 & 0 & 1 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & \frac{1}{14} \\
 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \frac{1}{7} & \frac{1}{14} \\
 & 1 & 1 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & \frac{1}{14} \\
 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & \frac{1}{14} \\
 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & \frac{1}{14} & 0 \\
\end{array}$$
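Both claims are easy to verify numerically. Here is a sketch in Python checking that the two vectors are probability distributions with the stated first and second moments, yet assign different probabilities to $(1,1,1,1)$:

```python
import itertools
import numpy as np

# Outcomes of (X1, X2, X3, X4) in binary order, as in the table above.
X = np.array(list(itertools.product([0, 1], repeat=4)))

p1 = np.array([1,0,0,2,0,2,2,0,0,2,2,0,2,0,0,1]) / 14
p2 = np.array([1,0,0,2,0,2,2,0,1,1,1,1,1,1,1,0]) / 14

for p in (p1, p2):
    assert np.isclose(p.sum(), 1) and (p >= 0).all()
    print(p @ X)                        # first moments: all 1/2
    print([p @ (X[:, i] * X[:, j])      # second moments: all 3/14
           for i in range(4) for j in range(i + 1, 4)])

# Expectation of the full product = probability of (1,1,1,1): 1/14 vs 0.
print(p1 @ X.prod(axis=1), p2 @ X.prod(axis=1))
```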


The best you can do with the information given is to find a range for the expectation. Since the feasible solutions are a closed convex set, the possible values of $\mathbb{E}(X_1\cdots X_k)$ will be a closed interval (possibly an empty one if you specified mathematically impossible correlations or expectations). Find its endpoints by first maximizing $p_{2^k}$ and then minimizing it using any linear programming algorithm.
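Here is a sketch of that computation in Python with scipy.optimize.linprog, using the $k=4$ moments from the example above; the equality constraints encode $p^\prime \mathbb{X} = \mu^\prime$ and the objective is the last component of $p$:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

k = 4
X = np.array(list(itertools.product([0, 1], repeat=k)))

# Columns of the constraint matrix: 1, the x_i, and the products x_i x_j.
cols = [np.ones(2 ** k)] + [X[:, i] for i in range(k)] \
     + [X[:, i] * X[:, j] for i in range(k) for j in range(i + 1, k)]
A_eq = np.array(cols)                             # (1 + k + C(k,2)) rows, 2^k columns
mu = np.array([1] + [1 / 2] * k + [3 / 14] * 6)   # target moment vector from the example

# Objective: the probability of (1,1,...,1), i.e. the last component of p.
c = np.zeros(2 ** k)
c[-1] = 1

lo = linprog(c, A_eq=A_eq, b_eq=mu, bounds=(0, 1))    # minimize p_{2^k}
hi = linprog(-c, A_eq=A_eq, b_eq=mu, bounds=(0, 1))   # maximize p_{2^k}
print(lo.fun, -hi.fun)  # endpoints of the attainable range of E[X1 X2 X3 X4]
```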