Finding the maximum and minimum variance of the sum of two Bernoulli events

bernoulli-distribution, binomial distribution, expected value, probability, variance

You are guessing the contents of two envelopes. Let $U_i$ be the event that you guess envelope $i$ correctly. Your probability of guessing correctly for each envelope is $P(U_1) = P(U_2) = 3/4$. $U_1$ and $U_2$ may or may not be independent of each other. Let $X$ be a random variable that represents the total number of correct guesses you've made. $X$ can be $0$, $1$, or $2$.

a. Compute $E(X)$.
So I modeled this as a sum of two Bernoulli events; each one has $p = 0.75$, and can take on $1$ for success, or $0$ for failure.

$$E(X) = 0.75 + 0.75 = 1.5$$

b. Compute the maximum value for $\operatorname{var}(X)$.
c.* Compute the minimum value for $\operatorname{var}(X)$.
I'm unclear how to answer these two questions.

  • Now, I know $$\operatorname{var}(X) = \operatorname{var}(U_1 + U_2) = \operatorname{var}(U_1) + \operatorname{var}(U_2) + 2\operatorname{cov}(U_1,U_2)$$
  • I also know $\operatorname{cov}(X,Y) = E(XY)-E(X)E(Y)$
  • I also know that $E(X) = p$ for a Bernoulli variable, and that $\operatorname{var}(X) = pq$ for a Bernoulli variable.

I have no idea how to calculate or rationalize an $E(XY)$ to maximize or minimize the variance. Is this in the right direction?

Does $E(XY)$ have to take on a value of either $1$ or $0$?

Now suppose there are four envelopes, $1$ through $4$. Again, let $U_i$ be the event that you guess envelope $i$ correctly. $P(U_i)=3/4$ for all $i$. Once again, let random variable $X$ be the total number of envelopes that are correctly guessed.

d.* Compute the minimum and maximum values for $\operatorname{var}(X)$.

I likewise would have no idea how to do this. Following the above line of thinking, cov() with multiple variables would give a matrix. How would this be incorporated?

Best Answer

Make a table of the probabilities where $X_1$ is the indicator of $U_1$ and $X_2$ the indicator of $U_2:$

$$\begin{array}{r|cc|l} & X_2=0 & X_2=1 & \text{Total} \\ \hline X_1=0 & a & \frac{1}{4}-a & \frac{1}{4} \\ X_1=1 & \frac{1}{4}-a & \frac{1}{2}+a & \frac{3}{4} \\ \hline \text{Total} & \frac{1}{4} & \frac{3}{4} \end{array} $$

This table was constructed beginning with the row and column totals as specified by the problem, letting $a = \Pr(X_1=X_2=0),$ and applying the (obvious) probability axioms to complete the other three cells. Since probabilities are nonnegative, every cell must be at least $0,$ so the possible values of $a$ are $0\le a \le 1/4.$

The problem supposes $E[X_1]=E[X_2]=3/4.$ Because $X_i^2 = X_i,$ their common variance is $$\operatorname{Var}(X_i) = E[X_i^2]-E[X_i]^2 = E[X_i] - E[X_i]^2 = 3/4 - (3/4)^2 = 3/16.$$

A formula for their covariance is

$$\operatorname{Cov}(X_1,X_2) = E[X_1X_2] - E[X_1]E[X_2] = \left(\frac{1}{2}+a\right) - \left(\frac{3}{4}\right)^2\tag{*}$$

because the only contribution to $E[X_1X_2]$ comes from the case $X_1X_2=1.$

Finally,

$$\eqalign{ \operatorname{Var}(X) &= \operatorname{Var}(X_1+X_2) \\ &= \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + 2\operatorname{Cov}(X_1,X_2) \\ &= \frac{3}{16} + \frac{3}{16} + 2\left(\frac{1}{2}+a - \left(\frac{3}{4}\right)^2\right) \\ &= \frac{1}{4}+2a.}$$

This linear function of $a$ attains its extremes at the endpoints of the allowed range of $a$: the minimum variance is $1/4,$ at $a=0,$ and the maximum variance is $3/4,$ at $a=1/4.$
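As a sanity check on the algebra, here is a minimal sketch that builds the table for a given $a$ and computes the variance of $X_1+X_2$ directly from the joint probabilities. The function name `var_of_sum` is just an illustrative choice, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def var_of_sum(a):
    """Variance of X = X1 + X2 for the 2x2 table with Pr(X1=0, X2=0) = a."""
    quarter, half = Fraction(1, 4), Fraction(1, 2)
    # Joint probabilities keyed by (x1, x2), read off the table above.
    joint = {
        (0, 0): a,
        (0, 1): quarter - a,
        (1, 0): quarter - a,
        (1, 1): half + a,
    }
    # The cells must form a valid probability distribution.
    assert sum(joint.values()) == 1
    assert all(p >= 0 for p in joint.values())
    mean = sum((x1 + x2) * p for (x1, x2), p in joint.items())
    second_moment = sum((x1 + x2) ** 2 * p for (x1, x2), p in joint.items())
    return second_moment - mean ** 2

for a in (Fraction(0), Fraction(1, 8), Fraction(1, 4)):
    print(a, var_of_sum(a))
```

Each printed variance should agree with $1/4 + 2a$ for the corresponding $a$.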


With multiple envelopes there are corresponding variables $X_1, X_2, \ldots, X_n$ and $n(n-1)/2$ possibly different covariances among them. Their covariance matrix $\operatorname{Cov}(X_i,X_j)=(\sigma_{ij})=\Sigma$ will have values of $3/16$ on the diagonal and, off the diagonal, symmetrical entries between $-1/16$ and $3/16$ according to $(*),$ subject to being positive semidefinite. Let's duck that issue for the moment and just compute the variance of the sum $S_n=X_1+X_2+\cdots+X_n.$ That is

$$\operatorname{Var}(S_n) = \frac{3n}{16} + \sum_{1\le i \lt j \le n} 2\sigma_{ij}.$$

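Looking at the extreme possible values of the $\sigma_{ij},$ we see the variance cannot possibly be any smaller than $3n/16 - n(n-1)/16$ nor greater than $3n/16 + 3n(n-1)/16.$ Whether these bounds are attained depends on whether the corresponding $\Sigma$ is positive semidefinite. In each case $\Sigma$ is a multiple of $1_n^\prime 1_n$ (the $n\times n$ matrix of ones) plus a nonnegative diagonal matrix. For the upper bound, $\Sigma = \frac{3}{16}\,1_n^\prime 1_n,$ which is always positive semidefinite. For the lower bound, $\Sigma = \frac{1}{4}I_n - \frac{1}{16}\,1_n^\prime 1_n,$ whose eigenvalues are $1/4$ and $(4-n)/16,$ so it is positive semidefinite exactly when $n\le 4.$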

When $n=4$ this gives extreme values of $12/16 - 12/16 = 0$ and $12/16 + 36/16=3.$
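The two bounds are easy to tabulate for other $n.$ The following is a small sketch (the helper name `variance_bounds` is made up for illustration) that simply plugs the extreme covariances $-1/16$ and $3/16$ into the formula for $\operatorname{Var}(S_n).$ Note that it does not check attainability: for $n\gt 4$ the lower value comes out negative, which no variance can be, so there the true minimum must be larger.

```python
from fractions import Fraction

def variance_bounds(n):
    """Bounds on Var(S_n) obtained by setting every covariance to -1/16 or 3/16."""
    diagonal = Fraction(3 * n, 16)   # n variances of 3/16 each
    ordered_pairs = n * (n - 1)      # 2 * (number of pairs i < j)
    lower = diagonal + ordered_pairs * Fraction(-1, 16)
    upper = diagonal + ordered_pairs * Fraction(3, 16)
    return lower, upper

for n in (2, 3, 4, 5):
    print(n, variance_bounds(n))
```

For $n=2$ this reproduces the two-envelope answers $1/4$ and $3/4,$ and for $n=4$ it gives $0$ and $3.$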

These variances can arise in the following ways. In the first case (zero variance), write the vectors $(0,1,1,1),$ $(1,0,1,1),$ $(1,1,0,1),$ and $(1,1,1,0)$ on four slips of paper and put them into a hat. Mix them and draw one out randomly. Let $(X_1,\ldots,X_4)$ be the values you see on that slip. Each $X_i$ equals $1$ on three of the four slips, so $\Pr(X_i=1)=3/4,$ but since the sum of the $X_i$ on each slip is always $3,$ the variance of the sum is zero.

In the second case, let the four tickets bear the values $(0,0,0,0),$ $(1,1,1,1),$ $(1,1,1,1),$ and $(1,1,1,1).$ Again $\Pr(X_i=1)=3/4$ but now the sum is four times any one of the $X_i,$ whence its variance is 16 times the common variance of the $X_i.$
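Both constructions can be checked by brute force. The sketch below assumes each ticket is drawn with equal probability and uses a hypothetical helper `summarize` that returns the marginals $\Pr(X_i=1)$ and the variance of the sum. For the zero-variance case it uses the four tickets having exactly one $0$ each, which give every $X_i$ the required marginal of $3/4.$

```python
from fractions import Fraction

def summarize(tickets):
    """Return (list of Pr(X_i = 1), variance of the sum) for equally likely tickets."""
    weight = Fraction(1, len(tickets))
    n = len(tickets[0])
    marginals = [sum(t[i] for t in tickets) * weight for i in range(n)]
    totals = [sum(t) for t in tickets]
    mean = sum(totals) * weight
    variance = sum((s - mean) ** 2 for s in totals) * weight
    return marginals, variance

# Zero-variance case: every ticket has exactly three ones.
min_tickets = [(0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 0)]
# Maximum-variance case: all guesses right or all wrong together.
max_tickets = [(0, 0, 0, 0), (1, 1, 1, 1), (1, 1, 1, 1), (1, 1, 1, 1)]

print(summarize(min_tickets))
print(summarize(max_tickets))
```

In both cases every marginal is $3/4$; the variances of the sum are $0$ and $3$ respectively.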


For those who might think this is all trivial, note that some of the matrices $\Sigma$ just described are not positive semidefinite and therefore cannot occur as covariance matrices of the $X_i.$ For instance, let $\sigma_{ij}=0$ for $ij\in \{14,34,43,41\}$ and otherwise let $\sigma_{ij}=3/16.$ You may compute that for $y=(3,-7,3,5),$ $y\Sigma y^\prime$ is negative.
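If you would rather not do that computation by hand, here is a short sketch that builds this $\Sigma$ with exact fractions and evaluates the quadratic form. Indices are 0-based in the code, so the entries $14, 41, 34, 43$ become `(0, 3), (3, 0), (2, 3), (3, 2)`.

```python
from fractions import Fraction

s = Fraction(3, 16)
# 0-based positions of the entries set to zero (14, 41, 34, 43 in the text).
zero_entries = {(0, 3), (3, 0), (2, 3), (3, 2)}
sigma = [[Fraction(0) if (i, j) in zero_entries else s for j in range(4)]
         for i in range(4)]

y = [3, -7, 3, 5]
# Quadratic form y * Sigma * y'.
quad = sum(y[i] * sigma[i][j] * y[j] for i in range(4) for j in range(4))
print(quad)  # -33/4
```

A negative value of the quadratic form shows that this $\Sigma$ is not positive semidefinite, so it cannot be a covariance matrix.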