Show $X_1$ and $X_2$ are negatively correlated

correlation, probability distributions, probability theory

Consider $n$ independent tosses of a die. Each toss has probability $p_i$ of resulting in $i$. Let $X_i$ be the number of tosses that result in $i$. Show that $X_1$ and $X_2$ are negatively correlated.

My question is how $p_i$ plays into this proof. When I proved that $X_1$ and $X_2$ are negatively correlated, I didn't see why it matters that $p_i$ is left as a variable. Here is my work:

To say two variables are negatively correlated means that an increase in one tends to go with a decrease in the other. Mathematically:

  • Correlation coefficient $= \rho (X_1, X_2) = \frac{\mathrm{Cov}(X_1, X_2)}{\sqrt{\mathrm{Var}(X_1) \mathrm{Var}(X_2)}}$

We don't have to deal with the denominator: the variance of any random variable is nonnegative, so whenever the correlation is defined the denominator is positive and the sign of $\rho$ is the sign of the covariance. We focus on the covariance:

  • $\mathrm{Cov}(X_1,X_2) = E[X_1X_2] - E[X_1]E[X_2]$

We can interpret $E[X_1X_2]$ as $P(X=1) P(X=2 \mid X=1)$. In words, this is the probability that the dice roll results in $1$ and also results in $2$ which is impossible since the die can only display a single number at a time. So, $E[X_1X_2] = 0$.

Thus, it remains only to show that $-E[X_1]E[X_2]$ is negative. Because $X_1$ and $X_2$ are sums of independent Bernoulli random variables, these expectations are always positive, implying

  • $\mathrm{Cov}(X_1, X_2) = -(\text{some positive number})$

This proves that the correlation coefficient is negative. But what is the point of specifying the probability of each number as a variable $p_i$?

Best Answer

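Let $Y_k$ denote the outcome of the $k$-th toss, so that $\mathbb P(Y_k=i)=p_i$ for $i=1,2,3,4,5,6$ and $k=1,2,\ldots,n$. Then $X_i = \sum_{k=1}^n \mathsf 1_{\{Y_k=i\}}$, and in particular $\mathbb E[X_i]=np_i$. It follows that
\begin{align}
\operatorname{Cov}(X_1,X_2) &= \mathbb E[X_1X_2]-\mathbb E[X_1]\mathbb E[X_2]\\
&= \mathbb E\left[\left(\sum_{j=1}^n\mathsf 1_{\{Y_j=1\}}\right)\left(\sum_{k=1}^n\mathsf 1_{\{Y_k=2\}}\right)\right] - (np_1)(np_2).
\end{align}
Expanding the product gives $n^2$ terms of the form $\mathsf 1_{\{Y_j=1\}}\mathsf 1_{\{Y_k=2\}}$. When $j=k$, a single toss cannot show both $1$ and $2$, so $\mathsf 1_{\{Y_k=1\}}\mathsf 1_{\{Y_k=2\}}=0$ and $\mathbb E\left[\mathsf 1_{\{Y_k=1\}}\mathsf 1_{\{Y_k=2\}}\right]=0$. When $j\ne k$, the tosses are independent, so
$$\mathbb E\left[\mathsf 1_{\{Y_j=1\}}\mathsf 1_{\{Y_k=2\}}\right]=\mathbb P(Y_j=1,Y_k=2)=\mathbb P(Y_j=1)\mathbb P(Y_k=2) = p_1p_2.$$
There are $n^2-n$ pairs with $j\ne k$, so $\mathbb E[X_1X_2]=(n^2-n)p_1p_2$, which is not zero when $n\ge 2$; only the same-toss terms vanish. Hence
$$\operatorname{Cov}(X_1,X_2) = (n^2-n)p_1p_2 -n^2p_1p_2 =-np_1p_2<0, $$
provided $p_1,p_2>0$, so $X_1$ and $X_2$ are negatively correlated.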
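As a sanity check on the closed form $-np_1p_2$, here is a minimal Python sketch that computes $\operatorname{Cov}(X_1,X_2)$ exactly by enumerating every sequence of $n$ tosses. The function name `exact_covariance` and the particular probabilities in `probs` are illustrative assumptions, not part of the original problem; any nonnegative values summing to $1$ should work, and $n$ has to stay small because there are $6^n$ sequences.

```python
from fractions import Fraction
from itertools import product


def exact_covariance(n, probs):
    """Exact Cov(X_1, X_2) for n independent tosses, by brute-force enumeration.

    probs[i - 1] is the (assumed) probability that a single toss shows face i.
    """
    faces = range(1, len(probs) + 1)
    e1 = e2 = e12 = Fraction(0)
    for outcome in product(faces, repeat=n):
        # Independence: the probability of a sequence is the product
        # of the per-toss probabilities.
        weight = Fraction(1)
        for face in outcome:
            weight *= probs[face - 1]
        x1 = outcome.count(1)  # number of tosses showing 1
        x2 = outcome.count(2)  # number of tosses showing 2
        e1 += weight * x1
        e2 += weight * x2
        e12 += weight * x1 * x2
    return e12 - e1 * e2


# Hypothetical loaded die; the six values sum to 1.
probs = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10),
         Fraction(1, 5), Fraction(1, 10), Fraction(1, 10)]
n = 4

cov = exact_covariance(n, probs)
closed_form = -n * probs[0] * probs[1]
print(cov, closed_form, cov == closed_form)
```

Using `Fraction` keeps the arithmetic exact, so the comparison with $-np_1p_2$ is an equality rather than a floating-point approximation. Changing `probs[0]` or `probs[1]` shows how the size of the covariance depends on $p_1$ and $p_2$, which is where $p_i$ enters the result.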
