Variance of multivariate hypergeometric random variables

probability, solution-verification, variance

Question

A box contains $N_1$ white balls, $N_2$ black balls, and $N_3$ red balls, where $N_1 + N_2 + N_3 = N$. Two balls are randomly selected from the box without replacement. Let $Y_1$, $Y_2$, and $Y_3$ denote the number of white, black, and red balls, respectively, observed in the sample. What are the variances of $Y_1$ and $Y_2$?

My working

We have
$$\begin{aligned}
Y_1 & = \sum_{i=1}^{N_1} \mathbb{1} \{\mathrm{white} \ \mathrm{marble} \ i \ \mathrm{chosen}\}\\
Y_2 & = \sum_{j=1}^{N_2} \mathbb{1} \{\mathrm{black} \ \mathrm{marble} \ j \ \mathrm{chosen}\}
\end{aligned}$$

Moreover, denote $X_i$ as $\mathbb{1} \{\mathrm{white} \ \mathrm{marble} \ i \ \mathrm{chosen}\}$.

$$\begin{aligned}
Var(Y_1) & = Var \left(\sum_{i=1}^{N_1} \mathbb{1} \{\mathrm{white} \ \mathrm{marble} \ i \ \mathrm{chosen}\}\right)\\
& = \sum_{i=1}^{N_1} Var(X_i) \\
& = \sum_{i=1}^{N_1} \left[E({X_i}^2) - E(X_i)^2 \right]\\
& = \sum_{i=1}^{N_1} E({X_i}) - \sum_{i=1}^{N_1}E(X_i)^2 \\
& = N_1 \left(\frac{N_1}{N}\right) - N_1 \left(\frac{N_1}{N} \right)^2 \\
& = \frac{N_1^2 (1-N)}{N^2}
\end{aligned}$$

By symmetry, $$Var(Y_2) = \frac{N_2^2 (1-N)}{N^2}.$$

I am uncertain of my calculations. Are my approach and answer correct? Any intuitive explanation will be highly appreciated! 🙂

Edit

$$\begin{aligned}
E(Y_1^2) & = E\left(\sum_{i=1}^{N_1} \mathbb{1} \{\mathrm{white} \ \mathrm{marble} \ i \ \mathrm{chosen}\} \sum_{j=1}^{N_1-1} \mathbb{1} \{\mathrm{white} \ \mathrm{marble} \ j \ \mathrm{chosen}\}\right)\\
& = \sum_{i=1}^{N_1} \sum_{j=1}^{N_1-1} \mathbb{P} \{\mathrm{white} \ \mathrm{marble} \ i, \mathrm{white} \ \mathrm{marble} \ j \ \mathrm{chosen}\}\\
& = \sum_{i=1}^{N_1} \sum_{j=1}^{N_1-1} \left(\frac{2}{N}\right) \left(\frac{1}{N-1}\right) \\
& = N_1(N_1-1)\left(\frac{2}{N}\right) \left(\frac{1}{N-1} \right)
\end{aligned}$$

$$\begin{aligned}
\implies Var(Y_1) & = E(Y_1^2) - E(Y_1)^2\\
& = N_1(N_1-1)\left(\frac{2}{N}\right) \left(\frac{1}{N-1}\right) - \left(\frac{2N_1}{N}\right)^2\\
& = \frac{4N_1^2}{N^2(N-1)} - \frac{2N_1(N_1+1)}{N(N-1)}
\end{aligned}$$

By symmetry, $$Var(Y_2) = \frac{4N_2^2}{N^2(N-1)} - \frac{2N_2(N_2+1)}{N(N-1)}.$$

Best Answer

Let's first define some indicator variables: $$R_i=\begin{cases}1 & \text{if the $i$th selected ball is red}\\0 & \text{otherwise}\end{cases}$$ We define $B_i$ and $W_i$ analogously for black and white balls, respectively.

Next, we express $Y_1$, $Y_2$, and $Y_3$ in terms of the $W_i$, $B_i$, and $R_i$: $$Y_1 = W_1+W_2$$ $$Y_2 = B_1+B_2$$ $$Y_3 = R_1+R_2$$

Now let's find $Var(Y_1)$. We have:
$$\begin{aligned}
E(Y_1) & = E(W_1)+E(W_2) = P(\text{first ball is white})+P(\text{second ball is white})\\
& = \frac{N_1}{N}+\frac{N_1}{N}=\frac{2N_1}{N}
\end{aligned}$$
And also:
$$\begin{aligned}
E(Y_1^2) & = E((W_1+W_2)^2)\\
& = 0 \cdot P(\text{neither ball is white}) + 1 \cdot P(\text{exactly one ball is white}) + 4 \cdot P(\text{both balls are white})\\
& = \frac{{N_1\choose 1}{N-N_1\choose 1}}{{N \choose 2}}+4\frac{{N_1\choose 2}}{{N \choose 2}}=\frac{2N_1(N-N_1)}{N(N-1)}+\frac{4N_1(N_1-1)}{N(N-1)}
\end{aligned}$$
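The two moments above can be checked numerically. The following is only a sketch, not part of the original answer: the function names and the example sizes ($N_1 = 3$, $N = 10$) are made up for illustration. It builds the distribution of the number of white balls in a sample of size 2 from binomial coefficients, using exact fractions, and compares the resulting moments with the closed forms.

```python
from fractions import Fraction
from math import comb


def white_count_pmf(n_white, n_total):
    """P(Y_1 = k) for k = 0, 1, 2 when two balls are drawn without replacement."""
    samples = comb(n_total, 2)  # number of equally likely 2-ball samples
    return {
        k: Fraction(comb(n_white, k) * comb(n_total - n_white, 2 - k), samples)
        for k in (0, 1, 2)
    }


def moments_from_pmf(n_white, n_total):
    """Return (E(Y_1), E(Y_1^2)) computed directly from the distribution."""
    pmf = white_count_pmf(n_white, n_total)
    mean = sum(k * p for k, p in pmf.items())
    second = sum(k * k * p for k, p in pmf.items())
    return mean, second


def moments_from_formulas(n_white, n_total):
    """Return (E(Y_1), E(Y_1^2)) from the closed forms derived above."""
    n1, n = n_white, n_total
    mean = Fraction(2 * n1, n)
    second = Fraction(2 * n1 * (n - n1) + 4 * n1 * (n1 - 1), n * (n - 1))
    return mean, second


# Hypothetical example: 3 white balls among 10 in total.
print(moments_from_pmf(3, 10))       # (Fraction(3, 5), Fraction(11, 15))
print(moments_from_formulas(3, 10))  # (Fraction(3, 5), Fraction(11, 15))
```

Exact fractions are used instead of floats so that agreement between the two computations is an equality rather than an approximate match.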

Now we have:
$$\begin{aligned}
Var(Y_1) & = E(Y_1^2)-E^2(Y_1)=\frac{2N_1(N-N_1)}{N(N-1)}+\frac{4N_1(N_1-1)}{N(N-1)}-\frac{4N_1^2}{N^2}\\
& = \frac{2N_1(N^2-N_1N-2N+2N_1)}{N^2(N-1)}
\end{aligned}$$
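As a rearrangement that is not part of the original answer, the bracket in the numerator factors as $N^2-N_1N-2N+2N_1 = (N-N_1)(N-2)$. Writing $p = N_1/N$ for the proportion of white balls (notation introduced here only for this remark), the variance can be read as:

$$\begin{aligned}
Var(Y_1) & = \frac{2N_1(N-N_1)(N-2)}{N^2(N-1)}\\
& = 2 \cdot \frac{N_1}{N} \cdot \frac{N-N_1}{N} \cdot \frac{N-2}{N-1}\\
& = 2p(1-p) \cdot \frac{N-2}{N-1}
\end{aligned}$$

The factor $2p(1-p)$ is the variance the count would have if the two balls were drawn with replacement. The factor $\frac{N-2}{N-1}$ is the usual finite population correction for a sample of size 2: it is less than 1 because draws without replacement are negatively correlated, which is one way to see why simply summing the variances of the indicators overstates the answer.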

Replace $N_1$ with $N_2$ in the formula above to get $Var(Y_2)$.
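The per-ball indicator approach from the question can also be made to work, provided two things are handled: a given ball is in the sample with probability $2/N$, and the indicators are not independent, so the pairwise covariances must be included. The sketch below is an illustration under those assumptions, not part of the original answer. The function names and the ball counts are hypothetical. It compares that computation with the closed form above and with a brute-force enumeration of all 2-ball samples, for each colour.

```python
from fractions import Fraction
from itertools import combinations


def var_answer_formula(n_colour, n_total):
    """Closed form from the answer, with N_1 replaced by n_colour."""
    k, n = n_colour, n_total
    return Fraction(2 * k * (n * n - k * n - 2 * n + 2 * k), n * n * (n - 1))


def var_via_ball_indicators(n_colour, n_total):
    """Sum of the indicator variances plus all pairwise covariances."""
    k, n = n_colour, n_total
    p_one = Fraction(2, n)             # a given ball is in the sample
    p_both = Fraction(2, n * (n - 1))  # two given balls are both in the sample
    var_single = p_one * (1 - p_one)
    cov_pair = p_both - p_one * p_one  # negative for n > 2
    # k variance terms and k * (k - 1) ordered pairs of distinct balls
    return k * var_single + k * (k - 1) * cov_pair


def var_by_enumeration(n_colour, n_total):
    """Exact variance over all equally likely 2-ball samples."""
    pairs = list(combinations(range(n_total), 2))
    # Balls numbered 0 .. n_colour - 1 are taken to have the colour of interest.
    counts = [sum(i < n_colour for i in pair) for pair in pairs]
    mean = Fraction(sum(counts), len(pairs))
    second = Fraction(sum(c * c for c in counts), len(pairs))
    return second - mean * mean


# Hypothetical box: N_1 = 3 white, N_2 = 4 black, N_3 = 5 red.
box = {"white": 3, "black": 4, "red": 5}
total = sum(box.values())
for colour, k in box.items():
    print(
        colour,
        var_answer_formula(k, total),
        var_via_ball_indicators(k, total),
        var_by_enumeration(k, total),
    )
```

For each colour the three printed values should coincide, which is the substitution described above applied to $N_2$ and $N_3$ as well as $N_1$.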
