[Math] Deriving the variance of the sample mean $\mathrm{Var} (\bar{y})=\frac{1}{n}(1-\frac{n}{N})S^2$

covariance, sampling, statistics

For a population of size $N$ and a simple random sample (without replacement) of size $n$, derive the formula

$$\mathrm{Var}(\bar{y})=\frac{1}{n}\left(1-\frac{n}{N}\right)S^2$$

where $S^2=\frac{1}{N-1}\sum_{i=1}^{N}(y_i-\mu)^2$ is the population variance and $\mu$ is the population mean.

Hint: Define the individual inclusion indicators and express the sample mean in terms of these indicators.

My attempt so far has:

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n}y_i
= \frac{1}{n}\sum_{i=1}^{N}y_i I_i$$
where $I_i=1$ if unit $i$ is in the sample and $I_i=0$ otherwise.
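As a quick sanity check of this indicator representation, here is a minimal Python sketch that enumerates every simple random sample of a tiny population. The population values in `y` and the sizes `N = 5`, `n = 2` are made up purely for illustration; they are not part of the problem.

```python
from itertools import combinations
from fractions import Fraction

# Hypothetical population values (an assumption for illustration only).
y = [3, 7, 8, 12, 15]
N, n = len(y), 2

# Under simple random sampling every size-n subset of units is equally likely.
samples = list(combinations(range(N), n))

for s in samples:
    # Inclusion indicators I_i for this particular sample.
    indicators = [1 if i in s else 0 for i in range(N)]
    direct = Fraction(sum(y[i] for i in s), n)
    via_indicators = Fraction(sum(y[i] * indicators[i] for i in range(N)), n)
    # The sample mean and its indicator form agree for every sample.
    assert direct == via_indicators

# E[I_i] is the fraction of samples that contain unit i.
inclusion_prob = [
    Fraction(sum(1 for s in samples if i in s), len(samples)) for i in range(N)
]
print(inclusion_prob)
```

In this small case every inclusion probability comes out equal to $n/N$.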

Then $$\mathrm{Var}(\bar{y})=\mathrm{Cov}(\bar{y},\bar{y}) = \mathrm{Cov}\left(\frac{1}{n}\sum_{i=1}^{N}y_iI_i, \frac{1}{n}\sum_{j=1}^{N}y_jI_j\right).$$

Expanding, we eventually get $$\mathrm{Var}(\bar{y})=\frac{1}{n^2}\sum_{i=1}^{N}\sum_{j=1}^{N}y_iy_j\,\mathrm{E}\left(I_iI_j-\mathrm{E}(I_i)I_j-I_i \mathrm{E}(I_j)+\mathrm{E}(I_i) \mathrm{E}(I_j) \right).$$

And now I'm unsure of where to go. Thanks.

Best Answer

Writing the expansion a little differently, with the sums running over the whole population, we have
$$ \mathrm{Var}(\bar{y})=\frac{1}{n^2}\left[\sum_{i=1}^N y_i^2\,\mathrm{Var}(I_i)+\sum_{i=1}^N\sum_{j=1,j\neq i}^N y_iy_j\,\mathrm{Cov}(I_i,I_j)\right]. $$

Each $I_i$ is a Bernoulli variable with $\mathrm{E}[I_i]=P(I_i=1)=\frac{n}{N}$, so $\mathrm{Var}(I_i)=\frac{n}{N}\left(1-\frac{n}{N}\right)$. Now, the number of samples containing both $i$ and $j$ is $\binom{N-2}{n-2}$, and hence the probability of including both $i$ and $j$ in the sample is (it is important that $i\neq j$)
$$ \mathrm{E}[I_iI_j]=P(I_i=1,I_j=1)=\frac{\binom{N-2}{n-2}}{\binom{N}{n}}=\frac{n(n-1)}{N(N-1)}. $$

Thus
$$ \mathrm{Cov}(I_i,I_j)=\mathrm{E}[I_iI_j]-\mathrm{E}[I_i]\mathrm{E}[I_j]=\frac{n(n-1)}{N(N-1)}-\left(\frac{n}{N}\right)^2=-\frac{n(1-n/N)}{N(N-1)} $$
and we conclude that
$$ \begin{align} \mathrm{Var}(\bar{y})&=\frac{1}{n^2}\left[\sum_{i=1}^N y_i^2\frac{n}{N}\left(1-\frac{n}{N}\right)-\sum_{i=1}^N\sum_{j\neq i}y_iy_j\frac{n(1-n/N)}{N(N-1)}\right]\\ &=\frac{1}{n}\left(1-\frac{n}{N}\right)\frac{1}{N}\left[\sum_{i=1}^N y_i^2-\frac{1}{N-1}\sum_{i=1}^N\sum_{j\neq i}y_iy_j\right]\\ &=\frac{1}{n}\left(1-\frac{n}{N}\right)\frac{1}{N-1}\sum_{i=1}^N (y_i-\mu)^2\\ &=\left(1-\frac{n}{N}\right)\frac{S^2}{n}, \end{align} $$
where $\mu=\frac{1}{N}\sum\limits_{i=1}^N y_i$ is the population mean. The third equality uses $\sum_{i=1}^N\sum_{j\neq i}y_iy_j=(N\mu)^2-\sum_{i=1}^N y_i^2$, so the bracket equals $\frac{N}{N-1}\left(\sum_{i=1}^N y_i^2-N\mu^2\right)=\frac{N}{N-1}\sum_{i=1}^N (y_i-\mu)^2$.
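The result can also be checked by brute force on a small example. The Python sketch below is only an illustration, not part of the derivation: it enumerates every equally likely sample, computes $\mathrm{Var}(\bar{y})$ and $\mathrm{Cov}(I_i,I_j)$ exactly with rational arithmetic, and compares them with the closed forms above. The function names and the population values are hypothetical choices made for this check.

```python
from itertools import combinations
from fractions import Fraction


def srs_mean_variance(y, n):
    """Exact Var(ybar) over all equally likely size-n samples (no replacement)."""
    N = len(y)
    means = [Fraction(sum(y[i] for i in s), n) for s in combinations(range(N), n)]
    expected = sum(means) / len(means)
    return sum((m - expected) ** 2 for m in means) / len(means)


def formula_variance(y, n):
    """Closed form (1/n)(1 - n/N) S^2 with S^2 using the N - 1 divisor."""
    N = len(y)
    mu = Fraction(sum(y), N)
    s2 = sum((v - mu) ** 2 for v in y) / (N - 1)
    return Fraction(1, n) * (1 - Fraction(n, N)) * s2


def indicator_covariance(N, n, i=0, j=1):
    """Exact Cov(I_i, I_j) for two distinct units i and j, by enumeration."""
    samples = list(combinations(range(N), n))
    total = len(samples)
    p_i = Fraction(sum(1 for s in samples if i in s), total)
    p_j = Fraction(sum(1 for s in samples if j in s), total)
    p_ij = Fraction(sum(1 for s in samples if i in s and j in s), total)
    return p_ij - p_i * p_j


# Hypothetical population values, chosen only for illustration.
y = [3, 7, 8, 12, 15]
N, n = len(y), 2

print(srs_mean_variance(y, n) == formula_variance(y, n))
print(indicator_covariance(N, n) == -Fraction(n, N) * (1 - Fraction(n, N)) / (N - 1))
```

Using `Fraction` rather than floats keeps the comparison exact, so both equalities can be tested with `==` instead of a tolerance. The enumeration grows as $\binom{N}{n}$, so this is only practical for very small populations.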
