Standard Error – Standard Error of Sampling Distribution of the Mean

finite-population, standard error

I found an equation that says the standard error of the sampling distribution of the mean is:

$$\sigma_{\bar{X}} = \sigma \cdot \sqrt{\frac{1}{n}-\frac{1}{N}}$$

When the population size $N$ is very large, the term $1/N$ is approximately zero, and the formula reduces to:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

However, I don't understand how the first equation was derived. Could someone explain it?

I found this equation here: http://stattrek.com/sampling/sampling-distribution.aspx

I can derive the second equation this way, treating the $X_i$ as independent with common variance $\sigma^2$:
\begin{equation}
\begin{split}
\operatorname{Var}(\bar{X}) &= \operatorname{Var}\left(\frac{\sum_{i=1}^{n} X_i}{n}\right) \\
&= \left(\frac{1}{n^2}\right)\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) \\
&= \left(\frac{1}{n^2}\right)\left(\sum_{i=1}^{n} \operatorname{Var}(X_i)\right) \\
&= \left(\frac{1}{n^2}\right)\left(\sum_{i=1}^{n}\sigma^2 \right) \\
&= \left(\frac{1}{n^2}\right)\left(n\sigma^2\right) \\
&= \frac{\sigma^2}{n}
\end{split}
\end{equation}

However, I don't know where the first equation comes from.

Best Answer

The quoted formula is not quite right. Let's derive the correct one.

Since the population mean (or any other constant) may be subtracted from every value in a population $S$ without changing the variance of the population or of any sample thereof, we might as well assume the population mean is zero. Letting the values in the population be $\{x_i\, \vert\, i\in S\}$, this implies

$$0 = \sum_{i\in S} x_i.$$

Squaring both sides maintains the equality, giving

$$0 = \sum_{i,j\in S}x_ix_j = \sum_{i\in S}x_i^2 + \sum_{i \ne j \in S} x_ix_j,$$

whence

$$\sum_{i\ne j \in S} x_ix_j = -\sum_{i\in S} x_i^2.$$

This key result will be employed later.
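As a quick numerical sanity check of this identity, here is a minimal Python sketch. The four-element population is a made-up example, not taken from the question; it is centered first so that its mean is zero, as assumed above.

```python
from itertools import permutations

# Hypothetical toy population, chosen only for illustration.
values = [2.0, 5.0, 7.0, 10.0]
mean = sum(values) / len(values)
x = [v - mean for v in values]  # centered, so sum(x) == 0

# Sum of squares over all i in S.
sum_squares = sum(xi * xi for xi in x)

# permutations(x, 2) yields every ordered pair (x_i, x_j) with i != j.
sum_cross = sum(xi * xj for xi, xj in permutations(x, 2))

print(sum_squares, sum_cross)  # 34.0 -34.0
```

The two sums cancel exactly, which is all the identity claims.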

Let $S$ have $N$ elements. Because its mean is zero, its variance is the average squared value:

$$s^2 = \frac{1}{N}\sum_{i\in S}x_i^2.$$

(Please note that there can be no dispute about the denominator of $N$; in particular, it definitely is not $N-1$: this is a population variance, not an estimator.)

To find the variance of the sampling distribution of the mean, consider all possible $n$-element samples. Each corresponds to an $n$-subset $A\subset S$ and has mean

$$\frac{1}{n}\sum_{i\in A} x_i.$$

Since the mean of all the sample means equals the mean of $S$, which is zero, the variance of these $\binom{N}{n}$ sample means is the average of their squares:

$$\begin{aligned}
s_n^2 &= \frac{1}{\binom{N}{n}} \sum_{A\subset S}\left(\frac{1}{n}\sum_{i\in A}x_i\right)^2 \\
&= \frac{1}{n^2\binom{N}{n}} \sum_{A\subset S}\sum_{i,j\in A}x_ix_j \\
&= \frac{1}{n^2\binom{N}{n}} \sum_{A\subset S}\left(\sum_{i\in A}x_i^2 + \sum_{i\ne j\in A}x_ix_j\right).
\end{aligned}$$

(Once again, $\binom{N}{n}$, not $\binom{N}{n}-1$, is the correct denominator: this is the variance of a collection of $\binom{N}{n}$ numbers, not an estimator of anything.)
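The definition of $s_n^2$ can be evaluated directly by enumerating every $n$-subset. The following is a minimal Python sketch under assumed inputs: the function name `sampling_variance_brute_force` and the toy population are illustrative only.

```python
from itertools import combinations
from math import comb

def sampling_variance_brute_force(population, n):
    """Variance of the means of all n-subsets, with denominator C(N, n)."""
    N = len(population)
    mu = sum(population) / N
    # One mean per n-subset; sampling is without replacement.
    means = [sum(sample) / n for sample in combinations(population, n)]
    assert len(means) == comb(N, n)
    return sum((m - mu) ** 2 for m in means) / len(means)

# Hypothetical toy population, chosen only for illustration.
population = [2.0, 5.0, 7.0, 10.0]
result = sampling_variance_brute_force(population, 2)
print(result)  # about 2.8333, i.e. 17/6
```

Enumeration is only feasible for small $N$, since the number of subsets grows as $\binom{N}{n}$.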

Fix, for a moment, any particular index $i$. The value $x_i$ will appear in $\binom{N-1}{n-1}$ samples, because each such sample supplements $x_i$ with $n-1$ more elements of $S$ out of the $N-1$ remaining elements (sampling is without replacement, remember). Its contribution to the right hand side therefore equals $\binom{N-1}{n-1}x_i^2$.

Also fixing an index $j\ne i$, similar reasoning shows the product $x_ix_j$ appears in $\binom{N-2}{n-2}$ samples, thereby contributing $\binom{N-2}{n-2}x_ix_j$ to the right hand side. Therefore, upon summing over all such $i$ and $j$ in $S$,

$$s_n^2 = \frac{1}{n^2\binom{N}{n}} \left(\binom{N-1}{n-1}\sum_{i\in S}x_i^2 + \binom{N-2}{n-2}\sum_{i\ne j\in S}x_ix_j\right).$$
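The two counting coefficients in this expression can be confirmed by enumeration. The sketch below uses hypothetical sizes $N=6$ and $n=3$ and counts how many $n$-subsets contain one fixed index, and how many contain two fixed indices.

```python
from itertools import combinations
from math import comb

N, n = 6, 3  # hypothetical sizes, chosen only for illustration
subsets = list(combinations(range(N), n))

# Number of n-subsets containing the fixed index 0.
count_single = sum(1 for A in subsets if 0 in A)

# Number of n-subsets containing both fixed indices 0 and 1.
count_pair = sum(1 for A in subsets if 0 in A and 1 in A)

print(count_single, comb(N - 1, n - 1))  # 10 10
print(count_pair, comb(N - 2, n - 2))    # 4 4
```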

Plug the first result into that last sum:

$$s_n^2 = \frac{1}{n^2\binom{N}{n}} \left(\binom{N-1}{n-1}\sum_{i\in S}x_i^2 + \binom{N-2}{n-2}\left(-\sum_{i\in S}x_i^2\right)\right).$$

It is now straightforward to relate this to the variance of $S$, because $\sum_{i\in S}x_i^2 = Ns^2$:

$$s_n^2 = \frac{1}{n^2\binom{N}{n}} \left(\binom{N-1}{n-1} - \binom{N-2}{n-2}\right)\left(Ns^2\right) = \frac{s^2}{n}\left(1 - \frac{n-1}{N-1}\right).$$
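For readers who want the intermediate algebra, the last equality can be unpacked using only the factorial definition of the binomial coefficient:

$$\frac{\binom{N-1}{n-1}}{\binom{N}{n}} = \frac{n}{N}, \qquad \frac{\binom{N-2}{n-2}}{\binom{N}{n}} = \frac{n(n-1)}{N(N-1)},$$

so that

$$s_n^2 = \frac{Ns^2}{n^2}\left(\frac{n}{N} - \frac{n(n-1)}{N(N-1)}\right) = \frac{s^2}{n}\left(1 - \frac{n-1}{N-1}\right).$$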

Thus the sampling variance for sampling with replacement, $\frac{s^2}{n}$, is multiplied by $1 - \frac{n-1}{N-1}$ to obtain the sampling variance for sampling without replacement, $s_n^2$. Accordingly, the multiplicative adjustment for the sampling standard deviation is its square root, $\sqrt{1- \frac{n-1}{N-1}}$. This differs from the quoted formula, which uses $\sqrt{1 - \frac{n}{N}}$.
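To see the difference between the two adjustments concretely, the sketch below compares both formulas against an exact brute-force enumeration, using `fractions.Fraction` to avoid rounding. The function names and the integer-valued toy population are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations

def brute_force_variance(population, n):
    """Exact variance of the means of all n-subsets (denominator C(N, n))."""
    N = len(population)
    mu = Fraction(sum(population), N)
    means = [Fraction(sum(A), n) for A in combinations(population, n)]
    return sum((m - mu) ** 2 for m in means) / len(means)

def derived_variance(pop_var, n, N):
    """s^2 / n * (1 - (n - 1) / (N - 1)), the formula derived in this answer."""
    return pop_var / n * (1 - Fraction(n - 1, N - 1))

def quoted_variance(pop_var, n, N):
    """s^2 * (1/n - 1/N), the square of the quoted standard error."""
    return pop_var * (Fraction(1, n) - Fraction(1, N))

population = [2, 5, 7, 10]  # hypothetical toy population
N = len(population)
mu = Fraction(sum(population), N)
pop_var = sum((x - mu) ** 2 for x in population) / N  # denominator N, not N - 1

# Columns: n, brute force, derived formula, quoted formula.
for n in range(1, N + 1):
    print(n, brute_force_variance(population, n),
          derived_variance(pop_var, n, N), quoted_variance(pop_var, n, N))
```

For this population, the derived formula agrees with the enumeration at every $n$, while the quoted formula does not (for example, at $n=2$ it gives $17/8$ rather than $17/6$).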

Two simple checks can give us some comfort concerning the correctness of this result. First, the sampling variance of the means of samples of size $n=1$, $s_1^2$, obviously equals the population variance $s^2$. The correct formula states

$$s_1^2 = \frac{s^2}{1}\left(1 - \frac{1-1}{N-1}\right) = s^2,$$

as it should. Unfortunately, the quoted formula asserts that $s_1^2 = s^2\left(\frac{1}{1} - \frac{1}{N}\right)$, which obviously cannot be right. Second, the sampling variance of the means of samples of size $n=N$ is zero, because there is only one such sample and hence no variation, and indeed both formulas give $0$ in this case.
