Solved – Population Variance and Sample Variance

population, sample, standard deviation, variance

How does lowering the count $n$ by $1$ in the variance formula account for the discrepancy of using a sample rather than the whole population?

In other words, how is dividing the whole expression by $n-1$ in the sample variance better than dividing by $n$?

That is, why is the formula $$s^2=\frac{\sum_{i=1}^{n}(x_i-x_{avg})^2}{n-1}$$ more accurate than $$s^2=\frac{\sum_{i=1}^{n}(x_i-x_{avg})^2}{n}$$ when working with a sample?

Thank You.

Best Answer

If you use $$ s^2 = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n} $$ as an estimate, based on a sample of size $n$, of the population variance $\sigma^2$, then your estimate is biased: on average it comes out too small, with $E[s^2] = \frac{n-1}{n}\sigma^2$. This formula for the bias shows, however, that $$ \tilde{s}^2 := \frac{n}{n-1} s^2 $$ is unbiased.

I just came across this pdf where the formula for the bias is derived.
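
For reference, here is a short sketch of how the bias formula can be obtained. It assumes the $x_i$ are independent draws from a population with mean $\mu$ and finite variance $\sigma^2$ (the symbols $\mu$ and $\sigma^2$ are introduced here for illustration). Splitting each deviation around $\mu$ gives $$ \sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n (x_i - \mu)^2 - n(\bar{x} - \mu)^2. $$ Taking expectations, and using $E[(x_i - \mu)^2] = \sigma^2$ and $E[(\bar{x} - \mu)^2] = \sigma^2/n$, $$ E\left[\sum_{i=1}^n (x_i - \bar{x})^2\right] = n\sigma^2 - \sigma^2 = (n-1)\sigma^2. $$ Dividing by $n$ therefore gives an expected value of $\frac{n-1}{n}\sigma^2$, while dividing by $n-1$ gives exactly $\sigma^2$. Intuitively, the deviations are measured from $\bar{x}$, which is itself fitted to the sample, so they are slightly smaller on average than deviations from the true mean.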

With a sample of size $n$, the usual practice is then to use $$ \tilde{s}^2 = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1} $$ as an estimate of the population variance.
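
If you want to see the effect numerically, a small simulation can help. The sketch below is only an illustration under assumed settings: normally distributed data with true variance $\sigma^2 = 4$, samples of size $n = 5$, and a function name (`simulate_variance_estimators`) made up for this example. It averages both estimators over many samples.

```python
import random


def simulate_variance_estimators(n=5, trials=20000, sigma=2.0, seed=0):
    """Average the divide-by-n and divide-by-(n-1) estimators over many samples.

    Assumption for illustration: data are drawn from a normal distribution
    with mean 0 and standard deviation `sigma` (true variance sigma**2).
    """
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # Sum of squared deviations from the sample mean.
        ss = sum((x - mean) ** 2 for x in sample)
        total_div_n += ss / n
        total_div_n_minus_1 += ss / (n - 1)
    return total_div_n / trials, total_div_n_minus_1 / trials


avg_div_n, avg_div_n_minus_1 = simulate_variance_estimators()
print("true variance:          ", 4.0)
print("average of SS / n:      ", avg_div_n)
print("average of SS / (n - 1):", avg_div_n_minus_1)
```

With these settings the divide-by-$n$ average should settle near $\frac{n-1}{n}\sigma^2 = 3.2$, while the divide-by-$(n-1)$ average should settle near the true value $4$, up to simulation noise.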

If, on the other hand, the $x_i$'s form the whole population, then bias is not an issue: nothing is being estimated, and we simply apply the definition of the variance, dividing by $n$.
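
In practice, the two conventions often exist side by side in software. As one example, Python's standard `statistics` module has `pvariance` (divide by $n$, for a whole population) and `variance` (divide by $n-1$, for a sample). The data below are made up purely for illustration.

```python
import statistics

# Hypothetical data set, chosen so the arithmetic is easy to check by hand.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Treat the values as the entire population: divide by n.
population_var = statistics.pvariance(data)

# Treat the values as a sample from a larger population: divide by n - 1.
sample_var = statistics.variance(data)

print(population_var, sample_var)
```

Here the mean is $5$ and the sum of squared deviations is $32$, so the two results are $32/8 = 4$ and $32/7 \approx 4.57$. Which one is appropriate depends only on whether the data are the whole population or a sample from it.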
