Statistics – Difference Between Population Variance and Sample Variance

probability, sampling, statistical-inference, statistics

Sorry if this question is simple, but I was wondering: why is there a difference between the population variance and the sample variance?

I understand the population variance is calculated as:

$$\text{Var} = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2$$

and the sample variance is computed as

$$\text{Var}_s = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2$$
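For concreteness, here is how I would check both formulas on a small made-up data set (a minimal sketch in Python; the numbers are arbitrary, and I am using numpy's `ddof` argument to switch the divisor between $N$ and $N-1$):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # arbitrary data

# "Population" variance: divide by N (numpy's default, ddof=0)
var_pop = np.var(x, ddof=0)

# "Sample" variance: divide by N - 1 (ddof=1)
var_s = np.var(x, ddof=1)

print(var_pop)  # 4.0
print(var_s)    # ~4.571
```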

In real-world data sets, would you use the sample variance most of the time? What if the population is itself a sample? How would you know? Also, does the mean change when you are looking at a population versus a sample?

Best Answer

It is probably more helpful to think of the sample variance as the unbiased estimator of the underlying variance. Usually the set of numbers we have does not reflect the entire universe of possibilities; it is a sample from which we want to make some inferences. We want to use this sample to estimate the mean and variance not of the sample itself, but of the underlying distribution.

With that in mind, and running through the algebra (which can be found here), you see that the statistic known as the "sample variance" is the one whose expected value, if it were calculated on a hundred gazillion separate samples from the underlying distribution, equals the variance of that distribution. When an estimator's expected value is the actual value we are interested in, we call it "unbiased". If we instead applied the statistic known as the "population" variance to that huge set of samples and took ITS expectation (mean), it would come out slightly lower than the true variance: lower by a factor of $(N-1)/N$, in fact.
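You can see this bias numerically with a quick simulation (a minimal sketch in Python; the sample size, number of trials, and choice of a standard normal are all arbitrary assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_var = 1.0          # variance of a standard normal
n, trials = 5, 100_000  # small samples, many of them

samples = rng.standard_normal((trials, n))

# Average each estimator over the many independent samples
mean_biased = np.var(samples, axis=1, ddof=0).mean()    # divide by n
mean_unbiased = np.var(samples, axis=1, ddof=1).mean()  # divide by n - 1

print(mean_biased)    # ~0.8, i.e. (n-1)/n * true_var
print(mean_unbiased)  # ~1.0, i.e. true_var
```

The divide-by-$n$ statistic averages out to about $(n-1)/n$ of the true variance, while the divide-by-$(n-1)$ statistic averages out to the true variance itself.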

If, however, you have the entire (finite) population, and not a sample from it, such as the distribution of a six-sided die, then the population variance is the statistic to use, as you are not estimating it from a sample, but calculating it from the complete probability space.
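For the die example, the complete probability space is just the six equally likely faces, so the population formula applies directly (a minimal sketch):

```python
import numpy as np

faces = np.arange(1, 7)  # the complete, finite population: 1..6
mu = faces.mean()        # 3.5

# Population variance: every outcome is known, so divide by N
var_pop = np.var(faces, ddof=0)

print(mu)       # 3.5
print(var_pop)  # 35/12, about 2.9167
```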