Prove that the sample standard deviation formula is more accurate than the population standard deviation formula on a sample dataset

python, standard deviation

For an assignment in a Python class at college, I have to demonstrate that the sample standard deviation formula is more accurate than the population standard deviation formula on a sample data set.

The full original data set is the array of numbers 5, 7, 8, 3, 10, 21, 4, 13, 1, 0, 0, 9, 17.

I take a random subset of those; for example, it could be [5, 21, 7].

Using the population formula on that set gets a standard deviation of: 7.118052168020874

Using the sample formula on that set gets a standard deviation of: 8.717797887081348

So does this demonstrate the sample formula being more accurate? I feel like you need to know what the answer is beforehand to know which is more accurate for calculating it.

Best Answer

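For $[5, 21, 7]$ the mean is $11$ and the sum of squared deviations is $36+100+16=152$, so the sample standard deviation is $\sqrt{152/2}\approx 8.72$ and the population standard deviation is $\sqrt{152/3}\approx 7.12$.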
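As a quick check of the numbers for $[5, 21, 7]$, here is a minimal sketch using Python's standard `statistics` module. The variable names are only illustrative.

```python
import statistics

sample = [5, 21, 7]

# Population formula: divide the sum of squared deviations by n.
pop_sd = statistics.pstdev(sample)

# Sample formula: divide by n - 1 instead of n.
samp_sd = statistics.stdev(sample)

print(pop_sd)   # about 7.12
print(samp_sd)  # about 8.72
```

A single sample like this cannot show which formula is better; it only confirms the arithmetic.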

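The sample formula is more accurate in the sense that $S^2$ is an unbiased estimator of the population variance $\sigma^2$: its expected value over repeated samples is exactly $\sigma^2$, whereas dividing by $n$ gives an expected value of $\frac{n-1}{n}\sigma^2$, which underestimates it.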

Proof:

$$\begin{aligned}
\mathbb{E}[S^2] &= \mathbb{E}\left[\frac{1}{n-1}\sum_i (X_i-\overline{X}_n)^2\right] = \frac{1}{n-1}\mathbb{E}\left[\sum_i (X_i-\mu)^2 - n(\overline{X}_n-\mu)^2\right] \\
&= \frac{1}{n-1}\left[\sum_i \mathbb{E}(X_i-\mu)^2 - n\,\mathbb{E}(\overline{X}_n-\mu)^2\right] = \frac{1}{n-1}\left(n\sigma^2 - n\frac{\sigma^2}{n}\right) = \sigma^2
\end{aligned}$$
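For the assignment, the result above can be illustrated numerically without relying on a single random draw. The sketch below is one possible approach, not the only one: it assumes samples are drawn with replacement (the i.i.d. setting the proof uses), picks a sample size of 3 arbitrarily, and enumerates every possible ordered sample so that the averages are exact rather than simulated. The helper `sum_sq_dev` is a name made up for this example.

```python
from itertools import product

data = [5, 7, 8, 3, 10, 21, 4, 13, 1, 0, 0, 9, 17]
n = 3  # sample size; an arbitrary choice for illustration


def sum_sq_dev(xs):
    """Sum of squared deviations of xs from its own mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


# True variance of the full data set, treated as the population (divide by N).
sigma2 = sum_sq_dev(data) / len(data)

# Every ordered sample of size n drawn WITH replacement (assumption: i.i.d. draws).
samples = list(product(data, repeat=n))

# Average of each estimator over all possible samples, i.e. its expected value.
mean_sample_formula = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
mean_population_formula = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print("population variance:          ", sigma2)
print("mean of sample formula (n-1): ", mean_sample_formula)
print("mean of population formula (n):", mean_population_formula)
```

Under these assumptions the average of the $n-1$ formula matches $\sigma^2$, while the average of the $n$ formula comes out at $\frac{n-1}{n}\sigma^2$. If the subsets are instead drawn without replacement, as `random.sample` does, the expected values differ slightly from the ones in the proof.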
