[Math] Standard Deviation: Why divide by $(N-1)$ rather than $N$

Tags: standard deviation, statistics

The formula for the standard deviation seems to be the square root of the sum of the squared deviations from the mean divided by $N-1$.

Why isn't it simply the square root of the mean of the squared deviations from the mean, i.e., divided by $N$?

Why is it divided by $N-1$ rather than $N$?

Best Answer

If you have $n$ samples, consider the estimator $$s^2=\frac{\sum_{i=1}^n (X_i-m)^2}{n},$$ where $m=\frac{1}{n}\sum_{i=1}^n X_i$ is the sample mean. For the estimator to be *unbiased* you need $$E(s^2)=\sigma^2,$$ where $\sigma^2$ is the true, unknown value of the variance. It is possible to show that $$E\left(\frac{\sum_{i=1}^n (X_i-m)^2}{n}\right)=\frac{n-1}{n}\,\sigma^2,$$ which is systematically smaller than $\sigma^2$: the deviations are measured from the sample mean $m$, which is itself fit to the same data, so they are on average slightly smaller than the deviations from the true mean would be. Therefore, to get an unbiased estimate of the 'real' value of $\sigma^2$, you must divide by $n-1$ instead of $n$.
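You can also see the bias numerically. Below is a minimal simulation sketch using NumPy (the normal distribution, the sample size $n=5$, and the number of trials are arbitrary illustrative choices, not part of the original answer): it draws many small samples from a distribution with known variance and averages the two estimators.

```python
import numpy as np

rng = np.random.default_rng(0)

true_var = 4.0        # variance of N(0, 2^2), chosen for illustration
n = 5                 # small sample size makes the bias easy to see
trials = 200_000      # number of repeated experiments

# Each row is one sample of size n.
samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))

m = samples.mean(axis=1, keepdims=True)      # sample mean of each row
ss = ((samples - m) ** 2).sum(axis=1)        # sum of squared deviations from m

print("divide by n:   ", (ss / n).mean())       # ~ (n-1)/n * 4 = 3.2, biased low
print("divide by n-1: ", (ss / (n - 1)).mean()) # ~ 4.0, unbiased
```

With these settings the $1/n$ estimator averages to roughly $\frac{n-1}{n}\sigma^2 = 3.2$, while the $1/(n-1)$ estimator averages to roughly the true variance $4.0$.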