[Math] Standard deviation of the mean of sample data

standard deviation, statistics

I can't quite understand what this formula means:

$$\sigma_{\overline{x}}=\frac{\sigma}{\sqrt n}$$

I know what standard deviation $\sigma$ is – it's the average distance of my data points (samples) from the mean. But this part is confusing:

For example, suppose the random variable $X$ records a randomly selected
student's score on a national test, where the population distribution
for the score is normal with mean $70$ and standard deviation $5$
($N(70,5)$). Given a simple random sample (SRS) of $200$ students, the
distribution of the sample mean score has mean $70$ and standard
deviation $$\frac{5}{\sqrt{200}} \approx \frac{5}{14.14} \approx 0.35$$

Source

I thought the standard deviation $\sigma = 5$ means that if I take the scores of all students and calculate the mean, then the average distance of a score from that mean will be equal to $5$. The set of all scores is called the 'population', right? But here it says that the more students' scores I take, the lower the standard deviation; thus the closer the sample size gets to the size of the population, the lower the standard deviation (and the further it gets from $5$).

Best Answer

First, the standard deviation is not the average distance to the mean: the average signed deviation from the mean is always zero. It is, however, a measure of how far the points typically lie from the mean. Assuming the values are normally distributed, we know that about 68% of the values lie between $\mu-\sigma$ and $\mu+\sigma$, for example.

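Suppose we weigh potatoes with average weight 100 g and standard deviation 5 g. Now take groups of 4 potatoes and compute the average weight of each group. What can we say about these group averages? I hope you see that the average of the group averages is still 100 g. But what is the standard deviation of these group averages? It is a different quantity from the standard deviation of individual potatoes, which stays 5 g however many potatoes we weigh. That is where you use the formula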

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{4}} = 2.5 \text{ g}$$
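If it helps to see this numerically, here is a minimal simulation sketch of the potato example. It assumes the individual weights are normally distributed and drawn independently, which the example does not actually state; the function name `sd_of_sample_means`, the number of trials, and the seed are made up for illustration.

```python
import random
import statistics


def sd_of_sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` groups of size n and return the SD of the group means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(trials)
    ]
    # Spread of the group averages, not of the individual potatoes.
    return statistics.pstdev(means)


# Potato example: mu = 100 g, sigma = 5 g, groups of n = 4.
potato_sd = sd_of_sample_means(mu=100, sigma=5, n=4, trials=20_000)
print(f"simulated: {potato_sd:.2f} g, formula: {5 / 4 ** 0.5:.2f} g")
```

The simulated value should land close to 2.5 g, while the spread of the individual weights stays at 5 g. Calling the same function with `mu=70, sigma=5, n=200` should give something near $5/\sqrt{200} \approx 0.35$, the figure from the question. The reason behind the formula is that variances of independent measurements add, so the variance of the average of $n$ of them is $n\sigma^2/n^2 = \sigma^2/n$.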

Feel free to ask if you still don't understand.


Proof that the average signed deviation of the data from the mean $\mu = \frac{1}{n}\sum^n_{i=1} x_i$ is $0$: $$\frac{\sum^n_{i=1} (x_i-\mu)}{n} = \frac{(\sum^n_{i=1} x_i)-\mu n}{n} = \frac{\sum^n_{i=1} x_i}{n}-\mu = \mu - \mu = 0$$
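A small numeric check of the same point, as a sketch on made-up scores (the list `scores` is invented for illustration): the signed deviations cancel to zero, and the standard deviation is also not the same number as the average absolute distance.

```python
import statistics

scores = [62.0, 68.0, 70.0, 73.0, 77.0]  # made-up data, mean is 70
mean = statistics.fmean(scores)
deviations = [x - mean for x in scores]  # -8, -2, 0, 3, 7

# Average signed deviation: positive and negative parts cancel.
mean_signed = statistics.fmean(deviations)  # 0.0

# Average absolute distance from the mean.
mean_abs = statistics.fmean([abs(d) for d in deviations])  # 4.0

# Standard deviation: square root of the average squared deviation.
sd = statistics.pstdev(scores)  # sqrt(25.2), about 5.02

print(mean_signed, mean_abs, round(sd, 2))
```

Here the standard deviation (about 5.02) is larger than the average absolute distance (4.0), because squaring gives more weight to the points far from the mean.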