Solved – Relationship between standard error of the mean and standard deviation


Consider the following question.

The respiratory disturbance index (RDI), a measure of sleep disturbance, for a specific population has a mean of 15 (sleep events per hour) and a standard deviation of 10. The RDI values are not normally distributed. What is your best estimate of the probability that the mean RDI of a sample of 100 people is between 14 and 16 events per hour?

The answer I have been given confuses me, as follows.

The standard error of the mean is 10/√100 = 1. Thus between 14 and 16 is within one standard deviation of the mean of the distribution of the sample mean, so the probability should be about 68%.

What confuses me is the term "standard error of the mean". For the answer to be correct, this value must be what appears as the standard deviation (sd) in the following R code:

pnorm(16, mean = 15, sd = 1) - pnorm(14, mean = 15, sd = 1) 
## [1] 0.6827

1) Why is the standard deviation not 10, as given in the original problem?

2) The answer gives the standard error of the mean as 1, yet this value is termed the standard deviation in the R code (sd = 1). Why is that?

3) The samples are 'not normally distributed'. Don't they have to be to use pnorm?

thanks

Best Answer

The samples are 'not normally distributed'. Don't they have to be to use pnorm?

The question is asking about the distribution of sample means, not the distribution of the original variable.

Under mild conditions, sample means will tend to be closer to normally distributed than the original variable was. See what happens when we sample from a distribution of counts (representing number of disturbances), which has population mean 15 and sd 10:

[Figure: histogram of the original count distribution (left) and histogram of the sample means for samples of size 100 (right)]
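The original answer doesn't show the code behind this figure; here is a minimal sketch of such a simulation, assuming (my choice, not stated in the answer) a negative binomial count distribution -- one convenient distribution with mean 15 and sd 10, where size = 15^2/(10^2 - 15) makes the variance equal 100:

set.seed(1)
size <- 15^2 / (10^2 - 15)   # negative binomial size giving variance 100
x <- rnbinom(1e5, mu = 15, size = size)
c(mean(x), sd(x))            # close to 15 and 10

# distribution of means of samples of size 100
means <- replicate(30000, mean(rnbinom(100, mu = 15, size = size)))
c(mean(means), sd(means))    # close to 15 and 1

par(mfrow = c(1, 2))
hist(x, breaks = 50, main = "Original counts")
hist(means, breaks = 50, main = "Means, n = 100")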

(Many elementary books attribute this tendency to the central limit theorem, though the central limit theorem doesn't tell us what will happen with small samples; nonetheless it is a real effect -- I'd argue it's better attributed to the Berry-Esseen inequality.)

What confuses me is the term standard error of the mean.

The term means "the standard deviation of the distribution of sample means". See the histogram on the right above -- its standard deviation is consistent with 1 (for this large simulation -- 30000 values from the distribution of sample means -- we got a standard deviation of just under 1.01).

We see that the distribution of the sample means -- while not actually normal -- is quite close to normal here; using a normal distribution with mean 15 and standard deviation 1 as an approximation of the distribution of means (of samples of 100 observations from the original, quite skewed distribution) will work quite well.

While $n=100$ was plenty large enough to treat the sample mean as approximately normal in the situation I simulated, that's not true for every distribution. In some cases -- even when the distribution of sample means will still be well approximated by a normal distribution in large samples -- you may need $n$ to be a great deal larger than 100 for it to work well. We don't know the population distribution here, so we don't know for sure that $n=100$ would be sufficient (it was for the example distribution I used, which you can see is at least moderately skewed); that $n=100$ is large enough to approximate as normal in this case is an assumption.
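As an illustration of that point (my own example, not from the original answer): with a far more heavily skewed distribution, such as a lognormal with sdlog = 3, the means of samples of 100 are still visibly skewed:

set.seed(2)
# means of n = 100 draws from a very skewed lognormal; the histogram
# of the means is still clearly right-skewed, not close to normal
m100 <- replicate(30000, mean(rlnorm(100, meanlog = 0, sdlog = 3)))
hist(m100, breaks = 100, main = "Means of lognormal(0, 3), n = 100")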

1) Why is the standard deviation not 10, as given in the original problem?

Because the distribution of sample means has a smaller standard deviation than the original variable that you took means of. This is why you divide the original standard deviation by $\sqrt{n}$ -- because that then gives the standard deviation of the distribution of means from samples of size $n$.
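For the numbers in this question, that works out as follows:

s <- 10; n <- 100
s / sqrt(n)   # standard error of the mean
## [1] 1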

2) The answer gives the standard error of the mean as 1, yet this value is termed the standard deviation in the R code (sd = 1). Why is that?

It's the standard deviation of the distribution of means (and the call to pnorm is because we're using the normal distribution to approximate the distribution of sample means).
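You can check how good that normal approximation is against the simulated means from the sketch above (again assuming the negative binomial example, where means holds 30000 simulated sample means):

pnorm(16, mean = 15, sd = 1) - pnorm(14, mean = 15, sd = 1)   # normal approximation
mean(means > 14 & means < 16)                                 # simulated probability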
