Why can we assume a single sample is normally distributed?

central limit theorem, estimation, sampling

As I understand it, a sampling distribution of the desired parameter is the distribution of that parameter's estimate over all possible samples of a given size $n$. The Central Limit Theorem states that every sampling distribution tends to a normal distribution for a large sample size $n$, even if the underlying population or the single samples are not normally distributed. The question of which $n$ is actually sufficiently large is more complicated, but a lot of sources state that $n \ge 30$ is a good rule of thumb in most cases. So far so good.

After stating the above, most authors just proceed by taking one random sample to estimate a certain population parameter and assume it is normally distributed in the calculations. For example, the computation of the critical values (z-scores) is based on a normal distribution. But we just said that the underlying population might not be normally distributed, and neither might single samples.

How can we now assume that the single sample we took is normally distributed? Even if it has sample size $n \ge 30$, we only said $n \ge 30$ is good enough for the sampling distribution to be normally distributed. Why can we say it is also good enough for a single sample? I don't get why we can make the seamless transition from the sampling distribution to a single sample.

Here's one example to illustrate my question:

A random sample of 605 plain M&Ms contains 87 red M&Ms. Find a 95% confidence interval for the population proportion of red M&Ms.

$\hat{p} = \frac{87}{605} \approx 14.4\%$ is a point estimate for the population proportion $p$ and the standard error is $\sigma_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}} \approx 1.4\%$.

The Z-score for 95% is $Z_{\alpha / 2} = 1.96$. (<- need to assume that sample is normally distributed)

Then $E = Z_{\alpha / 2} \cdot \sigma_{\hat{p}} \approx 2.8\%$.

And therefore $\hat{p} - 0.028 \le p \le \hat{p} + 0.028$, or $0.116 \le p \le 0.172$, with 95% confidence.
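For anyone who wants to check the arithmetic, here is a minimal sketch of the same normal-approximation interval in Python, using only the standard library. The function name `proportion_ci` and its return layout are my own choices for illustration, not part of any textbook or library.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation interval for a population proportion.

    Hypothetical helper for illustration: returns the point estimate,
    standard error, critical value, margin of error and the interval.
    """
    p_hat = successes / n                                # point estimate
    se = math.sqrt(p_hat * (1 - p_hat) / n)              # estimated standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value
    margin = z * se                                      # margin of error E
    return p_hat, se, z, margin, (p_hat - margin, p_hat + margin)

p_hat, se, z, margin, (lo, hi) = proportion_ci(87, 605)
print(f"p_hat={p_hat:.4f} se={se:.4f} z={z:.3f} E={margin:.4f} CI=({lo:.4f}, {hi:.4f})")
```

Carrying the unrounded values through, the margin comes out at roughly 0.028 and the interval at roughly 0.116 to 0.172.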

Best Answer

You have a sample and use an estimator to estimate a given property, such as the mean $\hat \mu$. The value of the estimator is itself random and comes from some unknown distribution, called the sampling distribution. What your "$n \ge 30$" rule of thumb says is that this distribution can be approximated by the normal distribution if the sample contains at least 30 observations. I'm not here to discuss the validity of the rule itself.

So, we're not saying here that a "single sample is normally distributed." In fact I don't even understand what you mean when saying this. We're talking about the sampling distribution of the parameter estimator such as the average $\bar x=\frac 1 n \sum_{i=1}^nx_i$. We're not saying anything about the distribution of $x_i$, because the sample size does not have anything to do with it.
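A small simulation may make the distinction concrete. This is only a sketch under assumptions of my own: an exponential population (chosen because it is clearly skewed), samples of size 30, 5000 repeated samples, and a hand-written `skewness` helper. It compares the shape of the individual observations $x_i$ with the shape of the sample means $\bar x$.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def skewness(values):
    """Third standardized moment; 0 for a perfectly symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

n = 30            # size of each sample (assumed for illustration)
n_samples = 5000  # number of repeated samples (assumed for illustration)

# Individual observations x_i drawn from a skewed (exponential) population.
observations = [random.expovariate(1.0) for _ in range(n * n_samples)]

# One mean per block of n observations: draws from the sampling distribution of x-bar.
sample_means = [
    statistics.fmean(observations[i:i + n])
    for i in range(0, len(observations), n)
]

skew_obs = skewness(observations)
skew_means = skewness(sample_means)
print(f"skewness of x_i:   {skew_obs:.2f}")
print(f"skewness of x-bar: {skew_means:.2f}")
```

The observations stay as skewed as the population no matter how many you collect, while the means are much closer to symmetric. That is the sense in which the normal approximation applies to the estimator and not to the data.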

In your case with a proportion there's something else going on. The count of red M&Ms in the sample follows a binomial distribution, which can be approximated by a normal distribution when the sample size is large. I wouldn't apply your rule of thumb here, because it is crude compared with working from the estimator of the proportion's variance that is based on the binomial distribution.
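To see how close the two distributions are for your numbers, one could compare the exact binomial CDF with its normal approximation. The sketch below makes a loud assumption: it plugs in the point estimate $87/605$ as if it were the true proportion, since the true $p$ is unknown. It also uses a continuity correction of 0.5, which is my addition and not part of your calculation.

```python
import math
from statistics import NormalDist

n = 605
p = 87 / 605  # assumption: treat the point estimate as the true proportion

def binom_pmf(k):
    """Exact Binomial(n, p) probability of observing k red M&Ms."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Normal distribution with the same mean and variance as the binomial count.
approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

exact_cdf = 0.0
max_gap = 0.0
for k in range(n + 1):
    exact_cdf += binom_pmf(k)
    # Compare P(X <= k) with the normal CDF at k + 0.5 (continuity correction).
    max_gap = max(max_gap, abs(exact_cdf - approx.cdf(k + 0.5)))

print(f"largest gap between exact and approximate CDF: {max_gap:.4f}")
```

A small gap supports using $z$-based critical values for this sample size. With a much smaller $n$, or with $p$ close to 0 or 1, the same check would show a larger gap.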
