[Math] What the central limit theorem says

probability, probability distributions, probability theory, statistics

Asked what the central limit theorem says, a student replies, "as you take larger
and larger samples from a population, the histogram of the sample values looks more and
more Normal". Is the student right? Explain your answer.

My answer is no, the student is wrong. My explanation is that the histogram of the sample values will look more and more like the population distribution, whatever that happens to be. The central limit theorem says instead that the histogram of sample means (from many large samples) will look more and more Normal.

Am I right? Is it that simple? Is there anything more I can say about this?

Best Answer

Maybe this helps:

Take independent copies $X_1, X_2, \dots$ of an appropriate random variable (finite second moment) and write $S_n = X_1 + \dots + X_n$. Then $\frac{S_n}{n}$ is the empirical mean of the random variable and $\mu$ is the theoretical mean. In this setting, $$\displaystyle \frac{S_n}{n} - \mu$$ is the deviation of the empirical mean from the theoretical one.

What the CLT says is: with an appropriate scaling, the deviations are asymptotically normally distributed, i.e., writing $\sigma^2$ for the variance of the random variable, as $n \to \infty$,

$$\mathbb P\left( \frac{\sqrt{n}}{\sigma}\left(\frac{S_n}{n} - \mu\right) \leq x \right) \to \phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x \exp\left(-\frac{y^2}{2}\right) \, dy.$$
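If it helps to see the limit numerically, here is a rough simulation sketch. It assumes an Exponential(1) population (so $\mu = 1$ and $\sigma = 1$), which is my choice for illustration and not part of the question; the function names are hypothetical as well. It estimates the probability on the left-hand side by Monte Carlo and compares it with the standard normal CDF on the right-hand side.

```python
import math
import random


def standardized_mean(n, rng):
    """Draw n Exponential(1) values and return sqrt(n)/sigma * (S_n/n - mu).

    Assumption for this sketch: the population is Exponential(1),
    so mu = 1 and sigma = 1.
    """
    mu, sigma = 1.0, 1.0
    s_n = sum(rng.expovariate(1.0) for _ in range(n))
    return math.sqrt(n) / sigma * (s_n / n - mu)


def empirical_cdf_at(x, n, trials, seed=0):
    """Monte Carlo estimate of P(sqrt(n)/sigma * (S_n/n - mu) <= x)."""
    rng = random.Random(seed)
    hits = sum(standardized_mean(n, rng) <= x for _ in range(trials))
    return hits / trials


def std_normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


x = 1.0
for n in (5, 100):
    estimate = empirical_cdf_at(x, n, trials=4000)
    print(f"n={n}: simulated {estimate:.3f}, normal CDF {std_normal_cdf(x):.3f}")
```

The simulated value is only an estimate, so it will wobble with the seed and the number of trials; the point is just that it should sit close to the normal CDF once $n$ is reasonably large.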

Quoting Frank den Hollander (Large Deviations, AMS): "CLT quantifies the probability that $S_n$ differs from $\mu n$ by an amount of order $\sqrt{n}$. Deviations of this size are called "normal". [...] [Deviations of size $n$] are called "large"."

An informal way to state the result above is that, for large $n$, $$\frac{S_n}{n} \approx \mathcal N\left(\mu, \frac{\sigma^2}{n}\right)$$ in distribution, so I would say that you are right.
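To make the difference between the student's statement and yours concrete, one could compare the shape of a single large sample with the shape of many sample means. The sketch below again assumes an Exponential(1) population (my assumption, chosen because it is clearly not Normal; its skewness is 2) and uses skewness as a crude stand-in for "how Normal the histogram looks", since a Normal distribution has skewness 0. All names are hypothetical.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: the mean of the cubed standardized values."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)


def raw_values_skewness(sample_size, seed=0):
    """Skewness of ONE large sample of raw values (the student's histogram)."""
    rng = random.Random(seed)
    sample = [rng.expovariate(1.0) for _ in range(sample_size)]
    return skewness(sample)


def sample_means_skewness(n, num_samples, seed=0):
    """Skewness of the means of MANY samples of size n (the CLT's histogram)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]
    return skewness(means)


print("raw values:  ", round(raw_values_skewness(20000), 2))
print("sample means:", round(sample_means_skewness(100, 3000), 2))
```

The raw values should keep a skewness near the population's value of 2 no matter how large the sample is, while the skewness of the sample means should be much closer to 0 (in theory about $2/\sqrt{n}$), which is the distinction drawn in the question.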
