[Math] Why is there a square root in the central limit theorem?

Tags: central limit theorem, probability theory

I'm having trouble understanding the derivation of the central limit theorem. From my understanding it says

$$\DeclareMathOperator{\Var}{\mathrm{Var}}
\overline{X_n} \rightarrow N\left(EX_1, \frac{\Var(X_1)}{n}\right)$$

My understanding of this is that since, as $n \rightarrow \infty$, the sample mean $\overline{X_n} \rightarrow EX_1$, the above basically forces the variance of the normal distribution to zero, so the distribution is just 100% probability at $EX_1$.

Now another form I've seen is

$$\sqrt{n}\frac{(\overline{X_n} - EX_1)}{\sqrt{\Var(X_1)}} = \frac{(\overline{X_n} - EX_1)}{\sqrt{\Var(\overline{X_n})}} \rightarrow N(0,1)$$

Now I understand that subtracting $EX_1$ changes $\mu$ to $0$ in the normal distribution, but I don't understand how we got the $\sqrt{\Var(\overline{X_n})}$. In the first form, as $n \rightarrow \infty$, we get something completely different from $N(0, 1)$; in the second form, a square root somehow appears in the denominator. Wouldn't just dividing by $\Var(X_1)$ give a variance of 1?

Sorry if this is confusing, but I've found so many sources that all say something a little bit different, or in a different way, and almost none of them give any kind of explanation.

Best Answer

The goal is to get a standard normal random variable $Z$, which has expectation (ensemble mean) $0$ and standard deviation $1$, by definition of $N(0,1)$.

Starting from the sample mean of $n$ samples, $$\overline{X} = (X_1 + X_2 + \ldots + X_n)/n,$$ to standardize it we must:

a) first subtract the ensemble mean of this sample mean (called centering), which is $E(\overline{X})$. (By the law of large numbers, $\overline{X}$ itself approaches $E(X)$ as $n \rightarrow \infty$; in fact $E(\overline{X}) = E(X)$ for every $n$.) This indeed gives zero: $E[\overline{X}-E(\overline{X})]=E(\overline{X}) - E[E(\overline{X})] = E(\overline{X}) - E(\overline{X}) = 0$. So that is the numerator sorted out;
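The centering step can be checked numerically. A minimal simulation sketch (my own illustration, not part of the answer; it assumes NumPy, and the exponential population with $E(X) = 1$ is an arbitrary choice):

```python
import numpy as np

# Centering check: the centered sample mean has expectation zero.
# Population: exponential with scale 1, so E[X] = E[Xbar] = 1.
rng = np.random.default_rng(0)
n, reps = 50, 200_000
sample_means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
centered = sample_means - 1.0        # subtract E(Xbar) = E(X) = 1
print(round(centered.mean(), 3))     # close to 0, as derived above
```

Averaging over many repetitions estimates the ensemble mean $E[\overline{X} - E(\overline{X})]$, which the derivation says is exactly zero.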

b) then divide this difference by the standard deviation, $s$, of this difference $\overline{X}-E(\overline{X})$, not by anything else, by definition of standardization. Expressed as a formula, $$ Z = \frac{\overline{X} - \mu_{\overline{X}}}{s_{\overline{X}-\mu_{\overline{X}}}}$$ where $\mu_{\overline{X}} \equiv E(\overline{X})$. Now, $E(\overline{X})$ is a fixed number and hence has zero variance, so that $s_{\overline{X}-\mu_{\overline{X}}} = s_{\overline{X}}$. That leaves the standard deviation of $\overline{X}$ to be calculated, as the square root of its variance. For $n$ independent samples of $X$, the variances add up, so that $$ \Var(\overline{X}) = \Var[(X_1+X_2+\ldots+X_n)/n] = \frac{1}{n^2} \Var(X_1+\ldots+X_n) = \frac{1}{n^2} [\Var(X_1)+\Var(X_2)+\ldots+\Var(X_n)] = \frac{1}{n^2}\, n \Var(X).$$ The penultimate equality holds because all samples are independent and come from the same population $X$, so each draw has the same variance and there is no dependence (correlation) between draws. Therefore, the standard deviation of $\overline{X}$ (the denominator for $Z$) is $$s_{\overline{X}} = \sqrt{\Var(X)/n} = s_X/\sqrt{n}.$$
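The $s_X/\sqrt{n}$ scaling is also easy to verify by simulation. A quick sketch (my own illustration, assuming NumPy; the exponential population with $\Var(X) = 1$ is an arbitrary choice):

```python
import numpy as np

# For an exponential population with E[X] = 1 and var(X) = 1:
# the standard deviation of the sample mean shrinks like s_X/sqrt(n),
# and the standardized mean Z is approximately N(0, 1).
rng = np.random.default_rng(1)
n, reps = 100, 100_000
xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

sd_xbar = xbar.std()                   # empirical sd of the sample mean
print(round(sd_xbar * np.sqrt(n), 2))  # ~1.0 = s_X, confirming s_X/sqrt(n)

# Standardize: subtract E(X), divide by s_X/sqrt(n)
z = (xbar - 1.0) / (1.0 / np.sqrt(n))
print(round(z.mean(), 2), round(z.std(), 2))  # ~0.0 and ~1.0
```

Dividing by $\Var(X_1)$ instead of its square root, as the question suggests, would rescale $Z$ by a factor $\sqrt{\Var(X_1)}$ and not produce unit variance in general.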

NB: your notation is a bit imprecise, in my opinion: one writes $X$ for the universal random variable, then $X_1$, $X_2$, etc. for the random variables (before performing any draw) of the first draw, second draw, etc. So in your formulas, perhaps you should replace all $X_n$ and $X_1$ simply by $X$ to avoid confusion.