Intuition: Why do we divide by $\sqrt{n}$ instead of $n$ in the Central Limit Theorem

Tags: central limit theorem, probability theory, statistics

The law of large numbers, for example, is very straightforward to understand. You sum up the random variables and divide them by their number. You take the almost sure limit of the arithmetic mean and end up with the expectation. What's not to love?
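In symbols, for an i.i.d. sequence $(X_n)_{n \in \mathbb{N}}$ with $E[|X_1|] < \infty$, the strong law says

$$ \frac{1}{n} \sum_{k = 1}^{n} X_k \overset{a.s.}{\longrightarrow} E[X_1].$$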

Now, for the Central Limit Theorem (CLT) we consider convergence in distribution, so I realize the logic is different. Let me state a simple version for reference:

Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of i.i.d. random variables with $E[X_1]=0$ and $\text{Var}(X_1) = \sigma^2$, where $0 < \sigma < \infty$. Then we have that

$$ G_n := \frac{1}{\sqrt{n \sigma^2}} \sum_{k = 1}^{n} X_k \overset{d}{\longrightarrow} G$$

where $G \sim N(0,1)$.

I have seen more than one proof of the CLT, but they are quite technical. For a long time I brushed this issue aside, but it keeps reappearing in other theorems I encounter (e.g. Donsker's theorem for partial sum processes), so by now I really want to understand. Is there an intuition as to where the square root comes from?

Best Answer

You want $G_n$ to be a normalized random variable, i.e. $E(G_n) = 0$ and $\text{Var}(G_n) = 1$. This is done by translating $\sum_{i = 1}^n X_i$ by $n E(X_i)$ and by dividing by $\sqrt{n \sigma^2}$, since variance is quadratic: $\text{Var}(aY) = a^2 \text{Var}(Y)$, so dividing the sum by $\sqrt{n \sigma^2}$ divides its variance by $n \sigma^2$.

So what you get is

$$ E(G_n) = E\left(\frac{\sum_{i = 1}^n X_i - n E(X_i)}{\sqrt{n \sigma^2}}\right) = \frac{n E(X_i) - n E(X_i)}{\sqrt{n \sigma^2}} = 0$$

$$ \text{Var}(G_n) = \text{Var}\left(\frac{\sum_{i = 1}^n X_i - n E(X_i)}{\sqrt{n \sigma^2}}\right) = \frac{n \text{Var}(X_i)}{n \sigma^2} = 1$$

(the constant shift $n E(X_i)$ does not change the variance).
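One can also check this numerically. Below is a minimal simulation sketch (assuming NumPy; the shifted exponential increments are just an arbitrary example with $E[X_i] = 0$ and $\sigma^2 = 1$):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_G_n(n, num_samples=20_000):
    """Draw realizations of G_n = (sum_{i<=n} X_i - n E[X_i]) / sqrt(n sigma^2)."""
    # Example increments: Exp(1) shifted by its mean 1, so E[X_i] = 0 and sigma^2 = 1.
    x = rng.exponential(scale=1.0, size=(num_samples, n)) - 1.0
    return x.sum(axis=1) / np.sqrt(n * 1.0)

for n in (1, 10, 100, 1000):
    g = simulate_G_n(n)
    # E(G_n) and Var(G_n) stay near 0 and 1 for every n; the third central moment
    # (a proxy for skewness) shrinks, i.e. only the *shape* of the law changes.
    print(f"n={n:4d}: mean={g.mean():+.3f}  var={g.var():.3f}  "
          f"3rd moment={np.mean((g - g.mean())**3):+.3f}")
```

The mean and variance are pinned at $0$ and $1$ for every $n$; what moves is the shape of the distribution, which drifts from the skewed exponential towards the Gaussian bell. That drift is exactly the convergence in distribution the CLT describes.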

If you did not normalize at all, you could not expect any kind of convergence: the variance of $\sum_{i = 1}^n X_i$ grows like $n \sigma^2$. In the case of the law of large numbers, the variances go to $0$ (which is why you expect it to converge to a constant):

$$ \text{Var}\left(\frac{\sum_{i = 1}^n X_i}{n}\right) = \frac{n \text{Var}(X_i)}{n^2} \rightarrow 0$$

More generally, if you want to look for interesting limit distributions, the question of "how much" one needs to normalize is very important. If you normalize "too little" the sequence will not converge. If you normalize "too much" (as in the law of large numbers) the sequence will converge to something constant (which can still give meaningful information). If you normalize "just right", as in the CLT, you get a meaningful distribution.
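A quick numerical illustration of this trichotomy (assuming i.i.d. increments with mean $0$ and $\sigma^2 = 1$, and the scaling $\sum_{i=1}^n X_i / n^{\alpha}$):

```python
# Var(S_n / n**alpha) for i.i.d. increments with mean 0 and variance 1:
# Var(S_n) = n, so Var(S_n / n**alpha) = n / n**(2*alpha) = n**(1 - 2*alpha).
for alpha in (0.25, 0.5, 1.0):  # "too little", "just right", "too much"
    variances = [n ** (1 - 2 * alpha) for n in (10, 100, 10_000, 1_000_000)]
    print(f"alpha = {alpha}:", [f"{v:.4g}" for v in variances])
```

For $\alpha < 1/2$ the variance blows up, for $\alpha = 1/2$ it stays at $1$, and for $\alpha > 1/2$ it vanishes, which is exactly the "too little / just right / too much" picture above.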

There is more than one meaningful normalization factor, though. For example, the normalization factor $\frac{1}{\sqrt{2n \ln(\ln(n))}}$ gives the law of the iterated logarithm, from which one can also extract information.
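For reference, in the same setting (i.i.d., mean $0$, variance $\sigma^2$) the law of the iterated logarithm states that

$$ \limsup_{n \to \infty} \frac{1}{\sqrt{2n \ln(\ln(n))}} \sum_{k = 1}^{n} X_k = \sigma \quad \text{almost surely,}$$

so this normalization sits strictly between the $\frac{1}{n}$ of the law of large numbers and the $\frac{1}{\sqrt{n}}$ of the CLT.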