[Math] Central Limit Theorem when sample size is infinity

Tags: central-limit-theorem, law-of-large-numbers

I am studying the Law of Large Numbers, the Central Limit Theorem, etc., and one thought is still bugging me.

According to the Law of Large Numbers, when we take a sample from our distribution X whose size tends to infinity, the sample mean (1/n * Sum(X_i)) is the same as the expected value (Sum(k*P[X=k])).

Then, according to the CLT, the sample mean of a sample of size at least 30 from our distribution X behaves approximately like a normal distribution and has the same expected value as the original distribution (i.e., its mean).

But doesn't that mean that if we took a sample mean with size tending to infinity, we would always get the expected value? Imagine the graph of the distribution of sample means of size 50 – it should look like a normal distribution.

But if we took sample means with size tending to infinity, it wouldn't look like a normal distribution, would it? There should be one spike at the expected value and then nothing. Or are the spike and its close surroundings shaped like some extreme case of a normal distribution?

Best Answer

As I said in the comment, here are the two results:

(Weak) Law of Large Numbers

Let $X_{1},X_{2},\dots,X_{n}$ be i.i.d. random variables with $\mathbb{E}[X_{i}]=\mu$ and $\text{Var }[X_{i}]=\sigma^{2}<\infty$ for all $i=1,\dots,n$. Then, the so-called sample mean $\tfrac{1}{n}\sum_{i=1}^{n}X_{i}=:\bar{X}_{n}$ converges in probability to $\mu$, i.e.:

$$\forall\,\epsilon>0:\hspace{1em}\lim_{n\to\infty}\mathbb{P}\left(\left\vert\frac{1}{n}\sum_{i=1}^{n}X_{i}-\mu\right\vert\geq\epsilon\right)=0$$
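To get a concrete feel for this statement, here is a minimal Monte Carlo sketch (my own illustration, not part of the original answer; the Exponential(1) distribution, $\epsilon=0.1$, and the trial counts are assumptions made only for the example). It estimates $\mathbb{P}\left(\left\vert\bar{X}_{n}-\mu\right\vert\geq\epsilon\right)$ for growing $n$ and watches it shrink toward $0$:

```python
# Monte Carlo sketch of the WLLN (illustrative assumptions: Exp(1) data, eps = 0.1).
import numpy as np

rng = np.random.default_rng(0)
mu, eps, n_trials = 1.0, 0.1, 10_000   # Exp(1) has mean mu = 1

for n in (10, 100, 1_000):
    # n_trials independent sample means, each computed from n i.i.d. Exp(1) draws
    sample_means = rng.exponential(scale=1.0, size=(n_trials, n)).mean(axis=1)
    prob = np.mean(np.abs(sample_means - mu) >= eps)
    print(f"n = {n:>5}:  P(|sample mean - mu| >= {eps}) ~ {prob:.4f}")
```

The printed probabilities should decrease toward $0$ as $n$ grows, which is exactly what the limit above asserts.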

Central Limit Theorem

Let $X_{1},X_{2},\dots,X_{n}$ be i.i.d. random variables with $\mathbb{E}[X_{i}]=\mu$ and $\text{Var }[X_{i}]=\sigma^{2}<\infty$ for all $i=1,\dots,n$. Then, the so-called standardized sample mean, i.e.

$$\frac{\frac{1}{n}\sum_{i=1}^{n}X_{i}-\mu}{\sqrt{\text{Var }\left[\frac{1}{n}\sum_{i=1}^{n}X_{i}\right]}}$$

converges in distribution (or in law) to a random variable $Z\sim\mathcal{N}(0,1)$. Denote $\Phi(z):=\mathbb{P}[Z\le z]$ the CDF of $Z$. The CLT means that the following holds:

$$\forall\,x\in\mathbb{R}:\hspace{1em}\lim_{n\to\infty}\mathbb{P}\left(\frac{\frac{1}{n}\sum_{i=1}^{n}X_{i}-\mu}{\sqrt{\text{Var }\left[\frac{1}{n}\sum_{i=1}^{n}X_{i}\right]}}\le x\right)=\Phi(x)$$
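As a quick numerical illustration of this convergence (again a sketch of my own; the Uniform(0,1) distribution and the sample size $n=50$ are assumptions chosen only for the example), one can compare the empirical CDF of the standardized sample mean with $\Phi$:

```python
# Sketch: empirical CDF of the standardized sample mean vs. the standard normal CDF.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, n_trials = 50, 50_000
mu, sigma = 0.5, np.sqrt(1 / 12)         # Uniform(0,1): mean 1/2, variance 1/12

means = rng.uniform(0.0, 1.0, size=(n_trials, n)).mean(axis=1)
z = (means - mu) / (sigma / np.sqrt(n))  # standardized sample means

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    empirical = np.mean(z <= x)          # empirical CDF of the standardized mean at x
    print(f"x = {x:+.1f}:  F_n(x) ~ {empirical:.4f}   Phi(x) = {norm.cdf(x):.4f}")
```

Already at $n=50$ the empirical values should sit close to $\Phi(x)$ at every point checked.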

Indeed, saying that "a sequence of random variables $Y_{1},\dots,Y_{n}$ with respective CDFs $F_{1},\dots,F_{n}$ converges in distribution to some random variable with CDF $F$" means that the sequence of CDFs $\{F_{n}\}_{n}$ converges pointwise to $F$ at every point where $F$ is continuous, i.e.

$$\forall x\in\mathbb{R}\text{ at which }F\text{ is continuous}:\hspace{1em} \lim_{n\to\infty} F_{n}(x) = F(x)$$ (Since $\Phi$ is continuous on all of $\mathbb{R}$, this restriction plays no role for the CLT.) Be aware that, in the case of the CLT, $Y_{k}$ is the standardized sample mean, $Y_{k}=\frac{\bar{X}_{k}-\mu}{\sigma/\sqrt{k}}$.

The big difference is that the CLT takes into account the dispersion of the sample mean, which depends on the size of the sample: in some sense, it rescales the data by dividing them by the standard deviation of the sample mean. Intuitively:

$$\bar{X}_{n}-\mu\overset{P}{\longrightarrow} 0 \tag{LLN}$$

but, taking the scaling/the size into account:

$$\color{red}{\sqrt{n}}(\bar{X}_{n}-\mu)\overset{\mathcal{D}}{\longrightarrow}\mathcal{N}(0,\sigma^{2})\tag{CLT}$$
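A small simulation makes this contrast visible (an illustration under assumed Exponential(1) data, for which $\mu=\sigma=1$): the unscaled error $\bar{X}_{n}-\mu$ collapses toward $0$, while the rescaled error $\sqrt{n}(\bar{X}_{n}-\mu)$ keeps a roughly constant spread of about $\sigma$:

```python
# Sketch: LLN view (error shrinks) vs. CLT view (rescaled error keeps a stable spread).
import numpy as np

rng = np.random.default_rng(2)
n_trials = 10_000
mu = sigma = 1.0                          # Exp(1): mean 1, standard deviation 1

for n in (10, 100, 1_000):
    means = rng.exponential(1.0, size=(n_trials, n)).mean(axis=1)
    raw = means - mu                      # LLN view: spread shrinks toward 0
    scaled = np.sqrt(n) * raw             # CLT view: spread stays close to sigma
    print(f"n = {n:>5}:  std of (mean - mu) ~ {raw.std():.4f},  "
          f"std of sqrt(n)*(mean - mu) ~ {scaled.std():.4f}")
```

This is exactly the question asked above: without rescaling, the distribution of sample means does degenerate into a spike at $\mu$; the CLT describes the shape of that spike once it is blown up by the factor $\sqrt{n}$.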

How can we interpret these results? The CLT says that the sample mean behaves approximately like a normal distribution, i.e. it gives, in some sense, a distribution of the error made when using the LLN for a sample of size $n$. The LLN says that if you run enough experiments, the probability that the sample mean is not close to the theoretical mean (close means the difference between them is less than $\epsilon$) is close to $0$.

See also this question.

Informally

For example, suppose that you have repeated the same experiment $n$ times, independently. You have $X_{1},\dots,X_{n}$ as the results. Thanks to the LLN, you know that $\bar{X}_{n}$ is close to $\mu$ with a "good" probability if $n$ is very big. This is supported by the variance, which decreases as $n$ grows:

$$\text{Var}[\bar{X}_{n}]=\frac{1}{n^{2}}\text{Var}\left[\sum_{i=1}^{n}X_{i}\right]=\frac{1}{n^{2}}\left(n\text{Var}[X_{1}]\right)=\frac{1}{n}\sigma^{2}$$
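If helpful, here is a one-off numerical check of this identity (again a sketch with assumed Uniform(0,1) data, for which $\sigma^{2}=1/12$; the sizes are arbitrary):

```python
# Sketch: check Var[sample mean] = sigma^2 / n by simulation.
import numpy as np

rng = np.random.default_rng(3)
n, n_trials = 200, 50_000
sigma2 = 1 / 12                           # variance of Uniform(0,1)

means = rng.uniform(0.0, 1.0, size=(n_trials, n)).mean(axis=1)
print(f"empirical Var[sample mean] ~ {means.var():.6f}")
print(f"sigma^2 / n                = {sigma2 / n:.6f}")
```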

But now, suppose you want to know the probability that your empirical mean is close to $\mu$:

$$\mathbb{P}\left(\left\vert\bar{X}_{n}-\mu\right\vert\le\epsilon\right)=\mathbb{P}\left(-\epsilon\le\bar{X}_{n}-\mu\le\epsilon\right)\tag{1}$$

In order to apply the CLT, note that:

$$\frac{\frac{1}{n}\sum_{i=1}^{n}X_{i}-\mu}{\sqrt{\text{Var }\left[\frac{1}{n}\sum_{i=1}^{n}X_{i}\right]}}=\frac{\bar{X}_{n}-\mu}{\sqrt{\frac{\sigma^{2}}{n}}}=\frac{\bar{X}_{n}-\mu}{\frac{\sigma}{\sqrt{n}}}$$

Hence, we can rewrite $(1)$ as follows:

\begin{align*} \mathbb{P}\left(\left\vert\bar{X}_{n}-\mu\right\vert\le\epsilon\right) &=\mathbb{P}\left(\left\vert\frac{\bar{X}_{n}-\mu}{\frac{\sigma}{\sqrt{n}}}\right\vert\le\frac{\epsilon}{\frac{\sigma}{\sqrt{n}}}\right)\\ &=\mathbb{P}\left(-\frac{\epsilon}{\frac{\sigma}{\sqrt{n}}}\le\frac{\bar{X}_{n}-\mu}{\frac{\sigma}{\sqrt{n}}}\le\frac{\epsilon}{\frac{\sigma}{\sqrt{n}}}\right)\\ &=\mathbb{P}\left(-\frac{\epsilon\sqrt{n}}{\sigma}\le\frac{\bar{X}_{n}-\mu}{\frac{\sigma}{\sqrt{n}}}\le\frac{\epsilon\sqrt{n}}{\sigma}\right)\\ &\approx \mathbb{P}\left(-\frac{\epsilon\sqrt{n}}{\sigma}\le Z\le\frac{\epsilon\sqrt{n}}{\sigma}\right)\tag{where $Z\sim\mathcal{N}(0,1)$} \end{align*}

This approximation is exactly where the CLT is used. Now, using the fact that the density of the normal distribution is symmetric about the origin, we get:

\begin{align*} \mathbb{P}\left(\left\vert\bar{X}_{n}-\mu\right\vert\le\epsilon\right) &\approx \mathbb{P}\left(-\frac{\epsilon\sqrt{n}}{\sigma}\le Z\le\frac{\epsilon\sqrt{n}}{\sigma}\right)\\ &=2\mathbb{P}\left(0\le Z\le\frac{\epsilon\sqrt{n}}{\sigma}\right)\\ &=2\Phi\left(\frac{\epsilon\sqrt{n}}{\sigma}\right)-1 \end{align*}

Intuitively, if you now let $n$ go to infinity, $\Phi\left(\frac{\epsilon\sqrt{n}}{\sigma}\right)$ goes to $1$, since any CDF $F$ satisfies $\lim_{x\to\infty}F(x)=1$, which gives you

$$\mathbb{P}\left(\left\vert\bar{X}_{n}-\mu\right\vert\le\epsilon\right)\approx 1$$

when $n$ is big, and it holds for any $\epsilon >0$. Be aware that this "shows" the intuitive picture: for fixed $n$, the greater the dispersion $\sigma$, the smaller the probability that the sample mean is close to $\mu$; also, for fixed $n$, the smaller $\epsilon$ is, the smaller this probability.
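To see how good the approximation $2\Phi\left(\frac{\epsilon\sqrt{n}}{\sigma}\right)-1$ is, and how it tends to $1$ as $n$ grows, here is a short sketch (the Exponential(1) distribution, $\epsilon=0.1$, and the trial count are assumptions made for illustration) comparing it with a Monte Carlo estimate of $\mathbb{P}\left(\left\vert\bar{X}_{n}-\mu\right\vert\le\epsilon\right)$:

```python
# Sketch: Monte Carlo estimate of P(|sample mean - mu| <= eps) vs. the CLT approximation.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
mu, sigma, eps, n_trials = 1.0, 1.0, 0.1, 10_000   # Exp(1): mu = sigma = 1

for n in (25, 100, 400, 1_600):
    means = rng.exponential(1.0, size=(n_trials, n)).mean(axis=1)
    monte_carlo = np.mean(np.abs(means - mu) <= eps)
    clt_approx = 2 * norm.cdf(eps * np.sqrt(n) / sigma) - 1
    print(f"n = {n:>5}:  Monte Carlo ~ {monte_carlo:.4f}   "
          f"2*Phi(eps*sqrt(n)/sigma) - 1 = {clt_approx:.4f}")
```

Both columns should agree closely and climb toward $1$ as $n$ increases, matching the remarks above.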

Though it is an informal argument, it gives an insight into the link between the CLT and the LLN.
