Central Limit Theorem and Law of Large Numbers for Non-Constant “N”

central limit theorem, law-of-large-numbers, probability, statistics

This is a question I have had for a while.

Usually, we define the Central Limit Theorem as:

Let $X_1, X_2, \dots, X_n$ be a random sample of size $n$ (i.e., independent and identically distributed draws) from a population with mean $\mu$ and finite variance $\sigma^2$. As $n$ approaches infinity, the standardized sample mean converges in distribution to a standard normal, so that for large $n$ the sample mean $\overline{X}$ is approximately normal with mean $\mu$ and variance $\frac{\sigma^2}{n}$, i.e.,

$$\sqrt{n} \left( \frac{\overline{X} - \mu}{\sigma} \right) \xrightarrow{d} N(0,1)$$

Similarly, we can also define the Law of Large Numbers as:

Let $X_1, X_2, \dots, X_n$ be a sequence of independent and identically distributed random variables with finite mean $\mu$. The Law of Large Numbers states that as the sample size $n$ increases, the sample mean $\overline{X}$ converges in probability to the population mean $\mu$, i.e.,

$$\overline{X} = \frac{1}{n} \sum_{i=1}^n X_i \xrightarrow{p} \mu$$
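As a quick numerical check of both statements, here is a minimal Python sketch. The exponential population with mean 2, the sample sizes, and the seed are arbitrary choices for illustration and are not part of the question.

```python
import math
import random
import statistics

rng = random.Random(42)
mu, sigma = 2.0, 2.0  # an exponential distribution with mean 2 also has sd 2


def sample_mean(n):
    """Mean of n iid exponential draws with population mean mu."""
    return statistics.fmean(rng.expovariate(1.0 / mu) for _ in range(n))


# LLN: a single sample mean for each of several increasing sample sizes
lln = {n: sample_mean(n) for n in (10, 1_000, 100_000)}

# CLT: standardized sample means over many repeated samples of size n
n = 400
z = [math.sqrt(n) * (sample_mean(n) - mu) / sigma for _ in range(3000)]

# Under N(0,1), about 95% of the standardized values fall in (-1.96, 1.96)
coverage = sum(abs(v) < 1.96 for v in z) / len(z)
print(lln, coverage)
```

If the two statements hold, the entries of `lln` should approach 2 as the sample size grows, and `coverage` should be close to 0.95.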

Given this definition, I thought of the following example:

  • Suppose there are fish in a river
  • Naturally, fish can enter and exit the river (e.g. birth, death, migration)
  • We are interested in estimating the average mercury level of the fish in the river
  • Suppose we take a random sample of fish from this river and measure the mercury level of each fish in the sample

In most introductory mathematics textbooks, we would use both the Central Limit Theorem and the Law of Large Numbers to argue that as we measure the mercury level of more and more fish, our sample average will reflect the true average mercury level of the population better and better. However, there seems to be an implicit assumption that the underlying fish population of the river is constant.

This brings me to my question: are there any variations of the Central Limit Theorem and Law of Large Numbers that can be applied when the population size is not constant? Or is this actually irrelevant (i.e., are the results of both theorems still valid when the population size is non-constant, provided that the sample size is large enough)?

Thanks!

Best Answer

You are correct that the classical CLT assumes iid random variates with finite mean and variance. The typical SLLN also requires iid, or at least each variable needs to have constant mean and finite variance.

There are generalized versions of each that allow you to deal with certain types of "nonstationary" populations.

For both, we will use the following sequence of independent random variables:

$$E[X_i]=\mu_i, V[X_i]=\sigma_i^2,\;\;i\in \mathbb{N}$$

And define the following quantities:

$$S_n = \sum_{i=1}^n X_i,\;m_n := \sum_{i=1}^n \mu_i,\;s^2_n := \sum_{i=1}^n \sigma_i^2,\;\;s_n = \sqrt{s_n^2} $$

Lindeberg-Feller CLT

There is a CLT where your variables only need to be independent but not identically distributed. Then, if the following condition is met:

Lindeberg Condition

Let $$L_{\epsilon}(n) := \frac{1}{s_n^2}\sum_{i=1}^n E[(X_i-\mu_i)^2\cdot 1_{|X_i-\mu_i|>\epsilon s_n}]$$

Then the condition is $$\forall \epsilon > 0\;\; \lim_{n\to \infty} L_{\epsilon}(n) = 0$$

We can say that:

$$Z_n := \frac{S_n - m_n}{s_n} \xrightarrow{d} N(0,1)$$

Roughly, the Lindeberg condition says that no single $X_i$ dominates: the share of the total variance $s_n^2$ contributed by any one term (and by large deviations of any term) becomes negligible as you add more terms.
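One easy case to check by hand: if the variables are uniformly bounded, say $|X_i-\mu_i|\le M$, and $s_n\to\infty$, then for large $n$ we have $\epsilon s_n > M$, every indicator is zero, and the Lindeberg condition holds. The sketch below simulates such a case in Python. The uniform distributions and their cycling widths are hypothetical choices made only to get independent, non-identically distributed variables.

```python
import math
import random
import statistics


def standardized_sum(n, rng):
    """Draw one Z_n = (S_n - m_n) / s_n for independent X_i ~ Uniform(0, b_i)."""
    s, m, var = 0.0, 0.0, 0.0
    for i in range(1, n + 1):
        b = 1.0 + (i % 3)      # hypothetical widths, cycling through 2, 3, 1
        s += rng.uniform(0.0, b)
        m += b / 2.0           # mu_i for Uniform(0, b)
        var += b * b / 12.0    # sigma_i^2 for Uniform(0, b)
    return (s - m) / math.sqrt(var)


rng = random.Random(0)
draws = [standardized_sum(200, rng) for _ in range(5000)]

mean_z = statistics.fmean(draws)
var_z = statistics.pvariance(draws)
# Under N(0,1), about 95% of draws fall in (-1.96, 1.96)
coverage = sum(abs(z) < 1.96 for z in draws) / len(draws)
print(mean_z, var_z, coverage)
```

If the normal approximation is good, `mean_z` should be near 0, `var_z` near 1, and `coverage` near 0.95, even though the $X_i$ do not share a common distribution.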

General SLLN

A similar style theorem holds for almost sure convergence of the mean:

If

$$\sum_{i=1}^{\infty} \frac{\sigma_i^2}{i^2} < \infty$$

Then we have (more or less -- see link)

$$P\left(\lim_{n\to\infty}\frac{|S_n-m_n|}{n} = 0\right) = 1$$
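Note that this says the sample mean $S_n/n$ tracks the running average of the means $m_n/n$, not a single fixed $\mu$. A minimal Python sketch of one sample path is below. The drifting mean (loosely, a mercury level that changes as the population changes) and the slowly growing standard deviation $\sigma_i = i^{1/4}$ are hypothetical choices. With them, $\sigma_i^2/i^2 = i^{-3/2}$, which is summable, so the variance condition is satisfied.

```python
import math
import random


def centered_mean_path(n_max, checkpoints, rng):
    """Return {n: |S_n - m_n| / n} at the requested checkpoints."""
    s, m, out = 0.0, 0.0, {}
    for i in range(1, n_max + 1):
        mu_i = 1.0 + 0.5 * math.sin(i / 1000.0)  # hypothetical drifting mean
        sigma_i = i ** 0.25                       # hypothetical growing sd
        s += rng.gauss(mu_i, sigma_i)             # independent draw of X_i
        m += mu_i
        if i in checkpoints:
            out[i] = abs(s - m) / i
    return out


rng = random.Random(1)
checkpoints = {100, 1_000, 10_000, 100_000}
path = centered_mean_path(100_000, checkpoints, rng)
for n in sorted(path):
    print(n, round(path[n], 4))
```

Along a typical path the printed values shrink toward 0 as $n$ grows, although slowly here because the variances keep increasing.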
