[Math] How does the Central Limit Theorem apply to Finite Populations

descriptive statistics, statistics

In my statistics for beginners course we've just been introduced to the CLT, where it's been stated that a distribution of sample means tends to the normal dist. as your sample size $n$ increases to infinity.

But what if your population is finite (i.e. of size $N$), so that your max sample size can only be of size $N \ll \infty$? Will such a distribution (which must be that of nearly all practical statistical surveys) not follow the CLT?


My best attempt so far goes like this: Suppose I take random samples from my population of size $N$, each containing just a single member of the population, and plot the 'mean' of each sample (which is just that single value). If I keep going until I've sampled and plotted every member exactly once, I will of course have replicated the population distribution exactly.

Suppose then I repeat the experiment, increasing my sample size with each repetition, until my sample is of size $N$. I take a single sample and plot its mean, which by definition is the same as the population mean $\mu$.

So here, as my sample size has increased, my distribution of sample means hasn't tended to the Normal – with an ever thinner distribution with flatter tails and a taller peak – but more like a hyper-idealised version of the Normal – a single value at the population mean.

Clearly then, for finite populations (if I've understood the idea behind the CLT correctly, which is a big if), the CLT does not apply; rather, in these practical cases, the sample mean distribution approaches something approximately Normal? Is it the case then that the CLT is more a theoretical concept that applies to infinitely large populations, from which sample sizes can tend to infinity?


Further to this, I've read for the CLT to apply, the random variables of your population have to be I.I.D – if I'm using SRS without replacement for a finite population, does that mean the variables aren't I.I.D anymore, and thus the CLT would also not apply because of this? If the population were infinite though and I used SRSWOR, would the r.v.'s then be I.I.D, thus meaning the CLT would apply?

I appreciate all your insight on this; I'm very new to statistics, so I apologise if a lot of this is pretty basic and if my thoughts were way off. Thanks for any help you can lend, really appreciate it.

Best Answer

As the aggregate of my comments on this question has gotten quite large, I feel it appropriate to collect them into an answer.

The classical CLT applies only to independent samples, so technically you cannot apply it to an SRSWOR at all. In practice, however, you can treat an SRSWOR as an SRS with replacement provided your sample is no more than some fixed fraction of the population, say 10%. In that case, even though your sample is not independent, many introductory statistics courses say that it is “independent enough” for the CLT to apply. It is important to say that this is all an approximation. There may exist a formal justification for this approximation, but I have not seen it.

Intuitively, as we take larger and larger samples with SRSWOR, the distribution of sample means initially becomes more and more normal. However, once the samples grow quite large, increasing the sample size further makes the sampling distribution less normal as the dependence between draws begins to kick in. Clearly, as you said, as $n \to N$ the sample mean becomes exactly the population mean. Note that in many (but certainly not all) practical applications, sampling even 10% of your population is nigh-impossible, so this issue is not often encountered in practice.

To my knowledge, however, there is no theoretical justification that 10% (which as far as I know was kind of picked out of a hat) is some magical limit after which the normal convergence stops working. I imagine the "ceiling" where the CLT approximation starts getting worse would depend on your data and its distribution. If you're interested in this topic and you know how to program, it might be interesting to perform some numerical experiments and explore the sampling distribution for yourself.
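
A numerical experiment of this kind could look like the minimal Python sketch below. Everything in it is an assumption made for illustration rather than part of the original question: the population (200 exponential draws), the chosen sample sizes, and the helper names `srswor_means` and `skewness`. It compares the simulated variance of the sample mean with the standard finite-population formula $\operatorname{Var}(\bar{x}) = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$ (with $\sigma^2$ the population variance, divisor $N$), and tracks skewness as a rough check on how normal the shape looks.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical finite population: N = 200 values from a skewed distribution.
N = 200
population = rng.exponential(scale=1.0, size=N)
sigma2 = population.var()  # population variance, divisor N


def skewness(x):
    """Third standardized moment; close to 0 for a symmetric, normal-looking shape."""
    centered = x - x.mean()
    return (centered**3).mean() / (centered**2).mean() ** 1.5


def srswor_means(n, reps=2000):
    """Means of `reps` simple random samples of size n, drawn without replacement."""
    return np.array(
        [rng.choice(population, size=n, replace=False).mean() for _ in range(reps)]
    )


results = {}
for n in (5, 20, 100, 180, 199):
    means = srswor_means(n)
    formula_var = sigma2 / n * (N - n) / (N - 1)  # finite population correction
    results[n] = (means.var(), formula_var, skewness(means))
    print(f"n={n:3d}  var(sim)={means.var():.6f}  "
          f"var(formula)={formula_var:.6f}  skew={skewness(means):+.2f}")

# n = N leaves no randomness: every "sample" is the whole population,
# so all sample means agree up to floating-point rounding.
full = srswor_means(N, reps=10)
print("n=N  spread of sample means:", full.max() - full.min())
```

With $n = N - 1$ each sample mean is determined by the single value left out, so the sampling distribution is a rescaled mirror image of the population's shape rather than a bell curve; at $n = N/2$ it is exactly symmetric, since a sample and its complement are equally likely. The factor $(N-n)/(N-1)$ also suggests why a 10% sampling fraction is usually harmless: it shrinks the variance by only about 10%.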

Provided you use an SRS with replacement, your random sample will be IID, so the CLT applies. It doesn't matter whether your population is finite or infinite. If I use SRS with replacement, then each element of the population can be picked multiple times. So even if I had a population of $N=2$, I could still take a sample of $n=1000$ by simply picking the first element of my population with probability $0.5$ and the second with probability $0.5$ on every draw. Thus with SRS with replacement, you can take arbitrarily large IID samples, so the CLT holds.
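
The $N=2$ case can be tried directly. The sketch below is only an illustration under assumed values: the population `[0, 1]`, the number of repetitions, and the 1.96 cutoff (the usual two-sided 95% point of the standard normal) are choices made here, not taken from the question. It draws many samples of size $n=1000$ with replacement, standardizes the sample means, and checks how often they land within 1.96 standard errors of $\mu$.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable

population = np.array([0.0, 1.0])  # hypothetical two-element population, N = 2
n = 1000      # sample size, far larger than N
reps = 2000   # number of repeated samples

# Each row is one sample of size n drawn WITH replacement, so the draws are IID.
samples = rng.choice(population, size=(reps, n), replace=True)
means = samples.mean(axis=1)

mu = population.mean()
se = population.std() / np.sqrt(n)  # no finite population correction is needed
z = (means - mu) / se

# If the normal approximation is good, z should have spread near 1
# and roughly 95% of values should fall inside [-1.96, 1.96].
coverage = np.mean(np.abs(z) <= 1.96)
print(f"std of z: {z.std():.3f}")
print(f"share of |z| <= 1.96: {coverage:.3f}")
```

Because the sample mean here is a discrete (binomial) quantity, the share will not be exactly 0.95, but it should sit close to it.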


As an aside, you refer to a distribution which has probability $1$ of being some constant $C$ as a uniform distribution. It is worth noting that a uniform distribution refers to the much broader class of probability distributions which are equally likely to take any value between two constants $a$ and $b$. I would refer to the distribution which has probability $1$ of being some constant $C$ as a constant distribution (the usual name is a degenerate distribution).
