Sampling – Reasons to Consider Sampling Without Replacement

finite-population sampling

Sampling with replacement has two advantages over sampling without replacement as I see it:

1) You don't need to worry about the finite population correction.

2) There is a chance that elements from the population are drawn multiple times – then you can recycle the measurements and save time.

Of course from an academic POV one has to investigate both methods. But from a practical POV I don't see why one would consider sampling without replacement given the advantages of with replacement.

But I am a beginner in statistics so there might be plenty of good reasons why without replacement might be the superior choice – at least for specific use cases. Please, unconfuse me!

Best Answer

Expanding on the answer of @Scortchi . . .

Suppose the population has 5 members and you have the budget to sample 5 individuals. You are interested in the population mean of a variable X, a characteristic of individuals in this population. You could do it your way, and randomly sample with replacement. The variance of the sample mean will be V(X)/5, where V(X) is the population variance of X.

On the other hand, suppose you sample the five individuals without replacement. Then, the variance of the sample mean is 0. You've sampled the whole population, each individual exactly once, so there is no distinction between "sample mean" and "population mean." They are the same thing.
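To make the comparison concrete, here is a minimal sketch that enumerates every possible sample of size 5 from a population of 5. The population values are made up for illustration, and exact fractions are used so the comparison is not blurred by floating-point rounding.

```python
from fractions import Fraction
from itertools import permutations, product
from statistics import mean, pvariance

# Hypothetical population of 5 values (not from the original post).
population = [Fraction(x) for x in (2, 4, 4, 7, 13)]
n = len(population)  # sample size equals population size

# Every equally likely ordered sample WITH replacement (5**5 = 3125 of them).
means_with = [mean(s) for s in product(population, repeat=n)]

# Every equally likely ordered sample WITHOUT replacement (5! = 120 of them).
means_without = [mean(s) for s in permutations(population, n)]

var_with = pvariance(means_with)        # variance of the sample mean
var_without = pvariance(means_without)  # every sample is the whole population

print("V(X)/5              :", pvariance(population) / n)
print("with replacement    :", var_with)
print("without replacement :", var_without)
```

With replacement, the enumerated variance matches V(X)/5; without replacement, every sample has the same mean, so the variance is zero.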

In the real world, you should jump for joy each time you have to do the finite population correction because (drumroll . . .) it makes the variance of your estimator go down without you having to collect more data. Almost nothing does this. It's like magic: good magic.

Saying the exact same thing in math (pay attention to the <, and assume the sample size n is greater than 1, with N the population size): \begin{equation} \textrm{finite population correction} = \frac{N-n}{N-1} < \frac{N-1}{N-1} = 1 \end{equation}

Correction < 1 means that applying the correction makes the variance go DOWN, because you apply the correction by multiplying the variance by it. Variance DOWN == good.
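As a rough check of the correction factor, the sketch below compares the formula V(X)/n × (N−n)/(N−1) against a brute-force enumeration of all samples drawn without replacement. The population values and the helper name `fpc` are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations
from statistics import mean, pvariance

def fpc(N, n):
    """Finite population correction (N - n) / (N - 1) for the variance."""
    return Fraction(N - n, N - 1)

# Hypothetical population (not from the original post).
population = [Fraction(x) for x in (2, 4, 4, 7, 13)]
N = len(population)
sigma2 = pvariance(population)  # population variance V(X), divisor N

rows = []
for n in range(1, N + 1):
    # All equally likely samples of size n drawn without replacement.
    exact = pvariance([mean(s) for s in combinations(population, n)])
    formula = sigma2 / n * fpc(N, n)
    rows.append((n, fpc(N, n), exact, formula))
    print(f"n={n}  correction={fpc(N, n)}  enumerated={exact}  formula={formula}")
```

The correction is 1 when n = 1 (no gain), shrinks as n grows, and reaches 0 when n = N, which is the "sampled the whole population" case from the example above.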

Moving in the opposite direction, entirely away from math, think about what you are asking. If you want to learn about the population and you can sample 5 people from it, does it seem likely that you will learn more by taking the chance of sampling the same guy 5 times or does it seem more likely that you will learn more by ensuring that you sample 5 different guys?
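The same intuition can be put into numbers. The sketch below, assuming 5 draws with replacement from a population of 5, counts how many distinct individuals you actually end up with, both by brute-force enumeration and with the closed form N(1 − (1 − 1/N)^n).

```python
from fractions import Fraction
from itertools import product

N, n = 5, 5  # population size and number of draws

# Every equally likely sequence of n draws with replacement (individuals 0..N-1).
samples = list(product(range(N), repeat=n))

# Average number of distinct individuals across all sequences.
expected_distinct = Fraction(sum(len(set(s)) for s in samples), len(samples))

# Closed form: each individual is missed by all n draws with prob. (1 - 1/N)**n.
closed_form = N * (1 - Fraction(N - 1, N) ** n)

# Chance that all n draws hit different individuals.
p_all_distinct = Fraction(sum(len(set(s)) == n for s in samples), len(samples))

print("expected distinct individuals:", float(expected_distinct))
print("P(all five distinct)         :", float(p_all_distinct))
```

On average you see only about 3.36 of the 5 individuals, and you get five different people less than 4% of the time. Sampling without replacement gives you all five every time.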

The real world case is almost the opposite of what you are saying. You almost never sample with replacement; it's only when you are doing special things like bootstrapping. In that case, you are actually trying to screw up the estimator and give it a "too big" variance.