Statistical Sampling – Can the Sample Equal the Population?

I came across this test question from an introductory statistics course for undergraduates in biology. The solutions are in square brackets.

Which cases are possible?

1. The sample is larger than the population. [False]
2. The sample equals the population. [True]
3. The sample is smaller than the population. [True]
4. The sample is an empty set. [False]

I can't wrap my head around the solutions.

If we define the sample as a subset of the population, then 1. is false, and 2. and 3. are indeed true. But 4. should also be true, because the empty set is a subset of every set.

However, if we define the sample as a proper subset of the population, then 1. and 2. are false, and 3. and 4. are true, assuming that the population is nonempty.
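To make the dependence on the definition concrete, here is a small Python sketch that enumerates every subset of a hypothetical three-element population and checks which of the four cases can occur under a given definition of "sample". The helper names (`all_subsets`, `possible_cases`) and the "nonempty subset" rule are my own illustrative assumptions; the course may well have defined "sample" differently.

```python
from itertools import chain, combinations


def all_subsets(population):
    """Every subset of the population, from the empty set up to the population itself."""
    items = list(population)
    return [
        frozenset(combo)
        for combo in chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1)
        )
    ]


def possible_cases(population, is_sample):
    """Report which of the four quiz cases occur among subsets accepted by is_sample."""
    pop = frozenset(population)
    samples = [s for s in all_subsets(pop) if is_sample(s, pop)]
    return {
        "larger": any(len(s) > len(pop) for s in samples),  # always False: only subsets are generated
        "equal": any(s == pop for s in samples),
        "smaller": any(len(s) < len(pop) for s in samples),
        "empty": any(len(s) == 0 for s in samples),
    }


# Candidate definitions of "sample"; the last one is a hypothetical reading.
definitions = {
    "subset": lambda s, pop: s <= pop,
    "proper subset": lambda s, pop: s < pop,
    "nonempty subset": lambda s, pop: s <= pop and len(s) > 0,
}

population = {"a", "b", "c"}
for name, rule in definitions.items():
    print(name, possible_cases(population, rule))
```

Under the plain subset rule, cases 2, 3 and 4 all occur; the proper-subset rule drops case 2; and a nonempty-subset rule is one definition that reproduces the answer key. Case 1 never occurs because only subsets are enumerated.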

Quibbling over the empty set might be too pedantic. The main focus of my question is case 2., and my intuition says it should be false. If one can examine the whole population, there is no sampling involved in the literal sense, is there?

Additionally, I suspect that biologists may tend to inadvertently associate the word population with biological populations, and in principle it's possible to examine every individual of a biological population. I'm also a biologist, but I was taught statistics by statisticians rather than biologists, and my recollection is that the concept of a statistical population is much more abstract. I'm not even sure whether it is meaningful to speak of examining every element of a statistical population.

I remember a remark from one of my teachers. In response to a nontrivial question (which I no longer remember), they said something along the lines of "Well, we usually don't confess this in introductory courses, but let me tell you: the statistical population doesn't really exist." Unfortunately, their explanation was over my head, so I can't recall it.

So does it make sense to say that the sample can equal the population, or not? And if not, how should we conceive of statistical populations? References to relevant literature are much appreciated.

Best Answer

It can realistically be the case that you end up with the whole population. There's then a question of whether to call it a sample, but I think it's reasonable to call it a sample (at least) when you didn't know in advance that you would get the whole population.

Suppose you plan to spend a week sampling some area. You don't know in advance how much you'll get done in that time: it might depend on the weather or, if you're sampling things that move around, on where they move. You could find that you get to all the streams, all the tall trees, or the whole known population of takahē in the area. Your plan didn't necessarily call for getting the whole population, but your sample ended up containing all of it. In that setting I think it would be very natural to still call what you have a sample.
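The fieldwork scenario can be mimicked with a toy simulation. This is a sketch under made-up assumptions: a population of `population_size` known units visited in random order (sampling without replacement), and weather that allows between 0 and `max_per_day` visits on each of 7 days. The function names and all parameter values are hypothetical. The sampling plan never changes; whether the sample turns out to be the whole population depends only on how the week goes.

```python
import random


def field_week(population_size, rng, days=7, max_per_day=3):
    """Simulate one hypothetical week of fieldwork.

    Units are visited in a random order (sampling without replacement);
    each day the weather allows between 0 and max_per_day visits.
    Returns the set of units sampled by the end of the week.
    """
    order = list(range(population_size))
    rng.shuffle(order)
    visits = sum(rng.randint(0, max_per_day) for _ in range(days))
    # Slicing past the end of the list simply returns every unit.
    return set(order[:visits])


def prob_whole_population(population_size, trials=10_000, seed=1):
    """Fraction of simulated weeks in which the sample is the whole population."""
    rng = random.Random(seed)
    hits = sum(
        len(field_week(population_size, rng)) == population_size
        for _ in range(trials)
    )
    return hits / trials


for n in (5, 10, 15, 25):
    print(n, prob_whole_population(n))
```

With these settings, a population of 25 can never be fully covered (at most 21 visits are possible), while a population of 5 is covered in almost every simulated week; in between, the same plan sometimes yields the whole population and sometimes does not.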

Given that possibility, you could imagine there might be other reasons to call the data you have a sample even if it is the whole population. So answer 2 is not wildly unreasonable. It is, however, just a choice about the precise meaning of the word "sample". I would also argue that it's reasonable to define "sample" either so that 4 is true or so that 4 is false. I would not consider it reasonable to define "sample" so that 1 is true. In a test, the correct answers would depend on precisely how the course had defined "sample", and that's why I don't really like this sort of question.
