Sampling Techniques – Justifying the Use of Finite Population Correction

finite-population, sampling

The finite population correction reduces the standard error of the sampling distribution. Compared with omitting the correction, this increases the test statistic in a hypothesis test and therefore increases the probability that the null will be rejected. It seems too powerful a tool to use without a strong justification.
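To make the size of the effect concrete, here is a minimal sketch for a sample mean. It assumes simple random sampling without replacement and applies the correction as a factor of sqrt(1 - n/N) on the estimated standard error. The function names and all the numbers are hypothetical, chosen only for illustration.

```python
import math

def standard_error(s, n, N=None):
    """Estimated standard error of a sample mean.

    s: sample standard deviation, n: sample size,
    N: population size (None treats the population as infinite).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    se = s / math.sqrt(n)
    if N is not None:
        if n > N:
            raise ValueError("n cannot exceed N")
        se *= math.sqrt(1.0 - n / N)  # finite population correction
    return se

def z_statistic(sample_mean, null_mean, s, n, N=None):
    """z statistic for a sample mean against a hypothesized value."""
    return (sample_mean - null_mean) / standard_error(s, n, N)

# Hypothetical numbers: 100 of 200 employees sampled, mean score 3.7,
# sample standard deviation 1.2, null value 3.5.
z_plain = z_statistic(3.7, 3.5, s=1.2, n=100)
z_fpc = z_statistic(3.7, 3.5, s=1.2, n=100, N=200)
print(f"z without fpc: {z_plain:.2f}")
print(f"z with fpc:    {z_fpc:.2f}")
```

With half the population sampled the factor is sqrt(0.5), so the same data give a z statistic about 41% larger once the correction is applied.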

Here is one scenario where I feel its use is justified:

[Edit 2 – I have changed the scenario here slightly following a very valid point from Steve. I don't want non-response rates to distract from the focus of the finite population correction].

I have a company that has come under new leadership in the past 12 months. Under the old leadership, I surveyed a random sample of 50% of the employees (with a 100% response rate). Under the new leadership, I conducted the same survey on a random sample of 45% of the employees (again, with a 100% response rate).

If I want to see whether the results for some questions in the survey have changed in a statistically significant way, I should apply the finite population correction. I am comparing two sets of employees at two specific points in time. I don't care about anyone other than the population working within the company at those two points in time.

[Edit 2 – To clarify my concern]

In addition, those who were not working in the firm at those two points in time would not be able to answer the survey anyway. Taking a basic question like 'I have enjoyed working here in the last couple of months,' only those who have actually worked at the firm could reasonably answer. In that case, assuming an infinite population seems to apply an arbitrary restriction to the sampling distribution.

Here is one scenario where I feel its use wouldn't be justified:

I give a questionnaire to a sample of customers who bought products from me to assess their interests, so that I can decide how best to diversify my business over the next couple of years. While this sample is 20% of all the customers I have served since I opened, I could regard the population as including all those potential customers who haven't bought products from me (because they aren't in the area, my prices were too high, they didn't know about my shop, etc.), in which case my sample is <1% of the population.
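For a sense of scale across the two scenarios, the correction depends only on the sampling fraction n/N. A rough sketch, again assuming simple random sampling without replacement and the factor sqrt(1 - n/N); the helper name and the labels are mine:

```python
import math

def fpc_from_fraction(f):
    """Correction factor sqrt(1 - f) for a sampling fraction f = n/N."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("sampling fraction must lie in [0, 1]")
    return math.sqrt(1.0 - f)

# Sampling fractions taken from the two scenarios; 0.01 stands in for "<1%".
fractions = {
    "scenario 1, old leadership": 0.50,
    "scenario 1, new leadership": 0.45,
    "scenario 2, past customers only": 0.20,
    "scenario 2, wider population": 0.01,
}
for label, f in fractions.items():
    print(f"{label}: f = {f:.2f}, fpc = {fpc_from_fraction(f):.3f}")
```

At a 50% sampling fraction the factor is about 0.707, whereas at 1% it is about 0.995, so in the wider-population reading of the second scenario the correction would barely change anything even if it were applied.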

Questions:

Do people agree with my interpretation of these two example scenarios?

When producing a piece of work, is it good practice to include the justification for using a finite population correction?

Is this largely an assessment of 'who' the population is and whether you want your results to apply just to the 'known' population or whether you are seeking to provide something that has a wider level of application than just the 'known' population?

Best Answer

You are correct about the second scenario, for the reason you give, but not about the first scenario. The theory of the finite population correction (fpc) applies only to a random sample without replacement (Lohr (2009), Sec. 2.8, pp. 51-53). The key word is random. The hallmark of a random sample is that selection is determined by random numbers or the physical equivalent. In your first scenario the 45% of the population who responded were not selected by random numbers. The same would be true if the 45% were part of an even larger random sample of the population: response is not governed by random numbers.
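In symbols, and in notation that is my own summary rather than a quotation from Lohr: for a simple random sample of $n$ units drawn without replacement from a population of $N$ units with variance $S^2$ (divisor $N-1$), the variance of the sample mean $\bar{y}$ and its usual estimate from the sample variance $s^2$ are

$$
V(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{S^2}{n},
\qquad
\widehat{V}(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{s^2}{n}.
$$

The factor $1 - n/N$ is the fpc. It tends to 1 as the sampling fraction $n/N$ tends to 0, and the derivation relies on every subset of size $n$ being equally likely to be drawn, which is why the randomness of the selection matters.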

Even if you have a sample of a substantial part of the population with (near) 100% response, you should still omit the fpc if the purpose of your study is to develop predictions, estimate odds ratios, or otherwise test hypotheses or quote p-values. The reasoning is interesting (Cochran, 1977, p. 39): for a finite population it is seldom of scientific interest to ask if a null hypothesis (e.g. that two proportions are equal) is exactly true. Except by a very rare chance, it will not be, as one would discover by enumerating the entire population. This leads to the adoption of a "superpopulation" viewpoint, which is taken by almost all statisticians these days. Your second scenario is a variant of this. See also Deming (1966), pp. 247-261, "Distinction between enumerative and analytic studies"; Korn and Graubard (1999), p. 227.
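To see what is at stake in that choice, here is a hedged sketch of an unpooled two-proportion z test computed both ways. The survey results are invented, the two samples are assumed to be independent simple random samples, and the variance estimate p(1 - p)/(n - 1) with an optional factor (1 - n/N) is one common textbook form rather than anything prescribed by the references above.

```python
import math

def prop_variance(p, n, N=None):
    """Estimated variance of a sample proportion under simple random sampling.

    With N given, the factor (1 - n/N) is applied; with N=None the
    population is treated as effectively infinite (superpopulation view).
    """
    f = 0.0 if N is None else n / N
    return (1.0 - f) * p * (1.0 - p) / (n - 1)

def two_proportion_z(p1, n1, p2, n2, N1=None, N2=None):
    """Unpooled z statistic and two-sided normal p-value for p2 - p1."""
    se = math.sqrt(prop_variance(p1, n1, N1) + prop_variance(p2, n2, N2))
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical results: 200 employees at each time point, 100 sampled
# under the old leadership (60% agree) and 90 under the new (72% agree).
z_super, p_super = two_proportion_z(0.60, 100, 0.72, 90)
z_finite, p_finite = two_proportion_z(0.60, 100, 0.72, 90, N1=200, N2=200)
print(f"without fpc: z = {z_super:.2f}, p = {p_super:.3f}")
print(f"with fpc:    z = {z_finite:.2f}, p = {p_finite:.3f}")
```

With these made-up numbers the statistic moves from roughly 1.75 to roughly 2.42, so the conclusion at the 5% level depends entirely on which population the test is taken to be about. Under the superpopulation view, the first line is the one to report.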

ADDED NOV 26: I should have noted that the finite population correction is a minor concern here. The major problem is the 55% non-response and the subsequent non-response bias. Survey professionals universally agree that it is better to take a smaller, manageable sample and to focus on reducing non-response by personalizing the initial contacts and by following up with non-responders. Post-survey weighting fixes may also help, but will increase standard errors.

In summary, to answer your three questions:

  1. Your interpretation of the first scenario is incorrect.
  2. You really don't need to say anything. If your goal is to describe only the finite population from which you drew the sample, then you can mention that you omit the fpc because the effect is minuscule. Otherwise, when you do hypothesis testing or prediction, you could mention that you omit the fpc, but I've never seen anyone do it.
  3. The decision of whether to use the fpc is the assessment you describe in the question. So the answer is "Yes".

Additional discussion: See a related CV discussion here.

References

Cochran, W. G. (1977). Sampling techniques (3rd Ed.). New York: Wiley.

Deming, W. E. (1966). Some theory of sampling. New York: Dover Publications.

Korn, E. L., & Graubard, B. I. (1999). Analysis of health surveys (Wiley series in probability and statistics). New York: Wiley.

Levy, P. S., & Lemeshow, S. (2008). Sampling of populations: Methods and applications (Wiley series in survey methodology). Hoboken, NJ: Wiley.

Lohr, S. L. (2009). Sampling: Design and analysis. Boston, MA: Cengage Brooks/Cole.