Solved – Philosophy of sampling for experiments: finite versus infinite

experiment-design, sample-size, sampling

What is the difference between a finite population and an infinite one when you are designing an experiment (sample size/power analysis and interpretation of the results)?

Say a company has a database of 20,000 customers. Given that the response to some stimulus is relatively small, and a meaningful minimum detectable difference is also small, a power analysis for a two-sample proportion test may tell you that you need two groups of 15,000 for the experiment.
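As a rough illustration of where numbers that size come from, here is the kind of calculation in base R's `power.prop.test()`; the 2% baseline rate and 0.5 percentage point detectable difference are assumed values chosen only to match the scale described, not figures from the question.

```r
## Power analysis for comparing two proportions with base R's power.prop.test().
## The baseline rate and detectable difference below are assumptions for illustration.
power.prop.test(p1 = 0.02,        # assumed baseline response rate
                p2 = 0.025,       # assumed rate under the stimulus
                sig.level = 0.05,
                power = 0.80)
## => n per group comes out around 14,000, in the same ballpark as the
##    15,000 quoted above.
```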

Do you quit and say you can't experiment on this population? Or do you (somehow) treat the population as finite and run the power analysis that way instead? What are the implications?

ADD:

What I would like to know (adding this detail in case the last part of the question wasn't completely clear) is the difference between

  1. The inference assuming an infinite population, say the inference from a logistic regression model fitted with glm() in R.

  2. The inference assuming a finite population, say the inference from a logistic regression model fitted with svyglm() in R (from the survey package)?

Will #1 allow inference about the wider population / data-generating process, while #2 only allows inference about that particular population (which is assumed fixed)?
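For concreteness, here is a minimal sketch of the two fits contrasted in #1 and #2, using simulated data; the data frame, variable names, and the population size of 20,000 are assumptions for illustration only.

```r
library(survey)

set.seed(1)
N <- 20000                                  # assumed size of the finite customer population
dat <- data.frame(
  treatment = rbinom(5000, 1, 0.5),         # simulated group assignment
  response  = rbinom(5000, 1, 0.02)         # simulated (rare) binary response
)

## 1. Infinite-population, model-based inference: standard errors describe
##    uncertainty about the data-generating process.
fit_inf <- glm(response ~ treatment, family = binomial, data = dat)

## 2. Finite-population, design-based inference: supplying the population
##    size via fpc makes svyglm() apply the finite population correction,
##    shrinking standard errors as the sample size approaches N.
dat$fpc <- N
des     <- svydesign(ids = ~1, fpc = ~fpc, data = dat)
fit_fin <- svyglm(response ~ treatment, design = des, family = quasibinomial)

summary(fit_inf)
summary(fit_fin)
```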

Best Answer

You can do the power analysis assuming a finite population, and because the variance of the estimate goes to $0$ as the sample size approaches the population size, this makes a big difference. Under the infinite-population assumption, the variance of a binomial proportion is $p(1-p)/n$, where $n$ is the sample size. But if the population size is $N$, it becomes $[p(1-p)/n]\,(1-n/N)$. The finite population correction factor, $(1-n/N)$, drives the variance to $0$ as $n$ approaches $N$, rather than to the $p(1-p)/N$ you would get at $n=N$ for a single proportion under the infinite-population assumption. For your two-sample problem the formula is a little more complicated, but the idea is the same.
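To make the correction concrete, here is a small sketch in R; the values $p = 0.02$, $n = 15{,}000$, and $N = 20{,}000$ are assumed for illustration, and the sample-size adjustment at the end is the standard single-proportion version, not the more complicated two-sample formula.

```r
## Variance of a single estimated proportion with and without the
## finite population correction (assumed illustrative values).
p <- 0.02      # assumed response rate
n <- 15000     # sample size
N <- 20000     # population size

var_inf <- p * (1 - p) / n              # infinite-population variance
var_fin <- var_inf * (1 - n / N)        # with the (1 - n/N) correction

c(infinite = var_inf, finite = var_fin)

## Folding the correction into a sample-size calculation: if the
## infinite-population analysis asks for n0 units, the finite-population
## requirement for a single proportion is roughly n0 / (1 + (n0 - 1)/N).
n0    <- 15000
n_adj <- n0 / (1 + (n0 - 1) / N)
n_adj   # about 8,600 instead of 15,000
```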
