Mean – How to Calculate Standard Error of the Mean for Known Population Size

finite-populationmeanstandard error

Background

Let us suppose that we are sampling from a finite population which is itself drawn from a univariate normal distribution. This accomplishes (1) having an underlying probability model and (2) having a probably-normal-like-but-finite population.

The following plot shows jackknifes of increasing sample size of the finite population. The red line is the true mean of the finite population. The black line plot is the estimated mean of the finite population from a sample. The blue fill-between is the sample mean plus-or-minus the estimated standard error of the mean.

enter image description here

I have noticed that the sample mean must, as one would expect, intersect with the finite population mean when we have "sampled" (i.e. taken a census) the entirety of the population. But notice that the "estimated" standard error calculated on the whole finite population is not zero.

The advice I have generally seen for this is to just take the standard error of the mean to be zero when you have actually taken a census. I think that is sound advice, but I cannot help but wonder if there is a population-size-aware standard error. What I mean is some function that estimates the standard error of the mean for finite samples of infinite populations in the usual way, but when finite samples are taken from a finite population it has a stochastic tapering down to zero. Metaphorically, it would "know" if you had sampled most or all of the population.

I am hoping for something that takes the population size as an input that algebraically affects the output rather than just setting some piecewise function that partitions the finite and infinite cases. And in the limit as the population size goes to infinity acts like the usual estimate of the standard error of the mean.

Question

Is there a standard error of the mean for when population size is known?


Code for Plot

import matplotlib.pyplot as plt
import numpy as np

N = 10**2
x = np.random.normal(0,100,size=N)

means = []
stderrors = []
for i in range(2, N+1):
    sample = np.random.choice(x, size=i, replace=False)
    mean = np.mean(sample)
    means.append(mean)
    stderror = np.std(sample) / np.sqrt(i)
    stderrors.append(stderror)

means = np.array(means)
stderrors = np.array(stderrors)

plt.plot(range(2, N+1), means, color='k')
plt.plot((2, N), [np.mean(x)]*2, color='r')
plt.fill_between(range(2, N+1), means - stderrors, means + stderrors)
plt.ylabel('Mean')
plt.xlabel('Sample Size')
plt.show()

Best Answer

Transcribing from comments.

For final populations there exists a correction factor which will probably useful for this purpose.

$FPC=\sqrt{(N-n)/(N-1)}$,

and the standard error formula becomes

$SE=s/\sqrt{n} \cdot FPC$.

Related Question