Sample Size – How to Calculate Sample Size to Ensure Confidence in Sample Mean

confidence interval, sample-size

Unfortunately, it's a long while since I did statistics and despite reading & research I'm not 'confident' as to how to calculate this correctly.

I would like to know the smallest sample size required in order to have a given confidence level that the sample mean will be within a given % of the population mean.

Whilst arbitrary, the following would be known (can be calculated):

  • population size
  • population mean
  • population standard deviation

Sampling would be without replacement.

Distribution is normal.

For example, for a population of 1,000,000 with a mean of 0.90 and a population standard deviation of 1.32 I would need a sample n to be 99% confident that the sample mean is within 1% of the population mean.

I'm interested in understanding the formula as I have to solve this many times for different populations, different confidence levels, and different margins of error. Thank you.

Best Answer

For example, for a population of 1,000,000 with a mean of 0.90 and a population standard deviation of 1.32 I would need a sample n to be 99% confident that the sample mean is within 1% of the population mean.

Okay.

Sampling would be without replacement.

With a million in the population?

To a first approximation, it doesn't matter enough to be worth worrying about.

Actually, turns out in this case it does. I'll do it both without replacement and with. With replacement is simpler, and I do it first.

Distribution is normal.

Don't need it. The sample size will be large enough that with the other assumptions, only really strongly non-normal distributions will have any impact.

Can we assume independence (apart from the effect of sampling without replacement)? e.g. sampling completely at random? I'll take it that we can.

$\mu = 0.90$

$\sigma = 1.32$

Want 'to be 99% confident that the sample mean is within 1% of the population mean'.

i.e. Find $n$ such that $P(|\bar{x}-\mu| < .01\mu) = 0.99$

$\bar{x}-\mu \sim N(0, \frac{\sigma^2}{n})$

99% of a normal distribution lies within 2.576 s.d.'s of its mean (this figure can be read from normal tables or obtained from a function in a program; I used R).

Thus I need $2.576 \times \sigma/\sqrt{n} < 0.01 \mu = 0.009$
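For anyone without tables to hand, here is a minimal sketch of getting that 2.576 figure in Python rather than R. It assumes Python 3.8 or later for `statistics.NormalDist`; the variable names are my own, not from the answer.

```python
from statistics import NormalDist

confidence = 0.99

# Two-sided statement: leave (1 - confidence) / 2 in each tail,
# so we want the 0.995 quantile of the standard normal.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

print(round(z, 3))  # 2.576
```

Changing `confidence` gives the multiplier for other confidence levels.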

Hence $2.576^2 \sigma^2/n < 0.009^2$

Hence $2.576^2 \sigma^2 < n \times 0.009^2$

Or $n > (2.576 \times 1.32/0.009)^2 = 142742.9$

So if $n$ is about 142700, the required probability statement should hold. (The mean, s.d. and normal table value were only given to about 3-4 figures, so only the first 3-4 digits of $n$ are meaningful.)
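Since this has to be solved many times, the with-replacement calculation can be wrapped up as a small function. This is only a sketch of the arithmetic above; the function name and argument names are hypothetical, and it takes the normal multiplier `z` as an input.

```python
import math


def sample_size_with_replacement(mu, sigma, rel_margin, z):
    """Smallest n with z * sigma / sqrt(n) < rel_margin * |mu|.

    Ignores the finite population, i.e. treats sampling as with replacement.
    """
    margin = rel_margin * abs(mu)  # absolute half-width, e.g. 0.01 * 0.90 = 0.009
    return (z * sigma / margin) ** 2


n0 = sample_size_with_replacement(mu=0.90, sigma=1.32, rel_margin=0.01, z=2.576)
print(math.ceil(n0))  # 142743
```

Note that the margin is relative to the mean, so a population mean near zero makes the required sample size blow up.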

If we allow for the 'without replacement', the sample size would reduce by about 12% (search for 'finite population correction' to the variance); other factors are likely to affect you by more than a couple of percent (like not having perfectly random sampling, for one example).


Let's look at the without replacement case using the finite population correction now.

The finite population correction multiplies the variance by a factor $f = \frac{N-n}{N-1} = 1-\frac{n-1}{N-1}$.

Some people approximate this by $1 - n/N$, which is easily accurate enough with the large numbers for $n$ and $N$ involved here. However, I'll try to use the first version here.

$2.576^2 \sigma^2 (N-n)/(N-1) < n \times 0.009^2$

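$(2.576\sigma/0.009)^2/(N-1) < n/(N-n)$

$(2.576\sigma/0.009)^2/(N-1) < 1/[N/n - 1]$

Writing $A = (2.576\sigma/0.009)^2 \approx 142743$, this rearranges to $N/n - 1 < (N-1)/A$, i.e. $n > AN/(N-1+A)$:

$142743 \times 1000000/1142742 < n$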

So (if I did that right), $n > 124912.7$

Or to the accuracy in the normal value, $n$ should be about $124900$.

(assuming the mean and s.d. are actually accurate to at least 4 figures, too)
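Solving the corrected inequality for $n$ gives $n > n_0 N/(N - 1 + n_0)$, where $n_0$ is the with-replacement answer. A rough sketch of a reusable function follows, assuming Python 3.8+ and simple random sampling; the function name and the optional `z` override are my own additions, there only so the rounded table value 2.576 used above can be reproduced.

```python
from statistics import NormalDist


def required_sample_size(N, mu, sigma, rel_margin, confidence, z=None):
    """Sample size for the mean, without replacement (finite population correction)."""
    if z is None:
        # Two-sided normal multiplier for the requested confidence level.
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = rel_margin * abs(mu)          # absolute half-width
    n0 = (z * sigma / margin) ** 2         # with-replacement sample size
    return n0 * N / (N - 1 + n0)           # apply f = (N - n) / (N - 1)


# Using the rounded multiplier from the answer:
n_rounded_z = required_sample_size(1_000_000, 0.90, 1.32, 0.01, 0.99, z=2.576)
# Letting the function compute the multiplier itself:
n_exact_z = required_sample_size(1_000_000, 0.90, 1.32, 0.01, 0.99)

print(round(n_rounded_z, 1), round(n_exact_z, 1))
```

The two results differ slightly in the fourth or fifth digit because 2.576 is itself rounded, which is consistent with quoting the answer as about 124900. Round up to a whole number before using the result.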

Calculation check:

Interval half-width =

$(2.576\times 1.32/\sqrt{124900})\sqrt{(1000000-124900)/999999}$

$= 0.00900$
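The same check can be done in a couple of lines of Python; this is just the half-width formula above typed in, with variable names of my choosing.

```python
import math

N, n = 1_000_000, 124_900
sigma, z = 1.32, 2.576

# Standard error with the finite population correction, times the multiplier.
half_width = z * sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))

print(f"{half_width:.5f}")  # 0.00900
```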
