Confidence Interval – How to Calculate Confidence Level for a Given Sample Size and Population Size?

Tags: confidence-interval, mathematical-statistics, proportion, sample-size, sampling

It's been a while since I had statistics in uni, so I'm a little rusty. I need some help with a fairly straightforward calculation of the confidence level of a sample size. I've been looking for an answer on CrossValidated but came to the conclusion that the answers are often too complicated for me to grasp quickly. I hope that one of you is kind enough to talk me through an example and provide a formula I can apply in a confidence interval calculator I'm building.

An example: I have a sample size of 1406 respondents ($n$) and a population size of 292,456,752 ($N$). I want a confidence level of 95% ($z$ = 1.96), and the percentage of respondents picking a certain option is 50% ($p$ = 0.5).

Is there anyone who wants to walk me through the calculation with the data I just gave, and give me the formula so that I can create my calculator? Thank you very much!

Best Answer

When constructing confidence intervals, the population is usually far larger than the sample. In these cases we treat the sample as if it came from an infinite population, which simplifies the analysis a bit. For these cases the confidence interval formula is the following:

Lower limit:

$$p-z\sqrt{\frac{p(1-p)}{n}}$$

For your example this is $0.5-1.96\sqrt{\frac{0.5(1-0.5)}{1406}}=0.4739$

Upper limit:

$$p+z\sqrt{\frac{p(1-p)}{n}}$$

For your example this is $0.5261$ so the 95% confidence interval for the population value of $p$ is $(0.4739,0.5261)$
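For the calculator, the two limits above could be sketched in Python roughly as follows. This is a minimal sketch: the function name `proportion_ci` and its argument names are illustrative choices, not something from the question.

```python
import math


def proportion_ci(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    p: sample proportion, n: sample size, z: critical value
    (1.96 corresponds to a 95% confidence level).
    """
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


lower, upper = proportion_ci(p=0.5, n=1406)
print(f"({lower:.4f}, {upper:.4f})")  # (0.4739, 0.5261)
```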

Small population size

When the size of the population is small then you can make an adjustment to account for this fact. In this case the confidence interval is

Lower limit:

$$p-z\sqrt{\frac{p(1-p)}{n}\left(\frac{N-n}{N-1} \right)}$$

Upper limit:

$$p+z\sqrt{\frac{p(1-p)}{n}\left(\frac{N-n}{N-1} \right)}$$

The part under the square root is modified slightly. In your example the population is huge, so it is only multiplied by a factor of $\frac{292456752-1406}{292456752-1}= 0.999995$. You can try calculating the modified confidence interval; it doesn't change the first four decimal places.
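The adjusted formulas can be checked numerically with a short Python sketch. The name `proportion_ci_fpc` is a hypothetical helper introduced only for illustration; the inputs are the figures from the example.

```python
import math


def proportion_ci_fpc(p, n, N, z=1.96):
    """Interval with the finite population correction (N - n) / (N - 1)."""
    fpc = (N - n) / (N - 1)
    margin = z * math.sqrt(p * (1 - p) / n * fpc)
    return p - margin, p + margin


N = 292_456_752  # population size from the example
fpc = (N - 1406) / (N - 1)
lower, upper = proportion_ci_fpc(p=0.5, n=1406, N=N)
print(f"{fpc:.6f}")                   # 0.999995
print(f"({lower:.4f}, {upper:.4f})")  # (0.4739, 0.5261)
```

With a population this large, the corrected and uncorrected intervals agree to four decimal places, so a calculator could reasonably apply the correction always and let it vanish when $N$ is large.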

Small sample sizes

When you sample very few people then the methods used to derive the above formulas can be invalid. A common rule for deciding if sample size is large enough is the following:

If $np > 5$ and $n(1-p)>5$ then the sample size is large enough. Your example certainly has a large enough sample size. When the sample size is too small, you should use a different interval, such as the Wilson score interval (given here with a continuity correction):

$$\text{Lower limit} = \frac { 2n\hat{p} + z^2 - \left[z \sqrt{z^2 - \frac{1}{n} + 4n\hat{p}(1 - \hat{p}) + (4\hat{p} - 2)} + 1\right] } { 2(n + z^2) }$$

$$\text{Upper limit} = \frac { 2n\hat{p} + z^2 + \left[z \sqrt{z^2 - \frac{1}{n} + 4n\hat{p}(1 - \hat{p}) - (4\hat{p} - 2)} + 1\right] } { 2(n + z^2) }$$

If these formulas give a value below $0$ or above $1$ (which is an impossible value for $p$), then clip them to $0$ or $1$ respectively.
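A minimal Python sketch of the rule of thumb and the continuity-corrected Wilson interval follows. The function names are hypothetical. Note that in the standard form of this interval the $(4\hat{p} - 2)$ term is added under the square root for the lower limit and subtracted for the upper limit, which is what the sketch does. It also assumes $z$ is large enough that both square-root arguments are non-negative, which holds for $z = 1.96$.

```python
import math


def large_enough(p, n):
    """Rule of thumb from the text: np > 5 and n(1 - p) > 5."""
    return n * p > 5 and n * (1 - p) > 5


def wilson_cc_ci(p_hat, n, z=1.96):
    """Wilson score interval with continuity correction, clipped to [0, 1]."""
    centre = 2 * n * p_hat + z**2
    denom = 2 * (n + z**2)
    base = z**2 - 1 / n + 4 * n * p_hat * (1 - p_hat)
    # (4 * p_hat - 2) is added for the lower limit, subtracted for the upper.
    lower = (centre - (z * math.sqrt(base + (4 * p_hat - 2)) + 1)) / denom
    upper = (centre + (z * math.sqrt(base - (4 * p_hat - 2)) + 1)) / denom
    return max(0.0, lower), min(1.0, upper)


print(large_enough(0.5, 1406))  # True
print(large_enough(0.1, 10))    # False, so use the Wilson interval instead
print(wilson_cc_ci(0.1, 10))
```

A useful sanity check is symmetry: the interval for $\hat{p}$ should be the mirror image of the interval for $1 - \hat{p}$.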

This one doesn't have a nice way of adjusting for a small population size. If you have both a small population size and a small sample size I'd recommend prioritizing the small population size and using the second set of confidence interval formulas I described.

Related Solutions

Confidence Interval for Proportion When Sample Proportion is Near 0 or 1

Use a Clopper-Pearson interval?

Wikipedia describes how to do this here: http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval

For example, with your 39 successes in 40 trials you get:

> qbeta(.025,39,2) # lower limit: qbeta(alpha/2, x, n-x+1), x = number of successes, n = number of trials
[1] 0.8684141
> qbeta(1-.025,40,1) # upper limit: qbeta(1-alpha/2, x+1, n-x)
[1] 0.9993673

For your 40 out of 40 the upper limit is 1 (the upper beta quantile is degenerate when n-x = 0), and the lower limit is:

> qbeta(.025,40,1)
[1] 0.9119027

Sample Size – How to Calculate Sample Size to Ensure Confidence in Sample Mean

For example, for a population of 1,000,000 with a mean of 0.90 and a population standard deviation of 1.32 I would need a sample n to be 99% confident that the sample mean is within 1% of the population mean.

Okay.

Sampling would be without replacement.

With a million in the population?

~~To a first approximation, it doesn't matter enough to be worth worrying about~~

Actually, turns out in this case it does. I'll do it both without replacement and with. With replacement is simpler, and I do it first.

Distribution is normal.

Don't need it. The sample size will be large enough that with the other assumptions, only really strongly non-normal distributions will have any impact.

Can we assume independence (apart from the effect of sampling without replacement)? e.g. sampling completely at random? I'll take it that we can.

$\mu = 0.90$

$\sigma = 1.32$

Want 'to be 99% confident that the sample mean is within 1% of the population mean'.

i.e. Find $n$ such that $P(|\bar{x}-\mu| < .01\mu) = 0.99$

$\bar{x}-\mu \sim N(0, \frac{\sigma^2}{n})$

99% of a normal distribution is within 2.576 s.d.'s of the population mean (this figure is gettable from normal tables, or using a function in a program; I used R). Thus I need $2.576 \times \sigma/\sqrt{n} < 0.01 \mu = 0.009$

Hence $2.576^2 \sigma^2/n < 0.009^2$

Hence $2.576^2 \sigma^2 < n \times 0.009^2$

Or $n > (2.576 \times 1.32/0.009)^2 = 142742.9$

So if $n$ is about 142700, (the means and sd's and normal table values were only accurate to about the same number of figures - only the first 3-4 digits will be meaningful) then the required probability statement should hold.
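The with-replacement calculation above can be reproduced with a few lines of Python. This is only a sketch: `sample_size_mean` is a hypothetical helper name, and the critical value 2.576 is passed in by hand to match the figure used in the text.

```python
def sample_size_mean(sigma, half_width, z):
    """Lower bound on n from z * sigma / sqrt(n) < half_width."""
    return (z * sigma / half_width) ** 2


mu, sigma = 0.90, 1.32
n_min = sample_size_mean(sigma, half_width=0.01 * mu, z=2.576)
print(round(n_min, 1))  # 142742.9
```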

If we allow for the 'without replacement', the required sample size drops by about 12.5%, from about 142700 to about 124900, as worked out below (google for finite population correction to the variance); other factors are likely to affect you by more than a couple of percent (like not having perfectly random sampling, for one example).

Let's look at the without replacement case using the finite population correction now.

The finite population correction multiplies the variance by a factor $f = \frac{N-n}{N-1} = 1-\frac{n-1}{N-1}$.

Some people approximate this by $1 -\, n/N$, which is easily accurate enough with the large numbers for $n$ and $N$ involved here. However, I'll try to do the first version there.

$2.576^2 \sigma^2 (N-n)/(N-1) < n \times 0.009^2$

$(2.576\sigma/0.009)^2 /(N-1) < n/(N-n) $

$(2.576\sigma/0.009)^2 /(N-1) < 1/[N/n\,\,\, -1] $

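Writing $A = (2.576\sigma/0.009)^2 \approx 142743$ and rearranging for $n$ gives $n > AN/(N-1+A)$, i.e.

$142743 \times 1000000/1142742 < n$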

So (if I did that right), $n > 124912.7$

Or to the accuracy in the normal value, $n$ should be about $124900$.

(assuming the mean and s.d. are actually accurate to at least 4 figures, too)

Calculation check:

Interval half-width =

$(2.576\times 1.32/\sqrt{124900})\sqrt{(1000000-124900)/999999}$

$= 0.00900$
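The without-replacement result and the calculation check can also be sketched in Python. Both function names are hypothetical; the closed form $n > AN/(N-1+A)$, with $A = (z\sigma/E)^2$ the uncorrected bound and $E$ the required half-width, comes from rearranging the inequality in the derivation above.

```python
import math


def sample_size_mean_fpc(sigma, half_width, z, N):
    """Lower bound on n when the variance is scaled by (N - n) / (N - 1)."""
    a = (z * sigma / half_width) ** 2  # bound without the correction
    return a * N / (N - 1 + a)


def half_width_fpc(sigma, n, z, N):
    """Interval half-width with the finite population correction."""
    return z * sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))


N = 1_000_000
print(sample_size_mean_fpc(1.32, 0.009, 2.576, N))       # close to 124900
print(f"{half_width_fpc(1.32, 124_900, 2.576, N):.5f}")  # 0.00900
```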
