Confidence Interval – How to Calculate Confidence Level for a Given Sample Size and Population Size?

confidence interval; mathematical-statistics; proportion; sample-size; sampling

It's been a while since I had statistics in uni, so I'm a little rusty. I need some help with a fairly straightforward calculation of the confidence interval for a given sample size. I've been looking for an answer on CrossValidated but came to the conclusion that the answers are often too complicated for me to grasp quickly. I hope that one of you is kind enough to talk me through an example and provide a formula I can apply in a confidence interval calculator I'm building.

An example: I have a sample size of 1406 respondents ($n$), a population size of 292,456,752 ($N$), I want a confidence level of 95% ($z$ = 1.96), and the percentage of respondents picking a certain option is 50% ($p$ = 0.5).

Is there anyone who wants to walk me through the calculation with the data I just gave, and give me the formula so that I can create my calculator? Thank you very much!

Best Answer

When constructing confidence intervals, the population is usually far larger than the sample. In these cases we treat the sample as if it came from an infinite population, which simplifies the analysis a bit. The confidence interval formula is then the following:

Lower limit:

$$p-z\sqrt{\frac{p(1-p)}{n}}$$

For your example this is $0.5-1.96\sqrt{\frac{0.5(1-0.5)}{1406}}=0.4739$

Upper limit:

$$p+z\sqrt{\frac{p(1-p)}{n}}$$

For your example this is $0.5+1.96\sqrt{\frac{0.5(1-0.5)}{1406}}=0.5261$, so the 95% confidence interval for the population value of $p$ is $(0.4739, 0.5261)$.
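For the calculator, the two limits above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation; the function name `proportion_ci` and its parameter names are my own choices.

```python
import math

def proportion_ci(p, n, z=1.96):
    """Normal-approximation interval for a proportion (large population).

    p: observed sample proportion, n: sample size, z: critical value
    (1.96 corresponds to a 95% confidence level).
    """
    margin = z * math.sqrt(p * (1 - p) / n)  # z times the standard error
    return p - margin, p + margin

# The example from the question: n = 1406, p = 0.5, 95% confidence.
lower, upper = proportion_ci(p=0.5, n=1406)
print(f"({lower:.4f}, {upper:.4f})")  # prints (0.4739, 0.5261)
```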

Small population size

When the population is small relative to the sample, you can apply an adjustment (the finite population correction) to account for this. In this case the confidence interval is

Lower limit:

$$p-z\sqrt{\frac{p(1-p)}{n}\left(\frac{N-n}{N-1} \right)}$$

Upper limit:

$$p+z\sqrt{\frac{p(1-p)}{n}\left(\frac{N-n}{N-1} \right)}$$

The part under the square root is multiplied by the factor $\frac{N-n}{N-1}$. In your example the population is huge, so this factor is $\frac{292456752-1406}{292456752-1}= 0.999995$, which is practically 1. You can try calculating the modified confidence interval; it doesn't change the first four decimal places.
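The adjusted formula can be sketched the same way. Again this is only an illustration under my own naming (`proportion_ci_fpc` is a made-up helper); the second call uses a hypothetical small population of 5,000, which is not from the question, just to show the correction having a visible effect.

```python
import math

def proportion_ci_fpc(p, n, N, z=1.96):
    """Normal-approximation interval with the (N - n) / (N - 1) adjustment.

    p: sample proportion, n: sample size, N: population size, z: critical value.
    """
    fpc = (N - n) / (N - 1)  # close to 1 when N is much larger than n
    margin = z * math.sqrt(p * (1 - p) / n * fpc)
    return p - margin, p + margin

# The example from the question: the correction is negligible.
big = proportion_ci_fpc(p=0.5, n=1406, N=292_456_752)
print(tuple(round(x, 4) for x in big))  # (0.4739, 0.5261)

# Hypothetical small population (N = 5000): the interval gets narrower.
small = proportion_ci_fpc(p=0.5, n=1406, N=5000)
print(small)
```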

Small sample sizes

When you sample very few people, the approximation used to derive the above formulas can be invalid. A common rule of thumb for deciding whether the sample size is large enough is the following:

If $np > 5$ and $n(1-p)>5$ then the sample size is large enough. Your example certainly has a large enough sample size ($np = n(1-p) = 703$). When the sample size is too small, you should use a different interval, such as the Wilson score interval (shown here with continuity correction):

$$\text{Lower limit} = \frac { 2n\hat{p} + z^2 - \left[z \sqrt{z^2 - \frac{1}{n} + 4n\hat{p}(1 - \hat{p}) + (4\hat{p} - 2)} + 1\right] } { 2(n + z^2) }$$

$$\text{Upper limit} = \frac { 2n\hat{p} + z^2 + \left[z \sqrt{z^2 - \frac{1}{n} + 4n\hat{p}(1 - \hat{p}) - (4\hat{p} - 2)} + 1\right] } { 2(n + z^2) }$$

If these formulas give a value below $0$ or above $1$ (which is an impossible value for $p$), then clip them to $0$ or $1$ respectively.
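A hedged Python sketch of the continuity-corrected Wilson score interval follows. The function name `wilson_cc_interval` is my own. Two points are assumptions on my part: the upper limit uses $-(4\hat{p} - 2)$ under the square root, as in the standard form of this interval, and the explicit handling of $\hat{p} = 0$ and $\hat{p} = 1$ is added so the square root never receives a negative argument.

```python
import math

def wilson_cc_interval(p_hat, n, z=1.96):
    """Wilson score interval with continuity correction, clipped to [0, 1]."""
    denom = 2 * (n + z**2)
    centre = 2 * n * p_hat + z**2
    base = z**2 - 1 / n + 4 * n * p_hat * (1 - p_hat)
    shift = 4 * p_hat - 2  # enters with opposite signs in the two limits

    # Edge cases: the lower limit is 0 when p_hat is 0, the upper is 1 when p_hat is 1.
    if p_hat <= 0:
        lower = 0.0
    else:
        lower = (centre - (z * math.sqrt(base + shift) + 1)) / denom
    if p_hat >= 1:
        upper = 1.0
    else:
        upper = (centre + (z * math.sqrt(base - shift) + 1)) / denom

    # Clip impossible values to the valid range for a proportion.
    return max(0.0, lower), min(1.0, upper)

# Hypothetical small sample: 5 of 10 respondents picked the option.
print(wilson_cc_interval(p_hat=0.5, n=10))
```

With a sample this small the interval is very wide (roughly 0.20 to 0.80), which shows how little ten responses say about the population proportion.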

The Wilson score interval doesn't have a nice way of adjusting for a small population size. If you have both a small population size and a small sample size, I'd recommend prioritizing the small population size and using the second set of confidence interval formulas I described.