Use a Clopper-Pearson interval?
Wikipedia describes how to do this here:
http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval
For example, with your 39 successes in 40 trials you get:
> qbeta(.025,39,2) # lower limit: qbeta(alpha/2, x, n-x+1), x = number of successes, n = number of trials
[1] 0.8684141
> qbeta(1-.025,40,1) # upper limit: qbeta(1-alpha/2, x+1, n-x)
[1] 0.9993673
For your 40 out of 40 there are no failures, so the upper limit is 1 and the lower limit is:
> qbeta(.025,40,1)
[1] 0.9119027
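The limits can also be cross-checked without a beta quantile function, because the Clopper-Pearson limits are the values of p at which the binomial tail probabilities equal alpha/2. Below is a minimal Python sketch (standard library only) under that definition. The names `binom_tail_ge`, `solve_increasing` and `clopper_pearson` are my own and not from any package, and bisection is just one simple way to solve the two equations.

```python
from math import comb

def binom_tail_ge(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

def solve_increasing(f, target, lo=0.0, hi=1.0, iters=100):
    """Bisection for f(p) = target, where f is increasing in p on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Two-sided Clopper-Pearson interval for x successes in n trials."""
    # Lower limit solves P(X >= x | p) = alpha/2; taken as 0 when x == 0.
    if x == 0:
        lower = 0.0
    else:
        lower = solve_increasing(lambda p: binom_tail_ge(x, n, p), alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha/2, which is the same as
    # P(X >= x + 1 | p) = 1 - alpha/2; taken as 1 when x == n.
    if x == n:
        upper = 1.0
    else:
        upper = solve_increasing(lambda p: binom_tail_ge(x + 1, n, p), 1 - alpha / 2)
    return lower, upper

print(clopper_pearson(39, 40))
print(clopper_pearson(40, 40))
```

The lower limit here corresponds to qbeta(alpha/2, x, n-x+1) and the upper limit to qbeta(1-alpha/2, x+1, n-x).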
For example, for a population of 1,000,000 with a mean of 0.90 and a population standard deviation of 1.32 I would need a sample n to be 99% confident that the sample mean is within 1% of the population mean.
Okay.
Sampling would be without replacement.
With a million in the population?
To a first approximation, it doesn't matter enough to be worth worrying about.
Actually, it turns out that in this case it does. I'll do it both with and without replacement. With replacement is simpler, so I do it first.
Distribution is normal.
Don't need it. The sample size will be large enough that with the other assumptions, only really strongly non-normal distributions will have any impact.
Can we assume independence (apart from the effect of sampling without replacement)? e.g. sampling completely at random? I'll take it that we can.
$\mu = 0.90$
$\sigma = 1.32$
Want 'to be 99% confident that the sample mean is within 1% of the population mean'.
i.e. Find $n$ such that $P(|\bar{x}-\mu| < .01\mu) = 0.99$
$\bar{x}-\mu \sim N(0, \frac{\sigma^2}{n})$
99% of a normal distribution is within 2.576 standard deviations of the population mean (this figure can be read from normal tables or computed with a function in a program; I used R).
Thus I need $2.576 \times \sigma/\sqrt{n} < 0.01 \mu = 0.009$
Hence $2.576^2 \sigma^2/n < 0.009^2$
Hence $2.576^2 \sigma^2 < n \times 0.009^2$
Or $n > (2.576 \times 1.32/0.009)^2 = 142742.9$
So if $n$ is about 142700, (the means and sd's and normal table values were only accurate to about the same number of figures - only the first 3-4 digits will be meaningful) then the required probability statement should hold.
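As a rough check of the with-replacement calculation, here is a short Python sketch (standard library only). The function name `sample_size_for_mean` and its arguments are illustrative, not from any package. It uses the unrounded normal quantile rather than 2.576, so it gives a slightly smaller figure than 142742.9; the difference is well inside the 3-4 meaningful digits.

```python
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.99):
    """Smallest real n with z * sigma / sqrt(n) <= margin (sampling with replacement)."""
    # Two-sided normal quantile, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return (z * sigma / margin) ** 2

mu, sigma = 0.90, 1.32
n = sample_size_for_mean(sigma=sigma, margin=0.01 * mu)
print(n)
```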
If we allow for sampling without replacement, the sample size would reduce by about 12.5% (search for the finite population correction to the variance); other factors, such as not having perfectly random sampling, are likely to affect you by more than a couple of percent.
Let's look at the without replacement case using the finite population correction now.
The finite population correction multiplies the variance by a factor $f = \frac{N-n}{N-1} = 1-\frac{n-1}{N-1}$.
Some people approximate this by $1 -\, n/N$, which is easily accurate enough with the large numbers for $n$ and $N$ involved here. However, I'll try to do the first version here.
$2.576^2 \sigma^2 (N-n)/(N-1) < n \times 0.009^2$
$(2.576\sigma/0.009)^2 /(N-1) < n/(N-n) $
$(2.576\sigma/0.009)^2 /(N-1) < 1/[N/n\,\,\, -1] $
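Both sides are positive, so inverting and solving for $n$ gives $n > K N/(N-1+K)$, where $K = (2.576\sigma/0.009)^2 \approx 142743$ and $N = 1000000$:
$142743 \times 1000000/1142742 < n$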
So (if I did that right), $n > 124912.7$
Or to the accuracy in the normal value, $n$ should be about $124900$.
(assuming the mean and s.d. are actually accurate to at least 4 figures, too)
Calculation check:
Interval half-width =
$(2.576\times 1.32/\sqrt{124900})\sqrt{(1000000-124900)/999999}$
$= 0.00900$
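The without-replacement calculation and the check can be reproduced with a few lines of Python. This is only a sketch under the same assumptions as above (simple random sampling, normal approximation); the function names `sample_size_fpc` and `half_width_fpc` are made up for illustration, and z is fixed at the rounded value 2.576 to match the hand calculation.

```python
from math import sqrt

def sample_size_fpc(sigma, margin, N, z=2.576):
    """Real-valued n solving z^2 sigma^2 (N - n)/(N - 1) = n * margin^2."""
    n0 = (z * sigma / margin) ** 2      # size needed when sampling with replacement
    return n0 * N / (N - 1 + n0)        # rearranged finite population correction

def half_width_fpc(sigma, n, N, z=2.576):
    """Interval half-width for a sample of n from a population of N."""
    return z * sigma / sqrt(n) * sqrt((N - n) / (N - 1))

N = 1_000_000
n = sample_size_fpc(sigma=1.32, margin=0.009, N=N)
print(n)
print(half_width_fpc(sigma=1.32, n=n, N=N))
```

Plugging the returned n back into the half-width should reproduce the target of 0.009, which is the same check as the one done by hand.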
Best Answer
When constructing confidence intervals, the population is usually far larger than the sample. In these cases we treat the sample as if it came from an infinite population, which simplifies the analysis a bit. For these cases the confidence interval formula is the following:
Lower limit:
$$p-z\sqrt{\frac{p(1-p)}{n}}$$
For your example this is $0.5-1.96\sqrt{\frac{0.5(1-0.5)}{1406}}=0.4739$
Upper limit:
$$p+z\sqrt{\frac{p(1-p)}{n}}$$
For your example this is $0.5261$ so the 95% confidence interval for the population value of $p$ is $(0.4739,0.5261)$
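The two limits can be computed together; the following Python sketch (standard library only) does this for the example. The function name `wald_interval` is mine, and it uses the unrounded normal quantile rather than 1.96, which does not affect the first four decimal places here.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(p_hat, n, confidence=0.95):
    """Normal-approximation interval p +/- z * sqrt(p(1-p)/n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

lower, upper = wald_interval(0.5, 1406)
print(round(lower, 4), round(upper, 4))
```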
Small population size
When the size of the population is small then you can make an adjustment to account for this fact. In this case the confidence interval is
Lower limit:
$$p-z\sqrt{\frac{p(1-p)}{n}\left(\frac{N-n}{N-1} \right)}$$
Upper limit:
$$p+z\sqrt{\frac{p(1-p)}{n}\left(\frac{N-n}{N-1} \right)}$$
The part under the square root is modified slightly. In your example the population is huge, so the variance is only multiplied by a factor of $\frac{292456752-1406}{292456752-1}= 0.999995$. If you calculate the modified confidence interval, the first four decimal places do not change.
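To see how little the correction matters for a population this size, here is a small Python sketch of the adjusted interval. The names `fpc` and `wald_interval_fpc` are illustrative only; the formula is the one given above with the factor $(N-n)/(N-1)$ under the square root.

```python
from math import sqrt
from statistics import NormalDist

def fpc(n, N):
    """Finite population correction factor applied to the variance."""
    return (N - n) / (N - 1)

def wald_interval_fpc(p_hat, n, N, confidence=0.95):
    """Normal-approximation interval with the finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * sqrt(p_hat * (1 - p_hat) / n * fpc(n, N))
    return p_hat - half, p_hat + half

N, n = 292456752, 1406
print(fpc(n, N))
lower_fpc, upper_fpc = wald_interval_fpc(0.5, n, N)
print(round(lower_fpc, 4), round(upper_fpc, 4))
```

With a much smaller N (say a few thousand) the same function gives a visibly narrower interval, which is when the adjustment is worth making.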
Small sample sizes
When you sample very few people then the methods used to derive the above formulas can be invalid. A common rule for deciding if sample size is large enough is the following:
If $np > 5$ and $n(1-p)>5$ then the sample size is large enough. Your example certainly has a large enough sample size. When the sample size is too small you should use a different interval, such as the Wilson score interval with continuity correction:
$$\text{Lower limit} = \frac { 2n\hat{p} + z^2 - \left[z \sqrt{z^2 - \frac{1}{n} + 4n\hat{p}(1 - \hat{p}) + (4\hat{p} - 2)} + 1\right] } { 2(n + z^2) }$$
$$\text{Upper limit} = \frac { 2n\hat{p} + z^2 + \left[z \sqrt{z^2 - \frac{1}{n} + 4n\hat{p}(1 - \hat{p}) - (4\hat{p} - 2)} + 1\right] } { 2(n + z^2) }$$
If these formulas give a value below $0$ or above $1$ (which is an impossible value for $p$), then set them to $0$ or $1$ respectively.
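A minimal Python sketch of the continuity-corrected Wilson interval follows. The function name `wilson_cc_interval` is my own. Note that the $(4\hat{p} - 2)$ term enters with a plus sign in the lower limit and a minus sign in the upper limit. Setting the lower limit to 0 when $\hat{p} = 0$ and the upper limit to 1 when $\hat{p} = 1$ is a common convention that I am assuming here; it is not part of the formulas as written above.

```python
from math import sqrt
from statistics import NormalDist

def wilson_cc_interval(p_hat, n, confidence=0.95):
    """Wilson score interval with continuity correction, clipped to [0, 1]."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    base = z * z - 1 / n + 4 * n * p_hat * (1 - p_hat)  # shared part of the radicand
    centre = 2 * n * p_hat + z * z
    denom = 2 * (n + z * z)
    # Assumed convention: the limit on the boundary side is fixed at 0 or 1.
    if p_hat == 0:
        lower = 0.0
    else:
        lower = (centre - (z * sqrt(base + (4 * p_hat - 2)) + 1)) / denom
    if p_hat == 1:
        upper = 1.0
    else:
        upper = (centre + (z * sqrt(base - (4 * p_hat - 2)) + 1)) / denom
    # Clip any value that falls outside the valid range for a proportion.
    return max(0.0, lower), min(1.0, upper)

print(wilson_cc_interval(0.5, 1406))
print(wilson_cc_interval(2 / 20, 20))
```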
This one doesn't have a nice way of adjusting for a small population size. If you have both a small population size and a small sample size I'd recommend prioritizing the small population size and using the second set of confidence interval formulas I described.