(1) Yes.
(2) Yes. There are only $n+1$ possible outcomes for a binomial random variable, so it is possible to look at what happens for each possible outcome - in fact this is faster than simulating lots and lots of outcomes!
Let $X$ be the number of "successes" among the $n$ customers and let $\hat{p}=X/n$. The confidence interval is $\hat{p}\pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, so the halfwidth is $z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. Thus we want to compute $P(z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\leq 0.005)$. In R, we can do this as follows:
target.halfWidth<-0.005
p<-0.016 #true proportion
n.vec<-seq(from=1000, to=3000, by=100) #number of samples
# Vector to store results
prob.hw<-rep(NA,length(n.vec))
# Loop through desired sample size options
for (i in 1: length(n.vec))
{
n<-n.vec[i]
# Look at all possible outcomes
x<-0:n
p.est<-x/n
# Compute halfwidth for each option
halfWidth<-qnorm(0.95)*sqrt(p.est*(1-p.est)/n)
# What is the probability that the halfwidth is less than 0.005?
prob.hw[i]<-sum({halfWidth<=target.halfWidth}*dbinom(x,n,p))
}
# Plot results
plot(n.vec,prob.hw,type="b")
abline(0.95,0,col=2)
# Get the minimal n required
n.vec[min(which(prob.hw>=0.95))]
The answer is $n=2200$ in this case as well.
Finally, it is usually a good idea to verify that the asymptotic normal approximation interval actually gives the desired coverage. In R, we can compute the coverage probability (i.e. the actual confidence level) as:
p<-0.016
n<-2200
x<-0:n
p.est<-x/n
halfWidth<-qnorm(0.95)*sqrt(p.est*(1-p.est)/n)
# Coverage probability
sum({abs(p-p.est)<=halfWidth}*dbinom(x,n,p))
Different $p$ give different coverages. For $p$ around $0.015$, the actual confidence level of the nominal $90\%$ interval seems to be about $89\%$ in general, which I presume is fine for your purposes.
(3) When you sample from a finite population, the number of successes is not binomial but hypergeometric. If the population is large compared to your sample size, the binomial works just fine as an approximation. If you sample 1000 out of 5000, say, it does not. Have a look at confidence intervals for proportions based on the hypergeometric distribution!
Answers to additional questions:
Let $(p_L,p_U)$ be the confidence interval.
1) In that case you are no longer computing $P(p_L-p_U\leq0.01)$ but $$P\Big(p_L-p_U\leq0.01~\mbox{and}~p\in(p_L,p_U)\Big),$$ i.e. the probability that the length of intervals that actually contain $p$ is at most 0.01. This may be an interesting quantity, depending on what you're interested in...
2) Maybe, but probably not. If the population size is large compared to the sample size you don't need it, and if it's not then the binomial distribution is not appropriate to begin with!
3) Sprop
seems to contain confidence intervals based on the hypergeometric intervals, so that should work just fine.
You could try a nonparametric bootstrap approach. For example
require(boot)
the.means = function(dt, i) {mean(dt[i])}
boot.obj <- boot(data=mydata, statistic=the.means , R=10000)
quantile(boot.obj$t, c(.025,.975))
You can repeat this for each of your 6 subsets of data.
Best Answer
So you have a population each of whom can have zero or more conditions. To answer the question: How many hospital patients have A? It seems to me that the best you can do is take your favourite proportion estimator and offer it up with your favourite confidence interval. There are lots of choices, which will make a difference for very high or very low proportions. If you have such a situation, the estimator above may not be optimal.
If you are interested in the population of just your hospital then you can, as SheldonCooper points out, dispense with the statistics altogether. I suspect however that you are interested in hospital patients more generally, so your standard errors and intervals might be interpreted relative to this population. In your suggested estimator the identity of the population will determine what 1-F is. Certainly hospital patients don't look like non-hospital patients with respect to the conditions you're counting, but that need not matter.
Following Sheldon's second observation, it is probable that the conditions correlate. But as far as I can see this is only useful information if you are asking conditional questions, e.g. the prevalence of A among B sufferers. In probabilistic terms your question is about estimating marginals, and correlation information only tells you about estimating conditionals.
If you were interested in these sorts of subgroups, you'd certainly want to model this information. You'd also want it if there were differential measurement errors or sample selection issues, etc. e.g. only getting tested for A if you have a B diagnosis... That might also make certain sample marginals problematic as estimates of population marginals. Thankfully, I don't know much about hospital populations, but I'd be willing to bet that there are some of these issues around.
Finally, about reporting: If you in fact want to report confidence regions rather than condition-wise intervals, then again the correlation structure matters, and things get considerably trickier. I seem to remember that Agresti had a paper on simultaneous confidence intervals for multivariate Binomial proportions, which might be helpful for this approach.