(1) Yes.
(2) Yes. There are only $n+1$ possible outcomes for a binomial random variable, so it is possible to look at what happens for each possible outcome - in fact this is faster than simulating lots and lots of outcomes!
Let $X$ be the number of "successes" among the $n$ customers and let $\hat{p}=X/n$. The confidence interval is $\hat{p}\pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, so the halfwidth is $z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. Thus we want to compute $P(z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\leq 0.005)$. In R, we can do this as follows:
target.halfWidth<-0.005
p<-0.016 #true proportion
n.vec<-seq(from=1000, to=3000, by=100) #number of samples
# Vector to store results
prob.hw<-rep(NA,length(n.vec))
# Loop through desired sample size options
for (i in 1: length(n.vec))
{
n<-n.vec[i]
# Look at all possible outcomes
x<-0:n
p.est<-x/n
# Compute halfwidth for each outcome; qnorm(0.95) is the z-quantile for a nominal 90% two-sided interval
halfWidth<-qnorm(0.95)*sqrt(p.est*(1-p.est)/n)
# What is the probability that the halfwidth is at most 0.005?
prob.hw[i]<-sum((halfWidth<=target.halfWidth)*dbinom(x,n,p))
}
# Plot results
plot(n.vec,prob.hw,type="b")
abline(0.95,0,col=2)
# Get the minimal n required
n.vec[min(which(prob.hw>=0.95))]
The answer is $n=2200$ in this case as well.
Finally, it is usually a good idea to verify that the asymptotic normal approximation interval actually gives the desired coverage. In R, we can compute the coverage probability (i.e. the actual confidence level) as:
p<-0.016
n<-2200
x<-0:n
p.est<-x/n
halfWidth<-qnorm(0.95)*sqrt(p.est*(1-p.est)/n)
# Coverage probability
sum((abs(p-p.est)<=halfWidth)*dbinom(x,n,p))
Different $p$ give different coverages. For $p$ around $0.015$, the actual confidence level of the nominal $90\%$ interval seems to be about $89\%$ in general, which I presume is fine for your purposes.
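To see how the coverage moves with $p$, the calculation above can be wrapped in a function and evaluated over a grid. This is only a sketch: the helper name `coverage` and the grid of $p$ values are my own choices for illustration, and $n=2200$ is carried over from above.

```r
# Exact coverage of the nominal normal-approximation interval
# for a given true proportion p and sample size n.
coverage <- function(p, n, conf = 0.90) {
  z <- qnorm(1 - (1 - conf) / 2)           # 1.645 for a 90% interval
  x <- 0:n                                 # all possible outcomes
  p.est <- x / n
  halfWidth <- z * sqrt(p.est * (1 - p.est) / n)
  sum((abs(p - p.est) <= halfWidth) * dbinom(x, n, p))
}

# Illustrative grid of true proportions (an assumption, not from the question)
p.grid <- seq(from = 0.010, to = 0.020, by = 0.001)
cov.vec <- sapply(p.grid, coverage, n = 2200)

print(data.frame(p = p.grid, coverage = round(cov.vec, 4)))
```

Because $X$ is discrete, the coverage jumps up and down as $p$ changes, so it is worth looking at the whole range rather than a single value.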
(3) When you sample from a finite population, the number of successes is not binomial but hypergeometric. If the population is large compared to your sample size, the binomial works just fine as an approximation. If you sample 1000 out of 5000, say, it does not. Have a look at confidence intervals for proportions based on the hypergeometric distribution!
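As a rough illustration of the difference, the same "look at every outcome" calculation can be run with hypergeometric weights instead of binomial ones. The population size `N`, the number of successes `K` and the sample size below are made-up values chosen to match the "1000 out of 5000" example, not numbers from the question.

```r
# Hypothetical finite population: N units, K of which are "successes"
N <- 5000
K <- 80            # so the true proportion is K/N = 0.016
n <- 1000          # sample size, drawn without replacement
p <- K / N

x <- 0:n
p.est <- x / n
halfWidth <- qnorm(0.95) * sqrt(p.est * (1 - p.est) / n)
covered <- abs(p - p.est) <= halfWidth

# Coverage when X is hypergeometric (sampling without replacement)
cov.hyper <- sum(covered * dhyper(x, m = K, n = N - K, k = n))
# Coverage when X is treated as binomial (sampling with replacement)
cov.binom <- sum(covered * dbinom(x, n, p))

print(c(hypergeometric = cov.hyper, binomial = cov.binom))
```

If the two numbers differ noticeably, the binomial-based interval is not describing what actually happens when you sample that large a share of the population.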
Answers to additional questions:
Let $(p_L,p_U)$ be the confidence interval.
1) In that case you are no longer computing $P(p_U-p_L\leq0.01)$ but $$P\Big(p_U-p_L\leq0.01~\mbox{and}~p\in(p_L,p_U)\Big),$$ i.e. the probability that the interval is at most 0.01 long and also contains $p$. This may be an interesting quantity, depending on what you're interested in...
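For what it's worth, both probabilities can be computed exactly in the same way as before. The sketch below assumes the values $p=0.016$ and $n=2200$ from the first part of the answer and the nominal 90% interval; the variable names are mine.

```r
p <- 0.016
n <- 2200
x <- 0:n
p.est <- x / n
halfWidth <- qnorm(0.95) * sqrt(p.est * (1 - p.est) / n)

short  <- 2 * halfWidth <= 0.01          # interval length at most 0.01
covers <- abs(p - p.est) <= halfWidth    # interval contains the true p
w <- dbinom(x, n, p)                     # probability of each outcome

prob.short <- sum(short * w)             # P(length <= 0.01)
prob.cover <- sum(covers * w)            # P(p in interval)
prob.joint <- sum((short & covers) * w)  # P(both at once)

print(c(short = prob.short, cover = prob.cover, joint = prob.joint))
```

The joint probability can never exceed either of the other two, so requiring both conditions at once generally calls for a larger $n$.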
2) Maybe, but probably not. If the population size is large compared to the sample size you don't need it, and if it's not then the binomial distribution is not appropriate to begin with!
3) Sprop seems to contain confidence intervals based on the hypergeometric distribution, so that should work just fine.
Best Answer
In order to find the required sample size $n,$ you need a confidence level (such as $.95 = 95\%)$ and a margin of error (such as $\pm .03 = \pm 3\%).$
The calculator in the link also asks for a population size, but that is not important unless you're thinking you might sample more than 10% of the population. So if this is for a nationwide poll in a large country with millions of eligible subjects, you can ignore that part. (If you're using the calculator in the link, you'd enter something like $10\,000\,000).$
The margin of error for a 95% confidence interval from a poll is $\pm 1.96\sqrt{\frac{p(1-p)}{n}},$ where $n$ is the sample size and $p$ is the true population proportion with the relevant attribute (such as favoring Proposition A in an upcoming election).
The margin of error $E$ is the half-width of your confidence interval, expressed as a proportion (a percentage in your link). Maybe you'd like to say that the true proportion is $0.55 \pm 0.03$ or $55\% \pm 3\%.$ Then $E = .03 = 3\%.$
Not knowing $p,$ you could either guess what $p$ might be, or take the worst case, which is $p = 1/2$ (giving the largest possible margin of error). Then for a 95% confidence interval (CI), you'd have a CI of the form $\hat p \pm E.$ So $E=1.96\sqrt{\frac{p(1-p)}{n}}.$ If you're taking $p = 1/2,$ then you have $E = 1.96\sqrt{.25/n} = 0.98/\sqrt{n} \approx 1/\sqrt{n}.$ So, if $E = 3\%,$ then $n \approx 1/(.03^2) \approx 1111$ subjects.
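If you'd rather not round $1.96$ up to $2$, solving $E = z\sqrt{p(1-p)/n}$ for $n$ gives $n = z^2p(1-p)/E^2.$ Here is a small R sketch of that; the function name `sample_size` and its defaults are just my choices for illustration.

```r
# Required sample size for margin of error E at a given confidence level,
# assuming the normal approximation and a large population.
sample_size <- function(E, p = 0.5, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)     # about 1.96 for 95%
  ceiling(z^2 * p * (1 - p) / E^2)   # round up to a whole subject
}

n.exact <- sample_size(0.03)         # worst case p = 1/2
n.rough <- 1 / 0.03^2                # rule of thumb n = 1/E^2
print(c(exact = n.exact, rule.of.thumb = n.rough))
```

The rule of thumb $1/E^2$ is slightly conservative because it replaces $0.98$ by $1$; with the unrounded quantile you get a somewhat smaller $n.$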
Note: Here's why I say that $p = 1/2$ is the 'worst case', leading to the largest margin of error. The factor $Q = p(1 - p)$ in the margin of error reaches its maximum when $p = 1/2.$ So the margin of error $E$ is maximized when $p = 1/2$ and for a fixed value of $E$ that leads to the largest required $n.$
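In case a short derivation helps, completing the square shows where the maximum is:

$$p(1-p) = \frac14 - \Big(p - \frac12\Big)^2 \leq \frac14,$$

with equality only at $p = 1/2.$ Solving $E = 1.96\sqrt{\frac{p(1-p)}{n}}$ for $n$ then gives

$$n = \frac{1.96^2\,p(1-p)}{E^2} \leq \frac{1.96^2}{4E^2} \approx \frac{1}{E^2},$$

so planning with $p = 1/2$ can only overstate the sample size you need, never understate it.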