(1) Yes.
(2) Yes. There are only $n+1$ possible outcomes for a binomial random variable, so it is possible to look at what happens for each possible outcome - in fact this is faster than simulating lots and lots of outcomes!
Let $X$ be the number of "successes" among the $n$ customers and let $\hat{p}=X/n$. The confidence interval is $\hat{p}\pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, so the halfwidth is $z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. Thus we want to compute $P(z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\leq 0.005)$. In R, we can do this as follows:
target.halfWidth <- 0.005
p <- 0.016  # true proportion
n.vec <- seq(from = 1000, to = 3000, by = 100)  # candidate sample sizes
# Vector to store results
prob.hw <- rep(NA, length(n.vec))
# Loop through the candidate sample sizes
for (i in seq_along(n.vec)) {
  n <- n.vec[i]
  # Look at all possible outcomes
  x <- 0:n
  p.est <- x/n
  # Compute the halfwidth for each outcome (90% interval, so z_{alpha/2} = qnorm(0.95))
  halfWidth <- qnorm(0.95) * sqrt(p.est * (1 - p.est)/n)
  # What is the probability that the halfwidth is at most 0.005?
  prob.hw[i] <- sum((halfWidth <= target.halfWidth) * dbinom(x, n, p))
}
# Plot results
plot(n.vec, prob.hw, type = "b")
abline(0.95, 0, col = 2)
# Get the minimal n required
n.vec[min(which(prob.hw >= 0.95))]
The answer is $n=2200$ in this case as well.
Finally, it is usually a good idea to verify that the asymptotic normal approximation interval actually gives the desired coverage. In R, we can compute the coverage probability (i.e. the actual confidence level) as:
p <- 0.016
n <- 2200
x <- 0:n
p.est <- x/n
halfWidth <- qnorm(0.95) * sqrt(p.est * (1 - p.est)/n)
# Coverage probability
sum((abs(p - p.est) <= halfWidth) * dbinom(x, n, p))
Different $p$ give different coverages. For $p$ around $0.015$, the actual confidence level of the nominal $90\%$ interval seems to be about $89\%$ in general, which I presume is fine for your purposes.
(3) When you sample from a finite population, the number of successes is not binomial but hypergeometric. If the population is large compared to your sample size, the binomial works just fine as an approximation. If you sample 1000 out of 5000, say, it does not. Have a look at confidence intervals for proportions based on the hypergeometric distribution!
Answers to additional questions:
Let $(p_L,p_U)$ be the confidence interval.
1) In that case you are no longer computing $P(p_U-p_L\leq0.01)$ but $$P\Big(p_U-p_L\leq0.01~\mbox{and}~p\in(p_L,p_U)\Big),$$ i.e. the probability that an interval that actually contains $p$ has length at most 0.01. This may be an interesting quantity, depending on what you're interested in...
2) Maybe, but probably not. If the population size is large compared to the sample size you don't need it, and if it's not then the binomial distribution is not appropriate to begin with!
3) Sprop seems to contain confidence intervals based on the hypergeometric distribution, so that should work just fine.
Best Answer
The first thing is that with a constant intensity Poisson-process with rate $\lambda$ per unit time, the time between events has an exponential distribution with mean $\mu=1/\lambda$.
Second, you actually ask two separate questions, so let's deal with them one at a time.
If you average ten such events, the mean is distributed as $\text{gamma}(10,\mu/10)$ (this is the gamma parameterized in the shape-scale form, not the shape-rate form).
If you want a $1-\alpha$ interval for the population mean, then one approach is to construct a pivotal quantity for the mean, which is to say a quantity $Q$ which is a function of the data and the parameter whose distribution doesn't depend on the parameter.
Note that $\hat{\mu}/\mu$ will also have a gamma distribution, it will be distributed as $\text{gamma}(10,1/10)$, so that is a pivotal quantity (and works just fine for this), but it may be easier to work with its inverse $\mu/\hat{\mu}$, which will be distributed as an inverse-gamma distribution, specifically $\text{inverse-gamma}(10,1/10)$.
Similarly, for a sample of size $n$, the distribution of $\mu/\hat{\mu}$ will be $\text{inverse-gamma}(n,1/n)$.
(The inverse chi-square is a particular case of this distribution, and it, too, can be used for this purpose, with a suitable rescaling. So the advice you were given was correct, if not immediately helpful to you.)
(I'll assume you want a symmetric (in probability) interval.)
Let $l$ be the $\alpha/2$ quantile of an $\text{inverse-gamma}(n,1/n)$ distribution, and let $u$ be the $1-\alpha/2$ quantile - that is, the values that cut off a lower and upper tail area of $\alpha/2$.
Then $P(l<Q<u) = 1-\alpha$, so a $1-\alpha$ interval for $Q$ is $(l,u)$. Hence we can work out the interval for $\mu$:
$$l<\mu/\hat{\mu}<u$$
$$l\,\hat{\mu}<\mu<u\,\hat{\mu}$$
where $\hat{\mu}=\bar x$, which is to say a $1-\alpha$ interval for $\mu$ is $(l\,\bar x,u\, \bar x)$
Imagine the sample size was 10, and that the mean time was 3.21 seconds. Further, imagine you wanted a 95% interval.
I'm going to work in R, but anything that will give you gamma, inverse gamma, chi-square or inverse-chi-square quantiles will work.
So here's the 0.025 and 0.975 quantiles of a gamma(10,1/10):
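In R this might look like (shape-scale form; the printed values are rounded):

```r
# 0.025 and 0.975 quantiles of a gamma with shape 10 and scale 1/10
qgamma(c(0.025, 0.975), shape = 10, scale = 1/10)
# approximately 0.4795 1.7085
```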
The corresponding cutoffs for an inverse gamma will be the inverses of these, flipped around:
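One way to get them in R (take reciprocals and reverse the order):

```r
# reciprocals of the gamma quantiles, flipped so the lower cutoff comes first
rev(1 / qgamma(c(0.025, 0.975), shape = 10, scale = 1/10))
# approximately 0.5853 2.0853
```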
So that's our $l$ and $u$ respectively.
Hence a 95% interval for $\mu$ in this example is $(3.21\times 0.5853, 3.21\times 2.0853)$, or $(1.879, 6.694)$
That interval is far bigger than $\pm 20\%$, of course.
If you want to do it in terms of an inverse-chi-square instead, note that a $\chi^2(\nu)$ is a $\text{gamma}(\nu/2, 2)$ in the shape-scale form used here (equivalently, shape $\nu/2$ and rate $1/2$); from there it's reasonably easy to convert the above calculations.
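To check the equivalence numerically: a $\chi^2(20)$ scaled by $1/20$ has the same distribution as a $\text{gamma}(10, 1/10)$, so the quantiles agree:

```r
p <- c(0.025, 0.975)
qgamma(p, shape = 10, scale = 1/10)  # gamma quantiles directly
qchisq(p, df = 20) / 20              # same quantiles via the chi-square
```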
This is basically the inverse of the above problem, where we specify information about the interval and solve for $n$.
Edit: the approach I took here before didn't quite solve the problem as stated, but a related one (I had solved for the true mean being within 20% of the estimate). This version should be right:
An interval that's like this: $0.8 \mu <\hat \mu < 1.2 \mu$
is equivalent to (after dividing by $\mu\,\hat\mu$ and inverting):
$0.833 \hat\mu < \mu < 1.25 \hat\mu$
which is in the right form to play with.
So our interval is $l_n \hat{\mu}<\mu<u_n \hat{\mu}$.
You want to choose $n$ so that $l_n\geq 0.8333$ and $u_n\leq 1.25$; then the interval $(l_n\hat{\mu},u_n\hat{\mu})$ sits inside $(0.8333\,\hat{\mu},1.25\,\hat{\mu})$.
Unfortunately, the distribution of a $\text{gamma}(n,1/n)$ (and, correspondingly, of an inverse gamma) is different for every $n$.
One could of course proceed by trial and error and this will work perfectly well, if sometimes being a little tedious.
Half a dozen guesses for the lower limit gets us to:
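In place of the guessing, a small search in R (a sketch; the binding requirement here is $l_n \geq 1/1.2 \approx 0.8333$):

```r
# find the smallest n whose lower limit l_n = 1/qgamma(0.975, n, scale = 1/n)
# is at least 1/1.2
n <- 50
while (1 / qgamma(0.975, shape = n, scale = 1/n) < 1/1.2) n <- n + 1
n  # 106
```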
While for the other end:
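The analogous search for the upper limit (a sketch; the requirement $u_n \leq 1.25$ turns out to need fewer observations than the lower end):

```r
# find the smallest n whose upper limit u_n = 1/qgamma(0.025, n, scale = 1/n)
# is at most 1.25
n <- 50
while (1 / qgamma(0.025, shape = n, scale = 1/n) > 1.25) n <- n + 1
n
```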
Which means we need 106 observations to be within the desired limit on both ends and get the desired coverage.
Note that for the lower end we needed a larger sample size to achieve the desired width. [If you are willing to use an asymmetric interval (asymmetric in probability, I mean), in order to get a symmetric-in-percentage-error interval -- i.e. to be substantially more likely to be wrong at one end of the interval than the other -- then you can achieve a smaller sample size while still getting $1-\alpha$ coverage.]
Computing an asymmetric interval
If we try different values of $n$, we can find how much tail area is cut off either side of the +/-20%, and then find the smallest $n$ that totals to no more than $\alpha$ in the tails. In our example, the first place to try is to 'split the difference' and try $n=96$:
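In R, the tail areas outside the $\pm 20\%$ band (equivalently, outside $(0.8, 1.2)$ for $\hat\mu/\mu \sim \text{gamma}(n, 1/n)$) at $n=96$ can be computed as follows (printed values rounded):

```r
n <- 96
hi <- pgamma(1.2, shape = n, rate = n, lower.tail = FALSE)  # estimate at least 20% too high
lo <- pgamma(0.8, shape = n, rate = n)                      # estimate at least 20% too low
c(hi, lo, hi + lo)
# approximately 0.03033 0.01909 0.04942
```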
Those two tail areas give just under 0.05. Trying n=95 gives more than 0.05 in the tails combined, so n=96 is as small as you can go.
(Note that the 0.03033 value corresponds to the lower tail for the inverse gamma, but it's the probability of the estimate being at least 20% too high)
So if you are happy with more probability of being too high than too low, you can do n=96 for a 95% interval. [A higher coverage ($1-\alpha$) would give a larger $n$, and a lower coverage would give a smaller $n$.]
Simulation example in R to show that this approach is doing what it should. In this example, we calculate the limits ($l$ and $u$) for a symmetric-in-probability (2.5% in each tail) interval based on the 106 just computed:
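A sketch of that calculation (invert the 0.975 and 0.025 gamma quantiles at $n=106$; printed values rounded):

```r
n <- 106
l <- 1 / qgamma(0.975, shape = n, scale = 1/n)  # lower limit multiplier
u <- 1 / qgamma(0.025, shape = n, scale = 1/n)  # upper limit multiplier
c(l, u)
# approximately 0.8339 1.2214
```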
Since $l$ and $u$ lie within $(0.8333, 1.25)$, whenever the generated interval contains the true mean the estimate cannot be further than 20% from the true mean; and since, by construction, the generated interval contains the true mean 95% of the time, at least 95% of the time the estimate must be within 20% of the true mean.
Now we simulate ten thousand intervals to show it does have the required coverage:
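A sketch of such a simulation (I use a true mean of 3.21 as in the earlier example; the seed is arbitrary):

```r
set.seed(42)                          # arbitrary, for reproducibility
n <- 106
l <- 1 / qgamma(0.975, shape = n, scale = 1/n)
u <- 1 / qgamma(0.025, shape = n, scale = 1/n)
mu <- 3.21                            # true mean
covered <- replicate(10000, {
  xbar <- mean(rexp(n, rate = 1/mu))  # sample mean of n exponential times
  l * xbar < mu && mu < u * xbar      # does the interval contain mu?
})
mean(covered)
# close to 0.95
```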
- looks like it's right.
To avoid (or at least greatly reduce) the trial and error (not that it takes more than a few moments), we can make use of an approximation that will get us very close to the right answer - the Wilson-Hilferty transformation:
If $X \sim \chi^2(k)$ then $\scriptstyle\sqrt[3]{X/k}$ is approximately normally distributed with mean $\scriptstyle 1-2/(9k)$ and variance $\scriptstyle 2/(9k)$.
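For instance, since a $\text{gamma}(n,1/n)$ is a $\chi^2(2n)/(2n)$, Wilson-Hilferty gives the needed gamma quantile almost directly. A sketch comparing the approximation to the exact value at $n=106$ (so $k=2n=212$):

```r
n <- 106; k <- 2 * n
z <- qnorm(0.975)
# Wilson-Hilferty approximation to the 0.975 quantile of chisq(k)/k = gamma(n, 1/n)
approx.q <- (1 - 2/(9*k) + z * sqrt(2/(9*k)))^3
exact.q  <- qgamma(0.975, shape = n, scale = 1/n)
c(approx.q, exact.q)
```
The approximation is easily close enough here to replace most of the trial and error.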