Confidence Interval – How Confidence Interval for Parameter p of a Bernoulli Trial Varies with p Value

bernoulli-distribution, confidence-interval, proportion, sample-size

This is a very basic question (I'm currently studying undergrad level statistics), but I was hoping for some clarification regarding an assertion I read in a newspaper article earlier today. The author asserts that

evidence about the risks (of childbirth) has been hard to come by and difficult to interpret. This is partly because the overall risks of maternal and neonatal death are now very small (about five per 100,000 women die in childbirth and four per 1,000 babies), so large numbers of mums are needed to assess relative risks.

Now, intuitively, this seems uncontroversial – if an event occurs rarely, then you'd expect more trials to be needed to achieve a good estimate of the likelihood of the event, when compared to one that occurs more frequently.

However, I'm having trouble seeing how the mathematics bears this out. If we take the example in the article, we can model the death of a mother during childbirth as a Bernoulli trial with an estimate of the parameter $p$ given by $\hat{p}=\frac{5}{100000}$.

From this, we can construct a 95% confidence interval for the true value of p:

$$ p^{\pm} = \hat{p} \pm 1.96 \sqrt{\frac{p(1-p)}{n}} $$

However, if we take the limit as $p \to 0$ we get

$$ \begin{aligned}
\lim_{p \to 0} p^{\pm} &= \hat{p} \pm 1.96 \sqrt{\frac{0(1-0)}{n}}\\
&= \hat{p} \pm 1.96 \sqrt{\frac{0}{n}}\\
&= \hat{p} \pm 0
\end{aligned} $$

Therefore, it appears that our estimate of $p$ in fact becomes more accurate as $p$ gets smaller, regardless of our value of $n$. This seems pretty counterintuitive to me. Am I going wrong somewhere, and if so, where?

Many thanks,

Tim

Best Answer

The method you use, a normal approximation, is an archaism and should never be taught or even offered as an option in software. It has very poor coverage properties, particularly for small proportions as in your example.
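To make the coverage point concrete, here is a minimal sketch that computes the exact coverage of the normal-approximation interval by summing binomial probabilities over every outcome whose interval contains the true $p$. The function names (`wald_interval`, `binom_pmf`, `wald_coverage`) and the choice of $n = 1000$ with $p = 0.004$ (the "four per 1,000 babies" rate from the article) are assumptions made for illustration, not anything taken from the question.

```python
import math

Z = 1.96  # two-sided 95% normal quantile, as used in the question


def wald_interval(k, n, z=Z):
    """Normal-approximation interval for k successes in n trials."""
    p_hat = k / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def binom_pmf(k, n, p):
    """Binomial probability P(K = k), computed in log space for stability."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def wald_coverage(n, p, z=Z):
    """Probability that the interval built from the observed k contains p."""
    total = 0.0
    for k in range(n + 1):
        lo, hi = wald_interval(k, n, z)
        if lo <= p <= hi:
            total += binom_pmf(k, n, p)
    return total


# Illustrative values only: 1000 births, true rate of 4 per 1000.
print(wald_coverage(1000, 0.004))
```

Under these assumptions the printed coverage falls noticeably short of the nominal 0.95. One reason is visible directly: when no events are observed, `wald_interval(0, n)` collapses to the single point 0 and cannot contain any positive $p$.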

There are many alternative approaches to calculating these intervals, with varying assumptions and coverage characteristics. Some are very ad hoc in design and so are hard to prefer for pedagogic purposes. My preference is the method of Wilson, sometimes called the Wilson score interval. It approximates a conditional interval and has excellent frequentist properties.
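As a rough sketch of what the Wilson score interval looks like in practice, the snippet below uses the standard closed form, with centre $(\hat{p} + z^2/2n)/(1 + z^2/n)$ and half-width $\frac{z}{1 + z^2/n}\sqrt{\hat{p}(1-\hat{p})/n + z^2/4n^2}$. The function name `wilson_interval` and the sample of 5 deaths in 100,000 births are hypothetical. The article gives only the rate, not a sample size.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k successes in n trials (z = 1.96 for 95%)."""
    p_hat = k / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)
    )
    # Clamp to [0, 1] to guard against tiny floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical sample: 5 deaths observed in 100,000 births.
lo, hi = wilson_interval(5, 100_000)
print(f"{lo * 1e5:.1f} to {hi * 1e5:.1f} per 100,000")

# With zero observed events the interval still has positive width.
print(wilson_interval(0, 1000))
```

Note that the interval is not symmetric about $\hat{p}$, and that observing zero events still yields an upper bound above zero rather than an interval of width 0.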

See this answer for a little more detail: Discrete functions: Confidence interval coverage?

See this question for a formal statement of the meaning of different types of CI for binomial proportions: Statement of result for binomial confidence intervals

And see this one on confidence interval coverage: Clarification on interpreting confidence intervals?
