Solved – Bernoulli Confidence Intervals for p very close to 0

bernoulli-distribution, confidence interval, probability, sample, small-sample

Let's say I have the following observations from many Bernoulli distributions with different success probabilities p (p1, p2, ...):

Observations from Distribution 1: 10 successes, 100,000 trials, p_hat = 0.0001
Observations from Distribution 2: 0 successes, 100 trials, p_hat = 0
Observations from Distribution 3: 4 successes, 60,000 trials, p_hat ≈ 0.00007

I want to order these distributions by their true probabilities of success and discard the ones with a low probability of success. However, the probabilities of success are so low that standard Wald or Wilson confidence intervals for a Bernoulli proportion give results that don't make much sense.

Is there a standard statistical way to deal with these types of problems? Or do I have to resort to some self-defined heuristics to remove distributions with a low probability of success?

Best Answer

Confidence intervals are great, but:

  • With probabilities near zero, the choice of interval method matters, so several have to be considered. The question and discussion so far mention various possibilities. The paper by Brown, Cai and DasGupta in Statistical Science (2001) remains the best guide I know for 21st century statisticians and data analysts.

  • With sample sizes this different, overlap of intervals is inevitable and a clear ordering a little elusive.

  • The leading evidence arguably remains the point estimates.

Taking your cases in descending order of point estimate (0.0001, 0.00007, 0), Stata gives the following 95% confidence intervals, one block per case:

 
Method         Lower             Upper

10 successes, 100,000 trials (point estimate 0.0001)
Exact          0.0000480         0.0001839
Agresti        0.0000515         0.0001869
Jeffreys       0.0000514         0.0001774
Wald           0.0000380         0.0001620
Wilson         0.0000543         0.0001841

4 successes, 60,000 trials (point estimate 0.00007)
Exact          0.0000182         0.0001707
Agresti        0.0000192         0.0001781
Jeffreys       0.0000225         0.0001585
Wald           0.0000013         0.0001320
Wilson         0.0000259         0.0001714

0 successes, 100 trials (point estimate 0)
Exact          0.0000000         0.0362167
Agresti        0.0000000         0.0444121
Jeffreys       0.0000000         0.0247453
Wald           0.0000000         0.0000000
Wilson         0.0000000         0.0369935

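Notes: "Exact" here means Clopper-Pearson. Stata is explicit that it clips at 0 (or 1). With 0 successes the Wald interval collapses to the single point 0, because its estimated standard error is then zero.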
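For anyone without Stata, here is a minimal Python sketch of three of these methods (Wald, Wilson and Clopper-Pearson), using only the standard library. It is not how Stata computes them: the function names are my own, and the Clopper-Pearson limits are found by plain bisection on the binomial distribution function rather than through beta quantiles. The results should agree closely with the corresponding rows above.

```python
import math
from statistics import NormalDist


def wald(x, n, level=0.95):
    """Wald interval, clipped to [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson(x, n, level=0.95):
    """Wilson score interval, clipped to [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summing the pmf in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if x >= n else 0.0
    total = 0.0
    for k in range(x + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(1.0, total)


def solve_decreasing(f, target, iters=100):
    """Bisection for f(p) = target on (0, 1), where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, level=0.95):
    """Clopper-Pearson ("exact") interval by inverting the binomial cdf."""
    alpha = 1 - level
    # Lower limit solves P(X >= x | p) = alpha/2, i.e. P(X <= x-1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


# The three samples from the question: (successes, trials).
for x, n in [(10, 100_000), (4, 60_000), (0, 100)]:
    print(f"{x} successes, {n} trials")
    for name, method in [("Exact", clopper_pearson), ("Wald", wald), ("Wilson", wilson)]:
        lo, hi = method(x, n)
        print(f"  {name:<10}{lo:>12.7f}{hi:>14.7f}")
```

Bisection is slow but transparent, and it is fast enough here because the observed counts are tiny. With SciPy or statsmodels available you would more likely use their beta quantiles or ready-made interval functions instead.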

Normally I would add a graph, but its main point would be that the intervals for the $n = 100$ sample are massively wider, and a logarithmic scale is not appropriate here.
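In place of a graph, a rough numerical check makes the same point. The sketch below assumes Wilson intervals as one reasonable choice among those listed; the `wilson` and `overlaps` helpers and the sample labels are mine. It reports the width of each 95% interval and whether each pair of intervals overlaps.

```python
import math
from itertools import combinations
from statistics import NormalDist

# (successes, trials) for the three samples in the question.
samples = {
    "Distribution 1": (10, 100_000),
    "Distribution 2": (0, 100),
    "Distribution 3": (4, 60_000),
}


def wilson(x, n, level=0.95):
    """Wilson score interval, clipped to [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


intervals = {name: wilson(x, n) for name, (x, n) in samples.items()}

for name, (lo, hi) in intervals.items():
    print(f"{name}: [{lo:.7f}, {hi:.7f}]  width = {hi - lo:.7f}")

for a, b in combinations(intervals, 2):
    print(f"{a} vs {b}: overlap = {overlaps(intervals[a], intervals[b])}")
```

On these data every pair of intervals overlaps, and the interval for the sample of 100 is a few hundred times wider than the other two, so the intervals alone cannot settle the ordering.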

If the samples were from quite different populations, you would have to take all the samples seriously. Otherwise one possible conclusion is that the sample of $n = 100$ is far too small to take seriously compared with the other samples.