Solved – Confidence interval for Bernoulli sampling

bernoulli-distributionbinomial distributionconfidence intervalfaq

I have a random sample of Bernoulli random variables $X_1 … X_N$, where $X_i$ are i.i.d. r.v. and $P(X_i = 1) = p$, and $p$ is an unknown parameter.

Obviously, one can find an estimate for $p$: $\hat{p}:=(X_1+\dots+X_N)/N$.

My question is how can I build a confidence interval for $p$?

Best Answer

  • If the average, $\hat{p}$, is not near $1$ or $0$, and sample size $n$ is sufficiently large (i.e. $n\hat{p}>5$ and $n(1-\hat{p})>5$, the confidence interval can be estimated by a normal distribution and the confidence interval constructed thus:

    $$\hat{p}\pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

  • If $\hat{p} = 0$ and $n>30$, the $95\%$ confidence interval is approximately $[0,\frac{3}{n}]$ (Javanovic and Levy, 1997); the opposite holds for $\hat{p}=1$. The reference also discusses using using $n+1$ and $n+b$ (the later to incorporate prior information).

  • Else Wikipedia provides a good overview and points to Agresti and Couli (1998) and Ross (2003) for details about the use of estimates other than the normal approximation, the Wilson score, Clopper-Pearson, or Agresti-Coull intervals. These can be more accurate when above assumptions about $n$ and $\hat{p}$ are not met.

R provides functions binconf {Hmisc} and binom.confint {binom} which can be used in the following manner:

set.seed(0)
p <- runif(1,0,1)
X <- sample(c(0,1), size = 100, replace = TRUE, prob = c(1-p, p))
library(Hmisc)
binconf(sum(X), length(X), alpha = 0.05, method = 'all')
library(binom)
binom.confint(sum(X), length(X), conf.level = 0.95, method = 'all')

Agresti, Alan; Coull, Brent A. (1998). "Approximate is better than 'exact' for interval estimation of binomial proportions". The American Statistician 52: 119–126.

Jovanovic, B. D. and P. S. Levy, 1997. A Look at the Rule of Three. The American Statistician Vol. 51, No. 2, pp. 137-139

Ross, T. D. (2003). "Accurate confidence intervals for binomial proportion and Poisson rate estimation". Computers in Biology and Medicine 33: 509–531.