Binomial Distribution – Accounting for Uncertainty of p When Estimating Mean of Binomial Distribution

beta-binomial distributionbinomial distributionconfidence intervalcredible-interval

I have a binomial distribution with parameters $N$ and $p$, and the estimate for the mean of my distribution is N$\times p$. The values of $N$ and $p$ are such that we can use the Gaussian approximation to estimate the $\sigma$ of the mean as $\sqrt{(n\times p (1-p)}$. The problem is that I have already estimated $p$, so $p$ is actually a Gaussian distribution with a know mean and $\sigma$. My goal is to find a confidence interval for the mean of my binomial distribution, but how do I take the uncertainty of $p$ into account?

Best Answer

There are several problems with your approach. First, you want to use confidence intervals for something that they were not designed for. If $p$ varies, then confidence interval will not show you how does it vary. Check Why does a 95% Confidence Interval (CI) not imply a 95% chance of containing the mean? to learn more about confidence intervals. Moreover, using normal approximation for binomial proportion and it's confidence intervals is not a good idea, as described by Brown et al (2001).

In fact, from your description it sounds like you want to estimate the Bayesian credible interval, i.e. interval that will contain certain fraction of $p$'s distribution. Yes, I said Bayesian, since in fact you already defined your problem as a Bayesian model. You say that you assume that $p$ is a random variable, while in frequentist setting $p$ would be a fixed parameter. If you already assumed it, why not use a Bayesian model for your data? You would be using beta-binomial model (see also An introduction to the Beta-Binomial model paper by Dan Navarro and Amy Perfors). In cases like this it is extremely easy to estimate such model. We can define it as follows:

$$ X \sim \mathrm{Binomial}(N, p) \\ p \sim \mathrm{Beta}(\alpha, \beta) $$

so, your data $X$ follows binomial distribution parametrized by $N$ and $p$, where $p$ is a random variable. We assume beta distribution with parameters $\alpha$ and $\beta$ as a prior for $p$. I guess that if you wanted to use frequentist method, you do not have any prior knowledge about possible distribution of $p$, so you would choose "uninformative" prior parametrized by $\alpha = \beta = 1$, or $\alpha = \beta = 0.5$ (if you prefer, you may translate those parameters to mean and precision, or mean and variance). After updating your prior, posterior distribution of $p$ is simply a beta distribution parametrized by

$$ \alpha' = \alpha + \text{total number of successes} \\ \beta' = \beta + \text{total number of failures} $$

with mean

$$ E(X) = N \frac{\alpha'}{\alpha'+\beta'} $$

To read more about calculating other quantities of this distribution check Wikipedia article on beta-binomial distribution. You can compute credible intervals numerically either by (a) inverting numerically the cumulative distribution function of beta-binomial distribution, or by (b) sampling large number of random values from beta-binomial distribution and then computing sample quantiles from it. Second approach is pretty easy since you only need to sequentially repeat the following procedure:

  1. draw $p$ from beta distribution parametrized by $\alpha'$ and $\beta'$,
  2. draw $x$ from binomial distribution parametrized by $p$ and $N$.

until you draw sample large enough to find it confident for calculating the quantities of interest.


Of course if you know mean and standard deviation of $p$ and you insist on using normal distribution for it, you can use simulation as well, but with using normal distribution for simulating the values of $p$. Below I provide code example in R for such simulation.

R <- 1e5                       # number of samples to draw in simulation
N <- 500                       # known N
mu <- 0.3                      # known mean of p
sigma <- 0.07                  # known standard deviation of p
p <- rnorm(R, mu, sigma)       # simulate p
x <- rbinom(R, N, p)           # simulate X
mean(x)                        # estimate for mean of X
quantile(p*N, c(0.025, 0.975)) # 95% interval estimate for variability of E(X)

Or you can simply take appropriate quantiles using inverse of normal cumulative distribution function and multiply them by $N$. Remember however that this is not a confidence interval, but a credible interval.


Brown, L.D., Cai, T.T., & DasGupta, A. (2001). Interval estimation for a binomial proportion. Statistical science, 101-117.