Solved – Success of Bernoulli trials with different probabilities

bernoulli-distributiondistributionspoisson-binomial-distributionprobability

If 20 independent Bernoulli trials are carried out each with a different probability of success and therefore failure. What is the probability that exactly n of the 20 trials was successful?

Is there a better way of calculating these probabilities rather than simply summing together the combinations of success and failure probabilities?

Best Answer

The distribution you are asking about is called the Poisson Binomial distribution, with rather complicated pmf (see Wikipedia for broader description)

$$ \Pr(X=x) = \sum\limits_{A\in F_x} \prod\limits_{i\in A} p_i \prod\limits_{j\in A^c} (1-p_j) $$

Generally, the problem is that using it directly for a larger number of trials would be very slow. There are also other methods of calculating the pmf, e.g. recursive formulas, but they are numerically unstable. The easiest way around those problems are approximation methods (described e.g. by Hong, 2013). If we define

$$ \mu = \sum_{i=1}^n p_i $$

$$ \sigma = \sqrt{ \sum_{i=1}^n p_i(1-p_i) } $$

$$ \gamma = \sigma^{-3} \sum_{i=1}^n p_i (1 - p_i) (1 - 2p_i)$$

then we can approximate pmf with Poisson distribution via law of small numbers or Le Cams theorem

$$ \Pr(X = x) \approx \frac{\mu^x \exp(-\mu)}{x!} $$

but it sees that generally Binomial approximation behaves better (Choi and Xia, 2002)

$$ \Pr(X = x) \approx \mathrm{Binom} \left( n, \frac{\mu}{n} \right)$$

you can use Normal approximation

$$ f(x) \approx \phi \left( \frac{x + 0.5 - \mu}{ \sigma} \right) $$

or cdf can be approximated using so-called refined Normal approximation (Volkova, 1996)

$$ F(x) \approx \max\left(0, \ g \left( \frac{x + 0.5 - \mu}{ \sigma} \right) \right)$$

where $g(x) = \Phi(x) + \gamma(1-x^2) \frac{\phi(x)}{6}$.

Another alternative is of course a Monte Carlo simulation.

Simple dpbinom R function would be

dpbinom <- function(x, prob, log = FALSE,
                    method = c("MC", "PA", "NA", "BA"),
                    nsim = 1e4) {

  stopifnot(all(prob >= 0 & prob <= 1))
  method <- match.arg(method)

  if (method == "PA") {
    # poisson
    dpois(x, sum(prob), log)
  } else if (method == "NA") {
    # normal
    dnorm(x, sum(prob), sqrt(sum(prob*(1-prob))), log)
  } else if (method == "BA") {
    # binomial
    dbinom(x, length(prob), mean(prob), log)
  } else {
    # monte carlo
    tmp <- table(colSums(replicate(nsim, rbinom(length(prob), 1, prob))))
    tmp <- tmp/sum(tmp)
    p <- as.numeric(tmp[as.character(x)])
    p[is.na(p)] <- 0
        
    if (log) log(p)
    else p 
  }
}

Most of the methods (and more) are also implemented in R poibin package.


Chen, L.H.Y. (1974). On the Convergence of Poisson Binomial to Poisson Distributions. The Annals of Probability, 2(1), 178-180.

Chen, S.X. and Liu, J.S. (1997). Statistical applications of the Poisson-Binomial and conditional Bernoulli distributions. Statistica Sinica 7, 875-892.

Chen, S.X. (1993). Poisson-Binomial distribution, conditional Bernoulli distribution and maximum entropy. Technical Report. Department of Statistics, Harvard University.

Chen, X.H., Dempster, A.P. and Liu, J.S. (1994). Weighted finite population sampling to maximize entropy. Biometrika 81, 457-469.

Wang, Y.H. (1993). On the number of successes in independent trials. Statistica Sinica 3(2): 295-312.

Hong, Y. (2013). On computing the distribution function for the Poisson binomial distribution. Computational Statistics & Data Analysis, 59, 41-51.

Volkova, A. Y. (1996). A refinement of the central limit theorem for sums of independent random indicators. Theory of Probability and its Applications 40, 791-794.

Choi, K.P. and Xia, A. (2002). Approximating the number of successes in independent trials: Binomial versus Poisson. The Annals of Applied Probability, 14(4), 1139-1148.

Le Cam, L. (1960). An Approximation Theorem for the Poisson Binomial Distribution. Pacific Journal of Mathematics 10(4), 1181–1197.