Solved – Expected number of successes from $N$ Bernoulli trials with different $p$

bernoulli-distributionpoisson-binomial-distribution

Suppose I have N probabilities $(p_1, p_2,…,p_N)$ that represent the chance that each that a corresponding test was passed. How do I apply the Bernoulli distribution to determine the expected number of passes?

Best Answer

If you add up $n$ Bernouilli random variables with a different success probabilities then the sum has a Binomial Distribution of Poisson.

If the success probabilities are $p_i$ then the mean of the Binomial of Poisson is $n \bar{p}$ where $\bar{p}=\frac{\sum_i p_i}{n}$ and the variance of the Binomial distribution of Poisson is $n \bar{p} (1-\bar{p}) - n \times var(p)$ (Note: for $var(p)$ there must be $n$ in the denominator!).

Note that if $var(p)$ is relatively small then the variance reduces to a binomial variance, which was expected because a small $var(p)$ means that the success probabilities of the Bernouillis are more or less equal.

I have some code to simulate this:

# Function that simulates Poisson Binomial random variable
# 
#   parameter 'ps' contains the success probabilities of the Bernouilli's to add up
#   parameter n.sim is the number of simulations
#
#   The return value is a list containing
#      - the simulated mean
#      - the simulated variance
#      - the 'true' mean of Poisson Binomial namely  n x average(ps)
#      - the true variance of Poisson Binomial namely n x average(ps) x (1-average(ps)) - n var(ps)

simulate.Poisson_Binomial<-function(ps, n.sim=100000) {

  sum.all<-vector(mode="numeric", length=n.sim)
  for ( i in 1:n.sim ) {
    # generate the random outcome for each Bernouilli
    random.outcome<-( runif(n=length(ps)) <= ps )

    # count the number of successes
    sum.all[i]<-sum(random.outcome)
  }

  ret<-list(sim.mean=mean(sum.all), 
            sim.var=var(sum.all),
            PoisBin.mean=length(ps)*mean(ps), 
            PoisBin.var=length(ps)*mean(ps)*(1-mean(ps))-(length(ps)-1)*var(ps))

  return(ret)
}


# Generate 50 Bernouilli success probabilities
set.seed(1)
N<-50
ps<-runif(n=N)

# do the simulation
simulate.Poisson_Binomial(ps=ps, n.sim=5e5)

In the return of the R-function there is (length(ps)-1)*var(ps) this is because the R-function var() has (n-1) in the denominator. so $-n \times var(p)$ in the formula above should be 'translated' to -length(ps) * ( var(ps) * (length(ps)-1)/length(ps) which becomes - var(ps) * (length(ps)-1)

See also this Intuitive explanation for dividing by $n-1$ when calculating standard deviation?