Probability Distribution – How to Determine the Distribution of a Random Sum of Non-IID Bernoulli Variables

distributions, probability, random variable

I'm trying to find the probability distribution of a sum of a random number of variables which aren't identically distributed. Here's an example:

John works at a customer service call center. He receives calls with problems and tries to solve them. The ones he can't solve, he forwards to his superior. Let's assume that the number of calls he gets in a day follows a Poisson distribution with mean $\mu$. The difficulty of each problem varies from pretty simple stuff (which he can definitely deal with) to very specialized questions which he won't know how to solve. Assume that the probability $p_i$ that he'll be able to solve the $i$-th problem follows a Beta distribution with parameters $\alpha$ and $\beta$, independently of the previous problems. What is the distribution of the number of calls he solves in a day?

More formally, I have:

$Y = I(N > 0)\sum_{i = 1}^{N} X_i$

where $N \sim \mathrm{Poisson}(\mu)$, $(X_i \mid p_i) \sim \mathrm{Bernoulli}(p_i)$, and $p_i \sim \mathrm{Beta}(\alpha, \beta)$.

Note that, for now, I'm happy to assume that the $X_i$'s are independent. I'd also accept that the parameters $\mu$, $\alpha$ and $\beta$ do not affect each other, although in a real-life version of this problem, when $\mu$ is large, $\alpha$ and $\beta$ would be such that the Beta distribution puts more mass on low success rates $p$. But let's ignore that for now.

I can calculate $P(Y = 0)$, but that's about it. I can also simulate values to get an idea of what the distribution of $Y$ looks like (it looks like Poisson, but I don't know whether that's down to the values of $\mu$, $\alpha$ and $\beta$ I tried or whether it generalizes, and how it might change for different parameter values). Any idea of what this distribution is or how I could go about deriving it?
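For what it's worth, here is one way the $P(Y = 0)$ calculation can go: integrating out $p_i$ gives $P(X_i = 0) = \frac{\beta}{\alpha + \beta}$, and conditioning on $N$ then yields

$$P(Y = 0) = \sum_{n = 0}^{\infty} \frac{e^{-\mu}\mu^{n}}{n!}\left(\frac{\beta}{\alpha+\beta}\right)^{n} = e^{-\mu}\exp\!\left(\frac{\mu\beta}{\alpha+\beta}\right) = \exp\!\left(-\frac{\mu\alpha}{\alpha+\beta}\right).$$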

Please note that I have also posted this question on TalkStats Forum but I thought that it might get more attention here. Apologies for cross-posting and many thanks in advance for your time.

EDIT: As it turns out (see the very helpful answers below – and thanks for those!), it is indeed a $\mathrm{Poisson}(\frac{\mu\alpha}{\alpha + \beta})$ distribution, something I was guessing based on intuition and some simulations but was not able to prove. What I now find surprising, though, is that the resulting Poisson distribution depends only on the mean of the $\mathrm{Beta}$ distribution and is not affected by its variance.

As an example, the following two Beta distributions have the same mean but different variance. For clarity, the blue pdf represents a $\mathrm{Beta}(2, 2)$ and the red one $\mathrm{Beta}(0.75, 0.75)$.

[Figure: the two Beta densities]
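For reference, a short R sketch that reproduces a figure like this:

# blue: Beta(2, 2); red: Beta(0.75, 0.75) -- same mean 0.5, different variance
curve(dbeta(x, 2, 2), from = 0, to = 1, col = "blue", xlab = "p", ylab = "Density")
curve(dbeta(x, 0.75, 0.75), add = TRUE, col = "red")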

However, they would both result in the same $\mathrm{Poisson}(0.5\mu)$ distribution which, to me, seems slightly counter-intuitive. (Not saying that the result is wrong, just surprising!)
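One way to see why the variance drops out: each call contributes a single Bernoulli draw with its own $p_i$, so marginally only the mean of $p_i$ enters,

$$\mathrm{Pr}(X_i = 1) = E\big[\mathrm{Pr}(X_i = 1 \mid p_i)\big] = E[p_i] = \frac{\alpha}{\alpha + \beta}.$$

The Beta's variance would only matter if several calls shared the same draw of $p_i$.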

Best Answer

The calls (that is, the $X_i$) arrive according to a Poisson process. The total number of calls $N$ follows a Poisson distribution. Divide the calls into two types, according to whether $X_i = 1$ or $X_i = 0$. The goal is to determine the process that generates the $1$s. This is straightforward if $X_i = 1$ with a fixed probability $p$: by the thinning (splitting) property of Poisson processes, the process restricted to just the $1$s is also a Poisson process, with rate $p\mu$. That is in fact the case here; we just need one additional step to get there.
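To spell out the thinning step (a standard conditioning argument): if $N \sim \mathrm{Poisson}(\mu)$ and each call is a $1$ independently with probability $p$, then the number of $1$s, say $Y$, satisfies

$$\mathrm{Pr}(Y = y) = \sum_{n = y}^{\infty} \frac{e^{-\mu}\mu^{n}}{n!}\binom{n}{y}p^{y}(1-p)^{n-y} = \frac{e^{-\mu}(\mu p)^{y}}{y!}\sum_{m = 0}^{\infty}\frac{\big(\mu(1-p)\big)^{m}}{m!} = \frac{e^{-\mu p}(\mu p)^{y}}{y!},$$

so $Y \sim \mathrm{Poisson}(p\mu)$.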

Marginalize over $p_i$, so that $$\mathrm{Pr}(X_i|\alpha, \beta) = \int_0^1 p_i^{X_i} (1-p_i)^{1-X_i} \frac{p_i^{\alpha-1} (1-p_i)^{\beta-1}}{\mathcal{B}(\alpha, \beta)} dp_i = \frac{\mathcal{B}(X_i + \alpha, 1 - X_i + \beta)}{\mathcal{B}(\alpha, \beta)}$$

where $\mathcal{B}(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}$ is the beta function. Using the fact that $\Gamma(x+1) = x\Gamma(x)$, the above simplifies to:

$$\mathrm{Pr}(X_i = 1|\alpha, \beta) = \frac{\Gamma(1+\alpha)\Gamma(\beta)}{\Gamma(1+\alpha+\beta)} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} = \frac{\alpha}{\alpha+\beta}$$ In other words, marginally $X_i \sim \mathrm{Bernoulli}\left(\frac{\alpha}{\alpha+\beta}\right)$. By the thinning property, $Y$ is Poisson distributed with rate $\frac{\alpha \mu}{\alpha+\beta}$.

A numerical example (with R): in the figure below, the vertical lines are the simulated pmf and the red points are the analytic pmf derived above.

draw <- function(alpha, beta, mu) {
  N <- rpois(1, mu)                   # number of calls in the day
  p <- rbeta(N, alpha, beta)          # per-call success probabilities
  sum(rbinom(N, size = 1, prob = p))  # number of calls solved
}

pmf <- function(y, alpha, beta, mu)   # analytic pmf: Poisson(alpha * mu / (alpha + beta))
  dpois(y, alpha * mu / (alpha + beta))

y <- replicate(30000,draw(4,5,10))
tb <- table(y)

# simulated pmf
plot(tb/sum(tb), type="h", xlab="Y", ylab="Probability")
# analytic pmf
points(0:max(y), pmf(0:max(y), 4, 5, 10), col="red")
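
As an extra sanity check, a Poisson fit implies that the simulated mean and variance should both be close to the rate $\frac{\alpha\mu}{\alpha+\beta} = 40/9 \approx 4.44$; for example:

# mean and variance of a Poisson(lambda) variable both equal lambda
lambda <- 4 * 10 / (4 + 5)
c(simulated_mean = mean(y), simulated_var = var(y), analytic_rate = lambda)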

[Figure: simulated pmf (vertical lines) with the analytic Poisson pmf (red points)]
