Bivariate Poisson-Binomial distribution.

computabilitycomputational complexityprobabilityprobability distributionsrandom variables

Suppose you have $100$ coins whose probabilities of obtaining the outcome "head" are $p_1,\ldots,\,p_{100}$. These probabilities are not necessarily equal each other. Consider the following random experiment divided into two rounds.

  • Round 1: Throw simultaneously the $100$ coins and observe the number of outcomes "head".
  • Round 2: Throw only those coins you obtained the outcome "tail" in Round 1, and observe the number of outcomes "head".

Define the random variables

$Y_1$: number of outcomes "head" in Round 1,

$X_2$: number of outcomes "head" in Round 2,

$Y_2=Y_1+X_2$.

I learned that

  • $Y_1\sim\text{Poisson-Binomial}(\{p_1,\ldots,\,p_{100}\})$,
  • $X_2\sim\text{Poisson-Binomial}(\{q_1,\ldots,\,q_{100}\})$, where $q_j=(1-p_j)\cdot p_j$, for all $j\in\{1,\ldots,100\}$, and
  • $Y_2\sim\text{Poisson-Binomial}(\{r_1,\ldots,\,r_{100}\})$, where $r_j=p_j+q_j$, for all $j\in\{1,\ldots,100\}$.

Problem: Obtain efficiently the joint distribution of the random vector $(Y_1,\,Y_2)$ whose range is $\{(y_1,\,y_2)\in\{0,\,1,\ldots,\,100\}^2:\,y_2\geq y_1\}$.

Note that $\mathbb{P}(Y_1=y_1,\,Y_2=y_2)=\mathbb{P}(X_2=y_2-y_1 \mid Y_1=y_1) \cdot \mathbb{P}(Y_1=y_1)$.

If $y_1=0$, then $\mathbb{P}(Y_1=0,\,Y_2=y_2)=\left(\displaystyle\prod_{j=1}^{100}(1-p_j)\right) \cdot \mathbb{P}(X_2=y_2\mid Y_1=0)$,

where the second factor can be computed efficiently with the ${\tt R}$-command ${\tt dpoibin}$, for all $y_2\in\{0,\ldots,\,100\}$.

If $y_1=100$, then $\mathbb{P}(Y_1=100,\,Y_2=y_2)=\left(\displaystyle\prod_{j=1}^{100}p_j\right)\cdot 1$.

Troubles to compute $\mathbb{P}(Y_1=y_1,\,Y_2=y_2)$ arise when $y_1\in\{1,\,2,\ldots,\,99\}$ and $y_2\in\{y_1,\ldots,\,100\}$.

Does anyone know how to compute efficiently $\mathbb{P}(Y_1=y_1,\,Y_2=y_2)$, for all $(y_1,\,y_2)$? Thanks a lot for your help and suggestions.

Best Answer

There is no efficient theoretical way to do this, but it is a straightforward dynamic programming problem for a computer. Here is sample code for it in Python.

#! /usr/bin/env python3
def transition_distribution (prob_vector):
    transitions = {(0,0): 1.0}
    for p_i in prob_vector:
        new_transitions = {}
        for entry, prob in transitions.items():
            entry_adjustments = {
                entry: prob * (1.0-p_i) * (1.0-p_i),
                tuple([entry[0], entry[1]+1]): prob * (1.0-p_i) * p_i,
                tuple([entry[0]+1, entry[1]+1]): prob * p_i,
                }
            for new_entry, new_prob in entry_adjustments.items():
                new_transitions[new_entry] = new_transitions.get(new_entry, 0.0) + new_prob
        transitions = new_transitions
    return transitions

print(transition_distribution([0.5, 0.3, 0.2]))

Note that with 100 coins you'll be throwing around dictionaries with 5,500 keys. But that isn't too hard for a computer to do.