Solved – How to combine two beta-binomial distributions

bayesian, beta-binomial distribution, mixture-distribution

Say I have the following situation. I have two weighted coins:

Coin 1: In the past I've seen this coin flipped 10 times, 8 of which came up heads. So I can model the probability of $n$ heads out of $N$ coin tosses with a beta-binomial distribution with beta parameters $(8, 2)$.

Coin 2: This one I've seen flipped 15 times, 4 of which came up heads. So again this could be a beta-binomial with beta parameters (4, 11).

Now say I draw $N$ times, at random and with replacement, from a bag of 5 coins, 3 of type 1 and 2 of type 2: each time I pick a coin, flip it, and put it back. How do I model the probability of getting $n$ heads out of $N$ tosses?

At first I naively thought it would be $(3/5)P(n\mid\text{coin 1})+(2/5)P(n\mid\text{coin 2})$, but that would be the case if only one coin were chosen and flipped $N$ times, not if a new coin is chosen before each flip.

I guess what I need is a binomial model with a prior that takes into account the uncertainty about which coin is being flipped, some weighted combination of the two beta distributions. How does one go about this, and is it computationally tractable?

Best Answer

The probability of drawing the first coin follows a Bernoulli distribution with probability $\pi = 3/5$ in a single trial. Since you make $N = 1000$ tosses, you expect to see the first coin $N\pi$ times and the second one $N(1-\pi)$ times. The probability of tossing a head with either coin follows a Bernoulli distribution, with parameters $p_1$ and $p_2$ respectively. Drawing a coin and throwing a head are independent events, so we expect

$$ N\pi p_1 + N(1-\pi)p_2 $$

heads in total.

This can be checked using a simple simulation:

# Pick a coin for each toss, then flip it: coin 1 (heads with prob. p1)
# is picked with probability pi, otherwise coin 2 (heads with prob. p2).
simfun <- function(N, pi, p1, p2) {
  sum(rbinom(N, 1, ifelse(runif(N) < pi, p1, p2)))
}

set.seed(123)
N  <- 1000
pi <- 3/5     # proportion of type-1 coins in the bag
p1 <- 8/10    # point estimate for coin 1
p2 <- 4/15    # point estimate for coin 2

that returns

> summary(replicate(1e5, simfun(1000, pi, p1, p2)))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  516.0   576.0   587.0   586.7   597.0   659.0
> N*pi*p1 + N*(1-pi)*p2
[1] 586.6667
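The simulation above treats $p_1$ and $p_2$ as known point estimates. To keep the beta-binomial flavour of the question, one could also draw the coin probabilities from their Beta distributions on every replicate; a minimal sketch, assuming the Beta$(8, 2)$ and Beta$(4, 11)$ parameters stated in the question:

```r
# Sketch: propagate the Beta uncertainty about p1 and p2 instead of
# fixing them at point estimates (Beta parameters from the question).
simfun_beta <- function(N, pi, a1, b1, a2, b2) {
  p1 <- rbeta(1, a1, b1)  # coin 1 heads probability ~ Beta(8, 2)
  p2 <- rbeta(1, a2, b2)  # coin 2 heads probability ~ Beta(4, 11)
  sum(rbinom(N, 1, ifelse(runif(N) < pi, p1, p2)))
}

set.seed(123)
heads <- replicate(1e4, simfun_beta(1000, 3/5, 8, 2, 4, 11))
mean(heads)   # close to N * (pi * 8/10 + (1 - pi) * 4/15)
sd(heads)     # noticeably wider spread than with fixed p1, p2
```

The mean stays near the same expected count, but the spread is larger because each replicate now carries the uncertainty about the coins' true probabilities.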

Formally, what you have is a mixture of two Bernoulli distributions $f_1$, $f_2$ with parameters $p_1$ and $p_2$ and mixing proportion $\pi$:

$$ f(x; \pi, p_1, p_2) = \pi f_1(x; p_1) + (1-\pi) f_2(x; p_2) $$

Unfortunately, if the coins are shuffled and all you observe are heads and tails, the individual coins are indistinguishable (a single throw shows you only a head or a tail and tells you nothing about which coin produced it). In total you observe $k$ heads in $N$ trials, and this also does not let you identify the coins used in the trials: it could have been a single coin that comes up heads with overall probability $\alpha = \pi p_1 + (1-\pi)p_2$, or any number of distinct coins, so such a model is unidentifiable.
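This unidentifiability can be checked numerically: a single coin with $\alpha = \pi p_1 + (1-\pi)p_2$ produces the same distribution of head counts as the per-toss mixture. A self-contained sketch:

```r
# Sketch: head counts from the per-toss mixture and from a single coin
# with the pooled probability alpha are distributed identically.
set.seed(42)
N <- 1000; pi <- 3/5; p1 <- 8/10; p2 <- 4/15
alpha <- pi * p1 + (1 - pi) * p2

mix    <- replicate(1e4, sum(rbinom(N, 1, ifelse(runif(N) < pi, p1, p2))))
single <- replicate(1e4, rbinom(1, N, alpha))

c(mean(mix), mean(single))   # both close to N * alpha = 586.67
c(sd(mix), sd(single))       # both close to sqrt(N * alpha * (1 - alpha))
```

This also answers the original question directly: with a fresh coin drawn before every flip, $n$ is simply Binomial$(N, \alpha)$.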

If your experiment were different, say you picked a coin at random and then tossed it a number of times, then the outcomes would tell you something about the coin you picked. In that case you could use a finite mixture model and estimate it easily using e.g. the EM algorithm.
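A minimal sketch of such an EM fit, under the assumed setup (not from the original answer) that each randomly picked coin is tossed $m$ times, so each draw contributes one head count out of $m$:

```r
# Sketch: EM for a two-component binomial mixture. Each of K draws picks
# a coin with probability pi, then tosses it m times; we observe only
# the K head counts. Setup and starting values are illustrative.
set.seed(1)
K <- 500; m <- 20
pi_true <- 3/5; p1_true <- 8/10; p2_true <- 4/15
z <- rbinom(K, 1, pi_true)                       # latent coin labels
x <- rbinom(K, m, ifelse(z == 1, p1_true, p2_true))

pi_hat <- 0.5; p1_hat <- 0.6; p2_hat <- 0.4      # crude starting values
for (it in 1:200) {
  # E-step: posterior responsibility of component 1 for each draw
  w1 <- pi_hat * dbinom(x, m, p1_hat)
  w2 <- (1 - pi_hat) * dbinom(x, m, p2_hat)
  r  <- w1 / (w1 + w2)
  # M-step: responsibility-weighted parameter updates
  pi_hat <- mean(r)
  p1_hat <- sum(r * x) / (m * sum(r))
  p2_hat <- sum((1 - r) * x) / (m * sum(1 - r))
}
round(c(pi_hat, p1_hat, p2_hat), 3)   # roughly (0.6, 0.8, 0.27)
```

Because each draw now yields several tosses of the same coin, the head counts carry information about which coin was picked, and the mixture becomes identifiable.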