Solved – Testing if a coin is fair using Bayesian statistics

bayesian

Suppose we have a coin and want to decide whether it's fair. We assume that the a priori probability of a coin being fair is 1/2. However, we can't yet calculate the probability of the outcome of a coin flip, because to do so would require data we do not have. How would we estimate the probability that this coin is fair?

Idea: We can conduct an experiment where in we flip the coin 1000 times. We count the number of successes (either heads or tails), and calculate the probability that this number of successes occurs in a fair coin*. We'll call this Pfair.

We then estimate the probability of this number of successes occurring in this coin* by assuming that the probability of a single success = the number of successes / 1000. We'll call the resulting binomial probability Pcoin. Then, we can estimate that P(success) = Pfair * P(fair coin) + Pcoin * P(unfair coin) = (Pfair + Pcoin) / 2.

*= Alternatively, we can calculate the probabilities that the number of successes is either less than or greater than the expected/actual number of successes.

Edit: What I'm after is more of a model of the likelihood that a given model is true. To do this, I would need the a priori probability of observing the data. Suppose I don't know that the probability that I pick up a random coin and get heads is 1/2. How can I apply Bayes' rule?

Best Answer

Bayesian hypothesis testing is usually done by formulating a model that decomposes the prior into the null and alternative cases, which leads to a particular form for Bayes factor. For an arbitrary prior probability for the null hypothesis we can update to find the posterior probability of the null hypothesis using a simple equation involving Bayes factor. This leads to a simple graph of the prior-to-posterior probability mapping, which is a bit like an AUC curve. Here is a general model form for your problem, plus the more specific uniform-prior model.

General Bayesian model: Suppose we observe coin tosses yielding the indicator variables $x_1, x_2, ..., x_n \sim \text{IID Bern}(\theta)$ where one indicates heads and zero indicates tails. We observe $s = \sum_{i=1}^n x_i$ heads in $n$ coin tosses, and we want to use this data to test the hypotheses:

$$H_0: \theta = \tfrac{1}{2} \quad \quad \quad H_A: \theta \neq \tfrac{1}{2}.$$

Without any loss of generality, let $\delta \equiv \mathbb{P}(H_0)$ be the prior probability of the null hypothesis and let $\pi_A(\theta) = p(\theta|H_A)$ be the conditional prior for $\theta$ under the alternative hypothesis. For this arbitrary prior we can express Bayes factor as:

$$\begin{equation} \begin{aligned} BF(n,s) \equiv \frac{p(\mathbf{x}|H_A) }{p(\mathbf{x}|H_0)} &= \frac{\int_0^1 \theta^s (1-\theta)^{n-s} \pi_A(\theta) d\theta}{(\tfrac{1}{2})^n} \\[6pt] &= 2^n \int_0^1 \theta^s (1-\theta)^{n-s} \pi_A(\theta) d\theta. \\[6pt] \end{aligned} \end{equation}$$

We then have posterior:

$$\begin{equation} \begin{aligned} \mathbb{P}(H_0|\mathbf{x}) = \mathbb{P}(H_0|s) &= \frac{p(\mathbf{x}|H_0) \mathbb{P}(H_0)}{p(\mathbf{x}|H_0) \mathbb{P}(H_0) + p(\mathbf{x}|H_A) \mathbb{P}(H_A)} \\[6pt] &= \frac{\mathbb{P}(H_0)}{\mathbb{P}(H_0) + BF(n,s) \mathbb{P}(H_A)} \\[6pt] &= \frac{\delta}{\delta + (1-\delta) BF(n,s)}. \\[6pt] \end{aligned} \end{equation}$$

Use of a particular conditional prior $\pi_A$ leads to different forms for the Bayes factor,

Testing with uniform prior under alternative: Suppose we take $\pi_A(\theta) = \mathbb{I}(\theta \neq 1/2)$ so that this conditional prior is uniform over the allowable parameter values. (We could take $\pi_A(\theta) = 1$ since changing the density at a single point has no effect; thus, we can be a bit fast-and-loose with the support.) Under this model the Bayes factor is:

$$BF(n,s) = 2^n \int_0^1 \theta^s (1-\theta)^{n-s} d\theta = 2^n \cdot \frac{\Gamma(s+1) \Gamma(n-s+1)}{\Gamma(n+2)}.$$

We can implement this model in R using the code below. In this code we give a function for the Bayes factor, and we generate the prior-posterior plot for some example data.

#Create function to calculate Bayes factor
BF <- function(n,s) { exp(n*log(2) + lbeta(s+1, n-s+1)) };

#Example with n = 1000 and s = 449
n <- 1000;
s <- 449;
B <- BF(n, s);

#Generate prior-posterior plot for of null hypothesis
library(ggplot2);

FIGURE <- ggplot(data.frame(delta = c(0, 1)), aes(delta)) + 
          stat_function(fun = function(delta) { delta/(delta + (1-delta)*B) }, 
                        colour = 'red', size = 1) +
          ggtitle('Prior-Posterior Plot - Bayesian Hypothesis Test for Fair Coin') +
          labs(subtitle = paste0('(Number of tosses = ', n, 
                                 ', Number of heads = ', s, ')')) +
          xlab('Prior Probability of Fair Coin') + 
          ylab('Posterior Probability of Fair Coin');

FIGURE;

Multi-armed Bandit

This is a particular case of a Multi-Armed bandit problem. I say a particular case because generally we don't know any of the probabilities of heads (in this case we know one of the coins has probability 0.5).

The issue you raise is known as the exploration vs exploitation dilemma: do you explore the other options, or do you stick with what you think is the best. There is an immediate optimal solution assuming you knew all probabilities: simply choose the coin with the highest probability of winning. The problem, as you have alluded to, is that we are unsure about what the true probabilities are.

There is lots of literature on the subject, and there are many deterministic algorithms, but since you tagged this Bayesian, I'd like to tell you about my personal favourite solution: the Bayesian Bandit!

The Baysian Bandit Solution

The Bayesian approach to this problem is very natural. We are interested in answering "What is the probability that coin X is the better of the two?".

A priori, assuming we have observed no coin flips yet, we have no idea what the probability of coin B's Heads might be, denote this unknown $p_B$. So we should assign a prior uniform distribution to this unknown probability. Alternatively, our prior (and posterior) for coin A is trivially concentrated entirely at 1/2.

As you have stated, we observe 2 tails and 1 heads from coin B, we need to update our posterior distribution. Assuming a uniform prior, and flips are Bernoulli coin-flips, our posterior is a $Beta( 1 + 1, 1 + 2)$. Comparing the posterior distributions or A and B now:

enter image description here

Finding an approximately optimal strategy

Now that we have the posteriors, what to do? We are interested in answering "What is the probability coin B is the better of the two" (Remember from our Bayesian perspective, although there is a definite answer to which one is better, we can only speak in probabilities):

$$w_B = P( p_b > 0.5 )$$

The approximately optimal solution is to choose B with probability $w_B$ and A with probability $1 - w_B$. This scheme maximizes out expected gains. $w_B$ can be computed in calculated numerically, as we know the posterior distribution, but an interesting way is the following:

1. Sample P_B from the posterior of coin B
2. If P_B > 0.5, choose coin B, else choose coin A.

This scheme is also self-updating. When we observe the outcome of choosing coin B, we update our posterior with this new information, and select again. This way, if coin B is really bad we will choose it less, and it coin B is in fact really good, we will choose it more often. Of course, we are Bayesians, hence we can never be absolutely sure coin B is better. Choosing probabilistically like this is the most natural solution to the exploration-exploitation dilemma.

This is a particular example of Thompson Sampling. More information, and cool applications to online advertising, can be found in Google's research paper and Yahoo's research paper. I love this stuff!

Solved – Probability for Fair Dice/Coin According to Bayesian Theorem

Probability of throwing "6" on fair dice is $1/6$ and this has nothing to do with Bayes theorem. Bayes theorem is used to update our knowledge about conditional probabilities. If you are learning about properties of a die by observing it's outcomes, then you'll update your knowledge while observing new data, so the estimated probabilities will change. However the die itself does not change by observing it and calculating probabilities so the true probability does not change.

Your second question is answered in Does 10 heads in a row increase the chance of the next toss being a tail?

Best Answer

Related Solutions

Solved – Coin flipping, decision processes and value of information

Multi-armed Bandit

The Baysian Bandit Solution

Finding an approximately optimal strategy

Solved – Probability for Fair Dice/Coin According to Bayesian Theorem

Related Question