Solved – Bernoulli Trials and Bayes Rule for a Beta Distribution

bayesian, bernoulli-distribution, normal-distribution, self-study

So I stumbled upon the following question from Peter Lee's website.

Suppose that your prior beliefs about the probability π of success in
Bernoulli trials have mean 1/4 and variance 1/48. Give a 90% HDR [High Density Region] for π
given that you have observed six successes in 10 trials.

What I am struggling with is how to incorporate the mean and variance. I get the impression that I am supposed to evaluate the beta distribution, which is the conjugate prior, using the normal distribution with the same mean and variance. As I remember, you can use the normal distribution to approximate $\operatorname{Beta}(a,b)$ when $a$ and $b$ are both greater than $10$. But I am not sure how that fits into what the question is asking. Can anyone help me figure this out? I know that I might have to use Bayes' rule somewhere, but I don't see how.

UPDATE:

So I calculated $\alpha = 2$ and $\beta = 6$. Using these, I was able to write

$P(\pi | k = 6) \propto [\pi^6(1-\pi)^4] [\pi(1-\pi)^5] = \pi^7(1-\pi)^9 $

I calculated $\mu = \frac{7}{16}$ and $\sigma = 0.1203$.

I then used these to compute the 90% interval:

$\frac{7}{16} \pm 1.645(0.1203)$

Is this correct?

Best Answer

As you should know by now, the whole reason we model the prior distribution for $\pi$ as a beta distribution is that, once you observe your evidence $\boldsymbol x = (x_1, \ldots, x_n)$ where each $x_i$ is a Bernoulli variable with parameter $\pi$, the posterior distribution for $\pi$ given the evidence is also beta. For this reason we call the beta distribution a conjugate prior for a Bernoulli (or binomial) likelihood. This has the nice property that we can repeat the Bayesian inference again and again, each time new evidence is observed.
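This sequential updating can be sketched in a few lines. The following is a minimal illustration (the `update` helper and the two example batches are my own, not part of the question); it shows that updating batch by batch gives the same posterior hyperparameters as updating once with all the data.

```python
# Sequential conjugate updating for a Bernoulli likelihood with a
# Beta(a, b) prior: each batch of outcomes maps (a, b) to
# (a + successes, b + failures), and the posterior becomes the prior
# for the next batch.
def update(a, b, outcomes):
    """Return posterior hyperparameters after a batch of 0/1 outcomes."""
    y = sum(outcomes)                     # number of successes
    return a + y, b + len(outcomes) - y

a, b = 2, 6                               # prior from the question
a, b = update(a, b, [1, 1, 0, 1, 0])      # first batch: 3 successes in 5
a, b = update(a, b, [1, 0, 1, 1, 0])      # second batch: 3 successes in 5
print(a, b)                               # 8 10
```

Two batches of 5 trials with 3 successes each yield the same $(a^*, b^*) = (8, 10)$ as a single update with $y = 6$, $n = 10$.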

Formally, we suppose that the probability of success $\pi$ follows a beta distribution with hyperparameters $a, b$: $$\pi \sim \operatorname{Beta}(a,b).$$ We also suppose that we observe a sample $\boldsymbol x = (x_1, \ldots, x_n)$ of size $n$ in which the $x_i$s are IID, with $$x_i \mid \pi \sim \operatorname{Bernoulli}(\pi).$$ Equivalently, we may express this as a single binomial observation counting the number of successes in $n$ trials: $$y \mid \pi \sim \operatorname{Binomial}(n,\pi),$$ where $y = \sum_{i=1}^n x_i$.

The prior distribution for $\pi$ then has the form $$f_{\pi}(\pi) \propto \pi^{a-1}(1-\pi)^{b-1}, \quad 0<\pi<1.$$ Note we omit the constant of proportionality, for reasons that will become clear soon. The joint distribution of $\boldsymbol x$ given $\pi$ is $$f_{\boldsymbol x}(\boldsymbol x \mid \pi) = \prod_{i=1}^n \pi^{x_i}(1-\pi)^{1-x_i} = \pi^y (1-\pi)^{n-y}.$$ Therefore, the posterior distribution of $\pi$ given the sample $\boldsymbol x$ is $$f(\pi \mid \boldsymbol x) = \frac{f_{\boldsymbol x}(\boldsymbol x \mid \pi)f_{\pi}(\pi)}{f(\boldsymbol x)} \propto f_{\boldsymbol x}(\boldsymbol x \mid \pi)f_{\pi}(\pi),$$ since the denominator is independent of $\pi$ and is just a proportionality constant. That is to say, $$f(\pi \mid \boldsymbol x) \propto \pi^{a+y-1} (1-\pi)^{b+n-y-1}, \quad 0 < \pi < 1.$$ This is of course proportional to a beta distribution with posterior hyperparameters $$a^* = a+y, \quad b^* = b+n-y,$$ where again $y$ counts the number of successes in $n$ trials.

So, what this tells you is that if your prior distribution had hyperparameters $a$ and $b$, and you observe $y$ successes in $n$ trials, the posterior distribution is again beta, but with hyperparameters $a^* = a+y$ and $b^* = b+n-y$. So the only thing left to do is to choose the appropriate values of $a, b, n, y$ from your question. $n = 10$ and $y = 6$ are obvious. Less obvious is $a$ and $b$. Recall that for a beta distribution, the mean and variance are $$\operatorname{E}[\pi] = \frac{a}{a+b}, \quad \operatorname{Var}[\pi] = \frac{ab}{(a+b)^2(a+b+1)}.$$ Setting these equal to $1/4$ and $1/48$, respectively, we obtain $a = 2$, $b = 6$. You got those correct.

Then, the posterior for $\pi$ is also beta distributed with parameters $$a^* = 2+6 = 8, \quad b^* = 6+10-6 = 10.$$ This much is clear. But your calculation of a 90% HDR is not precise, because you then proceed to use a normal approximation and calculate a Wald-type confidence interval.

In theory, what you would need to do is calculate $0 \le L < U \le 1$ such that $f(L) = f(U)$ and $$\int_{\pi = L}^{U} f(\pi) \, d\pi = 0.9,$$ where $f$ here refers to the posterior distribution for $\pi$.

Unfortunately, these criteria do not admit an analytical closed-form solution; only numerical methods will yield a result. Using a computer, we get $$(L, U) \approx (0.255850, 0.631450).$$ Your approximation is reasonably close, and would probably be an acceptable answer for this question were it to appear on an examination. But it is important to note that a normal approximation would not be suitable if the posterior distribution were highly skewed, or if the posterior hyperparameters were small.
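One way to carry out the numerical search, assuming SciPy is available: for a unimodal density, the HDR is the shortest interval with the required coverage, so we can parametrize candidate intervals by their lower tail probability $p$, i.e. $[F^{-1}(p),\, F^{-1}(p + 0.9)]$, and minimize the width over $p$. A sketch:

```python
# Numerical 90% HDR for the Beta(8, 10) posterior, found as the
# shortest interval containing 90% of the posterior mass.
from scipy import optimize, stats

a_star, b_star = 8, 10
mass = 0.90
post = stats.beta(a_star, b_star)

def width(p):
    """Width of the interval with coverage `mass` whose lower tail is p."""
    return post.ppf(p + mass) - post.ppf(p)

res = optimize.minimize_scalar(width, bounds=(0.0, 1.0 - mass),
                               method="bounded")
L, U = post.ppf(res.x), post.ppf(res.x + mass)
print(L, U)   # roughly (0.2558, 0.6315)
```

At the optimum the density is equal at both endpoints, $f(L) = f(U)$, which is exactly the HDR condition stated above.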