[Math] Natural conjugate prior for the Bernoulli distribution


Assume we have an i.i.d. sample of $n$ observations from a Bernoulli distribution, that is, $p(y_i|\theta) = \theta^{y_i}(1-\theta)^{1-y_i}$ for $y_i \in \{0, 1\}$ and $i = 1, 2, \ldots, n$. Show that the natural conjugate prior for the parameter $\theta$ is the beta distribution with density $p(\theta) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \theta^{a-1}(1-\theta)^{b-1}$, $0 < \theta < 1$. My working is as follows:

First the likelihood function is given by:
\begin{align*}
L(\theta|\mathbf{y})= p(\mathbf{y}|\theta) & = \prod_{i=1}^n p(y_i|\theta) \ \ \ \ \ \ \ \ \ \left(\text{By the assumption of independence}\right) \\
& = \theta^{y_1}(1-\theta)^{1-y_1} \cdots \theta^{y_n}(1-\theta)^{1-y_n} \\
& = \theta^{\sum_{i=1}^n y_i}(1-\theta)^{\sum_{i=1}^n (1-y_i)} \\
& = \theta^{\sum_{i=1}^n y_i}(1-\theta)^{n-\sum_{i=1}^n y_i}
\end{align*}
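As a quick sanity check of this simplification, a minimal Python sketch can compare the product of the individual pmfs with the closed form $\theta^{t}(1-\theta)^{n-t}$, where $t = \sum_{i=1}^n y_i$. The sample `y` and the function names are hypothetical, chosen only for illustration.

```python
import math

def likelihood_product(y, theta):
    """Likelihood as the product of the Bernoulli pmfs p(y_i | theta)."""
    return math.prod(theta**yi * (1 - theta)**(1 - yi) for yi in y)

def likelihood_closed_form(y, theta):
    """Likelihood via the simplified form theta^t (1 - theta)^(n - t)."""
    n, t = len(y), sum(y)
    return theta**t * (1 - theta)**(n - t)

y = [1, 0, 1, 1, 0]  # hypothetical sample: n = 5, t = 3
for theta in (0.1, 0.5, 0.9):
    print(theta, likelihood_product(y, theta), likelihood_closed_form(y, theta))
```

Both columns agree up to floating-point rounding, which is all the algebra in the derivation claims.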

Next I confirm that the likelihood function admits sufficient statistics:

Let $t(y_1, y_2, \ldots, y_n) = t(\mathbf{y}) = \sum_{i=1}^n y_i$, and define $g(t|\theta) = \theta^{t(\mathbf{y})}(1-\theta)^{n-t(\mathbf{y})}$ and $k(\mathbf{y}) = 1$. Then $p(\mathbf{y}|\theta) = g(t|\theta) \times k(\mathbf{y})$, so by the factorization theorem $t(\mathbf{y})$ is a sufficient statistic for $\theta$.
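The factorization can also be checked numerically. The sketch below evaluates the likelihood observation by observation and shows that two samples with the same $t(\mathbf{y})$ give the same likelihood at every $\theta$ on a grid, while a sample with a different $t$ does not. The samples `y_a`, `y_b`, `y_c` and the grid are hypothetical.

```python
import math

def likelihood(y, theta):
    """Product of Bernoulli pmfs, computed observation by observation."""
    return math.prod(theta**yi * (1 - theta)**(1 - yi) for yi in y)

y_a = [1, 1, 0, 0, 0]  # t = 2
y_b = [0, 1, 0, 0, 1]  # t = 2, different arrangement of the same counts
y_c = [1, 1, 1, 0, 0]  # t = 3

grid = [i / 10 for i in range(1, 10)]  # theta = 0.1, 0.2, ..., 0.9

# Same t: the likelihoods coincide (up to rounding) for every theta.
same_t = all(math.isclose(likelihood(y_a, th), likelihood(y_b, th)) for th in grid)
# Different t: the likelihoods differ for at least some theta.
diff_t = all(math.isclose(likelihood(y_a, th), likelihood(y_c, th)) for th in grid)
print(same_t, diff_t)  # True False
```

Only the count of ones matters, not which observations they are, which is what sufficiency of $t(\mathbf{y})$ means here.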

The method I have been taught for finding natural conjugate priors is to view the likelihood as an algebraic function of $\theta$ and then mimic that function when constructing a prior density for $\theta$, replacing the sufficient statistics in the likelihood with prior parameters. Hence:
\begin{align}p(\theta) = \theta^{\mu}(1-\theta)^{\tau-\mu} \tag*{[1]}\end{align}
where the prior parameter $\mu$ replaces the sufficient statistic $\sum_{i=1}^n y_i$ and the prior parameter $\tau$ replaces $n$.
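To see why the kernel in $[1]$ is called conjugate, one can multiply it by the likelihood derived above. A short sketch, writing $t = \sum_{i=1}^n y_i$:
\begin{align*}
p(\theta \mid \mathbf{y}) & \propto p(\theta)\, p(\mathbf{y} \mid \theta) \\
& = \theta^{\mu}(1-\theta)^{\tau-\mu} \cdot \theta^{t}(1-\theta)^{n-t} \\
& = \theta^{\mu+t}(1-\theta)^{(\tau+n)-(\mu+t)}
\end{align*}
The posterior has the same form as $[1]$, with $\mu$ updated to $\mu + t$ and $\tau$ updated to $\tau + n$.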

However, I am stuck as to how I can manipulate equation $[1]$ into the Beta distribution. Any assistance would be greatly appreciated.

Best Answer

> However, I am stuck as to how I can manipulate equation [1] into the Beta distribution.

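The function $p$ defined by $p(\theta) = \theta^{\mu}(1-\theta)^{\tau-\mu}$ is proportional to the beta$(a,b)$ density with $a=\mu+1$ and $b=\tau-\mu+1$, since with these choices $\theta^{a-1}(1-\theta)^{b-1} = \theta^{\mu}(1-\theta)^{\tau-\mu}$. Provided $a>0$ and $b>0$ (that is, $\mu>-1$ and $\tau-\mu>-1$), this kernel is integrable on $(0,1)$ with $\int_0^1 \theta^{a-1}(1-\theta)^{b-1}\,d\theta = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$, so dividing by that integral gives exactly the beta density $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \theta^{a-1}(1-\theta)^{b-1}$.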
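The normalizing constant can be checked numerically: integrate the unnormalized kernel from $[1]$ over $(0,1)$ and compare it with $\Gamma(a)\Gamma(b)/\Gamma(a+b)$ for $a=\mu+1$, $b=\tau-\mu+1$. This is a minimal sketch using a plain midpoint rule; the prior values `mu = 2.0`, `tau = 5.0` and the helper names are hypothetical.

```python
import math

def kernel(theta, mu, tau):
    """Unnormalized prior [1]: theta^mu * (1 - theta)^(tau - mu)."""
    return theta**mu * (1 - theta)**(tau - mu)

def integrate_midpoint(f, n_steps=100_000):
    """Midpoint rule on (0, 1); never evaluates f at the endpoints."""
    h = 1.0 / n_steps
    return h * sum(f((k + 0.5) * h) for k in range(n_steps))

def beta_normalizer(a, b):
    """Gamma(a + b) / (Gamma(a) * Gamma(b)), the beta density's constant."""
    return math.gamma(a + b) / (math.gamma(a) * math.gamma(b))

mu, tau = 2.0, 5.0           # hypothetical prior parameters
a, b = mu + 1, tau - mu + 1  # a = 3, b = 4
area = integrate_midpoint(lambda th: kernel(th, mu, tau))
print(area, 1 / beta_normalizer(a, b))  # both approximately 1/60
```

Multiplying the kernel by `beta_normalizer(a, b)` therefore turns $[1]$ into a density that integrates to one, which is the beta density in the question.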