Bayesian Networks – Effect of Prior Hyperparameters on the Posterior Distribution

bayesian · bayesian-network · graphical-model · posterior

Suppose we randomly select one of two coins and flip it. In that situation we have random variables $\alpha$ and $\delta$, where $\alpha$ tells us which coin we select, and $\delta$ tells us whether our coin flip comes up heads.

Suppose we model this as follows:
$$\alpha\sim \textrm{Bernoulli}(\rho)$$
$$\beta_0\sim \textrm{Beta}(a_0,a_1)$$
$$\beta_1\sim \textrm{Beta}(b_0,b_1)$$
$$\delta|\alpha,\beta_0,\beta_1\sim \textrm{Bernoulli}(\beta_{\alpha})$$
(So that $\beta_i$ is the probability that coin $i$ comes up heads.)
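For concreteness, the generative story can be simulated directly. Here is a minimal sketch in NumPy; the hyperparameter values are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hyperparameter values, chosen only for illustration.
rho = 0.3          # P(alpha = 1): probability of selecting coin 1
a0, a1 = 2.0, 5.0  # Beta prior on coin 0's heads probability beta_0
b0, b1 = 4.0, 1.0  # Beta prior on coin 1's heads probability beta_1

def sample():
    """One draw of (alpha, delta) from the generative model."""
    alpha = rng.binomial(1, rho)                            # pick a coin
    beta = rng.beta(b0, b1) if alpha else rng.beta(a0, a1)  # that coin's bias
    delta = rng.binomial(1, beta)                           # flip it
    return alpha, delta

deltas = np.array([sample()[1] for _ in range(100_000)])
print("empirical P(delta=1):", deltas.mean())
print("analytic  P(delta=1):", rho * b0 / (b0 + b1) + (1 - rho) * a0 / (a0 + a1))
```

The two printed values agree up to Monte Carlo error, since $P(\delta=1)=\rho\,\mathbb{E}[\beta_1]+(1-\rho)\,\mathbb{E}[\beta_0]$.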

I have already worked out that the posterior distribution of $\alpha$ given $\delta$ is:
$$p(\alpha|\delta) = \left(\rho \frac{b_{1-\delta}}{b_0+b_1}\right)^\alpha \left((1-\rho)\frac{a_{1-\delta}}{a_0+a_1}\right)^{1-\alpha}$$
which certainly looks like a Bernoulli distribution. But for this to be a proper Bernoulli distribution, the two parenthesized terms must sum to 1, which seems to place very strong restrictions on my choice of hyperparameters $\rho,a_0,a_1,b_0,b_1$. In other words, setting $\rho\frac{b_{1-\delta}}{b_0+b_1}+(1-\rho)\frac{a_{1-\delta}}{a_0+a_1}=1$ and solving for $\rho$ gives:
$$\rho = \frac{\left(1-\frac{a_{1-\delta}}{a_0+a_1}\right)}{\left(\frac{b_{1-\delta}}{b_0+b_1}-\frac{a_{1-\delta}}{a_0+a_1}\right)}$$
which seems to be a very restrictive constraint! (For instance, with $a_0=a_1=b_0=b_1=1$ the two terms sum to $\tfrac12$ for every $\rho$, so no valid $\rho$ would exist at all.) I had thought that I could make any choice of hyperparameters and still derive a proper posterior distribution. Was I wrong about that? For this very simple model, are many choices of the prior hyperparameters $\rho,a_0,a_1,b_0,b_1$ actually impossible? (And if so, why? It certainly seems that my prior probability of choosing either coin should not depend on the biases of the two coins, yet per my derivation it appears that it does!)

Best Answer

Let's work through the steps. To begin with, the joint distribution factors as \begin{equation} p(\delta|\beta)\,p(\beta|\alpha)\,p(\alpha) , \end{equation} where \begin{align} p(\delta|\beta) &= \textsf{Bernoulli}(\delta|\beta) \\ p(\beta|\alpha) &= \textsf{Beta}(\beta|a_\alpha,b_\alpha) \\ p(\alpha) &= \textsf{Bernoulli}(\alpha|\rho) . \end{align} Note that instead of indexing the parameter $\beta$ with $\alpha$ as in the question, I have indexed the hyperparameters $(a_\alpha,b_\alpha)$ of the beta distribution with $\alpha$: the question's $(a_0,a_1)$ becomes $(a_0,b_0)$ here, and the question's $(b_0,b_1)$ becomes $(a_1,b_1)$.

Then \begin{equation} p(\delta|\alpha) = \int p(\delta|\beta)\,p(\beta|\alpha)\,d\beta = \textsf{Bernoulli}\Big(\delta\,\Big|\,\frac{a_\alpha}{a_\alpha+b_\alpha}\Big) , \end{equation} since the integral simply averages the Bernoulli probability over the beta prior, so $p(\delta=1|\alpha)=\mathbb{E}[\beta\,|\,\alpha]=a_\alpha/(a_\alpha+b_\alpha)$. Consequently, \begin{equation} p(\alpha|\delta) = \frac{p(\delta|\alpha)\,p(\alpha)}{p(\delta)} , \end{equation} where \begin{equation} p(\delta) = \sum_{\alpha\in\{0,1\}} p(\delta|\alpha)\,p(\alpha) . \end{equation} Dividing by $p(\delta)$ guarantees that the probabilities for $\alpha$ sum to one, without any special restrictions on the hyperparameters.
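As a quick numerical sanity check, here is a short sketch (using NumPy; the randomly drawn hyperparameter values are arbitrary placeholders): for any $\rho\in(0,1)$ and any positive $(a_\alpha,b_\alpha)$, the normalized posterior sums to one.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary hyperparameters (placeholder values; any positives will do).
rho = rng.uniform()                   # p(alpha = 1)
a = rng.uniform(0.1, 10.0, size=2)    # a_alpha for alpha = 0, 1
b = rng.uniform(0.1, 10.0, size=2)    # b_alpha for alpha = 0, 1

mu = a / (a + b)                      # p(delta = 1 | alpha), the beta means
p_alpha = np.array([1.0 - rho, rho])  # prior p(alpha)

for delta in (0, 1):
    lik = mu**delta * (1.0 - mu)**(1 - delta)  # p(delta | alpha), alpha = 0, 1
    joint = lik * p_alpha                      # numerator p(delta | alpha) p(alpha)
    posterior = joint / joint.sum()            # divide by p(delta)
    print(f"delta={delta}: p(alpha|delta)={posterior}, sum={posterior.sum()}")
```

Both posteriors sum to 1 by construction, whatever seed or hyperparameters you use.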

Here is some additional detail: \begin{equation} p(\delta|\alpha)\,p(\alpha) = \left(\frac{a_\alpha}{a_\alpha+b_\alpha}\right)^\delta \left(\frac{b_\alpha}{a_\alpha+b_\alpha}\right)^{1-\delta} \rho^\alpha\,(1-\rho)^{1-\alpha} . \end{equation}
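Putting the pieces together, and abbreviating $\mu_\alpha = a_\alpha/(a_\alpha+b_\alpha)$ (a shorthand introduced here only for compactness), the normalized posterior is \begin{equation} p(\alpha=1\,|\,\delta) = \frac{\rho\,\mu_1^{\delta}\,(1-\mu_1)^{1-\delta}}{\rho\,\mu_1^{\delta}\,(1-\mu_1)^{1-\delta} + (1-\rho)\,\mu_0^{\delta}\,(1-\mu_0)^{1-\delta}} , \end{equation} which is a proper Bernoulli distribution in $\alpha$ for every $\rho\in(0,1)$ and all positive hyperparameters. The apparent constraint in the question arises only from omitting the division by $p(\delta)$.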