I am self-learning Bayesian statistics using the book Computational Bayesian Statistics by Turkman et al., and I am currently stuck on Chapter 6, Problem 10. It can be found here on page 124.
I am confused about the introduction of latent variables. I'm not sure how to write the full joint probability model in part (a) under the change of variables for the $\pi$. What's throwing me off specifically are the $q$ and $r$. Is the full joint
\begin{align*}p(\pi^*, q, r, z, y) &\propto p(y \mid \pi^*, q, r, z)p(q, r \mid z)p(\pi^*)p(z) \\ &\propto {(\pi^*)}^{\sum_{j=1}^{M_1} y_j} (1-\pi^*)^{\sum_{j=M_1 + 1}^{N} y_j} \prod_{j=1}^{M_1} q_j^{y_j} \prod_{j=M_1+1}^N r_j^{y_j} \cdot \prod_{j = 1}^{M_1} q_j^{a_1 - 1} \prod_{j = M_1 + 1}^{N} r_j^{a_0 - 1} \\ &\cdot \frac{1}{B(a^*, b^*)} (\pi^*)^{a^* - 1}(1 - \pi^*)^{b^* - 1} \cdot \rho^{z} (1 - \rho)^{1 - z}?\end{align*}
In part (c), I am stuck on calculating the conditional posteriors for $z_j$ and for $q, r$. Would it make sense to just evaluate the joint at $z_j = 1$ and $z_j = 0$? I realize that there is a factor to consider in $p(\tilde q) = \mathrm{Dir}(\tilde q \mid a_z,\ldots,a_z)$ for $z_j = z$. I know the $\tilde q$ don't directly involve the $z_j$, but there is an implicit dependence.
I am also stuck on part (g): how would I compute an appropriate acceptance probability for the Metropolis-Hastings scheme? Any help with this problem would be appreciated!
Best Answer
The problem is very interesting. Instead of directly imposing a Dirichlet prior on the multinomial probabilities, it reparameterizes $\pi$ in terms of a global probability $\pi^*$ (which the authors call the "probability of observing some prevalent outcome"), and that global probability is divvied up into pieces, one for each category. The "difficulty" (if you will) of this parameterization is that the dimensions of the Dirichlet parameter vectors $(q_1,\ldots,q_{M_1})$ and $(r_1,\ldots,r_{M_0})$ change throughout the MCMC chain, i.e., they are transdimensional.
Enough with my subjective addendum. First things first, let's write down the likelihood and the priors. To summarize:
(Sampling model) $y \sim \mathrm{Multinomial}_{N-1}(n; \pi_1,\ldots,\pi_N)$, where $$ \pi_j = \begin{cases}\pi^* q_j & \text{if }z_j=1\\ (1-\pi^*)r_j &\text{if }z_j=0 \end{cases} $$ which gives $$ \begin{align} &\binom{n}{y_1 \cdots y_N} \prod_{j=1}^N (\pi^* q_j)^{z_jy_j}((1-\pi^*)r_j)^{(1-z_j)y_j} \\ &= \binom{n}{y_1 \cdots y_N} \prod_{j\in A_1} (\pi^* \tilde{q}_j)^{y_j} \prod_{j \in A_0} ((1-\pi^*)\tilde{r}_j)^{y_j} \end{align} $$ The two expressions are the same, and the ability to switch to whichever is more convenient comes in handy later. The first expression makes the dependence on $z_j$ explicit in the exponents, whereas the second makes it implicit in the definition of the sets $A_0$ and $A_1$ as well as $\tilde{q}$ and $\tilde{r}$.
(Prior for $z_j$) $z_j \overset{\text{iid}}{\sim} \mathrm{Bernoulli}(\rho)$
(Prior for $\pi^*$) $\pi^* \sim \mathrm{Be}(a^*,b^*)$
(Prior for $\tilde{q}_1,\ldots,\tilde{q}_{M_1}$) $\mathrm{Dirichlet}_{M_1-1}(a_1,\ldots,a_1)$
(Prior for $\tilde{r}_1,\ldots,\tilde{r}_{M_0}$) $\mathrm{Dirichlet}_{M_0-1}(a_0,\ldots,a_0)$
Combining all the above, the posterior is proportional to $$ \begin{align} &\binom{n}{y_1 \cdots y_N} \prod_{j=1}^N (\pi^* q_j)^{z_jy_j}((1-\pi^*)r_j)^{(1-z_j)y_j}\\ &\times \prod_{j=1}^N \rho^{z_j}(1-\rho)^{1-z_j} (\pi^*)^{a^*-1}(1-\pi^*)^{b^*-1} \prod_{i=1}^{M_1}\tilde{q}_i^{a_1-1} \prod_{i=1}^{M_0}\tilde{r}_i^{a_0-1}. \end{align} $$
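Before moving on, it can help to have this unnormalized posterior as code. Below is a minimal NumPy sketch with names of my own choosing (`log_posterior`, `pi_star`, `q_tilde`, `r_tilde`, etc.), not the book's. The multinomial coefficient is dropped because it does not involve the parameters, and the Beta/Dirichlet normalizing constants are dropped to mirror the display above (the Dirichlet constants do depend on $M_1$ and $M_0$, so keep them if you ever compare states with different $z$):

```python
import numpy as np

def log_posterior(y, z, pi_star, q_tilde, r_tilde, rho, a1, a0, a_star, b_star):
    """Unnormalized log posterior of (pi*, q~, r~, z) given the counts y.

    z is a 0/1 vector of length N; q_tilde lives on the z == 1 categories
    and r_tilde on the z == 0 categories.
    """
    A1, A0 = (z == 1), (z == 0)
    # Multinomial likelihood (multinomial coefficient omitted: constant in the parameters)
    loglik = np.sum(y[A1] * (np.log(pi_star) + np.log(q_tilde))) \
           + np.sum(y[A0] * (np.log1p(-pi_star) + np.log(r_tilde)))
    # Priors: Bernoulli(rho) for z, Beta(a*, b*) for pi*, symmetric Dirichlets for q~, r~
    # (normalizing constants omitted, mirroring the proportionality statement above)
    logprior = np.sum(z * np.log(rho) + (1 - z) * np.log1p(-rho)) \
             + (a_star - 1) * np.log(pi_star) + (b_star - 1) * np.log1p(-pi_star) \
             + (a1 - 1) * np.sum(np.log(q_tilde)) + (a0 - 1) * np.sum(np.log(r_tilde))
    return loglik + logprior
```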
To derive the complete conditionals, discard the factors of the joint that do not involve the parameter in question: from the perspective of the complete conditional they are constants and do not contribute to the distribution. You can normalize the complete conditional density afterwards. That being said, for $z_j$, $$ \begin{align} p(z_j\mid - ) &\propto (\pi^* q_j)^{z_jy_j}((1-\pi^*)r_j)^{(1-z_j)y_j} \rho^{z_j}(1-\rho)^{1-z_j}\\ &= \left\{(\pi^* q_j)^{y_j} \rho \right\}^{z_j} \left\{((1-\pi^*)r_j)^{y_j} (1-\rho) \right\}^{1-z_j} \end{align} $$ which is a Bernoulli distribution once the two weights are normalized to sum to one.
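In code, that normalization is just dividing the $z_j=1$ weight by the sum of the two weights; working on the log scale avoids underflow when $y_j$ is large. A minimal NumPy sketch, with illustrative argument names of my own choosing:

```python
import numpy as np

def sample_z_j(y_j, pi_star, q_j, r_j, rho, rng=np.random.default_rng()):
    """Draw z_j from its complete conditional (a Bernoulli after normalization)."""
    # Unnormalized log weights for z_j = 1 and z_j = 0
    log_w1 = y_j * (np.log(pi_star) + np.log(q_j)) + np.log(rho)
    log_w0 = y_j * (np.log1p(-pi_star) + np.log(r_j)) + np.log1p(-rho)
    # P(z_j = 1 | -) = w1 / (w1 + w0), computed stably on the log scale
    p1 = 1.0 / (1.0 + np.exp(log_w0 - log_w1))
    return rng.binomial(1, p1)
```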
For $\tilde q$, by virtue of conditioning on $z$, we know which $y_j$ are to be used. Then $p(\tilde q\mid -)\propto \prod_{j\in A_1}\tilde{q}_j^{y_j+a_1-1}$, which is a Dirichlet distribution. Likewise, $p(\tilde r\mid -) \propto \prod_{j\in A_0} \tilde{r}_j^{y_j+a_0-1}$.
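So, given $z$, both updates are ordinary Dirichlet draws. A small sketch using `numpy.random.Generator.dirichlet` (again with names I made up, not the book's):

```python
import numpy as np

def sample_q_r(y, z, a1, a0, rng=np.random.default_rng()):
    """Draw q~ and r~ from their Dirichlet complete conditionals given z."""
    A1, A0 = (z == 1), (z == 0)
    q_tilde = rng.dirichlet(y[A1] + a1)  # Dirichlet(y_j + a1), j in A1
    r_tilde = rng.dirichlet(y[A0] + a0)  # Dirichlet(y_j + a0), j in A0
    return q_tilde, r_tilde
```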
In a Metropolis-Hastings scheme, the acceptance probability is given to us (how lucky!). It is always $$ \alpha = \min\left\{1, \frac{L(\theta')\pi(\theta') q(\theta\mid \theta')}{L(\theta)\pi(\theta) q(\theta'\mid \theta)} \right\}, $$ where $L(\cdot)$ is the likelihood function, $\pi(\cdot)$ is the prior density, $\theta$ is the current parameter vector in the chain, and $\theta'$ is the candidate drawn from the proposal distribution; if accepted, set $\theta \gets \theta'$. For example, the proposal distribution for $z_i$ is given by $P(z_i'=1)=0.5$ in the problem. This is a Bernoulli distribution whose probability mass function is $0.5^{z_i}0.5^{1-z_i} = 0.5$. Since this proposal density does not depend on $z_i$, the Hastings ratio $\frac{q(z_i\mid z_i')}{q(z_i'\mid z_i)}$ reduces to 1. The proposal density for $q'$, however, is the complete conditional with the conditioning $z$ set to the candidate value $z'$, so that part of the Hastings ratio does not cancel and cannot be ignored. In an implementation, you can create a variable
accept_rate = 0
outside the loop and increment it each time a proposal is accepted. At the end of the program, divide accept_rate by the number of iterations; this gives a Monte Carlo estimate of the acceptance rate.
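To tie the formula and the bookkeeping together, here is a hedged, self-contained sketch of a generic MH accept/reject step plus the acceptance-rate counter. The toy random-walk target (a standard normal) and all names (`mh_step`, `log_target`, etc.) are illustrative stand-ins, not the sampler from the book:

```python
import numpy as np

def mh_step(log_target_curr, log_target_prop, log_q_fwd, log_q_rev, rng):
    """Return True if the Metropolis-Hastings proposal is accepted.

    log_target_*: log of L(theta) * prior(theta) at the current / proposed state.
    log_q_fwd:    log q(theta' | theta), density of proposing theta' from theta.
    log_q_rev:    log q(theta | theta'), density of the reverse move.
    """
    log_alpha = (log_target_prop + log_q_rev) - (log_target_curr + log_q_fwd)
    return np.log(rng.uniform()) < min(0.0, log_alpha)

# Acceptance-rate bookkeeping as described above, wrapped in a toy random-walk
# sampler targeting a standard normal so that the script runs end to end
rng = np.random.default_rng(0)
theta, accept_count, n_iter = 0.0, 0, 5_000
log_target = lambda t: -0.5 * t ** 2       # stand-in for log likelihood + log prior
for _ in range(n_iter):
    theta_prop = theta + rng.normal()      # symmetric proposal: both q terms are 0 and cancel
    if mh_step(log_target(theta), log_target(theta_prop), 0.0, 0.0, rng):
        theta = theta_prop
        accept_count += 1
print("acceptance rate:", accept_count / n_iter)
```

For the book's scheme, `log_target` would be the log of the joint posterior (e.g. the `log_posterior` sketch above), and the two `q` terms would include the Dirichlet complete-conditional densities evaluated at the proposed and current $\tilde q$, which is exactly the part of the Hastings ratio that does not cancel.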