Solved – Metropolis-Within-Gibbs sampling with only marginal distribution known for a subset of variables

conditional-probability, gibbs, markov-chain-montecarlo, metropolis-hastings, sampling

Typically in Gibbs sampling we want to sample from a joint distribution $p(X_1, X_2, …, X_N)$, but because the joint is hard to sample from directly, we instead achieve this by iteratively sampling each variable $X_i$ from its conditional distribution $p(X_i|\{X_{-i}\})$. The resulting samples then allow us to approximate the full joint distribution of $\{X_i\}$, or any of the marginals $p(X_i)$. If we cannot sample directly from the conditionals, we can insert a Metropolis-Hastings (MH) step, which results in the Metropolis-Within-Gibbs algorithm.
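As a concrete illustration of Metropolis-Within-Gibbs (a toy example, not from the question): sample a correlated bivariate Gaussian by sweeping the coordinates and updating each with a random-walk MH step targeting its full conditional. The target, names, and step size below are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target (an assumption for this sketch): a bivariate Gaussian with
# correlation RHO, whose full conditionals are N(RHO * x_other, 1 - RHO^2).
RHO = 0.8

def log_cond(xi, xother):
    # log p(X_i | X_-i) up to an additive constant
    return -0.5 * (xi - RHO * xother) ** 2 / (1 - RHO ** 2)

def mwg(n_iter=5000, step=1.0):
    """Metropolis-Within-Gibbs: sweep the coordinates, updating each with a
    random-walk MH step that targets its full conditional."""
    x = np.zeros(2)
    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        for i in (0, 1):
            prop = x[i] + step * rng.normal()
            log_a = log_cond(prop, x[1 - i]) - log_cond(x[i], x[1 - i])
            if np.log(rng.uniform()) < log_a:
                x[i] = prop
        out[t] = x
    return out

samples = mwg()
```

With exact conditional draws this would be plain Gibbs; the MH step only requires evaluating each conditional up to a constant.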

The problem I have is one in which, for a subset of the $X_i$'s, I actually can't evaluate their conditional distribution, but I can easily sample from their (joint) marginal distribution. To simplify this a bit, suppose I want to sample from the joint distribution $p(X_1, X_2)$, and I only have access to the marginal distribution $p(X_1)$ (more specifically, I can sample from this distribution, but I cannot evaluate it) and the conditional $p(X_2|X_1)$. Additionally, I cannot sample from the conditional directly, so I have to use a MH step there.

One simple algorithm that works is to do the following:

  1. Sample $X_1^*$ from $p(X_1)$
  2. Given this sample $X_1^*$, run an MH chain whose states $X_2^{(t)*}$ are generated conditionally on $X_1^*$ and the previous state $X_2^{(t-1)*}$

If the chain in step 2 is run for sufficient length, it will converge to the target distribution $p(X_2|X_1^*)$, and so the final sample of $X_2$ will be a proper sample from the conditional. However, therein lies my problem, because I'm working in much higher dimensions than this, and so it would take a long time for this chain to converge, and I can't afford to run a long MCMC chain just to get a single sample of my variables.
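The two-step scheme above can be sketched as follows; the standard-normal marginal for $X_1$ and the Gaussian conditional for $X_2$ are stand-ins chosen only so the sketch is runnable, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins (assumptions, not from the question): X1 ~ N(0, 1) plays the
# marginal we can only sample from, and p(x2 | x1) = N(x1, 0.5^2) plays the
# conditional we can only evaluate.
def sample_marginal_x1():
    return rng.normal()

def log_cond_x2(x2, x1):
    # log p(x2 | x1) up to an additive constant
    return -0.5 * (x2 - x1) ** 2 / 0.25

def one_joint_sample(n_inner=200, step=0.5):
    """Step 1: exact draw of X1 from its marginal.
    Step 2: inner random-walk MH chain targeting p(x2 | x1); it must run
    long enough to forget its arbitrary starting point."""
    x1 = sample_marginal_x1()
    x2 = 0.0  # arbitrary start -- the reason a long inner chain is needed
    for _ in range(n_inner):
        prop = x2 + step * rng.normal()
        if np.log(rng.uniform()) < log_cond_x2(prop, x1) - log_cond_x2(x2, x1):
            x2 = prop
    return x1, x2
```

The cost that motivates the question is visible here: every joint draw pays for `n_inner` MH steps, and in high dimensions `n_inner` must be large.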

Is there any solution to this, e.g. a way to use short chains in step 2 with some sort of correction?

Best Answer

In an ideal world, sampling from $p_1(x_1)$ and then from $p_{2|1}(x_2|x_1)$ is a correct way to simulate from the joint. In case one of these distributions is unavailable, simulating a single step of Metropolis-Within-Gibbs targeting $p_{2|1}(\cdot|x_1^{(t-1)})$ and a single step of Metropolis-Within-Gibbs targeting $p_{1|2}(\cdot|x_2^{(t)})$ is correct. Note that since $p_1(\cdot)$ is available, the MCMC chain starts at stationarity. Note also that, if $p_1(\cdot)$ and $p_{2|1}(\cdot|x_1)$ can both be evaluated, then (a) $p_{1|2}(\cdot|x_2^{(t)})$ is available up to a constant, since $p_{1|2}(x_1|x_2)\propto p_1(x_1)\,p_{2|1}(x_2|x_1)$, and (b) $p_1(\cdot)$ can be used as a proposal in the Metropolis-Within-Gibbs step.

In the event where $p_1(\cdot)$ cannot be evaluated but simulations can be produced from it, while $p_{2|1}(\cdot|x_1)$ can be evaluated, one can make independence Metropolis proposals $x_1'$ from $p_1(\cdot)$ and accept them with probability $\min(1,r)$, where $$r=\dfrac{p_{1|2}(x_1'|x_2^{(t)})}{p_{1|2}(x_1^{(t-1)}|x_2^{(t)})}\, \dfrac{p_{1}(x_1^{(t-1)})}{p_{1}(x_1')}=\dfrac{p_{2|1}(x_2^{(t)}|x_1')}{p_{2|1}(x_2^{(t)}|x_1^{(t-1)})},$$ in which the intractable $p_1$ terms cancel.
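A minimal sketch of this scheme, under an assumed toy model ($X_1 \sim \mathcal N(0,1)$ simulable but not evaluable, $X_2 \mid X_1 \sim \mathcal N(x_1, 0.5^2)$) in which only $\log p_{2|1}$ is ever evaluated:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy model for the sketch: X1 ~ N(0, 1) (simulable, not evaluable)
# and X2 | X1 ~ N(x1, 0.5^2), so only log p_{2|1} is ever evaluated.
def sample_p1():
    return rng.normal()

def log_p21(x2, x1):
    # log p_{2|1}(x2 | x1) up to an additive constant
    return -0.5 * (x2 - x1) ** 2 / 0.25

def sweep(x1, x2, step=0.5):
    # X1 update: independence proposal from p1; the intractable p1 factors
    # cancel, leaving the ratio p_{2|1}(x2 | x1') / p_{2|1}(x2 | x1).
    x1_prop = sample_p1()
    if np.log(rng.uniform()) < log_p21(x2, x1_prop) - log_p21(x2, x1):
        x1 = x1_prop
    # X2 update: ordinary random-walk MH on p_{2|1}(. | x1).
    x2_prop = x2 + step * rng.normal()
    if np.log(rng.uniform()) < log_p21(x2_prop, x1) - log_p21(x2, x1):
        x2 = x2_prop
    return x1, x2
```

Because $x_1$ is proposed directly from its exact marginal, no long inner chain is needed at each step: a single sweep per iteration leaves the joint invariant.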
