Solved – Metropolis-Within-Gibbs sampling with only marginal distribution known for a subset of variables

conditional-probability, gibbs, markov-chain-montecarlo, metropolis-hastings, sampling

Typically in Gibbs sampling we want to sample from a joint distribution $p(X_1, X_2, …, X_N)$, but because the joint is hard to sample from directly, we instead achieve this by iteratively sampling each variable $X_i$ from its conditional distribution $p(X_i|\{X_{-i}\})$. The resulting samples then allow us to approximate the full joint distribution of $\{X_i\}$, or any of the marginals $p(X_i)$. If we cannot sample directly from the conditionals, we can insert a Metropolis-Hastings (MH) step, which results in the Metropolis-Within-Gibbs algorithm.
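As a concrete illustration of Metropolis-Within-Gibbs (a toy example, not from the question): sample a correlated bivariate Gaussian by sweeping the coordinates and updating each with a random-walk MH step targeting its full conditional. The target, names, and step size below are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target (an assumption for this sketch): a bivariate Gaussian with
# correlation RHO, whose full conditionals are N(RHO * x_other, 1 - RHO^2).
RHO = 0.8

def log_cond(xi, xother):
    # log p(X_i | X_-i) up to an additive constant
    return -0.5 * (xi - RHO * xother) ** 2 / (1 - RHO ** 2)

def mwg(n_iter=5000, step=1.0):
    """Metropolis-Within-Gibbs: sweep the coordinates, updating each with a
    random-walk MH step that targets its full conditional."""
    x = np.zeros(2)
    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        for i in (0, 1):
            prop = x[i] + step * rng.normal()
            log_a = log_cond(prop, x[1 - i]) - log_cond(x[i], x[1 - i])
            if np.log(rng.uniform()) < log_a:
                x[i] = prop
        out[t] = x
    return out

samples = mwg()
```

With exact conditional draws this would be plain Gibbs; the MH step only requires evaluating each conditional up to a constant.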

The problem I have is one in which, for a subset of the $X_i$'s, I actually can't evaluate their conditional distribution, but I can easily sample from their (joint) marginal distribution. To simplify this a bit, suppose I want to sample from the joint distribution $p(X_1, X_2)$, and I only have access to the marginal distribution $p(X_1)$ (more specifically, I can sample from this distribution, but I cannot evaluate it) and the conditional $p(X_2|X_1)$. Additionally, I cannot sample from the conditional directly, so I have to use a MH step there.

One simple algorithm that works is to do the following:

  1. Sample $X_1^*$ from $p(X_1)$
  2. Given this sample $X_1^*$, run an MH chain whose states $X_2^{(t)*}$ are generated conditionally on $X_1^*$ and the previous state $X_2^{(t-1)*}$

If the chain in step 2 is run for sufficient length, it will converge to the target distribution $p(X_2|X_1^*)$, and so the final sample of $X_2$ will be a proper sample from the conditional. However, therein lies my problem, because I'm working in much higher dimensions than this, and so it would take a long time for this chain to converge, and I can't afford to run a long MCMC chain just to get a single sample of my variables.
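The two-step scheme above can be sketched as follows; the standard-normal marginal for $X_1$ and the Gaussian conditional for $X_2$ are stand-ins chosen only so the sketch is runnable, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins (assumptions, not from the question): X1 ~ N(0, 1) plays the
# marginal we can only sample from, and p(x2 | x1) = N(x1, 0.5^2) plays the
# conditional we can only evaluate.
def sample_marginal_x1():
    return rng.normal()

def log_cond_x2(x2, x1):
    # log p(x2 | x1) up to an additive constant
    return -0.5 * (x2 - x1) ** 2 / 0.25

def one_joint_sample(n_inner=200, step=0.5):
    """Step 1: exact draw of X1 from its marginal.
    Step 2: inner random-walk MH chain targeting p(x2 | x1); it must run
    long enough to forget its arbitrary starting point."""
    x1 = sample_marginal_x1()
    x2 = 0.0  # arbitrary start -- the reason a long inner chain is needed
    for _ in range(n_inner):
        prop = x2 + step * rng.normal()
        if np.log(rng.uniform()) < log_cond_x2(prop, x1) - log_cond_x2(x2, x1):
            x2 = prop
    return x1, x2
```

The cost that motivates the question is visible here: every joint draw pays for `n_inner` MH steps, and in high dimensions `n_inner` must be large.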

Is there any solution to this, e.g. a way to use short chains in step 2 with some sort of correction?

Best Answer

In an ideal world, sampling from $p_1(x_1)$ and then from $p_{2|1}(x_2|x_1)$ is a correct way to simulate from the joint. In case one of these distributions is unavailable, simulating a single step of Metropolis-Within-Gibbs targeting $p_{2|1}(\cdot|x_1^{(t-1)})$ and a single step of Metropolis-Within-Gibbs targeting $p_{1|2}(\cdot|x_2^{(t)})$ is correct. Note that since $p_1(\cdot)$ is available, the MCMC chain starts at stationarity. Note also that, if $p_1(\cdot)$ and $p_{2|1}(\cdot|x_1)$ can both be evaluated, then (a) $p_{1|2}(\cdot|x_2^{(t)})$ is available up to a constant, since $p_{1|2}(x_1|x_2)\propto p_1(x_1)\,p_{2|1}(x_2|x_1)$, and (b) $p_1(\cdot)$ can be used as a proposal in the Metropolis-Within-Gibbs step.

In the event where $p_1(\cdot)$ cannot be evaluated but simulations can be produced from it, while $p_{2|1}(\cdot|x_1)$ can be evaluated, one can make independence Metropolis proposals $x_1'$ from $p_1(\cdot)$ and accept them with probability $\min(1,r)$, where $$r=\dfrac{p_{1|2}(x_1'|x_2^{(t)})}{p_{1|2}(x_1^{(t-1)}|x_2^{(t)})}\, \dfrac{p_{1}(x_1^{(t-1)})}{p_{1}(x_1')}=\dfrac{p_{2|1}(x_2^{(t)}|x_1')}{p_{2|1}(x_2^{(t)}|x_1^{(t-1)})},$$ in which the intractable $p_1$ terms cancel.
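A minimal sketch of this scheme, under an assumed toy model ($X_1 \sim \mathcal N(0,1)$ simulable but not evaluable, $X_2 \mid X_1 \sim \mathcal N(x_1, 0.5^2)$) in which only $\log p_{2|1}$ is ever evaluated:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy model for the sketch: X1 ~ N(0, 1) (simulable, not evaluable)
# and X2 | X1 ~ N(x1, 0.5^2), so only log p_{2|1} is ever evaluated.
def sample_p1():
    return rng.normal()

def log_p21(x2, x1):
    # log p_{2|1}(x2 | x1) up to an additive constant
    return -0.5 * (x2 - x1) ** 2 / 0.25

def sweep(x1, x2, step=0.5):
    # X1 update: independence proposal from p1; the intractable p1 factors
    # cancel, leaving the ratio p_{2|1}(x2 | x1') / p_{2|1}(x2 | x1).
    x1_prop = sample_p1()
    if np.log(rng.uniform()) < log_p21(x2, x1_prop) - log_p21(x2, x1):
        x1 = x1_prop
    # X2 update: ordinary random-walk MH on p_{2|1}(. | x1).
    x2_prop = x2 + step * rng.normal()
    if np.log(rng.uniform()) < log_p21(x2_prop, x1) - log_p21(x2, x1):
        x2 = x2_prop
    return x1, x2
```

Because $x_1$ is proposed directly from its exact marginal, no long inner chain is needed at each step: a single sweep per iteration leaves the joint invariant.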
