Metropolis-Hastings Algorithm – Calculating Acceptance Probability in Metropolis-Hastings Algorithm

Tags: distributions, machine learning, markov-chain-montecarlo, metropolis-hastings, probability

In the Metropolis-Hastings algorithm, acceptance probability is given as
$$
\alpha = \min \left( 1,\frac{f(\theta^{'}|y)q(\theta|\theta^{'})}{f(\theta|y)q(\theta^{'}|\theta)} \right)
$$

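which, by Bayes' theorem $f(\theta|y)\propto f(y|\theta)f(\theta)$ (the marginal $f(y)$ cancels in the ratio), can be rewritten as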
$$
\alpha = \min \left( 1,\frac{f(y|\theta^{'})f(\theta^{'}) q(\theta|\theta^{'})}{f(y|\theta)f(\theta) q(\theta^{'}|\theta)} \right)
$$

My questions are:

  1. Is it possible to evaluate $f(y|\theta^{'})$ and $f(y|\theta)$ without knowing their analytical form?
  2. If not, does that mean that we need to know the analytical form of the posterior and the likelihood to use the Metropolis-Hastings algorithm? Also, what is the purpose of the data then?
  3. If yes, how do we do it?

Thank you in advance!

Best Answer

When $$\alpha = \min \left( 1,\frac{f(y|\theta^{'})f(\theta^{'}) q(\theta|\theta^{'})}{f(y|\theta)f(\theta) q(\theta^{'}|\theta)} \right)$$ involves an intractable likelihood function $f(y|\cdot)$ that cannot be computed, several (exact) alternatives are available:

  1. The intractable part of $f(y|\theta)$ may also appear in $q(\theta|\theta')$ and hence cancel in the ratio. This is the idea behind the auxiliary variable device of Møller et al. (2006), also pursued by Murray et al. (2012). Both mostly address the setup of doubly intractable distributions, where the likelihood function $f(y|\theta)$ involves a multiplicative factor $\mathfrak c(\theta)$ that is itself intractable.

  2. The intractable likelihood $f(y|\theta)$ may be unbiasedly estimated by a random variable $\xi(y,\theta)$, even if only up to a normalising constant: $$\mathbb E[\xi(y,\theta)]=\alpha(y)f(y|\theta)$$ where $\alpha(y)$ (a constant in $\theta$, not the acceptance probability) may be unknown / intractable. This is the idea of pseudo-marginal MCMC of Andrieu & Roberts (2009).

  3. Demarginalising $y$ into $(y,z)$ and $f(y|\theta)$ into $\tilde f(y,z|\theta)$ such that $$\int_{\mathbb Z} \tilde f(y,z|\theta)\,\text dz=f(y|\theta)$$ with $\tilde f(y,z|\theta)$ tractable is a more general auxiliary variable method, where the augmented $(\theta,z)$ is simulated conditional on $y$ through an MCMC method. When using a Gibbs sampler, the acceptance probability $\alpha$ may then be replaced at iteration $t$ by $$\tilde\alpha = \min \left( 1,\frac{\tilde f(y,z^t|\theta^{'})f(\theta^{'}) q(\theta^t|\theta^{'})}{\tilde f(y,z^t|\theta^t)f(\theta^t) q(\theta^{'}|\theta^t)} \right)$$ equivalent to $$\tilde\alpha = \min \left( 1,\frac{\tilde f(y|\theta^{'},z^t)f(\theta^{'}) q(\theta^t|\theta^{'})}{\tilde f(y|\theta^t,z^t)f(\theta^t) q(\theta^{'}|\theta^t)} \right)$$ which is a special case of 1.
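To make option 2 concrete, here is a minimal pseudo-marginal sketch in Python. Everything specific in it is an assumption chosen for illustration and not part of the cited papers: a toy latent-variable model $y_j = \theta + z_j + \varepsilon_j$ with $z_j,\varepsilon_j\sim\mathcal N(0,1)$, a $\mathcal N(0,10^2)$ prior on $\theta$, a symmetric random-walk proposal (so the $q$ terms cancel), and the hypothetical function names. The likelihood is treated as if it were unavailable and is replaced by a Monte Carlo average over simulated $z$'s, which is unbiased for $f(y|\theta)$.

```python
import numpy as np

def norm_pdf(x, mean, sd):
    # Density of N(mean, sd^2) evaluated at x (broadcasts over arrays).
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def log_lik_estimate(theta, y, rng, n_particles=100):
    # Log of an unbiased estimate xi(y, theta) of f(y | theta).
    # For each observation y_j, average N(y_j; theta + z, 1) over z ~ N(0, 1);
    # the product over independent observations stays unbiased.
    z = rng.standard_normal((n_particles, y.size))
    per_obs = norm_pdf(y, theta + z, 1.0).mean(axis=0)
    return np.sum(np.log(per_obs))

def log_prior(theta):
    # Assumed N(0, 10^2) prior, up to an additive constant.
    return -0.5 * (theta / 10.0) ** 2

def pseudo_marginal_mh(y, n_iter=5000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0
    log_xi = log_lik_estimate(theta, y, rng)
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + step * rng.standard_normal()  # symmetric proposal
        log_xi_prop = log_lik_estimate(prop, y, rng)
        # Acceptance ratio with the estimate in place of the likelihood.
        log_ratio = (log_xi_prop + log_prior(prop)) - (log_xi + log_prior(theta))
        if np.log(rng.uniform()) < log_ratio:
            # Keep the accepted estimate; it is NOT recomputed for the
            # current state at later iterations.
            theta, log_xi = prop, log_xi_prop
        chain[t] = theta
    return chain
```

The one design point that matters is that the estimate attached to the current state is stored and reused until a proposal is accepted. Refreshing it at every iteration would generally change the stationary distribution, whereas reusing it keeps the exact posterior as the target.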

If none of these (related) approaches can be used (in a sufficiently efficient manner), then an approximate approach is to resort to ABC (Approximate Bayesian computation).
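As a rough illustration of the ABC idea, the sketch below implements plain rejection ABC in Python: draw $\theta$ from the prior, simulate pseudo-data from the model, and keep $\theta$ when a summary of the simulated data falls close to the observed summary. The toy simulator ($y_j=\theta+z_j+\varepsilon_j$ with standard normal $z_j,\varepsilon_j$), the $\mathcal N(0,10^2)$ prior, the sample mean as summary statistic, the tolerance `eps`, and the function name are all assumptions made for the example. Only the ability to simulate from the model is used; the likelihood is never evaluated.

```python
import numpy as np

def abc_rejection(y_obs, n_draws=20000, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    s_obs = y_obs.mean()  # observed summary statistic (assumed choice)

    # 1. Draw parameters from the (assumed) N(0, 10^2) prior.
    theta = rng.normal(0.0, 10.0, size=n_draws)

    # 2. Simulate one pseudo-dataset per draw from the toy model.
    z = rng.standard_normal((n_draws, y_obs.size))
    noise = rng.standard_normal((n_draws, y_obs.size))
    y_sim = theta[:, None] + z + noise

    # 3. Keep draws whose simulated summary is within eps of the observed one.
    s_sim = y_sim.mean(axis=1)
    keep = np.abs(s_sim - s_obs) < eps
    return theta[keep]
```

The retained values are draws from an approximation to the posterior whose quality depends on the tolerance and on how informative the summary statistic is. A smaller `eps` reduces the approximation error but lowers the acceptance rate.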
