Bayesian – Is My Understanding of the Metropolis Sampling Algorithm in MCMC Correct?

bayesian · markov-chain-montecarlo · sampling

The Metropolis algorithm is an instance of the MCMC class of algorithms, with the purpose of sampling from a posterior distribution that may be intractable to manipulate analytically.

For the sake of keeping things simple, let's use the following example:

  • Suppose you want to find the posterior distribution over the bias $\theta$ of a coin, given
    a list of $n$ observations $X_i$.
  • We start with a uniform prior: $\theta \sim \mathsf{Uniform}(0,1)$, and consider the binomial likelihood: $X \mid \theta \sim \mathsf{Binomial}(n, \theta)$.
  • The posterior will be $p(\theta \mid X) = \dfrac{p(X\mid\theta) \cdot p(\theta)}{Z_p}$, where $Z_p = \int_0^1 p(X\mid\theta) \cdot p(\theta)\, d\theta$.

Let's assume that $Z_p$ is "intractable", so we will use the Metropolis algorithm to sample from the posterior.
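Concretely, if we write $h$ for the number of observed heads (a symbol introduced here for convenience; it does not appear in the original statement), the unnormalized posterior the algorithm will work with is
$$
p(\theta \mid X) \;\propto\; p(X \mid \theta)\, p(\theta) \;\propto\; \theta^{h}\,(1-\theta)^{n-h}, \qquad \theta \in (0,1),
$$
since the uniform prior is constant on $(0,1)$ and the binomial coefficient does not depend on $\theta$.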

First, we need a symmetric proposal distribution that will give us a new candidate sample to move to. Is it true that this proposal distribution solely generates a new candidate, and is otherwise unrelated to the posterior?

Next, the acceptance probability of moving to a new sample $\theta^\star$ given the current sample $\theta_t$ is
$$
\min\left(1, \ \dfrac{p(\theta^\star \mid X)}{p(\theta_t \mid X)}\right) = \min\left(1, \ \dfrac{p(X\mid\theta^\star)\cdot p(\theta^\star)}{p(X\mid\theta_t)\cdot p(\theta_t)}\right)
$$

which can now be evaluated, since both the likelihood and the prior are known (they were chosen by the statistician). Essentially, this acceptance "trick" allows one to bypass the computation of the normalizing constant $Z_p$, which cancels in the ratio.
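To make this concrete, here is a minimal sketch of the sampler in Python for the coin example, assuming the data are summarized by $h$ heads out of $n$ flips and using a Gaussian random-walk proposal (the data values, step size, and all names below are illustrative choices, not from the original question). Working in log space avoids numerical underflow:

```python
import numpy as np

# Hypothetical toy data: n coin flips with h heads (values chosen for illustration).
rng = np.random.default_rng(0)
n, h = 50, 33

def log_unnorm_posterior(theta):
    """log p(X|theta) + log p(theta), up to the constant -log Z_p.

    The Uniform(0,1) prior contributes 0 inside (0,1); the binomial likelihood
    contributes h*log(theta) + (n-h)*log(1-theta), since the binomial
    coefficient is constant in theta and can be dropped."""
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf  # prior density is zero outside (0,1)
    return h * np.log(theta) + (n - h) * np.log(1.0 - theta)

def metropolis(n_samples=10_000, step=0.1, theta0=0.5):
    samples = np.empty(n_samples)
    theta, logp = theta0, log_unnorm_posterior(theta0)
    for t in range(n_samples):
        theta_star = theta + step * rng.standard_normal()  # symmetric proposal
        logp_star = log_unnorm_posterior(theta_star)
        # accept with prob min(1, ratio of unnormalized posteriors); Z_p cancels
        if np.log(rng.uniform()) < logp_star - logp:
            theta, logp = theta_star, logp_star
        samples[t] = theta
    return samples

draws = metropolis()
print(draws[2000:].mean())  # should be near the true posterior mean (h+1)/(n+2)
```

Because only the difference of log unnormalized posteriors enters the accept/reject step, $Z_p$ is never computed, which is exactly the "trick" described above. (For this toy model the posterior is actually Beta$(h+1,\, n-h+1)$, which gives a ground truth to check the draws against.)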

Is this summary of the Metropolis algorithm correct?

Best Answer

This is a correct description of the (symmetric) Metropolis-Rosenbluth-Hastings algorithm, but the motivation is not:

(i) ignoring the normalising constant $Z_p$ is common to many simulation techniques, such as accept-reject methods. The normalising constant is useful when computing the cdf and using the inverse-cdf technique, but otherwise it rarely matters. The reason for using MCMC methods is rather that the (e.g., posterior) distributions to be simulated are complex, high-dimensional, non-standard distributions.
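For contrast, here is a minimal accept-reject sketch for the same toy model, reusing `log_unnorm_posterior`, `n`, `h`, and `rng` from the snippet above. It assumes a Uniform(0,1) proposal, and the envelope constant uses the fact that this particular unnormalized density is maximized at $\hat\theta = h/n$:

```python
# Accept-reject sampling from the same unnormalized posterior: propose from
# Uniform(0,1) and accept with probability f(theta)/M, where M bounds f.
# Note that the normalizing constant Z_p never appears here either.
log_M = log_unnorm_posterior(h / n)  # the density's maximum is at theta = h/n

def accept_reject(n_samples=10_000):
    out = []
    while len(out) < n_samples:
        theta = rng.uniform()                      # proposal draw
        if np.log(rng.uniform()) < log_unnorm_posterior(theta) - log_M:
            out.append(theta)                      # accepted as an exact draw
    return np.array(out)

print(accept_reject().mean())  # again close to (h+1)/(n+2)
```

Unlike Metropolis, this produces exact independent draws, but it needs a usable bound $M$ on the unnormalized density, which is rarely available for the complex, high-dimensional targets that motivate MCMC.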

(ii) the posterior must be available analytically, in the sense that $p(\theta)\,p(x\mid\theta)$ must be computable for the observed value $x$ and an arbitrary $\theta$, up to a constant (in $\theta$). Otherwise, the random variable $\theta$ must be completed with an auxiliary variable $z$, augmenting $p(\theta\mid x)$ into a joint distribution $q(\theta, z\mid x)$ that can be computed or simulated (as in data augmentation).
