[Math] the (proof) guarantee that Metropolis-Hastings converges to our required distribution

markov-chains, sampling

I am trying to understand the proof of why Metropolis-Hastings (MH) results in a stationary distribution that is proportional to the distribution from which we wish to sample.

Here is my understanding so far:

We can easily verify that the MH algorithm is an ergodic Markov chain, under certain regularity conditions. Say we wish to sample from a distribution $P(X)$, which we know up to a normalization constant. We can use a proposal distribution $Q(X'|X)$ to generate a candidate $X'$ at each step and accept it with probability \begin{align}\min\left(1, \frac{P(X')}{P(X)} \cdot \frac{Q(X|X')}{Q(X'|X)}\right).\end{align}
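For intuition, here is a minimal sketch of the algorithm in Python (my own illustration, not from any reference; with a symmetric random-walk proposal, the $Q(X|X')/Q(X'|X)$ factor equals $1$ and the acceptance probability reduces to $\min(1, P(X')/P(X))$):

```python
import math
import random

def metropolis_hastings(log_p, step, x0, n_samples, seed=0):
    """Random-walk MH for a target known only up to a constant.

    log_p : log of the unnormalized target density P.
    The Gaussian random-walk proposal is symmetric, so the
    Q(X|X')/Q(X'|X) factor cancels and the acceptance probability
    reduces to min(1, P(X')/P(X)).
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        x_new = x + rng.gauss(0.0, step)
        # accept with probability min(1, P(x_new)/P(x)), done in log space
        if math.log(rng.random()) < log_p(x_new) - log_p(x):
            x = x_new
        samples.append(x)
    return samples

# Target: standard normal, deliberately missing its 1/sqrt(2*pi) constant.
samples = metropolis_hastings(lambda x: -0.5 * x * x, 1.0, 0.0, 50_000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Despite never knowing the normalization constant, the empirical mean and variance come out close to the standard normal's $0$ and $1$.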

We also know that every ergodic Markov chain has a unique stationary distribution. Let us call the stationary distribution that this Markov chain eventually reaches $\Pi(X)$. The aim now is to show that $\Pi \propto P$.

One of the proofs I have read on the internet starts by assuming that the chain has converged to $P$, and then shows that once the MH algorithm is at $P$, it satisfies the detailed balance equation, hence $P$ is stationary. I feel it is not correct to start by assuming that $\Pi \propto P$ and then conclude that it is stationary.

Another proof says that detailed balance is a necessary condition for a stationary distribution. Then, using the detailed balance equation, they show that if our MH algorithm (designed using the acceptance probability $\min(1, \frac{P(X')}{P(X)} \cdot \frac{Q(X|X')}{Q(X'|X)})$) converges to a stationary distribution at all, then $\Pi \propto P$. The only weak link in this proof is the necessity claim: Wikipedia says that detailed balance is not a necessary condition for stationarity.

Can someone please give a rigorous proof of why the stationary distribution will be $P(X)$?

Thanks in advance,

Best Answer

The first proof is fine, but probably needs a little more detail.

Let's clarify things a little bit. First, let's fix the target distribution $P$, and treat the initial distribution $Q$ as the only "variable" in the MH algorithm.

The MH algorithm, based on some target distribution $P$, can be started with any distribution $Q$ (under, as you say, some regularity assumptions); when you do so, it converges to some limit distribution $d$ (independent of the starting distribution!). You've called this distribution $\Pi(P)$, but a better name would be $\Pi(Q)$, because (for a fixed $P$) it might a priori depend on the distribution $Q$ from which you started.

The goal is to show that $d = P$.

Proof 1 argues as follows. The first two statements are general properties, unrelated to MH.

  1. Detailed balance is sufficient for proving a distribution is stationary.

  2. Among all distributions, there's a unique stationary one, $d$ (making some regularity assumptions about the Markov chain, as you observe).

  3. The MH process (starting from any distribution!) converges, and the limit distribution $d$ is stationary and independent of the starting distribution, $Q$.

  4. Hence, for any starting distribution, the limit distribution [which exists by 3], being stationary, must be the unique stationary one [see item 2], $d$.

  5. The distribution $P$ is stationary because it satisfies detailed balance. To put it differently: the MH process, starting from distribution $P$, remains at $P$.

  6. On the other hand, by item 3, the MH process, starting from $P$, must reach a limit distribution which is $d$.

  7. Hence, $P = d$, and you're done.
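Item 5 can be checked numerically on a toy discrete chain (a sketch of my own; the 3-state target and uniform proposal are made up for illustration):

```python
# Hypothetical 3-state example: build the MH kernel K from an
# unnormalized target and verify that the normalized target satisfies
# detailed balance, and hence is stationary (P K = P).
p_unnorm = [1.0, 2.0, 3.0]              # target, known up to a constant
n = len(p_unnorm)
q = [[1.0 / n] * n for _ in range(n)]   # symmetric uniform proposal Q(j|i)

# K[i][j] = Q(j|i) * min(1, P(j) Q(i|j) / (P(i) Q(j|i))) for j != i
K = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        if j != i:
            K[i][j] = q[i][j] * min(
                1.0, p_unnorm[j] * q[j][i] / (p_unnorm[i] * q[i][j])
            )
    K[i][i] = 1.0 - sum(K[i])           # rejected proposals stay at i

p = [w / sum(p_unnorm) for w in p_unnorm]   # normalized target

# Detailed balance: p_i K_ij == p_j K_ji for all pairs (i, j)
balanced = all(
    abs(p[i] * K[i][j] - p[j] * K[j][i]) < 1e-12
    for i in range(n) for j in range(n)
)

# Stationarity: (p K)_j == p_j for all j
pK = [sum(p[i] * K[i][j] for i in range(n)) for j in range(n)]
```

Both checks pass: the MH construction forces detailed balance for $P$, and stationarity follows.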

Note too that you don't arrive at a distribution proportional to $P$ -- you arrive at $P$, because you assumed that $P$ was a distribution at the start of your discussion, and $cP$ is a distribution only if $c = 1$, because of the "integral must be 1" condition on distributions.

By the way, the writeup given in the second reference appears to be fine as well, once you replace "necessary" with "sufficient" in the description of detailed balance.
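Indeed, sufficiency is all that is needed, and it is a one-line computation: if $\pi$ satisfies detailed balance with respect to the transition kernel $K$, i.e. $\pi(x)K(x,x') = \pi(x')K(x',x)$, then summing over $x$ gives stationarity directly:

\begin{align}
\sum_{x} \pi(x) K(x, x')
  &= \sum_{x} \pi(x') K(x', x) && \text{(detailed balance)} \\
  &= \pi(x') \sum_{x} K(x', x) \\
  &= \pi(x') && \text{(transition probabilities out of } x' \text{ sum to } 1\text{)}
\end{align}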
