Solved – For Hamiltonian Monte Carlo, why does negating the momentum variables result in a symmetric proposal

hamiltonian-monte-carlomarkov-chain-montecarlometropolis-hastingsmonte carlo

I have been going through Radford Neal's excellent HMC book chapter in detail. However, there is one detail that I'm really obsessing with now, and I'm not sure if I'm thinking about it right. When describing the Metropolis step for HMC, which is needed for computer implementation, he says:

The negation of the momentum variables at the end of the trajectory makes the Metropolis proposal symmetrical, as needed for the acceptance probability above to be valid

However, I cannot figure out why negating momentum variables means the proposal is symmetric. To be clear, here's my understanding of things:

  • Define a proposal $f(s_2|s_1)$ which indicates the density of starting at state $s_1$ and then getting $s_2$ as the proposal, which we then check for acceptance using MH.
  • In many cases, we want the proposal to be symmetric so the MH test reduces to the Metropolis test. One way to do this is with a Gaussian proposal, so define $f(s_2|s_1)$ as sampling from a Gaussian with fixed variance, centered at $s_1$. Then since Gaussians are symmetric, $f(s_2|s_1) = f(s_1|s_2)$.
  • With HMC, we have the following steps when starting from $(q,p)$: (1) sampling momentum variables, typically Gaussians, (2) run $L$ leapfrog steps to update both variables, (3) negate the momentum variables at the end to get our final proposed state $(q^*,p^*)$. Here, I'm using Radford Neal's notation about the state of the system, with $q$ as position and $p$ as momentum.

OK, so now I need to know why the proposal is symmetric. I suppose I need to understand these points:

  • I think the leapfrog steps are "deterministic" after the sampled momentum variables (unless we're injecting stochasticity in computing the derivatives). Does this mean that the probability of starting at a given state and then simply applying leapfrog steps (assume no sampling of momentum variables) until we land at a later state, is one? So it's a delta distribution, really.
  • But then what happens when we sample those Gaussian momentum variables. Don't they impact the proposal probabilities? I do not know how to analyze this when you include the leapfrog steps.
  • Neal also discuss reversibility, conservation of Hamiltonian, and volume preservation of Hamiltonian Dynamics. I think I need to better understand how reversibility and volume preservation mean we can claim the proposal is symmetric. (Conservation of the Hamiltonian I think is used to show that in theory, $H$ should have the same value, but in computer implementation it doesn't.) Unfortunately I seem to have a roadblock here.

Thanks for your time!

Best Answer

One of the reasons why the original construction of Hamiltonian Monte Carlo can be tricky to understand is that it is more restrictive than necessary, if only to simplify the theoretical proofs. In particular, the negation of the momenta in the deterministic update is indeed practically irrelevant because of the full momenta resampling*. If we include it however, then we can exploit the theoretical fact that the composition of two symmetric proposals is itself symmetric to show that the composite Hamiltonian Monte Carlo update is also symmetric.

The deterministic proposal distribution is a bit tricky because, as noted in the question, it is given by a delta function. Tierney (https://projecteuclid.org/euclid.aoap/1027961031) showed that for any such deterministic update to be symmetric it has to be an involution. In other words it has to yield the identify when applied twice. Assuming a symplectic integrator, which is time-reversible and volume preserving, this is guaranteed by the momentum negation.

Hamiltonian Monte Carlo is actually much more general, both in terms of theory and implementation, than the original algorithm. For more details see https://arxiv.org/abs/1701.02434, especially the appendix.

* There are partial momenta resampling schemes, from the original schemes of Horowitz to more recent "Extra Chance" or "Generalized" schemes, where the negation is of critical importance to the validity of the algorithm.

Related Question