Can someone explain the main idea behind Hamiltonian Monte Carlo methods and in which cases they will yield better results than other Markov Chain Monte Carlo methods?
Solved – Hamiltonian Monte Carlo
bayesian, hamiltonian-monte-carlo, markov-chain-montecarlo
Related Solutions
The deterministic Hamiltonian trajectories are useful only because they are consistent with the target distribution. In particular, trajectories with a typical energy project onto regions of high probability of the target distribution. If we could integrate Hamilton's equations exactly and construct explicit Hamiltonian trajectories then we would have a complete algorithm already and not need any acceptance step.
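For concreteness, with the standard Gaussian momentum distribution and mass matrix $M$, the Hamiltonian and Hamilton's equations take the usual form (this is generic notation, not tied to any particular implementation):

$$
H(q, p) = -\log \pi(q) + \tfrac{1}{2} p^\top M^{-1} p,
\qquad
\frac{dq}{dt} = \frac{\partial H}{\partial p} = M^{-1} p,
\qquad
\frac{dp}{dt} = -\frac{\partial H}{\partial q} = \nabla_q \log \pi(q).
$$

Exact trajectories conserve $H$, which is why they remain consistent with the target distribution $\pi(q)$.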
Unfortunately outside of a few very simple examples we can't integrate Hamilton's equations exactly. That's why we have to bring in symplectic integrators. Symplectic integrators are used to construct high accuracy numerical approximations to the exact Hamiltonian trajectories that we can't solve analytically. The small error inherent in symplectic integrators causes these numerical trajectories to deviate away from the true trajectories, and hence the projections of the numerical trajectories will deviate away from the typical set of the target distribution. We need to introduce a way to correct for this deviation.
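To make this concrete, here is a minimal sketch of a single leapfrog step, the symplectic integrator used in virtually all HMC implementations. The function name, the unit mass matrix, and the `grad_log_prob` callable are illustrative assumptions, not any particular library's API:

```python
def leapfrog_step(q, p, grad_log_prob, eps):
    """One leapfrog step for H(q, p) = -log pi(q) + p.p / 2 (unit mass matrix assumed)."""
    p = p + 0.5 * eps * grad_log_prob(q)   # half step for the momentum
    q = q + eps * p                        # full step for the position
    p = p + 0.5 * eps * grad_log_prob(q)   # second half step for the momentum
    return q, p
```

Each leapfrog step is volume-preserving and time-reversible, which is what keeps the accumulated energy error small and correctable.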
The original implementation of Hamiltonian Monte Carlo considered the final point in a trajectory of fixed length as a proposal, and then applied a Metropolis acceptance procedure to that proposal. If the numerical trajectory had accumulated too much error, and hence deviated too far from the initial energy, then that proposal would be rejected. In other words, the acceptance procedure throws away proposals that end up projecting too far away from the typical set of the target distribution so that the only samples we keep are those that fall into the typical set.
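A rough sketch of that original scheme, building on the `leapfrog_step` above; `log_prob`, the step count, and the unit mass matrix are again assumptions made for illustration rather than any specific implementation:

```python
import numpy as np

def hamiltonian(q, p, log_prob):
    """H(q, p) = -log pi(q) + p.p / 2, again with a unit mass matrix."""
    return -log_prob(q) + 0.5 * np.dot(p, p)

def hmc_transition(q0, log_prob, grad_log_prob, eps, n_steps, rng):
    """Original HMC: integrate a fixed-length trajectory, then Metropolis-accept its endpoint."""
    p0 = rng.standard_normal(q0.shape)                 # fresh momentum draw
    q, p = q0, p0
    for _ in range(n_steps):                           # numerical trajectory via leapfrog
        q, p = leapfrog_step(q, p, grad_log_prob, eps)
    # Accept with probability min(1, exp(H(q0, p0) - H(q, p))):
    # the more energy error the trajectory accumulates, the more likely a rejection.
    if np.log(rng.uniform()) < hamiltonian(q0, p0, log_prob) - hamiltonian(q, p, log_prob):
        return q
    return q0
```

The momentum flip that appears in textbook presentations is omitted here because the kinetic energy is symmetric in p, so it does not affect the acceptance probability.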
Note that the more modern implementations that I advocate for in the Conceptual paper are not in fact Metropolis-Hastings algorithms. Sampling a random trajectory and then a random point from that random trajectory is a more general way to correct for the numerical error introduced by symplectic integrators. Metropolis-Hastings is just one way to implement this more general algorithm, but slice sampling (as done in NUTS) and multinomial sampling (as currently done in Stan) work just as well if not better. But ultimately the intuition is the same -- we're probabilistically selecting points with small numerical error to ensure exact samples from the target distribution.
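As a sketch of that more general idea (illustrative only, reusing the helpers above, and not Stan's actual algorithm, which additionally grows the trajectory dynamically): build a fixed-length trajectory that contains the current point at a uniformly random position, then pick one state from the whole trajectory with probabilities proportional to exp(-H), so that points with small numerical error are favoured.

```python
def multinomial_hmc_transition(q0, log_prob, grad_log_prob, eps, n_steps, rng):
    """Static multinomial HMC sketch: sample a point from a whole trajectory, not just its endpoint."""
    p0 = rng.standard_normal(q0.shape)
    n_back = rng.integers(0, n_steps + 1)           # how far the trajectory extends backward in time
    states = [(q0, p0)]
    q, p = q0, p0
    for _ in range(n_steps - n_back):               # extend forward
        q, p = leapfrog_step(q, p, grad_log_prob, eps)
        states.append((q, p))
    q, p = q0, p0
    for _ in range(n_back):                         # extend backward (leapfrog with negative step size)
        q, p = leapfrog_step(q, p, grad_log_prob, -eps)
        states.insert(0, (q, p))
    # Multinomial selection: weights exp(-H) concentrate on points with small numerical error.
    energies = np.array([hamiltonian(qi, pi, log_prob) for qi, pi in states])
    weights = np.exp(-(energies - energies.min()))  # shift by the minimum for numerical stability
    idx = rng.choice(len(states), p=weights / weights.sum())
    return states[idx][0]
```

Replacing the single endpoint with a weighted draw from the whole trajectory is exactly the "select a random point from a random trajectory" correction described above.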
As mentioned in the comments by cwl, bjw and Sycorax, the following resources are useful (I can recommend them from my own experience as well):
- Statistical Rethinking by R. McElreath has a short but very approachable introduction (and is a great book overall).
- A Conceptual Introduction to Hamiltonian Monte Carlo by M. Betancourt goes into depth.
- The Stan documentation also has a solid section on HMC.
Best Answer
I believe the most up-to-date source on Hamiltonian Monte Carlo, its practical applications, and its comparison to other MCMC methods is Betancourt's 2017 review paper, A Conceptual Introduction to Hamiltonian Monte Carlo, cited above.