Solved – Can somebody explain to me NUTS in english

bayesianmarkov-processmonte carlo

My understanding of the algorithm is the following:

No U-Turn Sampler (NUTS) is a Hamiltonian Monte Carlo Method. This means that it is not a Markov Chain method and thus, this algorithm avoids the random walk part, which is often deemed as inefficient and slow to converge.

Instead of doing the random walk, NUTS does jumps of length x. Each jump doubles as the algorithm continues to run. This happens until the trajectory reaches a point where it wants to return to the starting point.

My questions:
What is so special about the U-turn?
How does doubling the trajectory not skip the optimized point?
Is my above description correct?

Best Answer

The no U-turn bit is how proposals are generated. HMC generates a hypothetical physical system: imagine a ball with a certain kinetic energy rolling around a landscape with valleys and hills (the analogy breaks down with more than 2 dimensions) defined by the posterior you want to sample from. Every time you want to take a new MCMC sample, you randomly pick the kinetic energy and start the ball rolling from where you are. You simulate in discrete time steps, and to make sure you explore the parameter space properly you simulate steps in one direction and the twice as many in the other direction, turn around again etc. At some point you want to stop this and a good way of doing that is when you have done a U-turn (i.e. appear to have gone all over the place).

At this point the proposed next step of your Markov Chain gets picked (with certain limitations) from the points you have visited. I.e. that whole simulation of the hypothetical physical system was "just" to get a proposal that then gets accepted (the next MCMC sample is the proposed point) or rejected (the next MCMC sample is the starting point).

The clever thing about it is that proposals are made based on the shape of the posterior and can be at the other end of the distribution. In contrast Metropolis-Hastings makes proposals within a (possibly skewed) ball, Gibbs sampling only moves along one (or at least very few) dimensions at a time.

Related Question