Solved – Confused with MCMC Metropolis-Hastings variations: Random-Walk, Non-Random-Walk, Independent, Metropolis

markov-chain-montecarlometropolis-hastings

Over the past few weeks I have been trying to understand MCMC and the Metropolis-Hastings algorithm(s). Every time I think I understand it I realise that I am wrong. Most of the code examples I find on-line implement something that is not consistent with the description. i.e.: They say they implement Metropolis-Hastings but they actually implement random-walk metropolis. Others (almost always) silently skip the implementation of the Hastings correction ratio because they are using a symmetric proposal distribution. Actually, I haven't found a single simple example that calculates the ratio so far. That makes me even more confused. Can someone give me code examples (in any language) of the following:

  • Vanilla Non-Random Walk Metropolis-Hastings Algorithm with Hastings correction ratio calculation (even if this will end up being 1 when using a symmetric proposal distribution).
  • Vanilla Random Walk Metropolis-Hastings algorithm.
  • Vanilla Independent Metropolis-Hastings algorithm.

No need to provide the Metropolis algorithms because if I am not mistaken the only difference between Metropolis and Metropolis-Hastings is that the first ones always sample from a symmetric distribution and thus they don't have the Hastings correction ratio.
No need to give detailed explanation of the algorithms. I do understand the basics but I am kinda confused with all the different names for the different variations of the Metropolis-Hastings algorithm but also with how you practically implement the Hastings correction ratio on the Vanilla non-random-walk MH. Please don't copy paste links that partially answer my questions because most likely I have already seen them. Those links led me to this confusion. Thank you.

Best Answer

Here you go - three examples. I've made the code much less efficient than it would be in a real application in order to make the logic clearer (I hope.)

# We'll assume estimation of a Poisson mean as a function of x
x <- runif(100)
y <- rpois(100,5*x)  # beta = 5 where mean(y[i]) = beta*x[i]

# Prior distribution on log(beta): t(5) with mean 2 
# (Very spread out on original scale; median = 7.4, roughly)
log_prior <- function(log_beta) dt(log_beta-2, 5, log=TRUE)

# Log likelihood
log_lik <- function(log_beta, y, x) sum(dpois(y, exp(log_beta)*x, log=TRUE))

# Random Walk Metropolis-Hastings 
# Proposal is centered at the current value of the parameter

rw_proposal <- function(current) rnorm(1, current, 0.25)
rw_p_proposal_given_current <- function(proposal, current) dnorm(proposal, current, 0.25, log=TRUE)
rw_p_current_given_proposal <- function(current, proposal) dnorm(current, proposal, 0.25, log=TRUE)

rw_alpha <- function(proposal, current) {
   # Due to the structure of the rw proposal distribution, the rw_p_proposal_given_current and
   # rw_p_current_given_proposal terms cancel out, so we don't need to include them - although
   # logically they are still there:  p(prop|curr) = p(curr|prop) for all curr, prop
   exp(log_lik(proposal, y, x) + log_prior(proposal) - log_lik(current, y, x) - log_prior(current))
}

# Independent Metropolis-Hastings
# Note: the proposal is independent of the current value (hence the name), but I maintain the
# parameterization of the functions anyway.  The proposal is not ignorable any more
# when calculation the acceptance probability, as p(curr|prop) != p(prop|curr) in general.

ind_proposal <- function(current) rnorm(1, 2, 1) 
ind_p_proposal_given_current <- function(proposal, current) dnorm(proposal, 2, 1, log=TRUE)
ind_p_current_given_proposal <- function(current, proposal) dnorm(current, 2, 1, log=TRUE)

ind_alpha <- function(proposal, current) {
   exp(log_lik(proposal, y, x)  + log_prior(proposal) + ind_p_current_given_proposal(current, proposal) 
       - log_lik(current, y, x) - log_prior(current) - ind_p_proposal_given_current(proposal, current))
}

# Vanilla Metropolis-Hastings - the independence sampler would do here, but I'll add something
# else for the proposal distribution; a Normal(current, 0.1+abs(current)/5) - symmetric but with a different
# scale depending upon location, so can't ignore the proposal distribution when calculating alpha as
# p(prop|curr) != p(curr|prop) in general

van_proposal <- function(current) rnorm(1, current, 0.1+abs(current)/5)
van_p_proposal_given_current <- function(proposal, current) dnorm(proposal, current, 0.1+abs(current)/5, log=TRUE)
van_p_current_given_proposal <- function(current, proposal) dnorm(current, proposal, 0.1+abs(proposal)/5, log=TRUE)

van_alpha <- function(proposal, current) {
   exp(log_lik(proposal, y, x)  + log_prior(proposal) + ind_p_current_given_proposal(current, proposal) 
       - log_lik(current, y, x) - log_prior(current) - ind_p_proposal_given_current(proposal, current))
}


# Generate the chain
values <- rep(0, 10000) 
u <- runif(length(values))
naccept <- 0
current <- 1  # Initial value
propfunc <- van_proposal  # Substitute ind_proposal or rw_proposal here
alphafunc <- van_alpha    # Substitute ind_alpha or rw_alpha here
for (i in 1:length(values)) {
   proposal <- propfunc(current)
   alpha <- alphafunc(proposal, current)
   if (u[i] < alpha) {
      values[i] <- exp(proposal)
      current <- proposal
      naccept <- naccept + 1
   } else {
      values[i] <- exp(current)
   }
}
naccept / length(values)
summary(values)

For the vanilla sampler, we get:

> naccept / length(values)
[1] 0.1737
> summary(values)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.843   5.153   5.388   5.378   5.594   6.628 

which is a low acceptance probability, but still... tuning the proposal would help here, or adopting a different one. Here's the random walk proposal results:

> naccept / length(values)
[1] 0.2902
> summary(values)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.718   5.147   5.369   5.370   5.584   6.781 

Similar results, as one would hope, and a better acceptance probability (aiming for ~50% with one parameter.)

And, for completeness, the independence sampler:

> naccept / length(values)
[1] 0.0684
> summary(values)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.990   5.162   5.391   5.380   5.577   8.802 

Because it doesn't "adapt" to the shape of the posterior, it tends to have the poorest acceptance probability and is hardest to tune well for this problem.

Note that generally speaking we'd prefer proposals with fatter tails, but that's a whole other topic.