In order to get this, and to simplify the matters, I always think first in just one parameter with uniform (long-range) a-priori distribution, so that in this case, the MAP estimate of the parameter is the same as the MLE. However, assume that your likelihood function is complicated enough to have several local maxima.
What MCMC does in this example in 1-D is to explore the posterior curve until it finds values of maximum probability. If the variance is too short, you'll most surely get stuck on local maxima, because you'll be always sampling values near it: the MCMC algorithm will "think" it is stuck on the target distribution. However, if the variance is too large, once you get stuck on one local maximum, you'll more-or-less reject values until you find other regions of maximum probability. If you happen to propose the value at the MAP (or a similar region of local maximum probability which is larger than the others), with a large variance you'll end up rejecting almost every other value: the difference between this region and the others will be too large.
Of course, all of the above will affect the convergence rate and not the convergence "per-se" of your chains. Recall that whatever the variance, as long as the probability of selecting the value of this global maximum region is positive, your chain will converge.
To by-pass this problem, however, what one can do is to propose different variances in a burn-in period for each parameter and aim at a certain acceptance rates which can satisfy your needs (say $0.44$, see Gelman, Roberts & Gilks, 1995 and Gelman, Gilks & Roberts, 1997 to learn more on the issue of selecting a "good" acceptance rate which, of course, will depende on the form of you posterior distribution). Of course, in this case the chain is non-markovian, so you DON'T have to use them for inference: you just use them to adjust the variance.
Here you go - three examples. I've made the code much less efficient than it would be in a real application in order to make the logic clearer (I hope.)
# We'll assume estimation of a Poisson mean as a function of x
x <- runif(100)
y <- rpois(100,5*x) # beta = 5 where mean(y[i]) = beta*x[i]
# Prior distribution on log(beta): t(5) with mean 2
# (Very spread out on original scale; median = 7.4, roughly)
log_prior <- function(log_beta) dt(log_beta-2, 5, log=TRUE)
# Log likelihood
log_lik <- function(log_beta, y, x) sum(dpois(y, exp(log_beta)*x, log=TRUE))
# Random Walk Metropolis-Hastings
# Proposal is centered at the current value of the parameter
rw_proposal <- function(current) rnorm(1, current, 0.25)
rw_p_proposal_given_current <- function(proposal, current) dnorm(proposal, current, 0.25, log=TRUE)
rw_p_current_given_proposal <- function(current, proposal) dnorm(current, proposal, 0.25, log=TRUE)
rw_alpha <- function(proposal, current) {
# Due to the structure of the rw proposal distribution, the rw_p_proposal_given_current and
# rw_p_current_given_proposal terms cancel out, so we don't need to include them - although
# logically they are still there: p(prop|curr) = p(curr|prop) for all curr, prop
exp(log_lik(proposal, y, x) + log_prior(proposal) - log_lik(current, y, x) - log_prior(current))
}
# Independent Metropolis-Hastings
# Note: the proposal is independent of the current value (hence the name), but I maintain the
# parameterization of the functions anyway. The proposal is not ignorable any more
# when calculation the acceptance probability, as p(curr|prop) != p(prop|curr) in general.
ind_proposal <- function(current) rnorm(1, 2, 1)
ind_p_proposal_given_current <- function(proposal, current) dnorm(proposal, 2, 1, log=TRUE)
ind_p_current_given_proposal <- function(current, proposal) dnorm(current, 2, 1, log=TRUE)
ind_alpha <- function(proposal, current) {
exp(log_lik(proposal, y, x) + log_prior(proposal) + ind_p_current_given_proposal(current, proposal)
- log_lik(current, y, x) - log_prior(current) - ind_p_proposal_given_current(proposal, current))
}
# Vanilla Metropolis-Hastings - the independence sampler would do here, but I'll add something
# else for the proposal distribution; a Normal(current, 0.1+abs(current)/5) - symmetric but with a different
# scale depending upon location, so can't ignore the proposal distribution when calculating alpha as
# p(prop|curr) != p(curr|prop) in general
van_proposal <- function(current) rnorm(1, current, 0.1+abs(current)/5)
van_p_proposal_given_current <- function(proposal, current) dnorm(proposal, current, 0.1+abs(current)/5, log=TRUE)
van_p_current_given_proposal <- function(current, proposal) dnorm(current, proposal, 0.1+abs(proposal)/5, log=TRUE)
van_alpha <- function(proposal, current) {
exp(log_lik(proposal, y, x) + log_prior(proposal) + ind_p_current_given_proposal(current, proposal)
- log_lik(current, y, x) - log_prior(current) - ind_p_proposal_given_current(proposal, current))
}
# Generate the chain
values <- rep(0, 10000)
u <- runif(length(values))
naccept <- 0
current <- 1 # Initial value
propfunc <- van_proposal # Substitute ind_proposal or rw_proposal here
alphafunc <- van_alpha # Substitute ind_alpha or rw_alpha here
for (i in 1:length(values)) {
proposal <- propfunc(current)
alpha <- alphafunc(proposal, current)
if (u[i] < alpha) {
values[i] <- exp(proposal)
current <- proposal
naccept <- naccept + 1
} else {
values[i] <- exp(current)
}
}
naccept / length(values)
summary(values)
For the vanilla sampler, we get:
> naccept / length(values)
[1] 0.1737
> summary(values)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.843 5.153 5.388 5.378 5.594 6.628
which is a low acceptance probability, but still... tuning the proposal would help here, or adopting a different one. Here's the random walk proposal results:
> naccept / length(values)
[1] 0.2902
> summary(values)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.718 5.147 5.369 5.370 5.584 6.781
Similar results, as one would hope, and a better acceptance probability (aiming for ~50% with one parameter.)
And, for completeness, the independence sampler:
> naccept / length(values)
[1] 0.0684
> summary(values)
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.990 5.162 5.391 5.380 5.577 8.802
Because it doesn't "adapt" to the shape of the posterior, it tends to have the poorest acceptance probability and is hardest to tune well for this problem.
Note that generally speaking we'd prefer proposals with fatter tails, but that's a whole other topic.
Best Answer
If your proposal has very a low variance, then your proposed new state will be very similar to the current state, so $\frac{\pi(x_2)}{\pi(x_1)}$ will be close to 1 (in the limit, with 0 variance, the proposal and current state will be same and you'll have it exactly equal to 1), so acceptance rate will be close to 100%.
If your proposal has high variance, however, $\frac{\pi(x_2)}{\pi(x_1)}$ will (at least sometimes) be way smaller than 1, so your acceptance rate will get closer and closer to 0%. So the acceptance rate decreases as the variance of the proposal increases.
The problem with very low variances (which will get you higher acceptance rate) is that they take longer to explore the posterior space by never moving far away from the current state. Adaptive MCMC methods like Haario et Al. try to handle such problem by changing the variance matrix of the proposal on the fly.
To tune your acceptance rate you could just try increase and decrease the variance, a somewhat trial and error approach. But depending on the geometry of the posterior, the acceptance rate might change drastically during the sampling process. Also, for multiparameter models the proposal covariance matrix has many variance and covariance terms and such method gets impractical.
There is more sofiscated methods to handle this like the adaptative metropolis method outlined in the link above, or you might want to take a look at other methods like those listed here. You may also try software like Jags and Stan if Metropolis doesn't work for your problem.