Solved – MCMC Metropolis-Hastings’ jumping distribution for non-negative parameters

bayesian, markov-chain-montecarlo, metropolis-hastings

The Metropolis-Hastings algorithm is a Markov Chain Monte Carlo technique for sampling from some distribution $f(x)$ by constructing a Markov chain whose equilibrium distribution is equal to $f(x)$. Essentially, given the current state of the Markov chain, a new state is proposed and is either accepted or rejected. This new state $X_\text{new}$ is given by:

$$X_\text{new} = X_\text{old} + \lambda$$

Let us call the distribution of $\lambda$ the "jumping distribution". Typically $\lambda$ is taken to be a mean-zero normal random variable:

$$\lambda \sim \mathcal{N}(0,\sigma^2)$$

Usually the variance $\sigma^2$ is fine-tuned to alter the acceptance rate $\alpha$ of new moves. My question regards sampling from a non-negative distribution $f(x)$. This may arise, for instance, when sampling from the Bayesian posterior distribution of the variance parameter of some model. My initial approach was to keep $\lambda$ normal, but to set the proposed new value to zero whenever it came out negative. However, this does not seem like a particularly intelligent or efficient approach. Can you propose a better jumping distribution for non-negative $f(x)$?

Best Answer

Let $q(x^*|x_{old})$ be your proposal (jumping) distribution and let $x^*$ be a proposed value from this distribution. Then the Metropolis-Hastings acceptance probability is $$ \rho = \min \left\{ 1, \frac{f(x^*)}{f(x_{old})} \frac{q(x_{old}|x^*)}{q(x^*|x_{old})} \right\} $$ and you set $x_{new} = x^*$ with probability $\rho$ and otherwise $x_{new} = x_{old}$.
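As a rough sketch of the step described above, here is one generic Metropolis-Hastings update in Python. The function and argument names (`mh_step`, `q_sample`, `q_density`) are illustrative choices, not anything given in the question.

```python
import math
import random

def mh_step(x_old, f, q_sample, q_density):
    """One generic Metropolis-Hastings step.

    f(x)            : unnormalized target density
    q_sample(x_old) : draws a proposal x* given the current state
    q_density(a, b) : evaluates the proposal density q(a | b)

    (All names here are illustrative, not from the answer.)
    """
    x_star = q_sample(x_old)
    # rho = min(1, [f(x*)/f(x_old)] * [q(x_old|x*)/q(x*|x_old)])
    num = f(x_star) * q_density(x_old, x_star)
    den = f(x_old) * q_density(x_star, x_old)
    rho = min(1.0, num / den) if den > 0 else 1.0
    # Accept the proposal with probability rho, otherwise stay put.
    return x_star if random.random() < rho else x_old
```

Note that only ratios of $f$ and $q$ appear, so both may be unnormalized.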

The Metropolis-Hastings algorithm is appealing because you have considerable flexibility in specifying $q$. For example, you can use $N(x_{old},\sigma^2)$, in which case you have to remember that the target $f$ for a non-negative parameter includes an indicator function $\mathrm{I}(x>0)$ forcing the parameter to be positive. Any negative value of $x^*$ is therefore rejected, because $f(x^*)$ is zero, and you set $x_{new} = x_{old}$. This proposal is convenient because it is symmetric, i.e. $q(x_{old}|x^*)= q(x^*|x_{old})$ for all $x^*, x_{old}$, so you never need to calculate the last ratio in the acceptance probability.
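A minimal sketch of this symmetric random-walk scheme, using an unnormalized Exp(1) density as a toy non-negative target (the target and names are assumptions for illustration, not from the question):

```python
import math
import random

def rw_step(x_old, f, sigma):
    """One random-walk MH step with a symmetric N(x_old, sigma^2) proposal.

    Because q is symmetric, the q-ratio is 1 and can be omitted.  The target
    f must already contain the positivity indicator (return 0 for x <= 0),
    so negative proposals are rejected automatically."""
    x_star = random.gauss(x_old, sigma)
    rho = min(1.0, f(x_star) / f(x_old))
    return x_star if random.random() < rho else x_old

def target(x):
    """Toy unnormalized Exp(1) density -- illustrative assumption only."""
    return math.exp(-x) if x > 0 else 0.0
```

Starting the chain at any point where `target` is positive keeps the step well defined, since the denominator `f(x_old)` is then never zero.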

Alternatively (as suggested in the comments), you can use a truncated normal, e.g. $N(x_{old},\sigma^2)\,\mathrm{I}(x^*>0)$. Note, however, that the normalizing constant of this truncated normal depends on $x_{old}$, so the proposal is not symmetric and you will need to calculate the last ratio in the acceptance probability.
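The asymmetry works out cleanly: with a $N(x_{old},\sigma^2)$ proposal truncated to $(0,\infty)$, the Gaussian kernels $\phi\!\left(\frac{x^*-x_{old}}{\sigma}\right)$ cancel in the Hastings ratio, leaving only the ratio of truncation normalizers, $q(x_{old}|x^*)/q(x^*|x_{old}) = \Phi(x_{old}/\sigma)/\Phi(x^*/\sigma)$. A sketch under that derivation (the toy Exp(1) target and all names are illustrative assumptions):

```python
import math
import random

def phi_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncnorm_step(x_old, f, sigma):
    """MH step with a N(x_old, sigma^2) proposal truncated to (0, inf).

    The proposal is NOT symmetric: its normalizing constant Phi(x_old/sigma)
    depends on x_old, so the Hastings correction
        q(x_old|x*) / q(x*|x_old) = Phi(x_old/sigma) / Phi(x*/sigma)
    must be included (the Gaussian kernels themselves cancel)."""
    # Draw from the truncated proposal by simple rejection sampling
    # (adequate here because x_old stays positive).
    while True:
        x_star = random.gauss(x_old, sigma)
        if x_star > 0:
            break
    correction = phi_cdf(x_old / sigma) / phi_cdf(x_star / sigma)
    rho = min(1.0, (f(x_star) / f(x_old)) * correction)
    return x_star if random.random() < rho else x_old

def target(x):
    """Toy unnormalized Exp(1) density -- illustrative assumption only."""
    return math.exp(-x) if x > 0 else 0.0
```

Omitting `correction` would leave the chain with the wrong stationary distribution, which is exactly the pitfall the paragraph above warns about.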

The choice of setting any negative proposed value to zero doesn't gain you anything: most likely your data will indicate that the parameter cannot actually be zero, so $f(x^*)$ will be zero when $x^*=0$. Even if that is not the case, evaluating the proposal distribution becomes rather awkward, since it is now a mixture of a continuous distribution and a point mass at zero.

There are many other choices for the proposal distribution, including independence proposals, i.e. those that do not depend on $x_{old}$. You have not provided enough information for us to give a "better" proposal distribution, because this will depend on the target distribution. If the target distribution doesn't have too much mass near zero, then the normal random-walk proposal, i.e. $N(x_{old},\sigma^2)$, will likely work well even though it will occasionally reject a negative proposed value.