Metropolis-Hastings Algorithm – Using the Log of the Density in MCMC

Tags: gibbs, markov-chain-montecarlo, metropolis-hastings, rejection-sampling

Does Metropolis-Hastings work with the log of the proposal and the density to be sampled from?
That is, say we want to sample from a density $\pi(x)$, using a proposal $q(x|x^{old})$, will the Metropolis-Hastings work with $\log(\pi(x))$ and $\log(q(x|x^{old}))$ as well?

When constructing a Gibbs sampler, we often encounter full conditional distributions that are non-conjugate. Techniques to sample from them include adaptive rejection sampling (ARS) [1, 2], adaptive rejection Metropolis sampling (ARMS) [3], and slice sampling [4]. These techniques have the convenient feature that they can take the log of the density to be sampled from. They are technically Metropolis-Hastings samplers, so in those cases my question is answered in the affirmative. But is this a general feature of the Metropolis-Hastings algorithm?

The reason for my question is that you often store the log of the density when designing a Gibbs sampler, especially when using an object-oriented language and one of the three samplers mentioned above. If you have $\log( \pi(x) )$ stored, you could compute $e^{ \log(\pi(x)) } = \pi(x)$ and run Metropolis-Hastings on that, but exponentiating can cause numerical overflow and underflow.

I have been unable to find a reference that addresses this question or provides a proof.

EDIT:

I didn't mean to ask whether I can sample from $\log(\pi(x))$, since that makes no sense. My question was whether the Metropolis-Hastings algorithm can work if I pass it $\log(\pi(x))$. That is, is there a way to construct the algorithm that uses only the better-behaved $\log(\pi(x))$ rather than $\pi(x)$? The latter is usually a product of several other densities, so it can become extremely large or small very quickly. Sums of log-densities are much easier to work with in algorithms than products of densities.
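To make the numerical point concrete, here is a small sketch (with made-up standard-normal data; the data and sample size are illustrative assumptions, not from the original post) showing that a product of many densities underflows to zero in double precision while the corresponding sum of log-densities remains finite:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=2000)  # hypothetical data, for illustration only

# Product of 2000 standard-normal density values underflows to 0.0
# because it falls below the smallest representable IEEE double.
densities = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
product = np.prod(densities)

# The equivalent sum of log-densities is perfectly well behaved.
log_density_sum = np.sum(-0.5 * x**2 - 0.5 * np.log(2 * np.pi))

print(product)          # 0.0 (underflow)
print(log_density_sum)  # a finite, large negative number
```

This is exactly why samplers are usually written in terms of the log-density from the start.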


References:

[1] Gilks, W. R., Wild, P.: Adaptive rejection sampling for Gibbs sampling. Applied Statistics 41(2), 337–348, 1992.

[2] Gilks, W. R.: Derivative-free adaptive rejection sampling for Gibbs sampling. In: Bernardo, J. M., Berger, J. O., Dawid, A. P., Smith, A. F. M. (eds.) Bayesian Statistics 4, Oxford University Press, Oxford, 641–649, 1992.

[3] Gilks, W. R., Best, N. G., Tan, K. K. C.: Adaptive rejection Metropolis sampling within Gibbs sampling. Applied Statistics 44(4), 455–472, 1995.

[4] Neal, R. M.: Slice sampling. Annals of Statistics 31(3), 705–741, 2003.

Best Answer

As hinted at by @Tim, the solution is quite simple. A function implementing Metropolis-Hastings can take $\log(\pi(x))$ and $\log(q(x|x^{old}))$, provided everything then happens on the log scale. Let $\alpha$ be the acceptance probability of the Metropolis-Hastings update and $x'$ be the current value of the sampler. Then we propose a new $x$ and accept it with probability:

$$ \alpha(x | x') = \min\left(1 , \frac{\pi(x) q(x'|x) }{\pi(x') q(x|x') } \right) $$ or in log terms $$ \log\big( \alpha(x | x') \big) = \min\Big( 0 , \log(\pi(x)) + \log(q(x'|x)) - \log(\pi(x')) - \log( q(x|x') ) \Big) \text{.} $$

Then $\log(\alpha)$ can be used to accept or reject the proposed $x$: draw $u \sim \mathrm{Uniform}(0,1)$ and accept whenever $\log(u) < \log(\alpha)$, which is equivalent to accepting with probability $\alpha$.

This is a useful formulation of the Metropolis-Hastings algorithm with practical benefits. It is used in this textbook: http://mcmcinirt.stat.cmu.edu/archives/320

That is, the implementation of Metropolis-Hastings takes the log of the density to be sampled from and the log of the proposal. Working with logs is convenient because log-densities remain within the range of IEEE doubles even when the densities themselves would overflow or underflow, as pointed out by @whuber.