Any suggestions for a good source to learn MCMC methods?
Solved – Good sources for learning Markov chain Monte Carlo (MCMC)
markov-chain-montecarlo, references
Related Solutions
Yes, the Handbook of MCMC is a very up-to-date collection of papers on MCMC, and the book by Robert and Casella is a more current account than Markov Chain Monte Carlo in Practice. But I think MCMC in Practice is really a good place to start learning the subject. Amazon carries descriptions of all the books mentioned above.
While Glen_b's answer is mathematically correct, it may be slightly off the mark with respect to the very unusual setting of the question here. In short, I think that the Jacobian issue may be irrelevant here from a simulation perspective.
First, if you observe $Y$ and if the Jacobian of the transform $Y = g(X; \theta)$ does not depend on $\theta$, the Jacobian vanishes from the full conditionals in the Gibbs sampler and from the Metropolis-Hastings formula. If the Jacobian does depend on $\theta$ as well, then it has to be included in the conditionals and in the Metropolis-Hastings formula.
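To spell this cancellation out (a sketch, assuming $g$ is invertible in $x$ for fixed $\theta$ and differentiable), the density of the observation is
$$p(y\mid\theta) = p_X\!\left(g^{-1}(y;\theta)\mid\theta\right)\,\left|\frac{\partial g^{-1}(y;\theta)}{\partial y}\right|,$$
so the Metropolis-Hastings acceptance ratio for a proposal $\theta\to\theta'$ reads
$$\frac{\pi(\theta')\,p_X\!\left(g^{-1}(y;\theta')\mid\theta'\right)\,\left|\partial g^{-1}(y;\theta')/\partial y\right|\,q(\theta\mid\theta')}{\pi(\theta)\,p_X\!\left(g^{-1}(y;\theta)\mid\theta\right)\,\left|\partial g^{-1}(y;\theta)/\partial y\right|\,q(\theta'\mid\theta)}.$$
When the Jacobian factor does not depend on $\theta$, it appears identically in the numerator and the denominator and cancels; otherwise it must be kept.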
More central to the point raised by the question, I find the question utterly interesting, and my first solution would have been yours, namely to use the Dirac mass transform of $(\theta,y)$ into $x$. However, it does not work, as shown by the following toy example. One starts with a normal observation
$$x\sim\mathcal{N}(0,\theta^{-2})$$
and an exponential prior on $\theta^2$,
$$\theta^2\sim\mathcal{E}(1)$$
It is a standard derivation to show that the posterior distribution of $\theta^2$ [or the conditional of $\theta^2$ given $x$, if you prefer] is a Gamma distribution,
$$\theta^2\mid x\sim\text{Ga}(3/2,1+x^2/2)$$
Now, if one considers the transform $y=\theta x$, then $y\mid\theta\sim\mathcal{N}(0,1)$, which means that the distribution of $y$ does not depend on $\theta$, hence that $\theta$ and $y$ are independent. In summary, my toy example involves the distributions
$$\theta^2\sim\mathcal{E}(1)\quad x\sim\mathcal{N}(0,\theta^{-2})\quad y=\theta x\sim\mathcal{N}(0,1)$$
which means that the posterior on $\theta$ given $y$ is the prior.
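As a quick numerical sanity check, one can simulate the triplet above and verify that $y=\theta x$ behaves as a standard normal draw, whatever $\theta$. This is a sketch in Python rather than R, using only the standard library:

```python
import random
import statistics

random.seed(42)
ys = []
for _ in range(20000):
    theta2 = random.expovariate(1.0)    # theta^2 ~ Exp(1)
    theta = theta2 ** 0.5
    x = random.gauss(0.0, 1.0 / theta)  # x ~ N(0, theta^-2)
    ys.append(theta * x)                # y = theta * x
# y should look standard normal, independently of theta
print(statistics.mean(ys), statistics.stdev(ys))
```

The empirical mean and standard deviation of the simulated $y$'s come out close to $0$ and $1$, as the independence argument predicts.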
The R code following your suggested implementation is
y = rnorm(1)       # observation
T = 1e4            # number of MCMC iterations
the = rep(1, T)    # vector of theta^2 values
for (t in 2:T) {   # MCMC iterations
  # step 2(a): true conditional of x given theta (deterministic completion)
  x = y / sqrt(the[t-1])
  # step 2(b), with no R component: true conditional of theta^2 given x
  the[t] = rgamma(1, shape=1.5, rate=1 + 0.5 * x^2)
}
and it leads to a complete lack of fit. Actually, the lack of fit can be amplified by choosing a very large value for $y$.
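To see this amplification, here is a hedged Python transcription of the same broken sampler, run with a fairly large fixed observation instead of rnorm(1) (standard library only; note that random.gammavariate takes a scale parameter, i.e. 1/rate):

```python
import random
import statistics

random.seed(0)
y = 3.0          # a fairly large fixed observation
T = 300          # number of MCMC iterations
the = [1.0] * T  # chain of theta^2 values
for t in range(1, T):
    # step 2(a): deterministic completion x = y / theta
    x = y / the[t - 1] ** 0.5
    # step 2(b): theta^2 | x ~ Gamma(3/2, rate = 1 + x^2/2)
    the[t] = random.gammavariate(1.5, 1.0 / (1.0 + 0.5 * x * x))
# the correct posterior of theta^2 given y is the Exp(1) prior, with mean 1,
# but the broken chain collapses towards zero
print(statistics.mean(the))
```

The chain's running mean ends up far below the correct posterior mean of $1$, and the collapse gets faster as $y$ grows.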
The intuitive (and theoretically valid) explanation for this lack of fit is that the Gibbs sampler (or equivalently any other MCMC sampler) should always condition on $y$, the sole observation. Therefore, computing $x$ as a deterministic transform of $(y,\theta)$ has no impact on the distribution of $\theta$ given $(x,y)$: it remains the distribution of $\theta$ given $y$ alone, since $x$ is a Dirac mass given $(y,\theta)$.
In conclusion, the correct version of your algorithm is as follows:
- choose an initial $\Theta^{(0)}$; set $m=0$
- while $m \leq$ NMaxIterations do {
  (a) transform $X^{(m)} = g^{-1}(y; \Theta^{(m)})$
  (b) sample $R^{(m+1)} \sim p(R \mid \Theta^{(m)}, y)$, and then $\Theta^{(m+1)} \sim p(\Theta \mid R^{(m+1)}, y)$
  (c) set $m=m+1$ }
- $\Theta^* = \text{mean}\big(\Theta^{(N_B)}, \ldots, \Theta^{(\text{NMaxIterations})}\big)$, where $N_B$ is some burn-in period.
This clearly means that the completion step 2(a) is unnecessary.
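In the toy example the correction is drastic: since $\theta$ and $y$ are independent, the step (b) conditional given $y$ is just the $\mathcal{E}(1)$ prior, and the completed $x$ of step (a) never feeds back into the chain. A hedged standard-library Python sketch:

```python
import random
import statistics

random.seed(1)
y = 3.0
T = 10000
theta2 = [1.0] * T  # chain of theta^2 values
for t in range(1, T):
    # step (a): completion, deterministic in (y, theta); no feedback below
    x = y / theta2[t - 1] ** 0.5
    # step (b): sample theta^2 from its conditional given y, which here
    # is the Exp(1) prior because theta and y are independent
    theta2[t] = random.expovariate(1.0)
# the corrected chain recovers the true posterior mean (prior mean = 1)
print(statistics.mean(theta2))
```

The chain mean is now close to $1$, matching the prior (which is the correct posterior given $y$), in contrast with the collapsing chain produced by conditioning on the completed $x$.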
Best Answer
For online tutorials, Practical Markov Chain Monte Carlo, by Geyer (Stat. Science, 1992), is a good starting point, and you can look at the MCMCpack or mcmc R packages for illustrations.