Any suggestions for a good source to learn MCMC methods?
Solved – Good sources for learning Markov chain Monte Carlo (MCMC)
markov-chain-montecarlo, references
Related Solutions
Yes, the Handbook of MCMC is a very up-to-date collection of papers on MCMC, and the book by Robert and Casella is a more current account than Markov Chain Monte Carlo in Practice. But I think MCMC in Practice is really a good place to start learning the subject. Amazon carries descriptions of all the books mentioned above.
While Glen_b's answer is mathematically correct, it may be slightly off the mark with respect to the very unusual setting of the question here. In short, I think that the Jacobian issue may be irrelevant here from a simulation perspective.
First, if you observe $Y$ and if the Jacobian of the transform $Y = g(X; \theta)$ does not depend on $\theta$, the Jacobian vanishes from the full conditionals in the Gibbs sampler and from the Metropolis-Hastings formula. If the Jacobian does depend on $\theta$ as well, then it has to be included in the conditionals and in the Metropolis-Hastings formula.
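To spell this cancellation out (a sketch, assuming $g$ is invertible in $x$ for fixed $\theta$ and differentiable), the density of the observation is
$$p(y\mid\theta) = p_X\!\left(g^{-1}(y;\theta)\mid\theta\right)\,\left|\frac{\partial g^{-1}(y;\theta)}{\partial y}\right|,$$
so the Metropolis-Hastings acceptance ratio for a proposal $\theta\to\theta'$ reads
$$\frac{\pi(\theta')\,p_X\!\left(g^{-1}(y;\theta')\mid\theta'\right)\,\left|\partial g^{-1}(y;\theta')/\partial y\right|\,q(\theta\mid\theta')}{\pi(\theta)\,p_X\!\left(g^{-1}(y;\theta)\mid\theta\right)\,\left|\partial g^{-1}(y;\theta)/\partial y\right|\,q(\theta'\mid\theta)}.$$
When the Jacobian factor does not depend on $\theta$, it appears identically in the numerator and the denominator and cancels; otherwise it must be kept.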
More central to the point raised by the question, I find the question utterly interesting, and my first solution would have been yours, namely to use the Dirac mass transform of $(\theta,y)$ into $x$. However, it does not work, as shown by the following toy example. One starts with a normal observation
$$x\sim\mathcal{N}(0,\theta^{-2})$$
and an exponential prior on $\theta^2$,
$$\theta^2\sim\mathcal{E}(1)$$
It is a standard derivation to show that the posterior distribution of $\theta^2$ [or the conditional of $\theta^2$ given $x$, if you prefer] is a Gamma distribution,
$$\theta^2\mid x\sim\text{Ga}(3/2,1+x^2/2)$$
Now, if one considers the transform $y=\theta x$, then $y\mid\theta\sim\mathcal{N}(0,1)$, which means that the distribution of $y$ does not depend on $\theta$, hence that $\theta$ and $y$ are independent. In summary, my toy example involves the distributions
$$\theta^2\sim\mathcal{E}(1)\quad x\sim\mathcal{N}(0,\theta^{-2})\quad y=\theta x\sim\mathcal{N}(0,1)$$
which means that the posterior on $\theta$ given $y$ is the prior.
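As a quick numerical sanity check, one can simulate the triplet above and verify that $y=\theta x$ behaves as a standard normal draw, whatever $\theta$. This is a sketch in Python rather than R, using only the standard library:

```python
import random
import statistics

random.seed(42)
ys = []
for _ in range(20000):
    theta2 = random.expovariate(1.0)    # theta^2 ~ Exp(1)
    theta = theta2 ** 0.5
    x = random.gauss(0.0, 1.0 / theta)  # x ~ N(0, theta^-2)
    ys.append(theta * x)                # y = theta * x
# y should look standard normal, independently of theta
print(statistics.mean(ys), statistics.stdev(ys))
```

The empirical mean and standard deviation of the simulated $y$'s come out close to $0$ and $1$, as the independence argument predicts.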
The R code following your suggested implementation is
y = rnorm(1)       # observation
T = 1e4            # number of MCMC iterations
the = rep(1, T)    # vector of theta^2 values
for (t in 2:T) {   # MCMC iterations
  # step 2(a): true conditional of x given theta (deterministic completion)
  x = y / sqrt(the[t-1])
  # step 2(b), with no R component: true conditional of theta^2 given x
  the[t] = rgamma(1, shape=1.5, rate=1 + 0.5 * x^2)
}
and it leads to a complete lack of fit. Actually, the lack of fit can be amplified by choosing a very large value for $y$.
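To see this amplification, here is a hedged Python transcription of the same broken sampler, run with a fairly large fixed observation instead of rnorm(1) (standard library only; note that random.gammavariate takes a scale parameter, i.e. 1/rate):

```python
import random
import statistics

random.seed(0)
y = 3.0          # a fairly large fixed observation
T = 300          # number of MCMC iterations
the = [1.0] * T  # chain of theta^2 values
for t in range(1, T):
    # step 2(a): deterministic completion x = y / theta
    x = y / the[t - 1] ** 0.5
    # step 2(b): theta^2 | x ~ Gamma(3/2, rate = 1 + x^2/2)
    the[t] = random.gammavariate(1.5, 1.0 / (1.0 + 0.5 * x * x))
# the correct posterior of theta^2 given y is the Exp(1) prior, with mean 1,
# but the broken chain collapses towards zero
print(statistics.mean(the))
```

The chain's running mean ends up far below the correct posterior mean of $1$, and the collapse gets faster as $y$ grows.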
The intuitive (and theoretically valid) explanation for this lack of fit is that the Gibbs sampler (or equivalently any other MCMC sampler) should always condition on $y$, the sole observation. Therefore, computing $x$ as a deterministic transform of $(y,\theta)$ has no impact on the distribution of $\theta$ given $(x,y)$: it remains the distribution of $\theta$ given $y$ alone, since $x$ is a Dirac mass given $(y,\theta)$.
In conclusion, the correct version of your algorithm is as follows:
- choose an initial $\Theta^{(0)}$; set $m=0$
- while $m \leq$ NMaxIterations do {
  (a) transform $X^{(m)} = g^{-1}(y; \Theta^{(m)})$
  (b) sample $R^{(m+1)} \sim p(R \mid \Theta^{(m)}, y)$, and then $\Theta^{(m+1)} \sim p(\Theta \mid R^{(m+1)}, y)$
  (c) set $m=m+1$ }
- $\Theta^* = \text{mean}\big(\Theta^{(N_B)}, \ldots, \Theta^{(\text{NMaxIterations})}\big)$, where $N_B$ is some burn-in period.
This clearly means that the completion step 2(a) is unnecessary.
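In the toy example the correction is drastic: since $\theta$ and $y$ are independent, the step (b) conditional given $y$ is just the $\mathcal{E}(1)$ prior, and the completed $x$ of step (a) never feeds back into the chain. A hedged standard-library Python sketch:

```python
import random
import statistics

random.seed(1)
y = 3.0
T = 10000
theta2 = [1.0] * T  # chain of theta^2 values
for t in range(1, T):
    # step (a): completion, deterministic in (y, theta); no feedback below
    x = y / theta2[t - 1] ** 0.5
    # step (b): sample theta^2 from its conditional given y, which here
    # is the Exp(1) prior because theta and y are independent
    theta2[t] = random.expovariate(1.0)
# the corrected chain recovers the true posterior mean (prior mean = 1)
print(statistics.mean(theta2))
```

The chain mean is now close to $1$, matching the prior (which is the correct posterior given $y$), in contrast with the collapsing chain produced by conditioning on the completed $x$.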
Best Answer
For online tutorials, Practical Markov Chain Monte Carlo, by Geyer (Stat. Science, 1992), is a good starting point, and you can look at the MCMCpack or mcmc R packages for illustrations.