Solved – Metropolis-Hastings with likelihood

bayesian, fitting, likelihood, metropolis-hastings

I am trying to set up a Metropolis-Hastings algorithm in Matlab in order to estimate the parameters $\theta$ (a vector of 5 elements) so as to fit a curve to a set of data $D=\{X_i,Y_i,\delta_i\}$. $X$ and $Y$ are the data to relate, and $\delta$ indicates whether each observation is censored or not.

I can evaluate the log-likelihood function $\log L(\theta \mid D) = \sum_i \big[\delta_i \log f_y(y_i,x_i,\theta) + (1-\delta_i)\log\big(1 - F_w(y_i,x_i,\theta)\big)\big]$.
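In Matlab I evaluate this with something like the following sketch, where `model_pdf` and `model_cdf` are just placeholder names for $f_y$ and $F_w$ (they are not built-in functions):

```matlab
% Sketch of the censored-data log-likelihood; model_pdf and model_cdf
% are placeholders for f_y and F_w above, not built-in functions.
function ll = loglik(theta, y, x, delta)
    f = model_pdf(y, x, theta);    % density f_y at each (x_i, y_i)
    F = model_cdf(y, x, theta);    % CDF F_w at each (x_i, y_i)
    ll = sum(delta .* log(f) + (1 - delta) .* log(1 - F));
end
```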

Moreover, I don't have prior information on the distribution of the parameters. The only things that I know are:

$0<\theta_1<100$; $-5<\theta_2<0$; $0<\theta_3<1.5$; $1<\theta_4<6$; $0<\theta_5<1.5$.

For these reasons I pick a random estimate of $\theta$ within these bounds, call it $\theta_0$. Moreover, I can define the candidate distribution $q_i$ (one for each parameter) as a uniform distribution with the lower and upper bounds specified above. Assuming there is no statistical correlation between the parameters, $q$ is given by the product of the $q_i$.
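For example, a random starting point inside this box can be drawn as follows (a sketch; `lb` and `ub` simply collect the bounds listed above):

```matlab
lb = [0   -5 0   1 0  ];                 % lower bounds for theta_1 ... theta_5
ub = [100  0 1.5 6 1.5];                 % upper bounds
theta0 = lb + (ub - lb) .* rand(1, 5);   % uniform draw inside the box
```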

Starting from $\theta_0$, I calculate the log-likelihood $\log L(\theta_0 \mid D)$ and $q_0$; then I perturb each component $\theta_{0,i}$ using a normal distribution $N(\theta_{0,i}, j_i)$ with $j_i = 0.1\,\theta_{0,i}$. This gives a new vector $\theta_1$, for which I calculate the log-likelihood and the candidate density $q$ as specified before.
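The perturbation step, as I intend it, would be something like this (sketch, reusing `theta0` from above):

```matlab
j = 0.1 * abs(theta0);                % proposal standard deviations j_i (abs keeps them positive)
theta1 = theta0 + j .* randn(1, 5);   % normal perturbation around theta0
```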

$\theta_1$ will be accepted as it is if the ratio $r=\dfrac{\log L(\theta_1 \mid D)\, q_1}{\log L(\theta_0 \mid D)\, q_0}$ is bigger than unity, or if it is bigger than a draw from $U(0,1)$; otherwise it will be set equal to $\theta_0$, and a new perturbation will be made to find $\theta_2$, and so on.

If this algorithm is right, there is a problem: my log-likelihood is negative, so if $\theta_1$ has a higher likelihood, the value of $r$ will be smaller than unity. If I invert the ratio, since there is the risk that $q_1 = 0$, the ratio may not even be defined.

The approach that I am trying to replicate is from an article in which the algorithm is defined as follows (my attempt at a Matlab translation is sketched after the list):


  1. set an initial value for the chain: $\theta_c=\theta_0$ and choose $j$
  2. compute a=loglikelihood($\theta_c$)+logprior($\theta_c$)
  3. draw $\theta_p$ from $N(\theta_c;j_i)$
  4. compute b=loglikelihood($\theta_p$)+logprior($\theta_p$)
  5. let H=min(1,exp(b-a)) and draw r from U(0,1)
  6. if H>r then
  7. $\theta_c=\theta_p$
  8. a=b
  9. repeat steps 3 to 8 until 100000 posterior samples are collected
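Reading that pseudocode literally, my Matlab translation would be something like this (a sketch; `loglik` and `logprior` stand for my log-likelihood and log-prior functions, `theta0` is the starting point and `j` the vector of proposal standard deviations from above):

```matlab
nsamples = 100000;
theta_c  = theta0;                                        % step 1: initial value of the chain
samples  = zeros(nsamples, numel(theta_c));
a = loglik(theta_c, y, x, delta) + logprior(theta_c);     % step 2

for k = 1:nsamples
    theta_p = theta_c + j .* randn(size(theta_c));        % step 3: draw from N(theta_c, j)
    b = loglik(theta_p, y, x, delta) + logprior(theta_p); % step 4
    H = min(1, exp(b - a));                               % step 5
    if H > rand                                           % step 6: rand ~ U(0,1)
        theta_c = theta_p;                                % step 7: accept the proposal
        a = b;                                            % step 8
    end
    samples(k, :) = theta_c;                              % store the current state of the chain
end
```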

Is it the same algorithm? Why did they use $\exp(b-a)$ instead of the ratio that is usually used in books?

Best Answer

Since a = loglikelihood($\theta_c$) + logprior($\theta_c$) and b = loglikelihood($\theta_p$) + logprior($\theta_p$), it follows that

$$\exp(b-a) = \dfrac{\text{likelihood}(\theta_p)\times \text{prior}(\theta_p)}{\text{likelihood}(\theta_c)\times \text{prior}(\theta_c)},$$

which is the "usual ratio" you refer to, only computed on the log scale and exponentiated at the end; this avoids the numerical underflow you would get by multiplying very small likelihoods directly. The instrumental distribution used by the authors is a normal centred at the current value (this is where you sample the $\theta$s from).
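Written out, the general Metropolis-Hastings acceptance probability with that symmetric random-walk proposal $q$ reduces to exactly the form used in the article, because the proposal densities cancel:

$$H = \min\left\{1,\ \frac{\text{likelihood}(\theta_p)\,\text{prior}(\theta_p)\, q(\theta_c \mid \theta_p)}{\text{likelihood}(\theta_c)\,\text{prior}(\theta_c)\, q(\theta_p \mid \theta_c)}\right\} = \min\left\{1,\ \exp(b-a)\right\}, \qquad \text{since } q(\theta_c \mid \theta_p) = q(\theta_p \mid \theta_c).$$

This is why no $q$ terms appear in the article's acceptance step.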

Moreover, you don't need to come up with intricate instrumental distributions; just define your log posterior as:

if ($0<\theta_1<100$; $-5<\theta_2<0$; $0<\theta_3<1.5$; $1<\theta_4<6$; $0<\theta_5<1.5$) return the log posterior

else return $-\infty$

This will do the truncation job for you.
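In Matlab, such a truncated log posterior could look like the following sketch (here `loglik` stands for your censored log-likelihood; a flat prior inside the box contributes only a constant, so it can be dropped):

```matlab
% Sketch: flat prior truncated to the admissible box, so the log posterior
% is the log-likelihood inside the bounds and -Inf outside.
function lp = logpost(theta, y, x, delta)
    lb = [0   -5 0   1 0  ];
    ub = [100  0 1.5 6 1.5];
    if all(theta > lb) && all(theta < ub)
        lp = loglik(theta, y, x, delta);   % flat prior: only the likelihood matters
    else
        lp = -Inf;                         % outside the bounds: the proposal gets rejected
    end
end
```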

I also recommend the following R packages for an easier implementation:

  1. mcmc. Look at the command metrop().
  2. Rtwalk. A self-adaptive MCMC sampler based on Metropolis steps.