Gaussian Processes Priors – How to Select Priors for Gaussian Processes (GP)

gaussian processgibbsmetropolis-hastingspriorr

I am trying to select a prior for the covariance parameters of my Gaussian Process (GP) and have been running into numerical problems with my MCMC code. My model is the following:

$$Y = D\beta + GP(0,\sigma^2R(\psi))$$

where $R$ is the following correlation matrix:

$$R_{i,j}(\psi)=\exp(\psi(x_i-x_j)^2)$$

My priors for $\beta$ and $\sigma^2$ are the following:

$$p(\beta)\propto 1$$ and
$$\sigma^2\sim IG(a,b)$$

so what I am trying to figure out is what are possible appropriate choices of prior for $\psi$.
So far I have been trying to place an inverse gamma prior on $\psi$ but that has not been going very well.

Here is an update: Here is further problem formulation if it helps

Likelihood: $$f(\beta,\sigma^2,\psi|x)\propto|\sigma^2R(\psi)|^{-1/2}\exp(-\frac{1}{2}(Y-D\beta)'R(\psi)^{-1}(Y-D\beta))$$
Prior: $$p(\beta,\sigma^2,\psi)=p(\beta|\sigma^2,\psi)p(\sigma^2|\psi)p(\psi)\propto\sigma^{-a-1}\exp(-b/\sigma^2)p(\psi)$$

Now, based on my posterior under this prior, I have been trying to code up a Metropolis within Gibbs sampler with the following full conditions:

$$\beta|\sigma^2,\psi,Y\sim N(\hat\beta,D'(\sigma^2 R(\psi))^{-1}D)$$
$$\sigma^2|\beta,\psi,Y\sim IG(a,b)$$
$$\psi|\beta,\sigma^2,Y\propto |R(\psi)|^{-1/2}\exp(-\frac{1}{2\sigma^2}(Y-D\beta)'R(\psi)^{-1}(Y-D\beta))p(\psi)$$

So I sample $\beta$ and $\sigma^2$ within a gibbs step and then use a M-H step for sampling from $\psi$ but I am having horrible numerical problems when sampling from $\psi$ (basically my acceptance ratio is 0/0). Finally, I'll post some code to if that helps.

Code:

x = sort(runif(5,0,1))
y = exp(-1.4*x)*cos(7*pi*x/2)

plot(x,y,type='l',ylim=c(-1,1))

tau = as.matrix(dist(x,upper=T,diag=T))

corrR = function(psi,tau){
    R = exp(-tau^2*psi)
    return(R)
}

psi.cond = function(psi,r,sig,D,beta,Y){
    d = .01
    e = .01
    ans = det(r)^(-.5)*exp(-.5/sig*t(Y-D*beta)%*%solve(r)%*%(Y-D*beta))*psi^-(d+1)*exp(-e/psi)
    #ans = -.5*log(det(r))-.5/sig*t(Y-D*beta)%*%solve(r)%*%(Y-D*beta)-(d+1)*log(psi)-e/psi
    return(as.real(ans))
}

D = rep(1,length(x))
Y = as.matrix(y)

m = length(x)
a = .01
b = .01

B = 100
beta = rep(NA,B)
sigma = c(1,rep(NA,B-1))
psi = c(120,rep(NA,B-1))

for(i in 1:B){

    R = corrR(psi[i],tau)
    bhat = as.real(solve(t(D)%*%solve(R)%*%D)%*%t(D)%*%solve(R)%*%Y)

    beta[i] = rnorm(1,bhat,t(D)%*%solve(sigma[i]*R)%*%D)
    sigma[i+1] = 1/rgamma(1,(m+2*a)/2,(as.real(t(Y-D*beta[i])%*%solve(R)%*%(Y-D*beta[i]))+2*b)/2)


    log.xi = rnorm(1,log(psi[i]),.1)
    xi = exp(log.xi)

    u = runif(1)

    R.xi = corrR(xi,tau)
    R.psi = corrR(psi[i],tau)

    temp = (psi.cond(xi,R.xi,sigma[i],D,beta[i],Y)*(1/psi[i]))/(psi.cond(psi[i],R.psi,sigma[i],D,beta[i],Y)*(1/xi))
    alpha = min(1,temp)

    if(u <= alpha){
        psi[i+1] = xi
    }else{
        psi[i+1] = psi[i]
    }

}

Best Answer

I figured it out. Apparently when using the squared exponential correlation function lots of bad things numerically can happen (although theoretically everything is fine). So instead of using the power 2 in my correlation function if the value is changed to 1.9999 then everything is fine, i.e.,

$$R_{i,j}(\psi)=\exp(-\psi|x_i-x_j|^{1.9999})$$

Related Question