Solved – Kernel matrix is not positive definite

gaussian processregression

I tried to implement a Gaussian process in octave.

As a starting point I used the algorithm described on page 19 of Rasmussens GP book (http://www.gaussianprocess.org/gpml/).

As a covariance matrix I used the squared exponential function (as it is used in the book as well):

function re = k(x1, x2)
    re = exp(-(1/2.0) * abs(x1.-x2).^2);
endfunction

And calculate the covariance matrix (of the training inputs with):

# Calculate covariance matrix
s = size(X);
K = [];
for i = 1:s
    for j = 1:i
        re = k(X(i), X(j));
        K(i,j) = re;
        K(j,i) = re;
    endfor
endfor

But for some reason the resulting covariance matrix K sometimes is not positive definite (depending on inputs X).

So can anyone tell me what I'm doing wrong here, please?
And is there a way to test whether a covariance function results in a positive definite covariance matrix? Since the squared exponential function seems to be a covariance function, I assumed it should create a positive definite matrix.

Best Answer

  1. Most research in kernel methods focuses on Mercer kernels, which have two properties: (1) the function is symmetric: $K(x,y)=K(y,x)$ and (2) the function is positive semi-definite (p.s.d.). The Gaussian covariance function is certainly p.s.d., but I can't recall if it is also p.d. -- perhaps you've mistakenly omitted the "semi-" from your mental definition?

    Alternatively, this is potentially a numerical issue. I'm not sure how you've concluded that the result is not p.s.d., but, for example, spectral decomposition algorithms will sometimes commit errors due to finite-precision arithmetic, so some of the smallest eigenvalues will be slightly negative (on the order of whatever the error is in your numerical software). These minute errors can be safely ignored in most practical applications. Adding a larger, positive number to the diagonal can suppress this.

  2. Testing if an arbitrary covariance function is a valid kernel relies on Mercer's theorem, which is a bit involved for this forum. I'd recommend referring to his research.