Solved – Scikit-learn’s Gaussian Processes: How to include multiple hyperparameters in kernel/cov function

gaussian-process, machine-learning, scikit-learn

I'm using scikit-learn's implementation of Gaussian processes. A natural thing to do is to combine multiple kernels in a linear combination to describe a time series properly, so I'd like to include both the squared-exponential kernel and the periodic kernel. Linear combinations of valid kernels are valid kernels, and the same goes for products of valid kernels (Rasmussen and Williams).
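Concretely, the kind of combined kernel I have in mind is something like this, using the periodic form from Rasmussen and Williams with period $p$:

$$
k(x,x') = \theta_0 \exp\left(-\theta_1 (x-x')^2\right) + \theta_2 \exp\left(-\theta_3 \sin^2\left(\pi (x-x')/p\right)\right)
$$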

Unfortunately I haven't figured out how to pass the theta parameters to the model properly. For example, if we have:

$$
k_{\mathrm{Gauss}}(x,x') = \exp\left(-\theta (x-x')^2\right)
$$

then it is alright (this is how the squared-exponential kernel is defined in scikit-learn). But if I wanted:

$$
k_{\mathrm{Gauss}}(x,x') = \theta_0 \exp\left(-\theta_1 (x-x')^2\right)
$$

then it seems to be impossible. The $\theta$ argument is supposed to be an array, in case you have multiple dimensions/features (even though scikit-learn doesn't support multidimensional GPs yet, someone has developed that and it should be merged soon). So there is one row whose columns are the parameters for each dimension, but you cannot have more rows; otherwise it raises an error.

So my question: has anyone actually been able to use kernels with more than one hyperparameter? If so, what am I doing wrong? And if it really isn't possible with the current scikit-learn code, does anyone have tips on how to extend it to support this? It's a feature I really need. Thanks.

Best Answer

On scikit-learn==0.14.1.

$\theta_0$ can be a vector. The following code works for me.

import numpy as np
from sklearn.gaussian_process import GaussianProcess
from sklearn.datasets import make_regression

# make_regression() defaults to 100 features, so theta0 can hold one
# (positive) autocorrelation parameter per feature
X, y = make_regression()
bad_theta = np.abs(np.random.normal(0, 1, 100))
model = GaussianProcess(theta0=bad_theta)
model.fit(X, y)
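With only theta0 given, the parameters stay fixed during fitting; if I remember the legacy API right, passing thetaL and thetaU as well makes fit() maximize the likelihood over each component, with the result stored in theta_. A minimal sketch, continuing the snippet above:

# one starting value and one pair of bounds per feature;
# fit() then maximizes the likelihood over all 100 parameters
model = GaussianProcess(theta0=bad_theta,
                        thetaL=1e-4 * np.ones(100),
                        thetaU=1e2 * np.ones(100))
model.fit(X, y)
print(model.theta_)  # per-feature maximum-likelihood estimates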

You can pass any kernel you want as the corr parameter. For reference, the following is the radial basis function that sklearn itself uses for Gaussian processes.

def squared_exponential(theta, d):
    """
    Squared exponential correlation model (Radial Basis Function).
    (Infinitely differentiable stochastic process, very smooth)::

                                            n
        theta, dx --> r(theta, dx) = exp(  sum  - theta_i * (dx_i)^2 )
                                        i = 1

    Parameters
    ----------
    theta : array_like
        An array with shape 1 (isotropic) or n (anisotropic) giving the
        autocorrelation parameter(s).

    dx : array_like
        An array with shape (n_eval, n_features) giving the componentwise
        distances between locations x and x' at which the correlation model
        should be evaluated.

    Returns
    -------
    r : array_like
        An array with shape (n_eval, ) containing the values of the
        autocorrelation model.
    """

    theta = np.asarray(theta, dtype=np.float)
    d = np.asarray(d, dtype=np.float)

    if d.ndim > 1:
        n_features = d.shape[1]
    else:
        n_features = 1

    if theta.size == 1:
        return np.exp(-theta[0] * np.sum(d ** 2, axis=1))
    elif theta.size != n_features:
        raise ValueError("Length of theta must be 1 or %s" % n_features)
    else:
        return np.exp(-np.sum(theta.reshape(1, n_features) * d ** 2, axis=1))
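So for your case you can write your own corr with as many hyperparameters as you like; theta is just whatever vector you pass in through theta0, and only the built-in correlations enforce the length-equals-n_features rule you ran into. Below is a minimal sketch of a squared-exponential-times-periodic correlation; the function name and the layout of theta are my own choices, not anything built into sklearn. Note that GaussianProcess treats corr as a correlation (r(0) = 1) and estimates the overall process variance itself, so an explicit amplitude like your $\theta_0$ shouldn't be needed.

def se_times_periodic(theta, d):
    """Hypothetical correlation: squared exponential times a periodic
    term, with theta = [se_scale, periodic_decay, frequency].
    Same (theta, d) signature that sklearn expects for corr.
    """
    theta = np.asarray(theta, dtype=float)
    d = np.asarray(d, dtype=float)
    if d.ndim == 1:
        d = d.reshape(-1, 1)
    r = np.sqrt(np.sum(d ** 2, axis=1))                   # distance per evaluation
    se = np.exp(-theta[0] * r ** 2)                       # squared-exponential factor
    per = np.exp(-theta[1] * np.sin(theta[2] * r) ** 2)   # periodic factor
    return se * per                                       # product of valid correlations

# theta0 now carries all three hyperparameters; for a 1-D time
# series, X should have shape (n_samples, 1)
model = GaussianProcess(corr=se_times_periodic,
                        theta0=[1.0, 1.0, 1.0],
                        thetaL=[1e-3, 1e-3, 1e-3],
                        thetaU=[1e2, 1e2, 1e2])
model.fit(X, y)

I used a product rather than the sum you wrote because corr must equal 1 at zero distance; a plain sum of two correlations gives r(0) = 2, so for an additive kernel you'd want a convex combination, e.g. w * se + (1 - w) * per with 0 < w < 1 (and w could itself be one of the thetas).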

It looks like you're doing something pretty interesting, btw.