Solved – Scikit-learn’s Gaussian Processes: How to include multiple hyperparameters in kernel/cov function

gaussian-process, machine-learning, scikit-learn

I'm using scikit-learn's implementation of Gaussian processes. A natural thing to do is to combine multiple kernels in a linear combination to describe a time series properly, so I'd like to include both the squared-exponential kernel and the periodic kernel. Linear combinations of valid kernels are valid kernels, and the same goes for products of valid kernels (Rasmussen and Williams).
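Concretely, the kind of combined kernel I have in mind is something like this, using the periodic form from Rasmussen and Williams with period $p$:

$$
k(x,x') = \theta_0 \exp\left(-\theta_1 (x-x')^2\right) + \theta_2 \exp\left(-\theta_3 \sin^2\left(\pi (x-x')/p\right)\right)
$$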

Unfortunately I haven't figured out how to pass the theta parameters to the model properly. For example, if we have:

$$
k_{\mathrm{Gauss}}(x,x') = \exp\left(-\theta (x-x')^2\right)
$$

then it is alright (this is how the squared-exponential kernel is defined in scikit-learn). But if I wanted:

$$
k_{\mathrm{Gauss}}(x,x') = \theta_0 \exp\left(-\theta_1 (x-x')^2\right)
$$

then it seems to be impossible. The $\theta$ argument is supposed to be an array, in case you have multiple dimensions/features (even though scikit-learn doesn't support multidimensional GPs yet, someone has developed that and it should be merged soon). So there is one row whose columns are the parameters for each dimension, but you cannot have more rows; otherwise it raises an error.

So my question: has anyone actually been able to use kernels with more than one hyperparameter? If so, what am I doing wrong? And if it really isn't possible with the current scikit-learn code, does anyone have tips on how to extend it to support this? It's a feature I really need. Thanks.

Best Answer

On scikit-learn==0.14.1.

$\theta_0$ can be a vector. The following code works for me.

import numpy as np
from sklearn.gaussian_process import GaussianProcess
from sklearn.datasets import make_regression

# make_regression() defaults to 100 features, so theta0 can hold one
# (positive) autocorrelation parameter per feature
X, y = make_regression()
bad_theta = np.abs(np.random.normal(0, 1, 100))
model = GaussianProcess(theta0=bad_theta)
model.fit(X, y)
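With only theta0 given, the parameters stay fixed during fitting; if I remember the legacy API right, passing thetaL and thetaU as well makes fit() maximize the likelihood over each component, with the result stored in theta_. A minimal sketch, continuing the snippet above:

# one starting value and one pair of bounds per feature;
# fit() then maximizes the likelihood over all 100 parameters
model = GaussianProcess(theta0=bad_theta,
                        thetaL=1e-4 * np.ones(100),
                        thetaU=1e2 * np.ones(100))
model.fit(X, y)
print(model.theta_)  # per-feature maximum-likelihood estimates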

You can pass any kernel you want as the corr parameter. For reference, the following is the radial basis function that sklearn itself uses for Gaussian processes.

def squared_exponential(theta, d):
    """
    Squared exponential correlation model (Radial Basis Function).
    (Infinitely differentiable stochastic process, very smooth)::

                                            n
        theta, dx --> r(theta, dx) = exp(  sum  - theta_i * (dx_i)^2 )
                                        i = 1

    Parameters
    ----------
    theta : array_like
        An array with shape 1 (isotropic) or n (anisotropic) giving the
        autocorrelation parameter(s).

    dx : array_like
        An array with shape (n_eval, n_features) giving the componentwise
        distances between locations x and x' at which the correlation model
        should be evaluated.

    Returns
    -------
    r : array_like
        An array with shape (n_eval, ) containing the values of the
        autocorrelation model.
    """

    theta = np.asarray(theta, dtype=np.float)
    d = np.asarray(d, dtype=np.float)

    if d.ndim > 1:
        n_features = d.shape[1]
    else:
        n_features = 1

    if theta.size == 1:
        return np.exp(-theta[0] * np.sum(d ** 2, axis=1))
    elif theta.size != n_features:
        raise ValueError("Length of theta must be 1 or %s" % n_features)
    else:
        return np.exp(-np.sum(theta.reshape(1, n_features) * d ** 2, axis=1))
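So for your case you can write your own corr with as many hyperparameters as you like; theta is just whatever vector you pass in through theta0, and only the built-in correlations enforce the length-equals-n_features rule you ran into. Below is a minimal sketch of a squared-exponential-times-periodic correlation; the function name and the layout of theta are my own choices, not anything built into sklearn. Note that GaussianProcess treats corr as a correlation (r(0) = 1) and estimates the overall process variance itself, so an explicit amplitude like your $\theta_0$ shouldn't be needed.

def se_times_periodic(theta, d):
    """Hypothetical correlation: squared exponential times a periodic
    term, with theta = [se_scale, periodic_decay, frequency].
    Same (theta, d) signature that sklearn expects for corr.
    """
    theta = np.asarray(theta, dtype=float)
    d = np.asarray(d, dtype=float)
    if d.ndim == 1:
        d = d.reshape(-1, 1)
    r = np.sqrt(np.sum(d ** 2, axis=1))                   # distance per evaluation
    se = np.exp(-theta[0] * r ** 2)                       # squared-exponential factor
    per = np.exp(-theta[1] * np.sin(theta[2] * r) ** 2)   # periodic factor
    return se * per                                       # product of valid correlations

# theta0 now carries all three hyperparameters; for a 1-D time
# series, X should have shape (n_samples, 1)
model = GaussianProcess(corr=se_times_periodic,
                        theta0=[1.0, 1.0, 1.0],
                        thetaL=[1e-3, 1e-3, 1e-3],
                        thetaU=[1e2, 1e2, 1e2])
model.fit(X, y)

I used a product rather than the sum you wrote because corr must equal 1 at zero distance; a plain sum of two correlations gives r(0) = 2, so for an additive kernel you'd want a convex combination, e.g. w * se + (1 - w) * per with 0 < w < 1 (and w could itself be one of the thetas).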

It looks like you're doing something pretty interesting, btw.