Solved – Is the Gaussian kernel still a valid kernel when taking the negative of the inner function?

Tags: gaussian-process, kernel-trick, machine-learning, svm

In support vector machines (SVMs) and other kernel-based methods, such as Gaussian processes, a kernel function replaces the inner product of two feature vectors, $k(x_n, x_m) = x_n^T x_m$. The Gaussian kernel

$$k(x_n, x_m) = \exp\left(- \frac{\theta}{2} \lVert x_n - x_m \rVert^2\right)$$ is a valid kernel function when $\theta \ge 0$; $\theta$ then plays the role of the inverse variance (precision).
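
As a quick numerical sanity check for the $\theta > 0$ case, here is a minimal numpy sketch (the `gaussian_kernel` helper and the sample points are illustrative, not from any particular library) that builds a Gram matrix and confirms its eigenvalues are nonnegative:

```python
import numpy as np

def gaussian_kernel(x, y, theta):
    """Gaussian (RBF) kernel with precision parameter theta."""
    return np.exp(-0.5 * theta * np.sum((x - y) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))   # five random 2-D points
theta = 2.0                   # positive precision

# Gram matrix of pairwise kernel evaluations
K = np.array([[gaussian_kernel(xi, xj, theta) for xj in X] for xi in X])

# All eigenvalues are nonnegative (up to floating-point error),
# consistent with the kernel being positive semidefinite for theta >= 0.
print(np.linalg.eigvalsh(K))
```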

My question is: is this function still a valid kernel for SVMs and Gaussian processes when $\theta < 0$?

Best Answer

This reasoning is essentially that of Sycorax's answer, but there is no need to resort to that theorem:

Consider two distinct points $x$ and $y$. For $\theta<0$, their Gram matrix is $$ \begin{bmatrix} k(x, x) & k(x, y) \\ k(x, y) & k(y, y) \end{bmatrix} = \begin{bmatrix} 1 & \alpha \\ \alpha & 1 \end{bmatrix} $$ where $\alpha = k(x, y) = \exp\left( - \frac{\theta}{2} \lVert x - y \rVert^2 \right) = \exp\left( \tfrac12 \lvert{\theta}\rvert \lVert x - y \rVert^2 \right) > 1$, since $x \neq y$ and $\theta < 0$ make the argument to $\exp$ strictly positive.

Setting the characteristic polynomial of this Gram matrix to zero gives $(\lambda - 1)^2 - \alpha^2 = 0$, so that $\lvert \lambda - 1 \rvert = \alpha$, and the eigenvalues are $1 + \alpha$ and $1 - \alpha$. Since $\alpha > 1$, the second eigenvalue is negative, so the Gram matrix is not positive semidefinite (PSD) and $k$ is not a valid kernel for $\theta < 0$.
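
The counterexample can be verified numerically. The sketch below (the specific points and $\theta = -1$ are illustrative choices) constructs the $2 \times 2$ Gram matrix above and shows that one eigenvalue is negative:

```python
import numpy as np

theta = -1.0                              # negative precision
x, y = np.array([0.0]), np.array([1.0])   # two distinct points

# alpha = exp(-theta/2 * ||x - y||^2) = exp(0.5) > 1
alpha = np.exp(-0.5 * theta * np.sum((x - y) ** 2))

K = np.array([[1.0, alpha],
              [alpha, 1.0]])

# Eigenvalues are 1 - alpha and 1 + alpha; since alpha > 1,
# the first is negative, so K is not positive semidefinite.
print(np.linalg.eigvalsh(K))              # approx. [-0.6487, 2.6487]
```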