Solved – Validating Kernel Functions

kernel-trick, machine-learning

I'd appreciate help in clarifying my understanding of how to validate kernel functions, using the following two examples:

  1. $K(x, t) = x^T t - (x^T t)^2$
  2. $K(x, t) = e^{(x_1 t_1)}$, where $x_1$ and $t_1$ are the first elements of the vectors $x$ and $t$.

As I understand it, I can either:

  1. Find a feature map $\phi(x)$ such that $K(x, t) = \phi(x)^T \phi(t)$
  2. Build a Gram matrix $K$ and check if it is positive semi-definite.

For the first question, I built a Gram matrix using the vectors $\begin{bmatrix}1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix}2 \\ 2 \end{bmatrix}$ $\rightarrow$ $\begin{bmatrix}-20 & -30 \\ -30 & -56 \end{bmatrix}$

In row-echelon form, a pivot is negative: $\begin{bmatrix}60 & 90 \\ 0 & -11 \end{bmatrix}$
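For what it's worth, here is a quick numerical sanity check of the same matrix's eigenvalues (a small numpy sketch, using the entries computed above):

```python
import numpy as np

# Gram matrix of K(x, t) = x^T t - (x^T t)^2 on the points (1, 2) and (2, 2)
G = np.array([[-20.0, -30.0],
              [-30.0, -56.0]])

print(np.linalg.eigvalsh(G))  # both eigenvalues come out negative
```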

Thus (as I understand it) the kernel is not valid. Is this understanding correct?

As for the second question, it seems like a variation of a Gaussian kernel, but I'm unclear about what effect, if any, using only the first element of each argument vector has. How do I address the validity of this kernel (#2)?

Best Answer

A kernel is psd if and only if every Gram matrix it produces is psd. Thus if you find a single Gram matrix which is not psd, then the kernel is not psd; but finding a single psd Gram matrix does not prove that the kernel is valid, since the condition must hold for every finite set of points.
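In practice, a quick way to hunt for such a counterexample is to pick a handful of points and look at the smallest eigenvalue of the resulting Gram matrix: a negative eigenvalue is a disproof, while a non-negative one proves nothing. A minimal numpy sketch (the helper name `min_gram_eig` is just illustrative):

```python
import numpy as np

def min_gram_eig(kernel, points):
    """Smallest eigenvalue of the Gram matrix of `kernel` on `points`."""
    G = np.array([[kernel(x, t) for t in points] for x in points])
    return np.linalg.eigvalsh(G).min()

# First kernel from the question: K(x, t) = x^T t - (x^T t)^2
k1 = lambda x, t: x @ t - (x @ t) ** 2

pts = np.array([[1.0, 2.0], [2.0, 2.0], [0.5, -1.0]])  # arbitrary test points
print(min_gram_eig(k1, pts))  # negative, so this kernel is not psd
```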

So, yes, your first kernel is not psd. In fact, you can find a simpler counterexample in this case: if $k(x, x) < 0$ for some $x$, then $k$ is not psd, since the $1 \times 1$ Gram matrix of the set $\{ x \}$ is not psd.
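Concretely, for your first kernel, $$ k(x, x) = \|x\|^2 - \|x\|^4 = \|x\|^2 \left( 1 - \|x\|^2 \right) < 0 \quad \text{whenever } \|x\| > 1, $$ which is exactly why the diagonal entries $-20$ and $-56$ of your Gram matrix are negative.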

For the second kernel, the notation $e^{(x_1 t_1)}$ could be read a few ways, so I'll go through the likely readings (a numerical check of all three is sketched after this list).

  • If you somehow meant $e^{- (x_1 - t_1)^2}$, then yes, it is a valid kernel. This is because it's a valid kernel (the Gaussian) on one-dimensional inputs, and ignoring some elements of your input vectors doesn't matter. That's because, if $k$ is a valid kernel on $\mathbb R$, then $k_d(x, y) = k(x_d, y_d)$ is valid on $\mathbb R^m$: if you have a set of points $\{x^{(i)}\}_{i=1}^n$ and associated weights $\alpha_i$, then $$ \sum_{i=1}^n \sum_{j=1}^n \alpha_i \, k_d(x^{(i)}, x^{(j)}) \, \alpha_j = \sum_{i=1}^n \sum_{j=1}^n \alpha_i \, k(x_d^{(i)}, x_d^{(j)}) \, \alpha_j \ge 0 $$ since $k$ is a valid kernel.

  • If you meant $e^{(x_1 - t_1)^2}$, then no, it's not a valid kernel: this is a Gaussian kernel with the sign of the exponent flipped, and the Gaussian is not psd with a negative multiplier in the exponent. For instance, taking first coordinates $0$ and $2$ gives the Gram matrix $\begin{bmatrix}1 & e^4 \\ e^4 & 1\end{bmatrix}$, whose determinant $1 - e^8$ is negative.

  • If you meant $e^{x_1 t_1}$, then yes, it is valid: $(x, t) \mapsto x_1 t_1$ is a valid kernel, and in general, if $k$ is a valid kernel then so is $e^k$ (write $e^k = \sum_{n \ge 0} k^n / n!$ and use the facts that products, non-negative combinations, and pointwise limits of valid kernels are valid).
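If it helps, here is a small numpy sketch that checks one Gram matrix for each of the three readings above (the test points are arbitrary; remember that a non-negative smallest eigenvalue proves nothing, while a negative one is a disproof):

```python
import numpy as np

def min_gram_eig(kernel, points):
    """Smallest eigenvalue of the Gram matrix of `kernel` on `points`."""
    G = np.array([[kernel(x, t) for t in points] for x in points])
    return np.linalg.eigvalsh(G).min()

# Three readings of the second kernel; only the first coordinate is used.
readings = {
    "e^{-(x1 - t1)^2}": lambda x, t: np.exp(-(x[0] - t[0]) ** 2),  # valid
    "e^{+(x1 - t1)^2}": lambda x, t: np.exp((x[0] - t[0]) ** 2),   # not valid
    "e^{x1 t1}":        lambda x, t: np.exp(x[0] * t[0]),          # valid
}

pts = np.array([[0.0, 1.0], [1.0, -1.0], [3.0, 0.5]])  # arbitrary 2-D points
for name, k in readings.items():
    print(name, min_gram_eig(k, pts))
# only the middle reading produces a negative eigenvalue on this point set
```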