Kernel Trick – Proof and Understanding that the Linear Kernel is a Kernel

kernel trick, machine learning

I can't seem to wrap my head around the following proof.

I want to show that the linear kernel is a kernel because its Gram matrix is positive semi-definite.

There is plenty of information on the internet, but I don't understand the last step:
linear kernel: $k(x,x') = \langle x, x\rangle = \sum_{a=0}^N x_a x_a'$
$$\sum_{i,j} c_i c_j k(x_i,x_j) \ge 0$$
$$\sum_{i,j} c_i c_j k(x_i,x_j) = \sum_{i,j} c_i c_j \langle x_i, x_j\rangle = \sum_{i,j} c_i c_j \sum_{a=0}^N x_a x_a' = \sum_{i,j} \sum_{a=0}^N c_i x_a c_j x_a'$$
So far so good. But now, why can I say that this is equal to:
$$\left\|\sum_{i} \sum_{a=0}^N c_i x_a\right\|^2 \ge 0$$
The $\ge$ is clear, but why is the formula transformed in that way?

A similar approach for the dot product is shown here: click
Thanks a lot!

Best Answer

First, your definition should be corrected to
$$k(x, x') = \langle x, x\color{red}{'}\rangle = \sum_{a = 1}^N x_a x_a'.$$

The problem with your derivation is that you didn't distinguish clearly between $x_i = (x_{i,1}, \ldots, x_{i,N})^T$ and $x_j = (x_{j,1}, \ldots, x_{j,N})^T$. Say you have $p$ vectors $\{x_1, \ldots, x_p\}$ under consideration. It then follows that (the last expression you provided was actually incorrect):
\begin{align}
& \sum_{i, j} c_i c_j k(x_i, x_j) \\
= & \sum_{i = 1}^p \sum_{j = 1}^p c_i c_j \sum_{a = 1}^N x_{i,a} x_{j,a} \\
= & \sum_{i = 1}^p \sum_{j = 1}^p \sum_{a = 1}^N c_i x_{i,a} c_j x_{j,a} \\
= & \sum_{a = 1}^N \left(\sum_{i = 1}^p c_i x_{i,a}\right) \left(\sum_{j = 1}^p c_j x_{j,a}\right) \qquad \text{change the order of summation} \\
= & \sum_{a = 1}^N \left(\sum_{i = 1}^p c_i x_{i,a}\right)^2 \geq 0. \qquad i, j \text{ are just dummy indices}
\end{align}

In vector form, this chain of equalities is exactly
$$\sum_{i,j} c_i c_j \langle x_i, x_j\rangle = \left\|\sum_{i=1}^p c_i x_i\right\|^2 \geq 0,$$
which is the norm expression you were trying to reach; the version in your question dropped the second index $i$ on the $x$'s, which is why the transformation looked mysterious.
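If it helps to see the identity concretely, here is a minimal numeric sanity check in Python/NumPy. The vectors, coefficients, dimensions, and random seed are all made up for illustration: it builds the Gram matrix $K_{ij} = \langle x_i, x_j\rangle$, evaluates the quadratic form $\sum_{i,j} c_i c_j k(x_i, x_j) = c^\top K c$, and confirms it matches $\left\|\sum_i c_i x_i\right\|^2 \ge 0$.

```python
import numpy as np

# Hypothetical example: p = 4 random vectors in R^3 and arbitrary coefficients.
rng = np.random.default_rng(0)
p, N = 4, 3
X = rng.normal(size=(p, N))   # rows are the vectors x_1, ..., x_p
c = rng.normal(size=p)        # coefficients c_1, ..., c_p

# Gram matrix of the linear kernel: K[i, j] = <x_i, x_j>.
K = X @ X.T

# Quadratic form sum_{i,j} c_i c_j k(x_i, x_j) ...
quad = c @ K @ c

# ... equals the squared norm of the single vector sum_i c_i x_i.
norm_sq = np.linalg.norm(c @ X) ** 2

assert np.isclose(quad, norm_sq)
assert quad >= 0

# Equivalently, the eigenvalues of the symmetric Gram matrix are all >= 0
# (up to floating-point round-off), i.e. K is positive semi-definite.
print(np.linalg.eigvalsh(K))
```

Since $c$ was arbitrary, the same check passing for every choice of $c$ is precisely the positive semi-definiteness of $K$.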
