Kernel Trick – Proving 2nd Order Polynomial Kernel is Positive Semi-Definite in Machine Learning

kernel trick, linear algebra, machine learning, self-study

I'm trying to prove that the 2nd order polynomial kernel, $K(x_i, x_j) = (x_i^Tx_j + 1)^2$, is a valid kernel, i.e., one that satisfies the following two conditions:

  1. K is symmetric, that is, $K(x_i, x_j) = K(x_j, x_i)$.
  2. K is positive semi-definite, that is, $v^TKv \geq 0$ for all $v \in \mathbb{R}^n$, where $K$ here also denotes the $n \times n$ Gram matrix with entries $K_{ij} = K(x_i, x_j)$.

We can actually prove that the second-order polynomial kernel function is a valid kernel by deriving the corresponding transformation function $\phi(x) = [1, \sqrt{2}x_1, \dots, \sqrt{2}x_d, x_1x_1, x_1x_2, \dots, x_1x_d, x_2x_1, \dots, x_dx_d]^T$, where $d$ is the number of features (dimensionality). But I want to prove directly that the two conditions listed above hold for the given kernel function.
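As a quick sanity check of this feature map, here is a minimal NumPy sketch (the helper name `phi` and the random test vectors are mine, purely for illustration) confirming that $\phi(x_i)^T\phi(x_j)$ reproduces the kernel value:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the 2nd-order polynomial kernel
    (x^T y + 1)^2: a constant 1, the scaled linear terms sqrt(2)*x,
    and all d*d pairwise products x_a * x_b."""
    return np.concatenate(([1.0], np.sqrt(2) * x,
                           np.outer(x, x).ravel()))

rng = np.random.default_rng(0)
xi, xj = rng.standard_normal(5), rng.standard_normal(5)

lhs = phi(xi) @ phi(xj)    # inner product in feature space
rhs = (xi @ xj + 1) ** 2   # kernel evaluated directly
assert np.isclose(lhs, rhs)
```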

My attempts:

  1. Symmetry is rather straightforward:
    $$(x_i^Tx_j + 1)^2 = x_i^Tx_jx_i^Tx_j + 2x_i^Tx_j + 1 = A \in \mathbb{R}$$
    $$(x_j^Tx_i + 1)^2 = x_j^Tx_ix_j^Tx_i + 2x_j^Tx_i + 1 = B \in \mathbb{R}$$
    It can be observed that $A^T = B$, and since they are scalars, $A = A^T = B \implies A = B$.

  2. For the second condition, my attempt is as follows:

$$v^TK = \left[\sum_{i=1}^{n}(x_i^Tx_1 + 1)^2 v_i \;\; \dots \;\; \sum_{i=1}^{n}(x_i^Tx_n + 1)^2 v_i\right]$$
$$v^TKv = \sum_{j=1}^{n}\left(\sum_{i=1}^{n}(x_i^Tx_j + 1)^2 v_i\right) v_j = \sum_{j=1}^{n}\sum_{i=1}^{n}(x_i^Tx_j + 1)^2 v_i v_j$$

Now I proceed with expanding the term $(x_i^Tx_j + 1)^2$:
$$v^TKv = \sum_{j=1}^{n}\sum_{i=1}^{n}(x_i^Tx_jx_i^Tx_j + 2x_i^Tx_j + 1) v_i v_j = \sum_{j=1}^{n}\sum_{i=1}^{n}\left[(x_i^Tx_j)^2 v_i v_j + 2x_i^Tx_jv_i v_j + v_i v_j\right]$$
After this point, I don't know how to proceed. I feel like I have to use the double-sum factoring property:
$$\sum_{i=1}^{n}\sum_{j=1}^{n}a_ib_j = \left(\sum_{i=1}^{n}a_i\right) \cdot \left(\sum_{j=1}^{n}b_j\right)$$

But I can only factor the term with $v_iv_j$ this way.
$$v^TKv = \left(\sum_{j=1}^{n}v_j\right)\left(\sum_{i=1}^{n}v_i\right) + 2\sum_{i=1}^{n}\sum_{j=1}^{n}x_i^Tx_jv_iv_j + \sum_{i=1}^{n}\sum_{j=1}^{n}(x_i^Tx_j)^2v_i v_j$$

$$= \left(\sum_{i=1}^{n}v_i\right)^2 + 2\sum_{i=1}^{n}\sum_{j=1}^{n}x_i^Tx_jv_iv_j + \sum_{i=1}^{n}\sum_{j=1}^{n}(x_i^Tx_j)^2 v_i v_j$$
The first term is a square, hence nonnegative, so it causes no trouble. But for the remaining two terms, I cannot come up with any simplification. (A numerical check of this three-term decomposition is sketched below.)
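As a sanity check, the decomposition above can be verified with random data (a NumPy sketch, not part of the proof itself):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 4
X = rng.standard_normal((n, d))  # rows are the sample vectors x_i
v = rng.standard_normal(n)

G = X @ X.T                      # Gram matrix: G[i, j] = x_i^T x_j
K = (G + 1) ** 2                 # 2nd-order polynomial kernel matrix

assert np.allclose(K, K.T)       # condition 1: symmetry

quad = v @ K @ v                 # the quadratic form v^T K v
term1 = v.sum() ** 2             # (sum_i v_i)^2
term2 = 2 * (v @ G @ v)          # 2 * sum_{i,j} x_i^T x_j v_i v_j
term3 = v @ (G ** 2) @ v         # sum_{i,j} (x_i^T x_j)^2 v_i v_j
assert np.isclose(quad, term1 + term2 + term3)
assert quad >= 0                 # condition 2 holds on this sample
```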
I have two questions:

  1. How should I proceed further at this point?
  2. How can one prove that any polynomial kernel of degree $p$ is PSD using this approach?

Thank you for your time.

Best Answer

Note that $K$ is the Hadamard product of the matrix $M = (m_{ij})$ with itself, where $m_{ij} = 1 + x_i^Tx_j$, which can be written as (assuming the input vector $x_i$ is a $p \times 1$ column vector): \begin{align} M = ee^T + XX^T, \end{align} where \begin{align} e = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1\end{bmatrix} \in \mathbb{R}^{n \times 1}, \quad X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix} \in \mathbb{R}^{n \times p}. \end{align}

Hence for any $v \in \mathbb{R}^{n \times 1}$, we have \begin{align} v^TMv = v^Tee^Tv + v^TXX^Tv = (v^Te)^2 + (X^Tv)^T(X^Tv) \geq 0, \end{align} showing $M$ is positive semi-definite (PSD).
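A numerical illustration of this decomposition (a NumPy sketch, with random data standing in for the $x_i$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 4
X = rng.standard_normal((n, p))  # rows are the x_i
e = np.ones(n)

M = np.outer(e, e) + X @ X.T     # M = e e^T + X X^T

v = rng.standard_normal(n)
lhs = v @ M @ v
rhs = (v @ e) ** 2 + (X.T @ v) @ (X.T @ v)    # (v^T e)^2 + ||X^T v||^2
assert np.isclose(lhs, rhs)
assert np.linalg.eigvalsh(M).min() >= -1e-10  # PSD up to rounding error
```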

Now the result follows from $K = M \circ M$ and the Schur product theorem:

If two matrices $M_1$ and $M_2$ are PSD, then their Hadamard product $M_1 \circ M_2$ is also PSD.

This also answers the second question: the Gram matrix of the degree-$p$ polynomial kernel $(x_i^Tx_j + 1)^p$ is the $p$-fold Hadamard power of $M$, which is PSD by applying the theorem inductively.
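As with the previous step, this can be illustrated numerically (a self-contained NumPy sketch; `eigvalsh` computes the eigenvalues of a symmetric matrix, which should all be nonnegative up to rounding):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 4
X = rng.standard_normal((n, p))

M = np.outer(np.ones(n), np.ones(n)) + X @ X.T  # PSD, as shown above
K = M * M                    # Hadamard product is elementwise * in NumPy
assert np.allclose(K, (X @ X.T + 1) ** 2)       # K is the kernel matrix
assert np.linalg.eigvalsh(K).min() >= -1e-10    # PSD, per Schur's theorem
```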


Your attempt should also work, if assisted with some common "trace tricks" for quadratic forms (essentially the trick used in the first proof at the Schur product theorem link).

The second term, $\sum_{i = 1}^n\sum_{j = 1}^n v_ix_i^Tx_jv_j$, is actually the squared Euclidean norm of the vector $v_1x_1 + \cdots + v_nx_n$, hence it is nonnegative.
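A quick numerical check of this identity (again a sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 6, 4
X = rng.standard_normal((n, p))      # rows are the x_i
v = rng.standard_normal(n)

double_sum = v @ (X @ X.T) @ v       # sum_{i,j} v_i x_i^T x_j v_j
w = X.T @ v                          # w = v_1 x_1 + ... + v_n x_n
assert np.isclose(double_sum, w @ w) # equals ||w||^2, hence nonnegative
```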

To prove $\sum_{i = 1}^n\sum_{j = 1}^n v_i(x_i^Tx_j)^2v_j \geq 0$, denote the $p \times p$ matrix $\sum_k v_kx_kx_k^T$ by $A$, which is clearly symmetric. Since $x_i^Tx_j$ is a scalar, $(x_i^Tx_j)^2 = x_i^Tx_j(x_i^Tx_j)^T = x_i^Tx_jx_j^Tx_i$. By the linearity of the trace operator $\operatorname{tr}$ and its cyclic property $\operatorname{tr}(M_1M_2) = \operatorname{tr}(M_2M_1)$, we have \begin{align} \sum_{i = 1}^n\sum_{j = 1}^n v_i(x_i^Tx_j)^2v_j &= \sum_{i = 1}^n\sum_{j = 1}^n v_ix_i^Tx_jx_j^Tx_iv_j \\ &= \sum_{i = 1}^nv_ix_i^TAx_i \\ &= \sum_{i = 1}^n\operatorname{tr}(v_ix_i^TAx_i) \\ &= \sum_{i = 1}^n\operatorname{tr}(v_iAx_ix_i^T) \\ &= \operatorname{tr}\left(A\sum_{i = 1}^nv_ix_ix_i^T\right) \\ &= \operatorname{tr}(A^2) = \operatorname{tr}(A^TA) \geq 0, \end{align} where the last inequality holds because $\operatorname{tr}(A^TA) = \sum_{i,j}A_{ij}^2$ is a sum of squares. This completes the proof.
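Finally, the trace computation itself can be verified numerically (a sketch; here `A` is built as $X^T \operatorname{diag}(v) X$, which equals $\sum_k v_kx_kx_k^T$):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 6, 4
X = rng.standard_normal((n, p))      # rows are the x_i
v = rng.standard_normal(n)

A = X.T @ (v[:, None] * X)           # A = sum_k v_k x_k x_k^T  (p x p, symmetric)

double_sum = v @ ((X @ X.T) ** 2) @ v          # sum_{i,j} v_i (x_i^T x_j)^2 v_j
assert np.isclose(double_sum, np.trace(A @ A)) # equals tr(A^2) = tr(A^T A)
assert np.trace(A @ A) >= -1e-10               # squared Frobenius norm of A
```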