Mercer’s Theorem for some simple kernels

functional-analysis, linear-algebra, machine-learning

I am trying to better understand Mercer's Theorem by applying it to some specific kernels.

Background

Let $D \subset \mathbb{R}^N$ be a closed, bounded subset. With a function $K: D \times D \rightarrow \mathbb{R}$ we associate an operator (the Hilbert-Schmidt integral operator) $T_K: L^2(D) \rightarrow L^2(D)$ defined by
$$
[T_K\phi](x) = \int_{D} K(x,s) \phi(s) \, ds.
$$
We say that $K$ is positive semi-definite if for every finite set of points $x_1, \ldots, x_m \in D$, the corresponding Gram matrix $[K(x_i, x_j)]_{ij}$ is positive semi-definite.
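
To make the operator concrete, here is a minimal numerical sketch (my own illustration; the choices $N=1$, $D=[0,1]$, the midpoint rule, and the test kernel $K(x,y)=xy$ are all arbitrary). Discretizing the integral turns $T_K$ into an ordinary matrix acting on sampled functions:

```python
import numpy as np

# Minimal sketch: discretize T_K on D = [0, 1] with a midpoint rule.
# Illustrative choices (not from the theorem): N = 1, K(x, y) = x*y.
m = 200
dx = 1.0 / m
x = (np.arange(m) + 0.5) * dx            # midpoints of the grid cells

K = np.outer(x, x)                       # K(x_i, x_j) = x_i * x_j

def T_K(phi):
    """Discretized operator: (T_K phi)(x_i) ~ sum_j K(x_i, x_j) phi(x_j) dx."""
    return K @ phi * dx

# For phi(s) = sin(pi*s), the exact image is [T_K phi](x) = x / pi.
phi = np.sin(np.pi * x)
print(np.max(np.abs(T_K(phi) - x / np.pi)))   # small discretization error
```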

Mercer's Theorem states that if $K$ is continuous, symmetric, and positive semi-definite, then it holds that
$$
K(s,t) = \sum_{j = 1}^{\infty} \lambda_j e_j(s)e_j(t), \tag{1}
$$
where $\lambda_j \geq 0$ is the $j$-th eigenvalue of $T_K$, $e_j$ is the corresponding (orthonormal) eigenfunction, and the series converges absolutely and uniformly on $D \times D$.
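
One way to see (1) in action is to eigendecompose a discretized kernel matrix and check that the series reconstructs $K$. A sketch, again with the arbitrary choices $N=1$, $D=[0,1]$, $K(x,y)=xy$:

```python
import numpy as np

# Sketch: numerically check the Mercer expansion (1) for K(x, y) = x*y.
m = 200
dx = 1.0 / m
x = (np.arange(m) + 0.5) * dx
K = np.outer(x, x)

# Eigenpairs of K*dx approximate (lambda_j, e_j); rescale eigenvectors so
# that sum_i e_j(x_i)^2 * dx = 1, mimicking L^2-normalization.
lam, V = np.linalg.eigh(K * dx)
E = V / np.sqrt(dx)                      # columns approximate e_j

K_mercer = (E * lam) @ E.T               # sum_j lambda_j e_j(s) e_j(t)
print(np.max(np.abs(K_mercer - K)))      # ~ 0 up to round-off
print(lam[-1])                           # largest eigenvalue, close to 1/3
```

(That the only nonzero eigenvalue here is $1/3$ is exactly what the answer below derives for $K_1$ on $[0,1]$.)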

Question

While I understand this statement in an abstract sense, I have a hard time working through the theorem with an actual kernel. Take two simple examples:

  • $K_1(x,y) = x^{\intercal}y$
  • $K_2(x,y) = (x^{\intercal}y)^2$

To be clear, $x, y$ are vectors in $\mathbb{R}^N$. It is easily checked that $K_1, K_2$ are continuous, symmetric, and positive semi-definite, so Mercer's Theorem should be applicable. But I have trouble determining what exactly $T_{K_1}$ and $T_{K_2}$ are, and furthermore what the corresponding $\lambda_j, e_j$ are so that (1) holds.

Can someone please enlighten me? Thanks.

Edit (some clarifications):

To see that $K_1$ is positive semi-definite, fix points $x_1, \ldots, x_m \in D$, and let $X$ be the matrix whose $i$-th column is $x_i$. Then $[K_1(x_i, x_j)]_{ij} = X^T X$. Thus, for any $z \in \mathbb{R}^m$, we have
$$
z^T X^T X z = \| Xz\|^2 \geq 0,
$$
proving positive semi-definiteness.
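
A quick numerical spot check of the same claim, for both kernels (the dimensions and random points below are arbitrary; for $K_2$, the Gram matrix is the elementwise square of $X^T X$, i.e. a Schur product of positive semi-definite matrices, hence itself positive semi-definite):

```python
import numpy as np

# Sketch: check positive semi-definiteness of the Gram matrices of
# K1(x, y) = x.T y and K2(x, y) = (x.T y)^2 at random points.
rng = np.random.default_rng(0)
N, m = 3, 10                             # arbitrary illustrative sizes
X = rng.normal(size=(N, m))              # i-th column is x_i

G1 = X.T @ X                             # [K1(x_i, x_j)]_ij
G2 = G1 ** 2                             # [K2(x_i, x_j)]_ij, elementwise square

for G in (G1, G2):
    print(np.linalg.eigvalsh(G).min() >= -1e-10)   # True: eigenvalues >= 0
```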

Best Answer

Here is a solution for $N=1$ and $D=[a,b]$.

Note that in both cases the kernel factorizes as $K(x,y)=f(x)f(y)$ (with $f(x)=x$ and $f(x)=x^2$, respectively). Thus the eigenvalue equation $T_K\phi = \lambda\phi$ simplifies to
$$
\lambda \phi(x)=f(x)\int_a^b f(y)\phi(y)\,dy.
$$
Hence, if $\lambda\neq 0$ (and this is the interesting case), then $\phi(x)=\alpha f(x)$ for some $\alpha\in \mathbb{R}$, and consequently
$$
\lambda=\int_a^b f(x)^2\,dx.
$$
To get a normalized eigenfunction, we must have
$$
1=\int_a^b \alpha^2 f(x)^2\,dx,
$$
that is, $\alpha=\left(\int_a^b f(x)^2\,dx\right)^{-1/2}$. It follows that
$$
\lambda \phi(x)\phi(y)=\left(\int_a^b f(x)^2\,dx\right)\left(\int_a^b f(x)^2\,dx\right)^{-1}f(x)f(y)=K(x,y),
$$
as expected from Mercer's theorem.
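
To see that this argument uses nothing beyond the factorization $K(x,y)=f(x)f(y)$, here is a small numerical sketch (the choices $f(x)=\cos x$, $[a,b]=[0,1]$, and the midpoint rule are mine) checking that $\phi=\alpha f$ with $\lambda=\int_a^b f^2$ satisfies the eigenvalue equation:

```python
import numpy as np

# Sketch: verify the rank-one argument for an arbitrary f (here f = cos).
a, b, m = 0.0, 1.0, 1000
dx = (b - a) / m
x = a + (np.arange(m) + 0.5) * dx        # midpoint grid on [a, b]

f = np.cos(x)
lam = np.sum(f**2) * dx                  # lambda = integral of f^2
phi = f / np.sqrt(lam)                   # normalized eigenfunction alpha*f

# (T_K phi)(x) = f(x) * integral of f(y) phi(y) dy should equal lam * phi(x).
T_phi = f * np.sum(f * phi) * dx
print(np.max(np.abs(T_phi - lam * phi)))   # ~ 0
```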

In the case of $K_1$ this gives the (only) non-zero eigenvalue $\lambda=\frac 1 3(b^3-a^3)$ with corresponding eigenfunction $\phi(x)=\lambda^{-1/2}x$; for $K_2$ we get $\lambda=\frac 1 5(b^5-a^5)$ and $\phi(x)=\lambda^{-1/2}x^2$.
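
These values are easy to confirm numerically; a sketch (with $[a,b]=[0,1]$ chosen arbitrarily) that discretizes $T_{K_1}$ and $T_{K_2}$ and reads off the largest eigenvalue:

```python
import numpy as np

# Sketch: recover the nonzero eigenvalues of T_{K1} and T_{K2} on [a, b].
a, b, m = 0.0, 1.0, 500
dx = (b - a) / m
x = a + (np.arange(m) + 0.5) * dx

for f, exact in [(x, (b**3 - a**3) / 3),       # K1: f(x) = x
                 (x**2, (b**5 - a**5) / 5)]:   # K2: f(x) = x^2
    K = np.outer(f, f)                   # K(x, y) = f(x) f(y)
    lam = np.linalg.eigvalsh(K * dx)[-1] # largest eigenvalue of K*dx
    print(lam, exact)                    # the two nearly agree
```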

In higher dimensions, at least if $D=[a_1,b_1]\times\dots\times[a_N,b_N]$, one can still decompose $K$ as $K(x,y)=f_1(x)g_1(y)+\dots+f_n(x)g_n(y)$ and try a similar approach, but things get messier (I don't know whether you get explicit results; I haven't spelled it out).
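
For $K_1$ the higher-dimensional computation does stay explicit: plugging the ansatz $\phi(y)=c^\top y$ into the eigenvalue equation gives $\lambda\, c^\top x = x^\top M c$ for all $x$, i.e. $Mc=\lambda c$ with $M_{ij}=\int_D y_i y_j\,dy$, so the nonzero spectrum of $T_{K_1}$ is that of the $N\times N$ matrix $M$. A sketch, with the box $D=[0,1]^N$ as an arbitrary choice:

```python
import numpy as np

# Sketch for K1(x, y) = x.T y on D = [0, 1]^N: the nonzero eigenvalues of
# T_{K1} are those of M with M_ij = integral over D of y_i*y_j dy.
N = 3                                    # arbitrary illustrative dimension
M = np.full((N, N), 0.25)                # i != j: (1/2)*(1/2)*1^(N-2)
np.fill_diagonal(M, 1.0 / 3.0)           # i == j: integral of y_i^2 = 1/3

lam, C = np.linalg.eigh(M)
print(lam)   # eigenvalues; eigenfunctions are phi(x) = c.T x (c = columns of C)
```

For $N=3$ this gives $\lambda = 1/12$ (with multiplicity two) and $\lambda = 5/6$.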
