$
\def\o{{\tt1}}\def\p{\partial}
\def\L{\left}\def\R{\right}
\def\LR#1{\L(#1\R)}
\def\BR#1{\Big(#1\Big)}
\def\bR#1{\big(#1\big)}
\def\vecc#1{\operatorname{vec}\LR{#1}}
\def\diag#1{\operatorname{diag}\LR{#1}}
\def\Sym#1{\operatorname{Sym}\!\BR{#1}}
\def\sym#1{\operatorname{Sym}\LR{#1}}
\def\Diag#1{\operatorname{Diag}\LR{#1}}
\def\trace#1{\operatorname{Tr}\LR{#1}}
\def\qiq{\quad\implies\quad}
\def\grad#1#2{\frac{\p #1}{\p #2}}
\def\m#1{\left[\begin{array}{r}#1\end{array}\right]}
\def\c#1{\color{red}{#1}}
$First, define some new variables
$$\eqalign{
B &= \tfrac 12D \\
y_k &= Mx_k \\
X &= \m{x_1&x_2&\ldots&x_n}\;\in{\mathbb R}^{m\times n} \\
Y &= \m{y_1&y_2&\ldots&y_n}\;= MX \qiq \c{dY = dM\,X} \\
J &=X\oslash X \qiq J_{ij}=\o\quad\big({\rm all\!-\!ones\;matrix}\big) \\
X &= J\odot X \\
{\cal D}_Y
&= {\rm Diag}\BR{\!\vecc{Y}}\,\in{\mathbb R}^{mn\times mn} \\
}$$
where $\{\odot,\oslash\}$ denote elementwise/Hadamard multiplication and division.
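As a quick sanity check of these definitions, here is a minimal numpy sketch; the dimensions and the random $M,X$ are placeholders of my own choosing, with $X$ shifted away from zero so that $X\oslash X$ is well defined.
```python
import numpy as np

m, n = 4, 6                             # placeholder dimensions
rng = np.random.default_rng(0)

M = rng.standard_normal((m, m))
X = rng.standard_normal((m, n)) + 3.0   # shift keeps entries nonzero, so X/X is defined
Y = M @ X                               # columns are y_k = M x_k

J = X / X                               # J = X ⊘ X, the all-ones matrix
assert np.allclose(J, np.ones((m, n)))
assert np.allclose(J * X, X)            # X = J ⊙ X

D_Y = np.diag(Y.flatten(order="F"))     # Diag(vec(Y)), column-major vec
assert D_Y.shape == (m * n, m * n)
```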
Then rewrite the problem in pure matrix notation
and calculate its differential.
$$\eqalign{
2B &= (X\odot Y)^TJ + J^T(X\odot Y) - X^TY - Y^TX \\
&= 2\,\Sym{J^T(X\odot{Y}) - X^T{Y}} \\
dB &= \Sym{J^T(X\odot{\c{dY}}) - X^T{\c{dY}}} \\
&= \Sym{J^T(X\odot\LR{\c{dM\,X}}) - X^T{\c{dM\,X}}} \\
}$$
where $\;\sym{A} \doteq \tfrac 12\LR{A+A^T}\,$ denotes the symmetric part of its argument.
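The symmetrized form is easy to verify numerically; a minimal numpy sketch with placeholder data:
```python
import numpy as np

def Sym(A):
    """Symmetric part: (A + A^T) / 2."""
    return 0.5 * (A + A.T)

m, n = 4, 6
rng = np.random.default_rng(1)
M = rng.standard_normal((m, m))
X = rng.standard_normal((m, n))
Y = M @ X
J = np.ones((m, n))

# Four-term expression for 2B vs. its symmetrized rewriting
two_B_long = (X * Y).T @ J + J.T @ (X * Y) - X.T @ Y - Y.T @ X
two_B_sym  = 2.0 * Sym(J.T @ (X * Y) - X.T @ Y)
assert np.allclose(two_B_long, two_B_sym)
```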
At this point, one can calculate a component-wise gradient
$$\eqalign{
\grad{B}{M_{ij}}
&= \Sym{J^T(X\odot\LR{E_{ij}\,X}) - X^T{E_{ij}\,X}} \\
}$$
where $E_{ij}$ is a single-entry matrix whose $(i,j)$ element is $\o$ and all others are zero.
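As a sanity check, this component-wise gradient can be compared against a finite difference; a minimal numpy sketch with placeholder data and an arbitrary index pair $(i,j)$ follows (since $B$ is linear in $M$, the central difference is exact up to round-off):
```python
import numpy as np

def Sym(A):
    return 0.5 * (A + A.T)

def B_of(M, X, J):
    Y = M @ X
    return Sym(J.T @ (X * Y) - X.T @ Y)

m, n = 4, 6
rng = np.random.default_rng(2)
M = rng.standard_normal((m, m))
X = rng.standard_normal((m, n))
J = np.ones((m, n))

i, j, h = 1, 3, 1e-6
E = np.zeros((m, m)); E[i, j] = 1.0     # single-entry matrix E_ij

analytic = Sym(J.T @ (X * (E @ X)) - X.T @ (E @ X))
numeric  = (B_of(M + h * E, X, J) - B_of(M - h * E, X, J)) / (2 * h)
assert np.allclose(analytic, numeric, atol=1e-6)
```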
In terms of the original variable
$$\eqalign{
\grad{B}{M_{ij}} &= \frac 12 \LR{\grad{D}{M_{ij}}} \\
}$$
The problem with the requested gradient $\LR{\grad DM}$ is that it's a fourth-order tensor, which cannot be written in standard matrix notation.
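It can, however, be flattened: stacking $\vecc{\grad{B}{M_{ij}}}$ as the columns of an $n^2\times m^2$ matrix gives $\grad{\vecc{B}}{\vecc{M}^T}$ in ordinary matrix form. A sketch of that bookkeeping (the name `G` and the loop layout are my own choices; note the gradient depends only on $X$ and $J$, since $B$ is linear in $M$):
```python
import numpy as np

def Sym(A):
    return 0.5 * (A + A.T)

m, n = 4, 6
rng = np.random.default_rng(3)
X = rng.standard_normal((m, n))
J = np.ones((m, n))

# Column (i,j) of G holds vec(dB/dM_ij); vec is column-major stacking
G = np.zeros((n * n, m * m))
for j in range(m):
    for i in range(m):
        E = np.zeros((m, m)); E[i, j] = 1.0
        dB = Sym(J.T @ (X * (E @ X)) - X.T @ (E @ X))
        G[:, j * m + i] = dB.flatten(order="F")
```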
Update
You've added a constraint to your original question, i.e.
$$M=A^TA \qiq \c{dM=dA^TA+A^TdA}$$
The procedure to find the new gradient is straightforward. Write the differential, change the independent variable from $M\to A$, then isolate the gradient.
Here is the calculation for the component-wise gradient
$$\eqalign{
dB &= \Sym{J^T(X\odot\LR{\c{dM\,X}}) - X^T\LR{\c{dM\,X}}} \\
&= \Sym{J^T(X\odot\LR{\c{dA^TA\,X+A^TdA\,X}})
    - X^T\LR{\c{dA^TA\,X+A^TdA\,X}}} \\
&= 2\,\Sym{J^T\bR{\!\LR{AX}\odot\LR{dA\,X}\!} - X^TA^TdA\,X} \\
\grad{B}{A_{ij}}
&= 2\,\Sym{J^T\bR{\!\LR{AX}\odot\LR{E_{ij}X}\!}-X^TA^TE_{ij}X} \\
}$$
The third line uses the fact that $J$ is the all-ones matrix, so $\bR{J^T(U\odot V)}_{ij}=\sum_k U_{kj}V_{kj}\,$; writing out the sums shows that
$$J^T\bR{X\odot\LR{dA^TA\,X}} \;=\; J^T\bR{X\odot\LR{A^TdA\,X}} \;=\; J^T\bR{\LR{AX}\odot\LR{dA\,X}}$$
while the two $X^T(\cdot)$ terms are transposes of each other and combine under $\Sym{\cdot}$.
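This result can also be checked against finite differences; a sketch assuming a square $A$ (any $A$ with $m$ columns would do) and placeholder data (since $B$ is quadratic in $A$, the central difference is again exact up to round-off):
```python
import numpy as np

def Sym(A):
    return 0.5 * (A + A.T)

def B_of(A, X, J):
    Y = (A.T @ A) @ X                   # M = A^T A
    return Sym(J.T @ (X * Y) - X.T @ Y)

m, n = 4, 6
rng = np.random.default_rng(4)
A = rng.standard_normal((m, m))         # square A for simplicity
X = rng.standard_normal((m, n))
J = np.ones((m, n))

i, j, h = 2, 0, 1e-6
E = np.zeros((m, m)); E[i, j] = 1.0     # single-entry matrix E_ij

analytic = 2.0 * Sym(J.T @ ((A @ X) * (E @ X)) - X.T @ A.T @ (E @ X))
numeric  = (B_of(A + h * E, X, J) - B_of(A - h * E, X, J)) / (2 * h)
assert np.allclose(analytic, numeric, atol=1e-5)
```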
The linked paper is pretty good when compared to most
Machine Learning papers, but the mathematics is still far
too "hand-wavy". It's geared towards an audience who just want to code the gradient expression directly into their Python programs without bothering to check the math.
Deriving their gradient result using matrix notation would be a herculean effort. Even converting their result into matrix notation would be very difficult, because the nested sum violates the usual summation convention since it contains an expression with four $i$-indices and four $j$-indices. Conventional index notation only permits two of any single index.
If you want someone to decode that mess for you then, as Steph suggested, you should post a new question $-$ I'll let him answer that one.
In the proof that you have written, only the properties of the inner product and the norm are used, and these hold in any Hilbert space. So, as you have already pointed out, the proof carries over directly to a Hilbert space.
When the data is not linearly separable, a suitable kernel function $K(\cdot, \cdot)$ is used to map the data into a Hilbert space in which linear separability is more likely to hold. The convergence proofs use only the properties of norms and inner products, and hence apply in any Hilbert space.
Note that the Euclidean space $\mathbb{R}^d$ is a finite-dimensional Hilbert space. The principal difference between $\mathbb{R}^d$ and an arbitrary Hilbert space, as far as the validity of proofs is concerned, is compactness: in $\mathbb{R}^d$ every closed and bounded set is compact, but this fails in an infinite-dimensional Hilbert space. As long as a proof or derivation in $\mathbb{R}^d$ does not use properties of compact sets, it carries over directly to a Hilbert space.
Thanks to this, kernel SVM has already been extended to the case where the observations are elements of an infinite-dimensional Hilbert space, e.g., random functions. For example, see the papers here.
Best Answer
Different authors use the term "kernel" differently. In the machine learning community, some similarity measures are called kernels, such as $$k(x,y) = \begin{cases} 1 & \text{if } \|x-y\| \le \epsilon \\ 0 & \text{else} \end{cases}$$ which is popular in spectral graph clustering. However, choosing the data set $\cal X = \{0,1,2\}$ and $\epsilon = 1$ yields a kernel matrix with a negative eigenvalue, so this kernel is not PSD.
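That counterexample is easy to confirm numerically, e.g. with numpy:
```python
import numpy as np

# Kernel matrix of k on X = {0, 1, 2} with eps = 1
x = np.array([0.0, 1.0, 2.0])
K = (np.abs(x[:, None] - x[None, :]) <= 1.0).astype(float)

print(np.linalg.eigvalsh(K))   # [1 - sqrt(2), 1, 1 + sqrt(2)]; one eigenvalue is negative
```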
Mathematicians, on the other hand, tend to use "kernel" as a synonym for a PSD kernel. Mercer's theorem guarantees that then, and only then, there exists a Hilbert space $\cal H$ whose inner product is described by $k$, in the sense that there exists a map $\phi$ such that $$\forall x,y \in \cal X: \langle \phi(x) , \phi(y) \rangle_{\cal H} = k(x,y)$$
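For contrast, here is a quick illustration (my own example, not one from the question) with the Gaussian kernel $k(x,y)=e^{-(x-y)^2}$, which is PSD: every kernel matrix it produces has non-negative eigenvalues, up to round-off.
```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(50)                       # arbitrary sample points
K = np.exp(-(x[:, None] - x[None, :]) ** 2)       # Gaussian kernel matrix

assert np.linalg.eigvalsh(K).min() > -1e-10       # non-negative up to round-off
```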
Now to your kernel: It is not PSD. Take $\cal X = \{0,1\}$ and check the resulting kernel matrix.