Analysis – Transforming a Distance Function to a Kernel

Tags: analysis, hilbert-spaces, machine-learning

Fix a domain $X$, and let $d : X \times X \rightarrow \mathbb{R}$ be a distance function on $X$ with the properties

  • $d(x,y) = 0 \iff x = y$ for all $x,y$
  • $d(x,y) = d(y,x)$ for all $x,y$

The distance $d$ may additionally satisfy the triangle inequality (making it a metric), but this is not assumed.

In machine learning, a kernel is a function $K : X \times X \rightarrow \mathbb{R}$ with the properties

  • $K(x,y) = 1 \iff x = y$
  • $K(x,y) = K(y,x)$
  • for all $n$ and all sequences $x_1, \ldots, x_n$, the Gram matrix $\mathbf{K}$ with $\mathbf{K}_{ij} = K(x_i, x_j)$ is positive definite.

A space that admits a kernel is handy in machine learning, and often, if all you have is a distance function, you can construct a kernel-like object of the form

$$ \kappa_d(x, y) = \exp(- \gamma d^2(x,y)) $$
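For concreteness, here is a minimal numerical sketch (assuming Python with NumPy and SciPy) that builds the Gram matrix of $\kappa_d$ when $d$ is the ordinary Euclidean distance, for which $\kappa_d$ is the familiar Gaussian (RBF) kernel, and checks positive semidefiniteness via eigenvalues. The helper name `gaussian_gram` and the sample data are illustrative, not part of the question.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gaussian_gram(X, gamma=1.0):
    """Gram matrix K_ij = exp(-gamma * d(x_i, x_j)^2) for Euclidean d."""
    D = cdist(X, X, metric="euclidean")   # pairwise distances d(x_i, x_j)
    return np.exp(-gamma * D**2)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))              # 50 sample points in R^3
K = gaussian_gram(X, gamma=0.5)

# For a kernel, every Gram matrix must have non-negative eigenvalues
# (up to floating-point error).
print(np.linalg.eigvalsh(K).min())        # expected: >= 0 up to round-off
```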

The question:

Under what conditions on $d$ is the resulting function $\kappa_d$ a kernel?

Best Answer

It seems to me you might be looking for a variant of Schoenberg's theorem:

(Think $\Psi(x,y) = d(x,y)^2$ in the following).

The function $\Psi \colon X \times X \to \mathbb{R}$ is said to be a kernel of conditionally negative type if

  1. $\Psi(x,x) = 0$ for all $x \in X$.
  2. $\Psi(x,y) = \Psi(y,x)$ for all $x,y \in X$.
  3. For all $n \in \mathbb{N}$, all $x_1,\dots,x_n \in X$ and all $c_1,\dots,c_n \in \mathbb{R}$ such that $\sum_{i=1}^n c_i = 0$ the inequality $$ \sum_{i,j=1}^n c_i c_j \Psi(x_i,x_j) \leq 0 $$ holds.
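Condition 3 is easy to probe numerically. The following sketch (assuming NumPy/SciPy; the sample points and names are illustrative) checks it for $\Psi(x,y) = \|x-y\|^2$, the squared Euclidean distance, which is of conditionally negative type: for coefficients summing to zero, the quadratic form $\sum_{i,j} c_i c_j \Psi(x_i,x_j)$ should never come out positive.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))              # points x_1, ..., x_n in R^4
Psi = cdist(X, X, metric="sqeuclidean")   # Psi_ij = ||x_i - x_j||^2

# Random coefficient vectors with sum zero, as required in condition 3.
for _ in range(1000):
    c = rng.normal(size=30)
    c -= c.mean()                         # enforce sum(c) == 0
    q = c @ Psi @ c                       # sum_{i,j} c_i c_j Psi(x_i, x_j)
    assert q <= 1e-8, q                   # should be <= 0 up to round-off
print("no violations found")
```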

A positive semidefinite kernel is a kernel in your sense, but without the condition $K(x,y) = 1 \iff x = y$, and with positive definiteness weakened to positive semidefiniteness.

Theorem. For a symmetric function $\Psi\colon X \times X \to \mathbb{R}$ with $\Psi(x,x) = 0$ for all $x$ the following are equivalent:

  1. $\Psi$ is a kernel of conditionally negative type.
  2. The function $K(x,y) = \exp(-\gamma \Psi(x,y))$ is a positive semidefinite kernel for all $\gamma \geq 0$.
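To see the theorem at work numerically, here is a small sketch (assuming NumPy/SciPy; the sample points and the three-point "bad" distance are my own illustrative choices, not from the answer). The squared Euclidean distance is of conditionally negative type, so its exponential is positive semidefinite for every $\gamma$; a symmetric distance that badly violates condition 3, such as $d(x_1,x_2) = d(x_2,x_3) = 1$ but $d(x_1,x_3) = 3$, yields a $\kappa_d$ with a negative eigenvalue.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Positive case: Psi = squared Euclidean distance is of conditionally
# negative type, so exp(-gamma * Psi) should be PSD for every gamma >= 0.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
Psi_good = cdist(X, X, metric="sqeuclidean")
for gamma in (0.01, 0.1, 1.0, 10.0):
    assert np.linalg.eigvalsh(np.exp(-gamma * Psi_good)).min() > -1e-8

# Negative case: a symmetric "distance" with zero diagonal whose square is
# not of conditionally negative type (take c = (1, -2, 1)).  The theorem then
# predicts that exp(-gamma * Psi) fails to be PSD for some gamma.
Psi_bad = np.array([[0.0, 1.0, 9.0],
                    [1.0, 0.0, 1.0],
                    [9.0, 1.0, 0.0]])
c = np.array([1.0, -2.0, 1.0])            # sum(c) == 0
print(c @ Psi_bad @ c)                    # 10 > 0: condition 3 fails
print(np.linalg.eigvalsh(np.exp(-0.1 * Psi_bad)).min())  # negative eigenvalue
```

Note that this bad $d$ still satisfies the two axioms from the question; only the triangle inequality, which was explicitly optional, fails.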

You can find a proof of this as Theorem C.3.2 on page 370 of the book Kazhdan's Property $(T)$ by Bekka, de la Harpe, and Valette (a copy is available on Bekka's homepage). The rest of that appendix might have some useful information, too.


Added: Schoenberg's original article: Metric spaces and positive definite functions, Trans. Amer. Math. Soc. 44 (1938), 522-536.
