Fix a domain $X$: Let $d : X \times X \rightarrow \mathbb{R}$ be a distance function on $X$, with the properties
- $d(x,y) = 0 \iff x = y$ for all $x,y$
- $d(x,y) = d(y,x)$ for all $x,y$
Optionally, $d$ might satisfy the triangle inequality (making it a metric), but this is not necessarily the case.
In machine learning, a kernel is a function $K : X \times X \rightarrow \mathbb{R}$ with the properties
- $K(x,y) = 1 \iff x = y$
- $K(x,y) = K(y,x)$
- for all $n$ and all sequences $x_1, \ldots, x_n$, the Gram matrix $\mathbf{K}$ with $\mathbf{K}_{ij} = K(x_i, x_j)$ is positive definite.
A space that admits a kernel is handy in ML, and often if all you have is a distance function, you can compute a kernel-like object in the form
$$ \kappa_d(x, y) = \exp(- \gamma d^2(x,y)) $$
The question:
Under what conditions on $d$ is the resulting function $\kappa_d$ a
kernel ?
Best Answer
It seems to me you might be looking for a variant of Schoenberg's theorem:
(Think $\Psi(x,y) = d(x,y)^2$ in the following).
The function $\Psi \colon X \times X \to \mathbb{R}$ is said to be a kernel of conditionally negative type if
A positive semidefinite kernel is a kernel in your sense but without the condition $K(x,x) = 0$ iff $x = 0$ (presumably you meant $K(x,x)=1$ here, otherwise this would be incompatible with positive definiteness) and with positive definiteness weakened to positive semi-definiteness.
You can find a proof of this as Theorem C.3.2 on page 370 of the book Kazhdan's Property $(T)$ by Bekka, de la Harpe, and Valette, (link goes to Bekka's homepage). The rest of this appendix might have some useful information, too.
Added: Schoenberg's original article: Metric spaces and positive definite functions, Trans. Amer. Math. Soc. 44 (1938), 522-536.