Solved – Prove that a kernel is conditionally positive definite

kernel-trick, svm

A kernel is called positive definite (p.d.) if its Gram matrix is p.d., i.e., all eigenvalues of the Gram matrix are positive, for all possible choices of input vectors. My understanding of conditionally positive definite (c.p.d.) kernels is that the Gram matrix has to be p.d. not for all but only for a subset of the input vectors.

If this definition of conditional positive definiteness is correct, then how do I prove that a kernel is c.p.d.? Would it suffice to show that the Gram matrix is p.d. for some arbitrary subset of input vectors?

Best Answer

If I'm reading it correctly, your definition of cpd would include any kernel where $k(x, x) > 0$ for any point $x$, by just considering the subset $\{ x \}$.

One way to phrase a matrix $A$ being positive definite is that $c^T A c > 0$ for all nonzero vectors $c$; a kernel is positive definite if that holds for all of its Gram matrices $A$. The conditioning in cpd refers to reducing the "for all vectors $c$" part, not the "for all Gram matrices $A$" part. Specifically, the most common definition reduces it to "for all vectors $c$ with $1^T c = 0$."

These are useful because, just as pd kernels correspond to inner products in some Hilbert space, cpd kernels are closely connected to distances in some Hilbert space. SVMs and kernel PCA both work with any cpd kernel, though a particular optimization algorithm might not.

Here's the full definition from Schölkopf and Smola's Learning with Kernels (2002). Here $\mathbb K$ refers to either $\mathbb R$ or $\mathbb C$.

Definition 2.20 (Conditionally Positive Definite Matrix) A symmetric $m \times m$ matrix $K$ ($m \ge 2$) taking values in $\mathbb K$ and satisfying

$$ \sum_{i, j = 1}^m c_i \bar{c}_j K_{ij} \ge 0 \text{ for all } c_i \in \mathbb K, \text{ with } \sum_{i=1}^m c_i = 0$$

is called conditionally positive definite (cpd).

Definition 2.21 (Conditionally Positive Definite Kernel) Let $\mathcal X$ be a nonempty set. A function $k : \mathcal X \times \mathcal X \to \mathbb K$ which for all $m \ge 2, x_1, \dots, x_m \in \mathcal X$ gives rise to a conditionally positive definite Gram matrix is called a conditionally positive definite (cpd) kernel.

I don't know a simple eigenvalue-based definition.
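That said, Definition 2.20 can be checked numerically: restricting the quadratic form to vectors with $\sum_i c_i = 0$ is equivalent to checking that $PKP$ is positive semidefinite, where $P = I - \frac{1}{m}\mathbf{1}\mathbf{1}^T$ projects onto the orthogonal complement of the all-ones vector. A minimal NumPy sketch (the function name is mine):

```python
import numpy as np

def is_cpd(K, tol=1e-10):
    """Check Definition 2.20 numerically: K is cpd iff the quadratic
    form is >= 0 on the subspace { c : sum(c) = 0 }.  Projecting K
    onto that subspace and checking eigenvalues does exactly this."""
    m = K.shape[0]
    P = np.eye(m) - np.ones((m, m)) / m   # projector onto 1^perp
    eigs = np.linalg.eigvalsh(P @ K @ P)  # K symmetric, so eigvalsh
    return bool(eigs.min() >= -tol)

# Example: a negative squared-distance matrix is cpd but not psd.
x = np.array([0.0, 1.0, 3.0])
K = -(x[:, None] - x[None, :]) ** 2
print(is_cpd(K))                          # True
print(np.linalg.eigvalsh(K).min() >= 0)   # False: K itself is not psd
```

This is the double-centering familiar from classical multidimensional scaling, which is no coincidence given the distance interpretation above.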

To prove a kernel is cpd, you can either prove it directly from the definition above (usually difficult), or you can try to build it out of other cpd kernels using various properties that preserve cpd-ness. Here are some useful connections to pd kernels from that book:

Proposition 2.22 (Constructing PD Kernels from CPD Kernels [42, Lemma 2.1, p. 74]) Let $x_0 \in \mathcal X$, and let $k$ be a symmetric kernel on $\mathcal X \times \mathcal X$. Then $$\tilde k(x, x') := \tfrac12 \left( k(x, x') - k(x, x_0) - k(x_0, x') + k(x_0, x_0) \right)$$ is positive definite if and only if $k$ is conditionally positive definite.

[42] is Berg, Christensen, and Ressel's Harmonic Analysis on Semigroups (1984).
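As a sanity check of Proposition 2.22 (a sketch, not from the book): take $k(x, x') = -\|x - x'\|^2$, a standard example of a cpd kernel, and verify that the constructed $\tilde k$ yields a positive semidefinite Gram matrix. Indeed, here $\tilde k(x, x') = \langle x - x_0, x' - x_0 \rangle$, an inner product of centered points.

```python
import numpy as np

def k(x, y):
    # negative squared Euclidean distance: a standard cpd kernel
    return -np.sum((x - y) ** 2)

def k_tilde(x, y, x0):
    # the construction from Proposition 2.22
    return 0.5 * (k(x, y) - k(x, x0) - k(x0, y) + k(x0, x0))

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))
x0 = X[0]

G = np.array([[k_tilde(xi, xj, x0) for xj in X] for xi in X])
print(np.linalg.eigvalsh(G).min() >= -1e-10)  # True: G is psd
```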

Proposition 2.28 (Connection PD – CPD [465, Thm. 1, p. 527]) A kernel $k$ is conditionally positive definite if and only if $\exp(t k)$ is positive definite for all $t > 0$.

[465] is the classic paper Metric spaces and positive definite functions by Schoenberg (Transactions of the American Mathematical Society 44:522-536, 1938).
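Proposition 2.28 can likewise be illustrated numerically (again with $k(x, x') = -\|x - x'\|^2$): then $\exp(t\,k)$ is just the Gaussian kernel with bandwidth $1/t$, which Schoenberg's theorem says is positive definite for every $t > 0$. A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 2))
# cpd kernel matrix: negative squared Euclidean distances
K = -np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

for t in (0.1, 1.0, 10.0):
    # exp(t * k) is the Gaussian kernel; check its Gram matrix is psd
    G = np.exp(t * K)
    print(t, np.linalg.eigvalsh(G).min() >= -1e-9)
```

Note this only spot-checks one Gram matrix per $t$; the proposition quantifies over all of them.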
