Kernel Trick – How to Prove Hamming Distance as a Kernel?

kernel trickself-study

Given Hamming distance $H(x,z)$ between two equal length string of length $n$, define a kernel as:

$$
K(x,z) = \begin{cases}
1 – \frac{H(x,z)}{n} & \text{if both x and z have the same length $n$}
\\ 0 & \text{otherwise}
\end{cases}
$$

Question: How do I prove that it is a kernel?

Attempt: It is symmetric for sure, but proving positive semi-definite would involve proving $$\sum_{i=1}^n \sum_{j=1}^n \alpha_i K(x_i, x_j) \alpha_j \ge 0$$ for all sets of strings $\{x_i\}$ and vectors $\alpha$, and I am not sure how to proceed.

Best Answer

Step 1. For an arbitrary set of strings $\{x_i\}$, first sort them by their length. Then the kernel matrix is block-diagonal, since the kernel value between any two strings of different lengths is zero. This lets us see that we only need to consider strings of the same length; how?

Step 2. So, we only need to consider a set of strings $\{ x_i \}$ all of length $m$. Then we can see that $$K(x, z) = \frac1m \sum_{i=1}^m \mathbb{1}(x_i = z_i), \tag{*}$$ where $\mathbb{1}(x_i = z_i)$ is 1 if $x_i = z_i$, 0 otherwise.

Looking at it this way lets us find an explicit feature map from the space of strings of length $m$ to vectors of a certain length: that is, $K(x, z) = \varphi(x)^T \varphi(z)$, for some function $\varphi$. Do you see what that function is?

Hint: if $a = (0, 1, 0, 1)$ and $b = (1, 1, 0, 0)$, then $a^T b = 0 + 1 + 0 + 0$; in general, for length-$k$ vectors whose entries are all either $0$ or $1$, $a^T b = \sum_{i=1}^k a_i b_i = \sum_{i=1}^k \mathbb 1(a_i = 1, b_i = 1)$. So you might want to come up with some function $\varphi$ that turns this into something that looks like (*).

Step 3. From there, it's a standard result that we have positive definiteness pretty quickly. Supposing that $\varphi$ maps into $\mathbb R^k$, let $\Phi$ be the $n \times k$ matrix whose $i$th row is $\varphi(x_i)$, so that $(\Phi \Phi^T)_{ij} = \varphi(x_i)^T \varphi(x_j) = K(x_i, x_j)$. Then the psd condition is that for all $a \in \mathbb R^n$, we have that $$ \sum_{i=1}^n a_i K(x_i, x_j) a_j = a^T K a = a^T \Phi \Phi^T a = (\Phi^T a)^T (\Phi^T a) = \lVert \Phi^T a \rVert^2 \ge 0 .$$

Related Question