Solved – Connection between SVM and Representer theorem

kernel trickoptimizationregularizationsvm

SVM over the set of examples $\{x_n,t_n\}_{n=1}^N$ considers minimization of the following objective function:

$$\displaystyle \frac{1}{2}||\mathbf{w}||^2 + C\sum_{n=1}^N \ell(\mathbf w, x_n, t_n),$$

considering $\ell_2$ regularization. On the other hand, the Representer theorem is considering minimization of the following objective function:

$$\Omega(||f||^2_H)+L\left ( (x_1, t_1, f(x_1)),\dots , (x_N, t_N, f( x_N))\right),$$

where $f(\cdot )=\sum_{n=1}^N \alpha_n k(x_n,\cdot )$ in dual space or $f(\cdot )=\mathbf w^T\phi(\cdot)$ in primal space.


Question: How to connect representer theorem with SVM? What is $\Omega(||f||^2_H)$ and what is $L(\dots)$ in case of SVM?


I tried the following:

In primal space:

$$L\left ( (x_1, t_1, f(x_1)),\dots , (x_N, t_N, f( x_N))\right) = C\sum_{n=1}^N \ell(\mathbf w, x_n, t_n)$$
but the regularization $\Omega(||f||^2_H)$ does not match
$$\begin{align}
||f||^2_H = \int_{x \in X} f(x)f(x)dx &= \int_{x \in X} \mathbf w^T\phi(\mathbf x) \phi(\mathbf x)^T \mathbf w\, dx = \ldots?
\end{align}$$

since I cannot get something close to $||\mathbf{w}||^2$.

In dual space:
Equivalent dual SVM formulation with kernels is

$$\max_{\mathbf \alpha} \sum_{n=1}^N \alpha_n -\frac{1}{2} \sum_{n=1}^N \sum_{k=1}^N \alpha_n \alpha_k t_n t_k k(x_n, x_k), \\
\textrm{s.t.} \sum_{n=1}^N \alpha_n t_n =0$$

where I cannot recognize which part is $\Omega(||f||^2_H)$ and which one is $L(\dots)$?

$\begin{align}
||f||^2_H = \int_{x \in X} f(x)f(x)dx &= \int_{x \in X} \sum_{n=1}^N \alpha_n k(x_n,x ) \sum_{k=1}^N \alpha_k k(x_k,x )dx \\
&=\sum_{n=1}^N \sum_{k=1}^N \alpha_n \alpha_k \int_{x \in X} k(x_n,x )k(x_k,x )dx=\ldots
\end{align}$

Best Answer

I'm still not entirely sure on exactly what you're asking, but I think this might address it.

I'm going to let $k_i$ denote the representer of $x_i$ in our Hilbert space $\mathcal H$ (so $k_i = \phi(x_i)$). We know that $$ f = \sum_i \alpha_i y_i k_i $$

is the function that is normal to our hyperplane (this is our $w$, but I'm using $f$ to emphasize that it is a function).

Computing the norm: $$ \vert \vert f \vert \vert^2_\mathcal H = \int \left( \sum_i \alpha_i y_i k_i(z) \right)^2 dz = \sum_{ij} \alpha_i\alpha_j y_i y_j \int k_i(z) k_j(z) dz $$

$$ = \sum_{ij} \alpha_i\alpha_j y_i y_j \langle k_i, k_j \rangle = \sum_{ij} \alpha_i\alpha_j y_i y_j K(x_i, x_j) $$ so if $K$ is the linear kernel we have that $$ \vert \vert f \vert \vert^2_\mathcal H = \sum_{ij} \alpha_i\alpha_j y_i y_j x_i^T x_j = w^T w. $$

Does that help?

Related Question