Solved – Manifold regularization using the graph Laplacian in SVM

machine-learning, regularization, svm

I'm trying to implement manifold regularization in Support Vector Machines (SVMs) in MATLAB.
I'm following the instructions in the paper by Belkin et al. (2006), which contains this equation:

$f^{*} = \text{argmin}_{f \in H_k}\sum_{i=1}^{l}V\left(x_i,y_i,f\right)+\gamma_{A}\left\| f \right\|_{A}^{2}+\gamma_{I}\left\| f \right\|_{I}^{2}$

where $V$ is some loss function, $\gamma_A$ is the weight of the norm of the function in the RKHS (or ambient norm), which enforces a smoothness condition on the possible solutions, and $\gamma_I$ is the weight of the norm of the function in the low-dimensional manifold (or intrinsic norm), which enforces smoothness along the sampled manifold $M$. The ambient regularizer makes the problem well-posed, and its presence can be really helpful from a practical point of view when the manifold assumption holds to a lesser degree.

It has been shown in Belkin et al. (2006) that $f^*$ admits an expansion in terms of the $n$ points of $S$:
$f^*(x)=\sum_{i=1}^{n}\alpha_i^*k(x_i,x)$
The decision function that discriminates between class +1 and class -1 is $y(x)=\operatorname{sign}(f^*(x))$.
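As a minimal illustration (not from the paper), evaluating this expansion in MATLAB with a Gaussian kernel could look like the following, where `X`, `alpha` and `sigma` are placeholder names:

```matlab
% Minimal sketch: evaluate f*(x) = sum_i alpha_i * k(x_i, x) and take its sign.
% Assumed (illustrative) names: X is the n-by-d matrix of expansion points,
% alpha is the n-by-1 coefficient vector, sigma is the Gaussian kernel width.
kvec  = @(x) exp(-sum(bsxfun(@minus, X, x(:)').^2, 2) / (2*sigma^2)); % k(x_i, x), n-by-1
fstar = @(x) alpha' * kvec(x);          % f*(x)
ypred = @(x) sign(fstar(x));            % predicted class in {-1, +1}
```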

The problem is that I'm trying to train the SVM using LIBSVM in MATLAB, but I don't want to modify the original code, so I found the precomputed-kernel mode of LIBSVM which, instead of taking the input data and output groups as parameters, takes the computed kernel matrix together with the output groups and trains the SVM model. I'm trying to feed it the regularized kernel (Gram) matrix and let it do the rest.
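For reference, LIBSVM's precomputed-kernel mode is selected with `-t 4`, and the instance matrix must carry the sample serial numbers in its first column. A rough sketch of the call from MATLAB, with `y`, `Kreg` and `-c 1` standing in for the labels, the regularized Gram matrix and the cost:

```matlab
% Sketch of LIBSVM's precomputed-kernel MEX interface (svmtrain/svmpredict) in MATLAB.
% y    : n-by-1 label vector in {-1,+1}
% Kreg : n-by-n (regularized) Gram matrix, Kreg(i,j) = k_reg(x_i, x_j)
n     = numel(y);
model = svmtrain(y, [(1:n)', Kreg], '-t 4 -c 1');        % -t 4 = precomputed kernel

% Prediction uses the same layout; here we just predict back on the training points.
% For test points, pass the m-by-n matrix Ktest with Ktest(i,j) = k_reg(x_test_i, x_train_j).
[pred, acc, dec] = svmpredict(y, [(1:n)', Kreg], model);
```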

I tried to find the formula that regularizes the kernel and came up with this. Defining $I$ as the identity matrix with the same dimensions as the kernel matrix $K$:

$G=\left(2\gamma_A I + 2\gamma_I LK\right)^{-1}$

$Gram = KG$

in which $L$ is the graph Laplacian matrix, $K$ is the kernel matrix and $I$ is the identity matrix, and $Gram$ is obtained by matrix multiplication of $K$ and $G$.
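A literal MATLAB transcription of that guess would be the following; whether this is the right construction is exactly what I'm asking (the accepted answer below adds the $J$ selector and the $(l+u)^{-2}$ normalisation):

```matlab
% Literal transcription of the attempted construction (illustrative names).
% K: kernel matrix, L: graph Laplacian, gammaA/gammaI: regularization weights.
n    = size(K, 1);
G    = (2*gammaA*eye(n) + 2*gammaI*(L*K)) \ eye(n);  % inv(2*gammaA*I + 2*gammaI*L*K)
Gram = K * G;                                        % candidate regularized Gram matrix
```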

Is there anyone who can help me figure out how this is computed?

Best Answer

Reading the article, the optimization problem for both SVM and LapSVM is given as:

$$\beta^*=\text{argmax}_{\beta\in\mathbb R^l} \sum_{i = 1}^{l}\beta_i - {1\over 2}\beta^TQ\beta$$ subject to: $$\sum_{i = 1}^{l}\beta_iy_i = 0\\ 0 \le \beta_i \le {1\over l}\text{, with }i=1,\dots,l$$
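If you want to sanity-check a given $Q$ outside of LIBSVM, this dual can be handed to a generic QP solver. A minimal sketch with MATLAB's `quadprog` (which minimizes, so the objective is negated); `Q`, `y` and `l` are as defined here:

```matlab
% Solve   max_beta  sum(beta) - 0.5*beta'*Q*beta
% s.t.    y'*beta = 0,   0 <= beta_i <= 1/l
% quadprog minimizes 0.5*x'*H*x + f'*x, so the linear term is negated.
H    = (Q + Q') / 2;           % symmetrize for numerical safety
f    = -ones(l, 1);
Aeq  = y(:)';                  % y: l-by-1 vector of labels in {-1,+1}
beq  = 0;
lb   = zeros(l, 1);
ub   = ones(l, 1) / l;
beta = quadprog(H, f, [], [], Aeq, beq, lb, ub);
```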

For SVM:

$$Q_{\text{SVM}} = Y\left(K \over 2\gamma\right)Y\\ \alpha^*_{\text{SVM}}={Y\beta^* \over 2\gamma}$$

While for LapSVM we have the following (added parentheses to make the relationship clearer):

$$Q_{\text{LapSVM}} = Y\left( JK \left(2\gamma_AI+2\frac{\gamma_I}{(l+u)^2}LK\right)^{-1} J^T\right)Y\\ \alpha^*_{\text{LapSVM}}= \left(2\gamma_AI+2\frac{\gamma_I}{(l+u)^2}LK\right)^{-1}J^TY\beta^*$$

We can define $$Q_{\text{SVM*}} \equiv Q_{\text{LapSVM}}$$ if:

$$ \left\{\begin{matrix} \gamma_{\text{SVM*}} = 1/2 \\ K_{\text{SVM*}}=JK_{\text{LapSVM}}\left(2\gamma_AI+2\frac{\gamma_I}{(l+u)^2}LK_{\text{LapSVM}}\right)^{-1}J^T \end{matrix}\right. $$
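In practice this means LIBSVM can stay untouched: precompute $K_{\text{SVM*}}$ and pass it as a precomputed kernel with $\gamma_{\text{SVM*}} = 1/2$. A sketch of that construction, assuming the $l$ labeled points come first among the $l+u$ samples and using illustrative names (`K`, `W`, `gammaA`, `gammaI`, `y`):

```matlab
% Build K_SVM* = J*K*inv(2*gammaA*I + 2*gammaI/(l+u)^2 * L*K)*J' and feed it to
% LIBSVM as a precomputed kernel. Labeled points are assumed to come first.
n = l + u;                              % total number of (labeled + unlabeled) points
L = diag(sum(W, 2)) - W;                % graph Laplacian from a weight/adjacency matrix W
J = [eye(l), zeros(l, u)];              % selects the l labeled points
M = 2*gammaA*eye(n) + (2*gammaI/(l+u)^2) * (L*K);
Kstar = J * K * (M \ J');               % l-by-l effective kernel K_SVM*

% Standard SVM on Kstar; the box constraint 0 <= beta <= 1/l corresponds to -c 1/l
% in LIBSVM's dual (check this against the loss scaling you actually use).
model = svmtrain(y, [(1:l)', Kstar], sprintf('-t 4 -c %g', 1/l));
```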

Last:

$$\alpha^*_{\text{LapSVM}}= \left(2\gamma_AI+2\frac{\gamma_I}{(l+u)^2}LK_{\text{LapSVM}}\right)^{-1}J^T \alpha^*_{\text{SVM*}}$$

since, with $\gamma_{\text{SVM*}} = 1/2$, we have $\alpha^*_{\text{SVM*}} = Y\beta^*$.
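To actually recover the LapSVM expansion coefficients from the standard solver, a sketch continuing the snippet above (LIBSVM's `sv_coef` stores $y_i\beta_i^*$ for the support vectors; `sv_indices` is exposed by recent LIBSVM builds):

```matlab
% Map the solution of the standard SVM on Kstar back to the (l+u)-dimensional
% LapSVM expansion. With gamma_SVM* = 1/2, alpha*_SVM* = Y*beta*, which LIBSVM
% stores (for the support vectors only) in model.sv_coef.
alphaSVM                   = zeros(l, 1);
alphaSVM(model.sv_indices) = model.sv_coef;    % y_i * beta_i* for the labeled points
alphaLap = M \ (J' * alphaSVM);                % alpha*_LapSVM, (l+u)-by-1

% Decision value at a new point x, with kx(i) = k(x_i, x) over all l+u points:
%   f = kx' * alphaLap;   yhat = sign(f + b);
% where the bias b comes from -model.rho (its sign convention depends on model.Label(1)).
```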


I can confirm that it works. See this example with a Gaussian kernel, and how the class virginica starts creeping into the unlabelled data when $\gamma_I = 2500$ compared to $\gamma_I = 0$, which is the standard SVM.

[Figures: decision regions with the Gaussian kernel for $\gamma_I = 0$ (standard SVM) and $\gamma_I = 2500$ (LapSVM)]
