Solved – Gradient descent in SVM

gradient descent, lagrange multipliers, machine learning, svm

I am studying SVMs and came across the following problem.

The dual optimization problem is as follows:

\begin{align*}
&\max_\alpha~~~~~ W(\alpha) = \sum_{i=1}^{n} \alpha_i -\frac{1}{2}\sum_{i,j=1}^{n} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle \\
&s.t.~~~~~\alpha_i \geq 0 ~~~~\textrm{for } i=1,\cdots,n \\
& ~~~~~\sum_{i=1}^{n} \alpha_i y_i = 0
\end{align*}
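For concreteness, this is how I understand the objective in code (just a rough sketch; `X`, `y`, and `alpha` are placeholder names I chose, with `X` holding the $x_i$ as rows and `y` the labels $y_i \in \{-1,+1\}$):

```python
import numpy as np

def dual_objective(alpha, X, y):
    """Rough sketch of W(alpha) for the dual problem above."""
    K = X @ X.T                         # Gram matrix of inner products <x_i, x_j>
    Q = (y[:, None] * y[None, :]) * K   # Q_ij = y_i y_j <x_i, x_j>
    return alpha.sum() - 0.5 * alpha @ Q @ alpha
```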

This problem has constraints, so I cannot apply plain gradient descent directly.

There are other optimization methods such as Newton's method or SMO, but I could not understand them at all, so I want to use gradient descent.

Then I came up with an idea: move the equality constraint into the objective as a penalty, so that I maximize a lower bound of $W(\alpha)$ that coincides with it whenever the constraint holds:
\begin{align*}
&\max_\alpha~~~~~ W(\alpha) = \sum_{i=1}^{n} \alpha_i -\frac{1}{2}\sum_{i,j=1}^{n} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle - \lambda\left\|\sum_{i=1}^{n} \alpha_i y_i \right\|_1 \\
&s.t.~~~~~\alpha_i \geq 0 ~~~~\textrm{for } i=1,\cdots,n
\end{align*}
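In code, the penalized objective I have in mind would look roughly like this (again only a sketch; `lam` stands for $\lambda$, and since $\sum_i \alpha_i y_i$ is a scalar here, the 1-norm is just an absolute value):

```python
import numpy as np

def penalized_objective(alpha, X, y, lam):
    """Rough sketch of the penalized objective above (lam is the penalty weight)."""
    K = X @ X.T
    Q = (y[:, None] * y[None, :]) * K
    return alpha.sum() - 0.5 * alpha @ Q @ alpha - lam * np.abs(alpha @ y)
```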

Can I apply gradient descent to this problem? I could not make any more progress than this.

Could anyone help me?

Best Answer

This is a constrained optimization problem. Practically speaking, when solving general convex optimization problems, one first converts them to an unconstrained problem (e.g., using the penalty method, the interior point method, or some other approach) and then solves that problem, for example with gradient descent, L-BFGS, or another technique. If the constraints have a "nice" form, you can also use projection (see e.g. the proximal gradient method). There are also very efficient stochastic approaches, which tend to optimize worse but generalize better (i.e., have better performance at classifying new data).
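To make the projection and penalty ideas concrete, here is a minimal sketch of projected gradient ascent on your dual, under the assumption that the equality constraint is handled by a smooth quadratic penalty (rather than the non-differentiable absolute-value penalty in your question) and the nonnegativity constraints by clipping; the function and parameter names, step size, and iteration count are illustrative only:

```python
import numpy as np

def projected_gradient_ascent(X, y, lam=10.0, lr=1e-3, n_iter=5000):
    """Illustrative sketch: maximize
        W(alpha) - (lam / 2) * (sum_i alpha_i y_i)^2
    subject to alpha_i >= 0, via gradient ascent plus projection."""
    n = X.shape[0]
    K = X @ X.T                          # Gram matrix <x_i, x_j>
    Q = (y[:, None] * y[None, :]) * K    # Q_ij = y_i y_j <x_i, x_j>
    alpha = np.zeros(n)
    for _ in range(n_iter):
        grad = 1.0 - Q @ alpha - lam * (alpha @ y) * y   # gradient of the penalized objective
        alpha += lr * grad                               # ascent step (we are maximizing)
        alpha = np.maximum(alpha, 0.0)                   # projection onto alpha_i >= 0
    return alpha
```

Note that the quadratic penalty only enforces the equality constraint approximately; in practice you would either increase `lam` over the iterations (the classical penalty method) or switch to an augmented Lagrangian or exact projection scheme if the constraint must hold exactly.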

Also, your formulation doesn't appear to be correct: for the hinge-loss (soft-margin) SVM one generally has the additional constraint $\alpha_i \leq C$. If one uses, e.g., the square loss, that constraint wouldn't be present, but the objective would be different.
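If you do use the standard soft-margin (hinge-loss) dual with $0 \leq \alpha_i \leq C$, the projection step in the sketch above simply becomes a clip onto that box (with `C` the usual regularization parameter):

```python
alpha = np.clip(alpha, 0.0, C)   # project onto the box 0 <= alpha_i <= C
```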