Solved – SVM optimization problem

classification, machine learning, optimization, regularization, svm

I think I understand the main idea behind support vector machines. Let us assume that we have two linearly separable classes and want to apply SVMs. What the SVM does is search for a hyperplane $\{\mathbf{x} \mid \mathbf{w}^\top \mathbf{x} + b = 0\}$ which maximizes the margin (the distance from the hyperplane to the closest data points).

This distance is given by $\frac{1}{||w||}$. Therefore maximizing the distance is equivalent to minimizing $||w||$ (subject to the constraints).
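For concreteness, the optimization problem I have in mind is the usual hard-margin formulation, with the canonical scaling in which the closest points satisfy $y_i(\mathbf{w}^\top \mathbf{x}_i + b) = 1$:

$$
\min_{\mathbf{w},\, b} \; ||\mathbf{w}|| \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \geq 1, \qquad i = 1, \dots, n.
$$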

And here is my question: in the literature I see that $\frac{1}{2}||w||^2$ is minimized rather than $||w||$.

I can see that minimizing $||w||$ is equivalent to minimizing $\frac{1}{2}||w||^2$, but why do we prefer minimizing $\frac{1}{2}||w||^2$ instead?

Why is minimizing $\frac{1}{2}||w||^2$ better than minimizing $\frac{1}{3}||w||^3$ for example?

Best Answer

The norm $||w||$ involves a square root, which makes it inconvenient to work with in the optimization. Since squaring is monotonically increasing on non-negative values, minimizing $||w||^2$ yields exactly the same solution, so we can square the objective without any problem.
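As a quick numerical sanity check (a small sketch that is not part of the original answer; the toy data and the `solve_svm` helper below are made up purely for illustration), both objectives give the same separating hyperplane:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy data set (hypothetical, for illustration only)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def solve_svm(objective):
    """Solve the hard-margin primal for theta = [w_1, w_2, b] with a given objective."""
    # Margin constraints: y_i * (w^T x_i + b) - 1 >= 0 for every training point
    cons = [{"type": "ineq",
             "fun": lambda th, xi=xi, yi=yi: yi * (xi @ th[:2] + th[2]) - 1.0}
            for xi, yi in zip(X, y)]
    res = minimize(objective, x0=np.array([1.0, 1.0, 0.0]),
                   constraints=cons, method="SLSQP")
    return res.x[:2], res.x[2]

# Minimizing ||w|| and 0.5 * ||w||^2 should return (numerically) the same w and b
w1, b1 = solve_svm(lambda th: np.linalg.norm(th[:2]))
w2, b2 = solve_svm(lambda th: 0.5 * np.dot(th[:2], th[:2]))
print("min ||w||        ->", w1, b1)
print("min 0.5*||w||^2  ->", w2, b2)
```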

The factor $\frac{1}{2}$ in $\frac{1}{2}||w||^2$ is added purely for mathematical convenience: when we differentiate the objective while optimizing it with Lagrange multipliers, the $\frac{1}{2}$ cancels the factor of 2 coming from the square, leaving a clean gradient.
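To spell this out, here is a short sketch of the standard hard-margin Lagrangian step (multipliers $\alpha_i \geq 0$):

$$
L(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}||\mathbf{w}||^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i(\mathbf{w}^\top \mathbf{x}_i + b) - 1 \right],
\qquad
\frac{\partial L}{\partial \mathbf{w}} = \mathbf{w} - \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i = \mathbf{0}
\;\Rightarrow\;
\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i.
$$

Without the $\frac{1}{2}$, the stationarity condition would read $2\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$, which is still solvable but drags a factor of 2 through the rest of the derivation. Minimizing $\frac{1}{3}||w||^3$ would also have the same minimizer, but the objective would no longer be quadratic, so the dual would not reduce to the convenient quadratic program that standard SVM solvers exploit.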