Solved – The Cost Parameter for Support Vector Machines

svm

I am currently learning about SVMs and have read that "cost" is one of the most important tuning parameters for building a well-performing SVM model. However, I find the concept of "cost" hard to grasp, because it is usually defined only as "the price for misclassifications". Although I can fit SVMs in R and have built some fair models, it certainly would not hurt to understand the logic behind the "cost" parameter.
Thanks
Felix

Best Answer

I assume that you are familiar with the optimal separating hyperplane. The Wikipedia article on SVMs defines the soft margin, which allows "misclassification". A nice motivation for this is that some datasets are not linearly separable, which means we cannot find a hyperplane that separates the different classes of the data.

Mathematically, we relax the constraint from

$$y_i(\mathbf{w}\cdot\mathbf{x_i} - b) \ge 1 \quad 1 \le i \le n. \quad\quad(1)$$

to $$ y_i(\mathbf{w}\cdot\mathbf{x_i} - b) \ge 1 - \xi_i \quad 1 \le i \le n. \quad\quad(2) $$ The slack variable $\xi_i$ quantifies the "misclassification": the larger $\xi_i$, the further the point $\mathbf{x_i}$ may violate the margin.
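To make the role of $\xi_i$ concrete: at the optimum each slack variable takes the smallest value the relaxed constraint allows,

$$\xi_i = \max\bigl(0,\; 1 - y_i(\mathbf{w}\cdot\mathbf{x_i} - b)\bigr),$$

so $\xi_i = 0$ for points on the correct side of the margin, $0 < \xi_i \le 1$ for points inside the margin but still correctly classified, and $\xi_i > 1$ for misclassified points.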

However, we cannot allow too much "misclassification", so we also penalize the slack in the objective function: $$ \arg\min_{\mathbf{w},\mathbf{\xi}, b } \left\{\frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^n \xi_i \right\} $$ The tuning parameter $C$, which you describe as "the price of the misclassification", is exactly the weight on this penalty for the soft margin: a large $C$ punishes slack heavily and pushes the solution toward a hard margin with few training errors, while a small $C$ tolerates slack and yields a wider margin.
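The trade-off can be seen numerically. Below is a minimal sketch (not part of any SVM library): full-batch subgradient descent on the soft-margin objective $\frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \xi_i$ for a tiny 2-D dataset with one outlier, comparing a small and a large $C$. The dataset and the function name are made up for illustration.

```python
# Illustrative sketch: subgradient descent on the soft-margin objective
#     0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i - b))
# to show how C trades margin width against slack.

def train_soft_margin_svm(X, y, C, epochs=5000, lr=0.001):
    """Fit (w, b) by full-batch subgradient descent on the soft-margin objective."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        gw = [w[0], w[1]]          # gradient of the 0.5*||w||^2 term
        gb = 0.0
        for (x1, x2), yi in zip(X, y):
            if yi * (w[0] * x1 + w[1] * x2 - b) < 1:   # margin violated: slack active
                gw[0] -= C * yi * x1
                gw[1] -= C * yi * x2
                gb += C * yi
        w[0] -= lr * gw[0]
        w[1] -= lr * gw[1]
        b -= lr * gb
    return w, b

# Two tight clusters plus one "-1" outlier sitting near the "+1" cluster.
X = [(2, 2), (2, 3), (3, 2), (-2, -2), (-3, -2), (-2, -3), (1.5, 1.5)]
y = [1, 1, 1, -1, -1, -1, -1]

for C in (0.01, 10.0):
    w, b = train_soft_margin_svm(X, y, C)
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    errors = sum(1 for (x1, x2), yi in zip(X, y)
                 if yi * (w[0] * x1 + w[1] * x2 - b) <= 0)
    print(f"C={C:>5}: ||w||={norm:.3f}  margin 2/||w||={2 / norm:.3f}  "
          f"training errors={errors}")
```

With the small $C$ the regularizer dominates, $\|\mathbf{w}\|$ stays small (wide margin) and the outlier is simply misclassified; with the large $C$ the slack penalty dominates and the fit chases the outlier, producing a much larger $\|\mathbf{w}\|$ and hence a narrower margin.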

There are many methods and routines for finding a good value of $C$ for a specific training set, such as cross-validation (available, for instance, in the R package LiblineaR).
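The cross-validation idea itself is simple enough to sketch without any library: split the data into folds, fit with each candidate $C$ on all but one fold, score on the held-out fold, and keep the $C$ with the lowest average validation error. Everything below (the toy data, the plain subgradient-descent trainer, and the function names) is illustrative only.

```python
# Illustrative sketch of choosing C by k-fold cross-validation,
# using a plain pure-Python subgradient-descent trainer (no SVM library).

def train(X, y, C, epochs=3000, lr=0.001):
    """Minimize 0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i - b))."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        gw, gb = [w[0], w[1]], 0.0
        for (x1, x2), yi in zip(X, y):
            if yi * (w[0] * x1 + w[1] * x2 - b) < 1:
                gw[0] -= C * yi * x1
                gw[1] -= C * yi * x2
                gb += C * yi
        w[0] -= lr * gw[0]
        w[1] -= lr * gw[1]
        b -= lr * gb
    return w, b

def error_rate(w, b, X, y):
    bad = sum(1 for (x1, x2), yi in zip(X, y)
              if yi * (w[0] * x1 + w[1] * x2 - b) <= 0)
    return bad / len(X)

# Small fixed dataset: two clusters that come close near the origin.
X = [(2, 2), (3, 2), (2, 3), (1, 1), (0.5, -0.5), (3, 3),
     (-2, -2), (-3, -2), (-2, -3), (-1, -1), (-0.5, 0.5), (-3, -3)]
y = [1] * 6 + [-1] * 6

k = 3
grid = [0.01, 0.1, 1.0, 10.0]
cv_error = {}
for C in grid:
    fold_errors = []
    for fold in range(k):
        # every k-th point goes to the validation fold
        X_val, y_val = X[fold::k], y[fold::k]
        X_tr = [x for i, x in enumerate(X) if i % k != fold]
        y_tr = [t for i, t in enumerate(y) if i % k != fold]
        w, b = train(X_tr, y_tr, C)
        fold_errors.append(error_rate(w, b, X_val, y_val))
    cv_error[C] = sum(fold_errors) / k

best_C = min(grid, key=lambda C: cv_error[C])
print("CV error per C:", cv_error)
print("selected C:", best_C)
```

In practice you would let the library do this (e.g. LiblineaR's built-in cross-validation), but the loop structure is the same: the grid of $C$ values, the fold split, and the averaged validation error.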
