Solved – Coefficients in Support Vector Machine

classification, multi-class, python, scikit-learn, svm

I have a few related questions:

  1. What is the total number of fitted parameters in the scikit-learn support vector machine classifiers sklearn.svm.SVC(kernel='linear') and sklearn.svm.SVC(kernel='rbf')?

I am trying to find out the total number of fitted parameters in linear and kernel SVM. If I understand correctly, the fitted parameters include C (the penalty parameter), gamma (for kernel='rbf'; none for kernel='linear'), all the coefficients (for kernel='linear', the number of features plus 1 for the intercept; for kernel='rbf', the number of training samples), and the slack variables (one per sample). Is this correct?

  2. If the above is correct, does it imply that there is a high probability of overfitting for SVM with kernel='rbf', since the number of fitted parameters is always greater than the number of samples?

Thank you very much for your help!

Best Answer

There are multiple misunderstandings in both the question and the answer posted by @mp85.

There are two sets of parameters, and one of them is called hyperparameters.

The SVM problem/formulation is $$ \min_{w, b, \xi} \|w\|^2 + C \sum_i \xi_i $$ subject to $$ y_i (w \cdot \phi(x_i) + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 $$ for all data $(x_i, y_i)$, where $\phi(x)$ is a transformation of the input data.

So, you must set $\phi()$ and you must set $C$, and then the SVM solver (that is, the fit method of the SVC class in sklearn) will compute the $\xi_i$, the vector $w$ and the coefficient $b$. This is what is "fitted": it is what the method computes. $C$ and $\phi()$, by contrast, must be set before running the SVM solver.
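
As a rough illustration of what "fitted" means in the linear case, here is a minimal sketch. The toy dataset, its sizes and the random seed are arbitrary assumptions made for the example. scikit-learn does not store the slack variables, so they are recovered here from $w$ and $b$ using the constraint above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy binary problem; sizes and seed are arbitrary choices for illustration.
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# C and the kernel are chosen by us before fitting.
clf = SVC(kernel="linear", C=1.0).fit(X, y)

w = clf.coef_.ravel()   # one weight per feature (only exposed for the linear kernel)
b = clf.intercept_[0]   # a single intercept in the binary case

# Slack variables are not stored, but they follow from w and b:
# xi_i = max(0, 1 - y_i (w . x_i + b)), with labels mapped to {-1, +1}.
y_pm = np.where(y == clf.classes_[1], 1.0, -1.0)
xi = np.maximum(0.0, 1.0 - y_pm * (X @ w + b))

print("number of weights:", w.size)
print("intercept:", b)
print("samples with non-zero slack:", int((xi > 0).sum()))
```

So for the linear kernel the model itself is described by the number of features plus one values, and the $\xi_i$ are a by-product of those.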

But there is no way to set $\phi()$ directly. Instead, one defines the transformation implicitly by choosing a kernel: linear (no transformation), rbf, poly, or others. Each of these kernels comes with its own parameters: rbf has gamma, poly has coef0 and degree (and, in scikit-learn, gamma as well), and so on. The linear kernel has none.
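
For reference, the kernels named above are usually written as follows. The parameter names follow scikit-learn (gamma, coef0, degree); the symbols $r$ for coef0 and $d$ for degree are shorthand introduced here, not notation from the answer.

$$ k_{\text{linear}}(x, x') = x \cdot x', \qquad k_{\text{rbf}}(x, x') = \exp\left(-\gamma \|x - x'\|^2\right), \qquad k_{\text{poly}}(x, x') = \left(\gamma \, x \cdot x' + r\right)^d $$

In each case the kernel plays the role of $\phi(x) \cdot \phi(x')$ for some $\phi$ that is never computed explicitly.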

So to run the SVM you must set C, choose the kernel, and set the appropriate parameter (or parameters) for that kernel. These are collectively known as hyperparameters. They are not computed by the SVM solver; they are set by you.
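
A small sketch of that distinction in scikit-learn terms, assuming a toy dataset and arbitrary hyperparameter values: the constructor arguments are the hyperparameters, and fitting leaves them untouched while adding the computed attributes.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy data; sizes and seed are arbitrary choices for illustration.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# Hyperparameters: chosen by us, passed to the constructor.
clf = SVC(kernel="rbf", C=10.0, gamma=0.5)
params_before = clf.get_params()
fitted_before = hasattr(clf, "support_vectors_")  # nothing has been computed yet

clf.fit(X, y)

params_after = clf.get_params()
fitted_after = hasattr(clf, "support_vectors_")   # set by the solver during fit

print("hyperparameters changed by fit:", params_before != params_after)
print("fitted attributes before/after:", fitted_before, fitted_after)
```

In practice the hyperparameters are usually picked by cross-validation over a grid of candidate values, which is a separate step from the fit itself.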

Finally, it is not strictly true that the SVC solver computes $w$, $b$ and the $\xi_i$. The SVC solver uses a different formulation of the SVM problem, the dual of the formulation above, and it computes different variables. The LinearSVC solver, which only works for the linear kernel, does compute $w$, $b$ and the $\xi_i$ (and returns $w$ and $b$).
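
To make the dual picture concrete, here is a hedged sketch of what SVC exposes after fitting with the rbf kernel. The dataset and the values of C and gamma are assumptions made for the example. The stored dual coefficients cover only the support vectors, and the decision function can be rebuilt from them and the kernel.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Toy binary problem; sizes, seed, C and gamma are arbitrary choices.
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
gamma = 0.1
clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)

# One dual coefficient (y_i * alpha_i) per support vector, not per sample.
n_sv = clf.support_vectors_.shape[0]
print("training samples:", X.shape[0], "support vectors:", n_sv)

# Rebuild the decision function from the dual variables:
# f(x) = sum_i dual_coef_i * k(x, sv_i) + b
K = rbf_kernel(X, clf.support_vectors_, gamma=gamma)
manual = K @ clf.dual_coef_.ravel() + clf.intercept_[0]
print("matches decision_function:", np.allclose(manual, clf.decision_function(X)))

# There is no explicit w for a non-linear kernel: coef_ raises AttributeError.
try:
    clf.coef_
    has_w = True
except AttributeError:
    has_w = False
print("explicit w available:", has_w)
```

The number of stored coefficients is therefore the number of support vectors, which is at most the number of training samples.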
