Solved – Meaning of Epsilon in SVM regression

e1071, r, regression, svm

I have worked through some tutorials and read a few articles, but I still have a problem with SVM, specifically with SVR.
I'm doing the analysis in R, using the "svm" function from the e1071 library. I pass my multivariable formula to that function, so svm now works as SVR.

My results:

(general settings: cost = 1, gamma = 0.1666)
- epsilon = 0.1 (61 support vectors): RMSE = 4.1 on unseen data
- epsilon = 1 (10 support vectors): RMSE = 19 on unseen data
- epsilon = 1.3 (7 support vectors): RMSE = 25 on unseen data

As I understand it, increasing epsilon should give more support vectors, as we can see in the picture below:

In my case, however, the bigger epsilon is, the fewer support vectors there are.
I don't know why this happens. I would be glad if someone could explain it to me.

Best Answer

You have it backwards.

Traditional $\epsilon$-SVR works with the epsilon-insensitive hinge loss. The value of $\epsilon$ defines a margin of tolerance where no penalty is given to errors.
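Written out, with $f(x)$ the fitted regression function and $y$ the observed target (notation assumed here, it does not appear in the question), the $\epsilon$-insensitive loss is usually stated as:

$$L_\epsilon\bigl(y, f(x)\bigr) = \max\bigl(0,\ |y - f(x)| - \epsilon\bigr)$$

Any residual with $|y - f(x)| \le \epsilon$ therefore costs nothing, and only the part of a residual that exceeds $\epsilon$ is penalized.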

Remember that the support vectors are the instances across the margin, i.e. the samples being penalized, whose slack variables are non-zero (together with any samples lying exactly on the margin).

The larger $\epsilon$ is, the larger the errors you admit in your solution without penalty, so fewer samples end up as support vectors. By contrast, if $\epsilon \rightarrow 0^+$, every error is penalized: you end up with many support vectors (tending to the total number of instances) to sustain that.
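The direction of this effect can be illustrated in plain R without fitting anything. The sketch below is only an illustration under stated assumptions: the data are synthetic, the regression function `f` is fixed by hand rather than learned by `svm`, and `eps_loss` is a hypothetical helper name. It counts how many samples fall outside the tube (and would therefore be penalized) for the three epsilon values from the question.

```r
# Illustration only: a fixed regression function, not a fitted SVR.
set.seed(1)
x <- seq(0, 10, length.out = 100)
y <- 2 * x + rnorm(100, sd = 1)   # synthetic noisy targets
f <- 2 * x                        # assumed predictions (not learned)

# epsilon-insensitive loss: zero inside the tube, linear outside it
eps_loss <- function(y, f, eps) pmax(0, abs(y - f) - eps)

# Count the samples that receive a non-zero penalty for each epsilon
epsilons <- c(0.1, 1, 1.3)
n_penalized <- sapply(epsilons, function(eps) sum(eps_loss(y, f, eps) > 0))
print(data.frame(epsilon = epsilons, penalized = n_penalized))
```

In a real SVR the fitted function itself changes with epsilon, so the exact counts will differ from what `svm` reports, but the trend is the same: a wider tube leaves fewer samples outside it.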

Related Solutions

Solved – How to obtain decision boundaries from linear SVM in R

For data point $x$ your SVM calculates decision value $d$ in the following way:

d <- sum(w * x) + b

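If $d > 0$ then the label of $x$ is $+1$; otherwise it is $-1$. You can also get labels or decision values for a data matrix newdata by calling

predict(m, newdata)   # predicted labels

predict(m, newdata, decision.values = TRUE)   # labels, with the decision values attached as the "decision.values" attribute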
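As a minimal sketch of the rule above, the following plain R code computes decision values for several points at once and derives the boundary line in two dimensions. The values of `w` and `b` are hypothetical and chosen by hand, not extracted from a fitted model, and `boundary_x2` is a helper name introduced only for this illustration.

```r
# Hypothetical weights and bias for a 2-D linear SVM (not from a fitted model)
w <- c(2, -1)
b <- 0.5

# Decision values for every row of a data matrix: d = X w + b
newdata <- rbind(c(1, 1), c(-1, 2), c(0, 0.5))
d <- as.vector(newdata %*% w + b)
labels <- ifelse(d > 0, 1, -1)

# The boundary is the set of points with d = 0; in 2-D, solve for x2
# (this assumes w[2] is non-zero)
boundary_x2 <- function(x1) -(w[1] * x1 + b) / w[2]

print(data.frame(d = d, label = labels))
```

Plotting `boundary_x2` over a range of `x1` values draws the separating line; the margins are the parallel lines where $d = +1$ and $d = -1$.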

Be cautious when using SVM from the e1071 package; see the question "Problem with e1071 libsvm?". Other SVM packages for R include kernlab, klaR and svmpath; see the overview "Support Vector Machines in R" by A. Karatzoglou and D. Meyer.

Solved – SVM: Does C increase variance or stability (bias)

The effect of the SVM C-Parameter

The first textbook description of an SVM always speaks of "maximizing the margin", but this is only the first step. If your data is not perfectly separable, there will be points on the wrong side of the separating hyperplane. To allow for such points, slack variables were introduced (the soft-margin SVM). They bring the problematic points into the objective and weight them using the C parameter. This parameter is a trade-off between maximizing the margin and minimizing the error.
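For reference, the standard soft-margin problem can be written as follows, where $\xi_i$ are the slack variables and $n$ is the number of training points (the symbols are the usual textbook ones, assumed here rather than taken from the answer):

$$\min_{w,\, b,\, \xi}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0$$

The first term rewards a wide margin (small $\|w\|$), while the second charges $C$ per unit of margin violation.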

Why this?

Imagine (or draw on paper) a perfectly separable 2D dataset with a plot similar to the one above. Imagine a suitable hyperplane. Imagine you have a hard-margin SVM, which does not allow for misclassified points. Now imagine you break the rules and intentionally place a point on the other side of the hyperplane. The hyperplane will probably change a lot and will be worse than before. If you had used a soft-margin SVM instead, the old solution would still be the better one.

Your example

Increasing the value of the C-Parameter
$\iff$ Weight of misclassified points is increased
$\iff$ Margin gets smaller

And I think that is what Hastie and Tibshirani meant by "stable": in other words, closer to the hard-margin SVM.
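The trade-off can be made concrete with a toy calculation in plain R. This is a sketch under stated assumptions: the one-dimensional data are made up, the two candidate solutions `wide` and `narrow` are picked by hand rather than returned by an optimizer, and `objective` is a hypothetical helper that evaluates the usual soft-margin cost, 0.5 * w^2 + C * (sum of slacks).

```r
# Toy 1-D data: four well-separated points plus one negative point
# sitting close to the positive class
x <- c(-3, -2, 2, 3, 0.5)
y <- c(-1, -1, 1, 1, -1)

# Soft-margin objective: 0.5 * w^2 + C * sum of slacks
objective <- function(w, b, C) {
  slack <- pmax(0, 1 - y * (w * x + b))
  0.5 * w^2 + C * sum(slack)
}

# Two hand-picked candidates (not optimizer output)
wide   <- list(w = 0.5, b = 0)        # margin width 2/w = 4, one point violates it
narrow <- list(w = 4/3, b = -5/3)     # margin width 2/w = 1.5, no violations

for (C in c(0.1, 10)) {
  cat("C =", C,
      " wide:", objective(wide$w, wide$b, C),
      " narrow:", objective(narrow$w, narrow$b, C), "\n")
}
```

With the small C the wide-margin candidate has the lower cost, while with the large C the narrow candidate that respects every point wins, which matches the chain of implications above.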
