Solved – e1071 svm queries regarding plot and tune

data visualization, e1071, r, svm

I am new to R and I am learning the svm function from the e1071 package.

I have a few questions.

  1. How does the plot function work?

I cannot understand the plotting case when the data has more than two dimensions. In the line below, why are Petal.Width and Petal.Length chosen as the two dimensions for plotting, and how do the other dimensions affect the result? I am also unclear about the slice parameter: what is it, and why are the values 3 and 4 chosen?

plot(model, iris, Petal.Width ~ Petal.Length, slice = list(Sepal.Width = 3, Sepal.Length = 4))

  2. What is the tune function, and how do we choose the gamma and cost values shown in this example from the CRAN documentation?

obj <- tune.svm(Species~., data = iris, gamma = 2^(-1:1), cost = 2^(2:4))

Best Answer

I'm not familiar with the plot function of e1071 but I can help with your second question.

The purpose of the tune function is to find the optimal cost and gamma parameters, where optimal means producing the smallest estimated test error rate. Here the selection is done by grid search: the user supplies a series of values for each tuning parameter, and the function estimates the test error rate for every combination. In the example above there are 3 candidate gammas (0.5, 1, 2) and 3 candidate costs (4, 8, 16), so the test error rate is evaluated for 9 combinations of cost and gamma. The test error rate is estimated by k-fold cross-validation, with 10 folds as the default.

As for what values to provide to the tune function, one typically starts with a wide range and then homes in on the best-performing region. For example, it is not uncommon to start with $\log_{10}$-scale values such as cost = 10^(0:4), gamma = 10^(-4:-1). The example happens to use $\log_2$-scale values.
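
To make the "start wide, then narrow" idea concrete, here is a rough two-stage sketch. It assumes the e1071 package is installed. The second-stage window (a factor of 4 either side of the first-stage winner) and the seed are arbitrary choices of mine for illustration, not something the documentation prescribes.

```r
library(e1071)  # assumes the e1071 package is installed
set.seed(42)    # folds are assigned at random, so fix the seed for repeatability

# Stage 1: coarse grid on a log10 scale (5 costs x 4 gammas = 20 combinations)
coarse <- tune.svm(Species ~ ., data = iris,
                   cost = 10^(0:4), gamma = 10^(-4:-1))
print(coarse$performances)     # one row per combination: gamma, cost, error, dispersion
print(coarse$best.parameters)  # combination with the lowest cross-validated error

# Stage 2: finer log2 grid centred on the stage-1 winner
# (the factor-of-4 window on each side is an arbitrary illustrative choice)
best <- coarse$best.parameters
fine <- tune.svm(Species ~ ., data = iris,
                 cost  = best$cost  * 2^(-2:2),
                 gamma = best$gamma * 2^(-2:2))
print(fine$best.parameters)
print(fine$best.performance)   # estimated error of the selected combination
```

The `performances` table is the useful thing to look at: if the best combination sits on the edge of the grid, that is usually a hint to extend the range in that direction before refining.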