Solved – Error and Dispersion meaning in tune.out for SVM Classifier

Tags: cross-validation, e1071, svm

I am using an SVM to solve a binary classification problem with a qualitative response as output.

To find the best parameters for the SVM I used 10-fold cross-validation. The result of the process was (under RStudio and R):

Parameter tuning of ‘svm’:

- sampling method: 10-fold cross validation 

- best parameters:
 cost
    5

- best performance: 0.25 

- Detailed performance results:
   cost     error dispersion
1 1e-03 0.4833333  0.2415229
2 1e-02 0.4833333  0.2415229
3 1e-01 0.3500000  0.1657382
4 1e+00 0.2666667  0.1405457
5 5e+00 0.2500000  0.1416394
6 1e+01 0.2666667  0.1791613
7 1e+02 0.2666667  0.1791613

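(For reference, the exact tuning call is not shown above; a minimal sketch along the following lines, with a hypothetical data frame dat containing a factor response y, produces a summary of this shape. The cost grid is taken from the table above.)

    library(e1071)

    ## Sketch of the tuning call; 'dat' and the response 'y' are placeholders
    set.seed(1)
    tune.out <- tune(svm, y ~ ., data = dat,
                     ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)),
                     tunecontrol = tune.control(sampling = "cross", cross = 10))
    summary(tune.out)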
What I am asking myself is: what are error and dispersion, and how are they calculated?

My attempted answer: is the error the average MSE of the ten test-error estimates? I think not, because the classification problem has a qualitative response, so the CV error rate should be calculated from misclassified observations.

I am a bit confused about this.

Best Answer

If you dig into the code of tune, you'll find that it calculates an error for each of the surrogate models (one per fold), and then aggregates these per-model error estimates into a point estimate (reported in your summary as error) and a dispersion.

  • For classification, the surrogate-model error estimate is the fraction of misclassified predictions among all predictions, i.e. the misclassification rate (1 − accuracy).
  • The aggregation function for the point estimate is tunecontrol$sampling.aggregate, which defaults to mean.
  • The aggregation function for the dispersion is tunecontrol$sampling.dispersion, which defaults to sd.

See also the man page of tune.control().
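To make the mechanics concrete, here is a hedged sketch (not the exact internals of tune) that reproduces one row of the table by hand: the per-fold error for classification is the misclassification rate on the held-out fold, and the reported error and dispersion are simply mean() and sd() of those ten values. The data frame dat and response y are again placeholders.

    library(e1071)

    set.seed(1)
    folds <- sample(rep(1:10, length.out = nrow(dat)))   # assign each row to one of 10 folds

    fold_error <- sapply(1:10, function(k) {
      fit  <- svm(y ~ ., data = dat[folds != k, ], cost = 5)   # fit on 9 folds
      pred <- predict(fit, dat[folds == k, ])                  # predict held-out fold
      mean(pred != dat$y[folds == k])                          # misclassification rate on fold k
    })

    mean(fold_error)   # point estimate -> the "error" column (sampling.aggregate = mean)
    sd(fold_error)     # spread         -> the "dispersion" column (sampling.dispersion = sd)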