Solved – Grid search error in LIBSVM while optimizing C and g parameters

Tags: libsvm, machine learning, MATLAB, svm

I am using libsvm for a one-class classification problem. I am trying to select the ideal C and gamma parameters for different kernels (polynomial, linear and RBF). I am using the suggested MATLAB code, which finds the best parameters through v-fold cross-validation.

bestcv = 0;                                   % best cross-validation accuracy so far
for log2c = -1:3,                             % C = 2^-1 ... 2^3
  for log2g = -4:1,                           % gamma = 2^-4 ... 2^1
    cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];   % 5-fold CV
    cv = svmtrain(Target_train, train, cmd);  % with -v, returns CV accuracy (%)
    if (cv >= bestcv),
      bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;   % keep the best pair
    end
    fprintf('%g %g %g (bestc=%g, g=%g, rate=%g)\n', log2c, log2g, cv, bestc, bestg, bestcv);
  end
end

In v-fold cross-validation, we first divide the training set into v subsets of equal size. In turn, each subset is tested using the classifier trained on the remaining v − 1 subsets. Thus, each instance of the whole training set is predicted exactly once, so the cross-validation accuracy is the percentage of data that are correctly classified.
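The procedure just described can be sketched in plain MATLAB. This is only an illustration of the idea (LIBSVM's `-v` option does all of this internally); the toy data and the nearest-class-mean "classifier" are hypothetical stand-ins so that the script runs without LIBSVM.

```matlab
% Toy 1-D data (made up) with two classes, so the sketch runs without LIBSVM
X = [0.1; 0.2; 0.3; 0.4; 0.6; 0.7; 0.8; 0.9];
y = [1; 1; 1; 1; 2; 2; 2; 2];
v = 4;                           % number of folds

n = numel(y);
fold = mod((0:n-1)', v) + 1;     % round-robin fold assignment: 1,2,...,v,1,2,...
pred = zeros(n, 1);

for k = 1:v
  is_test  = (fold == k);        % the k-th subset is held out
  is_train = ~is_test;           % train on the remaining v-1 subsets
  % Stand-in "classifier": nearest class mean (hypothetical, not an SVM)
  mu1 = mean(X(is_train & y == 1));
  mu2 = mean(X(is_train & y == 2));
  xt  = X(is_test);
  pred(is_test) = 1 + (abs(xt - mu2) < abs(xt - mu1));
end

% Each instance was predicted exactly once, so accuracy is over the whole set
cv_accuracy = 100 * mean(pred == y);
fprintf('Cross-validation accuracy = %g%%\n', cv_accuracy);
```

LIBSVM shuffles the data before splitting it into folds; the deterministic round-robin assignment here is used only to keep the toy output reproducible.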

In this code, C and gamma take the values 2^-1, 2^0, ..., 2^3 and 2^-4, 2^-3, ..., 2^1 respectively (exponents in steps of 1).
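For a sense of scale, this small sketch enumerates the grid that the nested loops visit and counts how many SVM trainings the search costs. The variable names are illustrative only; nothing here calls LIBSVM.

```matlab
% The (C, gamma) grid searched by the nested loops: exponents in steps of 1
C_values     = 2.^(-1:3);     % [0.5 1 2 4 8]
gamma_values = 2.^(-4:1);     % [0.0625 0.125 0.25 0.5 1 2]

[Cgrid, Ggrid] = meshgrid(C_values, gamma_values);   % all (C, gamma) pairs
num_pairs = numel(Cgrid);      % 5 * 6 = 30 parameter pairs
num_fits  = 5 * num_pairs;     % with '-v 5', each pair costs 5 trainings: 150
fprintf('%d pairs, %d SVM trainings\n', num_pairs, num_fits);
```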

I noticed that when svmtrain is called, no -s parameter is specified; -s controls the type of SVM. The default for -s in libsvm is 0, which is C-SVC. Since I have a one-class classification problem, I should be using -s 2 according to the svmtrain options. However, when I change the line that builds cmd in the code above to

cmd = ['-s 2 -v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];

I am getting this error:

Undefined function 'ge' for input arguments of type 'struct'.

Error in ergasia (line 37)
    if (cv >= bestcv),

As far as I know, svmtrain returns a model of type struct. My question is: is the code I am using suitable for parameter selection in a one-class classification problem?

Another question: is there a better way of choosing the best C and gamma than this? I found this method here: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf

I could really use some help with this, so thank you in advance.

Best Answer

The output of cross-validation with LIBSVM is a score. LIBSVM's internal cross-validation uses accuracy as the score metric, which is known to be suboptimal for model selection. It would be better to use a measure like the area under the ROC curve (which you can compute using perfcurve if you have the Statistics Toolbox). Your code seems okay at first glance, except that one-class SVM uses the nu parameter (set with -n in LIBSVM), not C.
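To illustrate the AUC suggestion, here is a minimal sketch that computes the area under the ROC curve without the Statistics Toolbox, using the equivalent pairwise formulation: the fraction of (target, outlier) pairs in which the target instance gets the higher score, with ties counting as half. The labels and scores are made-up toy values; in practice the scores would be the decision values that svmpredict returns for a held-out fold, assuming higher values mean "more like the target class".

```matlab
% Toy data (made up): 1 = target class, 0 = outlier
labels = [1 1 1 0 0 0];
% Hypothetical decision values; higher means "more like the target class"
scores = [0.9 0.8 0.3 0.4 0.2 0.1];

pos = scores(labels == 1);   % scores of target instances
neg = scores(labels ~= 1);   % scores of outliers

% Every (target, outlier) pair: rows index outliers, columns index targets
[P, N] = meshgrid(pos, neg);

% AUC = P(score of random target > score of random outlier), ties count 1/2
auc = (sum(P(:) > N(:)) + 0.5 * sum(P(:) == N(:))) / numel(P);
fprintf('AUC = %.4f\n', auc);
```

Here 8 of the 9 pairs are ranked correctly, so the AUC is 8/9. With the Statistics Toolbox, `[~, ~, ~, auc] = perfcurve(labels, scores, 1);` should give the same value. In a one-class grid search, the candidate values would be passed as `-s 2 -n <nu>` instead of `-c <C>`.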

You could use a specialized library like Optunity to do the hyperparameter tuning for you. It offers more efficient routines than grid search for optimizing hyperparameters (i.e., routines that require fewer parameter pairs to be tested) and has built-in cross-validation with score metrics of your choosing, so you don't have to implement it yourself. Optunity is also available in MATLAB.

Disclaimer: I'm the main developer of Optunity.
