Solved – SVM parameter tuning for unbalanced classes (with class weights)

classification, cross-validation, machine learning, svm, unbalanced-classes

I am trying to run an SVM with a radial kernel on an imbalanced dataset (class 0: 90%, class 1: 10%) using the e1071 package. I am using cross-validation to select the best gamma and cost. I also want to use class weights ("0" = 1, "1" = 10) for every model.

This is the code I am using (similar to the code in ISLR, but with class weights), with 5 gamma values and 5 cost values. Instead of getting 25 models in the output, I am getting 5; the cost parameter is not being taken into account:

[Image: screenshot of the tuning code]

The best model output is the following:
[Image: screenshot of the best model output]

What is the best way to tune the parameters (gamma and cost), including the class weights?

This is my first time running an SVM, and this code took more than 2 days to run. Where am I going wrong?

Best Answer

The call is ignoring the cost parameter because it isn't part of the list you passed to ranges. Your call should look like this:

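library(e1071)

# Every parameter to tune goes inside `ranges`; tune() tries all 5 x 5 combinations
tune.out <- tune(svm, RESPONSE ~ ., data = train, kernel = "radial",
                 ranges = list(gamma = c(0.1, 0.5, 1, 2, 4),
                               cost  = c(0.1, 1, 10, 100, 1000)),
                 # fixed arguments such as class.weights are passed on to svm()
                 class.weights = c("0" = 1, "1" = 10))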
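To see why the corrected call yields 25 results rather than 5, the parameter grid can be reproduced with base R's expand.grid(), which crosses every value of one vector with every value of the other. This is only a sketch of the arithmetic; the factor of 10 assumes tune.control()'s default of 10-fold cross-validation (`cross = 10`), and `grid`, `n_models` and `n_fits` are illustrative names.

```r
# Cross every gamma value with every cost value, as tune() does with `ranges`
grid <- expand.grid(gamma = c(0.1, 0.5, 1, 2, 4),
                    cost  = c(0.1, 1, 10, 100, 1000))

n_models <- nrow(grid)      # 5 gamma values x 5 cost values = 25 combinations
n_fits   <- n_models * 10   # each combination is fitted once per CV fold (10 by default)
print(n_models)
print(n_fits)
```

With 250 separate SVM fits on a large dataset, a runtime of days is not surprising, especially for the high-cost combinations.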

A similar example is shown in the documentation (?tune), using the iris dataset:

obj <- tune(svm, Species ~ ., data = iris,
            ranges = list(gamma = 2^(-1:1), cost = 2^(2:4)),
            tunecontrol = tune.control(sampling = "fix"))
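A quick way to confirm that both parameters are being tuned is to inspect the returned object: its `performances` data frame has one row per gamma/cost combination. The sketch below reruns the iris example and prints those results; the `set.seed()` call is an addition, since `sampling = "fix"` draws its single train/validation split at random.

```r
library(e1071)

set.seed(1)  # the fixed train/validation split is drawn at random
obj <- tune(svm, Species ~ ., data = iris,
            ranges = list(gamma = 2^(-1:1), cost = 2^(2:4)),
            tunecontrol = tune.control(sampling = "fix"))

summary(obj)          # validation error for every gamma/cost pair
obj$best.parameters   # the gamma and cost with the lowest validation error
obj$performances      # one row per combination: 3 gamma x 3 cost = 9 rows
```

If `performances` had only as many rows as there are gamma values, that would be the symptom described in the question.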

As for why it is taking so long: I don't know how large your dataset is (it may simply take a while to process), but a cost of 1000 is very high. A large cost penalizes every misclassified training point heavily, which makes fitting more computationally expensive and increases the risk of overfitting, so the model generalizes worse. I would start with a lower sequence of cost values and only move higher while your performance keeps improving, making sure to evaluate the final model on an independent test set.
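Here is a minimal sketch of that workflow, under stated assumptions. The data are synthetic and hypothetical (roughly 90% class "0" and 10% class "1", standing in for the question's `train`), the smaller grid and `tune.control(cross = 5)` are illustrative choices to cut runtime rather than recommendations for any particular dataset, and `dat`, `test_idx` and `recall` are made-up names. Per-class recall is reported because with a 90/10 split, overall accuracy can look high even if class "1" is rarely predicted.

```r
library(e1071)

set.seed(42)
# Hypothetical imbalanced data: about 10% of rows end up in class "1"
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
RESPONSE <- factor(ifelse(x1 + x2 + rnorm(n, sd = 0.5) > 1.8, "1", "0"))
dat <- data.frame(x1, x2, RESPONSE)

# Hold out 30% of the rows as an independent test set
test_idx <- sample(n, size = round(0.3 * n))
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

# Smaller cost grid and 5-fold CV (default is 10) to reduce the number of fits
tune.out <- tune(svm, RESPONSE ~ ., data = train, kernel = "radial",
                 ranges = list(gamma = c(0.1, 0.5, 1), cost = c(0.1, 1, 10)),
                 class.weights = c("0" = 1, "1" = 10),
                 tunecontrol = tune.control(cross = 5))

# Evaluate the refitted best model on the held-out rows only
pred <- predict(tune.out$best.model, newdata = test)
conf <- table(predicted = pred, actual = test$RESPONSE)
print(conf)

# Recall per class: correctly predicted rows / actual rows of that class
recall <- diag(conf) / colSums(conf)
print(recall)
```

If the best cost sits at the top of the grid, that is a sign to extend the sequence upward and re-tune, still judging the result on the held-out set.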