Solved – Using AdaBoost on multi-class in R on unbalanced data

boosting, r, unbalanced-classes

I have a highly imbalanced data set, and I have used the SMOTE algorithm (from the R package DMwR) to balance its binary class. I then used the R ada package to train an AdaBoost model on the balanced data to predict the binary class, with very good results.

In the same data set, I have another class variable which has multiple values (6 in total). In this case I realise that I can't use the AdaBoost algorithm as implemented in the ada package as it only deals with the binary case.

I therefore have 2 problems:

  1. I'd like to use the SMOTE algorithm on the second class variable, but this also only works with binary classes. Is there an algorithm or package I can use in R to "rebalance" a data set based on a class with multiple values, in a similar way to SMOTE?

  2. I'd like to use a classifier to predict the multi-valued class variable. I have tried the one-vs-all approach with AdaBoost but I cannot get it to work well (my approach is below). Boosting seems to work well with this data set. Are there any other boosting algorithms, or other approaches, I could use in R that handle classes with multiple values? I have tried Random Forest, but one of my nominal inputs has too many discrete values to use it.

Approach for AdaBoost one-vs-all

  • Build a vector with a binary variable for each discrete class value
  • Train one AdaBoost model against each binary class vector
  • Generate a probability prediction from each AdaBoost model
  • Select the class with the highest probability
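The four steps above can be sketched in R roughly as follows. This is a minimal illustration, not my actual code: the helper names `fit_one_vs_all` and `predict_one_vs_all` are hypothetical, the built-in `iris` data stands in for the real data set, and the sketch assumes that `predict()` with `type = "probs"` returns the probability of the second factor level in its second column.

```r
library(ada)

# Hypothetical helper: fit one binary AdaBoost model per class level.
# `x` is a data frame of predictors, `y` a factor with more than two levels.
fit_one_vs_all <- function(x, y, iter = 50) {
  models <- lapply(levels(y), function(cls) {
    # Binary target: "yes" for the current class, "no" for all others.
    target <- factor(ifelse(y == cls, "yes", "no"), levels = c("no", "yes"))
    ada(x, target, iter = iter)
  })
  names(models) <- levels(y)
  models
}

# Hypothetical helper: score every model and pick the most probable class.
predict_one_vs_all <- function(models, newdata) {
  probs <- do.call(cbind, lapply(models, function(m) {
    # Column 2 is assumed to hold the probability of the "yes" level.
    predict(m, newdata, type = "probs")[, 2]
  }))
  best <- max.col(probs, ties.method = "first")
  factor(names(models)[best], levels = names(models))
}

# Usage on a stand-in data set with three classes.
set.seed(42)
models <- fit_one_vs_all(iris[, 1:4], iris$Species, iter = 20)
pred <- predict_one_vs_all(models, iris[, 1:4])
table(predicted = pred, actual = iris$Species)
```

One caveat with this scheme is that each model is trained independently, so the probabilities being compared in the last step are not calibrated against one another.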

Many thanks

Best Answer

You can use the maboost package in R, which implements multiclass boosting. Its algorithm is, in a sense, a generalization of AdaBoost.MM. It solves the multiclass boosting problem directly, without reducing it to a set of binary classification problems, so it may be appropriate for your application.
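A minimal sketch of how this might look, using the built-in `iris` data as a stand-in for your data set (the train/test split and `iter = 50` are arbitrary choices for illustration, not recommendations):

```r
library(maboost)

set.seed(1)
# Hold out a third of the rows for testing.
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test <- iris[-train_idx, ]

# Fit a single multiclass boosted model on the three-level factor.
fit <- maboost(Species ~ ., data = train, iter = 50)

# Predict class labels for the held-out rows and cross-tabulate.
pred <- predict(fit, test, type = "class")
table(predicted = pred, actual = test$Species)
```

The interface closely mirrors that of the ada package, so the response only needs to be a factor with all six levels, and no per-class binary vectors are required.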