Solved – multiclass classification and unbalanced dataset

Tags: classification, svm, unbalanced-classes

I have a five-class SVM multiclass problem. The dataset is small (about 160 examples) and unbalanced, i.e. some classes have very few examples. So far I have further limited the dataset to 110 examples in order to work with a balanced training set… Is this a correct approach? Or should I work with an unbalanced training set? What are the advantages in the latter case? Thank you in advance!

Best Answer

What is the class distribution? I would suggest building your classifiers in a more "stepwise" fashion: take your largest class, say X, and build a binary classifier for "X" vs. "not X". Then remove the examples of X and repeat with the next largest class, stopping when the remaining data are too small for any classification to be valid. I find this approach much more meaningful than a single 5-class classification.
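A minimal sketch of this stepwise cascade, using scikit-learn's `SVC` on toy data standing in for your ~160-example set (the data, the `min_examples` threshold, and the helper names are all hypothetical, not from the original question):

```python
import numpy as np
from sklearn.svm import SVC

# Toy 5-class, imbalanced data standing in for the ~160-example set (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(160, 4))
y = rng.choice(5, size=160, p=[0.35, 0.25, 0.2, 0.1, 0.1])
for c in range(5):
    X[y == c] += c * 2.0  # shift classes apart so they are roughly separable

def stepwise_classifiers(X, y, min_examples=10):
    """Train 'class vs. rest' SVMs from the largest class downwards,
    dropping each class's examples once it has its own classifier."""
    cascade = []
    remaining = np.ones(len(y), dtype=bool)
    classes, counts = np.unique(y, return_counts=True)
    for c in classes[np.argsort(-counts)]:
        pos = (y == c) & remaining
        # stop when either side of the binary problem gets too small
        if pos.sum() < min_examples or (remaining & ~pos).sum() < min_examples:
            break
        clf = SVC(kernel="rbf", class_weight="balanced")
        clf.fit(X[remaining], (y[remaining] == c).astype(int))
        cascade.append((c, clf))
        remaining &= ~pos  # remove this class, move to the next largest
    return cascade

def predict_cascade(cascade, X, fallback):
    """First positive classifier wins; undecided points get the fallback class."""
    preds = np.full(len(X), fallback)
    undecided = np.ones(len(X), dtype=bool)
    for c, clf in cascade:
        hit = undecided & (clf.predict(X) == 1)
        preds[hit] = c
        undecided &= ~hit
    return preds

cascade = stepwise_classifiers(X, y)
preds = predict_cascade(cascade, X, fallback=4)
```

The `fallback` class here is an assumption (the smallest class, assigned to anything no classifier claims); in practice you would pick it, and `min_examples`, from your actual class distribution.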

Alternatively, you can try approaches like SMOTE (Synthetic Minority Over-sampling Technique), or use alternative measures of accuracy (e.g. the adjusted geometric mean) to compensate for the class imbalance.
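For production use there is a ready-made SMOTE in the `imbalanced-learn` package, but the core idea fits in a few lines: interpolate each minority sample towards a random one of its k nearest same-class neighbours. A minimal sketch (the function name, sample sizes, and data are hypothetical):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: synthesize n_new minority-class samples by
    interpolating between a random minority sample and one of its
    k nearest same-class neighbours."""
    rng = np.random.default_rng(rng)
    k = min(k, len(X_min) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)  # idx[:, 0] is each point itself
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(1, k + 1)]      # a random true neighbour
        gap = rng.random()                       # interpolation factor in [0, 1)
        new.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(new)

# Hypothetical minority class of 12 examples, upsampled with 28 synthetic points
X_min = np.random.default_rng(1).normal(size=(12, 4))
X_syn = smote_oversample(X_min, n_new=28, k=5, rng=1)
```

Because each synthetic point lies on a segment between two real minority samples, oversampling stays inside the minority class's region of feature space, unlike naive duplication, which just repeats points.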

I can't think of any advantage to using the class-imbalanced data as-is, without any intervention such as the above.