Solved – LibSVM – Multi class classification with unbalanced data

classification, libsvm, multi-class, svm, unbalanced-classes

I have been experimenting with libsvm and 3D descriptors to perform object recognition. So far I have 7 categories of objects; the number of objects in each category (and its percentage of the total) is:

Category 1. 492 (14%)

Category 2. 574 (16%)

Category 3. 738 (21%)

Category 4. 164 (5%)

Category 5. 369 (10%)

Category 6. 123 (3%)

Category 7. 1025 (30%)

So I have in total 3585 objects.

I followed the libsvm practical guide.
As a reminder, its steps are:

A. Scaling the training and the testing
B. Cross validation
C. Training
D. Testing
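
On step A, the guide recommends computing the scaling factors on the training set and reusing them unchanged on the test set (this is what svm-scale does with its -s and -r options). Here is a minimal pure-Python sketch of that idea. The function names fit_scaler and apply_scaler and the toy data are made up for illustration; this is not LIBSVM's own code.

```python
def fit_scaler(train_rows):
    """Record the per-feature min and max from the training rows only."""
    columns = list(zip(*train_rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_scaler(rows, mins, maxs, lower=-1.0, upper=1.0):
    """Map each feature linearly so the training range becomes [lower, upper]."""
    scaled = []
    for row in rows:
        out = []
        for x, lo, hi in zip(row, mins, maxs):
            if hi == lo:
                # Constant feature in training; mapping it to 0 is a choice
                # made for this sketch only.
                out.append(0.0)
            else:
                out.append(lower + (upper - lower) * (x - lo) / (hi - lo))
        scaled.append(out)
    return scaled

# Toy data (hypothetical): 3 training rows, 1 test row, 2 features.
train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
test = [[2.5, 35.0]]

mins, maxs = fit_scaler(train)                 # statistics from training data only
train_scaled = apply_scaler(train, mins, maxs)
test_scaled = apply_scaler(test, mins, maxs)   # same factors reused for the test set
```

Test values can land outside [-1, 1] (the second test feature does here); that is expected when the training factors are reused.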

I split my data into training and testing sets.
Using 5-fold cross-validation, I was able to determine good values for C and gamma.
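
For reference, the grid search described in the guide tries exponentially growing values of C and gamma and keeps the pair with the best cross-validation accuracy. A rough sketch is below. cv_accuracy is a hypothetical callback standing in for whatever actually runs the 5-fold cross-validation (for example svm-train with -v 5), and toy_cv_accuracy is a made-up scoring function that only exists to make the snippet runnable.

```python
import math

def exponent_grid(start, stop, step=2):
    """Return [2**start, 2**(start + step), ..., 2**stop]."""
    return [2.0 ** e for e in range(start, stop + 1, step)]

def grid_search(cv_accuracy, c_values, gamma_values):
    """Return (best_accuracy, best_C, best_gamma) over the full grid."""
    best = None
    for c in c_values:
        for g in gamma_values:
            acc = cv_accuracy(c, g)
            if best is None or acc > best[0]:
                best = (acc, c, g)
    return best

# Ranges suggested in the LIBSVM practical guide.
c_values = exponent_grid(-5, 15)      # 2^-5, 2^-3, ..., 2^15
gamma_values = exponent_grid(-15, 3)  # 2^-15, 2^-13, ..., 2^3

def toy_cv_accuracy(c, g):
    """Placeholder score peaking at C = 2^3, gamma = 2^-5 (not real data)."""
    return 100.0 - abs(math.log2(c) - 3) - abs(math.log2(g) + 5)

best_acc, best_c, best_gamma = grid_search(toy_cv_accuracy, c_values, gamma_values)
```

In a real run the callback would train and validate on the scaled training set only, keeping the test set out of the parameter search.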

However, I obtained poor results (cross-validation accuracy is about 30-40% and my test accuracy is about 50%).

Then I looked at my data again and saw that it is unbalanced (categories 4 and 6, for example). I discovered that libSVM has an option for per-class weights, so I would now like to set appropriate weights.

So far I'm doing this:

svm-train -c cValue -g gValue -w1 1 -w2 1 -w3 1 -w4 2 -w5 1 -w6 2 -w7 1

However, the results are the same. I'm sure this is not the right way to do it, which is why I am asking for help.
I saw some topics on the subject, but they were about binary classification rather than multiclass classification.
I know that libSVM uses "one against one" (so binary classifiers), but I don't know how to handle that when I have multiple classes.
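
One common heuristic for choosing the weights is to make each class weight inversely proportional to its size, so that the rarest class gets the largest weight. This is an assumption for illustration, not something prescribed by LIBSVM or tested on this data. In svm-train, -wi multiplies C for class i, so the weights only rescale the penalty per class. A small sketch using the counts listed above (the helper names are hypothetical):

```python
# Object counts per category, copied from the list in the question.
counts = {1: 492, 2: 574, 3: 738, 4: 164, 5: 369, 6: 123, 7: 1025}

def inverse_frequency_weights(class_counts):
    """Weight each class by (largest class size) / (its own size)."""
    largest = max(class_counts.values())
    return {label: largest / n for label, n in class_counts.items()}

def weight_flags(weights):
    """Format the weights as svm-train -wi options, in label order."""
    return " ".join(f"-w{label} {w:.2f}" for label, w in sorted(weights.items()))

weights = inverse_frequency_weights(counts)
flags = weight_flags(weights)
print(flags)
```

Since the weights change the effective C per class, C and gamma would presumably need to be re-selected by cross-validation with the weights in place.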

Could you please help me?

Thank you in advance for your help.

Best Answer

You don't need to do anything special to work with a multiclass problem in LibSVM. Just give the proper label to each instance (1, 2, ..., n).

Internally, LibSVM uses a "one against one" scheme: for each pair of classes, a separate binary SVM is trained.

The decision-value matrix (the "probs" output) for any new prediction has M = N(N-1)/2 columns, one per pairwise SVM; e.g., if you have 7 classes (N = 7), M = 21 SVM models will be created.
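
The count is easy to check by enumerating the class pairs. A tiny sketch (pairwise_classifiers is a hypothetical helper, not a LIBSVM function):

```python
from itertools import combinations

def pairwise_classifiers(n_classes):
    """Return all (i, j) class-index pairs with i < j."""
    return list(combinations(range(n_classes), 2))

pairs = pairwise_classifiers(7)
print(len(pairs))  # 21, i.e. 7 * 6 / 2
```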

Please keep in mind that LibSVM does not respect the numeric order of your class labels, i.e., the order of the pairwise comparisons in the probs matrix depends on the order in which your class labels first appear in the training data:

  • "Internally class labels are ordered by their first occurrence in the training set. For a k-class data, internally labels are 0, ..., k-1, and each two-class SVM considers pair (i, j) with i < j. Then class i is treated as positive (+1) and j as negative (-1). For example, if the data set has labels +5/+10 and +10 appears first, then internally the +5 versus +10 SVM problem has +10 as positive (+1) and +5 as negative (-1)." http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f430
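
To make the FAQ's ordering rule concrete, here is a sketch that derives the internal label order from the training labels and lists which label is positive in each pairwise problem. The voting step is my understanding of how the pairwise decisions are combined when probability estimates are not requested (a positive decision value votes for the first label of the pair, and ties go to the label that appeared first); treat that part as an assumption. All function names are hypothetical.

```python
from itertools import combinations

def internal_label_order(training_labels):
    """Distinct labels in order of first occurrence in the training set."""
    order = []
    for label in training_labels:
        if label not in order:
            order.append(label)
    return order

def pairwise_problems(training_labels):
    """(positive_label, negative_label) for every internal pair (i, j), i < j."""
    return list(combinations(internal_label_order(training_labels), 2))

def vote(order, decision_values):
    """Combine pairwise decision values by majority vote (assumed rule).

    decision_values must follow the same pair order as pairwise_problems.
    """
    votes = [0] * len(order)
    index_pairs = combinations(range(len(order)), 2)
    for (i, j), value in zip(index_pairs, decision_values):
        if value > 0:
            votes[i] += 1   # positive value favours the first label of the pair
        else:
            votes[j] += 1
    return order[votes.index(max(votes))]  # first maximum wins a tie

# FAQ example: labels 5 and 10, with 10 appearing first in the training data.
print(pairwise_problems([10, 5, 5, 10]))  # [(10, 5)]: 10 is the positive class
```

So before reading the sign of a decision value, look up which label of the pair came first in the training file.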

========================================================================

In any case, check the official FAQ answers: http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq