Solved – A priori selection of SVM class weights

machine learning, svm, unbalanced-classes

I remember reading somewhere that for multiclass SVMs with unbalanced data, there is a way to determine the class weights directly from the training data (rather than by cross-validation). Does anyone know what the method is or which paper it's from?

Thanks

Best Answer

For an SVM that minimizes the objective function $$\frac{1}{2}\|w\|^2 + C_1 \sum_{i:\, y_i=-1}\xi_i + C_2 \sum_{i:\, y_i=+1}\xi_i$$ you can choose the constants $C_1$ and $C_2$ inversely proportional to the class sizes. That is, if you have $l_1$ training samples in the negative class ($y_i=-1$) and $l_2$ in the positive class ($y_i=+1$), take $C_1$ and $C_2$ such that $C_1/C_2 = l_2/l_1$, so that errors on the smaller class are penalized more heavily. You may need to adjust them slightly in later experiments, but this is a good rule of thumb.
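As a rough illustration of the inverse-proportionality rule, here is a minimal Python sketch. The function name `inverse_frequency_weights` and the scaling $n / (k \cdot l_j)$ (which gives every class weight 1 on balanced data) are my own assumptions, not part of the rule itself; only the ratio between the weights matters for $C_1/C_2 = l_2/l_1$.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {label: weight}, with each weight inversely proportional to class size.

    The scaling n / (k * count) is an assumed normalization: a perfectly
    balanced dataset gets weight 1.0 for every class.
    """
    counts = Counter(labels)
    n = len(labels)   # total number of training samples
    k = len(counts)   # number of distinct classes
    return {label: n / (k * count) for label, count in counts.items()}


# Toy data: 90 negative samples (l_1 = 90) and 10 positive samples (l_2 = 10).
labels = [-1] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)

C = 1.0                  # base penalty, still to be tuned in experiments
C1 = C * weights[-1]     # penalty for slack on the negative class
C2 = C * weights[1]      # penalty for slack on the positive class
print(C1, C2, C1 / C2)   # the ratio C1 / C2 matches l_2 / l_1 = 10 / 90
```

Because the function only counts labels, it also accepts more than two classes, which may make it a reasonable starting point for the multiclass case as well.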

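If you are using the LIBSVM package, you can set the per-class penalties with the flags "-w-1" and "-w1". Each "-wi" flag takes a weight and multiplies the base $C$ for class $i$ by it, so the weights only need to have the right ratio.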
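To make the flag syntax concrete, here is a small Python sketch that turns a label-to-weight mapping into `svm-train` arguments. The helper `libsvm_weight_flags` and the file name `train.txt` are hypothetical; the weights 1 and 9 correspond to an assumed 90/10 split between the negative and positive class.

```python
def libsvm_weight_flags(weights):
    """Build `-wi weight` arguments for svm-train from a {label: weight} mapping."""
    args = []
    for label in sorted(weights):
        # LIBSVM expects the label fused to the flag: "-w-1" for class -1, "-w1" for class 1.
        args += [f"-w{label}", f"{weights[label]:g}"]
    return args


# 90 negative vs. 10 positive samples: C_1 / C_2 = 10 / 90 = 1 / 9.
flags = libsvm_weight_flags({-1: 1.0, 1: 9.0})
command = ["svm-train", "-c", "1"] + flags + ["train.txt"]  # train.txt is a placeholder
print(" ".join(command))  # svm-train -c 1 -w-1 1 -w1 9 train.txt
```

Building the arguments as a list rather than a single string keeps them easy to pass to `subprocess.run` if you want to launch the training from a script.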

P.S. I just noticed that you asked about the multiclass problem. Well, maybe you will still find this answer useful.