Solved – Which classifiers work well with unbalanced data

Tags: binary data, classification, neural networks, unbalanced-classes

I have a binary classification problem which is very unbalanced – it can have 98% of data from one class. Which classifiers work well with this sort of data?

I have an unlimited supply of training data, since I produce it using a pseudorandom number generator. However, I found that to get a neural network to produce decent results, I had to generate balanced (50:50) data. This is equivalent to over-sampling the minority class. The problem with this approach is that the training data is then not representative of real life.

Best Answer

Some options:

Do not use accuracy alone as a metric. A classifier that assigns everything to the majority class would score 98% accuracy, which tells you nothing. Precision and recall are usually more informative.
Try a cost-sensitive classifier, which lets you specify a different misclassification cost for each class.
Use an SVM and penalize errors on one of the classes more heavily; this can be done with LibSVM.
Boost the number of minority-class training examples by artificially creating new samples from the existing ones.
Resample the set so that both classes have a comparable number of samples (probably not an option in your case).
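
To make the first two options concrete, here is a minimal sketch in plain Python. It assumes a made-up 98:2 label set and a trivial "always predict the majority class" baseline; the function names and the inverse-frequency weighting heuristic are illustrative choices, not something prescribed by the question. Precision is reported as 0.0 when no positives are predicted, which is a convention rather than a mathematical necessity.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Convention (an assumption here): return 0.0 when the ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def inverse_frequency_weights(y):
    """One possible cost scheme: weight each class by n / (k * class_count)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Hypothetical data: 980 majority-class (0) and 20 minority-class (1) samples.
y_true = [0] * 980 + [1] * 20
# Baseline that classifies everything as the majority class.
y_pred = [0] * 1000

baseline_accuracy = accuracy(y_true, y_pred)
baseline_precision, baseline_recall = precision_recall(y_true, y_pred)
class_weights = inverse_frequency_weights(y_true)

print("accuracy:", baseline_accuracy)
print("precision:", baseline_precision, "recall:", baseline_recall)
print("class weights:", class_weights)
```

The baseline looks excellent by accuracy yet never finds a single minority sample, which is what recall exposes. The weights dictionary is the kind of per-class cost you could pass to a cost-sensitive learner or a class-weighted SVM; whether inverse frequency is the right cost depends on what a missed minority case actually costs you.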
