Solved – an imbalanced data set
classification, dataset, unbalanced-classes
I've seen this term several times already and have no idea what it means. Can you explain it?
Best Answer
Answer: A dataset is imbalanced if the distribution of classes is not uniform.
In other (paper reference) words: the imbalance problem occurs when one of the two classes has many more samples than the other.
Problem: With imbalanced data, a classifier can reach a high accuracy even though it actually performs rather badly.
Example: A simple example can be found in disease diagnostics. Assume you have a disease which only very few people actually have. Let's assume you have a dataset of 100 people: 90 are healthy, 10 are sick. If you construct a classifier which always predicts "healthy", you will achieve a wonderful accuracy of 90%. Yet in this specific case that is rather bad performance, because all sick people are falsely diagnosed as healthy.
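What to do: This is where sensitivity (the fraction of sick people correctly identified) and specificity (the fraction of healthy people correctly identified) come in handy. Both are used implicitly when computing the AUC (area under the ROC curve) to evaluate a classifier. Wikipedia provides great further information.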
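As a rough sketch of how these metrics expose the problem, the snippet below computes sensitivity, specificity and the AUC for the always-"healthy" classifier from the example. The helper names `sensitivity_specificity` and `roc_auc` are hypothetical, written here for illustration; the AUC is computed through its pairwise interpretation (the probability that a randomly chosen sick person gets a higher score than a randomly chosen healthy one, with ties counted as one half). In practice you would probably use a library routine instead.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = sick, 0 = healthy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Same data as in the example: 90 healthy (0), 10 sick (1).
y_true = [0] * 90 + [1] * 10

# The always-"healthy" classifier: same prediction and same score for everyone.
always_healthy = [0] * len(y_true)
constant_scores = [0.0] * len(y_true)

sens, spec = sensitivity_specificity(y_true, always_healthy)
auc_const = roc_auc(y_true, constant_scores)

print(f"sensitivity: {sens:.2f}")  # sensitivity: 0.00
print(f"specificity: {spec:.2f}")  # specificity: 1.00
print(f"AUC: {auc_const:.2f}")     # AUC: 0.50
```

Despite 90% accuracy, the sensitivity is 0 and the AUC is 0.5, which is no better than guessing at random.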