This is a general question, not specific to any method or dataset. How do we deal with class imbalance in supervised machine learning when about 90% of the labels are 0 and about 10% are 1? How do we train the classifier optimally?
One approach I use is to sample the data so that the classes are balanced, train the classifier on that sample, and repeat this for multiple samples.
This feels ad hoc to me. Is there a framework for approaching this kind of problem?
Best Answer
There are many frameworks and approaches. This is a recurrent issue.
Examples:
Some literature reviews, in increasing order of technical complexity and level of detail:
Oh, and by the way, 90%/10% is not unbalanced. Card transaction fraud datasets are often split 99.97%/0.03%. That is unbalanced.
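For reference, the resampling scheme described in the question (balance the classes by sampling, train, repeat over several samples) can be sketched roughly as follows. This is only a minimal illustration using the Python standard library. The function name `balanced_undersamples` and its parameters `n_rounds` and `seed` are made up for this example, and the toy labels simply mirror the 90%/10% split from the question.

```python
import random
from collections import Counter


def balanced_undersamples(labels, n_rounds=5, seed=0):
    """Yield n_rounds lists of row indices, each with equal counts per class.

    Hypothetical helper: every class is randomly undersampled (without
    replacement) down to the size of the rarest class.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    # Size of the rarest class; every class is cut down to this size.
    n_min = min(len(idxs) for idxs in by_class.values())

    for _ in range(n_rounds):
        subset = []
        for idxs in by_class.values():
            subset.extend(rng.sample(idxs, n_min))
        rng.shuffle(subset)  # avoid class-ordered rows in the subset
        yield subset


# Toy labels with the 90%/10% split from the question (assumed data).
labels = [0] * 900 + [1] * 100
subsets = list(balanced_undersamples(labels, n_rounds=3))

for subset in subsets:
    # Each subset should contain 100 rows of each class.
    print(Counter(labels[i] for i in subset))
```

Each index list would then be used to train one classifier, and the predictions of the resulting models averaged. Note that models trained on balanced subsets see a 50% base rate rather than the real 10%, so their predicted probabilities would need recalibrating if the actual probabilities matter and not just the ranking.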