Solved – SMOTE algorithm: how to select the over- and under-sampling percentages

machine learning, r, unbalanced-classes

I have a highly unbalanced binary dependent variable (i.e., cases of '1' make up less than 5%). I am trying to apply the SMOTE algorithm using the R DMwR package. In general, how do we determine parameters such as perc.over and perc.under, which indicate how much to oversample the minority class and undersample the majority class, respectively?

Best Answer

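Create a loop over different values of the percentages and see which gives you the best accuracy or F-score, e.g. 100%, 200%, ... for perc.over. With fewer than 5% positives, the F-score is the more informative of the two, since predicting '0' for every case already gives over 95% accuracy. For perc.under, you can use the majority-to-minority ratio multiplied by the initial oversampling percentage.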
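Before running such a loop, it can help to tabulate the class sizes each candidate pair would produce. The sketch below is base R only and rests on an assumption: that, as the DMwR documentation describes, perc.over / 100 synthetic cases are generated per original minority case, and perc.under is a percentage of the number of those synthetic cases. The helper name smote_class_counts and the figure of 50 minority cases are hypothetical and not part of DMwR.

```r
# Hypothetical helper (not part of DMwR): expected class sizes after SMOTE,
# assuming the perc.over / perc.under semantics described in the DMwR docs.
# Only the perc.over >= 100 case is handled here.
smote_class_counts <- function(n_min, perc.over, perc.under) {
  stopifnot(perc.over >= 100)
  new_min <- floor(perc.over / 100) * n_min      # synthetic minority cases
  sel_maj <- floor(perc.under / 100 * new_min)   # majority cases sampled
  c(minority = n_min + new_min, majority = sel_maj)
}

n_min <- 50  # illustrative: 50 minority cases, e.g. 5% of 1000 rows

# Candidate settings to loop over
grid <- expand.grid(perc.over  = c(100, 200, 300, 400),
                    perc.under = c(100, 200, 300))

# One row of counts per candidate pair
counts <- t(mapply(smote_class_counts,
                   perc.over  = grid$perc.over,
                   perc.under = grid$perc.under,
                   MoreArgs   = list(n_min = n_min)))

result <- cbind(grid, counts)
result$minority_share <- result$minority / (result$minority + result$majority)
print(result)
```

Under these assumptions, 50 minority cases with perc.over = 200 and perc.under = 200 give 150 minority and 200 majority rows. Each row of the grid would then be passed to SMOTE() and scored with the metric of your choice; that step is left out here because it depends on your data and model.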