Solved – Calibration after up- and downsampling

calibration, classification, isotonic, unbalanced-classes

I am experimenting with different techniques to deal with imbalanced classes in a classification problem. I am comparing upsampling the minority class with downsampling the majority class, and I am also running some experiments with ROSE and SMOTE.
After training and applying Platt's calibration, I noticed that the range of the probabilities differs quite a bit: with upsampling the maximum probability is about 13%, while with downsampling it is about 26%. I can understand that if you use more majority-class data with low likelihood, the distribution of probabilities tends to have lower scores. But shouldn't calibrating afterwards take care of this and give distributions that are more or less the same?
I am using logistic regression right now, but I plan to also use a GBM in the near future.

Some general statistics about my dataset: about 250,000 observations, with the minority class occurring about 0.6% of the time.
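
To be concrete, here is an illustrative sketch of the kind of pipeline I mean (not my actual code; the synthetic data, scikit-learn calls, and by-hand Platt fit are stand-ins for my setup):

```python
# Illustrative sketch only: downsample the majority class, fit a logistic
# regression, then Platt-calibrate on an untouched hold-out fold.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 250,000 rows, ~0.6% positives.
X, y = make_classification(n_samples=250_000, n_features=20,
                           weights=[0.994, 0.006], random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Downsample the majority class in the training fold only.
rng = np.random.default_rng(0)
pos = np.flatnonzero(y_train == 1)
neg = rng.choice(np.flatnonzero(y_train == 0), size=len(pos), replace=False)
idx = np.concatenate([pos, neg])
clf = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])

# Platt scaling by hand: a one-feature logistic regression mapping the raw
# scores to probabilities, fitted on the hold-out fold with the true prevalence.
scores_cal = clf.decision_function(X_cal).reshape(-1, 1)
platt = LogisticRegression().fit(scores_cal, y_cal)
calibrated = platt.predict_proba(scores_cal)[:, 1]
print("max calibrated probability:", calibrated.max())
```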

Best Answer

Maybe Platt's calibration is not flexible enough to give you a good conversion. You could try an isotonic regression model or simply an additive logit model.
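
As a minimal sketch of the isotonic alternative (assuming scikit-learn; the score and label arrays below are synthetic placeholders for a held-out calibration fold):

```python
# Isotonic-regression calibration sketch; scores_cal and y_cal stand in for
# held-out model scores and the corresponding true labels.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
y_cal = rng.binomial(1, 0.006, size=50_000)            # ~0.6% positives
scores_cal = rng.normal(loc=2.0 * y_cal, scale=1.0)    # fake classifier scores

# Isotonic regression learns a monotone, piecewise-constant map from score to
# probability, so it can bend where a single Platt sigmoid cannot.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(scores_cal, y_cal)
calibrated = iso.predict(scores_cal)
print("calibrated range:", calibrated.min(), calibrated.max())
```

With only a few hundred positives in the calibration fold, isotonic regression can overfit, so compare it against Platt scaling on a separate evaluation set before trusting the extra flexibility.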
