Solved – Calibration after up- and downsampling

calibration, classification, isotonic, unbalanced-classes

I am experimenting with different techniques to deal with imbalanced classes in a classification problem. I am comparing upsampling the minority class with downsampling the majority class, and I am also running some experiments with ROSE and SMOTE.
After training and applying Platt's calibration, I noticed that the range of the probabilities differs quite a bit: with upsampling the maximum probability is about 13%, while with downsampling it is about 26%. I can understand that if you use more majority-class data with low likelihood, the distribution of probabilities tends to have lower scores. But shouldn't calibrating afterwards take care of this and give distributions that are more or less the same?
I am using logistic regression right now, but I plan to also use a GBM in the near future.

Some general statistics about my dataset: about 250,000 observations, with the minority class occurring about 0.6% of the time.
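
To be concrete, here is an illustrative sketch of the kind of pipeline I mean (not my actual code; the synthetic data, scikit-learn calls, and by-hand Platt fit are stand-ins for my setup):

```python
# Illustrative sketch only: downsample the majority class, fit a logistic
# regression, then Platt-calibrate on an untouched hold-out fold.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 250,000 rows, ~0.6% positives.
X, y = make_classification(n_samples=250_000, n_features=20,
                           weights=[0.994, 0.006], random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Downsample the majority class in the training fold only.
rng = np.random.default_rng(0)
pos = np.flatnonzero(y_train == 1)
neg = rng.choice(np.flatnonzero(y_train == 0), size=len(pos), replace=False)
idx = np.concatenate([pos, neg])
clf = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])

# Platt scaling by hand: a one-feature logistic regression mapping the raw
# scores to probabilities, fitted on the hold-out fold with the true prevalence.
scores_cal = clf.decision_function(X_cal).reshape(-1, 1)
platt = LogisticRegression().fit(scores_cal, y_cal)
calibrated = platt.predict_proba(scores_cal)[:, 1]
print("max calibrated probability:", calibrated.max())
```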

Best Answer

Maybe Platt's calibration is not flexible enough to give you a good conversion. You could try an isotonic regression model or simply an additive logit model.
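
As a minimal sketch of the isotonic alternative (assuming scikit-learn; the score and label arrays below are synthetic placeholders for a held-out calibration fold):

```python
# Isotonic-regression calibration sketch; scores_cal and y_cal stand in for
# held-out model scores and the corresponding true labels.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
y_cal = rng.binomial(1, 0.006, size=50_000)            # ~0.6% positives
scores_cal = rng.normal(loc=2.0 * y_cal, scale=1.0)    # fake classifier scores

# Isotonic regression learns a monotone, piecewise-constant map from score to
# probability, so it can bend where a single Platt sigmoid cannot.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(scores_cal, y_cal)
calibrated = iso.predict(scores_cal)
print("calibrated range:", calibrated.min(), calibrated.max())
```

With only a few hundred positives in the calibration fold, isotonic regression can overfit, so compare it against Platt scaling on a separate evaluation set before trusting the extra flexibility.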
