Solved – Performance Metrics for Imbalanced Classification

auc, classification, log-loss, logistic, unbalanced-classes

I'm trying to fit multiple Stochastic Gradient Descent models to a dataset where the binary target (0 or 1) is very imbalanced, i.e., the success rate is about 0.0001.

Out of all the models I've trained, I would like to select the best one based on the validation log-loss and validation AUC. Unfortunately, the average test log-loss (about 0.001) and test AUC (about 0.99) are almost identical across models, so they don't let me differentiate between them.

Are these metrics (AUC and LogLoss) good performance metrics for a highly imbalanced classification task?
What metrics would allow me to differentiate the models and choose the best one?

Thanks

Best Answer

I think the best way to assess a classifier on highly imbalanced classes is to look at the precision-recall curve. You can also use the area under this curve as a single-number metric.
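To make this concrete, here is a minimal sketch of how the area under the precision-recall curve (average precision) could be computed and used to compare two models. Everything in it is an assumption for illustration: the data are synthetic, the positive rate is 0.001 rather than the 0.0001 in the question (to keep the sample small), and `scores_a` and `scores_b` are hypothetical model outputs, not the asker's models. It uses only the standard library so it runs anywhere.

```python
import random


def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve.

    AP = sum_k (R_k - R_{k-1}) * P_k, where P_k and R_k are the precision
    and recall at the k-th distinct score threshold (highest score first).
    Examples with tied scores are treated as a single threshold.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")

    # Indices sorted by descending score.
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)

    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(order):
        # Consume every example that shares the current score.
        threshold = y_score[order[i]]
        while i < len(order) and y_score[order[i]] == threshold:
            if y_true[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Synthetic, heavily imbalanced labels (assumed positive rate: 0.001).
rng = random.Random(0)
n = 100_000
y_true = [1 if rng.random() < 0.001 else 0 for _ in range(n)]

# Two hypothetical models: B separates the positives a bit better than A.
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
scores_a = [e + 3.0 * y for e, y in zip(noise, y_true)]
scores_b = [e + 4.0 * y for e, y in zip(noise, y_true)]

print("AP model A:", average_precision(y_true, scores_a))
print("AP model B:", average_precision(y_true, scores_b))
```

A classifier that scores at random has an average precision close to the positive rate, so with a success rate of 0.0001 the baseline is near 0.0001 rather than 0.5 as for ROC AUC. If scikit-learn is available, `sklearn.metrics.precision_recall_curve` and `sklearn.metrics.average_precision_score` give you the curve and this summary value directly.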
