Solved – What to do for AUC less than 0.5

auc, classification, roc

I've trained a Random Forest model on a dataset of 60 protein predictors for healthy controls (label 0) and cancer patients (label 1).

I then tested this model on a dataset of at-risk patients divided into those who later got cancer (label 1), and those who didn't (label 0).

My model's performance gave an AUC-ROC of 0.4.

Other threads and papers (linked below) say that for AUC < 0.5, a classifier has useful information but is applying it incorrectly. People seem to suggest reversing the labels, which would give an AUC-ROC of 0.6:
Can AUC-ROC be between 0-0.5
http://people.inf.elte.hu/kiss/13dwhdm/roc.pdf

However, would this be appropriate in this case? Reversing the test dataset labels would mean giving the at-risk individuals who stayed healthy a label of 1 (the same as the cancer patients in the training data), which doesn't seem correct to me.

Best Answer

"Reversing" the AUC by reporting 1 - AUC would be appropriate if you had no a priori information about whether to expect higher or lower values in the positive group. For instance, if you were measuring a molecular biomarker, it could be present at a decreased concentration in the cancer patients. Unless and until you know more about it, you can absolutely reverse it.
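As a minimal sketch of what that reversal amounts to, the base R snippet below computes the AUC from the Wilcoxon rank-sum statistic and shows that negating the score gives exactly 1 - AUC. The helper name `auc_rank` and the toy marker values are made up for illustration; they are not from the question.

```r
# AUC as the probability that a random positive scores higher than a random
# negative (ties count one half), computed from the Wilcoxon rank-sum statistic.
# `auc_rank` is a hypothetical helper, not a function from any package.
auc_rank <- function(labels, scores) {
  pos <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)  # average ranks handle ties
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up toy data: the marker tends to be LOWER in the positive group.
labels <- c(0, 0, 0, 0, 1, 1, 1, 1)
marker <- c(5.1, 4.8, 6.0, 3.9, 3.2, 4.1, 2.7, 5.0)

auc_raw      <- auc_rank(labels, marker)   # 0.1875
auc_reversed <- auc_rank(labels, -marker)  # 0.8125, i.e. 1 - auc_raw
```

Reversing is only a relabeling of the score direction, which is why it is harmless for a raw biomarker with no expected direction.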

However, that is not your case. You trained a model to detect cancer patients. I assume that the output you obtain is the probability that the patient belongs to the cancer group. This probability has to be higher for the cancer group; otherwise you have a problem.

What you are looking at is a confounding factor that you haven't identified yet. Your model learned to identify risk factors that weren't present in the healthy group, but are now present in your "at risk" group, just as they were in the "cancer" group, or even more strongly so.

Inference is hard and your model just isn't quite good enough at it. Finding differences from a "healthy" group is easy in my experience, although usually quite useless in practice. In the future, try to collect a training sample that is as close to your target clinical question as possible. Until then, please do not state that your AUC = 0.6.

Related Solutions

Machine Learning – Why is AUC Higher for a Less Accurate Classifier?

Accuracy measures such as proportion classified correctly, sensitivity, and specificity are not only arbitrary (they depend on the choice of threshold) but are also improper scoring rules, i.e., maximizing them leads to a bogus model, inaccurate predictions, and selection of the wrong features. It is good that they disagree with proper scoring rules (log-likelihood; logarithmic scoring rule; Brier score) and with the $c$-index (a semi-proper scoring rule: area under the ROC curve; concordance probability; Wilcoxon statistic; Somers' $D_{xy}$ rank correlation coefficient); this gives us more confidence in proper scoring rules.
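To make the disagreement concrete, here is a small base R sketch with made-up outcomes and predicted probabilities for two hypothetical models. Model B wins on thresholded accuracy, while model A wins on both the Brier score and the logarithmic score. The data and the function names are assumptions chosen purely for illustration.

```r
# Made-up outcomes and predicted probabilities from two hypothetical models.
y   <- c(1, 1, 1, 0, 0, 0)
p_a <- c(0.90, 0.80, 0.45, 0.10, 0.20, 0.30)  # confident, one positive just under 0.5
p_b <- c(0.55, 0.55, 0.55, 0.45, 0.45, 0.45)  # barely on the right side of 0.5

# Improper: depends on an arbitrary threshold and ignores how far off p is.
accuracy <- function(y, p, threshold = 0.5) mean((p >= threshold) == (y == 1))

# Proper scoring rules (lower is better for both).
brier    <- function(y, p) mean((p - y)^2)
log_loss <- function(y, p) -mean(y * log(p) + (1 - y) * log(1 - p))

results <- rbind(
  model_a = c(accuracy = accuracy(y, p_a), brier = brier(y, p_a), log_loss = log_loss(y, p_a)),
  model_b = c(accuracy = accuracy(y, p_b), brier = brier(y, p_b), log_loss = log_loss(y, p_b))
)
print(round(results, 3))
```

Model B gets every case on the right side of the threshold yet says almost nothing about risk, which is the kind of behaviour an accuracy-maximizing procedure can reward.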

Solved – R AUC never less than 0.5

There is no auc() function in the randomForest package. But based on the argument names you used (obs and pred), I think you might have used the auc() function in the SDMTools package. And yes, this function does flip the results if the calculated AUC is less than 0.5:

> SDMTools::auc
function (obs, pred) 
{
    … code to calculate the AUC …
    if (AUC < 0.5) 
        AUC = 1 - AUC
    return(AUC)
}

This might be seen as a nice convenience feature (it’s easy to forget exactly how AUC functions want their arguments coded, and if you get an AUC < 0.5 in real life, you have usually just used the inverse/incorrect coding of the response vector), but I think it’s a bad idea. If you try models that are (almost) as bad as random, e.g., in simulations, you will get estimates that are biased high (compared to the ‘correct’ AUC estimator).
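The upward bias is easy to see in a quick simulation. The base R sketch below scores two groups with pure noise, so the true AUC is 0.5, and compares the plain estimate with the "never below 0.5" version. The helper `auc_rank`, the group size, the number of replicates, and the seed are all arbitrary choices made for illustration.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Rank-based AUC (Wilcoxon statistic); hypothetical helper, not from a package.
auc_rank <- function(labels, scores) {
  pos <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  (sum(rank(scores)[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

n_per_group <- 20
labels <- rep(c(0, 1), each = n_per_group)

# Scores are pure noise, so the true AUC is 0.5.
raw     <- replicate(2000, auc_rank(labels, rnorm(2 * n_per_group)))
flipped <- pmax(raw, 1 - raw)  # what an estimator that flips AUC < 0.5 reports

mean(raw)      # close to 0.5
mean(flipped)  # noticeably above 0.5
```

The smaller the groups, the wider the sampling distribution of the AUC and the larger the bias introduced by flipping.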

If you want the correct AUC estimates, you can use any of the following:

ROC() from the Epi package. It draws a nice ROC plot, with the AUC embedded, and also returns an object with the AUC stored as the AUC element.
rcorr.cens() from the Hmisc package (loaded along with rms). It returns a named vector with the AUC (C Index) as the first element.
roc() from the pROC package if you manually specify the direction argument. It returns an object with the AUC stored as the auc element.
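For the pROC option, a minimal sketch might look like the following. The toy labels and scores are made up, and the call assumes a reasonably recent pROC version that accepts the quiet argument. With direction = "<" the controls are expected to score lower than the cases, so the AUC is reported as is; with the default direction = "auto" pROC picks the direction from the group medians, so it can report the flipped value.

```r
# Made-up toy data in which the scores rank the classes the wrong way round.
labels <- c(0, 0, 0, 0, 1, 1, 1, 1)
scores <- c(0.7, 0.6, 0.9, 0.4, 0.3, 0.5, 0.2, 0.65)

if (requireNamespace("pROC", quietly = TRUE)) {
  # levels = c(control, case); direction = "<" fixes the expected ordering,
  # so a badly ranked score yields an AUC below 0.5 instead of being flipped.
  fixed <- pROC::roc(labels, scores, levels = c(0, 1), direction = "<", quiet = TRUE)

  # Default behaviour: the direction is chosen from the data.
  auto <- pROC::roc(labels, scores, levels = c(0, 1), direction = "auto", quiet = TRUE)

  auc_fixed <- as.numeric(fixed$auc)  # AUC stored as the auc element
  auc_auto  <- as.numeric(auto$auc)
  print(c(fixed = auc_fixed, auto = auc_auto))
} else {
  message("pROC is not installed; skipping the example.")
}
```

Fixing the direction (and the levels) up front is what keeps the estimate from being silently flipped.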
