Searched high and low and have not been able to find out what AUC, as related to prediction, stands for or means.
AUC in Classification – What Does AUC Stand For and What Is It?
Tags: abbreviation, auc, classification, prediction, roc
Related Solutions
A binary classifier might produce at prediction time either the class labels directly for each classified instance, or probability values for each class. In the latter case, for each instance it produces a probability $p$ for one class and a probability $q = 1 - p$ for the other class.
If the classifier produces probabilities, one has to use a threshold value in order to obtain classification labels. Usually this threshold is $0.5$, but that is often not what is actually used, and sometimes it is not the best possible value.
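As a minimal sketch of this thresholding step (plain NumPy; the probabilities are invented for illustration):

```python
import numpy as np

# Hypothetical predicted probabilities for the positive class.
p = np.array([0.10, 0.40, 0.55, 0.80, 0.95])

# The usual default threshold of 0.5 ...
labels_default = (p >= 0.5).astype(int)  # -> [0, 0, 1, 1, 1]

# ... is only one choice; a different threshold yields different labels.
labels_strict = (p >= 0.9).astype(int)   # -> [0, 0, 0, 0, 1]
```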
Now, MCC is computed directly from classification labels, which means a single threshold value was used to transform probabilities into classification labels, whatever that value was. AUC, on the other hand, uses the whole range of threshold values.
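To make the contrast concrete, here is a small sketch (assuming scikit-learn is available; the labels and scores are invented): MCC requires hard labels and therefore depends on the chosen threshold, while the AUC is computed from the raw scores over all thresholds at once:

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef, roc_auc_score

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.2, 0.4, 0.45, 0.5, 0.6, 0.7, 0.55, 0.9])

# MCC is threshold-dependent: a different cut-off gives a different value.
for t in (0.5, 0.6):
    mcc = matthews_corrcoef(y_true, (scores >= t).astype(int))
    print(f"MCC at threshold {t}: {mcc:.3f}")

# AUC is computed from the raw scores, over all thresholds at once.
print("AUC:", roc_auc_score(y_true, scores))
```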
The idea is that these two values, AUC and MCC, measure different things. While MCC measures a kind of statistical accuracy (related to the chi-squared test, which gives some hints about the significance of the differences), the AUC is more related to the robustness of the classifier. AUC and ROC curves give more hints on how well the classifier separates the binary classes, over all possible threshold values. Even in the degenerate case where AUC is computed directly on labels (not advisable, since it loses a lot of information), the purpose of AUC remains the same.
Model selection is a hard problem. My advice would be to try to answer the question yourself: what makes one classifier better than another? Find an answer that includes considerations like the cost matrix, robustness to unbalanced data, whether you want an optimistic or a conservative classifier, and so on. In any case, work out enough detail to arrive at one or a few criteria for selecting a metric, rather than first measuring several accuracy-related quantities and asking afterwards what to do with them.
[Later edit: on the use of the word "robust"]
I used the term "robust" because I could not find a proper single word for "how well a classifier separates the two classes". I know that the term "robust" has some special meaning in statistics.
Generally, an AUC close to $1.0$ means the classifier separates the binary classes well for many values of the threshold. In this sense, an AUC close to $1.0$ means the classifier is less sensitive to which threshold value is used, i.e., it is robust to that choice. However, a value that is not close to $1.0$ does not imply the contrary: there may still be a wide range of good threshold values. In most cases a graphical inspection of the ROC curve is necessary, which is one of the main reasons why AUC is often considered misleading.

AUC is a measure over all possible classifiers (one for each possible threshold value), not a measure of a specific classifier, yet in practice one cannot use more than one threshold value. While AUC can give hints about how well the classes are separated (my use of the term "robustness"), it should not be used alone as a single authoritative measure of accuracy.
The F1 score is applicable to any particular point of the ROC curve. Such a point may represent, for example, a particular threshold value in a binary classifier, and thus corresponds to particular values of precision and recall.

Remember, the F score is a compact way to represent both precision and recall: it is their harmonic mean, $F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$. For the F score to be high, both precision and recall must be high.

Thus, the ROC curve as a whole spans many different threshold levels, and each point on the curve has its own F score.
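As a small sketch of this (again assuming scikit-learn; the labels and scores are invented), each threshold, i.e., each point on the ROC curve, yields its own F1 value:

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.2, 0.4, 0.45, 0.5, 0.6, 0.7, 0.55, 0.9])

# Each threshold corresponds to one point on the ROC curve,
# and that point has its own F1 score.
for t in (0.3, 0.5, 0.7):
    f1 = f1_score(y_true, (scores >= t).astype(int))
    print(f"threshold {t}: F1 = {f1:.3f}")
```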
Best Answer
Abbreviations
AUC is used most of the time to mean AUROC, which is bad practice since, as Marc Claesen pointed out, AUC is ambiguous (it could be the area under any curve) while AUROC is not.
Interpreting the AUROC
The AUROC has several equivalent interpretations. The best known is probabilistic: the AUROC equals the probability that the classifier assigns a higher score to a uniformly drawn random positive instance than to a uniformly drawn random negative instance.
Going further: How to derive the probabilistic interpretation of the AUROC?
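This interpretation is easy to check numerically. Below is a sketch (assuming NumPy and scikit-learn; the labels and scores are made up) that estimates the AUROC by comparing every positive/negative pair of scores, counting ties as half, and cross-checks against roc_auc_score:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.2, 0.4, 0.45, 0.5, 0.6, 0.7, 0.55, 0.9])

pos = scores[y_true == 1]
neg = scores[y_true == 0]

# Fraction of (positive, negative) pairs where the positive is ranked
# higher, counting ties as half, equals the AUROC.
wins = (pos[:, None] > neg[None, :]).mean()
ties = (pos[:, None] == neg[None, :]).mean()
print("pairwise estimate:", wins + 0.5 * ties)
print("roc_auc_score:   ", roc_auc_score(y_true, scores))
```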
Computing the AUROC
Assume we have a probabilistic, binary classifier such as logistic regression.
Before presenting the ROC curve (= Receiver Operating Characteristic curve), the concept of a confusion matrix must be understood. When we make a binary prediction, there can be 4 types of outcomes:

- True positive (TP): we predict the positive class and the true class is positive.
- False positive (FP): we predict the positive class but the true class is negative.
- True negative (TN): we predict the negative class and the true class is negative.
- False negative (FN): we predict the negative class but the true class is positive.
To get the confusion matrix, we go over all the predictions made by the model and count how many times each of those 4 types of outcomes occurs:

In this example of a confusion matrix, among the 50 data points that are classified, 45 are correctly classified and 5 are misclassified.
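As a sketch of the counting step itself (assuming scikit-learn; the eight labels below are made up), manual counts and sklearn.metrics.confusion_matrix agree:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # predicted positive, truly positive
fp = np.sum((y_pred == 1) & (y_true == 0))  # predicted positive, truly negative
tn = np.sum((y_pred == 0) & (y_true == 0))  # predicted negative, truly negative
fn = np.sum((y_pred == 0) & (y_true == 1))  # predicted negative, truly positive
print(tp, fp, tn, fn)                       # -> 3 1 3 1

# scikit-learn returns the same counts, laid out as [[TN, FP], [FN, TP]].
print(confusion_matrix(y_true, y_pred))
```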
Since to compare two different models it is often more convenient to have a single metric rather than several, we compute two metrics from the confusion matrix, which we will later combine into one:

- True Positive Rate (TPR), also called sensitivity or recall: $\text{TPR} = \frac{TP}{TP + FN}$.
- False Positive Rate (FPR), also called fall-out: $\text{FPR} = \frac{FP}{FP + TN}$.
To combine the FPR and the TPR into one single metric, we first compute the two metrics with many different thresholds (for example $0.00, 0.01, 0.02, \dots, 1.00$) for the logistic regression, then plot them on a single graph, with the FPR values on the abscissa and the TPR values on the ordinate. The resulting curve is called the ROC curve, and the metric we consider is the AUC of this curve, which we call AUROC.
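A sketch of this threshold sweep (plain NumPy plus a scikit-learn cross-check; the scores stand in for a fitted model's predicted probabilities):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.2, 0.4, 0.45, 0.5, 0.6, 0.7, 0.55, 0.9])

n_pos = (y_true == 1).sum()
n_neg = (y_true == 0).sum()

# Sweep thresholds from 1.00 down to 0.00 so FPR increases along the curve.
thresholds = np.linspace(1.0, 0.0, 101)
tpr = np.array([((scores >= t) & (y_true == 1)).sum() / n_pos for t in thresholds])
fpr = np.array([((scores >= t) & (y_true == 0)).sum() / n_neg for t in thresholds])

# Area under the (FPR, TPR) polyline via the trapezoidal rule.
auroc = np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2)
print("AUROC (threshold sweep):", auroc)
print("AUROC (sklearn):        ", roc_auc_score(y_true, scores))
```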
The following figure shows the AUROC graphically:
In this figure, the blue area corresponds to the Area Under the Curve of the Receiver Operating Characteristic (AUROC). The dashed line on the diagonal is the ROC curve of a random predictor: it has an AUROC of $0.5$. The random predictor is commonly used as a baseline to check whether the model is useful.
If you want to get some first-hand experience:
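For example, here is a minimal end-to-end run (a sketch assuming scikit-learn and matplotlib; the synthetic dataset is purely illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, just for experimentation.
X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]  # probability of the positive class

fpr, tpr, _ = roc_curve(y_te, scores)
plt.plot(fpr, tpr, label=f"AUROC = {roc_auc_score(y_te, scores):.3f}")
plt.plot([0, 1], [0, 1], "--", label="random predictor (AUROC = 0.5)")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```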