Solved – Optimizing for AUC


AUC is a popular classification evaluation metric.

  1. This is a measure of aggregate performance—do any of the standard loss functions (functions of an individual example's label & prediction) optimize for this?

  2. Are there any other ML approaches (not necessarily loss-function-based optimization algorithms) that are designed to optimize AUC?

Best Answer

There are actually multiple ways to do this.

Remember that the AUC is a normalized form of the Mann-Whitney U statistic, i.e. of the sum of the ranks of the scores in one of the two classes. This means that maximizing AUC is a ranking problem: we want to order the scores $s_1,\ldots,s_N$ so that every score in the positive class exceeds every score in the negative class.
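To make the equivalence concrete, here is a small sketch (function names and the toy data are my own, not from the answer) that computes AUC two ways: as the fraction of correctly ordered (negative, positive) pairs, and via the rank-sum/Mann-Whitney U formulation. Both should agree.

```python
def auc_pairwise(scores, labels):
    """Fraction of (negative, positive) pairs ordered correctly (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    correct = sum(1.0 if sn < sp else 0.5 if sn == sp else 0.0
                  for sn in neg for sp in pos)
    return correct / (len(pos) * len(neg))

def auc_rank(scores, labels):
    """Same quantity via the rank sum of the positive class (Mann-Whitney U)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):  # assign average ranks to tied scores
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            j += 1
        for k in range(i, j):           # 1-based ranks i+1..j, averaged
            ranks[order[k]] = (i + j + 1) / 2
        i = j
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U statistic
    return u / (n_pos * n_neg)                  # normalize to [0, 1]

scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc_pairwise(scores, labels), auc_rank(scores, labels))  # both 0.75
```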

This can be framed, for example, as a large and generally intractable integer/linear programming problem and attacked heuristically via suitable relaxations, but the method that interests me more is to construct approximate gradients of the AUC so that we can optimize it with stochastic gradient descent.

There's plenty to read about this; here is a naive approach:

Using '$[\cdot]$' as the Iverson bracket, another way to express the sought ordering of the scores is that

$[s_i\leq s_j]=1$ for all $i,j$ where response $y_i=0$ and $y_j=1$

So if the scores are a function of inputs and parameters $s_i = f(x_i,\theta)$

We want to achieve $$M^*=\max_\theta \sum_{i,j}[s_i\leq s_j],$$ where the sum runs over pairs with $y_i=0$ and $y_j=1$; dividing this count by the number of such pairs gives the AUC.

Since the Iverson bracket is not differentiable, consider the relaxation $\tanh(\alpha(s_j-s_i)) \leq [s_i\leq s_j]$, which holds for any $\alpha>0$.

So for any $\theta$, $$M^*\geq \sum_{i,j}\tanh(\alpha(s_j-s_i)),$$ and the right-hand side is smooth in $\theta$ whenever the scores are.
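A quick numerical sanity check of the relaxation (the sample points and $\alpha$ values below are illustrative choices of mine): the tanh term never exceeds the indicator, and for correctly ordered pairs the gap shrinks as $\alpha$ grows.

```python
import math

def indicator(si, sj):
    # Iverson bracket [s_i <= s_j]
    return 1.0 if si <= sj else 0.0

pairs = [(0.2, 0.9), (0.9, 0.2), (0.5, 0.5), (-1.0, 2.0)]
for alpha in (1.0, 5.0, 50.0):
    for si, sj in pairs:
        relaxed = math.tanh(alpha * (sj - si))
        # Lower bound holds for every pair and every alpha > 0
        assert relaxed <= indicator(si, sj) + 1e-12

# For a correctly ordered pair the bound tightens with alpha:
gaps = [1.0 - math.tanh(a * (0.9 - 0.2)) for a in (1.0, 5.0, 50.0)]
print(gaps)  # strictly decreasing toward 0
```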

And we could then sample $i$ from the negative class and $j$ from the positive class to get stochastic contributions $\nabla_\theta \tanh(\alpha(s_j-s_i))$ to the gradient of this lower bound.
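The sampling scheme above can be sketched end to end for a linear scorer $s=\theta^\top x$ (the toy data, learning rate, and value of $\alpha$ are assumptions of mine, not anything specified in the answer; the gradient uses $\frac{d}{dz}\tanh z = 1-\tanh^2 z$ and the chain rule):

```python
import math, random

random.seed(0)

# Toy 2-D data: positives shifted along the first coordinate.
neg = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(50)]
pos = [(random.gauss(2.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(50)]

def score(theta, x):
    return theta[0] * x[0] + theta[1] * x[1]

def auc(theta):
    """Fraction of (negative, positive) pairs the scorer orders correctly."""
    good = sum(1 for xn in neg for xp in pos
               if score(theta, xn) < score(theta, xp))
    return good / (len(neg) * len(pos))

theta = [0.0, 0.0]
alpha, lr = 2.0, 0.1  # illustrative hyperparameters
for step in range(2000):
    xi, xj = random.choice(neg), random.choice(pos)  # one pair per step
    z = alpha * (score(theta, xj) - score(theta, xi))
    g = alpha * (1.0 - math.tanh(z) ** 2)            # d tanh(z) / dz, chain rule
    # Gradient ascent on tanh(alpha*(s_j - s_i)); its theta-gradient is g*(xj - xi)
    theta = [t + lr * g * (xj[k] - xi[k]) for k, t in enumerate(theta)]

print("AUC after training:", auc(theta))
```

Note the built-in annealing: once a sampled pair is confidently ordered, $1-\tanh^2$ is near zero and the pair stops contributing, so easy pairs fade out of the updates.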