[Math] How to compute the AUC (area under the curve) if all I have are the TPR and FPR values

neural networks, signal processing, statistics

I am trying to rank my neural network, which is trained for binary classification. That is, given a set of input signals, it outputs either a 1 or a 0.

I have a training set for which I know the actual desired outcomes (1 or 0).

After I train my network, I compare its output for each input against the desired outcome. From this, I can easily count how many true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) I have.
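For illustration, here is a minimal sketch of that counting step, assuming the desired outcomes and the network outputs are stored as two equal-length lists of 0/1 values. The function name `confusion_counts` and the sample data are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, TN, FN) for binary labels in {0, 1}.

    Hypothetical helper: assumes both lists contain only 0s and 1s.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # correctly output 1
        elif predicted == 1 and actual == 0:
            fp += 1  # output 1, but the desired outcome was 0
        elif predicted == 0 and actual == 0:
            tn += 1  # correctly output 0
        else:
            fn += 1  # output 0, but the desired outcome was 1
    return tp, fp, tn, fn


# Hypothetical desired outcomes vs. network outputs
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))
```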

From the TP, FP, TN, FN, I can compute the TPR and the FPR (true and false positive rates).
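As a sketch of the standard definitions, TPR is the fraction of actual positives that were output as 1, and FPR is the fraction of actual negatives that were output as 1. The function name `rates` is hypothetical, and it assumes the data contain at least one positive and one negative (otherwise a denominator is zero).

```python
def rates(tp, fp, tn, fn):
    """Return (TPR, FPR) from confusion-matrix counts.

    Assumes tp + fn > 0 and fp + tn > 0.
    """
    tpr = tp / (tp + fn)  # true positive rate: TP / (all actual positives)
    fpr = fp / (fp + tn)  # false positive rate: FP / (all actual negatives)
    return tpr, fpr


tpr, fpr = rates(tp=2, fp=1, tn=2, fn=1)
print(tpr, fpr)  # roughly 2/3 and 1/3
```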

But I do not know how to compute the AUC score from this data.

I would appreciate any help.

Thanks
Lyle

Best Answer

Often, a system is tested with a variety of parameter settings (for example, different decision thresholds), so that you get a list of [FPR, TPR] pairs. Plotting them traces out the ROC curve, and the area under that curve (AUC) is easy to see and to compute because the curve is actually there.
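When you do have several [FPR, TPR] pairs, one common way to get the area is the trapezoidal rule over the points sorted by FPR. The sketch below is one such approach, not the only one; it assumes the endpoints [0, 0] and [1, 1] belong on the curve and adds them. The function name `auc_from_points` and the sample points are hypothetical.

```python
def auc_from_points(points):
    """Trapezoidal area under an ROC curve given (fpr, tpr) pairs.

    Assumes (0, 0) and (1, 1) lie on the curve and adds them if missing.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Area of the trapezoid between two consecutive points
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical (FPR, TPR) pairs from three parameter settings
print(auc_from_points([(0.1, 0.6), (0.3, 0.8), (0.6, 0.95)]))
```

Sorting by FPR matters: the trapezoids only tile the area correctly when consecutive points are adjacent along the x-axis.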

When you only have one point, you have to interpolate. So:

  • Let's say your classifier achieves TPR $t$ and FPR $f$, so its coordinate is $[f,t]$.
  • As well as this, you also know that you could act stupid and always output 1, irrespective of the input data. This gives you a perfect TPR :) but also an FPR of 1 :( so there's another coordinate you can plot: $[1,1]$.
  • By similar reasoning, you can always output zeroes, and achieve $[0,0]$.

You can now interpolate between these three possibilities: draw a piecewise linear curve from $[0,0]$ to $[f,t]$ to $[1,1]$. Then calculate the area under this curve.
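A minimal sketch of that area calculation: the region under the two line segments splits at $x = f$ into two trapezoids (the first one is really a triangle, since it starts at height 0). The function name `single_point_auc` and the example values are hypothetical.

```python
def single_point_auc(f, t):
    """Area under the piecewise linear curve [0,0] -> [f,t] -> [1,1].

    f: false positive rate, t: true positive rate, both in [0, 1].
    """
    left = f * (0.0 + t) / 2.0           # trapezoid from x = 0 to x = f
    right = (1.0 - f) * (t + 1.0) / 2.0  # trapezoid from x = f to x = 1
    return left + right


# Hypothetical classifier with FPR 0.2 and TPR 0.9
print(single_point_auc(0.2, 0.9))
```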

If you do this graphically, it should be straightforward to see that you get:

$$ \text{AUC} = \frac{t - f + 1}{2} $$
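Spelled out, the "graphical" step splits the area at $x = f$: a triangle of width $f$ and height $t$, plus a trapezoid of width $1-f$ with parallel sides $t$ and $1$:

$$
\begin{aligned}
\text{AUC} &= \frac{f\,t}{2} + \frac{(1-f)(t+1)}{2} \\
&= \frac{f t + t + 1 - f t - f}{2} \\
&= \frac{t - f + 1}{2}.
\end{aligned}
$$

As a sanity check, a classifier no better than chance ($t = f$) gets $\text{AUC} = \tfrac{1}{2}$, and a perfect one ($t = 1$, $f = 0$) gets $\text{AUC} = 1$.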

Note that this relies on piecewise linear interpolation, which is plausible (see the Wikipedia article on receiver operating characteristic curves) but not the only way to do it.
