Model Evaluation – Reusing TPR and FPR Values for Different Test Sets

classification, model-evaluation, self-study

Say I train a binary classifier on $n$ balanced examples, then I evaluate the model on a test set which has the same (balanced) label distribution as the train set ($P \approx N$) and I compute $TPR$ (True Positive Rate), $FPR$ (False Positive Rate) and precision ($p$) at a particular threshold $t$.

The precision will be:
$$
p=\frac{TP}{TP+FP} = \frac{TPR\cdot P}{TPR\cdot P + FPR \cdot N} = \frac{TPR}{TPR + FPR\cdot\frac{N}{P}} \approx \frac{TPR}{TPR + FPR}
$$

Now, I evaluate the same model on a different, imbalanced test set (e.g. $N' = f \cdot P'$).

Is it correct to say that the precision on this new test set will be:
$$
p'=\frac{TP'}{TP'+FP'} = \frac{TPR\cdot P'}{TPR \cdot P' + FPR \cdot N'} = \frac{TPR}{TPR + FPR\cdot\frac{N'}{P'}}
=\frac{TPR}{TPR + FPR\cdot f}
$$

that is, to reuse the $TPR$ and $FPR$ values from the previous evaluation?
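To make the proposed adjustment concrete, here is a minimal sketch of the formula above. The function name `precision_from_rates` and the example rates ($TPR = 0.8$, $FPR = 0.1$) are hypothetical, chosen only for illustration; the calculation assumes that $TPR$ and $FPR$ carry over unchanged to the new test set.

```python
def precision_from_rates(tpr, fpr, neg_pos_ratio):
    """Precision implied by a fixed TPR and FPR at a given N/P ratio.

    Assumes TPR and FPR are the same on the new test set as on the one
    where they were measured; only the class balance differs.
    """
    denom = tpr + fpr * neg_pos_ratio
    if denom == 0:
        raise ValueError("no positive predictions; precision is undefined")
    return tpr / denom


# Hypothetical rates measured on the balanced test set.
tpr, fpr = 0.8, 0.1

balanced = precision_from_rates(tpr, fpr, 1.0)     # N = P
imbalanced = precision_from_rates(tpr, fpr, 10.0)  # N' = 10 * P'

print(f"balanced precision:   {balanced:.3f}")
print(f"imbalanced precision: {imbalanced:.3f}")
```

With these illustrative numbers the precision falls from $0.8/0.9 \approx 0.89$ to $0.8/1.8 \approx 0.44$ when negatives become ten times as common, even though the classifier itself has not changed.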

Intuitively, a trained model learns to correctly classify a fraction of the positive examples (i.e. $TPR$) and incorrectly classifies a fraction of the negative examples (i.e. $FPR$). Given that, one can approximate other metrics on different test sets (precision, in this case) using only these two values.

Is my understanding correct? What are some possible caveats to this?

Best Answer

If you assume that, on average, a positive case is as likely to get a positive prediction on the second dataset as on the first, and likewise for a negative case (i.e. $TPR$ and $FPR$ are the same on both datasets), then what you say is correct. That is the big caveat, of course: unless the balanced dataset was created by stratified random sampling from an imbalanced dataset, things other than the class balance usually change too.

These other changes could be, e.g., in the distribution of the predictors that the classifier uses, or even in the relationship between the predictors and the class label (e.g. due in part to changes in the distribution of unobserved or unused predictors). Such changes would then change the probability that positive and negative cases get a positive or negative prediction from the classifier.
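A small simulation can illustrate the caveat. This is only a sketch under made-up assumptions: classifier scores are drawn from Gaussians (positives centred at 1.0, negatives at 0.0), the threshold is 0.5, and the "shifted" case simply moves the negative scores to a mean of 0.4 to mimic a change in the predictor distribution. None of these numbers come from a real model.

```python
import random


def rates(pos_scores, neg_scores, t):
    """Return (TPR, FPR) at threshold t."""
    tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
    fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
    return tpr, fpr


def precision(pos_scores, neg_scores, t):
    """Return the precision measured directly at threshold t."""
    tp = sum(s >= t for s in pos_scores)
    fp = sum(s >= t for s in neg_scores)
    return tp / (tp + fp)


rng = random.Random(0)  # fixed seed for reproducibility
t = 0.5                 # decision threshold (assumed)
f = 10                  # N' = f * P' on the imbalanced test set

# Balanced test set: measure TPR and FPR once.
pos = [rng.gauss(1.0, 1.0) for _ in range(20000)]
neg = [rng.gauss(0.0, 1.0) for _ in range(20000)]
tpr, fpr = rates(pos, neg, t)

# Precision predicted for the imbalanced set by reusing TPR and FPR.
predicted = tpr / (tpr + fpr * f)

# Imbalanced test set, same class-conditional score distributions.
pos2 = [rng.gauss(1.0, 1.0) for _ in range(5000)]
neg2 = [rng.gauss(0.0, 1.0) for _ in range(5000 * f)]
same = precision(pos2, neg2, t)

# Imbalanced test set where the negatives have also shifted.
neg3 = [rng.gauss(0.4, 1.0) for _ in range(5000 * f)]
shifted = precision(pos2, neg3, t)

print(f"predicted from TPR/FPR: {predicted:.3f}")
print(f"measured, no shift:     {same:.3f}")
print(f"measured, with shift:   {shifted:.3f}")
```

When only the class balance changes, the predicted and measured precision agree up to sampling noise. Once the negatives' score distribution moves, the $FPR$ on the new set is no longer the one that was measured, and the reused-rates formula overestimates the precision.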

However, the adjustment you describe is a good first approximation for the effect of the changed class balance. How large any other effects would be, and in which direction they would go, is of course very hard to say in general.
