Optimize classification rule in multinomial logistic regression

Tags: classification, logistic, multi-class, unbalanced-classes

We know that in the case of logistic regression, a classification threshold of p = 0.5 is generally not an optimal choice when seeking to optimise sensitivity and specificity, typically because the dataset is unbalanced. A common remedy is to vary the threshold and take the one that satisfies a chosen criterion (e.g. the one that maximises sensitivity + specificity, or the one at which sensitivity = specificity, etc.).
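In the binary case this threshold search is easy to do from the ROC curve. Here is a minimal sketch (using scikit-learn, with illustrative names such as `y_true` and `p_pos`) that picks the threshold maximising sensitivity + specificity, i.e. Youden's J statistic:

```python
import numpy as np
from sklearn.metrics import roc_curve

def best_threshold(y_true, p_pos):
    """Pick the threshold maximising sensitivity + specificity.

    y_true : array of 0/1 labels.
    p_pos  : predicted probabilities P(class = 1) from the fitted model.
    """
    fpr, tpr, thresholds = roc_curve(y_true, p_pos)
    j = tpr - fpr  # Youden's J = sensitivity + specificity - 1
    return thresholds[np.argmax(j)]
```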

However, in the case of multinomial logistic regression (3 or more classes), I find much less literature on robust decision rules. All software I know classifies using the maximum a posteriori rule (assign the class with the highest posterior probability), but I am not satisfied with this solution, in the same way that p = 0.5 is rarely satisfactory in the binary case. I imagine the problem is harder with 3 or more classes, since one can put more emphasis on the sensitivity/specificity of a particular class, which was not an issue in the 2-class case.

So let's say I have 3 classes (or N classes) and an unbalanced dataset, and I don't favour any class (i.e. I give equal weight to sensitivity and specificity for each class). How should I make my decision from the posterior probabilities returned by the multinomial logistic regression?

Best Answer

The question is ill-posed. If you alter the classification threshold of a logistic regression model to optimise some function of specificity and sensitivity (other than accuracy), you are favouring one class over the other, i.e. assigning unequal misclassification costs to the two classes. The 0.5 threshold minimises the expected misclassification cost when the costs are equal (i.e. it optimises accuracy); it does not maximise specificity + sensitivity, nor does it minimise $|\text{specificity} - \text{sensitivity}|$. When you maximise specificity + sensitivity, you are doing so by giving extra weight to one class over the other.
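To make the equivalence between thresholds and costs explicit: in the binary case, let $L_{FP}$ and $L_{FN}$ be the costs of a false positive and a false negative. For a pattern with posterior $p = P(\text{positive}|x)$, the expected cost of predicting positive is $(1-p)L_{FP}$ and of predicting negative is $p\,L_{FN}$, so the minimum-risk rule predicts positive whenever

$$p > \frac{L_{FP}}{L_{FP} + L_{FN}}.$$

Hence choosing any threshold $t$ amounts to asserting the cost ratio $L_{FP}/L_{FN} = t/(1-t)$; any threshold other than 0.5 is precisely an unequal-cost assumption.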

To clarify, (specificity + sensitivity)/2 is the balanced accuracy, the complement of the balanced error rate, which is the error rate you would get if both classes were equally likely a priori. So if your dataset has more patterns of one class than the other, optimising the balanced error rate effectively downweights the majority class.

So for more than two classes, you can use a cost-sensitive approach to make the decision from the probabilities estimated by the model, but you cannot do so without "favour[ing] any class", for the same reason you cannot in the two-class logistic regression case.

Let the risk of classifying pattern $x$ as belonging to class $i$ be

$$R(C_i|x) = \sum_{j=1}^{C} L_{ij}\,P(C_j|x),$$

where $L_{ij}$ is the loss (penalty) incurred when classifying a pattern as belonging to class $i$ when it actually belongs to class $j$. Then simply assign the pattern to the class that minimises the risk. That is the multi-class equivalent of choosing a threshold in the binary case.
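A minimal sketch of this decision rule, assuming `P` is an $n \times C$ array of posterior probabilities from the fitted model and `L` is a user-supplied $C \times C$ loss matrix (both names are illustrative):

```python
import numpy as np

def min_risk_classify(P, L):
    """Minimum-risk classification from posterior probabilities.

    P : (n_samples, C) array, P[n, j] = estimated P(C_j | x_n).
    L : (C, C) array, L[i, j] = loss of predicting class i when the
        true class is j. The 0/1 loss L = 1 - np.eye(C) recovers the
        usual maximum a posteriori rule.
    """
    # R[n, i] = sum_j L[i, j] * P[n, j], the expected loss of predicting i
    R = P @ L.T
    return np.argmin(R, axis=1)

# Example: 3 classes, heavy penalty for mistaking class 2 for class 0
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.4, 0.5]])
L = np.array([[0.0, 1.0, 5.0],   # predicting 0 costs 5 if truth is 2
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
print(min_risk_classify(P, L))   # [1 2]; the MAP rule would give [0 2]
```

With equal costs this reduces to the maximum a posteriori rule, which is why any departure from MAP necessarily encodes unequal costs, echoing the point above.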