Solved – Softmax regression or $K$ binary logistic regression

Tags: classification, logistic, machine-learning, multi-class

For a multi-class classification problem, we can use either $K$ binary logistic classifiers or a single softmax regression classifier. How should one choose between the two?

IMHO, the $K$ binary logistic classifiers are just the one-vs-all scheme for multi-class problems, whereas the softmax classifier handles the multi-class problem inherently. Why should I prefer one over the other?

Best Answer

The softmax function gives a proper probability for each of the possible classes:
$$ P(y=j \mid x,\{w_k\}_{k=1}^{K}) = \frac{e^{x^\top w_j}}{\sum_{k=1}^K e^{x^\top w_k}} $$

This is nice if you want to interpret your classification problem in a probabilistic setting. Benefits of using the probabilistic formulation include being able to place priors on the parameters and obtaining a posterior distribution over classes.
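To make the formula concrete, here is a minimal sketch in plain Python. The function name `softmax_probs` and the toy values of `x` and `W` are made up for illustration; the only point is that the outputs form a distribution over the $K$ classes.

```python
import math

def softmax_probs(x, weights):
    """Return P(y=j | x) for every class j, given one weight vector w_j per class."""
    # The logit for class j is the inner product x^T w_j.
    logits = [sum(xi * wi for xi, wi in zip(x, w)) for w in weights]
    # Subtracting the largest logit does not change the ratios,
    # but it keeps math.exp from overflowing on large inputs.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy data: 2 features, K = 3 classes.
x = [1.0, 2.0]
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]

probs = softmax_probs(x, W)
predicted = probs.index(max(probs))  # class with the highest probability
```

Because the entries of `probs` are non-negative and sum to one, they can be read directly as class probabilities.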

That said, you can imagine a really good classifier that isn't of this form. Perhaps it is of a form that is generally difficult to express as a single multi-class model (e.g. an SVM, whose multi-class extensions need separate treatment). If some such complicated classifier works well for you on a given task, you may not want to use the (potentially weaker) softmax classifier. In such a setting there may be no clear $K$-way output, so you have to settle for repeated one-vs-others classification schemes.
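A rough sketch of the one-vs-others scheme, assuming each binary classifier exposes a real-valued score where larger means "more likely this class". The helpers `one_vs_rest_predict` and `make_logistic_scorer` are hypothetical names, and the logistic scorer only stands in for whatever binary classifier you actually use.

```python
import math

def one_vs_rest_predict(x, scorers):
    """Run K independent binary scorers and pick the class with the top score."""
    scores = [score(x) for score in scorers]
    best = max(range(len(scores)), key=lambda j: scores[j])
    return best, scores

def make_logistic_scorer(w):
    """Stand-in binary classifier: sigmoid of x^T w (class j versus the rest)."""
    def score(x):
        z = sum(xi * wi for xi, wi in zip(x, w))
        return 1.0 / (1.0 + math.exp(-z))
    return score

# Hypothetical toy data: 2 features, K = 3 classes.
x = [1.0, 2.0]
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]

scorers = [make_logistic_scorer(w) for w in W]
label, scores = one_vs_rest_predict(x, scorers)
```

Note that the $K$ scores are produced independently, so nothing forces them to sum to one. They rank the classes, but they are not a joint distribution over them.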

One more counterpoint: you could also increase the expressive power of the softmax-style approach by changing the argument of the exponential. For example, it would be straightforward to replace each linear component $x^\top w_j$ with a quadratic expression $x^\top w_j + x^\top A_j x$. Other such augmentations are conceivable.
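The quadratic variant can be sketched the same way. This is only an illustration under made-up values: `quadratic_logit`, `quadratic_softmax`, and the matrices in `A_zero` and `A_bumped` are hypothetical, and nothing here says how the $A_j$ would be fitted.

```python
import math

def quadratic_logit(x, w, A):
    """Compute x^T w + x^T A x for one class."""
    n = len(x)
    linear = sum(xi * wi for xi, wi in zip(x, w))
    quadratic = sum(x[r] * A[r][c] * x[c] for r in range(n) for c in range(n))
    return linear + quadratic

def quadratic_softmax(x, weights, matrices):
    """Softmax over per-class quadratic logits."""
    logits = [quadratic_logit(x, w, A) for w, A in zip(weights, matrices)]
    top = max(logits)  # shift for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy data: 2 features, K = 3 classes.
x = [1.0, 2.0]
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]
zero = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

# With every A_j = 0 the model reduces to the ordinary linear softmax.
A_zero = [zero, zero, zero]
probs_linear = quadratic_softmax(x, W, A_zero)

# A nonzero A_0 adds x^T x to the first logit, which changes the prediction here.
A_bumped = [identity, zero, zero]
probs_quadratic = quadratic_softmax(x, W, A_bumped)
```

The output is still a proper distribution over classes; only the decision boundaries change from linear to quadratic in $x$.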
