Logistic Regression – When is Logistic Regression Suitable?

classification, logistic, machine-learning, regression, regression-strategies

I'm currently teaching myself how to do classification, and specifically I'm looking at three methods: support vector machines, neural networks, and logistic regression. What I am trying to understand is why logistic regression would ever perform better than the other two.

From my understanding of logistic regression, the idea is to fit a logistic function to the entire dataset. So if my data are binary, all my data with label 0 should be mapped to the value 0 (or close to it), and all my data with label 1 should be mapped to the value 1 (or close to it). Now, because the logistic function is continuous and smooth, the regression requires every data point to fit the curve: no extra importance is given to points near the decision boundary, and every data point contributes to the loss, each by a different amount.

However, with support vector machines and neural networks, only the data points near the decision boundary seem to matter: once a point is correctly classified and sufficiently far from the boundary, it contributes essentially nothing further to the loss.
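
To make my mental model concrete, here is a small numpy sketch (made-up scores, not a fitted model) of the two per-point losses as I understand them:

```python
import numpy as np

# Labels in {-1, +1} and signed scores f(x); positive score => class +1.
# Every point below is correctly classified, some barely, some by a lot.
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])
f = np.array([0.2, 0.9, 2.5, 6.0, -0.1, -0.8, -3.0, -7.0])

# Logistic (log) loss: log(1 + exp(-y*f)). Strictly positive everywhere,
# so even easily classified points far from the boundary keep contributing.
log_loss = np.log1p(np.exp(-y * f))

# Hinge loss (SVM): max(0, 1 - y*f). Exactly zero once a point is on the
# correct side with margin at least 1, so those points stop contributing.
hinge_loss = np.maximum(0.0, 1.0 - y * f)

for yi, fi, ll, hl in zip(y, f, log_loss, hinge_loss):
    print(f"y={yi:+d}  f={fi:+5.1f}  log-loss={ll:.4f}  hinge={hl:.4f}")
```

The printout shows the log loss shrinking but never reaching zero for confident points, while the hinge loss is exactly zero beyond the margin.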

Therefore, why would logistic regression ever outperform support vector machines or neural networks, given that it "wastes resources" on trying to fit a curve to lots of unimportant (easily-classifiable) data, rather than focussing only on the difficult data around the decision boundary?

Best Answer

The resources you consider to be "wasted" are, in fact, information gains provided by logistic regression. You started out with the wrong premise: logistic regression is not a classifier. It is a probability/risk estimator. Unlike an SVM, it allows for and expects "close calls". It leads to optimum decision making because it does not try to force the predictive signal to incorporate a utility function, which is implicitly what happens whenever you classify observations. The goal of logistic regression using maximum likelihood estimation is to provide optimum estimates of $\Pr(Y=1 \mid X)$. The result is used in many ways, e.g. lift curves, credit risk scoring, etc. See Nate Silver's book The Signal and the Noise for compelling arguments in favor of probabilistic reasoning.
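
To see the distinction in practice, here is a minimal scikit-learn sketch (synthetic data, default settings; the particular numbers are immaterial):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# Logistic regression estimates Prob(Y=1|X) directly by maximum likelihood.
lr = LogisticRegression().fit(X, y)
print(lr.predict_proba(X[:3])[:, 1])   # risk estimates in (0, 1)

# A plain SVM returns only a signed distance to the decision boundary, not
# a probability (SVC's probability=True bolts one on afterwards via Platt
# scaling rather than estimating it as part of the model).
svm = SVC().fit(X, y)
print(svm.decision_function(X[:3]))    # uncalibrated margins
```

The probabilities are what feed lift curves, risk scores, and downstream decisions with explicit utilities; the raw margins are not directly usable for that.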

Note that the dependent variable $Y$ in logistic regression can be coded any way you want: 0/1, A/B, yes/no, etc.
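
For example, scikit-learn will accept any of these codings directly; a small sketch:

```python
from sklearn.linear_model import LogisticRegression

X = [[0.5], [1.2], [2.9], [3.3], [4.1], [5.0]]
y = ["no", "no", "no", "yes", "yes", "yes"]   # 0/1 or A/B would work too

model = LogisticRegression().fit(X, y)
print(model.classes_)                # ['no' 'yes']
print(model.predict_proba([[2.0]]))  # estimated probability of each label
```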

The primary assumption of logistic regression is that $Y$ is truly binary, i.e. it was not contrived from an underlying ordinal or continuous response variable. Like classification methods, it is intended for truly all-or-nothing phenomena.

Some analysts think that logistic regression assumes linearity of predictor effects on the log-odds scale. That was only the case when D. R. Cox invented the logistic model in 1958, at a time when the computing power needed to extend the model with tools such as regression splines was not available.

The only real weakness of logistic regression is that you need to specify which interactions you want to allow in the model. For most datasets this turns into a strength, because additive main effects are generally much stronger predictors than interactions, and machine learning methods that give equal priority to interactions can be unstable, hard to interpret, and require larger sample sizes than logistic regression to predict well.
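
As a sketch of the spline idea, assuming scikit-learn's SplineTransformer on synthetic data (restricted cubic splines or other bases work similarly), the linear predictor can be expanded so that each predictor's effect on the log odds is smoothly nonlinear while the model stays additive; interactions would still have to be added explicitly:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

X, y = make_classification(n_samples=500, n_features=4, random_state=0)

# Expand each predictor in a cubic spline basis, then fit an ordinary
# logistic regression on the expanded features: each predictor's effect
# on the log odds can now be nonlinear, but the model remains additive
# unless interaction terms are added deliberately.
model = make_pipeline(
    SplineTransformer(degree=3, n_knots=5),
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)
print(model.score(X, y))
```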