Solved – If logistic regression is a linear classifier, why does it fail on linearly separable data?

linear model, logistic, machine learning, regression, separation

Logistic regression is a linear model: the decision boundary it generates is linear.

If the data points are linearly separable, why does logistic regression fail? Shouldn't it perform better on data that is actually linearly separable?

Best Answer

If the data are linearly separable with a positive margin, then they can be separated by a plane in more than one way (indeed, in infinitely many ways). All of those separating planes drive the likelihood toward its supremum, so there is no unique likelihood-maximizing model; in fact, no finite maximizer exists at all, since the likelihood keeps increasing as $\|\beta\|$ grows. So what the iterative method used to maximize the likelihood converges to (or stops at) is not unique either. In that sense, logistic regression is unstable.
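To make the divergence concrete, here is a minimal sketch of plain gradient ascent on the log-likelihood of a one-parameter logistic model fit to separable data (assuming NumPy; the toy data and step size are made up for illustration). The coefficient never settles: it keeps growing while the log-likelihood creeps toward zero, its supremum.

```python
import numpy as np

# Toy separable data: negative x -> class 0, positive x -> class 1.
x = np.array([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

beta = 0.0   # single slope, no intercept, to keep the sketch one-dimensional
lr = 0.5     # gradient-ascent step size (arbitrary choice)

for step in range(1, 20001):
    p = sigmoid(beta * x)              # fitted probabilities
    beta += lr * np.sum((y - p) * x)   # gradient of the log-likelihood
    if step % 5000 == 0:
        loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
        print(f"step {step:6d}  beta = {beta:8.3f}  log-lik = {loglik:.8f}")

# beta grows without bound (roughly logarithmically in the step count here),
# and the fitted probabilities are pushed toward exactly 0 and 1.
```

An unpenalized solver in a real package behaves the same way: it stops when the gradient is numerically negligible or an iteration cap is hit, so which separating plane it reports depends on those implementation details.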

But a well-implemented algorithm will find one of those solutions, and that might be good enough. And if what you want is the estimated probabilities, they will be the same for all those solutions: essentially zero or one on the training points.

So if your goal really is to find a separating plane with maximum margin, then logistic regression is the wrong method. If your goal is to estimate risk probabilities, the problem is a different one: estimated risks of exactly zero or one may be very unrealistic. So the answer to your question really depends on your goal, and you didn't tell us what it is.
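If the maximum-margin plane is what you want, a support-vector machine targets it directly. The sketch below (assuming scikit-learn is available; `C=1e10` is an arbitrary stand-in for "essentially no regularization") contrasts the two methods on the same kind of toy data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# The same separable toy data: class 0 on the left, class 1 on the right.
X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Nearly unregularized logistic regression: the solver stops at *one* of the
# many separating boundaries, with fitted probabilities pinned near 0 or 1.
logit = LogisticRegression(C=1e10, max_iter=10000).fit(X, y)
print("logistic coef:", logit.coef_.ravel(), "intercept:", logit.intercept_)
print("fitted probabilities:", logit.predict_proba(X)[:, 1].round(6))

# A linear SVM is defined by the maximum-margin separator.
svm = SVC(kernel="linear", C=1e10).fit(X, y)
print("SVM coef:", svm.coef_.ravel(), "intercept:", svm.intercept_)
```

The logistic coefficients here are simply where the solver gave up; a different solver or tolerance can yield a different separating plane, while the SVM's answer is pinned down by the margin.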

About this question, Brian Ripley, in "Pattern Recognition and Neural Networks", says:

We feel too much has been made of this. The difficulty is in inappropriate parameterization, and the limits for infinite $\|\beta\|$ of the fitted posterior probabilities remain perfectly suitable fits, albeit sometimes predicting probability zero or one.

See "Why does logistic regression become unstable when classes are well-separated?", where all this is discussed in much more detail.