Solved – How to deal with perfect separation in logistic regression

logisticrregressionseparation

If you have a variable which perfectly separates zeroes and ones in target variable, R will yield the following "perfect or quasi perfect separation" warning message:

Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 

We still get the model but the coefficient estimates are inflated.

How do you deal with this in practice?

Best Answer

A solution to this is to utilize a form of penalized regression. In fact, this is the original reason some of the penalized regression forms were developed (although they turned out to have other interesting properties.

Install and load package glmnet in R and you're mostly ready to go. One of the less user-friendly aspects of glmnet is that you can only feed it matrices, not formulas as we're used to. However, you can look at model.matrix and the like to construct this matrix from a data.frame and a formula...

Now, when you expect that this perfect separation is not just a byproduct of your sample, but could be true in the population, you specifically don't want to handle this: use this separating variable simply as the sole predictor for your outcome, not employing a model of any kind.