Logistic Regression – Addressing Instability in Logistic Regression When Data is Not Well Separated

logistic, r, regression

There are some good answers discussing convergence issues of logistic regression
when the data are well separated here and here. I am wondering what
can cause convergence issues when the data are not well separated.

As an example, I have the following data, df:

   y       x1         x2
1  0 66.06402 -1.0264739
2  1 58.40813  0.2887934
3  1 58.58011  0.2626232
4  0 59.05929 -0.5286438
5  0 55.81817 -1.3184894
6  0 58.00018 -0.8445602
7  1 69.53926 -1.1018149
8  0 55.73621 -0.9000901
9  1 79.80170  0.6690657
10 0 55.40042  0.6600415
11 0 57.42124 -0.7237973
12 1 78.22012 -0.8121816
13 0 53.54296  0.2265636
14 1 56.14096  0.4216436
15 1 66.90146  0.6189839
16 0 50.40008  0.4311339

Fitting a logistic regression in R, I am getting a
"glm.fit: fitted probabilities numerically 0 or 1 occurred" warning message even
though the data are non-separable:

> attach(df)
> safeBinaryRegression::glm(y ~ x1 + x2, family=binomial)

Call:  safeBinaryRegression::glm(formula = y ~ x1 + x2, family = binomial)

Coefficients:
(Intercept)           x1           x2  
    -82.930        1.395       10.255  

Degrees of Freedom: 15 Total (i.e. Null);  13 Residual
Null Deviance:      21.93 
Residual Deviance: 5.927    AIC: 11.93
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

A visual confirmation that the data are in fact non-separable is also included:

Non Separable Data

Removing the red point seems to resolve the convergence issues; however, I am
at a bit of a loss as to why this is.

> df2 <- df[-c(9),]
> detach(df)
> attach(df2)
> safeBinaryRegression::glm(y ~ x1 + x2, family=binomial)

Call:  safeBinaryRegression::glm(formula = y ~ x1 + x2, family = binomial)

Coefficients:
(Intercept)           x1           x2  
    -82.930        1.395       10.255  

Degrees of Freedom: 14 Total (i.e. Null);  12 Residual
Null Deviance:      20.19 
Residual Deviance: 5.927    AIC: 11.93

Best Answer

The warning about "fitted probabilities numerically 0 or 1" might be useful for diagnosing separability, but these issues are only indirectly related.

Here is a dataset and a binomial GLM fit (in gray) where there is enough overlap among the $x$ values for the two response classes that there is little concern about separability. In particular, the estimate of the $x$ coefficient, $2.35$, is modest and significant: its standard error is only $1.1$ $(p=0.03)$. The gray curve shows the fit. Each value on this curve corresponds to a log odds (the value of the "link" function), which I have indicated with colors; the legend gives the common (base-10) logs. The software flags fitted values that are within $2.22\times 10^{-15}$ of either $0$ or $1$. Such points have white halos around them.

Figure 1

All that's going on here is there's such a wide range of $x$ values that for some points, the fit is very, very close to $0$ (for very negative $x$) or very, very close to $1$ (for the most positive $x$). This isn't a problem in this case.
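To put a number on "very, very close": as far as I know, glm.fit compares the fitted probabilities with 10 * .Machine$double.eps, which is where the $2.22\times 10^{-15}$ figure comes from. Here is a small sketch, with variable names of my own choosing, that converts that cutoff to the log-odds scale:

```r
# Cutoff assumed to be the one behind the
# "fitted probabilities numerically 0 or 1" warning.
eps <- 10 * .Machine$double.eps

# The same cutoff expressed as a log odds (about -33.7;
# by symmetry the upper cutoff is about +33.7).
link_cut <- qlogis(eps)

# A fitted log odds of 34 in absolute value crosses the cutoff...
flag_low  <- plogis(-34) < eps
flag_high <- plogis(34) > 1 - eps

# ...while 33 does not.
no_flag <- plogis(-33) < eps

print(c(eps = eps, link_cut = link_cut))
print(c(flag_low = flag_low, flag_high = flag_high, no_flag = no_flag))
```

So any observation whose fitted log odds is beyond roughly $\pm 34$ will trigger the message, whether or not the classes overlap elsewhere.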

It might be a problem in the next example. Now a single outlying value of $x$ triggers the warning message.

Figure 2

How can we assess this? Simply delete the datum and re-fit the model. In this example, it makes almost no difference: the coefficient estimate does not change, nor does the p-value.

Finally, to check a multiple regression, first form the linear combinations of the coefficient estimates and the variables, $x_i\hat\beta$: these are the fitted values on the scale of the link function (the linear predictor). Plot the responses against these values exactly as above and study the patterns, looking at (a) the degree to which the 1's overlap the 0's (which assesses separability) and (b) the points with extreme values of the link.
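One way this check might be carried out in R is sketched below, using the data from the question. It assumes base R's glm gives the same fit as safeBinaryRegression::glm; the object names (fit, eta, ord) are just for illustration.

```r
# Data copied from the question.
df <- data.frame(
  y  = c(0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0),
  x1 = c(66.06402, 58.40813, 58.58011, 59.05929, 55.81817, 58.00018,
         69.53926, 55.73621, 79.80170, 55.40042, 57.42124, 78.22012,
         53.54296, 56.14096, 66.90146, 50.40008),
  x2 = c(-1.0264739, 0.2887934, 0.2626232, -0.5286438, -1.3184894,
         -0.8445602, -1.1018149, -0.9000901, 0.6690657, 0.6600415,
         -0.7237973, -0.8121816, 0.2265636, 0.4216436, 0.6189839,
         0.4311339)
)

# Base-R fit; the "numerically 0 or 1" warning is silenced here on purpose.
fit <- suppressWarnings(glm(y ~ x1 + x2, family = binomial, data = df))

# Fitted values on the link (log-odds) scale, one per observation.
eta <- predict(fit, type = "link")

# Responses against the link: look for overlap of 0's and 1's
# and for points with extreme link values.
plot(eta, df$y, xlab = "Fitted log odds", ylab = "y")
abline(v = 0, lty = 2)  # fitted probability is 1/2 here

# The same information as a table, sorted by link value.
ord <- order(eta)
print(data.frame(obs = ord, y = df$y[ord], eta = round(eta[ord], 2)))
```

Reading down the sorted table shows where the 0's and 1's interleave and which observations sit far out on either end.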

Here is the plot for your data:

Figure 3

The point at the far right corresponds to the red dot in your figure: the fitted value is $1$ because that dot is far from the area where 0's transition to 1's. If you remove it from the data, nothing changes. Thus, it's not influencing the results. This graph indicates you have obtained a reasonable fit.
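The delete-and-refit comparison can be sketched as follows. It again assumes base R's glm stands in for safeBinaryRegression::glm, and fit_with is a hypothetical helper of mine that simply collects warning messages instead of printing them.

```r
# Data copied from the question.
df <- data.frame(
  y  = c(0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0),
  x1 = c(66.06402, 58.40813, 58.58011, 59.05929, 55.81817, 58.00018,
         69.53926, 55.73621, 79.80170, 55.40042, 57.42124, 78.22012,
         53.54296, 56.14096, 66.90146, 50.40008),
  x2 = c(-1.0264739, 0.2887934, 0.2626232, -0.5286438, -1.3184894,
         -0.8445602, -1.1018149, -0.9000901, 0.6690657, 0.6600415,
         -0.7237973, -0.8121816, 0.2265636, 0.4216436, 0.6189839,
         0.4311339)
)

# Hypothetical helper: fit the model and record any warnings it raises.
fit_with <- function(d) {
  msgs <- character(0)
  fit <- withCallingHandlers(
    glm(y ~ x1 + x2, family = binomial, data = d),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(fit = fit, warnings = msgs)
}

full    <- fit_with(df)         # all 16 observations
reduced <- fit_with(df[-9, ])   # without the far-right point (row 9)

# Coefficients side by side, then the warnings from each fit.
print(round(cbind(full = coef(full$fit), reduced = coef(reduced$fit)), 3))
print(full$warnings)
print(reduced$warnings)
```

Going by the output shown in the question, both columns should be close to -82.930, 1.395 and 10.255, with the warning appearing only for the full data.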

You can also see that slight changes in the values of $x_1$ or $x_2$ at a couple of critical points (those near $0$) could create perfect separation. But is this really a problem? It would only mean that the software could no longer distinguish between this fit and other fits with arbitrarily sharp transitions near $x\beta=0$. However, all would produce similar predictions at all points sufficiently far from the transition line and the location of that line would still be fairly well estimated.
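To illustrate that last point, here is a toy example with made-up, perfectly separated data (not the data from the question; all names are mine). It is only a sketch of the typical behavior: the slope estimate becomes very large, but the transition point and the predictions away from it remain sensible.

```r
# Made-up one-dimensional data, perfectly separated between x = 5 and x = 6.
x <- 1:10
y <- as.integer(x > 5)

# Fit, collecting warnings instead of printing them.
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# The slope is huge (the transition is arbitrarily sharp)...
slope <- unname(coef(fit)["x"])

# ...but the location where the fitted probability is 1/2 is still pinned down...
boundary <- unname(-coef(fit)[1] / coef(fit)[2])

# ...and predictions far from the transition are what one would expect.
p_far <- predict(fit, newdata = data.frame(x = c(1, 10)), type = "response")

print(msgs)
print(c(slope = slope, boundary = boundary))
print(p_far)
```

The particular slope reported is not meaningful (it depends on when the iterations stop), which is the sense in which the software cannot distinguish among arbitrarily sharp fits.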