I am fitting a binomial logistic regression in R using glm. By chance, I have found out that if I change the order of my predictor variables, glm fails to estimate the model. The message I get is "unexpected result from lpSolveAPI for primal test".
I am using the safeBinaryRegression package, so I am confident there are no separation issues between my outcome and predictor variables. However, I am not so confident that there are no quasi-separation issues among my predictor variables. Am I correct that if this is the case, then I might be running into multicollinearity, and that this is why glm cannot fit the model?
If so, I would like advice on how to approach the issue. Should I look for highly correlated predictor variables and omit one of them? Is there any convenient way of doing so for 11 categorical predictors?
What I see right now:
lModel <- glm(mob_change ~ education + gender + start_age + income + dist_change + lu_change + dou_change + marriage + student2work + wh_change,
data = regression_data,
family = binomial())
# Fine, and I can inspect the model. No predictor has std. error > 1.05
# Now if I move the last variable (or any of the last three, from what I've tested) to
# be the first predictor...
lModel.3 <- glm(mob_change ~ wh_change + gender + education + start_age + income + dist_change + lu_change + dou_change + marriage + student2work,
data = regression_data,
family = binomial())
Error in separator(X, Y, purpose = "find") :
unexpected result from lpSolveAPI for primal test
Best Answer
The order in which the predictors are entered into the model is of course irrelevant to the question of whether there's separation in the data. The safeBinaryRegression package† masks the usual glm function from the stats package, which fits generalized linear models, so that, for logistic regression, glm uses a linear programming algorithm to check for both complete & quasi-complete separation before trying to fit anything. If it finds separation it reports this, in a message that depends on whether you've asked it just to test for separation or to find the predictors causing it. Otherwise it reports nothing.
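As a rough illustration of the point about predictor order, here is a sketch on simulated data (not the asker's data). The data frame toy, its columns and the helper fit_quietly are all made up for this example. It calls stats::glm explicitly, so it should behave the same whether or not safeBinaryRegression is loaded. On the overlapping response the two predictor orders should give the same coefficients; on the perfectly separated response both orders should end in warnings from the fitting routine, rather than one order succeeding and the other failing.

```r
# Simulated toy data (hypothetical): two numeric predictors and two responses.
set.seed(42)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y_ok  <- rbinom(n, 1, plogis(0.8 * toy$x1 - 0.5 * toy$x2))  # classes overlap
toy$y_sep <- as.integer(toy$x1 > 0)                             # x1 separates perfectly

# Fit with stats::glm and collect warning messages instead of printing them.
fit_quietly <- function(formula, data) {
  msgs <- character(0)
  fit <- withCallingHandlers(
    stats::glm(formula, data = data, family = binomial()),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(fit = fit, warnings = msgs)
}

# Same model, two predictor orders, no separation: compare coefficients by name.
ok_a <- fit_quietly(y_ok ~ x1 + x2, toy)
ok_b <- fit_quietly(y_ok ~ x2 + x1, toy)
same_coefs <- isTRUE(all.equal(coef(ok_a$fit)[names(coef(ok_b$fit))],
                               coef(ok_b$fit), tolerance = 1e-6))
print(same_coefs)

# Same model, two predictor orders, separated response: inspect the warnings.
sep_a <- fit_quietly(y_sep ~ x1 + x2, toy)
sep_b <- fit_quietly(y_sep ~ x2 + x1, toy)
print(sep_a$warnings)
print(sep_b$warnings)
```

Comparing coefficients by name rather than by position matters here, because reordering the formula also reorders the coefficient vector.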
The message "unexpected result from lpSolveAPI for primal test", however, is a software error message, not a statistical one. You could perhaps try on a machine with more memory, but it's probably safe to trust the results from when you didn't get an error. Using stats:::glm (i.e. using the glm function from stats when safeBinaryRegression's loaded) should reach the same results regardless of the order of predictors; it will often report non-convergence or predicted probabilities of nought or one in cases of separation.
Multicollinearity among the predictors is another issue entirely. Generalized variance inflation factors (see the vif function‡ from the car package) are useful for assessing its extent in models with more than one degree of freedom per predictor.
† Konis (2007), "Linear programming algorithms for detecting separated data in binary logistic regression models", DPhil., U. Oxf.
‡ Fox & Monette (1992), "Generalized collinearity diagnostics", JASA, 87, pp. 178–183.
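To make the GVIF suggestion a little more concrete, here is a hand-rolled sketch of the calculation in the sense of Fox & Monette, in base R only. The function gvif_sketch, the simulated data frame d and its columns are hypothetical, and the sketch assumes a model with an intercept, at least two terms and no aliased coefficients. In practice car::vif(fit) is the thing to call; this is only meant to show roughly what goes into the number, and it has not been checked against car here.

```r
# GVIF per model term from the correlation matrix of the coefficient estimates.
# Hypothetical helper; assumes an intercept, >= 2 terms, no aliased coefficients.
gvif_sketch <- function(model) {
  labels <- attr(terms(model), "term.labels")
  stopifnot(length(labels) >= 2)
  assign <- attr(model.matrix(model), "assign")  # column -> term index (0 = intercept)
  keep   <- assign != 0                          # drop the intercept
  R      <- cov2cor(vcov(model)[keep, keep, drop = FALSE])
  assign <- assign[keep]
  det_R  <- det(R)
  out <- t(sapply(seq_along(labels), function(k) {
    idx  <- which(assign == k)                   # coefficients belonging to term k
    gvif <- det(R[idx, idx, drop = FALSE]) *
            det(R[-idx, -idx, drop = FALSE]) / det_R
    c(GVIF = gvif, Df = length(idx),
      "GVIF^(1/(2*Df))" = gvif^(1 / (2 * length(idx))))
  }))
  rownames(out) <- labels
  out
}

# Simulated categorical predictors; income is built to be associated with edu.
set.seed(7)
n    <- 300
lev  <- c("low", "mid", "high")
edu  <- sample(lev, n, replace = TRUE)
d <- data.frame(
  y      = rbinom(n, 1, 0.5),
  edu    = factor(edu),
  income = factor(ifelse(runif(n) < 0.7, edu, sample(lev, n, replace = TRUE))),
  gender = factor(sample(c("f", "m"), n, replace = TRUE))
)

fit   <- stats::glm(y ~ edu + income + gender, data = d, family = binomial())
gvals <- gvif_sketch(fit)
print(gvals)
```

The last column, GVIF^(1/(2*Df)), is the quantity Fox & Monette suggest for comparing terms with different degrees of freedom, which is the situation with multi-level categorical predictors.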