Regression – Logistic Regression in R with Many Predictors

Tags: logistic, r, regression

I have been running logistic regression in R, and have been having an issue where, as I include more predictors, the z-scores and respective p-values approach 0 and 1, respectively. For example, if I have a few predictors:

> model1
b17 ~ i74 + i73 + i72 + i71
> step1<-glm(model1,data=newdat1,family="binomial")
Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -6.9461     1.8953  -3.665 0.000247 ***
i74           0.6842     0.9543   0.717 0.473384    
i73           1.7691     4.8008   0.368 0.712502    
i72           0.5134     2.0142   0.255 0.798812    
i71          -0.6753     4.9173  -0.137 0.890771    

The results appear to be fairly reasonable; however, if I have more predictors:

 > model1
b17 ~ i90 + i89 + i88 + i87 + i86 + i85 + i84 + i83 + i82 + i81 + 
i80 + i79 + i78 + i77 + i76 + i74 + i73 + i72 + i71
> step1<-glm(model1,data=newdat1,family="binomial")
Warning messages:
1: glm.fit: algorithm did not converge 
2: glm.fit: fitted probabilities numerically 0 or 1 occurred 
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.887e+02  3.503e+05  -0.001    0.999
i90          1.431e-01  1.009e+04   0.000    1.000
i89          8.062e+01  1.027e+05   0.001    0.999
i88          9.738e+01  7.398e+04   0.001    0.999
i87         -1.980e+01  9.469e+03  -0.002    0.998
i86          9.829e+00  1.098e+05   0.000    1.000
i85          5.917e+01  3.074e+04   0.002    0.998
i84         -2.373e+01  1.378e+05   0.000    1.000
i83          7.257e+00  2.173e+05   0.000    1.000
i82         -1.397e+01  1.894e+05   0.000    1.000
i81          6.503e+01  1.373e+05   0.000    1.000
i80          3.728e+01  4.904e+04   0.001    0.999
i79          1.010e+02  5.556e+04   0.002    0.999
i78         -2.628e+01  1.546e+05   0.000    1.000
i77          4.725e+01  3.027e+05   0.000    1.000
i76         -6.517e+01  1.509e+05   0.000    1.000
i74          1.267e+01  1.175e+05   0.000    1.000
i73          2.796e+02  5.280e+05   0.001    1.000
i72         -2.533e+02  4.412e+05  -0.001    1.000
i71         -1.240e+02  4.387e+05   0.000    1.000

I know it is hard to say exactly what is going on without seeing the data, but the predictors are all 5-point Likert-scale items. Are there any thoughts on what is occurring here? I don't have much experience with logistic regression, so I apologize if the question seems naive, but is there a certain threshold at which logistic regression falls apart because there are so many predictors for what is ultimately a very small amount of variance? Is this potentially a multicollinearity issue? Finally, when I run OLS regression on the data I get results that make more sense (or at least appear to); is it okay to run OLS regression on a binary outcome, and what are the consequences of doing so? Thank you!

Best Answer

Although the initial symptom is a classic logistic-regression problem (the "fitted probabilities numerically 0 or 1" warning signals complete or quasi-complete separation), the underlying issue is that there are many predictor variables and only a comparatively small number of cases. That underlying issue needs to be addressed.
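To make this concrete, here is a minimal sketch with simulated data (not your newdat1): with many predictors and few cases, glm() can separate the two outcome classes and reproduce the same warnings, huge coefficients, and inflated standard errors you saw.

set.seed(1)
n <- 30                                   # few cases
p <- 20                                   # many Likert-type predictors
X <- as.data.frame(replicate(p, sample(1:5, n, replace = TRUE)))
names(X) <- paste0("i", seq_len(p))
X$y <- rbinom(n, 1, 0.5)                  # arbitrary binary outcome

fit <- glm(y ~ ., data = X, family = binomial)
# usually warns: fitted probabilities numerically 0 or 1 occurred
summary(fit)$coefficients[, c("Estimate", "Std. Error")]
# large coefficients with even larger standard errors, so z ~ 0 and p ~ 1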

So first, if the outcome variable is binary you should not abandon logistic regression. The underlying issue will not go away by trying another type of analysis, even if it appears in a different form. For example, an ordinary least-squares model would tend to be highly over-fit (even if it were appropriate for binary outcomes) and thus highly unreliable. You said: "when I run OLS regression on the data I get results that make more sense (or at least appear to)" (emphasis added). Yes, the result of a regression on your data set might fit quite well, but in this situation your model would probably not apply beyond your initial data set.
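If you want to see the over-fitting directly, a rough sketch (assuming your data frame is newdat1, the formula is model1 from above, and b17 is coded 0/1) is to compare the in-sample error with a cross-validated estimate; a large gap between the two means the model will not travel beyond this sample.

library(boot)                              # for cv.glm()

fit <- glm(model1, data = newdat1, family = binomial)

apparent <- mean((newdat1$b17 - fitted(fit))^2)     # in-sample Brier score
cv_est   <- cv.glm(newdat1, fit, K = 10)$delta[1]   # 10-fold cross-validated estimate

c(apparent = apparent, cross_validated = cv_est)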

Second, you can consider reducing the number of predictor variables based on prior knowledge of the subject matter. Likert items are often designed to be multiple questions aimed at a single opinion or personality trait, which are then combined to form a Likert scale as a better gauge of the opinion or trait. If prior knowledge of the subject matter allows combination of the 100 Likert items into 5 or 10 Likert scales as predictors, then the problem with the predictor/case ratio would be greatly diminished. The combination of multiple items into a smaller number of scales might also diminish problems resulting from a potentially incorrect assumption of equally-spaced influences of each of the 4 steps along each 5-point Likert item.
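As a sketch of what that might look like (the item-to-scale groupings below are hypothetical placeholders; use whatever groupings your instrument actually defines), scale scores can be computed as row means of the relevant items before fitting:

# hypothetical groupings -- replace with the subscales your instrument defines
newdat1$scaleA <- rowMeans(newdat1[, c("i71", "i72", "i73", "i74")])
newdat1$scaleB <- rowMeans(newdat1[, c("i76", "i77", "i78", "i79", "i80")])

fit_scales <- glm(b17 ~ scaleA + scaleB, data = newdat1, family = binomial)
summary(fit_scales)                        # far fewer predictors per case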

Third, although you say that you can't use PCA (for some unspecified reason; it's just a linear transformation of the original predictors) in this situation, note that the analysis of the correlation structure provided by PCA on the predictors, or clustering approaches, could well identify sets of items that are highly related, essentially measuring the same thing, and thus could be combined into a single predictor for analysis. It would seem that you would want to know these relations among the individual items in any event, so it's a bit concerning that you can't take the next obvious step into a principal-components regression (PCR).
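A rough sketch of that step, assuming the predictors are the i71–i90 items from your larger model:

items <- newdat1[, c("i71","i72","i73","i74","i76","i77","i78","i79","i80","i81",
                     "i82","i83","i84","i85","i86","i87","i88","i89","i90")]

pc <- prcomp(items, center = TRUE, scale. = TRUE)
summary(pc)                                # variance carried by each component

# principal-components regression: logistic fit on, say, the first 5 components
pcdat <- data.frame(b17 = newdat1$b17, pc$x[, 1:5])
fit_pcr <- glm(b17 ~ ., data = pcdat, family = binomial)
summary(fit_pcr)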

Fourth, you can employ shrinkage methods to minimize the overfitting inevitable with a high ratio of predictors to cases. Ridge regression (unlike LASSO) would keep information from all your predictors, just weighting them differentially. If your objection to PCR is that you don't want to throw out any information from your predictors, then this might be a solution. (It's essentially a weighted principal-components regression, rather than the all-or-none selection of components in PCR.)
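A minimal sketch with the glmnet package (alpha = 0 requests the ridge penalty; the predictor list again assumes the i71–i90 items):

library(glmnet)

x <- as.matrix(newdat1[, c("i71","i72","i73","i74","i76","i77","i78","i79","i80",
                           "i81","i82","i83","i84","i85","i86","i87","i88","i89","i90")])
y <- newdat1$b17

cv_ridge <- cv.glmnet(x, y, family = "binomial", alpha = 0)   # cross-validated penalty
coef(cv_ridge, s = "lambda.min")           # all predictors kept, coefficients shrunk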