I'm glad you like my answer :-)
It's not that there is no valid method of detecting collinearity in logistic regression: Since collinearity is a relationship among the independent variables, the dependent variable doesn't matter.
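For instance, here is a minimal sketch (with made-up variable names) showing that the VIF for a predictor comes from regressing it on the other predictors, so no outcome variable enters the calculation at all:

set.seed(42)
p1 <- rnorm(100)
p2 <- rnorm(100)
p3 <- p1 + p2 + rnorm(100)           # p3 is approximately collinear with p1 + p2
r2 <- summary(lm(p3 ~ p1 + p2))$r.squared
1 / (1 - r2)                         # the VIF for p3; no dependent variable anywhere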
What is problematic is figuring out how much collinearity is too much for logistic regression. David Belsley did extensive work with condition indexes. He found that indexes over 30, combined with substantial variance-decomposition proportions on more than one variable, indicated collinearity that would cause severe problems in OLS regression. However, "severe" is always a judgment call. Perhaps the easiest way to see the problems of collinearity is to show that small changes in the data make big changes in the results.
[This paper](http://www.medicine.mcgill.ca/epidemiology/joseph/courses/epib-621/logconfound.pdf) offers examples of collinearity in logistic regression. It even shows that R detects exact collinearity and, in fact, that some cases of approximate collinearity will cause the same warning:
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
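(As an aside, here is a minimal sketch, again with made-up variable names, of what exact collinearity looks like in R: glm aliases the redundant column and reports NA for its coefficient.)

set.seed(1)
z1 <- rnorm(100)
z2 <- 2 * z1                          # exactly collinear with z1
zy <- rbinom(100, 1, plogis(z1))
coef(glm(zy ~ z1 + z2, family = binomial))  # the z2 coefficient comes back NA (aliased)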
Nevertheless, in the approximate-collinearity case we can ignore the warning and run
set.seed(1234)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 + x2 + rnorm(100, 0, 1)   # x3 is approximately collinear with x1 + x2
y <- x1 + 2*x2 + 3*x3 + rnorm(100)
ylog <- cut(y, 2, c(1,0))          # dichotomize y for a logistic model
m1 <- glm(ylog ~ x1 + x2 + x3, family = binomial)
coef(m1)
which yields coefficients of -2.55, 1.97, 5.60, and 12.54.
We can then slightly perturb x1 and x2, add them for a new x3 and run again:
x1a <- x1 + rnorm(100, 0, .01)     # tiny perturbations of x1 and x2
x2a <- x2 + rnorm(100, 0, .01)
x3a <- x1a + x2a + rnorm(100, 0, 1)
ya <- x1a + 2*x2a + 3*x3a + rnorm(100)
yloga <- cut(ya, 2, c(1,0))
m2 <- glm(yloga ~ x1a + x2a + x3a, family = binomial)
coef(m2)
This yields wildly different coefficients: 0.003, 3.012, 3.51, and -0.41.
And yet, this set of independent variables does not have a high condition index:
library(perturb)
colldiag(m1)
says the maximum condition index is 3.54.
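For intuition, the condition indexes can also be computed by hand; here is a sketch (my own, not from the paper) using the simulated x1, x2, and x3 above. Following Belsley, the columns of the design matrix, including the intercept, are scaled to unit length before taking singular values:

Xs <- cbind(1, x1, x2, x3)
Xs <- sweep(Xs, 2, sqrt(colSums(Xs^2)), "/")  # scale each column to unit length
d <- svd(Xs)$d                                # singular values, largest first
max(d) / d                                    # the condition indexes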
I am not aware of any Monte Carlo studies of this; if there are none, it seems a good area for research.
The VIF has been generalized to deal with logistic regression (assuming you mean a model with a binary dependent variable). In R, you can do this using the vif function in the car package.
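For example, applied to the model m1 fitted above (this assumes the car package is installed):

library(car)
vif(m1)   # (G)VIFs for x1, x2, and x3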
As @RichardHardy has said, it is not a test, though. At the end you will get some GVIFs and will still need to make some subjective decisions. The thing to keep in mind is that high VIFs mean the standard errors of some of your estimates are inflated, so results that could be meaningful may not be detected as significant. The books and writings of John Fox, who also co-wrote the car package, are a great resource for understanding multicollinearity.
All of the same principles concerning multicollinearity apply to logistic regression as they do to OLS. The same diagnostics for assessing multicollinearity can be used (e.g., VIFs, the condition number, auxiliary regressions), and the same dimension reduction techniques can be used, such as combining variables via principal components analysis (sketched below).
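Here is a minimal sketch of the principal-components idea, reusing the simulated x1, x2, x3, and ylog from earlier; the collinear predictors are replaced by their first two (orthogonal) components:

pc <- prcomp(cbind(x1, x2, x3), scale. = TRUE)      # PCA on the standardized predictors
m_pc <- glm(ylog ~ pc$x[, 1:2], family = binomial)  # fit on the first two components
coef(m_pc)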
This answer by chl will lead you to some resources and R packages for fitting penalized logistic models (as well as a good discussion of these types of penalized regression procedures). But some of your comments about "solutions" to multicollinearity are a bit disconcerting to me. If you only care about estimating relationships for variables that are not collinear, these "solutions" may be fine, but if you're interested in estimating coefficients of variables that are collinear, these techniques do not solve your problem. Although the problem of multicollinearity is technical, in that the cross-product matrix of your predictors becomes (nearly) singular and cannot be stably inverted, it has a logical analog: your predictors are not independent, so their effects cannot be uniquely identified.
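As one concrete illustration of a penalized fit (my choice of package here is glmnet, not necessarily the ones that answer discusses), ridge logistic regression keeps all the collinear predictors but shrinks their coefficients:

library(glmnet)
Xp <- cbind(x1, x2, x3)
fit <- cv.glmnet(Xp, ylog, family = "binomial", alpha = 0)  # alpha = 0 is the ridge penalty
coef(fit, s = "lambda.min")                                 # coefficients at the CV-chosen lambda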