First, rules of thumb should not be mistaken for rules per se. It is doubtful that anyone except you could provide a satisfactory answer to the question of how much collinearity is too much. If you look in different regression texts you'll find various rules of thumb about the VIF and indeed various degrees of willingness to put forth a rule of thumb at all.
Why should you believe me (or anyone else) if I were to say, "tripling the uncertainty around a regression coefficient is fine, but quadrupling it is out of the question"? The important thing is to know what the collinearity's consequences are for your parameter estimates and, once you settle on a regression solution, to be honest about reporting those consequences with your other model information.
Having said all that, I can think of few situations in which a VIF of 4 or 5 would not make me search for a less collinear solution! The reciprocal of the VIF is the tolerance; with a VIF of 5 the tolerance is 0.2, so think carefully about whether a predictor is pulling its weight when 80% of its variation can be accounted for by the other predictors. In such a situation, attempts to assess the relative contributions of the predictors will be seriously hampered. You'll want to consider whether that matters to you in your current project.
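For concreteness, here is a minimal sketch of checking the VIF and its reciprocal, the tolerance, with the car package (the data are simulated, so the numbers are only illustrative):

```r
# Minimal sketch: compute VIF and tolerance for an OLS fit (the same call works
# for glm fits). The data are simulated purely to have something to diagnose.
library(car)

set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.5)   # x2 strongly correlated with x1
x3 <- rnorm(n)
y  <- 1 + x1 + x2 + x3 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3)
v   <- vif(fit)   # variance inflation factor for each predictor
1 / v             # tolerance: the share of each predictor's variance not explained by the others
```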
When variables are collinear, you can sometimes think of them as different manifestations of the same thing. Say I had a dataset of cat happiness, a variable for whether each cat was soaking wet, and a variable for whether there were nearby children who thought it was fun to throw cats into water. Clearly cats don't like water, yet sometimes they will fall into it on their own. More often, however, they are thrown in by malevolent children. Sometimes, though, malevolent children fail to throw cats in the water.
So, wet cats and malevolent children are different, but can be thought of as a unitary dynamic. If a researcher were only interested in the effect of wetness on cat happiness and didn't control for malevolent children, the estimates would be biased. Include malevolent children, and the VIF goes up. This is because you simply don't have enough independent observations of wetness to know its effect apart from the effect of malevolent children.
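Here is a small simulation sketch of that story (all the numbers are invented; only the pattern in the output matters):

```r
# Minimal simulation sketch of the wet-cat story (invented numbers).
set.seed(1)
n <- 200
children <- rbinom(n, 1, 0.5)                              # malevolent children nearby?
wet      <- rbinom(n, 1, ifelse(children == 1, 0.9, 0.1))  # cats are usually wet when children are around
happy    <- 5 - 3 * wet - 2 * children + rnorm(n)          # both lower happiness

library(car)
coef(lm(happy ~ wet))             # wet alone: its coefficient soaks up part of the children effect
fit <- lm(happy ~ wet + children)
coef(fit)                         # controlling for children removes that bias...
vif(fit)                          # ...but the VIFs rise because wet and children travel together
```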
Shrinkage estimators are one way to go. Basically, you increase the bias of your estimator in order to decrease its variance. Appealing for prediction, but not for inference.
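For example, a minimal ridge regression sketch (glmnet is one common implementation of shrinkage among several; the data are simulated, so the coefficients mean nothing in themselves):

```r
# Minimal ridge sketch on simulated, nearly collinear predictors.
# alpha = 0 gives the ridge penalty; the cross-validated lambda is one common choice.
library(glmnet)

set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)     # x2 nearly collinear with x1
y  <- 1 + 2 * x1 + rnorm(n)
X  <- cbind(x1, x2)

cv  <- cv.glmnet(X, y, alpha = 0)
fit <- glmnet(X, y, alpha = 0, lambda = cv$lambda.min)
coef(fit)   # coefficients pulled toward zero: more bias, less variance
```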
If you're willing to put aside (or think differently about) inference on individual model terms, you could first do a principal components analysis, "interpret" your principal components somehow, and then fit your regression to the rotated dataset. Collinearity will be gone, but you're only able to conduct inference on these PCs, which may or may not have a convenient interpretation. In the case of wet cats and malevolent children, the first PC would increase as the probability of wetness and the probability of malevolent children both increased. The second PC would be perpendicular to the first, picking up wetness that occurs as the probability of malevolent children decreases. If you simply wanted to know the effect of wetness absent malevolent children, you'd be interested in the coefficient on the second PC. Most principal components regressions don't have an interpretation this straightforward, however.
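Here is a minimal sketch of that approach on simulated data (dedicated packages such as pls wrap this up more completely):

```r
# Minimal principal-components-regression sketch (simulated data).
set.seed(1)
n <- 200
children <- rbinom(n, 1, 0.5)
wet      <- rbinom(n, 1, ifelse(children == 1, 0.9, 0.1))
happy    <- 5 - 3 * wet - 2 * children + rnorm(n)

pc     <- prcomp(cbind(wet, children), scale. = TRUE)  # rotate to orthogonal components
scores <- pc$x                                         # PC1 and PC2 are uncorrelated by construction

fit <- lm(happy ~ scores)   # regression on the rotated data: no collinearity left
summary(fit)                # inference is now on the PCs, not on wetness or children
pc$rotation                 # loadings: how each PC mixes wetness and children
```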
It is also worth emphasizing that prediction from a model with high collinearity is fine. So if your F-stat is good and you don't care about any of the coefficients individually, leave the model as it is.
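A quick sketch of that point on simulated data: the individual coefficients are noisy, but a prediction at a typical point is not.

```r
# With nearly collinear predictors, the individual coefficients have inflated
# standard errors, but a prediction at a point that respects the collinearity
# (here x2 close to x1) remains precise.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)     # x2 almost identical to x1
y  <- 1 + x1 + x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)
summary(fit)$coefficients                         # large SEs on x1 and x2
predict(fit, data.frame(x1 = 1, x2 = 1),
        interval = "confidence")                  # the fitted value itself is tight
```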
Best Answer
I am not sure there are straightforward answers to these questions, but here is my attempt. Let me start with where the variance inflation factor comes from. Let's say $\hat \beta_j$ is a least squares slope coefficient (I know you ask about logistic regression, but let's use OLS as an example because the same logic extends to other regression models; I will return to some exceptions below) and its estimated sampling variance is
$$\widehat{\text{Var}}(\hat \beta_j)=\frac{1}{1-R^2_j}\times \frac{\hat\sigma^2}{(n-1)s^2_j}$$
where $\hat \sigma^2$ is the estimated error variance, $s^2_j$ is the sample variance of $X_j$, and $R^2_j$ is the squared multiple correlation coefficient from the regression of $X_j$ on the other predictors. The first factor, $\frac{1}{1-R^2_j}$, is called the variance inflation factor because it is the factor by which the (multiple) correlation among the predictors inflates the sampling variance. If $R^2_j$ is 0, this factor is 1 and the variance is not inflated at all. John Fox argues that it is not until $R_j$ approaches 0.9 (that is, $R^2_j$ = 0.81) that the precision of estimation is halved (Fox 2016: 342). Adapting from the figure in his book (p. 343), we can see how $\sqrt{\text{VIF}}$ [1] changes as $R^2_j$ increases:
as $R^2_j$ approaches 1, the VIF approaches infinity and the standard errors blow up. The cut-off value recommended here is a VIF of 4, though Gordon (2015) argues that such rules of thumb should not be applied mechanically.
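If it helps, here is a minimal sketch (simulated data) confirming that the VIF reported by car::vif() is exactly $1/(1-R^2_j)$ from the auxiliary regression:

```r
# Minimal sketch: the VIF from car::vif() equals 1 / (1 - R^2_j), where R^2_j
# comes from regressing X_j on the other predictors (simulated data).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)
x3 <- rnorm(n)
y  <- 1 + x1 + x2 + x3 + rnorm(n)

library(car)
vif(lm(y ~ x1 + x2 + x3))["x1"]               # VIF for x1 as reported by car

R2_1 <- summary(lm(x1 ~ x2 + x3))$r.squared   # R^2_j from the auxiliary regression
1 / (1 - R2_1)                                # the same number, from the formula above

sqrt(1 / (1 - c(0, 0.5, 0.81, 0.95, 0.99)))   # sqrt(VIF) growing without bound as R^2_j -> 1
```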
If you look through the related questions on this site, you will see that how you interpret collinearity also depends on your research objectives and analysis. Moreover, there are alternative methods for diagnosing collinearity (see Peter Flom's answer here). You might want to check those too.
There is one more issue, i.e., the exceptions I mentioned above. The VIF may not be applicable to models where you have dummy regressors constructed from a polytomous categorical variable, or polynomial regressors (Fox, 2016: 357). Fox and Monette (1992) introduced the generalized variance inflation factor (GVIF) for these cases (Fox answered a question on GVIF here). An implementation of the GVIF is available in the R package car; a small sketch follows.
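Here is a minimal GVIF sketch on simulated data: car::vif() switches to the GVIF output automatically when the model contains a multi-level factor.

```r
# Minimal GVIF sketch (simulated data): with a three-level factor in the model,
# car::vif() returns GVIF, Df, and GVIF^(1/(2*Df)) instead of a plain VIF.
set.seed(1)
n   <- 200
x   <- rnorm(n)
grp <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
y   <- 1 + x + ifelse(grp == "b", 1, 0) - ifelse(grp == "c", 1, 0) + rnorm(n)

library(car)
vif(lm(y ~ x + grp))
```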
I guess this is not the case for SPSS. Still, I think you should take this issue into account.

[1]: Fox recommends using $\sqrt{\text{VIF}}$ instead of VIF, because "the precision of estimation of $B_j$ is most naturally expressed as the width of the confidence interval for this parameter, and because the width of the confidence interval is proportional to the standard deviation of $B_j$ (not its variance)" (Fox, 2016: 342).
Fox, John. 2016. Applied Regression Analysis and Generalized Linear Models. 3rd ed. Los Angeles: Sage Publications.
Fox, John and Georges Monette. 1992. “Generalized Collinearity Diagnostics.” Journal of the American Statistical Association 87(417):178–83.
Gordon, Rachel A. 2015. Regression Analysis for the Social Sciences. New York and London: Routledge.