Solved – How to know what the VIF limits should be for collinearity should be when doing binary logistic regression

multicollinearityregressionvariance-inflation-factor

Using SPSS, I have determined that for each of my IVs the VIF is between 1.2 to 5.1.

How do I know what is a good cut off and what is a bad one?
How is this determined?
Is it relative to the number of IVs or something else?

Best Answer

I am not sure there are straightforward answers to these questions, but here is my attempt. Let me start with where the variance inflation factor comes from. Let's say $\hat \beta_j$ is a least squares slope coefficient (I know you ask about logistic regression, but let's use OLS as an example because the application extends to other regression models, I will return to some exceptions) and its estimated sampling variance is

$$\widehat{\text{Var}}(\hat \beta_j)=\frac{1}{1-R^2_j}\times \frac{\hat\sigma^2}{(n-1)s^2_j}$$

where $\hat \sigma^2$ is the estimated error variance, $s^2_j$ is the sample variance of $X_j$, and $R^2_j$ is the squared multiple correlation coefficient for the regression of $X_j$ on the other predictors. The left side of the multiplication, $\frac{1}{1-R^2_j}$, is called variance inflation factor because it is the factor by which variance increases based on the (multiple) correlation between predictors. If $R^2_J$ is 0, this factor is 1 and does not inflate the variance. John Fox argues that it is not until $R_j$ approaches 0.9 (or, $R^2_j$ = 0.81) that the precision of estimation is halved (Fox 2016: 342). Adapting from the figure in his book (p.343), we can see how $\sqrt{\text{VIF}}$ [1] changes as $R^2_j$ increases:

As $R^2_j$ approaches 1, VIF approaches infinity and standard errors will blow up. The cut-off value recommended here is 4 for VIF. But Gordon argues that

A cutoff value of 4 or 10 is sometimes given for regarding a VIF as high. But, it is important to evaluate the consequences of the VIF in the context of the other elements of the standard error, which may offset it (such as sample size...) (Gordon, 2015: 451).

If you check the related questions on the right panel, you will see that how you interpret collinearity depends on your research objectives and analysis as well. Moreover, there are alternative methods to diagnose collinearity (see Peter Flom's answer here). You might want to check those too.

There is one more issue, i.e., the exceptions I mentioned above. The VIF may not be applicable to models where you have dummy regressors constructed from a polytomous categorical variable or polynomial regressors (Fox, 2016: 357). Fox and Monette (1992) introduced generalized variance inflation factor (GVIF) for these cases (Fox answered a question on GVIF here). An implementation of GVIF is avaliable in R package car. I guess this is not the case for SPSS. Still, I think you should take this issue into account.

[1]: Fox recommends using $\sqrt{\text{VIF}}$ instead of VIF, because "the precision of estimation of $B_j$ is most naturally expressed as the width of the confidence interval for this parameter, and because the width of the confidence interval is proportional to the standard deviation of $B_j$ (not its variance)" (Fox, 2016: 342).

Fox, John. 2016. Applied Regression Analysis and Generalized Linear Models. 3rd ed. Los Angeles: Sage Publications.

Fox, John and Georges Monette. 1992. “Generalized Collinearity Diagnostics.” Journal of the American Statistical Association 87(417):178–83.

Gordon, Rachel A. 2015. Regression Analysis for the Social Sciences. New York and London: Routledge.

Best Answer

Related Solutions

Solved – How to interpret VIF and Condition Index for the purpose of assessing reliability of formative measures

Solved – How to solve collinearity problems in OLS regression

Related Question