The fact that additional variables make large changes in your model doesn't mean there is collinearity, nor does it mean the model isn't "robust" (although I guess that depends on what you mean by "robust"). Nor do a reasonable VIF and a high F mean that additional variables won't have an effect. Only completely independent variables will have no effect on the other coefficients, and variables can be a long way from independent without having a high VIF.
It means that controlling for those additional variables makes a large difference in the other relationships.
You haven't said what any of these variables are, so it's hard to be specific. However, let's imagine simpler models:
Model 1: log(income) as effect of race/ethnic group
Model 2: log(income) as effect of race/ethnic group + age
The coefficients in both models will likely all be significant (if it's a reasonable sample size). But the coefficient of race/ethnicity will (I bet) drop in the 2nd model because the average age of different ethnic groups is different, and income tends to be related to age.
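As a rough illustration with simulated data (the variable names and numbers below are made up purely to show the mechanism):

# Simulated illustration: adding a covariate (age) that is correlated with
# group membership changes the group coefficient, even without a high VIF.
set.seed(42)
n     <- 2000
group <- rbinom(n, 1, 0.5)                      # two groups (0/1)
age   <- 40 + 8 * group + rnorm(n, sd = 10)     # groups differ in average age
log_income <- 10 + 0.02 * age + 0.05 * group + rnorm(n, sd = 0.5)

coef(lm(log_income ~ group))        # Model 1: group effect absorbs the age difference
coef(lm(log_income ~ group + age))  # Model 2: group coefficient drops once age is controlled

Here the group-age correlation is only about 0.4, so the VIF in Model 2 is modest, yet the group coefficient changes substantially between the two models.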
EDIT in response to edit in post:
Since you have country-level and company-level variables, you should not use OLS regression, as the data are not independent. You need to account for this. One way is with mixed models (aka multilevel models). I don't know how SPSS does this, but in SAS it would be PROC MIXED, and in R it would be nlme or lme4.
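For example, a minimal lme4 sketch (the data frame and variable names are hypothetical placeholders for your own outcome, company-level predictors, and country identifier):

# Random intercept for country, fixed effects for company-level predictors
library(lme4)
fit <- lmer(outcome ~ firm_size + sector + (1 | country), data = dat)
summary(fit)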
If we look at the function
library(car)
getS3method("vif", "default")
#R function (mod, ...)
#R {
#R v <- vcov(mod)
#R assign <- attr(model.matrix(mod), "assign")
#R [...]
#R terms <- labels(terms(mod))
#R n.terms <- length(terms)
#R [...]
#R R <- cov2cor(v)
#R detR <- det(R)
#R result <- matrix(0, n.terms, 3)
#R rownames(result) <- terms
#R colnames(result) <- c("GVIF", "Df", "GVIF^(1/(2*Df))")
#R for (term in 1:n.terms) {
#R subs <- which(assign == term)
#R result[term, 1] <- det(as.matrix(R[subs, subs])) * det(as.matrix(R[-subs,
#R -subs]))/detR
#R result[term, 2] <- length(subs)
#R }
#R if (all(result[, 2] == 1))
#R result <- result[, 1]
#R else result[, 3] <- result[, 1]^(1/(2 * result[, 2]))
#R result
#R }
then we see that it calls vcov, which differs for a glm versus an lm. In the glm case it depends on the outcome; thus, you get the different results. In the linear model case, all of the above is consistent with the 1992 article
Fox, J., & Monette, G. (1992). Generalized collinearity diagnostics. Journal of the American Statistical Association, 87(417), 178-183.
See particularly Equation (10) and this part of the function:
#R result[term, 1] <- det(as.matrix(R[subs, subs])) * det(as.matrix(R[-subs,
#R -subs]))/detR
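A small simulated check of this point (hypothetical data): fitting an lm and a glm to the same predictors gives different vif() values, because the computation runs through vcov() of the fitted model.

library(car)

set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)                     # correlated predictors
y_cont <- x1 + x2 + rnorm(n)                  # continuous outcome
y_bin  <- rbinom(n, 1, plogis(x1 + x2))       # binary outcome

vif(lm(y_cont ~ x1 + x2))                     # uses vcov() of the linear model
vif(glm(y_bin ~ x1 + x2, family = binomial))  # uses vcov() of the logistic model; values differ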
To the question

Is the variance inflation factor useful for GLM models?

I gather that the results in the 1992 article may still hold asymptotically. However, some pen and paper is likely needed to justify this claim, and I may be wrong.
Best Answer
First, if you are going with one of the usual methods of testing for collinearity, condition indexes are better than variance inflation factors. This was shown by David Belsley in his two books; I also wrote about it in my dissertation.
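For reference, condition indexes can be computed directly from the model matrix in base R (a sketch of Belsley's approach; the full diagnostics also include variance-decomposition proportions, which are omitted here):

# Condition indexes a la Belsley: scale each column of the model matrix
# (including the intercept) to unit length, then take the ratio of the
# largest singular value to each singular value.
condition_indexes <- function(mod) {
  X <- model.matrix(mod)
  X <- scale(X, center = FALSE, scale = sqrt(colSums(X^2)))
  d <- svd(X)$d
  max(d) / d
}

# condition_indexes(fit)   # values above ~30 are the usual rule-of-thumb warning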
Second, I think that the methods used in the perturb package in R are very promising. The idea there is to add small amounts of noise to the data and then see how that affects the results. Not only does this make intuitive sense to me, but it also allows you to test for collinearity in categorical variables and makes no assumptions about the model.
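The perturbation idea can also be sketched by hand (this is a rough illustration of the approach, not the perturb package's own interface; the model, data frame, and variable names are hypothetical):

perturb_coefs <- function(fit, dat, vars, noise_sd = 0.05, niter = 200) {
  # Refit the model 'niter' times after adding small random noise to the
  # chosen numeric predictors, and summarise how much each coefficient moves.
  reps <- replicate(niter, {
    d <- dat
    for (v in vars) d[[v]] <- d[[v]] + rnorm(nrow(d), sd = noise_sd)
    coef(update(fit, data = d))
  })
  t(apply(reps, 1, function(b) c(mean = mean(b), sd = sd(b))))
}

# perturb_coefs(fit, dat, vars = c("x1", "x2"))   # unstable coefficients show large sd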