How to test for and avoid multicollinearity in a mixed linear model

Tags: correlation, lme4-nlme, mixed model, multicollinearity, r

I am currently fitting some linear mixed-effects models.

I am using the package "lme4" in R.

My models take the form:

library(lme4)
model <- lmer(response ~ predictor1 + predictor2 + (1 | group), data = my_data)  # group: grouping factor for the random intercept

Before running my models, I checked for possible multicollinearity between predictors.

I did this as follows.

First, I made a data frame of the predictors:

dummy_df <- data.frame(predictor1, predictor2)

Then I used the "cor" function to calculate the Pearson correlation between each pair of predictors:

correl_dummy_df <- round(cor(dummy_df, use = "pairwise.complete.obs"), 2)

If the correlation between predictor1 and predictor2 in "correl_dummy_df" was greater than 0.80, I decided that they were too highly correlated and did not include them in my models.
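With more than two predictors, the screening step above can be automated by scanning the upper triangle of the correlation matrix. The function name flag_correlated_pairs and the simulated data are hypothetical, for illustration only; the sketch also uses the absolute correlation so that strongly negative pairs are flagged, which is my assumption rather than part of the procedure described above.

```r
# Hypothetical helper: list predictor pairs whose pairwise Pearson
# correlation exceeds a cutoff in absolute value (0.80 by default).
flag_correlated_pairs <- function(df, cutoff = 0.80) {
  r <- cor(df, use = "pairwise.complete.obs")
  # Upper triangle only, so each pair is reported once
  idx <- which(upper.tri(r) & abs(r) > cutoff, arr.ind = TRUE)
  data.frame(
    var1 = rownames(r)[idx[, "row"]],
    var2 = colnames(r)[idx[, "col"]],
    r    = round(r[idx], 2),
    stringsAsFactors = FALSE
  )
}

# Simulated example: predictor2 is predictor1 plus a little noise
set.seed(1)
x1 <- rnorm(100)
dat <- data.frame(predictor1 = x1,
                  predictor2 = x1 + rnorm(100, sd = 0.3),
                  predictor3 = rnorm(100))
flagged <- flag_correlated_pairs(dat)
flagged
```

Only the predictor1/predictor2 pair should be reported here.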

From some reading, there appear to be more objective ways to check for multicollinearity.

Does anyone have any advice on this?

The variance inflation factor (VIF) seems to be one valid method.
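For reference, the usual definition: regress predictor j on all the other predictors and take the R_j^2 of that auxiliary regression. Equivalently (with an intercept in the model), the VIFs are the diagonal elements of the inverse of the predictors' correlation matrix, written \mathbf{C} below.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2} = \left[ \mathbf{C}^{-1} \right]_{jj}
```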

VIF can be calculated with the function "corvif" in the AED package (not on CRAN), available at http://www.highstat.com/book2.htm. The package accompanies the following book:

Zuur, A. F., Ieno, E. N., Walker, N., Saveliev, A. A. & Smith, G. M. 2009. Mixed effects models and extensions in ecology with R, 1st edition. Springer, New York.

A common rule of thumb seems to be that a VIF > 5 indicates high multicollinearity among the predictors.
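If AED is not available, the auxiliary-regression definition of VIF is easy to compute in base R. This is a minimal sketch, not the corvif implementation; the function name vif_aux and the simulated data are hypothetical.

```r
# Minimal base-R sketch of VIF via auxiliary regressions
# (not the AED corvif implementation).
vif_aux <- function(df) {
  df <- df[complete.cases(df), , drop = FALSE]   # complete rows only
  vars <- names(df)
  vapply(vars, function(v) {
    # Regress predictor v on all the other predictors
    fit <- lm(reformulate(setdiff(vars, v), response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

# Simulated example: predictor2 is predictor1 plus a little noise
set.seed(2)
x1 <- rnorm(200)
dat <- data.frame(predictor1 = x1,
                  predictor2 = x1 + rnorm(200, sd = 0.3),
                  predictor3 = rnorm(200))
vifs <- vif_aux(dat)
round(vifs, 2)
names(vifs)[vifs > 5]   # predictors flagged by the VIF > 5 rule of thumb
```

The predictors must have syntactic R names for reformulate() to build the formulas.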

Is using VIF more robust than simple Pearson correlation?
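One concrete difference: pairwise correlations only look at two predictors at a time, while VIF accounts for each predictor's relationship with all the others jointly. In the simulated (hypothetical) data below, predictor3 is almost the sum of predictor1 and predictor2, so no single pairwise correlation reaches 0.80, yet every VIF is large.

```r
# Three predictors where predictor3 is (almost) the sum of the other two
set.seed(3)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
dat <- data.frame(predictor1 = x1,
                  predictor2 = x2,
                  predictor3 = x1 + x2 + rnorm(n, sd = 0.2))

r <- cor(dat)
max(abs(r[upper.tri(r)]))   # largest pairwise |r|, about 0.7: passes a 0.80 screen

vifs <- diag(solve(r))      # VIFs = diagonal of the inverse correlation matrix
round(vifs, 1)              # all well above 5
```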

Update

I found an interesting blog post at:

http://hlplab.wordpress.com/2011/02/24/diagnosing-collinearity-in-lme4/

The blogger provides some useful code to calculate VIF for models fitted with the lme4 package.
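As a rough illustration of one way to get VIF-like values from a fitted model (this is not the blog's code), the sketch below works from the covariance matrix of the fixed-effect estimates returned by vcov(): drop the intercept, convert to a correlation matrix, invert it, and read off the diagonal. The function name vif_from_vcov and the simulated data are hypothetical. It is checked here on an ordinary lm() fit, where it should reproduce the classical 1 / (1 - r^2).

```r
# Sketch (not the blog's code): VIFs computed from the covariance matrix
# of the fixed-effect estimates, i.e. the output of vcov().
vif_from_vcov <- function(v) {
  v <- as.matrix(v)                        # vcov() of an lmer fit is a Matrix object
  keep <- rownames(v) != "(Intercept)"     # drop the intercept row and column
  v <- v[keep, keep, drop = FALSE]
  # Invert the correlation matrix of the estimates; its diagonal gives the VIFs
  diag(solve(cov2cor(v)))
}

# Check on an ordinary lm() fit, where the result should match 1 / (1 - r^2)
set.seed(4)
n <- 300
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)
y <- x1 + x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)
vif_from_vcov(vcov(fit))
1 / (1 - cor(x1, x2)^2)
```

For an lme4 model the call would presumably be vif_from_vcov(vcov(model)). For a mixed model this is based only on the fixed-effect estimates, so I would treat the values as a rough guide rather than an exact analogue of the linear-model VIF.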

I've tested the code and it works well. In my subsequent analysis, I found that multicollinearity was not an issue for my models (all VIF values < 3). This was interesting, given that I had previously found high Pearson correlations between some predictors.
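Part of this may simply be arithmetic. With exactly two predictors, the R^2 from regressing one on the other is the squared Pearson correlation r^2, so a correlation at the 0.80 cutoff corresponds to a fairly modest VIF:

```latex
\mathrm{VIF} = \frac{1}{1 - r^2} = \frac{1}{1 - 0.80^2} = \frac{1}{0.36} \approx 2.78
```

In that two-predictor case, r would have to exceed about 0.82 (r^2 > 2/3) before the VIF passes 3, and about 0.89 (r^2 > 0.8) before it passes 5. With more predictors the relationship is not this simple.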

Best Answer

The "usdm" package can also be used to calculate VIF (it has to be installed first):

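library(usdm)  # install.packages("usdm") first if it is not installed
df <- data.frame(predictor1, predictor2)  # a data frame containing only the predictors
vif(df)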

If VIF > 4.0, I generally assume multicollinearity and remove all those predictor variables before fitting my model.
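One caveat: removing every predictor above the threshold at once can discard more than necessary, because dropping one predictor lowers the VIFs of the predictors it was collinear with. A common alternative (not what the answer above describes) is to drop predictors one at a time, always removing the one with the largest VIF and recomputing. A base-R sketch, with a hypothetical function name vif_stepwise and simulated data:

```r
# Hypothetical helper: drop predictors one at a time, always removing the
# one with the largest VIF, until every remaining VIF is <= threshold.
vif_stepwise <- function(df, threshold = 4) {
  repeat {
    if (ncol(df) < 2) break                       # nothing left to compare
    vifs <- diag(solve(cor(df, use = "complete.obs")))
    if (max(vifs) <= threshold) break
    worst <- colnames(df)[which.max(vifs)]
    message("dropping ", worst, " (VIF = ", round(max(vifs), 2), ")")
    df <- df[, setdiff(names(df), worst), drop = FALSE]
  }
  df
}

# predictor3 is nearly predictor1 + predictor2; predictor4 is independent
set.seed(5)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
dat <- data.frame(predictor1 = x1,
                  predictor2 = x2,
                  predictor3 = x1 + x2 + rnorm(n, sd = 0.2),
                  predictor4 = rnorm(n))
kept <- vif_stepwise(dat, threshold = 4)
names(kept)
```

Here predictors 1 to 3 all start with VIF > 4, but dropping only predictor3 brings the rest back near 1, so predictor1, predictor2 and predictor4 are kept.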
