Solved – Does the VIF make sense for a model with categorical variables

generalized linear model, glmm, multicollinearity, multiple regression, variance-inflation-factor

I'm trying to detect multicollinearity in my model. It has a count response variable, several proportion explanatory variables, and one categorical explanatory variable called Site. In R the model looks like this:

glm(Total ~ percent_flower + Percent_typeofflower + Percent_ohtertypeoflower + Site, 
    family=poisson, data=pollinator)

I want to know whether I should calculate vif() without Site, since it doesn't seem to make sense to calculate a correlation between a numerical variable and a categorical one.

Also, if I have to keep Site in the model to calculate vif(), will running a GLMM instead of a GLM take care of the collinearity issue?

Best Answer

If you want to test for collinearity in R and have categorical as well as continuous variables, my recommendation is to use the perturb package. The idea is to add small amounts of random noise to the variables (uniform or normal noise for the continuous ones, and reassigning a few observations to other categories for the categorical ones) and then see what happens to the coefficients. Coefficients that change a lot under such small perturbations point to a collinearity problem.
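
To make the perturbation idea concrete, here is a rough sketch in base R. It does not call the perturb package; it hand-rolls the same idea on simulated data, so the data frame, the variable names (x1, x2, site, total), the noise size, and the 5% reassignment rate are all assumptions chosen for illustration, not values from the question.

```r
# Hand-rolled perturbation check in base R (a sketch, not the perturb package itself).
set.seed(1)

# Simulated stand-in for the pollinator data; all names and values are hypothetical.
n    <- 200
site <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
x1   <- runif(n, 0, 100)                  # e.g. a percent-cover variable
x2   <- 0.9 * x1 + rnorm(n, sd = 5)       # deliberately collinear with x1
total <- rpois(n, exp(0.5 + 0.01 * x1 + 0.005 * x2 + 0.2 * (site == "B")))
dat  <- data.frame(total, x1, x2, site)

perturb_once <- function(d, noise_sd = 1, p_switch = 0.05) {
  # Add small normal noise to the continuous predictors.
  d$x1 <- d$x1 + rnorm(nrow(d), sd = noise_sd)
  d$x2 <- d$x2 + rnorm(nrow(d), sd = noise_sd)
  # Reassign a small share of observations to a randomly drawn site level.
  flip <- runif(nrow(d)) < p_switch
  d$site[flip] <- sample(levels(d$site), sum(flip), replace = TRUE)
  # Refit the Poisson GLM on the perturbed data and return its coefficients.
  coef(glm(total ~ x1 + x2 + site, family = poisson, data = d))
}

# Refit on many perturbed copies and summarise how much each coefficient moves.
coefs  <- t(replicate(200, perturb_once(dat)))
spread <- apply(coefs, 2, sd)
print(round(spread, 4))
```

A coefficient whose spread across the perturbed fits is large relative to its estimate from the original data is sensitive to small changes in the predictors, which is the symptom of collinearity this approach looks for. Site is handled in the same loop as the continuous variables, so it does not need to be dropped.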