I'm trying to detect multicollinearity in my model. It has a count response variable, several proportional explanatory variables, and one categorical explanatory variable called Site. In R the model looks like this:
glm(Total ~ percent_flower + Percent_typeofflower + Percent_ohtertypeoflower + Site,
family=poisson, data=pollinator)
I want to know whether I should calculate the vif() without Site, since it doesn't seem to make sense to calculate a correlation between a numerical variable and a categorical one.
Also, if I have to keep Site in to calculate the vif(), will running a GLMM instead of a GLM take care of the collinearity issue?
Best Answer
If you want to test for collinearity, have categorical as well as continuous variables, and are using R, my recommendation is to use the perturb package. The idea is to add small amounts of random noise to the variables (uniform or normal noise for the continuous ones, and reassigning a small share of cases to a different category for the categorical ones), refit the model many times, and see how much the coefficients change. Coefficients that swing widely under such small perturbations point to a collinearity problem.
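To make the perturbation idea concrete, here is a minimal hand-rolled sketch in base R. It does not use the perturb package's own functions; it only imitates the principle. The data are simulated stand-ins for the pollinator data, and the helper perturb_once(), the noise size noise_sd, and the reassignment share p_move are all hypothetical choices made for illustration.

```r
# Simulated stand-in for the pollinator data (all values are made up).
set.seed(1)
n <- 120
pollinator <- data.frame(
  percent_flower = runif(n, 0, 100),
  Site = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
# A second predictor deliberately correlated with the first
# (purely illustrative, so it is not clipped to the 0-100 range).
pollinator$Percent_typeofflower <-
  0.8 * pollinator$percent_flower + rnorm(n, sd = 5)
pollinator$Total <- rpois(n, lambda = exp(1 + 0.01 * pollinator$percent_flower))

form <- Total ~ percent_flower + Percent_typeofflower + Site
fit <- glm(form, family = poisson, data = pollinator)

# Hypothetical helper: return one perturbed copy of the data.
perturb_once <- function(dat, noise_sd = 1, p_move = 0.05) {
  # Continuous predictors: add small normal noise.
  for (v in c("percent_flower", "Percent_typeofflower")) {
    dat[[v]] <- dat[[v]] + rnorm(nrow(dat), sd = noise_sd)
  }
  # Categorical predictor: give a small share of rows a randomly drawn level
  # (which may occasionally be the level they already had).
  move <- runif(nrow(dat)) < p_move
  dat$Site[move] <- sample(levels(dat$Site), sum(move), replace = TRUE)
  dat
}

# Refit the model on 200 perturbed copies and collect the coefficients.
coefs <- t(replicate(
  200,
  coef(glm(form, family = poisson, data = perturb_once(pollinator)))
))

# Compare the original estimates with their spread under perturbation.
round(cbind(original = coef(fit),
            mean     = colMeans(coefs),
            sd       = apply(coefs, 2, sd)), 4)
```

A standard deviation that is large relative to the original estimate suggests that coefficient is unstable. Because Site is perturbed along with the continuous variables, there is no need to drop it from the check.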