Solved – lme4: fixed-effect model matrix is rank deficient so dropping 1 column / coefficient

Tags: lme4, nlme, logistic, r

I have a dataset consisting of the following: one column, Language, containing five different languages; two columns, Canonicity and Intrinsic, each coded 0 or 1; and one last column, useOfIntrinsic. You can view the data here.

I would like to test the use of intrinsic as a function of Language, Canonicity and useOfIntrinsic. Thus, I ran the following mixed-effects logistic regression model:

glmer(INT ~ Language * Canonicity + useOfIntrinsic +
      (1|Picture) + (1|ID), data = data, family = "binomial")

I also tried:

glmer(INT ~ Language + Canonicity + useOfIntrinsic:Language + Canonicity:CAN +
      useOfIntrinsic + (1|Picture) + (1|ID), data = data, family = "binomial")

However, I get this error:

fixed-effect model matrix is rank deficient so dropping 1 column / coefficient

I do not get the error when I exclude the useOfIntrinsic factor. This factor is basically the count of intrinsic == 1 for each Language. I added this factor in order to test whether overall use of intrinsic is a good predictor of intrinsic.

There are other posts that talk about this error (e.g. What is rank deficiency, and how to deal with it?) but I am still unable to fix it.

Another related question is whether I should reduce the significance level when running the same model 5 times (in order to change the reference language group).

Best Answer

In the data you link to, Language and useOfIntrinsic encode exactly the same information. Think about it this way: Language gives the model the flexibility to estimate the mean for each language independently. Once that has been done, there is no among-language variation left over with which to estimate the effect of useOfIntrinsic. Or think about it this way: imagine that the effect of useOfIntrinsic is absolutely anything you like. The model cannot tell whether you are right or wrong, because whatever predictions useOfIntrinsic makes for each language, the Language effects can simply offset those predictions to give the correct group mean. So there is no way to estimate the useOfIntrinsic effect when Language is also in the model.
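You can see this collinearity directly in the rank of the model matrix. The sketch below (plain Python, with made-up per-language counts, not the poster's data) builds the fixed-effect columns for a five-level factor plus a covariate that is constant within each level, and shows that the extra column adds nothing to the rank:

```python
# Illustration only: five languages, hypothetical counts of intrinsic == 1
# per language. Because useOfIntrinsic is a function of Language alone, the
# intercept + four Language dummies already span everything it could add.

def matrix_rank(rows, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    rank = 0
    n_rows, n_cols = len(m), len(m[0])
    for col in range(n_cols):
        # find the largest remaining pivot in this column
        candidates = range(rank, n_rows)
        pivot = max(candidates, key=lambda r: abs(m[r][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        pv = m[rank][col]
        for r in range(n_rows):
            if r != rank:
                f = m[r][col] / pv
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank

counts = {"A": 12, "B": 7, "C": 30, "D": 5, "E": 19}  # made-up numbers
langs = list(counts)

# Model matrix, one row per language:
# intercept + 4 dummy columns (reference level "A") + useOfIntrinsic
X = []
for g in langs:
    dummies = [1.0 if g == h else 0.0 for h in langs[1:]]
    X.append([1.0] + dummies + [float(counts[g])])

print(len(X[0]))       # 6 columns ...
print(matrix_rank(X))  # ... but rank 5: one column is redundant
```

This is exactly the situation lme4 detects before dropping a column: six fixed-effect columns, but only five linearly independent ones.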

One final way to think about it: you can think of the model you are trying to fit as asking for an estimate of the effect of useOfIntrinsic while controlling for the effect of Language. But once you have controlled for Language, you have already completely accounted for the between-language differences that you might want to attribute to useOfIntrinsic. To put both variables in a single model, you either need some independent variation in the two variables (i.e. some variation in useOfIntrinsic within a single language), or you need to place additional constraints on how the effect of Language is estimated. One possibility would be to experiment with estimating Language as a random effect, but I don't necessarily recommend this given that you have only five languages in the sample.

You do not need to apply any correction for changing which language is the reference group. This is not a situation where you are estimating five different models; it is five different parameterizations of the exact same model. You are looking at the same results in five different ways, and they will be identical each time, up to the constants involved in the reparameterization.
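A quick way to convince yourself of this invariance (a Python sketch with made-up data, using a plain linear model for simplicity; the same point holds on the logit scale): refit a saturated one-factor model with each language as the reference in turn. The coefficients change every time, but the fitted values never do.

```python
# Hypothetical 0/1 responses per language (not the poster's data).
from fractions import Fraction

data = {"A": [0, 1, 1], "B": [1, 1, 1, 0], "C": [0, 0, 1],
        "D": [1, 0], "E": [1, 1, 0, 0]}

def mean(xs):
    return Fraction(sum(xs), len(xs))  # exact arithmetic, no float noise

def fitted_values(data, reference):
    """Closed-form OLS for a one-factor model under treatment (reference) coding."""
    intercept = mean(data[reference])        # mean of the reference group
    coefs = {g: mean(v) - intercept          # each other group's offset from it
             for g, v in data.items() if g != reference}
    return {g: intercept + coefs.get(g, Fraction(0)) for g in data}

fits = [fitted_values(data, ref) for ref in data]
# The intercept and coefficients differ across the five parameterizations,
# but the fitted values are identical every time:
assert all(f == fits[0] for f in fits)
print(fits[0]["C"])  # 1/3 under every choice of reference group
```

Each choice of reference just relabels which group the intercept describes; the model's predictions, and hence every test of the overall fit, are unchanged.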