I have a dataset with the following columns: Language, which contains five different languages; Canonicity and Intrinsic, each containing either 0 or 1; and one last column, useOfIntrinsic. You can view the data here.
I would like to test the use of intrinsic as a function of Language, Canonicity and useOfIntrinsic. Thus, I ran the following mixed-effects logistic regression model:
glmer(INT ~ Language * Canonicity + Language + Canonicity + useOfIntrinsic +
(1|Picture) + (1|ID), data = data, family = "binomial")
I also tried:
glmer(INT ~ Language + Canonicity + useOfIntrinsic:Language + Canonicity:CAN +
useOfIntrinsic + (1|Picture) + (1|ID), data = data, family = "binomial")
However, I get this error:
fixed-effect model matrix is rank deficient so dropping 1 column / coefficient
I do not get the error when I exclude the useOfIntrinsic factor. This factor is basically the count of intrinsic==1 for each Language. I added it in order to test whether overall use of intrinsic is a good predictor of intrinsic.
There are other posts that discuss this error (e.g. What is rank deficiency, and how to deal with it?), but I am still unable to fix it.
A related question is whether I should reduce the significance level when running the same model 5 times (in order to change the reference language group).
Best Answer
In the data you link to, Language and useOfIntrinsic encode the exact same information. Think about it this way: Language gives the model the flexibility to estimate the mean for each language independently. Once this has been done, there is no additional among-language variation left over with which to estimate the effect of useOfIntrinsic. Or think about it this way: imagine that the effect of useOfIntrinsic is absolutely anything you'd like. The model can't know whether you're right or wrong, because whatever predictions it makes about each language based on useOfIntrinsic, it can just use the effect of Language to offset those predictions and give the correct group mean. So there's no way to estimate the useOfIntrinsic effect when Language is also in the model.
One final way to think about it: the model you are trying to fit asks for an estimate of the effect of useOfIntrinsic while controlling for the effect of Language. But once you've controlled for the effect of Language, you've already completely dealt with the differences between languages that you might want to attribute to useOfIntrinsic. To put both variables in a single model, you either need some independent variation in the two variables (i.e. some variation in useOfIntrinsic within a single language), or you need to place some additional constraints on how you estimate the effect of Language. One possibility would be to experiment with estimating Language as a random effect, but I don't necessarily recommend this given that you only have five languages in the sample.
You do not need to apply any correction for changing which language is the reference group. This is not a situation where you are estimating five different models; these are just five different parameterizations of the exact same model. You are looking at the same results in five different ways. The results will be exactly the same each time, up to the appropriate constants involved in the reparameterization.
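Both points can be checked on a small synthetic dataset. The sketch below is only an illustration under stated assumptions: the data are simulated rather than taken from the linked file, the language labels A to E are made up, and the random effects are left out (plain glm instead of glmer) because the rank deficiency arises in the fixed-effect design matrix alone. The first part compares the number of columns of that matrix with its rank; the second refits the same model with a different reference language and compares the fits.

```r
# Simulated stand-in for the real data; every value here is made up.
set.seed(1)
d <- expand.grid(Language = c("A", "B", "C", "D", "E"),
                 Canonicity = c(0, 1),
                 rep = 1:20)
d$Language <- factor(d$Language)
d$Intrinsic <- rbinom(nrow(d), 1, 0.4)

# useOfIntrinsic as described in the question: the count of Intrinsic == 1
# per language, so it is constant within each language.
counts <- tapply(d$Intrinsic, d$Language, sum)
d$useOfIntrinsic <- as.numeric(counts[as.character(d$Language)])

# Part 1: fixed-effect design matrix containing both Language and useOfIntrinsic.
X <- model.matrix(~ Language * Canonicity + useOfIntrinsic, data = d)
n_coef <- ncol(X)      # coefficients requested (11 here)
n_rank <- qr(X)$rank   # coefficients that can be estimated (10 here)
cat("columns:", n_coef, " rank:", n_rank, "\n")

# Part 2: same model, two different reference languages.
fit_a <- glm(Intrinsic ~ Language * Canonicity, data = d, family = binomial)
d$LanguageC <- relevel(d$Language, ref = "C")
fit_c <- glm(Intrinsic ~ LanguageC * Canonicity, data = d, family = binomial)

# The coefficients differ, but the fitted probabilities and the deviance agree
# up to numerical tolerance.
same_fitted <- isTRUE(all.equal(unname(fitted(fit_a)), unname(fitted(fit_c)),
                                tolerance = 1e-6))
same_deviance <- isTRUE(all.equal(deviance(fit_a), deviance(fit_c),
                                  tolerance = 1e-6))
cat("same fitted values:", same_fitted, " same deviance:", same_deviance, "\n")
```

The rank is one less than the number of columns because useOfIntrinsic is a linear combination of the intercept and the Language dummies, which is exactly the column that gets dropped. With glmer the same equivalence across reference levels should hold, although the agreement may only be as close as the optimizer's convergence tolerance.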