Solved – Multilevel: Can I include two dummy variables of a 7-dummy set into a random slope

categorical data, multilevel-analysis, randomness

I am fitting a two-level linear multilevel model. A look at the random-intercept, random-slope model showed a significant decrease in model deviance when I include two dummy variables in the random part. They come from a set of six dummy variables in my multilevel model (originally a categorical variable with seven categories). The other dummies aren't significant (I use the Wald statistic to judge that).

Can I put just those two dummy variables into the random part of my mixed model, or do I have to use all six dummy variables even though the other four aren't significant? Or do I run into problems with, e.g., the reference group of the six dummies or the interpretation of the two dummies?

I googled and looked into several books, but I could not find an answer to this.


My entities are countries. My dependent variable is a Likert scale. My independent variables on the individual level are mostly dummies, e.g. education is split into six dummy categories (with one category as the reference group). The metric variables are grand-mean centered. On the macro level I have four dummy variables (and a reference group) and two metric grand-mean-centered variables.
I calculated the empty model with just my dependent variable. Then I put everything into the fixed part, and after that I used the Wald statistic and the likelihood-ratio test to decide which variables I should let vary across countries. My first two education categories and one metric variable were significant, and I am unsure whether I can let just two categories be random.

Does this help? I certainly can show some code but I do not know which part would be helpful.

Best Answer

Yes, you can. A dummy variable is no different, mathematically, from any other fixed effect you might choose to include. Presumably, if some of the categories all have the same impact on the response, it would make sense to zero them out and push their effect into the intercept term.
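To make this concrete, here is a sketch of such a model for individual $i$ in country $j$. The notation is an assumption for illustration: the six dummies are labeled $D_1, \dots, D_6$, the two with random slopes are taken to be $D_1$ and $D_2$, and all other covariates are omitted.

```latex
y_{ij} = \beta_0 + \sum_{k=1}^{6} \beta_k D_{kij}
         + u_{0j} + u_{1j} D_{1ij} + u_{2j} D_{2ij} + e_{ij},
\qquad
(u_{0j}, u_{1j}, u_{2j})^\top \sim N(0, \Sigma_u),
\quad
e_{ij} \sim N(0, \sigma_e^2)
```

All six dummies stay in the fixed part, so the reference category is unchanged. Only the contrasts of categories 1 and 2 against the reference are allowed to vary across countries; the other four contrasts are held constant.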

That said, using Wald statistics to cull variables is risky. You may get the right set of variables, but this isn't necessarily the case.

When you say "random slope", are you talking about the coefficient of the categorical variable? If so, I would do some model checking. Look at the estimated random effects and see whether they are absorbing small but real differences between your categories.

To clarify that last point: suppose I have four categories: A, B, C and D. I decide to omit the dummies associated with C and D. The intercept in the model now corresponds to the case where category C or D occurs. It's as if I had recoded the variable to A, B and Other. But let's suppose that the effects of C and D are real and different, just fairly small.
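The recoding described above can be checked numerically. This is a minimal sketch with invented numbers (the data and the helper names `recode` and `cell_means` are assumptions for illustration). It relies on the fact that a least-squares fit on an intercept plus dummies for A and B reproduces the cell means of the recoded variable, so the intercept is the pooled mean of all C and D observations.

```python
from statistics import mean

# Hypothetical toy data (invented for illustration): (category, response).
data = [
    ("A", 5.0), ("A", 5.2), ("A", 4.8),
    ("B", 3.0), ("B", 3.1), ("B", 2.9),
    ("C", 1.2), ("C", 1.0), ("C", 1.4),
    ("D", 0.8), ("D", 1.0), ("D", 0.6),
]

def recode(category, kept=("A", "B")):
    """Collapse every category that has no dummy of its own into 'Other'."""
    return category if category in kept else "Other"

def cell_means(rows):
    """Mean response per label, for an iterable of (label, response) pairs."""
    groups = {}
    for label, y in rows:
        groups.setdefault(label, []).append(y)
    return {label: mean(ys) for label, ys in groups.items()}

# With dummies only for A and B, the fitted values are the cell means of the
# recoded variable, so the coefficients follow directly from those means.
collapsed = cell_means((recode(c), y) for c, y in data)
intercept = collapsed["Other"]          # pooled mean of the C and D rows
coef_A = collapsed["A"] - intercept     # A versus the pooled reference
coef_B = collapsed["B"] - intercept     # B versus the pooled reference

full = cell_means(data)                 # per-category means that pooling hides
print(f"intercept (Other) = {intercept:.2f}")
print(f"coef_A = {coef_A:.2f}, coef_B = {coef_B:.2f}")
print(f"true C mean = {full['C']:.2f}, true D mean = {full['D']:.2f}")
```

The gap between the C and D means is invisible to the collapsed model: both categories are fitted with the same intercept.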

Now fit the random effects model. You will get random intercepts for individuals coded "Other" ... but if you plot these against the true categories (C and D), you might find that the C effects are large and the D effects are small (say), or vice versa. When you add random effects, you are giving a bunch of extra parameters to your model, which it could use to cover up for defects of the model itself.
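A rough way to see this mechanism is sketched below with invented numbers. The assumptions: the toy data contain no true country effect, countries differ only in how many C versus D individuals they have, and the per-country mean residual is used as a crude stand-in for an estimated random intercept (a real mixed model would shrink these values, so treat this as an illustration, not a fitting procedure).

```python
from statistics import mean

# Hypothetical category effects and country compositions (invented).
# By construction there is NO true country effect in this toy data.
effects = {"A": 5.0, "B": 3.0, "C": 1.4, "D": 0.6}
composition = {
    "X": ["A", "B", "C", "C", "C", "D"],   # "Other" is mostly C here
    "Y": ["A", "B", "C", "D", "D", "D"],   # "Other" is mostly D here
}
rows = [(country, cat, effects[cat])
        for country, cats in composition.items() for cat in cats]

def recode(cat, kept=("A", "B")):
    """Collapse categories without their own dummy into 'Other'."""
    return cat if cat in kept else "Other"

def group_mean(pairs):
    """Mean value per key, for an iterable of (key, value) pairs."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return {key: mean(values) for key, values in groups.items()}

# Fixed part of the collapsed model: cell means of the recoded variable.
fitted = group_mean((recode(cat), y) for _, cat, y in rows)
residuals = [(country, cat, y - fitted[recode(cat)]) for country, cat, y in rows]

# Crude stand-in for random intercepts: mean residual per country.
by_country = group_mean((country, r) for country, _, r in residuals)

# The diagnostic: residuals of "Other" rows split by their true category.
by_true_cat = group_mean(
    (cat, r) for _, cat, r in residuals if recode(cat) == "Other"
)

print("mean residual by country:", by_country)
print("mean residual by true category:", by_true_cat)
```

In this toy setup the countries show apparent intercept differences that come entirely from the omitted C versus D distinction, which is the pattern to look for when plotting estimated random effects against the original categories.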