Solved – Multilevel: Can I include two dummy variables of a 7-dummy-set into a random slope

categorical data, multilevel-analysis, randomness

I am fitting a two-level linear multilevel model. The random-intercept, random-slope model shows a significant decrease in deviance when I include two dummy variables in the random part. They come from a set of six dummy variables in my multilevel model (originally one categorical variable with seven categories). The other dummies aren't significant (I use the Wald statistic to judge that).

Can I put just those two dummy variables into the random part of my mixed model, or do I have to use all six dummy variables even though the other four aren't significant? Or do I run into problems with, e.g., the reference group of the six dummies, or with the interpretation of the two dummies?

I googled and looked into several books, but I could not find an answer to this.

My entities are countries. My dependent variable is a Likert scale. My independent variables on the individual level are mostly dummies, e.g. education is split into six dummy categories (with one category as the reference group). The metric variables are grand-mean centered. On the macro level I have four dummy variables (and a reference group) and two metric grand-mean-centered variables.
I estimated the empty model with just my dependent variable. Then I put everything into the fixed part, and after that I used the Wald statistic and the LR test to check which variables I should let vary across countries. My first two education categories and one metric variable were significant, and I am unsure whether I can let just two categories be random.

Does this help? I certainly can show some code but I do not know which part would be helpful.

Best Answer

Yes, you can. A dummy variable is no different, mathematically, from any other fixed effect you might choose to include. Presumably, if some of the categories all have the same impact on the response, it would make sense to zero them out and push their effect into the intercept term.

That said, using Wald statistics to cull variables is risky. You may get the right set of variables, but this isn't necessarily the case.
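For what it's worth, here is a minimal lme4 sketch of putting only two of the six dummies into the random part and comparing that model with the random-intercept model via a likelihood-ratio test rather than Wald statistics. Everything in it is made up for illustration: the simulated data, the names (country, edu, edu2, edu3, y) and the effect sizes are assumptions, not the poster's data. In such a model, countries vary only in the e2-vs-reference and e3-vs-reference contrasts; the remaining categories just share the country's random intercept.

```r
library(lme4)

set.seed(1)
n_country <- 30   # hypothetical number of countries
n_per     <- 70   # hypothetical respondents per country

dat <- data.frame(
  country = factor(rep(seq_len(n_country), each = n_per)),
  edu     = factor(sample(paste0("e", 1:7), n_country * n_per, replace = TRUE))
)

# Explicit 0/1 dummies for the two categories that get a random slope.
# "e1" is the reference category of the factor.
dat$edu2 <- as.numeric(dat$edu == "e2")
dat$edu3 <- as.numeric(dat$edu == "e3")

# Simulated country-level deviations (assumed values, illustration only).
u0 <- rnorm(n_country, sd = 1.0)   # random intercepts
u2 <- rnorm(n_country, sd = 0.8)   # random slopes for edu2
u3 <- rnorm(n_country, sd = 0.8)   # random slopes for edu3
beta <- c(e1 = 0, e2 = 0.5, e3 = 1, e4 = 1, e5 = 1, e6 = 1, e7 = 1)
ci <- as.integer(dat$country)

dat$y <- 3 + unname(beta[as.character(dat$edu)]) +
  u0[ci] + u2[ci] * dat$edu2 + u3[ci] * dat$edu3 +
  rnorm(nrow(dat))

# All six dummies stay in the fixed part; only two enter the random part.
m0 <- lmer(y ~ edu + (1 | country), data = dat, REML = FALSE)
m1 <- lmer(y ~ edu + (1 + edu2 + edu3 | country), data = dat, REML = FALSE)

anova(m0, m1)   # likelihood-ratio comparison of the two random structures
VarCorr(m1)     # variances for intercept, edu2 and edu3 only
```

Both models are fitted with ML here (REML = FALSE) so that the deviances are directly comparable.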

When you say "random slope", are you talking about the coefficient of the categorical variable? If so, I would do some model checking. Look at the estimated random effects and see if they are trying to cover small but real differences in your categories.

To clarify that last point: suppose I have 4 categories: A, B, C and D. I decide to omit the dummies associated with C and D. The intercept in the model now corresponds to the case where category C or D occurs. It's like I'm recoding to A, B and Other. But let's suppose that the C and D effects really are different from each other, just fairly small.

Now fit the random effects model. You will get random intercepts for individuals coded "Other" ... but if you plot these against the true categories (C and D), you might find that the C effects are large and the D effects are small (say), or vice versa. When you add random effects, you are giving a bunch of extra parameters to your model, which it could use to cover up for defects of the model itself.
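The recoding effect can be sketched in base R with a toy example. This is only an illustration with invented numbers, and it uses ordinary lm residuals as a stand-in for the estimated random effects of a mixed model: the intercept becomes the pooled mean of C and D, and the leftover C/D difference shows up when the residuals are grouped by the true category.

```r
# Toy data: four categories with (assumed) true means 10, 8, 5 and 4.
g <- factor(rep(c("A", "B", "C", "D"), each = 5))
true_means <- c(A = 10, B = 8, C = 5, D = 4)
noise <- rep(c(-1, -0.5, 0, 0.5, 1), times = 4)   # sums to zero within each group
y <- unname(true_means[as.character(g)]) + noise

# Keep only the dummies for A and B; C and D collapse into the reference.
dA <- as.numeric(g == "A")
dB <- as.numeric(g == "B")
fit <- lm(y ~ dA + dB)

coef(fit)                     # intercept is the pooled mean of C and D

# Model check: average residual by the *true* category.
resid_by_cat <- tapply(resid(fit), g, mean)
resid_by_cat                  # about 0 for A and B, +0.5 for C, -0.5 for D
```

In a mixed model, that systematic +0.5 / -0.5 pattern is exactly the kind of structure the random effects could end up absorbing.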

Related Solutions

Solved – Multilevel logistic regression with a random slope(s)

Do you mean that you are turning age, number of drugs, etc. into random effects? When you do that, you are assuming that the impact of age, drugs, comorbidities, and so on, differs from hospital to hospital. And in some way, this impact is distinct from the basic "hospital" effect that you have already included. It doesn't surprise me that the model does not converge, since you are going to have a lot of parameters here, not all of which may be needed by the model. Useless parameters do not change the value of the likelihood much, so the optimizer could end up roaming around in a trough somewhere, not sure where to go next.

Furthermore, if a particular effect is not random, or its true variance is extremely small, the optimizer will be driven towards the boundary of the parameter space (since a variance can't be negative). It can then stop at a point where the gradient is non-zero and report a non-convergence error.

This is not strange. If slopes are needed in the model, but you omit them, the intercept parameter will have to cover the differences between hospitals by itself. This will force the intercept parameter to cover a wider range of values than it would otherwise need to, and hence it will have a larger variance.

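[Figure: green lines show the true model (common intercept, different slopes); red lines show a fit with different intercepts and zero slope]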

Pretend that your data look like the figure. The true model is shown by the green lines: common intercept, but different slopes. However, if you fit a model with different intercepts and 0 slope, you will get something like the red lines, as the intercept struggles to carry the full burden of the variation in the data all by itself. That's why the variance increases when the slopes are omitted.
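The same picture can be reproduced numerically. The sketch below is a simplified base-R illustration with invented, noise-free data, and it uses separate per-hospital fits instead of a real mixed model: with slopes in the model the intercepts do not vary at all, while forcing the slope to zero makes the per-hospital intercepts spread out.

```r
# Noise-free toy data: every hospital has intercept 2 but its own slope.
slopes <- c(0.5, 1.0, 1.5, 2.0)     # hypothetical hospital-specific slopes
dat <- expand.grid(x = 0:10, hospital = seq_along(slopes))
dat$y <- 2 + slopes[dat$hospital] * dat$x

# Intercept AND slope estimated per hospital (the "green lines").
int_with_slopes <- sapply(
  split(dat, dat$hospital),
  function(d) unname(coef(lm(y ~ x, data = d))[1])
)

# Slope forced to zero: each intercept is just the hospital mean (the "red lines").
int_without_slopes <- tapply(dat$y, dat$hospital, mean)

var(int_with_slopes)      # essentially zero
var(int_without_slopes)   # clearly positive
```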

And by the way ... welcome to the site and good luck with your analysis.

Solved – How to interpret random intercept “BLUP” from the sjPlot package

See ranef(mymodel3), fixef(mymodel3), and coef(mymodel3). The random effects are the deviation from "global average" (i.e. the fixed effects), so when you sum up ranef + fixef you get coef.

Here's an example:

library(lme4)
fm1 <- lmer(Reaction ~ Days + (1 + Days | Subject), sleepstudy)

ranef(fm1)
#> $Subject
#>     (Intercept)        Days
#> 308   2.2585654   9.1989719
#> 309 -40.3985770  -8.6197032
#> 310 -38.9602459  -5.4488799
#> 330  23.6904985  -4.8143313
# ... truncated

fixef(fm1)
#> (Intercept)        Days 
#>   251.40510    10.46729

coef(fm1)
#> $Subject
#>     (Intercept)       Days
#> 308    253.6637 19.6662579
#> 309    211.0065  1.8475828
#> 310    212.4449  5.0184061
#> 330    275.0956  5.6529547
# ... truncated

E.g. Subject 310: -38.9602459 (ranef) + 251.40510 (fixef) = 212.4449 (coef).
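As a quick check, the sum can be redone for all four printed subjects at once. This small base-R sketch only retypes the rounded values from the output above (the object names fixed, re and implied are made up here), so the last digits of the Days column can differ slightly from coef(fm1).

```r
# Values copied from the fixef() and ranef() output above.
fixed <- c("(Intercept)" = 251.40510, Days = 10.46729)
re <- rbind(
  "308" = c(  2.2585654,  9.1989719),
  "309" = c(-40.3985770, -8.6197032),
  "310" = c(-38.9602459, -5.4488799),
  "330" = c( 23.6904985, -4.8143313)
)
colnames(re) <- names(fixed)

# Add each fixed effect to the matching column of random effects.
implied <- sweep(re, 2, fixed, "+")
round(implied, 4)   # compare with the coef(fm1) rows for the same subjects
```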
