Solved – Model specification for glmer (lme4) with varying slope

lme4-nlme, mixed model

I am estimating a mixed model using lme4. I need to have varying intercept terms (because I am post-stratifying my results to census categories).

However, I also want to add a random slope term for the effect of income (allowing it to vary by state). This is the model that I estimate:

dependent.var ~ (1 | state) + (1 | race) + (1 | female) + (1 | age) + (1 | edu) + (1 | income) + (1 | region) + (1 + income | state) + percent.dem.vote + state.avg.income

Question: Is this specification appropriate, given that I include both (1 | income) and (1 + income | state)?

I understand it is calculating a random intercept for income, and it is also calculating random intercepts and slopes for income for each state. I am not interested in evaluating the coefficients on the variables, but rather that the model is able to produce a valid prediction of the dependent variable for different combinations of the demographics in the model.

Is there bias in estimation for a multilevel model if both a random intercept and a random slope/intercept term are included in the model?

Best Answer

To answer this question, it's important to note why random effects are needed in the first place. If you have repeated measurements on the same individuals, or if the observations are clustered in some other sense (students sampled from a few different schools, trees from the same areas, etc.), your observations are not independent of each other. OLS regression assumes independent observations, so it ignores this dependency, which can lead to misleading standard errors and, in some settings, biased estimates.

So in order to handle this, we either need to specify a correct correlation matrix for the error term (as opposed to assuming that it is iid normal), or we need to model the correlation through use of random intercepts and slopes.

Typically, a mixed effects model would be specified in the following way:

$Y_{it} = \beta_0 + b_{0i} + \beta_1 x_{1it} + b_{1i} x_{1it} + \beta_2 x_{2it} + b_{2i} x_{2it} + \varepsilon_{it}$

where $Y_{it}$ is the outcome for individual $i$ at measurement occasion $t$, and $x_{1it}$ and $x_{2it}$ are individual $i$'s covariates at occasion $t$. $\beta_0, \beta_1$ and $\beta_2$ are the (fixed) intercept and slopes, and the random effects are $b_{0i}, b_{1i}$ and $b_{2i}$, which are assumed to be multivariate normal and independent of the error term $\varepsilon_{it}$ as well as of the covariates.

Now, if the only thing that makes the observations correlated over time is that some individuals have a higher starting point than others (a higher intercept), then a random intercept model is sufficient. However, if individuals also respond differently to the covariates, we need random slopes as well. So the choice of which random effects to include depends on your assumptions about the data. That said, there is an argument for specifying the maximal random effects structure either way (Barr et al., 2013), since the cost of having too many random effects is often lower than the cost of missing one.
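To make the random intercept case concrete, here is a short worked derivation. It drops the random slopes from the model above and assumes, in addition, that the errors $\varepsilon_{it}$ are independent across occasions. The symbols $\sigma_b^2 = \mathrm{Var}(b_{0i})$ and $\sigma^2 = \mathrm{Var}(\varepsilon_{it})$ are introduced here for illustration and do not appear in the original answer.

$Y_{it} = \beta_0 + b_{0i} + \beta_1 x_{1it} + \beta_2 x_{2it} + \varepsilon_{it}$

Conditional on the covariates, two observations from the same individual at occasions $t \neq s$ share only $b_{0i}$, so

$\mathrm{Cov}(Y_{it}, Y_{is}) = \sigma_b^2, \qquad \mathrm{Var}(Y_{it}) = \sigma_b^2 + \sigma^2, \qquad \mathrm{Corr}(Y_{it}, Y_{is}) = \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2}$

In other words, a random intercept alone implies one constant correlation between any two observations in the same cluster. Adding random slopes lets that correlation depend on the covariate values.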

As for your specific example, I think you are modelling the random effects in a strange way. For instance, assuming that income is a continuous variable in your data, a random intercept for income doesn't make much sense, just as having a dummy variable for every level of income wouldn't make sense either. If what causes the dependency in your data is that the observations come from the same state and region, then a better way to specify the random effects in lmer (from the lme4 package in R) is something like this for the random intercept model:

lmer(dependent.var ~ female + race + age + edu + income + percent.dem.vote + state.avg.income + (1 | region) + (1 | state))

If you want random slopes too, just add them in a similar fashion. Here's an example with random slopes for edu and income, on the region level, added to the model:

lmer(dependent.var ~ female + race + age + edu + income + percent.dem.vote + state.avg.income + (1 + edu + income | region) + (1 | state))
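As a rough, runnable illustration of the two kinds of specification, the sketch below simulates a small data set and fits a random intercept model and a random slope model with lme4. Everything in it is an assumption made for the example: the data are simulated, the names `dat`, `y`, `m_int` and `m_slope` are hypothetical, income is treated as a standardized continuous variable, and the slope is allowed to vary by state (the questioner's stated goal) rather than by region. Note that a single `(1 + income | state)` term already contains the state intercept, so a separate `(1 | state)` term is not added alongside it.

```r
# Minimal sketch on simulated data; all names and values are hypothetical.
library(lme4)

set.seed(1)
n_state  <- 30                                  # number of made-up states
n_per    <- 40                                  # observations per state
state_id <- rep(seq_len(n_state), each = n_per) # integer state index per row
n        <- length(state_id)

income <- rnorm(n)                              # standardized continuous income
b0     <- rnorm(n_state, sd = 0.5)              # state-level intercept deviations
b1     <- rnorm(n_state, sd = 0.3)              # state-level income-slope deviations

# Outcome: fixed intercept 1, fixed income slope 0.5, plus state deviations and noise
y <- 1 + b0[state_id] + (0.5 + b1[state_id]) * income + rnorm(n)

dat <- data.frame(y = y, income = income, state = factor(state_id))

# Random intercept only: states differ in their baseline level
m_int <- lmer(y ~ income + (1 | state), data = dat)

# Random intercept and random income slope by state (estimated as correlated)
m_slope <- lmer(y ~ income + (1 + income | state), data = dat)

# Estimated variance components of the random slope model
print(VarCorr(m_slope))

# Likelihood-ratio comparison; anova() refits both models with ML first
print(anova(m_int, m_slope))
```

For prediction at new combinations of the covariates, `predict(m_slope, newdata = ...)` uses both the fixed effects and the estimated state-level deviations by default.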

References

Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255-278.