Solved – Varying group coefficients in lme4

lme4-nlme, multilevel-analysis

All,

I am estimating a multilevel logistic regression with group predictors, but am unclear about some of the advice given by Gelman and Hill (2007) in their book. Therein, they recommend allowing every coefficient to possibly vary, given a large enough N. Does that include group predictors as well? They weren't clear, treating "varying slope" as just another complexity you can incorporate into a mixed effects model in lme4 along with group predictors (see: p. 549 in their book).

For example, I have roughly 50,000 observations with a binary response (plenty large N). Predictors exist at two levels, such that my model looks like:

M1 <- lmer(Y ~ X1 + X2 + X3 + X4 + G1 + G2 + G3 + G4 + (1 | group), family=binomial(link="logit"))

X1:X4 are individual-level predictors and G1:G4 are group-level predictors, thus: a multilevel model. Does their recommendation of treating all coefficients as potentially variable mean including even the group predictors within the random effect, such that:

M2 <- lmer(Y ~ X1 + X2 + X3 + X4 + G1 + G2 + G3 + G4 + (1 + X1 + X2 + X3 + X4 + G1 + G2 + G3 + G4 | group), family=binomial(link="logit"))

I ran M2 and it gave sensible estimates. AIC/BIC suggest a much better fit than M1. I'm just unsure whether it's appropriate since, unlike individual-level predictors, the group-level predictors do not vary within a given group. They obviously vary across groups, though.

Further, if this is not an incorrect way to approach it, how suspicious should I be if one of the group predictors of interest is statistically insignificant as a stand-alone fixed effect (varying intercept model like M1), but is significant as a fixed effect in a varying slope model like M2?

Thanks for any input and feedback on this topic. I really appreciate it.

Best Answer

First of all, AIC/BIC do not make sense in mixed models. Can you explain what the $n$ that goes into your BIC is? Number of groups? Number of observations? Something in between? And what about level 1 and level 2 variables, which obviously carry different amounts of information? So I wouldn't pay any attention to these.
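To make the ambiguity about $n$ concrete, here is a minimal sketch in R of how the BIC penalty term changes with the choice of $n$. The 50,000 observations come from the question; the number of groups (100) and the parameter count (20) are made-up values used only for illustration.

```r
# BIC = -2 * logLik + k * log(n); only the penalty k * log(n) depends on n.
n_obs    <- 50000  # number of observations (from the question)
n_groups <- 100    # ASSUMED number of groups, for illustration only
k        <- 20     # ASSUMED number of estimated parameters

penalty_obs    <- k * log(n_obs)     # penalty if n = number of observations
penalty_groups <- k * log(n_groups)  # penalty if n = number of groups

# Difference in BIC that comes purely from the choice of n
penalty_gap <- penalty_obs - penalty_groups
print(c(obs = penalty_obs, groups = penalty_groups, gap = penalty_gap))
```

With the same log-likelihood, the two choices of $n$ give different BIC values, so a BIC comparison between models with different numbers of parameters depends on a choice that has no obvious answer here.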

Second, I am surprised your model with random effects for the group-level variables was identified at all. Let us think about an extreme case: a binary group-level variable G in a model lmer(Y ~ X + G + (1 + X + G | group)). What does it describe? That a group gets an additional random shift when G==1, i.e., group-level heteroskedasticity. That appears to be a rather odd thing to estimate.
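A short derivation may help show why identification is doubtful. The notation is an assumption of this sketch, not something taken from the question: $u_{0j}$ is the random intercept of group $j$, $u_{Gj}$ its random slope on $G$, with variances $\tau_0^2$, $\tau_G^2$ and covariance $\tau_{0G}$; the random slope on X is ignored for simplicity. Because $G_j$ is constant within a group, the two terms enter only through their sum:

$$
\operatorname{Var}\left(u_{0j} + u_{Gj} G_j\right) =
\begin{cases}
\tau_0^2 & \text{if } G_j = 0, \\
\tau_0^2 + 2\tau_{0G} + \tau_G^2 & \text{if } G_j = 1.
\end{cases}
$$

The data can therefore inform at most two group-level variances (one for each value of $G$), while the model has three parameters. Only the combination $2\tau_{0G} + \tau_G^2$ is pinned down, not $\tau_{0G}$ and $\tau_G^2$ separately.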

So all in all, I would run this as

M2 <- lmer(Y ~ X1 + X2 + X3 + X4 + G1 + G2 + G3 + G4 + (1 + X1 + X2 + X3 + X4 | group), family=binomial(link="logit"))

i.e., with only the individual-level covariates having random effects assigned to them.
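If it is not obvious which predictors are group-level, a quick check is whether a predictor takes a single value inside every group. The following is a minimal base R sketch on made-up toy data; the data frame `d` and the helper `is_group_level` are hypothetical names introduced only for this illustration.

```r
# Toy data: 3 groups, one individual-level (X1) and one group-level (G1) predictor.
# All values are made up for illustration.
d <- data.frame(
  group = rep(c("a", "b", "c"), each = 4),
  X1    = c(1, 2, 3, 4, 2, 2, 5, 1, 0, 3, 3, 7),
  G1    = rep(c(10, 20, 30), each = 4)
)

# TRUE if the predictor takes a single value inside every group
is_group_level <- function(x, g) {
  all(tapply(x, g, function(v) length(unique(v)) == 1))
}

# Apply the check to each candidate predictor
group_level <- sapply(d[c("X1", "G1")], is_group_level, g = d$group)
print(group_level)
```

Predictors flagged TRUE have no within-group variation, so they would stay in the fixed part of the formula only.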

Related Solutions

r – Correct Specification of Longitudinal Model in lme4 for R

I'll imagine a concrete example, with more context, to make things easier. Assume you measure the test scores of 3,000 students in 200 schools, and you measure each student at 4 time points (say, each quarter). You have a student-level covariate that doesn't vary over time (like sex), which you called pred1.obs, and a school-level covariate that varies over time (say, the number of teacher-parent meetings held up to that point in time). If this example resembles your study, then I think you have to set up a three-level model (individual level, group level, and time level for the groups), with indices i = 1, ..., 3000 individuals; t = 1, ..., 4 periods; g = 1, ..., 200 groups.

The model would be:

y_i ~ N(a + b_[groups_g] + b.ind*pred1.obs_i, sigma^2) # 1st level
b_g ~ N(gamma + gamma_[time] + gamma.g[time_t]*pred2.grp, sigma.b^2) # 2nd level
gamma.g_t ~ N(0, sigma.gamma^2) # 3rd level

Note that the slope at the second level (group level) varies by time, which makes sense, since you would expect the effect of schools on the performance of students to vary over time, depending on the value of the school-level covariate. I'm not sure how to estimate this with lmer (I know how to estimate a Bayesian model using WinBUGS or JAGS, called from R). In any case, here is my suggestion.

In lme4, I'd try the following. First, expand pred2.grp (the group-level covariate that varies over time) to the individual level, so that you have repeated measures for individuals at the group and time level. Then:

lmer(outcome ~ pred1.obs + pred2.grp + (1|group))
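The expansion step mentioned above can be sketched with base R's merge(). The data frames `students` and `school_time`, their columns, and all values are hypothetical and exist only to show the shape of the operation.

```r
# Hypothetical long-format student data: one row per student per time point
students <- data.frame(
  id      = rep(1:4, each = 2),
  group   = rep(c("s1", "s2"), each = 4),
  time    = rep(1:2, times = 4),
  outcome = c(50, 55, 48, 52, 60, 63, 58, 61)
)

# Hypothetical school-by-time covariate: one row per school per time point
school_time <- data.frame(
  group     = rep(c("s1", "s2"), each = 2),
  time      = rep(1:2, times = 2),
  pred2.grp = c(3, 5, 2, 6)
)

# Copy the school-by-time value onto every matching student row
expanded <- merge(students, school_time, by = c("group", "time"))
print(expanded)
```

After the merge, every student row carries the pred2.grp value of its school at that time point, which is the layout the lmer call expects.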

Solved – Time varying predictors at higher aggregation levels in multilevel survival analysis

I think I found a solution. I read two book chapters about multilevel event history models (Courgeau, 2007; Goldstein, 2011), which discuss similar cases and suggest using a three-level structure such as time (level-1) nested within households (level-2), which are in turn nested within municipalities (level-3). Goldstein (2011, p. 221) explicitly states for this structure that “The exploratory variables can be defined at any level. They may also vary over time, allowing so-called time varying covariates.”

So here is a quick explanation why I think that such a three-level model is able to correctly incorporate time-varying predictors at the municipality-level (level-3), such as the environmental variable “Env1”. Because Env1 varies across time, the model automatically treats it as a level-1 variable. It does not know that at each time step (e.g., year 1990), the values for Env1 are the same for all households located in a particular municipality. However, I don’t think that this biases the standard errors for the Env1 variable because I have household random effects (level-2) included in the model, which estimate a separate intercept for each household. Moreover, I also include an additional variance component at level-2 that allows the slope of Env1 to vary randomly across households. In this way the effect of Env1 is uniquely computed for each household.

References:

Courgeau, D. (2007). Multilevel synthesis: From the group to the individual. Dordrecht, The Netherlands: Springer.

Goldstein, H. (2011). Multilevel statistical models (4th ed.). Chichester, U.K.: John Wiley & Sons.
