r – Correct Specification of Longitudinal Model in lme4 for R

multilevel-analysis, panel data, r

I am trying to fit a multilevel longitudinal model and I have a question about how to specify it.

The data consist of about 8k observations collected from about 3k individuals at four time points. Individuals are nested in groups and there are about 200 groups.
I have two different types of fixed effects: (a) repeated measures at the observation level (e.g. pred1.obs), and (b) group-level predictors that also change over time (e.g. pred2.grp).
Because each group-level predictor is also longitudinal, it takes 800 values (4 × 200, each repeated for every member of the group at that time point), but there are only 200 groups.

My question is: what would be the correct specification for this model, and why?
For example:

1: lmer(outcome ~ time + pred1.obs + pred2.grp + (time|id) + (time|grp))

2: lmer(outcome ~ time + pred1.obs + pred2.grp + (time|id) + (1|grp:time))

3: lmer(outcome ~ time + pred1.obs + pred2.grp + (time|id) + (time|grp) + (1|grp:time))

Thus, would lme4 correctly estimate the model if I use (time|grp), or do I need to use (1|grp:time), or the combination of both?

Or something else that I haven't thought of?

Many thanks,
George

Best Answer

I'll imagine a concrete example, with more context, to make things easier. Assume you measure the test scores of 3k students from 200 schools, and you measure each student at 4 time points (say, once per quarter). You have a student-level covariate that doesn't vary over time (like sex), which you called pred1.obs, and a school-level covariate that varies over time (say, the number of meetings between teachers and parents up to that point in time), which you called pred2.grp. If this example resembles your study, then I think you have to set up a three-level model (individual level, group level, and time level for the groups), with indices:

i = 1, ..., 3000 individuals
t = 1, ..., 4 periods
g = 1, ..., 200 groups

The model would be:

y_i ~ N(a + b_[groups_g] + b.ind*pred1.obs_i, sigma^2) # 1st level
b_g ~ N(gamma + gamma_[time] + gamma.g[time_t]*pred2.grp, sigma.b^2) # 2nd level
gamma.g_t ~ N(0, sigma.gamma^2) # 3rd level

Note that the slope at the second level (group level) varies by time, which makes sense, since you would expect the effect of schools on the performance of students to vary over time, depending on the value of the school-level covariate. I'm not sure how to estimate this with lmer (I know how to estimate a Bayesian model using WinBUGS or JAGS, called from R). In any case, here is my suggestion.

In lme4, I'd try the following. First, expand pred2.grp (the group-level covariate that varies by time) to the individual level, so that each individual's repeated measures carry the value for their group at that time point. Then:

lmer(outcome ~ pred1.obs + pred2.grp + (1|grp))
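
The expansion step above can be sketched on simulated data. This is only an illustration under stated assumptions: the data frames `ind`, `grp_tab`, and `dat`, the toy sizes (20 groups, 5 individuals each), and the simulated coefficients are all hypothetical, and the grouping column is named `grp` as in the question. The join uses base R's `merge()`; the model fit only runs if lme4 is installed.

```r
# Hypothetical toy data: 20 groups, 5 individuals per group, 4 time points.
set.seed(1)
n_grp <- 20
n_per <- 5

# Individual-level long data: one row per individual per time point.
ind <- expand.grid(time = 1:4, id = seq_len(n_grp * n_per))
ind$grp <- (ind$id - 1) %/% n_per + 1   # ids 1-5 -> grp 1, ids 6-10 -> grp 2, ...
ind$pred1.obs <- rnorm(nrow(ind))

# Group-level table: one pred2.grp value per group per time point (20 x 4 = 80 rows).
grp_tab <- expand.grid(time = 1:4, grp = seq_len(n_grp))
grp_tab$pred2.grp <- rnorm(nrow(grp_tab))

# Expand pred2.grp to the individual level by joining on group and time.
dat <- merge(ind, grp_tab, by = c("grp", "time"))

# Simulated outcome with a group-level random intercept (assumed effect sizes).
u <- rnorm(n_grp)
dat$outcome <- 1 + 0.5 * dat$pred1.obs + 0.3 * dat$pred2.grp +
  u[dat$grp] + rnorm(nrow(dat))

# Check: within each group and time point, all members share one pred2.grp value.
n_values <- aggregate(pred2.grp ~ grp + time, data = dat,
                      FUN = function(x) length(unique(x)))

# Fit the suggested model only if lme4 is available.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::lmer(outcome ~ pred1.obs + pred2.grp + (1 | grp), data = dat)
  print(lme4::fixef(fit))
}
```

After the merge, `dat` has one row per individual per time point, with the group-by-time value of pred2.grp repeated for every member of the group. That is the layout lmer expects when a group-level covariate enters the fixed part of the formula.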