Solved – Testing simultaneous and lagged effects in longitudinal mixed models with time-varying covariates

lme4-nlme, mixed model, r

I was recently told that time-varying covariates cannot be included in longitudinal mixed models unless a time lag is introduced for them. Can you confirm or deny this? Do you have any references on this situation?

Here is a simple situation to clarify. Suppose that I have repeated measures (say, over 30 occasions) of four quantitative variables (y, x1, x2, x3) in 40 subjects. Each variable is measured 30 times in each subject by a questionnaire. The final data would then be 4,800 observations (4 variables × 30 occasions × 40 subjects) nested in 40 subjects.

I would like to test separately (not for model comparison) for:

simultaneous (synchronous) effects: the influence of x1, x2, and x3 at time t on y at time t.
lagged effects: the influence of x1, x2, and x3 at time t-1 on y at time t.

I hope everything is clear (I'm not a native English speaker !).

For instance, in R with lmer{lme4}, the formula with lagged effects is:

lmer(y ~ lag1.x1 + lag1.x2 + lag1.x3 + (1|subject))

where y is the dependent variable at time t, lag1.x1 is the lagged independent variable x1 at the individual level, etc.
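
For readers following along, here is a minimal sketch of how a within-subject lagged column such as lag1.x1 could be built in base R. The data frame `d`, its values, and the helper `lag1()` are hypothetical and only serve the illustration; the key point is that the shift is done separately inside each subject, after sorting by occasion, so that no value leaks from one subject into the next.

```r
# Hypothetical long-format data: one row per subject per occasion
d <- data.frame(
  subject = rep(1:3, each = 4),
  time    = rep(1:4, times = 3),
  x1      = c(1, 2, 3, 4, 10, 20, 30, 40, 5, 6, 7, 8)
)

# Make sure rows are ordered by occasion within subject before shifting
d <- d[order(d$subject, d$time), ]

# Hypothetical helper: shift a vector by one position, padding with NA
lag1 <- function(v) c(NA, v[-length(v)])

# Apply the shift separately within each subject
d$lag1.x1 <- ave(d$x1, d$subject, FUN = lag1)

print(d)
```

Each subject's first occasion has no previous value, so lag1.x1 is NA there; with R's default na.action those rows are typically dropped when the model is fitted.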

For simultaneous effects, the formula is:

lmer(y ~ x1 + x2 + x3 + (1|subject))

Everything runs well and gives me interesting results. But is it correct to specify an lmer model with synchronous time-varying covariates, or have I missed something?

Edit:
Moreover, is it possible to test both simultaneous and lagged effects at the same time? For instance:

lmer(y ~ x1 + x2 + x3 + lag1.x1 + lag1.x2 + lag1.x3 + (1|subject))

Theoretically, it makes sense to test competition between concurrent and lagged effects. But is this possible with lmer{lme4} in R, for example?

Best Answer

I know this is probably too late for your benefit, but perhaps for others I will provide an answer.

You can include time-varying covariates in a longitudinal random-effects model (see Applied Longitudinal Analysis by Fitzmaurice, Laird and Ware, 2011 and http://www.ats.ucla.edu/stat/r/examples/alda/ specifically for R – use lme). Interpretation of trends depends on whether you code time as categorical or continuous and on your interaction terms. So for instance, if time is continuous and your covariates x1 and x2 are binary (0 and 1) and time-dependent, the fixed part of the model is:

$$y_{ij} = \beta_0 + \beta_1x_{1ij} + \beta_2x_{2ij} + \beta_3time_{ij} + \beta_4 (x_{1ij} \times time_{ij}) + \beta_5 (x_{2ij} \times time_{ij})$$

where $i$ indexes the person and $j$ indexes the occasion.

$\beta_4$ and $\beta_5$ capture the difference in trends between the levels of $x_1$ and $x_2$ while accounting for change over time in $x_1$ and $x_2$. Unless you specify $x_1$ and $x_2$ as random effects, correlations between the repeated measures will not be taken into account (but this choice needs to be based on theory, and it can get messy if you have too many random effects, e.g., the model may not converge). There is also some discussion about centering time-dependent covariates to remove bias, although I have not done this (Raudenbush & Bryk, 2002). Interpretation is also generally more difficult if you have a continuous time-dependent covariate.

$\beta_1$ and $\beta_2$ capture the cross-sectional association between $x_1$ and $y$, and between $x_2$ and $y$, at the intercept ($\beta_0$). The intercept is where time is zero (baseline, or wherever you centered your time variable). This interpretation also changes if you have a higher-order model (e.g., quadratic).
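
To make the roles of the coefficients concrete, the fixed part of the model can be evaluated at the two levels of a binary $x_1$ while holding $x_2 = 0$. This is only a worked sketch of the expected value; it ignores the random effects and the residual.

$$ \begin{split} E[y_{ij} \mid x_{1ij}=0, x_{2ij}=0] & = \beta_0 + \beta_3 time_{ij} \\ E[y_{ij} \mid x_{1ij}=1, x_{2ij}=0] & = (\beta_0 + \beta_1) + (\beta_3 + \beta_4) time_{ij} \\ \textrm{difference} & = \beta_1 + \beta_4 time_{ij} \end{split} $$

So $\beta_1$ is the gap between the two levels of $x_1$ at time zero, and $\beta_4$ is how much that gap widens or narrows per unit of time. The same reading applies to $\beta_2$ and $\beta_5$ for $x_2$.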

You would code this in R as something like:

model <- lme(y ~ time*x1 + time*x2, data = data, random = ~ time | subject, method = "REML")  # or method = "ML"

Singer and Willett appear to use ML for “method”, but I have always been taught to use REML in SAS for overall results and to compare the fit of models with different fixed effects using ML. You can use REML in R too; it is the default for lme.
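
As a rough sketch of that workflow, the example below simulates a small data set and fits the model with nlme. Everything about the data (sample sizes, effect sizes, the names `d`, `fit_reml`, `fit_ml_full`, `fit_ml_main`) is an assumption made up for illustration. The REML fit would be the one to report; the two ML fits differ only in their fixed effects and are compared with a likelihood-ratio test.

```r
library(nlme)

set.seed(1)
n_subj <- 40   # hypothetical number of subjects
n_occ  <- 10   # hypothetical number of occasions

# Simulated long-format data with two binary time-varying covariates
d <- data.frame(
  subject = factor(rep(seq_len(n_subj), each = n_occ)),
  time    = rep(seq_len(n_occ) - 1, times = n_subj)
)
d$x1 <- rbinom(nrow(d), 1, 0.5)
d$x2 <- rbinom(nrow(d), 1, 0.5)

b0 <- rnorm(n_subj, sd = 1)     # subject-specific intercept deviations
b1 <- rnorm(n_subj, sd = 0.5)   # subject-specific slope deviations
s  <- as.integer(d$subject)
d$y <- 2 + 0.5 * d$x1 - 0.3 * d$x2 + 0.2 * d$time +
  0.3 * d$x1 * d$time + b0[s] + b1[s] * d$time + rnorm(nrow(d))

# REML fit for reporting estimates
fit_reml <- lme(y ~ time * x1 + time * x2, data = d,
                random = ~ time | subject, method = "REML")

# ML fits for comparing models that differ in their fixed effects
fit_ml_full <- lme(y ~ time * x1 + time * x2, data = d,
                   random = ~ time | subject, method = "ML")
fit_ml_main <- lme(y ~ time + x1 + x2, data = d,
                   random = ~ time | subject, method = "ML")

cmp <- anova(fit_ml_main, fit_ml_full)
print(cmp)
```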

You can also model the correlation structure for y by adding to the previous code:

correlation = [you’ll have to look up the options]
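
For orientation, nlme ships several correlation structure classes, among them corAR1, corCAR1, corARMA, corCompSymm and corSymm. The sketch below shows one possible choice, a first-order autoregressive structure within subject, on simulated data. The data, the names `d` and `fit_ar1`, and the choice of corAR1 are assumptions for illustration only, not a recommendation for any particular data set.

```r
library(nlme)

set.seed(2)
n_subj <- 30   # hypothetical number of subjects
n_occ  <- 8    # hypothetical number of occasions

d <- data.frame(
  subject = factor(rep(seq_len(n_subj), each = n_occ)),
  time    = rep(seq_len(n_occ), times = n_subj)
)
d$x1 <- rnorm(nrow(d))

# Within-subject errors that follow an AR(1) process
e <- unlist(lapply(seq_len(n_subj), function(i) {
  as.numeric(arima.sim(list(ar = 0.5), n = n_occ))
}))
d$y <- 1 + 0.4 * d$x1 + rep(rnorm(n_subj), each = n_occ) + e

# Random intercept plus AR(1) residual correlation ordered by time within subject
fit_ar1 <- lme(y ~ time + x1, data = d,
               random = ~ 1 | subject,
               correlation = corAR1(form = ~ time | subject))

# Estimated autocorrelation parameter on its natural scale
phi <- coef(fit_ar1$modelStruct$corStruct, unconstrained = FALSE)
print(phi)
```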

I am not sure I understand your reasoning for only being able to test lagged effects. I am not familiar with modeling lagged effects, so I can’t really speak to that here. Perhaps I am wrong, but I would imagine that modeling lagged effects would undermine some of the usefulness of mixed models (e.g., being able to include subjects with missing time-dependent data), since a lagged covariate is undefined at each subject's first occasion and wherever the previous measurement is missing.

Related Solutions

Solved – Nested mixed effects with lme4

I would say

response ~ brightness+duration+(duration|subject)

would probably be a little better. (The simpler (1|duration:subject) model is not necessarily wrong, but might be oversimplified. If I were a peer reviewer of this work I would certainly ask for a justification of the simpler model ...) The (duration|subject) model is a "random-slopes" model, more or less (although if you have coded duration as a categorical (factor or ordered factor) variable the thing that varies randomly among subjects is not a slope per se, but a between-duration difference). The specification you have ((1|subject:duration)) assumes all subject-duration effects are drawn from a single (iid) Normal distribution; (duration|subject) assumes that the duration effects for a single individual are drawn from a $3 \times 3$ multivariate Normal distribution.

More precisely: the random effect specification (1|subject:duration) gives the following model for the conditional modes/BLUPs of subject $s$ for duration $d$ (or duration effect $d$, depending on how the model is parameterized) $$ b_{sd} \sim \textrm{Normal}(0,\sigma_{sd}^2) $$ whereas (duration|subject) gives

$$ \begin{split} b_{s\cdot} & \sim \textrm{MVN}( \mathbf 0,\Sigma) \\ \Sigma & = \left( \begin{array}{ccc} \sigma^2_1 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma^2_{2} & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma^2_3 \\ \end{array} \right) \end{split} $$ i.e., the different duration levels each have different among-subjects variances, and the among-subject variation in different duration levels is correlated ($\Sigma$ is a general symmetric positive (semi)definite matrix). To get back to the previous model you would need to restrict $\sigma_1^2=\sigma_2^2=\sigma_3^2=\sigma_{sd}^2$ and all of the off-diagonal elements would be zero.

R Software – Using the predict() Function for lmer Mixed Effects Models

It's easy to get confused by the presentation of coefficients when you call coef(fit2). Look at the summary of fit2:

> summary(fit2)
Linear mixed model fit by REML ['lmerMod']
Formula: Recall ~ (1 | Subject/Time) + Caffeine
   Data: data

REML criterion at convergence: 444.5

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.88657 -0.46382 -0.06054  0.31430  2.16244 

Random effects:
 Groups       Name        Variance Std.Dev.
 Time:Subject (Intercept)  558.4   23.63   
 Subject      (Intercept) 2458.0   49.58   
 Residual                  675.0   25.98   
Number of obs: 45, groups:  Time:Subject, 15; Subject, 5

Fixed effects:
            Estimate Std. Error t value
(Intercept) 61.91827   25.04930   2.472
Caffeine     0.21163    0.07439   2.845

Correlation of Fixed Effects:
         (Intr)
Caffeine -0.365

There is an overall intercept of 61.92 for the model, with a caffeine coefficient of 0.212. So for caffeine = 95 you predict an average 82.06 recall.

Instead of using coef, use ranef to get the difference of each random-effect intercept from the mean intercept at the next higher level of nesting:

> ranef(fit2)
$`Time:Subject`
         (Intercept)
0:Jason    13.112130
0:Jim      33.046151
0:Ron      -3.197895
0:Tina      8.893985
0:Victor   24.392738
1:Jason    -2.068105
1:Jim      -9.260334
1:Ron      -4.428399
1:Tina      6.515667
1:Victor   17.265589
2:Jason   -18.203436
2:Jim     -19.835771
2:Ron      -3.473053
2:Tina    -17.180791
2:Victor  -25.578477

$Subject
       (Intercept)
Jason   -31.513915
Jim      17.387103
Ron     -48.856516
Tina     -7.796104
Victor   70.779432

The values for Jim at Time=0 will differ from that average value of 82.06 by the sum of both his Subject and his Time:Subject coefficients:

$$82.06+17.39+33.04=132.49$$

which I think is within rounding error of 132.46.
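
The small gap comes from rounding the coefficients before adding them up. As a quick check, the same sum can be redone with the unrounded values printed in the summary and ranef output above; the variable names below are just labels chosen for this sketch.

```r
# Unrounded values copied from the summary(fit2) and ranef(fit2) output
intercept     <- 61.91827    # fixed-effect intercept
caffeine_coef <- 0.21163     # fixed-effect slope for Caffeine
subject_jim   <- 17.387103   # ranef: Subject, Jim
time0_jim     <- 33.046151   # ranef: Time:Subject, 0:Jim

# Population-level prediction at caffeine = 95
pop_pred <- intercept + caffeine_coef * 95

# Prediction for Jim at Time = 0: add both of his random-effect deviations
jim_pred <- pop_pred + subject_jim + time0_jim

print(round(pop_pred, 2))
print(round(jim_pred, 2))
```

With the unrounded inputs the population-level prediction is about 82.02 rather than 82.06, and the prediction for Jim at Time=0 comes out at about 132.46.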

The intercept values returned by coef seem to represent the overall intercept plus the Subject or Time:Subject specific differences, so it's harder to work with those; if you tried to do the above calculation with the coef values you would be double-counting the overall intercept.
