Solved – Time varying covariates in longitudinal mixed effect models

lme4-nlme, mixed model, panel data, stata, time-varying-covariate

I am looking for some help with my analysis of longitudinal data with time-varying covariates. I am planning to use R and the lme4 package, but I am happy to use Stata as well. I am interested in the relationship between cognition and taking ACE inhibitors in longitudinal data.

The example dataset is below:

df = data.frame(cognition = rnorm(200)+2,
                wave = rep(c("W1", "W2", "W3", "W4"), each = 50) ,
                hypertension = c(rep(c( "Y","N", "N", "N", "Y", "N", "N", "N", "N", "N"), 5),  
                                                rep(c( "Y","Y", "N", "N", "Y", "N", "N", "N", "N", "N"), 5) ,
                                                rep(c( "Y","Y", "N", "N", "Y", "N", "Y", "Y", "N", "N"), 5) ,
                                                rep(c( "Y","Y", "N", "N", "Y", "N", "Y", "Y", "Y", "N"), 5) ) 

                ,
                diabetes = c(rep(c( "N","N", "N", "N", "Y", "N", "N", "N", "Y", "N"), 5),  
                                 rep(c( "Y","N", "N", "N", "Y", "N", "N", "N", "Y", "N"), 5) ,
                                 rep(c( "Y","N", "N", "N", "Y", "N", "Y", "Y", "Y", "N"), 5) ,
                                 rep(c( "Y","Y", "N", "N", "Y", "N", "Y", "Y", "Y", "Y"), 5) ) 

                ,
                Smoking = c(rep(c( "Current","Never", "Never", "Former", "Current", "Former", "Former", "Former", "Never", "Current"), 5),  
                            rep(c( "Current","Never", "Never", "Former", "Current", "Former", "Former", "Former", "Never", "Former"), 5) ,
                            rep(c( "Former","Never", "Never", "Former", "Current", "Current", "Former", "Former", "Never", "Former"), 5) ,
                            rep(c( "Former","Never", "Never", "Current", "Current", "Current", "Former", "Former", "Never", "Former"), 5) ) 

                ,
                TakingACEinh =   c(rep(c( "Y","N", "N", "N", "N", "N", "N", "N", "N", "N"), 5),  
                                   rep(c( "Y","Y", "N", "N", "N", "N", "N", "N", "N", "N"), 5) ,
                                   rep(c( "Y","Y", "N", "N", "Y", "N", "Y", "Y", "N", "N"), 5) ,
                                   rep(c( "Y","N", "N", "N", "Y", "N", "Y", "Y", "Y", "N"), 5) ) ,
                id = rep(1:50, 4)
                )

Hypertension is the diagnosis of hypertension at each wave (timepoint); once a person has been diagnosed, they cannot go back to being non-hypertensive. The same is true for the variable diabetes. However, variables such as smoking can change across waves. The same goes for taking ACE inhibitors: someone can take the drug in one wave but not in others. How do I model these variables in my mixed-effects model?

I was thinking of two approaches:
1) Keep the data as is and use lme4, although I am still not sure which of these is the correct model

library(lme4)
lmer(cognition ~ factor(wave) + hypertension + Smoking + diabetes + TakingACEinh + (1|id), data = df)
lmer(cognition ~ factor(wave) + hypertension + Smoking + diabetes + TakingACEinh + (1|id) + (1+TakingACEinh|wave), data = df)

2) Recode the variable hypertension to indicate whether a person is 0 = non-hypertensive, 1 = newly hypertensive, or 2 = previously and currently hypertensive, and fit the models above again

If anyone has any suggestions on how to model and analyse this type of data please let me know and thanks for your help.

Best Answer

Dealing with time-varying covariates, in mixed models and in general, is a challenging task. A few points to consider:

I would differentiate between time-varying covariates, such as smoking, and intermediate events, such as hypertension in your example.
For time-varying covariates you first need to consider whether they are endogenous or exogenous. Loosely speaking, a time-varying covariate is exogenous if its current value at time $t$ is associated only with its own previous values at time points $0 \leq s < t$, and not with values of the outcome at those earlier time points. The covariate is endogenous if this is not the case. Endogenous covariates are in general more difficult to handle and require specialized models, such as joint models or marginal structural models.
An additional challenge with time-varying covariates is the functional form. That is, if you just include smoking as a time-varying covariate in your mixed model, then you have a type of cross-sectional relationship, namely, you say that the cognition at time $t$ is only associated with smoking at the same time point $t$. But it could be that the cognition at $t$ is also associated with smoking at previous time points. For example, cognition at $t$ depends not only on whether you smoke at time $t$ but rather on how much you have smoked up to $t$. In this case, you will need to construct a new time-varying covariate which is the cumulative smoking.
For intermediate events you have similar considerations about endogeneity. But instead of including such an event just as a covariate in the model, it would perhaps be more logical to assume that it interacts with time, i.e., that after the intermediate event has occurred you have a change in the slope of cognition.
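A minimal base-R sketch of the two constructions mentioned above, a cumulative exposure covariate and a "time since the event" covariate that allows a change in slope. The toy data and the column names `cum_smoking`, `htn` and `time_since_htn` are made up for illustration; the sketch assumes a numeric wave index, no missing waves, and that a diagnosis, once made, persists.

```r
# Toy long-format data: 3 hypothetical people, 4 waves each (values are made up)
df <- data.frame(
  id           = rep(1:3, each = 4),
  wave         = rep(1:4, times = 3),
  Smoking      = c("Current", "Current", "Former", "Former",
                   "Never",   "Never",   "Never",  "Never",
                   "Former",  "Current", "Current", "Current"),
  hypertension = c("N", "Y", "Y", "Y",
                   "N", "N", "N", "N",
                   "N", "N", "Y", "Y")
)
df <- df[order(df$id, df$wave), ]  # cumulative sums need rows in time order

# Cumulative exposure: number of waves, up to and including the current one,
# at which the person was a current smoker
df$cum_smoking <- ave(as.integer(df$Smoking == "Current"), df$id, FUN = cumsum)

# Intermediate event: 0/1 indicator plus waves elapsed since first diagnosis
# (0 before and at the wave of diagnosis); relies on diagnoses being permanent
df$htn <- as.integer(df$hypertension == "Y")
df$time_since_htn <- ave(df$htn, df$id,
                         FUN = function(h) pmax(cumsum(h) - 1, 0))

df
```

With columns like these, one possible specification would be `cognition ~ wave + cum_smoking + htn + time_since_htn + (1 | id)`, where the coefficient of `time_since_htn` is the change in the slope of cognition after diagnosis. Whether this functional form is appropriate is a substantive question, and it does not address endogeneity.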

Related Solutions

Solved – Testing simultaneous and lagged effects in longitudinal mixed models with time-varying covariates

I know this is probably too late for your benefit, but perhaps for others I will provide an answer.

You can include time-varying covariates in a longitudinal random-effects model (see Applied Longitudinal Analysis by Fitzmaurice, Laird and Ware, 2011, and http://www.ats.ucla.edu/stat/r/examples/alda/ specifically for R; use lme). Interpretation of trends depends on whether you code time as categorical or continuous and on your interaction terms. So, for instance, if time is continuous and your covariates x1 and x2 are binary (0 and 1) and time-dependent, the fixed part of the model is:

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \beta_3 \,\text{time}_{ij} + \beta_4 (x_{1ij} \times \text{time}_{ij}) + \beta_5 (x_{2ij} \times \text{time}_{ij})$$

where $i$ indexes the person and $j$ the occasion.

$\beta_4$ and $\beta_5$ capture the difference in trends between levels of $x_1$ and $x_2$ while accounting for change over time in $x_1$ and $x_2$. Correlation between the repeated measures is taken into account only through the random effects you specify (e.g., a random intercept and a random slope for time per subject); you can also give $x_1$ and $x_2$ random slopes, but this needs to be based on theory and can get messy if you have too many random effects (i.e., the model may not converge). There is also some discussion about centering time-dependent covariates to remove bias, although I have not done this (Raudenbush & Bryk, 2002). Interpretation, in general, is also more difficult if you have a continuous time-dependent covariate.

$\beta_1$ and $\beta_2$ capture the cross-sectional association between $x_1$ and $y$ and $x_2$ and $y$ at the intercept ($\beta_0$). The intercept is where time is zero (baseline or wherever you centered your time variable). This interpretation could also be changed if you have a higher order model (e.g., quadratic).

You would code this in R as something like:

library(nlme)  # provides lme()
model <- lme(y ~ time*x1 + time*x2, data = data, random = ~ time | subject, method = "REML")  # or method = "ML"

Singer and Willett appear to use ML for “method”, but I have always been taught (in SAS) to use REML for the overall results and to compare the fit of different models using ML. The same works in R: lme accepts method = "REML" (the default) or method = "ML".
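As a rough sketch of that workflow (my illustration, not taken from the books cited above), the example below uses the Orthodont data that ships with nlme, with age playing the role of time. Sex is not time-varying, so this only shows the ML/REML mechanics, not a time-varying covariate analysis.

```r
library(nlme)  # lme() and the Orthodont example data

# Compare two fixed-effects specifications, both fitted by ML
m0 <- lme(distance ~ age,       data = Orthodont,
          random = ~ age | Subject, method = "ML")
m1 <- lme(distance ~ age * Sex, data = Orthodont,
          random = ~ age | Subject, method = "ML")
anova(m0, m1)  # likelihood-ratio test and information criteria

# Refit the chosen model by REML for the reported estimates
m1_reml <- update(m1, method = "REML")
summary(m1_reml)$tTable
```

ML is used for the comparison because REML likelihoods are not comparable between models with different fixed effects.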

You can also model the correlation structure for y by adding to the previous code:

correlation = [you’ll have to look up the options]
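One possibility, as a sketch only: nlme provides correlation classes such as corAR1, corCAR1, corCompSymm and corSymm. The example below assumes a first-order autoregressive structure for the within-subject errors and uses the Orthodont data shipped with nlme (age as time, a random intercept per subject). Which structure fits your data is something you would have to check.

```r
library(nlme)  # lme(), corAR1() and the Orthodont example data

# Random intercept per subject plus AR(1) within-subject errors;
# form = ~ 1 | Subject uses the order of the rows within each subject
fit_ar1 <- lme(distance ~ age * Sex, data = Orthodont,
               random = ~ 1 | Subject,
               correlation = corAR1(form = ~ 1 | Subject))

# Estimated AR(1) parameter on its natural (-1, 1) scale
phi <- coef(fit_ar1$modelStruct$corStruct, unconstrained = FALSE)
phi
```

With only a few occasions per subject, the random intercept and the AR(1) parameter can be hard to separate, so it is worth comparing against the model without the correlation argument.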

I am not sure I understand your reasoning for only being able to test lagged effects. I am not familiar with modeling lagged effects, so I can’t really speak to that here. Perhaps I am wrong, but I would imagine that modeling lagged effects would undermine the usefulness of mixed models (e.g., being able to include subjects with missing time-dependent data).
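To illustrate the concern about losing observations, here is a small base-R sketch of how a one-occasion lag of a time-varying covariate could be built. The data and the column name `x1_lag1` are hypothetical, and the row shift is only a valid lag if occasions are equally spaced with no missing waves.

```r
# Toy long-format data: hypothetical values, 2 subjects x 3 occasions
d <- data.frame(
  subject = rep(c("a", "b"), each = 3),
  time    = rep(0:2, times = 2),
  x1      = c(0, 1, 1, 0, 0, 1)
)
d <- d[order(d$subject, d$time), ]  # lagging needs rows in time order

# Shift x1 by one occasion within each subject; the first occasion has no
# predecessor, so its lagged value is NA
d$x1_lag1 <- ave(d$x1, d$subject, FUN = function(v) c(NA, head(v, -1)))

# Rows still usable in a model that includes x1_lag1
n_usable <- sum(complete.cases(d))
n_usable
```

Every subject loses its first occasion, and any missing value of x1 also removes the following occasion, which is the cost referred to above.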

Longitudinal Item Response – How to Apply Longitudinal Item Response Theory Models in R

As a precursor, the IRT approach to this problem is very demanding computationally due to the higher dimensionality. It may be worthwhile to look into structural equation modeling (SEM) alternatives using the WLSMV estimator for ordinal data, since I imagine fewer issues will arise. Plus, including external covariates is much easier within that framework. Both approaches I describe here are also possible in SEM.

There are two ways that I know of to estimate unidimensional longitudinal IRT models that are not Rasch in nature. The first approach requires a unique latent factor for each time block and a specific residual variation term for each item. A different approach, similar to what one would find in the SEM literature, is a latent growth curve model in which only a fixed number of factors are estimated (three if the relationship over time is believed to be linear). Fixed loadings are used in this approach, so computationally it may be much more stable; I would tend to prefer the growth curve model for both its smaller dimensionality and its fewer estimated parameters.

The idea for both approaches is to set up latent time factors indicating how person-level $\theta$ values change over each test administration, and to constrain the influence of their loadings across time so that their hyperparameters (i.e., the latent means and covariances) can be estimated. Item parameters must also be constrained to be invariant across time so that person differences are captured only in the hyperparameters. Since this approach can require a huge number of integration dimensions, you'll need to use something like the dimension reduction algorithm, which is available in mirt under the bfactor() function.

Instead of going through a worked example here, which would take a lot of code, I'll simply point to worked versions of these analyses. A word of warning, though: these are very computationally demanding and may take more than an hour to converge on your computer, since you have 4 dimensions of integration in the first case and 3 in the second. And if you don't have much RAM, you could run into issues when you increase the number of quadpts.

Data simulation script: https://github.com/philchalmers/mirt/blob/gh-pages/data-scripts/Longitudinal-IRT.R

Analysis output: http://philchalmers.github.io/mirt/html/Longitudinal-IRT.html

In the first example, if you save the factor scores by using fscores() you'll obtain estimates for each time point regarding how individual $\theta$ values are changing. In the second example, using the linear growth curve approach, the first column of the factor scores will represent the initial $\theta$ estimates while the second column will indicate the slope/change occurring on average over time. In the example, I set up a constant mean change of .5, so the values in fscores() should all be around 0.5 for each individual. Both analyses give roughly the same conclusions but are somewhat different approaches to the problem. However, if you are familiar with longitudinal models in SEM then these should be fairly natural to interpret.
