How to correctly model repeated-measures random effects in a linear mixed effects model

lme4-nlme, mixed-model, r

I have a mixed design data set where participants respond to each of three interventions and also report various demographics. The intervention is thus repeated-measures and each demographic measure is between-subjects.

I want to model this as a linear mixed effects model with random slopes and intercepts, but I can't figure out the correct way to express it with the lme function.

If we call the participant variable "n", the intervention "int", the dependent variable "effect", and use "sex" and "age" (categorical) as two of the demographic variables, then with lme from the nlme package in R, I think one of the following should be correct:

model <- lme(effect ~ int * sex * age, random = ~ int | n, data = data, method = "ML")

model <- lme(effect ~ int * sex * age, random = ~ int | n/int, data = data, method = "ML")

I haven't been able to work out when to structure the random effects as nested and whether it makes any sense to have intervention on both sides of the bar as both a term in the random model and a grouping variable.

Best Answer

@Roland already answered your question in his comment, so my answer is likely redundant.

From what you describe about your study design, you have a single grouping variable: subject (or n per your notation). For each subject, you have multiple measurements of the response variable effect. If this variable can be assumed to be continuous, then you can indeed model it using a linear mixed effects model. Otherwise, you may need to use a generalized linear mixed effects model (e.g., Poisson mixed effects model for a count response variable).

Your intervention variable (int in your notation) is a categorical variable with 3 levels. Presumably these are the only levels you are interested in for your study, which justifies including int as a predictor in the fixed effects portion of your model, where it is allowed to interact with the other two predictors (sex and age): int * sex * age.
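To see how many fixed-effect coefficients int * sex * age expands to, you can build the model matrix for one row per design cell. This is only a sketch: it assumes, hypothetically, that age has three categories labelled young, middle and old. With R's default treatment contrasts, the full factorial model has one coefficient per cell, 3 × 2 × 3 = 18 in total.

```r
# Hypothetical design cells: 3 interventions, 2 sexes, 3 age groups (labels assumed)
cells <- expand.grid(
  int = factor(c("A", "B", "C")),
  sex = factor(c("F", "M")),
  age = factor(c("young", "middle", "old"))
)

# Fixed effects design implied by int * sex * age (main effects plus all interactions)
X <- model.matrix(~ int * sex * age, data = cells)
ncol(X)          # 18 coefficients, one per design cell
colnames(X)      # intercept, main effects, two-way and three-way interaction columns
```

Each of those 18 coefficients has to be estimated from your data, which is one reason to check later whether the interactions can be dropped.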

The only situation in which you would treat int as a nested grouping factor, as in ~1|n/int, would be if:

  1. The three interventions in your study were a representative subset of a larger set of interventions which could not all be included in your study, and
  2. The interventions assigned to any given subject were specific to that subject and were not used for any other subject. (If the same subset of interventions was used for all subjects, the grouping factors subject and intervention would be fully crossed.)
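Whether two grouping factors are nested or crossed can be checked directly from the data. The helper is_nested below is hypothetical (it is not part of nlme): it returns TRUE when every level of the inner factor occurs within exactly one level of the outer factor. The two coding schemes for int are toy assumptions, one for the crossed design described in your question and one for the nested scenario in condition (2).

```r
# Hypothetical helper: TRUE if every level of `inner` occurs in exactly one level of `outer`
is_nested <- function(inner, outer) {
  tab <- table(inner, outer)
  all(rowSums(tab > 0) == 1)
}

# 4 subjects, 3 responses each
n <- factor(rep(1:4, each = 3))

# Crossed: the same three interventions are given to every subject
int_crossed <- factor(rep(c("A", "B", "C"), times = 4))
# Nested: each subject receives three interventions used for nobody else
int_nested <- factor(sprintf("I%02d", 1:12))

is_nested(int_crossed, n)  # FALSE: each intervention appears for all 4 subjects
is_nested(int_nested, n)   # TRUE: each intervention appears for a single subject
```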

Even if conditions (1) and (2) listed above were satisfied for your study, a random effects specification such as ~int|n/int would not make sense: int takes a single value within each subject-intervention combination, so it cannot carry a random slope at the n/int level. A sensible specification would be ~variable|n/int, where variable is a predictor whose values change across the repeated observations made under the same intervention for the same subject.
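The n/int grouping creates one inner group per subject-intervention pair, so a random slope at that level needs a predictor that varies inside every such pair. The sketch below assumes a hypothetical layout with two trials per subject and intervention, and a hypothetical trial variable that could play the role of variable; varies_in_every is likewise a made-up helper, not an nlme function.

```r
# Hypothetical layout: 4 subjects x 3 interventions x 2 trials per intervention
d <- expand.grid(trial = 1:2, int = factor(c("A", "B", "C")), n = factor(1:4))

# The inner groups that n/int creates: one per subject-intervention pair
inner <- interaction(d$n, d$int, drop = TRUE)
nlevels(inner)  # 12

# Hypothetical helper: TRUE if x takes more than one value inside every group of g
varies_in_every <- function(x, g) {
  all(tapply(x, g, function(v) length(unique(v)) > 1))
}

varies_in_every(d$trial, inner)  # TRUE: trial varies within each subject-intervention pair
varies_in_every(d$int, inner)    # FALSE: int is constant within each pair
```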

One way to think of a grouping variable is as a 'container' for repeated observations of a response variable. In your case, the 'container' is the subject. Those repeated observations are 'grouped together' in the same container.

In addition to the response variable, you'll also have various predictor variables which you wish to relate to the response variable. The values of these predictor variables can be (i) the same for all grouped observations in the 'container' (e.g., sex) or (ii) different for different grouped observations in the 'container' (e.g., int).
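You can tell which kind of predictor you have by counting how many distinct values it takes within each subject. This is a minimal sketch on hypothetical long-format toy data (4 subjects, 3 interventions, sex values assumed); distinct_per_subject is an illustrative helper, not a library function.

```r
# Hypothetical toy data in long format: one row per subject and intervention
dat <- expand.grid(int = factor(c("A", "B", "C")), n = factor(1:4))
dat$sex <- factor(rep(c("F", "M", "F", "M"), each = 3))

# Number of distinct values a predictor takes within each subject (container)
distinct_per_subject <- function(x, g) {
  tapply(x, g, function(v) length(unique(v)))
}

distinct_per_subject(dat$sex, dat$n)  # 1 for every subject: same within the container
distinct_per_subject(dat$int, dat$n)  # 3 for every subject: differs within the container
```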

If the values of a predictor variable are the same within a container, that predictor variable cannot appear in the random effects portion of your model; it can only appear in the fixed effects portion. For example, sex can only appear in the fixed effects portion of your model:

lme(effect ~ int * sex * age,
    random = ~ int | n,
    data = data,
    method = "ML")

If the values of a predictor variable differ within a container, the predictor can appear in the random effects portion of the model as well as in the fixed effects portion. For example, int can appear in the fixed effects portion only:

lme(effect ~ int * sex * age,
    random = ~ 1 | n,
    data = data,
    method = "ML")

or in both the fixed effects and the random effects portions of your model (~ 1 + int | n is equivalent to ~ int | n, because the random intercept is included by default):

lme(effect ~ int * sex * age,
    random = ~ 1 + int | n,
    data = data,
    method = "ML")
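The two placements of int can be fitted and compared on simulated data. This is a sketch under stated assumptions: the data are hypothetical, sex and age are left out for brevity, and each subject is assumed to give 5 replicate responses per intervention. That last assumption matters: with exactly one response per subject and intervention, the three subject-level effects in ~ int | n cannot be separated from the residual variance.

```r
library(nlme)

set.seed(1)
# Hypothetical simulated data: 40 subjects x 3 interventions x 5 trials each
# (the 5 replicate trials per subject and intervention are an assumption)
n_subj <- 40
dat <- expand.grid(trial = 1:5,
                   int = factor(c("A", "B", "C")),
                   n = factor(seq_len(n_subj)))

# Subject-specific deviations: an intercept shift plus extra shifts for B and C
b0 <- rnorm(n_subj, sd = 1)
bB <- rnorm(n_subj, sd = 0.7)
bC <- rnorm(n_subj, sd = 0.7)
s  <- as.integer(dat$n)
dat$effect <- 10 + b0[s] +
  ifelse(dat$int == "B", 1 + bB[s], 0) +
  ifelse(dat$int == "C", 2 + bC[s], 0) +
  rnorm(nrow(dat), sd = 1)

# int in the fixed effects only (random intercept per subject)
m_int <- lme(effect ~ int, random = ~ 1 | n, data = dat)
# int in both the fixed and the random effects (random intercept and slopes)
m_slope <- lme(effect ~ int, random = ~ int | n, data = dat)

# Both fits use the default REML and share the same fixed effects,
# so their random effects structures can be compared
anova(m_int, m_slope)
dim(ranef(m_slope))  # 40 subjects, 3 random effects each
```

Because the simpler model sets variances on the boundary of their allowed range, the p-value from this likelihood-ratio test is conservative.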

A variable like sex is a between-container predictor variable. Since "container" is the same as subject, the proper terminology for sex in your study is a between-subject predictor variable. This terminology indicates that the values of sex change across subjects, but not within subjects as the intervention changes.

A variable like int is a within-container predictor variable (where "container" is subject). In other words, int is a within-subject variable, whose values change within a subject.

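I concur with Roland that the first model you listed seems reasonable, though you might want to test whether the fixed effects portion of the model can be simplified. Fitting with method="ML", as you did, is what makes likelihood-ratio comparisons of models with different fixed effects valid.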
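A likelihood-ratio test of the interaction terms could look like the sketch below. Everything about the data is hypothetical: 60 simulated subjects with one response per intervention, assumed sex labels and a 3-level age group. For simplicity it uses random = ~ 1 | n; the same comparison works with any random effects structure, as long as it is identical in both fits.

```r
library(nlme)

set.seed(2)
# Hypothetical between-subject data: sex and a 3-level age group (labels assumed)
n_subj <- 60
subj <- data.frame(
  n   = factor(seq_len(n_subj)),
  sex = factor(rep(c("F", "M"), length.out = n_subj)),
  age = factor(rep(c("young", "middle", "old"), each = 2, length.out = n_subj))
)
# Long format: one row per subject and intervention
dat <- subj[rep(seq_len(n_subj), each = 3), ]
dat$int <- factor(rep(c("A", "B", "C"), times = n_subj))

# Random subject intercepts, intervention effects, a small sex effect, noise
b0 <- rnorm(n_subj, sd = 1)
dat$effect <- 10 + b0[as.integer(dat$n)] +
  (as.integer(dat$int) - 1) +
  0.5 * (dat$sex == "M") +
  rnorm(nrow(dat), sd = 1)

# ML fits: REML likelihoods are not comparable across different fixed effects
full    <- lme(effect ~ int * sex * age, random = ~ 1 | n, data = dat, method = "ML")
reduced <- lme(effect ~ int + sex + age, random = ~ 1 | n, data = dat, method = "ML")

# Likelihood-ratio test: do the interaction terms improve the fit?
anova(reduced, full)
```

If the test is not significant, the additive model is a candidate simplification; the decision should still take your research question into account.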
