Solved – How to correctly model repeated-measures random effects in a linear mixed effects model

lme4-nlme, mixed-model, r

I have a mixed design data set where participants respond to each of three interventions and also report various demographics. The intervention is thus repeated-measures and each demographic measure is between-subjects.

I want to model this as a linear mixed effects model with random slopes and intercepts, but I can't figure out what the correct way to express that using the lme function is.

If we call the participant variable "n", the intervention "int", the dependent variable "effect", and use "sex" and "age" (categorical) as two of the demographic variables, then using R's lme function, I'm thinking one of the following should be correct:

model <- lme(effect ~ int* sex* age, random=~int|n, data=data, method="ML")

model <- lme(effect ~ int* sex* age, random=~int|n/int, data=data, method="ML")

I haven't been able to work out when to structure the random effects as nested and whether it makes any sense to have intervention on both sides of the bar as both a term in the random model and a grouping variable.

Best Answer

@Roland already answered your question in his comment, so my answer is likely redundant.

From what you describe about your study design, you have a single grouping variable: subject (or n per your notation). For each subject, you have multiple measurements of the response variable effect. If this variable can be assumed to be continuous, then you can indeed model it using a linear mixed effects model. Otherwise, you may need to use a generalized linear mixed effects model (e.g., Poisson mixed effects model for a count response variable).

Your intervention variable (int in your notation) is a categorical variable with 3 levels - presumably, these are the only levels you are interested in for your study, which justifies including this variable as a predictor variable in the fixed effects portion of your model, which is allowed to interact with the other two predictors (namely, sex and age): int * sex * age.

The only situation in which you would have treated int as a nested grouping factor - as in ~1|n/int - would have been if:

(1) The three interventions in your study were a representative subset of a larger set of interventions which could not all be included in your study, and
(2) The interventions assigned to any given subject were specific to that subject and were not used for any other subject. (If the same subset of interventions was used for all subjects, the grouping factors subject and intervention would be fully crossed.)

Even if conditions (1) and (2) listed above were satisfied for your study, a syntax such as ~int|n/int would not make sense in the random effects portion of your model: int is constant within each intervention-within-subject cell, so no random slope for int can be estimated at that level. The appropriate syntax would be ~variable|n/int, where variable is a predictor whose values change within each subject-by-intervention cell.

One way to think of a grouping variable is as a 'container' for repeated observations of a response variable. In your case, the 'container' is the subject. Those repeated observations are 'grouped together' in the same container.

In addition to the response variable, you'll also have various predictor variables which you wish to relate to the response variable. The values of these predictor variables can be (i) the same for all grouped observations in the 'container' (e.g., sex) or (ii) different for different grouped observations in the 'container' (e.g., int).

If the values of a predictor variable are the same within a container, that predictor variable cannot appear in the random effects portion of your model; it can only appear in the fixed effects portion. For example, sex can only appear in the fixed effects portion of your model:

lme(effect ~ int*sex*age,             
    random=~int|n, 
    data=data, 
    method="ML")

If the values of a predictor variable are different within a container, then the predictor variable can appear both in the fixed effects portion of the model as well as in the random effects portion. For example, int can appear only in the fixed effects portion of your model:

 lme(effect ~ int*sex*age,             
    random=~1|n, 
    data=data, 
    method="ML")

or in both the fixed effects and the random effects portions of your model:

lme(effect ~ int*sex*age,
    random=~ 1 + int|n,
    data=data,
    method="ML")

A variable like sex is a between-container predictor variable. Since "container" is the same as subject, the proper terminology for sex in your study is a between-subject predictor variable. This terminology indicates that the values of sex change across subjects, but not within subjects as the intervention changes.

A variable like int is a within-container predictor variable (where "container" is subject). In other words, int is a within-subject variable, whose values change within a subject.
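Whether a predictor is between-subject or within-subject can be checked directly from the data. The following is a minimal sketch on made-up data; the data frame dat and the helper varies_within are hypothetical names introduced only for illustration, while the column names n, int and sex follow the notation of the question.

```r
# Toy data: 12 subjects, each measured under three interventions.
dat <- expand.grid(n = factor(1:12), int = factor(c("A", "B", "C")))
# sex is assigned per subject, so it is constant within a subject.
dat$sex <- factor(ifelse(as.integer(dat$n) %% 2 == 0, "f", "m"))

# Hypothetical helper: TRUE if x takes more than one value
# inside at least one level of the grouping variable id.
varies_within <- function(x, id) {
  n_unique <- tapply(x, id, function(v) length(unique(v)))
  any(n_unique > 1)
}

varies_within(dat$int, dat$n)  # TRUE:  within-subject predictor
varies_within(dat$sex, dat$n)  # FALSE: between-subject predictor
```

Only predictors for which such a check returns TRUE are candidates for the left-hand side of the bar in the random effects formula.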

I concur with Roland that the first model you listed seems reasonable, though you might want to test if you can simplify the fixed effects portion of the model.
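One common way to test whether the fixed effects can be simplified is a likelihood ratio test between two models fitted with method="ML". The sketch below uses simulated data (not the asker's), drops age for brevity, and uses a random intercept only so that the toy fit stays stable; the object names dat, m_full and m_add are assumptions made for this example.

```r
library(nlme)

set.seed(123)
n_subj <- 40
dat <- expand.grid(n = factor(seq_len(n_subj)),
                   int = factor(c("A", "B", "C")))
id <- as.integer(dat$n)

# Between-subject predictor: constant within each subject.
dat$sex <- factor(rep(c("f", "m"), length.out = n_subj)[id])

# Simulated response: intervention effect + subject intercept + noise.
int_shift <- unname(c(A = 0, B = 1, C = 2)[as.character(dat$int)])
subj_eff  <- rnorm(n_subj, sd = 1)
dat$effect <- 10 + int_shift + subj_eff[id] + rnorm(nrow(dat), sd = 0.5)

# Full and reduced fixed-effects structures, both fitted by ML so that
# their likelihoods are comparable.
m_full <- lme(effect ~ int * sex, random = ~ 1 | n, data = dat, method = "ML")
m_add  <- lme(effect ~ int + sex, random = ~ 1 | n, data = dat, method = "ML")

# Likelihood ratio test for the int:sex interaction.
cmp <- anova(m_full, m_add)
print(cmp)
```

If the test gives no evidence for the interaction, the additive model is a reasonable simplification; the same pattern extends to the three-way int*sex*age model.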

Related Solutions

Solved – Mixed-effect model in R using lme for data count data with two fixed effects and repeated measures

I will quickly address the general use of aov. When using aov in R, type I sums of squares are used. These are sequential, which means the order of the variables will affect the results if the design is unbalanced (see here: http://goanna.cs.rmit.edu.au/~fscholer/anova.php). Type III sums of squares are sometimes preferred when there is an interaction, and type II when there is no significant interaction. Both are available in the car package through the function Anova (notice the capital A). This may be why your anova results did not make sense.
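The order dependence of sequential (type I) sums of squares is easy to see on a small unbalanced data set. This is an illustrative sketch with simulated data; the names d, a, b and y are made up, and anova(lm(...)) is used because it reports the same sequential sums of squares as aov while being easier to index.

```r
set.seed(42)

# Unbalanced two-factor layout: cell counts are 9, 3, 2 and 4.
d <- data.frame(
  a = factor(rep(c("a1", "a2"), times = c(12, 6))),
  b = factor(c(rep(c("b1", "b2"), times = c(9, 3)),
               rep(c("b1", "b2"), times = c(2, 4))))
)
d$y <- rnorm(nrow(d)) + (d$a == "a2") + 0.5 * (d$b == "b2")

# Same additive model, terms entered in a different order.
ss_ab <- anova(lm(y ~ a + b, data = d))
ss_ba <- anova(lm(y ~ b + a, data = d))

# Sum of squares attributed to 'a' depends on whether it enters first
# or after 'b'; the residual sum of squares does not.
ss_ab["a", "Sum Sq"]
ss_ba["a", "Sum Sq"]
ss_ab["Residuals", "Sum Sq"]
ss_ba["Residuals", "Sum Sq"]
```

With a balanced design the two orderings would agree, which is why the issue only shows up when cell counts differ.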

Now to address the question about mixed effect models. I would first recommend lme4, as I think the formula specification is easier to understand. For instance, the random effect would be + (1|animal/time/treatment). With regard to the degrees of freedom, it is not necessarily the case that your model is wrong. Douglas Bates, the author of lme4, has written extensively about the difficulties in calculating degrees of freedom in mixed models (https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html). This has also been discussed on this site (getting degrees of freedom from lmer). Because of this, the lme4 package does not provide p-values, and extra steps, such as sampling from the posterior, are necessary to calculate one. I am not sure if nlme is still being maintained, but it wouldn't hurt to email the authors.

In the event that the model is right, the tricky part will be interpreting the estimates (Interpreting the regression output from a mixed model when interactions between categorical variables are included). The reference category (i.e., the intercept) is going to be the first level of each factor. From what you have provided, this would be the first time point (I assume time is categorical because grouping variables in the random effects are always factors), treatment = CON, and genotype = M. The p-value that is significant, for instance, is comparing time to this reference category. The question is whether this is a meaningful comparison. Using a package for Bayesian multilevel models, for instance brms or rstanarm (http://www.r-bloggers.com/r-users-will-now-inevitably-become-bayesians/), you could add posterior estimates together and use simple subtraction to obtain contrasts at each level of the factors.

This might not have been much help towards your initial question, but the specification of random effects will generally change the estimates little unless there is great variation between levels of a random effect. Additionally, random effects are not always straightforward (Minimum number of levels for a random effects factor?) or easy to define (What is the difference between fixed effect, random effect and mixed effect models?). If you still cannot get an answer to your question about random effects, you can try a sensitivity analysis. For instance, animal ID should be included as a random effect, but the others are open to debate. You could check whether the estimates (e.g., coefficients and confidence intervals) change drastically when only some of the variables are nested. If they do not, this would provide confidence in your model, and you could mention the potential problem with the random effects in the discussion of your paper. For a more rigorous approach, you could use a likelihood ratio test comparing models that differ only in their random effects (Likelihood ratio tests on linear mixed effect models). You can even use this test to determine whether time is significant: compare models that differ only in the inclusion of time.
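A likelihood ratio test between two random-effects structures can be sketched with nlme as follows. The data are simulated, time is treated as a numeric covariate with several time points per animal so that a random slope is estimable, and the names dat, m_int and m_slope are assumptions for this example rather than anything from the original question.

```r
library(nlme)

set.seed(2024)
n_animal <- 40
dat <- expand.grid(time = 0:5, animal = factor(seq_len(n_animal)))
id <- as.integer(dat$animal)

# Animal-specific intercepts and slopes, plus residual noise.
b0 <- rnorm(n_animal, sd = 2)
b1 <- rnorm(n_animal, sd = 0.5)
dat$y <- 5 + b0[id] + (1 + b1[id]) * dat$time + rnorm(nrow(dat), sd = 1)

# Same fixed effects, different random effects; the default REML fits
# are comparable because the fixed-effects part is identical.
m_int   <- lme(y ~ time, random = ~ 1 | animal,    data = dat)
m_slope <- lme(y ~ time, random = ~ time | animal, data = dat)

# Likelihood ratio test for adding the random slope (and its
# covariance with the random intercept).
cmp <- anova(m_int, m_slope)
print(cmp)
```

Because a variance is being tested on the boundary of its parameter space, the p-value reported by such a comparison tends to be conservative, so it is best read as a rough guide.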

Another option would be to use a generalized estimating equation, or GEE (R packages: gee and geepack), which might be appropriate here because the correlations between outcomes do not need to be correctly specified. The method is robust to "unknown" correlations. This is also ideal when samples are small (see here: http://epm.sagepub.com/content/76/1/64.short; https://en.wikipedia.org/wiki/Generalized_estimating_equation).

With regard to using different distributions, you could try the lme4 package with a Poisson distribution (glmer with family = poisson) or a negative binomial distribution (glmer.nb). The assumptions of a Poisson distribution are often violated (variance and mean must be close to equal). When there is overdispersion (variance larger than the mean), the negative binomial distribution is preferred. Since you have 20 potential yes/no's, you should include the number of possible occurrences as an offset (on the log scale, given the log link), which models the counts as rates.

I hope this information can be of use for the manuscript!

Solved – Random effects in repeated-measures design using lme

It appears that you have a case of a partially crossed, partially nested design because, if I understand correctly, day and cond are crossed (i.e., neither is nested in the other), while both appear to be nested within subject. measurement is an id variable that indexes the measurement occasion on each day and within each condition, and as such should not be treated as a random factor because there is only one observation of the dependent variable for each measurement occasion. Even though they are indexed as 1-4 for each day/condition, they are different measurements (that is, measurement 1 for day 1 condition 0 and measurement 1 for day 1 condition 1 are not the same measurement) and therefore there can be no random variation in it. If you specified it as random in the way you have coded the data above, it would be a mistake.

If this is the case, then lme cannot easily fit such a model, and you could use something like lme4 instead. You could specify the structure in lme4 as follows:

DV ~ 1 + (1|subject) + (1|day) + (1|cond) + (1|subject:day) + (1|subject:cond)

If measurement is a measurement of time within each day or cond and you expect some temporal effect, then you could include measurement as a fixed effect (and also potentially fit random slopes, if the data supported such a model).

However, fitting a model with random intercepts for day and cond would not be a good idea because you have only 2 of each, so you would be asking the software to estimate a variance for a normally distributed variable having only 2 observations, which does not make any sense. So a better way forward is to treat day and cond as fixed effects, and simply fit random intercepts for subject:

DV ~ day + cond + (1|subject)

The fact that day and cond were randomly assigned is not relevant.

The same comment as above applies for measurement again here. That is, you might want to fit

DV ~ day + cond + measurement + (1|subject)

and again, you could also have random slopes for day and/or cond and/or measurement if suggested by the domain theory and supported by the data.

Of course, now that we have discarded day and cond as random, you can go back to the nlme package if you wish (although lme4 is really the successor to nlme for most cases).
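For reference, the last lme4-style formula above, DV ~ day + cond + measurement + (1|subject), could be written for nlme roughly as follows. This is a sketch on simulated data: the design sizes (12 subjects, 2 days, 2 conditions, 4 measurements), the effect sizes and the object names dat and fit are all assumptions made for illustration.

```r
library(nlme)

set.seed(7)
dat <- expand.grid(measurement = 1:4,
                   cond = factor(0:1),
                   day = factor(1:2),
                   subject = factor(1:12))
id <- as.integer(dat$subject)

# Simulated response: fixed effects of day, cond and measurement,
# a random intercept per subject, and residual noise.
subj_eff <- rnorm(12, sd = 1.5)
dat$DV <- 3 + 0.3 * (dat$day == "2") + 0.8 * (dat$cond == "1") +
  0.2 * dat$measurement + subj_eff[id] + rnorm(nrow(dat))

# nlme counterpart of DV ~ day + cond + measurement + (1|subject):
# fixed effects in the main formula, grouping in 'random'.
fit <- lme(DV ~ day + cond + measurement,
           random = ~ 1 | subject,
           data = dat)

fixef(fit)    # estimated fixed effects
VarCorr(fit)  # subject-level and residual variance components
```

A random slope, for example for measurement, would be requested by changing the random argument to ~ measurement | subject, provided the data support it.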
