R Mixed Model – Using lme4 in R for Multiple Response Data Analysis

Tags: lme4-nlme, manova, mixed-model, r, random-effects-model

I have some questions about how to translate my data structure and research questions into syntax for the lmer function (lme4 package).

I am looking to predict teenagers' scores on a mental health questionnaire (0-10) from 3 variables: age (continuous), sex (M/F) and my variable of interest, let's call it X for simplicity (continuous). I mean-centered both continuous predictors. Both the teenager and their parent filled out the questionnaire about the teenager, adding an additional (within-subject) variable, rater. I restructured the dataset into long format such that each subject has 2 rows, one for each outcome value: parent-report and self-report.

Number of subjects ~85
Number of outcome observations ~170

I am interested in the following effects:

The fixed effect of X on outcome score (main interest)
The interaction between X and sex on the outcome ("does X affect the outcome in one sex but not the other?")
The fixed effect of sex on scores
The fixed effect of age on scores

But I would also like to know whether the effects above depend on who the rater is. In this sense, rater is not a nuisance grouping variable whose effect on the outcome I merely want to account for. I would like to perform a test similar to a MANOVA but, given that some subjects are missing some observations, I would prefer to use mixed models. As I understand it, linear mixed models can be used for multiple-outcome data, but I do not know how to write the syntax such that:

I declare non-independent observations within subjects (rater falls within subject)
I do not have a random slope for every subject (I have a relatively small sample size)

Using some specific examples, I'd like to know which (if any!) of the following capture my needs…

m1 <- lmer(score ~ X*sex + age + (1+rater), data = mydata)

m2 <- lmer(score ~ X*sex + age + (1|rater), data = mydata)
     # same as m1?

m3 <- lmer(score ~ X*sex + age + (1|ID/rater), data = mydata)
     # Error: number of levels of each grouping factor must be < number of observations
     # An issue related to missing data??

Any help (for any part of the above) is appreciated!

Best Answer

Using some specific examples, I'd like to know which (if any!) of the following capture my needs...

m1 <- lmer(score ~ X*sex + age + (1+rater), data = mydata)

m2 <- lmer(score ~ X*sex + age + (1|rater), data = mydata)
     # same as m1?

m3 <- lmer(score ~ X*sex + age + (1|ID/rater), data = mydata)
     # Error: number of levels of each grouping factor must be < number of observations
     # An issue related to missing data??

Model m1 does not contain any random effects: the term (1+rater) has no grouping bar (|), so lmer should stop with an error.

Model m2 fits fixed effects for X, sex, age and the X:sex interaction. This will not answer the question of whether the effects above depend on the rater. It also fits random intercepts for rater but, since rater appears to be a binary indicator, this does not make much sense: the software would be estimating the variance of a normally distributed variable from only 2 levels.

Model m3 fits the same fixed effects as m2, so it also does not answer the question of whether the effects above depend on the rater. The random effects term (1|ID/rater) is equivalent to (1|ID) + (1|ID:rater). The first part has 85 levels and the second has 170 levels, 255 random intercepts in total. Since there are only about 170 observations, ID:rater has one level per observation, so its random intercept cannot be separated from the residual, and lmer stops with the error above. Missing data is not the cause.
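The level counts discussed above can be checked directly in base R. The sketch below builds a hypothetical complete long-format layout (the object name mydata_sim and its contents are illustrative assumptions, not the asker's data) and counts the levels of the two grouping factors implied by (1|ID/rater).

```r
# Hypothetical long-format layout: 85 subjects, each rated by parent and self.
n_subj <- 85
mydata_sim <- data.frame(
  ID    = factor(rep(seq_len(n_subj), each = 2)),
  rater = factor(rep(c("parent", "self"), times = n_subj))
)

n_obs      <- nrow(mydata_sim)          # total rows (observations)
n_id       <- nlevels(mydata_sim$ID)    # levels of ID
n_id_rater <- nlevels(interaction(mydata_sim$ID, mydata_sim$rater, drop = TRUE))

# ID:rater has one level per row, so it duplicates the residual term.
print(c(observations = n_obs, ID = n_id, ID_by_rater = n_id_rater))

# Dropping rows (missing reports) removes levels and rows together,
# so the one-level-per-row property holds with incomplete data too.
incomplete <- mydata_sim[-c(2, 7, 11), ]
still_one_per_row <-
  nlevels(interaction(incomplete$ID, incomplete$rater, drop = TRUE)) ==
  nrow(incomplete)
print(still_one_per_row)
```

Running the same counts on the real data frame (replacing mydata_sim with mydata) should show whether any grouping factor has as many levels as there are rows.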

The model

m4 <- lmer(score ~ X*sex*rater + age*rater + (1 | ID), data = mydata)

given in @ErikRuzek's answer solves most of the problems. However, it provides only fixed effects: the main effects of X, sex, age and rater (the latter comparing its two levels, coded 0 and 1), along with the interactions X:sex, X:rater, sex:rater, age:rater and X:sex:rater. It does not allow any of these to vary by ID. Regarding the comment on Erik's answer about whether it "capture[s] the fact that rater is a within-subject variable that is nested within subject": yes, it does, by fitting fixed effects for rater. If you wanted to allow all of these to vary by ID, you would fit random slopes. The maximal model would be
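To see exactly which fixed-effect terms the formula X*sex*rater + age*rater expands to, base R's terms() can list them without fitting anything. This is only a sketch of the formula expansion; the variable names are those used in the question.

```r
# Expand the fixed-effects part of m4 into its individual terms (base R only).
fixed_part  <- ~ X * sex * rater + age * rater
fixed_terms <- terms(fixed_part)
term_labels <- attr(fixed_terms, "term.labels")

print(term_labels)

# Count terms by order: 1 = main effect, 2 = two-way, 3 = three-way interaction.
term_orders <- attr(fixed_terms, "order")
print(table(term_orders))
```

The formula yields four main effects, four two-way interactions and one three-way interaction; the rater interactions are the terms that address whether an effect differs between parent-report and self-report.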

m5 <- lmer(score ~ X*sex*rater + age*rater + (1 + X*sex*rater + age*rater | ID), data = mydata)

However, it is very unlikely that the data would support such a complex random structure. You might instead try something like:

m6 <- lmer(score ~ X*sex*rater + age*rater + (1 + rater + sex:rater + X:sex:rater + age:rater | ID), data = mydata)

Again, I would expect this to be overfitted, so you could simplify further to something like:

m7 <- lmer(score ~ X*sex*rater + age*rater + (1 + rater | ID), data = mydata)

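If this, too, fails to converge or results in a singular fit, then you would need to stick with model m4. Bear in mind that each subject contributes at most two rows, so a by-subject intercept plus a rater slope already uses as many random effects as there are observations; lmer may refuse to fit such a model, in which case m4 is again the fallback. If it does converge and is not singular, then you could try adding the 2-way interactions as random slopes. The choice of random slopes should also be informed by the underlying theory of the data-generating process.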
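The fitting sequence can be tried on simulated data before touching the real data set. The sketch below is a minimal illustration under stated assumptions: every value is simulated, and the object names (sim, slope_try) are hypothetical. It fits the random-intercept model, checks for a singular fit with isSingular(), and then attempts a by-subject random slope for rater inside tryCatch(), since lme4 is expected to object when each subject has only two rows.

```r
library(lme4)

set.seed(42)
n_subj <- 85

# Subject-level (between-subject) variables; all values are simulated.
X   <- as.numeric(scale(rnorm(n_subj), scale = FALSE))          # mean-centered
age <- as.numeric(scale(runif(n_subj, 12, 18), scale = FALSE))  # mean-centered
sex <- factor(sample(c("F", "M"), n_subj, replace = TRUE))
u   <- rnorm(n_subj, sd = 1)                                     # subject intercepts

# Long format: two rows per subject, one per rater.
sim <- data.frame(
  ID    = factor(rep(seq_len(n_subj), each = 2)),
  rater = factor(rep(c("parent", "self"), times = n_subj)),
  X     = rep(X, each = 2),
  age   = rep(age, each = 2),
  sex   = rep(sex, each = 2)
)
sim$score <- 5 + 0.5 * sim$X + 0.3 * (sim$rater == "self") +
  rep(u, each = 2) + rnorm(nrow(sim))

# Random-intercept model (the structure of m4 in the answer).
m4 <- lmer(score ~ X * sex * rater + age * rater + (1 | ID), data = sim)
print(isSingular(m4))   # TRUE would signal a degenerate variance estimate

# Random slope for rater, wrapped in tryCatch so a refusal is captured
# as a condition object instead of stopping the script.
slope_try <- tryCatch(
  lmer(score ~ X * sex * rater + age * rater + (1 + rater | ID), data = sim),
  error = function(e) e
)
if (inherits(slope_try, "error")) message(conditionMessage(slope_try))
```

If the random-slope fit is rejected or singular, the random-intercept model remains the one to report; the rater interactions in its fixed part still test whether effects differ by rater.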

Regarding missing values, I would recommend multiple imputation. In addition to Erik's suggestion, I would also suggest looking at the mice package in R.

Related Solutions

Solved – Maximal model for linear mixed-effects model for repeated measures design

The maximal structure would need to include also a random effect for the interaction between color and shape, that is:

Y ~ color * shape + (color + shape + color:shape | subject)

This will result in all your predictors (color, shape and their interaction) having a fixed effect (constant for all subjects) and a random effect (individual fluctuations around the estimated fixed effect). In this sense the model is the maximal one. Note that it might not be fully equivalent to a repeated-measures ANOVA, as it doesn't make equally strict assumptions about the correlation structure (see Tom's answer).

If you don't include the interaction in the random-effects part of the formula, individual variation in the interaction effect will not be treated as "random", and the model will not be equivalent to a repeated-measures ANOVA. Of course, the variance of the random deviates for the interaction (or any other random effect) might be so small that including it in the model does not improve the fit much. You can check this not only with the AIC but also with a likelihood ratio test, since models with and without a given random effect are nested within one another. In principle, a non-significant likelihood ratio test suggests that you can safely remove that random effect. Simplifying the random-effects structure by removing negligible components is an example of what the article you linked calls a data-driven approach.
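A likelihood ratio test of a random-effect term can be sketched as follows. The color/shape data are not available here, so the example uses the sleepstudy data set that ships with lme4 as a stand-in; the model names m_full and m_reduced are illustrative.

```r
library(lme4)

# sleepstudy ships with lme4; it stands in for the color/shape data here.
m_full    <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
m_reduced <- lmer(Reaction ~ Days + (1 | Subject),        data = sleepstudy)

# refit = FALSE keeps the REML fits, since only the random part differs.
lrt <- anova(m_reduced, m_full, refit = FALSE)
print(lrt)

# Information criteria for the same pair of models.
print(AIC(m_reduced, m_full))
```

Because a variance of zero lies on the boundary of the parameter space, the p-value from this test tends to be conservative, so a clearly non-significant result is stronger grounds for dropping a term than a borderline one.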

You can simplify the model in this way, and it would still be equivalent to a repeated-measures ANOVA:

Y ~ color*shape + (1|subject) + (0+color|subject) + (0+shape|subject) + (0+color:shape|subject)

This syntax tells lmer not to estimate the correlations between the random effects. The drawback is that, for example, you won't be able to tell whether subjects with a large effect of color also tend to have a larger effect of shape (or a smaller one, in the case of a negative correlation).

You can easily include a between-subjects predictor; the only difference is that you can't add a random effect for it. "gender", for example, cannot have a random effect grouped by subject, but it can interact with the other fixed effects, e.g.:
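For a numeric predictor, lme4 also offers a double-bar shorthand for uncorrelated random effects. The sketch below, again using lme4's sleepstudy data as a stand-in, compares the spelled-out form with the shorthand; the model names are illustrative.

```r
library(lme4)

# Spelled-out uncorrelated terms (sleepstudy stands in for the real data).
m_split <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject),
                data = sleepstudy)

# Double-bar shorthand for the same structure.
m_bars  <- lmer(Reaction ~ Days + (Days || Subject), data = sleepstudy)

print(VarCorr(m_split))   # variances only; no correlation is estimated

same_fit <- isTRUE(all.equal(as.numeric(logLik(m_split)),
                             as.numeric(logLik(m_bars)),
                             tolerance = 1e-6))
print(same_fit)
```

One caveat, as far as I understand the lme4 documentation: the double-bar shorthand is intended for numeric predictors. With factors such as color or shape, a term like (0+color|subject) still estimates correlations among the factor's own levels, so separating terms this way removes only the correlations between terms.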

Y ~ color * shape * gender + (color + shape + color:shape | subject)

Solved – Nesting random effect within fixed effect using lmer() of lme4 in R

I'm assuming that you have multiple subjects, that each subject gets exactly one treatment, and that each subject gets multiple words and tones (every subject getting all of the words and all of the tones is the cleanest design; a somewhat unbalanced design will lose power but not mess things up too badly).
There is no way to determine whether the effect of tone varies across words, since each word uses a single tone, but you can tell whether the effect of tone varies across subjects, since each subject gets multiple tones.
The opposite situation holds for treatments: each word is observed under multiple treatments, but each subject is observed under a single treatment.
Since (assuming) there is a single measurement per Subject-Word combination, you don't need/shouldn't use a (1|Subject:Word) term; it will be handled by the residual variance term.
Since words are unique, you don't need to code word as explicitly nested within tone (see this answer).
Since response time is an intrinsically positive variable, you might want to consider log(RT) (for a linear model), or a GLMM with a Gamma response distribution (this depends on your data, though; the conditional distribution might be adequately Normal/homoscedastic).

Thus I think the maximal model (allowing for all interactions that can be estimated from the design) is

RT ~ Treatment*Tone+(Treatment|Word)+(Tone|Subject)

(you can include the implicit intercept terms, e.g. (1+Treatment|Word), if it's clearer for you, but you'll get the same model either way)

Note that there is some controversy about whether you should start with the maximal model, or whether you should cut the model down to something that is reasonable given the size of the experiment. In this case the number of subjects is important: the variance-covariance matrix for tone effects is 4x4 (10 parameters), so if you had fewer than 50 subjects you might not want to try that. Barr et al. 2013 believe you should go with the maximal model; Bates et al. think you shouldn't.
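The parameter count quoted above follows from the number of distinct entries in a symmetric matrix: k variances plus k(k-1)/2 covariances. A small helper (the function name n_vcov_params is hypothetical) makes the growth easy to see:

```r
# Number of variance-covariance parameters for k correlated random effects:
# k variances plus k * (k - 1) / 2 covariances, i.e. k * (k + 1) / 2.
n_vcov_params <- function(k) k * (k + 1) / 2

n_vcov_params(4)            # the 4 x 4 matrix mentioned above: 10
sapply(1:6, n_vcov_params)  # 1 3 6 10 15 21
```

The count grows quadratically with the number of random effects per grouping factor, which is why the number of subjects matters so much when deciding how close to the maximal model to go.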
