R Mixed Model – Using lme4 in R for Multiple Response Data Analysis

lme4-nlmemanovamixed modelrrandom-effects-model

I have some questions about how to translate my data structure and research Qs to syntax for the lmer function (lme4 package).

I am looking to predict teenagers scores on a mental health questionnaire (0-10) from 3 variables: age (continuous), sex (M/F) and my variable of interest, let's call it X for simplicity (continuous). I demeaned both my continuous predicted variables. Both the teenager and their parent filled out the questionnaire about the teenager, adding a third (within-subject) variable rater. I restructured the dataset into long format such that each subject has 2 rows for each outcome value: parent-report & self-report.

Number of subjects ~85
Number of outcome observations ~170

I am interested in the following effects:

  • The fixed effect of X on outcome score (main interest)
  • The interaction between X and sex on the outcome ("does X affect the outcome in one sex but not the other?")
  • The fixed effect of sex on scores
  • The fixed effect of age on scores

But I would also like to know whether the effects above are dependant on who is the rater? In this sense, rater is not a nuisance grouping variable whose effect on the outcome I want to account for. I would like to perform a test similar to a MANOVA but given that some subjects are missing some observations, I would prefer to use mixed models. As I understand it, linear mixed models can be used for multiple outcome data but I do not know how to phrase the syntax such that:

  • I declare non-independant observations within subjects (rater falls within subject)
  • I do not have a random slope for every subject (I have a relatively small sample size)

Using some specific examples, I'd like to know which (if any!) of the following capture my needs…

m1 <- lmer(score ~ X*sex+ age + (1+rater), data = mydata ) 

m2 <- lmer(score ~ X*sex + age + (1|rater), data = mydata )  
     # same as m1?

m3 <- lmer(score ~ X*sex + age + (1|ID/rater), data = mydata ) 
     # Error: number of levels of each grouping factor must be < number of   observations 
     # An issue related to missing data??

Any help (for any part of the above) is appreciated!

Best Answer

Using some specific examples, I'd like to know which (if any!) of the following capture my needs...

m1 <- lmer(score ~ X*sex+ age + (1+rater), data = mydata ) 

m2 <- lmer(score ~ X*sex + age + (1|rater), data = mydata )  
     # same as m1?

m3 <- lmer(score ~ X*sex + age + (1|ID/rater), data = mydata ) 
     # Error: number of levels of each grouping factor must be < number of   observations 
 # An issue related to missing data??

Model m1 does not contain random effects. It should produce an error.

Model m2 fits fixed effects for X, sex,age and the X:sex interaction. This will not provide an answer the question of "whether the effects above are dependant on who is the rater". This also fits random intercepts for rater, but, since rater appears to be a binary indicator, this does not make much sense, as the software would be estimating a variance for a normally distributed variable from only 2 observations.

Model m3 fits the same fixed effects as m2 so also does not provide answer the question of "whether the effects above are dependant on who is the rater". The random effects term (1|ID/rater) is equivalent to (1|ID) + (1|ID:rater). For the first part there are 85 levels, and for the 2nd part there are 170 levels, the sum of which is 255 and since there are only 170 total observations, this produces the error above.

The model :

m4 <- lmer(score ~ X*sex*rater + age*rater + (1 | ID)) 

given in the @ErikRuzek solves most of the problems. However, this will provide only fixed effects for age, sex and rater (ie for when it is 0 or 1) along with fixed effects for the interactions age:rater, sex:rater X:sex and sex:rater and X: sex:rater, but will not allow any of these to vary by ID. Regarding the comment to Erik's answer about whether it "capture the fact that rater is a within-subject variable that is nested within subject", yes it does, by fitting fixed effects for rater. If you wanted to allow all of these to vary by ID then you would want to fit random slopes. The maximal model would be

m5 <- lmer(score ~ X*sex*rater + age*rater + (1 + X*sex*rater + age*rater| ID)) 

However, it is very unlikely that the data would support such a complex random structure. You might instead try something like:

m6 <- lmer(score ~ X*sex*rater + age*rater + (1 + rater + sex:rater + X:sex:rater + age:rater| ID))

although again, I would expect this to be overfitted, so simplifying again to something like:

m6 <- lmer(score ~ X*sex*rater + age*rater + (1 + rater | ID))

If this, too, will not converge, or results in a singular fit, then you would need to stick with model m4. If it does converge and is not singular, then you could try adding the 2-way interactions as random slopes. The choice of random slopes should also be informed by the underlying theory of the data generation process.

Regarding missing values, I would recommend multiple imputation. In addition to Erik's suggestion, I would also suggest looking at the mice package in R.

Related Question