Solved – Mixed, repeated-measures model specification and results interpretation using lmer in R

lme4-nlme, mixed model, r, repeated measures

I am working with data from a computer task which has 288 total trials, each of which can be categorically classified according to Trial Type, Number of Stimuli, and Probe Location. Because I also want to examine a continuous variable, the total Cartesian distance between stimuli per trial (divided by the number of stimuli to control for varying numbers), I have opted to use a mixed linear model with repeated measures. In addition to each of these task variables, I am also interested in whether folks in various diagnostic groups perform differently on the task, as well as whether there is a Dx interaction with any of the above variables. Thus (if I'm not mistaken), I have the following effects in my model:

Trial Type, a fixed effect
Number of Stimuli, a fixed effect
Probe Location, a fixed effect
Dist(ance), a fixed effect
Dx, a fixed effect
Dx*Trial Type, a fixed effect
Dx*Number of Stimuli, a fixed effect
Dx*Probe Location, a fixed effect
Dx*Dist, a fixed effect
Trial, a random effect, nested within
SubID, a random effect

Based on my reading of the documentation, the nesting of random effects does not appear to matter to lme4, so I specify my model as follows:

tab.lmer <- lmer(Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist +
                   Dx*No_of_Stim + Dx*Trial_Type + Dx*Probe_Loc + Dx*Dist +
                   (1 | Trial) + (1 | SubID),
                 data = bigdf)

This would be my first question: 1) Is the above model specification correct?

Assuming so, I am a bit troubled by my results, though as I read back over my instruction on such models, I wonder whether interpreting particular coefficients is bad practice in this case:

Linear mixed model fit by REML ['merModLmerTest']
Formula: Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist + Dx *  
    No_of_Stim + Dx * Trial_Type + Dx * Probe_Loc + Dx * Dist +  
    (1 | Trial) + (1 | SubID)
   Data: bigdf

REML criterion at convergence: 13600.4

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.89810 -0.03306  0.27004  0.55363  2.81656 

Random effects:
 Groups   Name        Variance Std.Dev.
 Trial    (Intercept) 0.013256 0.11513 
 SubID    (Intercept) 0.006299 0.07937 
 Residual             0.131522 0.36266 
Number of obs: 15840, groups:  Trial, 288; SubID, 55

Fixed effects:
                         Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)             4.196e-01  4.229e-02  4.570e+02   9.922  < 2e-16 ***
DxPROBAND               8.662e-02  4.330e-02  2.920e+02   2.000  0.04640 *  
DxRELATIVE              9.917e-02  4.009e-02  2.920e+02   2.474  0.01394 *  
No_of_Stim3            -9.281e-02  1.999e-02  4.520e+02  -4.642 4.53e-06 ***
Trial_Type1             3.656e-02  2.020e-02  4.520e+02   1.810  0.07097 .  
Probe_Loc1              3.502e-01  2.266e-02  4.520e+02  15.456  < 2e-16 ***
Probe_Loc2              3.535e-01  3.110e-02  4.520e+02  11.369  < 2e-16 ***
Dist                    1.817e-01  2.794e-02  4.520e+02   6.505 2.06e-10 ***
DxPROBAND:No_of_Stim3  -1.744e-02  1.759e-02  1.548e+04  -0.992  0.32144    
DxRELATIVE:No_of_Stim3 -2.886e-02  1.628e-02  1.548e+04  -1.773  0.07628 .  
DxPROBAND:Trial_Type1  -9.250e-03  1.777e-02  1.548e+04  -0.521  0.60267    
DxRELATIVE:Trial_Type1  1.336e-02  1.645e-02  1.548e+04   0.812  0.41682    
DxPROBAND:Probe_Loc1   -8.696e-02  1.993e-02  1.548e+04  -4.363 1.29e-05 ***
DxRELATIVE:Probe_Loc1  -4.287e-02  1.845e-02  1.548e+04  -2.323  0.02018 *  
DxPROBAND:Probe_Loc2   -1.389e-01  2.735e-02  1.548e+04  -5.079 3.83e-07 ***
DxRELATIVE:Probe_Loc2  -8.036e-02  2.532e-02  1.548e+04  -3.173  0.00151 ** 
DxPROBAND:Dist         -3.920e-02  2.457e-02  1.548e+04  -1.595  0.11066    
DxRELATIVE:Dist        -1.485e-02  2.275e-02  1.548e+04  -0.653  0.51390    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In general, these results make sense to me. The troubling part, however, is the positive, significant coefficient for DxPROBAND (yes, I am using lmerTest), particularly since, in terms of performance means, Probands perform worse than Controls. This mismatch concerns me. Examining the corresponding ANOVA:

> anova(tab.lmer)
Analysis of Variance Table of type 3  with  Satterthwaite 
approximation for degrees of freedom
               Sum Sq Mean Sq NumDF   DenDF F.value    Pr(>F)    
Dx             0.8615  0.4308     2   159.0   1.412   0.24662    
No_of_Stim     0.6984  0.6984     1   283.5  37.043 3.741e-09 ***
Trial_Type     8.3413  8.3413     1   283.5   4.456   0.03565 *  
Probe_Loc     25.7223 12.8612     2   283.5 116.405 < 2.2e-16 ***
Dist           5.8596  5.8596     1   283.5  43.399 2.166e-10 ***
Dx:No_of_Stim  1.4103  0.7051     2 15483.7   1.590   0.20395    
Dx:Trial_Type  2.0323  1.0162     2 15483.7   0.841   0.43128    
Dx:Probe_Loc   3.5740  0.8935     4 15483.7   7.299 7.224e-06 ***
Dx:Dist        0.3360  0.1680     2 15483.7   1.277   0.27885    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

…the results seem to more or less line up with the regression, with the exception of the Dx variable. So, my second question is 2) Can anyone clarify what is going on with the Dx variable? Is interpreting individual coefficients from the regression model bad practice in this case?

Finally, as a basic (and somewhat embarrassing) afterthought, 3) If I attempt to reduce the model, I should favor the model with the lower REML, yes?

In summation,
1) Is the above model specification correct?
2) Can anyone clarify what is going on with the Dx variable? Is interpreting individual coefficients from the regression model bad practice in this case?
3) If I attempt to reduce the model, I should favor the model with the lower REML, yes?

ADDENDUM

By request, I'll describe the task and data a little further. The data come from a computer task in which participants are presented a number of stimuli, either two or three, in various locations about the screen. These stimuli can either be "targets" or "distractors". After these stimuli, a probe stimulus is presented; if it appears in a position where a previous target has appeared, participants should respond "yes"; if it appears in the position of a previous distractor or elsewhere, the correct answer is "no." There are 288 trials of this nature; some have two stimuli and some have three, and some lack distractors entirely. The variables in my model, then, can be elaborated as follows:

Number of Stimuli: 2 or 3 (2 levels)

Trial Type: No Distractor (0) or Distractor (1) (2 levels)

Probe Location: Probe at Target (1), Probe at Distractor (2), or Probe Elsewhere (0) (3 levels)

Distance: Total Cartesian distance between stimuli, divided by number of stimuli per trial (Continuous)

Dx: Participant's clinical categorization

Sub ID: Unique subject identifier (random effect)

Trial: Trial number (1:288) (random effect)

Correct: Response classification, either incorrect (0) or correct (1) per trial

Note that the task design makes it inherently imbalanced, as trials without distractors cannot have Probe Location "Probe at Distractor"; this makes R mad when I try to run RM ANOVAs, and it is another reason I opted for a regression.

Below is a sample of my data (with SubID altered, just in case anyone might get mad):

     SubID      Dx Correct No_of_Stim Trial_Type Probe_Loc      Dist Trial
1 99999999 PROBAND       1          3          0         1 0.9217487     1
2 99999999 PROBAND       0          3          0         0 1.2808184     2
3 99999999 PROBAND       1          3          0         0 1.0645292     3
4 99999999 PROBAND       1          3          1         2 0.7838786     4
5 99999999 PROBAND       0          3          0         0 1.0968788     5
6 99999999 PROBAND       1          3          1         1 1.3076598     6

Hopefully, with the above variable descriptions these data should be self-explanatory.

Any assistance that people can provide in this matter is very much appreciated.

Sincerely,
peteralynn

Best Answer

This mainly pertains to your first question.

You want to specify a "Dx interaction with any of the above variables", and seem to think this is done with Dx*var. But it is not: Dx*var expands to Dx + var + Dx:var, and it is the Dx:var term that specifies the interaction.
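If it helps, you can verify this expansion with base R's terms() function; a minimal check, using two of your variables as stand-ins:

# Both formulas produce the same set of model terms:
attr(terms(~ Dx * No_of_Stim), "term.labels")
# [1] "Dx"            "No_of_Stim"    "Dx:No_of_Stim"
attr(terms(~ Dx + No_of_Stim + Dx:No_of_Stim), "term.labels")
# [1] "Dx"            "No_of_Stim"    "Dx:No_of_Stim"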

So let's change your model to

Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist +
          Dx:No_of_Stim + Dx:Trial_Type + Dx:Probe_Loc + Dx:Dist +
          (1 | Trial) + (1 | SubID)

(I'm not sure what lmer made out of your original specification.) This fits a model with the fixed predictors you asked for, plus it allows an individual intercept for each Trial, and additionally for each SubID.

Because I want to also examine a continuous variable, the total Cartesian Distance between stimuli per trial (divided by number of stimuli to control for varying numbers), I have opted to use a mixed linear model with repeated measures.

I don't follow that. You would like to include a continuous predictor; why does that make it a repeated-measures design? Do you measure the same subject on the same tasks more than once? On differing tasks more than once?

3) If I attempt to reduce the model, I should favor the model with the lower REML, yes?

There are various ways to compare two models. The simplest is to set REML=FALSE for both and compare them via a likelihood ratio test with anova(mod1, mod2). Another way is to use information criteria such as the AIC (Akaike information criterion) or BIC (Bayesian information criterion). These try to balance model fit against model complexity, and a lower number indicates a "better" model. I don't think it's a good idea to use the REML criterion as a model comparison metric; as far as I know this is not valid in general, because REML criteria are not comparable between models that differ in their fixed effects (perhaps someone more knowledgeable can elaborate).
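As a rough sketch of what that looks like in practice (the reduced model here is purely illustrative, dropping one interaction from a hypothetical full model):

library(lme4)

# Refit both candidate models with ML (REML = FALSE) so they can be compared
full <- lmer(Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist +
               Dx:Probe_Loc + (1 | SubID),
             data = bigdf, REML = FALSE)
reduced <- lmer(Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist +
                  (1 | SubID),
                data = bigdf, REML = FALSE)

anova(reduced, full)   # likelihood ratio test of the dropped interaction
AIC(full, reduced)     # lower AIC indicates the preferred model
BIC(full, reduced)     # BIC penalizes extra parameters more heavily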

Update after ADDENDUM

Remove (1 | Trial) from the random effects specification (unless you intend to model temporal correlation between trial runs; but even then you would need something other than the running trial index number as a grouping variable). It adds little as a grouping variable here, since each subject contributes only one observation per trial:

Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist +
          Dx:No_of_Stim + Dx:Trial_Type + Dx:Probe_Loc + Dx:Dist +
          (1 | SubID)

This has the minimal random effects structure justified by your experiment: It fits a model with the fixed effects predictors, and adds an additional "random" (read: individual) intercept per individual subject, effectively shifting the regression line determined by the fixed effects predictors up and down by a subject-specific amount. The fact that subjects were repeatedly measured is thereby accounted for.

You can build substantially on this model. The maximal random effects structure would allow correlated random intercepts and slopes for all within-subject predictors, grouped by subject. Barr et al. (2013) recommend, for confirmatory hypothesis testing of fixed effects, including correlated random intercepts and slopes per subject for all tested fixed effects.

Presuming you intend to test all of your within-subject variables (number of stimuli, trial type, probe location, and distance) and to omit any interactions between them, this would be

Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist +
          Dx:No_of_Stim + Dx:Trial_Type + Dx:Probe_Loc + Dx:Dist +
          (No_of_Stim + Trial_Type + Probe_Loc + Dist | SubID)

where intercepts and slopes for No_of_Stim, Trial_Type, Probe_Loc, and Dist are allowed to correlate. A simpler model, imposing independence between intercept and slope, would be

Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist +
          Dx:No_of_Stim + Dx:Trial_Type + Dx:Probe_Loc + Dx:Dist +
          (1 | SubID) + (0 + No_of_Stim | SubID) + (0 + Trial_Type | SubID) +
          (0 + Probe_Loc | SubID) + (0 + Dist | SubID)

Relevant Google terms for finding "the best" model are model building, top-down strategy, and bottom-up strategy, but there is a lot of controversy regarding the best approach.

Regarding interpretation of the coefficients in the output

By default, R uses treatment contrasts (see help(contr.treatment) and http://talklab.psy.gla.ac.uk/tvw/catpred/). This means that one level of a factor is chosen as the reference level, and each coefficient is a contrast against that reference level. Note that because Dx also enters interactions, the DxPROBAND coefficient is the Proband-versus-reference difference at the reference levels of the interacting factors (and at Dist = 0), not the overall group difference; this is why it can disagree in sign with the raw group means and with the ANOVA main effect for Dx. If you are instead asking about the interpretation of the coefficients of fixed within-subject predictors that are additionally allowed to vary randomly per subject: it's essentially the same as in a simple linear model. They tell you whether there is any systematic effect of the predictor on the outcome across subjects, after accounting for random variation between subjects.
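For example, you can inspect the coding and, if needed, change which level serves as the reference (this assumes Dx is stored as a factor; "CONTROL" is only a guess at the label of your control group):

# Inspect the treatment-contrast coding currently applied to Dx
contrasts(factor(bigdf$Dx))

# Make a different level the reference, e.g. the control group
bigdf$Dx <- relevel(factor(bigdf$Dx), ref = "CONTROL")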

You need to use a logistic regression model!

The elephant in the room, so to speak, is that you are trying to predict a binary outcome: correct/incorrect. You need to use a logistic mixed regression model (try glmer and look at http://www.ats.ucla.edu/stat/r/dae/melogit.htm).
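A minimal sketch of what that could look like, keeping your fixed effects and the by-subject random intercept (whether you also want random slopes is a separate decision, as discussed above):

library(lme4)

# Logistic mixed model: same predictors, but a binomial family (logit link)
tab.glmer <- glmer(Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist +
                     Dx:No_of_Stim + Dx:Trial_Type + Dx:Probe_Loc + Dx:Dist +
                     (1 | SubID),
                   data = bigdf, family = binomial)

summary(tab.glmer)   # coefficients are now on the log-odds scale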

So, I hope this helped you a bit. Maybe also have a look at http://glmm.wikidot.com/faq and R's lmer cheat-sheet.

(I am by no means an expert on this and am still learning as well, so if there is anything to disagree with, please do so freely.)
