Solved – Analyzing a 2×3 repeated measures design using a logit mixed model

logistic, mixed model, r, repeated measures

An experiment I conducted recently used a 2 (between participants) × 3 (within participants) design. That is, participants were randomly allocated to one of two conditions and then each completed three similar tasks (in counterbalanced order).

In each of these tasks, participants made binary choices (2AFC) across a number of trials, each of which had a normatively correct answer.
In every trial, participants were presented with a distractor, which was assumed to bias responses towards one of the alternatives. The tasks differed only in the presence and magnitude of this distractor (i.e. no distractor vs. a distractor of small or large magnitude).

I would like to examine the error rates (deviations from the normatively correct answer) across these conditions. I hypothesize that the error rate will increase when a distractor is present, but will not increase further when the magnitude of the distractor is increased. I also expect this increase to differ between the two between-subjects conditions; this interaction is of central interest.

From discussions here and from the literature (Dixon, 2008; Jaeger, 2008), I gather that logit mixed models are the appropriate analysis method and that, in R, the lme4 package is the tool of choice.
While I could compute some basic analyses (e.g. a random-intercept model, a random-effects ANCOVA) with lme4, I am stuck on how to apply such models to the design in question. I have the feeling that I am still very much thinking in terms of HLMs and have not yet fully understood mixed-effects models. I would therefore be very grateful for your help.

I have two basic questions:

  1. In a first analysis, I would like to consider the error rates in only those trials in which participants were biased towards the wrong answer. The first model would therefore look only at trials in which the bias points away from the correct answer.

    If my observations were independent, I would probably use a model like this:

    correct ~ condition + distractor + condition:distractor

    … but obviously, they aren't: Observations are grouped within tasks (each with a constant distractor) and within participants. My question, then, is this: How do I add the random effects to reflect this? (A rough attempt is sketched after this list.)

  2. (If I haven't lost you already 🙂 ) Would it be possible to include all trials (those where the bias points towards the wrong answer and those where it points towards the correct answer), and include this difference (i.e. the direction of the bias) as a trial-level predictor?

    In my imagination of HLMs, such a predictor (at the level of the trial) would depend on the magnitude of the distractor present (at the level of the block), which again would depend on the condition of the participant (plus, possibly, a unique factor for each participant).
    The interactions would then emerge ›automatically‹ as cross-level interactions. How would such a model be specified in the ›flat‹ lme4 syntax? (Would such a model make sense at all?)
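To make my current thinking concrete, here is a rough attempt at the model for question 1. It is only a sketch of what I have been trying: `subject` and `bias_direction` are placeholder names for my participant identifier and the trial-level bias variable, and I am not at all sure the random-effects part is right:

library(lme4)

# Question 1: only trials in which the distractor points away from the
# correct answer; random intercepts for participants
m1 <- glmer(
    correct ~ condition * distractor + (1 | subject),
    family = binomial,
    data = subset(my_data, bias_direction == "wrong")
)

(If the tasks should also be treated as a grouping factor, I assume something like `(1 | task)` would be added, but that is exactly where I am unsure.)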

Ok, I hope all of this makes sense — I will gladly elaborate otherwise. Again, I would be most grateful for any ideas and comments regarding this analysis, and would like to thank you for taking the time and trouble to respond.

References

Dixon, P. (2008). Models of accuracy in repeated-measures designs. Journal of Memory and Language, 59(4), 447-456. doi: 10.1016/j.jml.2007.11.004

Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434-446. doi: 10.1016/j.jml.2007.11.007

Best Answer

I would definitely use all of your data and include the direction of the intended bias as a variable. Since you already have variables in the model describing the differences between the tasks, I don't believe that adding task as a random effect is necessary. The model would be:

# logit mixed model: binomial family with a by-participant random intercept
# (note: binomial models are fit with glmer, not lmer)
my_model <- glmer(
    correct ~ condition * distractor * direction + (1 | subject),
    family = binomial,
    data = my_data
)
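If you suspect that the within-participant effects themselves differ from person to person, you could also consider by-participant random slopes. This is only a sketch of one possible extension; it may well be over-parameterized for your data and need simplifying if it fails to converge:

# Optional extension: let the effects of the within-participant manipulations
# vary across participants via random slopes
my_model_slopes <- glmer(
    correct ~ condition * distractor * direction
        + (1 + distractor * direction | subject),
    family = binomial,
    data = my_data
)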

Check out the ezMixed function from the ez package for an automated way of evaluating evidence for each effect in the model.
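If you prefer to stay within lme4 itself, the same kind of effect-by-effect evaluation can be done by hand with likelihood-ratio tests between nested models. A sketch, using the my_model fit from above and the three-way interaction as an example:

# Drop the three-way interaction and compare the fits with a likelihood-ratio test
reduced <- update(my_model, . ~ . - condition:distractor:direction)
anova(reduced, my_model)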