How to account for repeated measures in glmer

glmm, lme4-nlme, r, repeated measures

My design is as follows.

  • $y$ is a Bernoulli response
  • $x_1$ is a continuous variable
  • $x_2$ is a categorical (factor) variable with two levels

The experiment is completely within subjects. That is, each subject receives each combination of $x_1$ and $x_2$.

This is a repeated measures logistic regression set-up. The experiment will give two ogives for $p(y=1)$ vs $x_1$, one for level1 and one for level2 of $x_2$. The expected effect of $x_2$ is that, for level2 compared to level1, the ogive has a shallower slope and an increased intercept.

I am struggling to specify the model in lme4. For example,

glmer(y ~ x1*x2 + (1|subject), family=binomial)

So far as I understand it, the 1|subject part says that subject is a random effect. But I do not see how to specify that $x_1$ and $x_2$ are repeated measures variables. In the end, I want a model that includes a random effect for subjects, and gives estimated slopes and intercepts for level1 and level2.

Best Answer

tl;dr: Your model already accounts for the fact that you have repeated measures. Nonetheless, if it fits, you would do best to use:

glmer(y ~ x1*x2 + (x1*x2|subject), family=binomial)

but if that isn't tractable, you could try:

glmer(y ~ x1*x2 + (1|subject) + (0+x1|subject) + (0+x2|subject), family=binomial)

For an explanation of the syntax here, see: R's lmer cheat-sheet.
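
As a rough illustration of how the per-level intercepts and slopes asked for in the question can be read off a fit, here is a minimal sketch on simulated data. Everything about the data is an assumption made for the example: the data frame `d`, the level labels `"level1"` and `"level2"`, the sample sizes, and the effect sizes. It fits the random-intercept model from the question and relies on R's default treatment contrasts, so that level1 is the reference level.

```r
library(lme4)

set.seed(1)

# Hypothetical fully within-subject design: every subject sees every
# combination of x1 and x2, a few times each.
n_subj <- 30
d <- expand.grid(subject = factor(seq_len(n_subj)),
                 x1      = seq(-2, 2, length.out = 7),
                 x2      = factor(c("level1", "level2")),
                 rep     = 1:4)

# Simulate Bernoulli responses: level2 gets a higher intercept and a
# shallower slope, plus a subject-specific intercept shift.
s    <- as.integer(d$subject)
lvl2 <- as.numeric(d$x2 == "level2")
u0   <- rnorm(n_subj, sd = 0.8)
eta  <- (0.5 * lvl2 + u0[s]) + (1.5 - 0.7 * lvl2) * d$x1
d$y  <- rbinom(nrow(d), size = 1, prob = plogis(eta))

# The model from the question: random intercept per subject.
m <- glmer(y ~ x1 * x2 + (1 | subject), data = d, family = binomial)

# With treatment contrasts, level1 is the baseline and the x2 terms are
# differences from it, so level2 values are sums of coefficients.
b <- fixef(m)
per_level <- data.frame(
  level     = c("level1", "level2"),
  intercept = c(b[["(Intercept)"]], b[["(Intercept)"]] + b[["x2level2"]]),
  slope     = c(b[["x1"]],          b[["x1"]] + b[["x1:x2level2"]])
)
print(per_level)
```

The intercepts and slopes here are on the logit scale. The coefficient names `x2level2` and `x1:x2level2` follow from the level labels assumed above and will differ with other labels or contrast codings.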


Full version: You don't need to "tell" R that $x_1$ and $x_2$ are repeated measures variables. This is a small semantic point, but I wouldn't say that variables can be "repeated measures variables" vs. "non-repeated measures variables"; variables are just variables. I would instead say, e.g., 'variable 1 is measured within patients, and variable 2 is measured between patients'. Your phrasing is fine; you just don't want it to lead you to think of repeated measures-ness as some ontological status intrinsic to the variable.

At any rate, instead of telling R that a variable is measured within people, you simply need to formulate a model using random and/or fixed effects to account for the non-independence of the data that come from the same person. (Yes, you can use a fixed effect to account for this: every person would be a level of a categorical variable that is included. However, this will answer a slightly different question, almost certainly not the one you are interested in, and unless you have many measurements on the same person in every combination of conditions, the model will not be tractable.) In practice, you will use random effects to account for this. Specifically, you will have a random effect for each subject.
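
In symbols, a minimal sketch of the random-intercept version of such a model may help. The notation is introduced here for illustration and is not in the question: $i$ indexes trials, $j$ indexes subjects, $x_2$ is coded 0 for level1 and 1 for level2, and $\beta_0,\dots,\beta_3$, $u_j$, and $\sigma_u$ are assumed names.

$$
\operatorname{logit} P(y_{ij}=1) = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \beta_3 x_{1ij} x_{2ij} + u_j, \qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
$$

Under that coding, level1 has intercept $\beta_0$ and slope $\beta_1$, while level2 has intercept $\beta_0 + \beta_2$ and slope $\beta_1 + \beta_3$. The term $u_j$ is shared by all responses from subject $j$, which is how the model represents their non-independence.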

Next you need to specify what you want random effects for. The syntax you used, (1|subject), will cause R to include a random intercept for each person. This will shift someone's line of best fit up or down relative to the mean. You should think about whether people are also likely to differ in their slopes, i.e., how strongly they respond to changes in your variables. You should also think about whether the random effects are correlated with each other, e.g., maybe people who start off higher when $x_1=0$ tend to also respond more strongly to increases in $x_1$. Common advice is to include all possible random effects and intercorrelations (Barr et al., 2013, "Keep it maximal"). However, bear in mind that GLMMs are more difficult computationally than LMMs, so such a model may not be tractable.
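
To make the random-slope idea concrete, here is a minimal sketch on simulated data. The data frame `d`, the sample sizes, and the effect sizes are all assumptions made for the example. It adds a correlated random slope for $x_1$ to the random-intercept model, then inspects the estimated variances and correlation and compares the two fits. It deliberately stops short of the full maximal structure to keep the example quick to run.

```r
library(lme4)

set.seed(42)

# Hypothetical within-subject design with subject-specific intercepts
# and subject-specific x1 slopes.
n_subj <- 40
d <- expand.grid(subject = factor(seq_len(n_subj)),
                 x1      = seq(-2, 2, length.out = 9),
                 x2      = factor(c("level1", "level2")),
                 rep     = 1:3)
s    <- as.integer(d$subject)
lvl2 <- as.numeric(d$x2 == "level2")
u0   <- rnorm(n_subj, sd = 1.0)   # random intercept deviations
u1   <- rnorm(n_subj, sd = 0.5)   # random x1-slope deviations
eta  <- (0.5 * lvl2 + u0[s]) + (1.5 - 0.7 * lvl2 + u1[s]) * d$x1
d$y  <- rbinom(nrow(d), size = 1, prob = plogis(eta))

# Random intercept only.
m0 <- glmer(y ~ x1 * x2 + (1 | subject), data = d, family = binomial)

# Random intercept plus a correlated random slope for x1.
m1 <- glmer(y ~ x1 * x2 + (1 + x1 | subject), data = d, family = binomial)

# Estimated standard deviations and intercept-slope correlation.
print(VarCorr(m1))

# TRUE here would suggest the random-effects structure is too rich
# for the data (a variance near 0 or a correlation near +/-1).
print(isSingular(m1))

# Likelihood-ratio comparison of the two random-effects structures.
print(anova(m0, m1))
```

If a richer model produces convergence warnings or a singular fit, that is one practical sign that it may not be tractable for the data at hand, and a simpler random-effects structure may be worth trying.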