Solved – Nested random effects model in lme4

lme4-nlme, mixed model, random-effects-model

I am analyzing some data in R using the lmer function provided in the lme4 package. The experiment involves assigning a number of students to different trials of exam questions, and everyone is assigned to the same three blocks of questions. The responses of interest are Correct and RT, and I am considering two separate analyses, one with Correct as the dependent variable and one with RT. The data look like:

    SubjectID Block.No TrialNo IsTrial Correct   RT
136    332216        1       1       0       1 8306
137    332216        1       2       0       1 2076
138    332216        1       3       0       0 1051
139    332216        1       4       0       1 2864
140    332216        1       5       0       1 3516
141    332216        1       6       0       1 2494
142    332216        1       7       0       1 2260
143    332216        1       8       0       1 1852
144    332216        1       1  FASTER       0 1514
145    332216        1       2  FASTER       1  850
146    332216        1       3  FASTER       1  919
147    332216        1       4  FASTER       1  855
148    332216        1       1       1       0 1514
149    332216        1       2       1       1 1480
150    332216        1       3       1       1  863
151    332216        1       4       1       1 1270
152    332216        1       5       1       1  701
153    332216        1       6       1       1  835
154    332216        1       7       1       1 1317
155    332216        1       8       1       1  626

where the variable IsTrial indicates whether the trial is an actual trial (some are practice), and observations with IsTrial labeled other than 1 will be excluded from analysis. The variable TrialNo is nested within Block.No which is nested within SubjectID. Questions in different blocks are different in terms of difficulty, so the same TrialNo in different blocks refers to different questions.

I am considering a linear mixed effects model, with SubjectID, Block.No, and TrialNo as random intercepts and some other variables as fixed effects, i.e.,

fit <- lmer(RT ~ 1 + some fixed effects + (1|SubjectID/Block.No/TrialNo)).

Now I am wondering what happens if I create a new variable that uniquely defines the grouping structure, taking into account all SubjectID, Block.No, and TrialNo. For example, in the first row, this new variable has value: 332216_1_1, in the second row, 332216_1_2, etc. In my new model, which looks like:

fit1 <- lmer(RT ~ 1 + some fixed effects + (1|new variable)),

I use only the new variable as a random effect, instead of the nested terms. Is this plausible, and what difference does the new model make?

Best Answer

As you've described the study, trial is nested within block, but block isn't nested within subject. That is, trial 3 is a different question in blocks 1 and 2, but block 3 is the same set of 8 questions for each subject. Hence, a natural way to structure the random effects would be to have one random intercept effect per subject plus 8N random intercepts nested into N batches of 8, where N is the number of blocks. Or, if N is small, you could treat block as a fixed effect and have a single batch of 8N per-trial random intercepts (plus the aforementioned per-subject intercepts).
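As a rough sketch of the two structures just described, the following fits both to simulated data. Everything about the data is an assumption made for illustration: the 30 subjects, the effect sizes, and the helper column `Question` (a block-by-trial identifier) are not from the original study.

```r
library(lme4)

set.seed(1)
# Hypothetical simulated data: 30 subjects, 3 blocks, 8 trials per block,
# every subject sees every question once.
d <- expand.grid(SubjectID = factor(1:30),
                 Block.No  = factor(1:3),
                 TrialNo   = factor(1:8))

# A question is identified by block and trial together (assumed helper column).
d$Question <- interaction(d$Block.No, d$TrialNo, drop = TRUE)

# Invented effect sizes, used only to generate a response.
subj_eff  <- rnorm(nlevels(d$SubjectID), sd = 300)
quest_eff <- rnorm(nlevels(d$Question),  sd = 200)
d$RT <- 1500 +
  subj_eff[as.integer(d$SubjectID)] +
  quest_eff[as.integer(d$Question)] +
  rnorm(nrow(d), sd = 400)

# Subjects crossed with questions; questions nested within blocks.
# (1 | Block.No / TrialNo) expands to (1 | Block.No) + (1 | Block.No:TrialNo).
fit_nested <- lmer(RT ~ 1 + (1 | SubjectID) + (1 | Block.No / TrialNo),
                   data = d)

# Alternative for a small number of blocks: block as a fixed effect,
# with a single batch of per-question random intercepts.
fit_fixed_block <- lmer(RT ~ 1 + Block.No + (1 | SubjectID) + (1 | Question),
                        data = d)

VarCorr(fit_fixed_block)
```

With only three blocks, the block-level variance in `fit_nested` is estimated from three levels and may well come out at or near zero (lme4 may report a singular fit), which is one reason to prefer the fixed-effect version when N is small.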

You asked what the difference is between fancy random-effects structures like these and Cartesian-producting all the dummy variables in a study together to get one big batch of random effects (new variable). The difference is that each batch of random effects has its variance estimated separately, and that orthogonal effects are obliged to behave consistently. (And, of course, the more random effects you have, the harder it is to estimate each.) To use a simpler example, imagine you have a model where each subject is a child and you have dummy variables for the child's father and mother. Assume the dataset has a lot of half-siblings in it, so that mother and father effects are distinguishable. If you say

lmer(outcome ~ 1 + fixed effects + (1|Mother) + (1|Father))

then the model is allowed to believe, e.g., that the effects of fathers vary more than the effects of mothers. On the other hand, if you make each mother–father pair its own value of a single dummy variable, and say

lmer(outcome ~ 1 + fixed effects + (1|new variable))

then new variable gets only one variance. Also, whereas this model allows for arbitrarily complicated interactions between mother and father, the first model postulates that the effects are purely additive. And if $M$ is the number of mothers and $F$ the number of fathers, the first model has $M + F$ different random effects and the second has up to $MF$ (one for each mother–father pair that actually occurs in the data).
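Written out, the contrast between the two models might look as follows. The notation is introduced here for illustration only: $y_i$ is the outcome for child $i$, $x_i^\top \beta$ stands for the fixed effects, and $j(i)$ and $k(i)$ index that child's mother and father. The first model has two separately estimated variances and additive effects:

$$y_i = x_i^\top \beta + m_{j(i)} + f_{k(i)} + \varepsilon_i, \qquad m_j \sim N(0, \sigma_M^2), \quad f_k \sim N(0, \sigma_F^2), \quad \varepsilon_i \sim N(0, \sigma^2)$$

The second has one effect per mother–father pair, all drawn from a single distribution:

$$y_i = x_i^\top \beta + u_{j(i)k(i)} + \varepsilon_i, \qquad u_{jk} \sim N(0, \sigma_U^2), \quad \varepsilon_i \sim N(0, \sigma^2)$$

In the second form, two children who share a mother but not a father get unrelated effects $u_{jk}$ and $u_{jk'}$, so the model has no way to express that they have anything in common.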

Finally, I don't think it's wise to consider RT and Correct in completely separate models. Shouldn't whether people answer a question correctly be related to how quickly they answer it?