Solved – Include nesting factor as fixed effect in a GLMM

fixed-effects-modelglmmnested datar

I have the following GLMM:

success ~ age + gender + group/task + (1 + group/task|school/subject), family = binomial

I want to know whether participants' probability to succeed in certain problem-solving tasks can be predicted by the type of task.
I have 6 tasks which can be categorized into 3 groups (A, B, C) with 2 tasks in each group. Each participant received 3 tasks (one from A, one from B, one from C; combination and order counterbalanced).
Cochran's Q- and post-hoc McNemar tests revealed that the three groups differ in their success rates: A is easier than B and C, and B and C are equally difficult.
I used crosstabs to analyze whether the tasks within each group differ in their success rates and found that they are different for A and B.

Now I would like to do a comparison of all individual tasks (not just those within one category).

My question is: Is the equation above correct in terms of the fixed effects or is there any reason to include group as an extra fixed effect (e.g. to see whether task has an effect on top of the group effects found in the McNemar tests)? Would that be unnecessary?

What does it mean to include both group/task and group?

Best Answer

First, let's try to simulate some data with a similar structure of your problem's data. My understanding from your description is that we have something like this:

Say we have 4 schools with 8 subjects in each school, thus a total of 32 subjects. Each subject is posed with 3 tasks (one from each group), thus a total of 96 observations. As we have 6 different tasks, we will ask 16 times each task (6*16=96), so 32 tasks from each group (32*3=96).

n.schools <- 2
n.subject <- 20
n         <- n.subject*3
subject   <- rep(1:n.subject, 3)
school    <- rep(1:n.schools, (n/n.schools))

age    <- rnorm(n.subject, 30, 5)
age    <- rep(age,3)
gender <- sample(c(0,1),n.subject, replace=TRUE)
gender <- rep(gender,3)

A     <- c(rep(1,n/3),rep(0,2*n/3))
B     <- c(rep(0,n/3),rep(1,n/3), rep(0, n/3))
C     <- c(rep(0,2*n/3),rep(1,n/3))
group <- c(rep("A",n/3), rep("B",n/3),rep("C",n/3))
task  <- c(rep("a",n/6),rep("b",n/6),rep("c",n/6),rep("d",n/6),rep("e",n/6),rep("f",n/6))

u.subject <- rnorm((n/3), 0, 1)
u.school  <- rnorm(n.schools, 0 ,1)

lattent <- -20 + 0.5*age + 2*gender + 2*A -3*B -3.2*C + u.subject + u.school + rnorm(n,0,1)
pr      <- 1/(1+exp(-lattent))
success <- rbinom(n, 1, pr)

Now let's try a GLMM model as the one you propose:

library(lme4)
> glmer(success ~ age + gender + group/task + (1 + group/task | school/subject), family=binomial)
Error: number of observations (=60) < number of random effects (=360) 
for term (1 + group/task | subject:school); 
the random-effects parameters are probably unidentifiable

So what does this mean? In general, you need your mixed effects model to be identifiable, and one of the conditions for this to happen is $\sum_{i=1}^N (n_i-k) >0$, where $k$ is the number of random effects. In the case of balanced data, this can be written equivalently $N n_1 > N k$ or $n_1 > k$. In other words you need to have less random parameters than the number of observations in each cluster/group, subject in your case. This cannot be the case if you add the group/task term in the random part.

What group/task or school/subject means in the formula?

This is simply an equivalent way of writing group + group:task so you actually have group and its interaction with task. So the last question doesn't actually makes sense. The same principle is used when you put this term as a grouping factor in the right part of (1 | ...) which actually translates to (1| school) + (1 | school:subject) hence the one factor nested within the other.

So, I suggest you read Chapter 2 from Bates book and pay extra attention to model specification. It is really important to define what actually makes sense to include as fixed or random effects and what the grouping factors will be. This depends on how you designed the study and what you want to extract from your models.