Solved – Include nesting factor as fixed effect in a GLMM


I have the following GLMM:

success ~ age + gender + group/task + (1 + group/task|school/subject), family = binomial

I want to know whether participants' probability of succeeding in certain problem-solving tasks can be predicted by the type of task.
I have 6 tasks which can be categorized into 3 groups (A, B, C) with 2 tasks in each group. Each participant received 3 tasks (one from A, one from B, one from C; combination and order counterbalanced).
A Cochran's Q test and post-hoc McNemar tests showed that the three groups differ in their success rates: A is easier than B and C, and B and C are equally difficult.
I used crosstabs to test whether the tasks within each group differ in their success rates and found that they do within A and within B.

Now I would like to do a comparison of all individual tasks (not just those within one category).

My question is: Is the equation above correct in terms of the fixed effects or is there any reason to include group as an extra fixed effect (e.g. to see whether task has an effect on top of the group effects found in the McNemar tests)? Would that be unnecessary?

What does it mean to include both group/task and group?

Best Answer

First, let's simulate some data with a structure similar to yours. My understanding from your description is that it looks something like this:

Say we have 2 schools with 10 subjects in each, for a total of 20 subjects. Each subject is given 3 tasks (one from each group), for a total of 60 observations. As we have 6 different tasks, each task is used 10 times (6*10=60), so each group is used 20 times (3*20=60).

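set.seed(1)                       # for reproducibility

n.schools <- 2
n.subject <- 20                   # 10 subjects per school
n         <- n.subject*3          # 3 tasks per subject -> 60 observations
subject   <- rep(1:n.subject, 3)
school    <- rep(1:n.schools, (n/n.schools))   # subject i is always in the same school

# subject-level covariates, repeated for each of the subject's 3 tasks
age    <- rnorm(n.subject, 30, 5)
age    <- rep(age,3)
gender <- sample(c(0,1),n.subject, replace=TRUE)
gender <- rep(gender,3)

# group indicators: first third of the rows is group A, then B, then C
A     <- c(rep(1,n/3),rep(0,2*n/3))
B     <- c(rep(0,n/3),rep(1,n/3), rep(0, n/3))
C     <- c(rep(0,2*n/3),rep(1,n/3))
group <- c(rep("A",n/3), rep("B",n/3),rep("C",n/3))
# two tasks per group: a, b in A; c, d in B; e, f in C
task  <- c(rep("a",n/6),rep("b",n/6),rep("c",n/6),rep("d",n/6),rep("e",n/6),rep("f",n/6))

# random intercepts, indexed explicitly so each observation gets its own subject/school value
u.subject <- rnorm(n.subject, 0, 1)[subject]
u.school  <- rnorm(n.schools, 0, 1)[school]

latent  <- -20 + 0.5*age + 2*gender + 2*A -3*B -3.2*C + u.subject + u.school + rnorm(n,0,1)
pr      <- plogis(latent)         # same as 1/(1+exp(-latent))
success <- rbinom(n, 1, pr)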
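Because the simulation encodes the design through row order and repeated vectors, it is worth checking that the structure really matches the description. A base-R sketch that rebuilds only the design vectors (same definitions as above, written with `rep(..., each =)`) and cross-tabulates them:

```r
# Design part of the simulation only (no outcome needed)
n.schools <- 2
n.subject <- 20
n         <- n.subject * 3
subject   <- rep(1:n.subject, 3)
school    <- rep(1:n.schools, n / n.schools)
group     <- rep(c("A", "B", "C"), each = n / 3)
task      <- rep(c("a", "b", "c", "d", "e", "f"), each = n / 6)

table(subject)                        # 3 observations per subject
table(subject, group)                 # exactly one task from each group
table(group, task)                    # each task used 10 times, inside one group only
rowSums(table(subject, school) > 0)   # each subject sits in exactly one school
```

If any of these tables showed a subject in two schools or a task in two groups, the `school/subject` and `group/task` nesting would not hold.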

Now let's fit a GLMM like the one you propose:

> library(lme4)
> glmer(success ~ age + gender + group/task + (1 + group/task | school/subject), family=binomial)
Error: number of observations (=60) < number of random effects (=360) 
for term (1 + group/task | subject:school); 
the random-effects parameters are probably unidentifiable

So what does this mean? In general, your mixed-effects model needs to be identifiable, and one of the conditions for this is $\sum_{i=1}^{N} (n_i - k) > 0$, where $N$ is the number of clusters (subjects, in your case), $n_i$ is the number of observations in cluster $i$, and $k$ is the number of random effects per cluster. For balanced data ($n_i = n$ for all $i$) this is equivalent to $N n > N k$, i.e. $n > k$. In other words, each cluster must have more observations than random-effect parameters. Here each subject has only $n = 3$ observations, while putting group/task in the random part gives $k = 18$ random effects per subject (hence $20 \times 18 = 360 > 60$ in the error message), so the condition cannot hold.
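The counting behind this condition can be reproduced in base R: $k$ is the number of columns of the per-subject random-effects design implied by `(1 + group/task | ...)`. This is a sketch that only mirrors the logic of lme4's check (controlled by the `check.nobs.vs.nRE` option of `glmerControl()`), using the simulated design rebuilt in a few lines; several of the 18 columns are all zero here because each task belongs to a single group, but they still count.

```r
# Design vectors from the simulation (rebuilt so this sketch runs on its own)
subject <- rep(1:20, 3)
group   <- factor(rep(c("A", "B", "C"), each = 20))
task    <- factor(rep(c("a", "b", "c", "d", "e", "f"), each = 10))

n_i <- as.vector(table(subject))   # observations per subject: 3 each
N   <- length(n_i)                 # number of subjects (clusters): 20

# k = random coefficients per subject implied by (1 + group/task | ...):
# intercept + 2 group contrasts + 3 x 5 group:task columns
k <- ncol(model.matrix(~ 1 + group/task))
k

sum(n_i - k) > 0                   # FALSE: 60 observations vs N * k random effects
N * k

# With a random intercept only (k = 1) the condition is easily met
sum(n_i - 1) > 0                   # TRUE: 60 - 20 = 40 > 0
```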

What do group/task and school/subject mean in the formula?

group/task is simply another way of writing group + group:task, so you already have group plus its interaction with task (task nested within group). The last question therefore does not really arise: group + group/task specifies exactly the same model as group/task, because group is already included. The same principle applies when the term is used as a grouping factor on the right of the bar: (1 | school/subject) translates to (1 | school) + (1 | school:subject), i.e. subject nested within school.
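You can check this expansion with base R's `terms()`, which lists the terms a formula actually generates; the helper `labs()` below is just a local convenience for this sketch.

```r
# Term labels generated by R's formula parser
labs <- function(f) attr(terms(f), "term.labels")

labs(success ~ group/task)           # "group" "group:task"
labs(success ~ group + group:task)   # same two terms
labs(success ~ group + group/task)   # same again: the extra 'group' is dropped as a duplicate
labs(~ school/subject)               # "school" "school:subject"
```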

So I suggest you read Chapter 2 of Bates's lme4 book, paying particular attention to model specification. It is important to decide what actually makes sense to include as fixed or random effects and what the grouping factors should be. This depends on how you designed the study and what you want to learn from your models.

Related Solutions

Solved – Random effect nested under fixed effect model in R

It doesn't make sense to both include tank as a random effect and nest tank within the pop/temp fixed effect. You only need one of these, depending on how tank is coded.

If tank is coded 1-8, you only need the tank random effect. Nesting it within the pop/temp fixed effect results in the same 8 units, so is not necessary.

If tank is coded 1-2 (that is, which rep it was), you only need to nest tank within the pop/temp fixed effect, because that gives you your 8 unique tanks. Including the tank random effect is only desired if the tanks were first divided into two groups and then randomized to treatment; if the eight tanks were completely randomized to treatment, this is not necessary.
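A small base-R sketch of the two codings, using the pop/temp/rep layout of the sample data below with one row per tank (fish omitted): a 1-8 label and the replicate number nested within pop/temp identify the same 8 tanks.

```r
d <- expand.grid(pop  = factor(c("A", "B")),
                 temp = factor(c("warm", "cold")),
                 rep  = 1:2)

# Tank coded 1-8: every tank has its own label
tank8 <- factor(paste(d$pop, d$temp, d$rep, sep = "."))
nlevels(tank8)                          # 8

# Tank coded 1-2 (the replicate) is ambiguous on its own...
nlevels(factor(d$rep))                  # 2
# ...but nested within pop/temp it recovers the same 8 units
nested <- interaction(d$pop, d$temp, d$rep, drop = TRUE)
nlevels(nested)                         # 8
rowSums(table(tank8, nested) > 0)       # each tank8 level matches exactly one nested level
```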

You could do this with likelihood-based approaches such as those in nlme and lme4, but if everything is balanced it might be simpler to use the traditional ANOVA approach with aov.

Creating some sample data:

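set.seed(5)
# 2 populations x 2 temperatures x 2 replicate tanks, 100 fish per tank
d <- within(expand.grid(pop=factor(c("A","B")),
                        temp=factor(c("warm", "cold")),
                        rep=1:2,
                        fish=1:100), {
                          # unique label for each of the 8 tanks
                          tank <- factor(paste(pop, temp, rep, sep="."))
                          # one random effect per tank, shared by all of its fish
                          tanke <- round(rnorm(nlevels(tank))[unclass(tank)],1)
                          # fish-level residual error
                          e <- round(rnorm(length(pop)),1)
                          # fixed-effect mean for each pop x temp cell
                          m <- 10 + 2*as.numeric(pop)*as.numeric(temp)
                          growth <- m + tanke + e
                        })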

Using aov like this:

a0 <- aov(growth ~ pop*temp + Error(tank), data=d)
summary(a0)

or lme like this:

library(nlme)
m1 <- lme(growth ~ pop*temp, random=~1|tank, data=d)
anova(m1)
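With balanced data like this, the two approaches should give the same F tests for pop, temp and pop:temp (up to numerical tolerance, as long as the estimated tank variance is positive). A sketch comparing them, rebuilding the same sample data so it runs on its own; note that the row names of a `summary.aov` table are padded with spaces, hence `trimws()`.

```r
library(nlme)

set.seed(5)
d <- within(expand.grid(pop=factor(c("A","B")),
                        temp=factor(c("warm", "cold")),
                        rep=1:2,
                        fish=1:100), {
                          tank <- factor(paste(pop, temp, rep, sep="."))
                          tanke <- round(rnorm(nlevels(tank))[unclass(tank)],1)
                          e <- round(rnorm(length(pop)),1)
                          m <- 10 + 2*as.numeric(pop)*as.numeric(temp)
                          growth <- m + tanke + e
                        })

a0 <- aov(growth ~ pop*temp + Error(tank), data=d)
m1 <- lme(growth ~ pop*temp, random=~1|tank, data=d)

# F values from the tank stratum of the aov fit (pop and temp vary between tanks)
tab <- summary(a0)[["Error: tank"]][[1]]
rownames(tab) <- trimws(rownames(tab))
f_aov <- tab[c("pop", "temp", "pop:temp"), "F value"]

# F values from the REML lme fit
f_lme <- anova(m1)[c("pop", "temp", "pop:temp"), "F-value"]

cbind(aov = f_aov, lme = f_lme)
```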

Solved – Should I consider time as a fixed or random effect in GLMM

I think it may be a little more complex than simply choosing between a "fixed" and a "random" effect. What you seem to be suggesting is that there is a known decline in bird abundance over the years. What is perhaps not known is whether that decline can be explained by the values of existing variables in your regression. Ideally, you would include all the variables that could be influencing abundance, rather than year itself, but it seems you may have some unmeasured variables that are time-dependent.

If you include a coefficient for every possible year (treating it as a factor) this will lead to a fairly saturated model, and you may get biased coefficient estimates for other variables.

If you instead treat year as a random effect (i.e., the effect for each year is drawn from a common Normal distribution), you are ignoring the requirement for random effects to be exchangeable: with a systematic decline, later years are expected to have lower effects than earlier ones, so the years are not exchangeable. That does not appear to be legitimate either.

If you instead include year as a linear predictor (i.e., have a single coefficient for the year, perhaps centred around the study midpoint year) you might run into problems if the actual effect of the unmeasured variables is non-linear. This could be checked by examining the prediction residuals versus the year covariate.

My advice would be to do the following:

1. Plot abundance (log-transformed) against year to see what the overall structure looks like. If it seems to be linear, try adding year as a linear predictor (fixed effect) and examine the relationship between the residuals and year.
2. Run your model without year as a predictor and examine the relationship between the residuals from this model and year; if there is some form of structure, you need to account for it somehow.
3. Consider using fractional polynomials in your regression, as these can be quite flexible without increasing model complexity too dramatically. In that case you will need to rescale year so that it is always positive but not too large.
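On purely hypothetical data (10 sites, 20 years, a made-up habitat covariate and a built-in decline), the three steps above might look like this. A plain Poisson GLM stands in for the GLMM to keep the sketch short; the same residual checks apply to a fitted glmer model. The first-degree fractional polynomial (FP1) search over the usual power set is hand-rolled here; packages such as mfp automate it.

```r
set.seed(1)
# Hypothetical data: counts at 10 sites over 20 years with a steady decline
birds <- expand.grid(site = 1:10, year = 2001:2020)
birds$habitat <- rnorm(nrow(birds))
birds$count   <- rpois(nrow(birds),
                       lambda = exp(3 + 0.3 * birds$habitat - 0.08 * (birds$year - 2010)))

# 1. Overall structure: mean log abundance per year (plot these in practice)
yearly <- tapply(log1p(birds$count), birds$year, mean)

# 2. Model without year: structure in the residuals signals a missing time effect
fit0 <- glm(count ~ habitat, family = poisson, data = birds)
r0   <- residuals(fit0, type = "pearson")
cor(r0, birds$year)          # negative here, because the decline is unmodelled

# Year as a single linear predictor, centred on the study midpoint
birds$year_c <- birds$year - mean(birds$year)
fit1 <- glm(count ~ habitat + year_c, family = poisson, data = birds)
r1   <- residuals(fit1, type = "response")
# r1 is uncorrelated with year by construction, so look for curvature instead
summary(lm(r1 ~ poly(year, 2), data = birds))

# 3. FP1: rescale year to be positive but modest, then pick one power by AIC
birds$year_s <- (birds$year - min(birds$year) + 1) / 10    # 0.1 ... 2.0
powers <- c(-2, -1, -0.5, 0, 0.5, 1, 2, 3)                 # conventional FP power set
fp     <- function(z, p) if (p == 0) log(z) else z^p       # power 0 means log
aic    <- sapply(powers, function(p)
  AIC(glm(count ~ habitat + fp(year_s, p), family = poisson, data = birds)))
names(aic) <- powers
aic
powers[which.min(aic)]       # best single power by AIC
```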

Hope that is of some help...
