GLMM – glmer Fitting Different Models for Binary Variable as Integer 0,1 or Factor

Tags: glmm, lme4-nlme, mixed-model, r

This seems like a relatively basic question, but I can't find good pointers after two days of searching.

I am trying to fit a generalized linear mixed model to data obtained in an experiment. The data are repeated measures on subjects over two conditions (one main effect), and I am interested in measuring that main effect.

The data are the following:

subject <- factor(rep(c(1,2,3,4,5,6,7), each=2))
n <- rep(85, 14)
Ncorrect <- c(67, 70, 65, 80, 70, 76, 67, 62, 65, 71, 63, 75, 58, 71)
Out <- rep(c(0,1), 7) #condition encoded as integer
OutF <- factor(Out) #condition encoded as factor
expData <- data.frame(subject, n, Ncorrect, Out, OutF)

I want to fit a mixed model to take into consideration the variation of the main effect (slope) between subjects. The problem I have is that I get different models fitted depending on whether I encode experimental conditions as factor vs encoding them as 0,1 integers.

fitInt <- glmer(cbind(Ncorrect, n-Ncorrect) ~ Out + (Out+0|subject),
                data=expData, family=binomial) #Out is an integer

fitFac <- glmer(cbind(Ncorrect, n-Ncorrect) ~ OutF + (OutF+0|subject),
                data=expData, family=binomial) #OutF is a factor

First, the similarity: both return almost identical fixed-effect estimates ((Intercept) and Out), which is what I'm interested in.

Then, the difference: fitFac shows df=9 while fitInt shows df=11, and they differ in AIC and BIC. Here fitInt seems more in line with the 3 parameters I want: the (Intercept), the Out main effect, and the standard deviation of the Out slope among subjects.
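For what it's worth, a quick parameter count (my own bookkeeping, not lme4 output) reproduces those degrees of freedom:

```r
# fitInt estimates 3 parameters; fitFac estimates 5 (one extra variance
# and a correlation). With 14 observations, residual df come out as reported.
p_int <- 3   # (Intercept), Out slope, 1 random-slope variance
p_fac <- 5   # (Intercept), OutF1 slope, 2 random variances, 1 correlation
n_obs <- 14  # 7 subjects x 2 conditions
n_obs - p_int  # 11, matches fitInt
n_obs - p_fac  #  9, matches fitFac
```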

The outputs for the random factors for each fit are the following:

summary(fitInt)
Random effects:
 Groups  Name Variance Std.Dev.
 subject Out  0.1466   0.3829  

summary(fitFac)
Random effects:
 Groups  Name  Variance  Std.Dev.  Corr 
 subject OutF0 9.417e-08 0.0003069      
         OutF1 1.466e-01 0.3828849 -0.96

It seems to me that fitFac is fitting a separate random effect for each level of OutF, plus a correlation between those two. That would explain the random-effects summary and the difference in degrees of freedom.
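That hypothesis can be checked without fitting anything: the term on the left of the | in a random-effects formula goes through the same design-matrix expansion as a fixed-effects formula, so a no-intercept factor yields one indicator column per level. A base-R sketch:

```r
# With a no-intercept factor, model.matrix() emits one indicator column per
# level, which is why the factor fit estimates two variances plus a
# correlation; the numeric version yields a single column.
Out  <- rep(c(0, 1), 7)
OutF <- factor(Out)
colnames(model.matrix(~ OutF + 0))  # "OutF0" "OutF1" -> two random effects
colnames(model.matrix(~ Out + 0))   # "Out"           -> one random effect
```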

As asked for in a comment, the Z matrices for both fits are the following:

getME(fitFac,"Z")
14 x 14 sparse Matrix of class "dgCMatrix"
   [[ suppressing 14 column names ‘1’, ‘1’, ‘2’ ... ]]

1  1 . . . . . . . . . . . . .
2  . 1 . . . . . . . . . . . .
3  . . 1 . . . . . . . . . . .
4  . . . 1 . . . . . . . . . .
5  . . . . 1 . . . . . . . . .
6  . . . . . 1 . . . . . . . .
7  . . . . . . 1 . . . . . . .
8  . . . . . . . 1 . . . . . .
9  . . . . . . . . 1 . . . . .
10 . . . . . . . . . 1 . . . .
11 . . . . . . . . . . 1 . . .
12 . . . . . . . . . . . 1 . .
13 . . . . . . . . . . . . 1 .
14 . . . . . . . . . . . . . 1


getME(fitInt,"Z")
14 x 7 sparse Matrix of class "dgCMatrix"
   1 2 3 4 5 6 7
1  . . . . . . .
2  1 . . . . . .
3  . . . . . . .
4  . 1 . . . . .
5  . . . . . . .
6  . . 1 . . . .
7  . . . . . . .
8  . . . 1 . . .
9  . . . . . . .
10 . . . . 1 . .
11 . . . . . . .
12 . . . . . 1 .
13 . . . . . . .
14 . . . . . . 1

Since I want just the random effect of subject on the main effect (Out), the second matrix (from fitInt) seems more in line with that.

The problem seems to be that I want a single random effect for Out, but when it is coded as a factor (even a binary one), an effect is fitted for each level, and here nearly all the variance falls into one of them. Coding Out as an integer gives me the model I want, but responses/comments and some previous experience of mine suggest that numeric coding of conditions can be messier. This looks like a basic issue; maybe I'm missing something obvious that I have somehow managed to do without so far.
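One workaround I have seen suggested (my own sketch, not from the original analysis): keep OutF as a factor in the fixed part, but build a numeric 0/1 dummy for the random term, so only one random-slope variance is estimated. Under that assumption:

```r
# Rebuild the data, then add a numeric dummy for the random term.
subject  <- factor(rep(1:7, each = 2))
n        <- rep(85, 14)
Ncorrect <- c(67, 70, 65, 80, 70, 76, 67, 62, 65, 71, 63, 75, 58, 71)
OutF     <- factor(rep(c(0, 1), 7))
OutNum   <- as.numeric(OutF == "1")   # 0/1 dummy matching the factor levels
expData  <- data.frame(subject, n, Ncorrect, OutF, OutNum)

# library(lme4)  # assumed installed; fit left commented out as a sketch
# glmer(cbind(Ncorrect, n - Ncorrect) ~ OutF + (0 + OutNum | subject),
#       data = expData, family = binomial)
```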

The main effects I'm interested in don't change between the fits, but I have to apply this analysis to other data too, and I want to understand what's happening before publishing the results.

Thanks for any help.

Best Answer

What you observe is due to how the formula interface in R works when you give it a factor and exclude the intercept. That is, the formulas ~ Out + 0 and ~ OutF + 0 are not equivalent. To see this, compare the output of the model.matrix() function that is used internally to construct the corresponding design matrices, i.e., compare

model.matrix(~ OutF + 0, data = expData)

with

model.matrix(~ Out + 0, data = expData)

To make the two equivalent, you need to replace the second by

model.matrix(~ I(1 - Out) + Out + 0, data = expData)
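A quick stand-alone check (base R only, using a fresh copy of the condition variable) that the factor coding and the I(1 - Out) + Out coding produce identical design matrices, differing only in column names:

```r
# Rebuild the condition variable and compare the two parameterisations.
DF      <- data.frame(Out = rep(c(0, 1), 7))
DF$OutF <- factor(DF$Out)
m_fac <- model.matrix(~ OutF + 0, data = DF)           # columns OutF0, OutF1
m_num <- model.matrix(~ I(1 - Out) + Out + 0, data = DF)
all(m_fac == m_num)  # TRUE: same pair of indicator columns
```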

Hence, compare

glmer(cbind(Ncorrect, n - Ncorrect) ~ Out + (I(1 - Out) + Out + 0 | subject),
      data = expData, family = binomial())

with

glmer(cbind(Ncorrect, n - Ncorrect) ~ OutF + (OutF + 0 | subject),
      data = expData, family = binomial())