GLMM – glmer Fitting Different Models for Binary Variable as Integer 0,1 or Factor

Tags: glmm, lme4-nlme, mixed-model, r

This seems like a relatively basic question, but I can't find good pointers after two days of searching.

I am trying to fit a generalized linear mixed model to data obtained in an experiment. The data are repeated measures on subjects over two conditions (one main effect), and I am interested in measuring that main effect.

The data are the following:

subject <- factor(rep(c(1,2,3,4,5,6,7), each=2))
n <- rep(85, 14)
Ncorrect <- c(67, 70, 65, 80, 70, 76, 67, 62, 65, 71, 63, 75, 58, 71)
Out <- rep(c(0,1), 7) #condition encoded as integer
OutF <- factor(Out) #condition encoded as factor
expData <- data.frame(subject, n, Ncorrect, Out, OutF)

I want to fit a mixed model to take into consideration the variation of the main effect (slope) between subjects. The problem I have is that I get different models fitted depending on whether I encode experimental conditions as factor vs encoding them as 0,1 integers.

fitInt <- glmer(cbind(Ncorrect, n-Ncorrect) ~ Out + (Out+0|subject),
                data=expData, family=binomial) #Out is an integer

fitFac <- glmer(cbind(Ncorrect, n-Ncorrect) ~ OutF + (OutF+0|subject),
                data=expData, family=binomial) #OutF is a factor

First, the similarity: both return almost identical fixed-effect estimates ((Intercept) and Out), which is what I'm interested in.

Then, the difference: fitFac shows df=9 while fitInt shows df=11, and they differ in AIC and BIC. Here fitInt seems more in line with the 3 parameters I want: the (Intercept), the Out main effect, and the standard deviation of the Out slope among subjects.
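For what it's worth, a quick parameter count (my own bookkeeping, not lme4 output) reproduces those degrees of freedom:

```r
# fitInt estimates 3 parameters; fitFac estimates 5 (one extra variance
# and a correlation). With 14 observations, residual df come out as reported.
p_int <- 3   # (Intercept), Out slope, 1 random-slope variance
p_fac <- 5   # (Intercept), OutF1 slope, 2 random variances, 1 correlation
n_obs <- 14  # 7 subjects x 2 conditions
n_obs - p_int  # 11, matches fitInt
n_obs - p_fac  #  9, matches fitFac
```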

The outputs for the random factors for each fit are the following:

summary(fitInt)
Random effects:
 Groups  Name Variance Std.Dev.
 subject Out  0.1466   0.3829  

summary(fitFac)
Random effects:
 Groups  Name  Variance  Std.Dev.  Corr 
 subject OutF0 9.417e-08 0.0003069      
         OutF1 1.466e-01 0.3828849 -0.96

It seems to me that fitFac is fitting a separate random effect for each level of OutF, plus a correlation between those two. That would explain the random-effects summary and the difference in degrees of freedom.
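That hypothesis can be checked without fitting anything: the term on the left of the | in a random-effects formula goes through the same design-matrix expansion as a fixed-effects formula, so a no-intercept factor yields one indicator column per level. A base-R sketch:

```r
# With a no-intercept factor, model.matrix() emits one indicator column per
# level, which is why the factor fit estimates two variances plus a
# correlation; the numeric version yields a single column.
Out  <- rep(c(0, 1), 7)
OutF <- factor(Out)
colnames(model.matrix(~ OutF + 0))  # "OutF0" "OutF1" -> two random effects
colnames(model.matrix(~ Out + 0))   # "Out"           -> one random effect
```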

As asked for in a comment, the Z matrices for both fits are the following:

getME(fitFac,"Z")
14 x 14 sparse Matrix of class "dgCMatrix"
   [[ suppressing 14 column names ‘1’, ‘1’, ‘2’ ... ]]

1  1 . . . . . . . . . . . . .
2  . 1 . . . . . . . . . . . .
3  . . 1 . . . . . . . . . . .
4  . . . 1 . . . . . . . . . .
5  . . . . 1 . . . . . . . . .
6  . . . . . 1 . . . . . . . .
7  . . . . . . 1 . . . . . . .
8  . . . . . . . 1 . . . . . .
9  . . . . . . . . 1 . . . . .
10 . . . . . . . . . 1 . . . .
11 . . . . . . . . . . 1 . . .
12 . . . . . . . . . . . 1 . .
13 . . . . . . . . . . . . 1 .
14 . . . . . . . . . . . . . 1


getME(fitInt,"Z")
14 x 7 sparse Matrix of class "dgCMatrix"
   1 2 3 4 5 6 7
1  . . . . . . .
2  1 . . . . . .
3  . . . . . . .
4  . 1 . . . . .
5  . . . . . . .
6  . . 1 . . . .
7  . . . . . . .
8  . . . 1 . . .
9  . . . . . . .
10 . . . . 1 . .
11 . . . . . . .
12 . . . . . 1 .
13 . . . . . . .
14 . . . . . . 1

Since I want just the random effect of subject on the main effect (Out), the second matrix (from fitInt) seems more in line with that.

The problem seems to be that I want a single random effect for Out, but when it is coded as a factor (even a binary one), an effect is fitted for each level, and here nearly all the variance falls into one of them. Coding Out as an integer gives me the model I want, but responses/comments and some previous experience of mine suggest that numeric coding of conditions can be messier. This looks like a basic issue; maybe I'm missing something obvious that I have somehow managed to do without so far.
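One workaround I have seen suggested (my own sketch, not from the original analysis): keep OutF as a factor in the fixed part, but build a numeric 0/1 dummy for the random term, so only one random-slope variance is estimated. Under that assumption:

```r
# Rebuild the data, then add a numeric dummy for the random term.
subject  <- factor(rep(1:7, each = 2))
n        <- rep(85, 14)
Ncorrect <- c(67, 70, 65, 80, 70, 76, 67, 62, 65, 71, 63, 75, 58, 71)
OutF     <- factor(rep(c(0, 1), 7))
OutNum   <- as.numeric(OutF == "1")   # 0/1 dummy matching the factor levels
expData  <- data.frame(subject, n, Ncorrect, OutF, OutNum)

# library(lme4)  # assumed installed; fit left commented out as a sketch
# glmer(cbind(Ncorrect, n - Ncorrect) ~ OutF + (0 + OutNum | subject),
#       data = expData, family = binomial)
```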

The main effects I'm interested in don't change between the fits, but I have to apply this analysis to other data too, and I want to understand what's happening before publishing the results.

Thanks for any help.

Best Answer

What you observe is due to how the formula interface in R works when you give it a factor and exclude the intercept. That is, the formulas ~ Out + 0 and ~ OutF + 0 are not equivalent. To see this, compare the output of the model.matrix() function that is used internally to construct the corresponding design matrices, i.e., compare

model.matrix(~ OutF + 0, data = expData)

with

model.matrix(~ Out + 0, data = expData)

To make the two equivalent, you need to replace the second by

model.matrix(~ I(1 - Out) + Out + 0, data = expData)
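A quick stand-alone check (base R only, using a fresh copy of the condition variable) that the factor coding and the I(1 - Out) + Out coding produce identical design matrices, differing only in column names:

```r
# Rebuild the condition variable and compare the two parameterisations.
DF      <- data.frame(Out = rep(c(0, 1), 7))
DF$OutF <- factor(DF$Out)
m_fac <- model.matrix(~ OutF + 0, data = DF)           # columns OutF0, OutF1
m_num <- model.matrix(~ I(1 - Out) + Out + 0, data = DF)
all(m_fac == m_num)  # TRUE: same pair of indicator columns
```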

Hence, compare

glmer(cbind(Ncorrect, n - Ncorrect) ~ Out + (I(1 - Out) + Out + 0 | subject),
      data = expData, family = binomial())

with

glmer(cbind(Ncorrect, n - Ncorrect) ~ OutF + (OutF + 0 | subject),
      data = expData, family = binomial())