Solved – Binomial GLMM: Model validation & ceiling effect

Tags: binomial-distribution, glmm, lme4-nlme, mixed-model, validation

My data has a binary response acc (correct/incorrect), one continuous predictor score, three categorical predictors (race, sex, emotion), and a random factor subj. All predictors are within-subject.

By selecting the random effects first and then the fixed effects, I ended up with this model:
M <- glmer(acc ~ race + sex + emotion + sex:emotion + race:emotion + score + (1 + sex | subj), family = binomial, data = subset)
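For reference, the "random effects first" selection step could be sketched as a likelihood-ratio comparison of nested random-effects structures. This is a hypothetical reconstruction (the model names `m0`/`m1` are mine), assuming the same data frame:

```r
library(lme4)

# Hypothetical sketch of the random-effects selection step: compare a
# random intercept only vs. intercept + by-subject sex slope.
m0 <- glmer(acc ~ race + sex + emotion + sex:emotion + race:emotion + score
            + (1 | subj), family = binomial, data = subset)
m1 <- glmer(acc ~ race + sex + emotion + sex:emotion + race:emotion + score
            + (1 + sex | subj), family = binomial, data = subset)
anova(m0, m1)  # likelihood-ratio test on the extra variance/covariance terms
```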

I need help interpreting the validation plots, determining whether they show a "ceiling effect" in acc, and fixing any problems they reveal.


To validate the model, I get the residuals and fitted values:

 # note: these names mask base fitted() and resid(); it works, but renaming is safer
 fitted <- predict(M, type = "response")
 resid <- resid(M, type = "pearson")
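Raw residual plots are hard to read for a binary response, so it may also help to look at binned residuals (Gelman & Hill), which average the residuals within bins of the fitted values. A minimal sketch, assuming the `arm` package is installed and using the `fitted` and `resid` vectors computed above:

```r
# Binned residual plot: roughly 95% of the bin means should fall
# within the plotted confidence bands if the model is adequate.
library(arm)
binnedplot(fitted, resid)
```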

And plot the residuals against the categorical predictors:

 plot(subset$race, resid)
 plot(subset$sex, resid)
 plot(subset$emotion, resid)

All three plots show a slight pattern of more negative and more dispersed residuals in the "easy" conditions, though the effect looks mild to me (I may be wrong).

I then plot the residuals against the continuous predictor:

 plot(subset$score, resid)

[plot: Pearson residuals vs. score]

This plot of residuals against the continuous predictor is worrying: it shows a clear pattern of more negative and more dispersed residuals as score increases (i.e., as the task becomes easier).

 plot(fitted,resid) 

[plot: Pearson residuals vs. fitted values]

This plot is also worrying, showing a clear pattern of more negative and more dispersed residuals as the predicted probability of a correct answer increases (along either the y = 0 or the y = 1 band, I'm not sure which).

Apparently these patterns may simply be an artifact of the logit link function.
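This can be checked directly: even a correctly specified logistic model produces exactly this two-band, fanning residual pattern, because at each fitted probability p the Pearson residual can only take two values, (1 - p)/sqrt(p(1 - p)) for y = 1 and -p/sqrt(p(1 - p)) for y = 0. A small simulation (hypothetical data, not the questioner's):

```r
# Simulate data that follows the logistic model exactly, then look
# at the residual-vs-fitted plot: the same "worrying" pattern appears.
set.seed(1)
x <- runif(2000, -2, 4)
p <- plogis(0.5 + 1.2 * x)      # true model: logit link
y <- rbinom(2000, 1, p)
m <- glm(y ~ x, family = binomial)
plot(fitted(m), resid(m, type = "pearson"),
     xlab = "fitted probability", ylab = "Pearson residual")
# Upper band: y = 1, residual (1 - p)/sqrt(p(1 - p)) shrinks toward 0 as p -> 1.
# Lower band: y = 0, residual -p/sqrt(p(1 - p)) grows more negative as p -> 1.
```

So increasingly negative, dispersed residuals at high fitted values are expected for binary data and are not, by themselves, evidence of misfit.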

I also tried plotting a regression line through the residuals, as shown here: link.

[plot: residuals vs. fitted values, with regression line]

Supposedly, this line should be straight (roughly horizontal around zero).

Are these patterns strong enough to abandon the model? I would think not, since my plots look very much like those in the linked examples, except that, I gather, there is a general tendency to predict y = 1.

I know there is a ceiling effect in my data: some easy conditions have almost exclusively correct responses (y = 1). This is why I may be overly skeptical of my model. Are these residual patterns a symptom of the ceiling effect?

Best Answer

This looks fairly reasonable to me; I don't think inference based on this model is likely to be far off. However, to take a more constructive view, any deviation in your residuals also represents a chance to improve the model (i.e., there is further information that could be modeled).

  • Does the full model show the same deviations? That is, even though the variables you've discarded were non-significant, they might help address the (slight) pattern in the residuals.
  • You might be able to improve the model fit by modifying the link function (or, equivalently, transforming the predictor variables/linear predictor). In How to assess the fit of a binomial GLMM fitted with lme4 (> 1.0)?, where a similar pattern of residuals is discussed, I show how to construct a power-logit family of link functions that can be used for testing goodness of link and/or improving the model. (Existing goodness-of-link tests such as Pregibon's test use linearization and score tests to evaluate goodness of fit efficiently by comparing the existing fit to a family of link functions; the procedure at the linked question does the same thing in a much more brute-force way.) You might also find similar families of alternative link functions in the glmx package.
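Short of building the power-logit family, a cheap first check along these lines is to refit with a different built-in link; the complementary log-log link, for instance, is asymmetric and sometimes fits better when one outcome dominates, as with a ceiling of correct responses. A hypothetical sketch reusing the model from the question (`M_cloglog` is my name):

```r
library(lme4)

# Refit the questioner's model with an asymmetric link and compare fits.
M_cloglog <- glmer(acc ~ race + sex + emotion + sex:emotion + race:emotion
                   + score + (1 + sex | subj),
                   family = binomial(link = "cloglog"), data = subset)
AIC(M, M_cloglog)  # lower AIC suggests the better-fitting link
```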