Solved – Assessing fit of binomial glmer() in R with only categorical predictors

Tags: binomial-distribution, lme4-nlme, resampling

I am trying to validate a mixed-effects logistic regression model with a categorical dependent variable and categorical predictor variables; nothing is continuous. One of my predictor variables is binary, and the other has three possible values (not ordinal). My formula is something like this:

library(lme4)
m1 <- glmer(Response ~ Handedness + Color + (1 | Subject) + (1 | Item) + (1 | NumResponses), data = r1, family = "binomial")

where Response is something like "yes/no", Handedness is "left/right", and Color is "green/blue/red". I have roughly 2000 responses from 500 subjects, though some subjects gave more responses than others, there are more righties than lefties, and there are more green/red items than blue ones (this is not what the actual data are about; the point is that we weren't able to sample evenly from the equivalents of Handedness and Color). Although I could use other predictors and/or random effects, I've already compared candidate models by dropping variables and choosing on AIC/BIC, and this is the best one. Results look something like this:

 AIC  BIC logLik deviance
1733 1755 -828.9     1702

Random effects:
 Groups       Name        Variance Std.Dev.
 Subject      (Intercept) 2.1680   1.47243
 NumResponses (Intercept) 0.0000   0.00000
 Item         (Intercept) 0.4183   0.64676
Number of obs: 1708, groups: Subject, 560; NumResponses, 48; Item, 48

Fixed effects:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)      1.2097     0.2310   5.238 1.63e-07 ***
ColorGreen      -0.2254     0.3271  -0.689    0.491
ColorRed        -1.2007     0.2285  -5.254 1.49e-07 ***
HandednessLeft   1.2248     0.2189   5.595 2.20e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I'm very new to these sorts of models (I'm sure this is glaringly obvious), but if I'm interpreting it correctly and it's not a garbage model, lefties have higher odds of responding yes than righties, and red lowers the odds of a yes relative to blue (the reference level).

So the model tells me there is something in my data, but how do I tell whether it's even a good model? As far as I understand, AIC and BIC only tell me it's better than the other models I tried; it could still be a horrible fit. I can't figure out how to do much with diagnostic plots because all of my variables are categorical (although the proportions of yes/no responses across these groups clearly agree with the model's results).

From here (http://www.ats.ucla.edu/stat/r/dae/melogit.htm), it seems like I should do some sort of bootstrap, but with all-categorical variables this seems overly complex (or perhaps I'm just recoiling at the idea of implementing it). Is this the best way, or is there another approach?
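For reference, my understanding is that the parametric bootstrap described there could be run with lme4's bootMer(); a minimal sketch (m1 is the fitted model above; nsim and seed are arbitrary choices):

# Parametric bootstrap of the fixed effects
library(lme4)
b <- bootMer(m1, FUN = fixef, nsim = 200, seed = 101)

# Percentile 95% intervals for each fixed-effect coefficient
t(apply(b$t, 2, quantile, probs = c(0.025, 0.975)))

(confint(m1, method = "boot") would produce much the same intervals in one call.)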

Best Answer

This is a pretty broad question.

  • Most simply, you could assess the accuracy of predictions: make predictions on the probability (type="response") scale, dichotomize at a 50% cut-point, and cross-tabulate predicted against observed in a 2x2 table ("predicted yes"/"predicted no" $\times$ "observed yes"/"observed no"). From that table you can calculate overall accuracy, or sensitivity and specificity if you like, and see how those measures vary across categories (a sketch follows this list).
  • If you allow the cut-point to vary from 50%, then you're looking at ROC/AUC measures (second part of the sketch below).
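A minimal sketch of both ideas, reusing names from the question (m1 for the fitted model, r1 for the data, Response with levels "no"/"yes"); it assumes no rows were dropped for missingness, so predictions line up with the data, and it uses the pROC package for the AUC (any ROC implementation would do):

library(lme4)

# Predicted probabilities on the response scale
p <- predict(m1, type = "response")

# Dichotomize at the 50% cut-point and cross-tabulate predicted vs. observed
pred <- ifelse(p > 0.5, "yes", "no")
tab  <- table(predicted = pred, observed = r1$Response)

accuracy    <- sum(diag(tab)) / sum(tab)
sensitivity <- tab["yes", "yes"] / sum(tab[, "yes"])  # true positive rate
specificity <- tab["no", "no"]   / sum(tab[, "no"])   # true negative rate

# Let the cut-point vary: ROC curve and AUC
library(pROC)
roc_obj <- roc(response = r1$Response, predictor = p)
auc(roc_obj)
plot(roc_obj)

A three-way table such as table(predicted = pred, observed = r1$Response, r1$Handedness) is one way to see how these measures vary across categories.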