GLMM validation: weird QQ and fitted vs. residual plots

glmm, mixed-model, qq-plot, r, validation

I'm encountering problems with the results of a glmer model (lme4 package).
I'm trying to find out whether a beaver is more likely to be present (Status == 1) or absent (Status == 0) as geomorphic and vegetation variables change. My model formula looks like this:

model1 <- glmer(Status ~ SlopecatCentered + Canal_width + Distance:Resource_biotopes + 
                         (1 | Location), family="binomial", data=Daten12, 
                control=glmerControl(optimizer="Nelder_Mead"))

My output looks OK as far as I can tell; the only peculiar thing is the large estimate for SlopecatCentered:

Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) 
  ['glmerMod']
Family: binomial  ( logit )
Formula: Status ~ SlopecatCentered + Canal_width + Distance:Resource_biotopes + 
                  (1 | Location)
Data: Datentest
Control: glmerControl(optimizer = "Nelder_Mead")

AIC      BIC     logLik    deviance   df.resid 
62.7     77.4    -25.3     50.7       80 

Scaled residuals: 
  Min        1Q    Median        3Q       Max 
-0.095917 -0.003971  0.000000  0.002706  0.079395 

Random effects:
Groups   Name        Variance Std.Dev.
Location (Intercept) 3682     60.68   
Number of obs: 86, groups:  Location, 43

Fixed effects:
                            Estimate Std. Error z value Pr(>|z|)    
(Intercept)                 -18.5782     7.0847  -2.622 0.008734 ** 
SlopecatCentered             20.4162     5.6060   3.642 0.000271 ***
Canal_width                   0.4763     0.1584   3.007 0.002638 ** 
Distance1:Resource_biotopes   1.0442     0.4717   2.214 0.026861 *  
Distance2:Resource_biotopes   1.0379     0.4662   2.226 0.026010 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) SlpctC Cnl_wd Ds1:R_
SlopctCntrd -0.632                     
Canal_width -0.902  0.698              
Dstnc1:Rsr_ -0.663  0.560  0.458       
Dstnc2:Rsr_ -0.677  0.538  0.461  0.787    

My QQ plot looks weird, though, and so does my residuals vs. fitted plot:

[Figure: QQ-norm plot produced with sjp.glmer(model,...)]

[Figure: fitted vs. residual plot produced with plot(model)]

Edit: I just had a closer look at my data. The SlopecatCentered variable is not a perfect predictor; rather, my random factor Location is causing the problem. In my raw data set it denotes 43 different locations. Each location has two distances at which most of the variables were measured, so my Location variable has 43 * 2 = 86 entries (which is also the number of rows in the data frame):

 >Daten12$Loc
[1] 1  1  2  2  3  3  4  4  5  5  6  6  7  7  8  8  9  9  10 10 11 11 12 12 13 13 14 14 15 15 16 16 17
[34] 17 18 18 19 19 20 20 21 21 22 22 23 23 24 24 25 25 26 26 27 27 28 28 29 29 30 30 31 31 32 32 33 33
[67] 34 34 35 35 36 36 37 37 38 38 39 39 40 40 41 41 42 42 43 43
43 Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 ... 43

I recoded it as 1 to 86 (one level per row), ran a test model, and the plots looked OK. (I know the random effect is pointless in that test model, but I wanted to get to the root of the problem.)

So apparently my raw data frame layout is wrong. But I found example data sets online to compare against, and their layout looks similar, so I just don't know how to fix it.
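One way to inspect the layout is to count the rows per location and check how often Status changes within a location. The sketch below uses a made-up stand-in for Daten12, so the data frame `dat` and its values are assumptions, not the real data. In particular, it assumes Status is identical at both distances of every location, which is only one possible explanation: in that situation a random intercept per location can reproduce the response almost perfectly, which would be consistent with a very large random-effect variance and near-zero residuals.

```r
# Hypothetical stand-in for Daten12: 43 locations, 2 rows (distances) each.
set.seed(1)
dat <- data.frame(
  Location = factor(rep(1:43, each = 2)),
  Distance = factor(rep(1:2, times = 43)),
  # Assumption for illustration: Status is the same at both distances.
  Status   = rep(rbinom(43, 1, 0.5), each = 2)
)

# Rows per location: with the layout described above, every count is 2.
rows_per_loc <- table(dat$Location)
print(table(rows_per_loc))

# Number of distinct Status values observed within each location.
n_status <- tapply(dat$Status, dat$Location, function(s) length(unique(s)))

# Share of locations in which Status never varies.
share_constant <- mean(n_status == 1)
print(share_constant)
```

If the same check on the real data gives a share close to 1, the layout itself is fine and the issue is that two binary observations per location carry very little within-location information.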

Best Answer

Without looking at your data, I'm guessing that you have complete separation in your response. An estimate of 20 on the logit scale is effectively infinite: it translates to fitted probabilities of essentially zero or one. You might want to double-check your SlopecatCentered variable to make sure it does not (almost) perfectly predict the response.
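To make this concrete, here is a minimal base-R sketch. It first shows what values such as 20 on the logit scale mean as probabilities, then checks a toy data set for separation. The data frame `toy` and its values are invented for illustration and are not the asker's data.

```r
# Linear predictor values on the logit scale and the probabilities they imply.
eta <- c(-20, -5, 0, 5, 20)
p <- plogis(eta)  # inverse logit: 1 / (1 + exp(-eta))
print(signif(p, 3))

# Hypothetical toy data: x perfectly separates y.
toy <- data.frame(
  x = c(-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2),
  y = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# A single predictor separates the response when the ranges of x
# for y == 0 and for y == 1 do not overlap.
range0 <- range(toy$x[toy$y == 0])
range1 <- range(toy$x[toy$y == 1])
separated <- range0[2] < range1[1] || range1[2] < range0[1]
print(separated)

# On separated data glm() usually warns that fitted probabilities of
# 0 or 1 occurred; the warnings are suppressed here and the (very large)
# slope estimate is inspected instead.
fit <- suppressWarnings(glm(y ~ x, family = binomial, data = toy))
print(coef(fit))
```

The same range comparison can be applied to SlopecatCentered split by Status. Note that it only detects separation by a single predictor; separation by a combination of predictors, or by the random effect, needs a closer look.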
