I'm encountering problems with the results of a glmer model (lme4 package).
I'm trying to answer the question of whether a beaver is more likely to be present (Status == 1) or absent (Status == 0) with changing geomorphic and vegetation variables. My model formula looks like this:
model1 <- glmer(Status ~ SlopecatCentered + Canal_width + Distance:Resource_biotopes +
(1 | Location), family="binomial", data=Daten12,
control=glmerControl(optimizer="Nelder_Mead"))
My output looks OK as far as I can tell; the only peculiar thing is the high estimate for SlopecatCentered:
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation)
['glmerMod']
Family: binomial ( logit )
Formula: Status ~ SlopecatCentered + Canal_width + Distance:Resource_biotopes +
(1 | Location)
Data: Datentest
Control: glmerControl(optimizer = "Nelder_Mead")
     AIC      BIC   logLik deviance df.resid
    62.7     77.4    -25.3     50.7       80

Scaled residuals:
      Min        1Q    Median        3Q       Max
-0.095917 -0.003971  0.000000  0.002706  0.079395

Random effects:
 Groups   Name        Variance Std.Dev.
 Location (Intercept) 3682     60.68
Number of obs: 86, groups:  Location, 43

Fixed effects:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                 -18.5782     7.0847  -2.622 0.008734 **
SlopecatCentered             20.4162     5.6060   3.642 0.000271 ***
Canal_width                   0.4763     0.1584   3.007 0.002638 **
Distance1:Resource_biotopes   1.0442     0.4717   2.214 0.026861 *
Distance2:Resource_biotopes   1.0379     0.4662   2.226 0.026010 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) SlpctC Cnl_wd Ds1:R_
SlopctCntrd -0.632
Canal_width -0.902  0.698
Dstnc1:Rsr_ -0.663  0.560  0.458
Dstnc2:Rsr_ -0.677  0.538  0.461  0.787
My Q-Q plot looks weird, though, and so does my residuals vs. fitted plot.
Edit: I just had a closer look at my data. The SlopecatCentered variable is not a perfect predictor; rather, my random factor Location is causing this problem. In my raw data set, it denotes 43 different locations. Each location has two distances at which most of the variables were measured, so my Location variable has 43 * 2 = 86 entries (in fact, that's the length of the data frame):
>Daten12$Loc
[1] 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 13 13 14 14 15 15 16 16 17
[34] 17 18 18 19 19 20 20 21 21 22 22 23 23 24 24 25 25 26 26 27 27 28 28 29 29 30 30 31 31 32 32 33 33
[67] 34 34 35 35 36 36 37 37 38 38 39 39 40 40 41 41 42 42 43 43
43 Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 ... 43
I changed that to 1 to 86 (one level per row) and ran a test model, and the plots looked OK (I know that the random effect was futile in that test model, but I wanted to get to the root of the problem).
So apparently, my raw data frame layout is wrong. But I found sample data sets online to compare, and their layout looks similar, so I just don't know how to fix it.
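One check that may help narrow this down is whether Status ever differs between the two rows of the same Location. If it rarely does, a random intercept per location can reproduce the response almost perfectly, which could show up as a huge random-effect variance. The sketch below uses a made-up toy data frame (the values and the name `toy` are hypothetical, not the beaver data) and base R only:

```r
# Hypothetical toy layout mimicking the question: two rows per location.
toy <- data.frame(
  Location = factor(rep(1:5, each = 2)),
  Status   = c(1, 1, 0, 0, 1, 1, 0, 0, 1, 0)
)

# How many rows does each location contribute?
n_per_loc <- table(toy$Location)
print(n_per_loc)

# Does the response take more than one value within a location?
varies <- tapply(toy$Status, toy$Location,
                 function(s) length(unique(s)) > 1)
print(varies)

# Share of locations where the response is constant across both rows.
prop_constant <- mean(!varies)
print(prop_constant)
```

On the real data, the same three steps would use Daten12$Location and Daten12$Status in place of the toy columns.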
Best Answer
Without looking at your data, I'm guessing that you have complete separation in your response. An estimate of 20 on the logit scale is effectively infinity and translates to fitted probabilities of zero or one. You might want to double-check your SlopecatCentered variable to make sure it's not related to the response somehow.
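To make the "effectively infinity" point concrete, here is a minimal sketch in base R. The first part just applies the inverse logit to plus and minus 20. The second part fits an ordinary glm (not a glmer) to a tiny invented data set in which the predictor separates the response perfectly; the data frame `toy` and its values are assumptions for illustration only.

```r
# Inverse logit of +/-20: numerically indistinguishable from 1 and 0.
p_hi <- plogis(20)
p_lo <- plogis(-20)
print(c(p_hi, p_lo))

# Hypothetical toy data: every negative x has y = 0, every positive x has y = 1.
toy <- data.frame(x = c(-3, -2, -1, 1, 2, 3),
                  y = c(0, 0, 0, 1, 1, 1))

# glm() usually warns here that fitted probabilities of 0 or 1 occurred;
# the warnings are suppressed only to keep the output short.
fit <- suppressWarnings(glm(y ~ x, family = binomial, data = toy))

# The slope runs off to a very large value and the fitted
# probabilities collapse onto the observed 0/1 responses.
print(coef(fit))
print(round(fitted(fit), 6))

# A cross-tabulation of response against predictor sign shows the separation.
print(table(y = toy$y, x_positive = toy$x > 0))
```

A similar cross-tabulation of Status against each predictor in the real data is a cheap first look for separation.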