Solved – What post-hoc test should be used for a glmer model with a binary response, and a continuous and categorical predictor

lme4-nlmelogisticmixed type datapost-hoc

I'm a bit of a newbie with stats and R, so need a bit of direction to find a suitable post-hoc test for my glmer model. I'm trying to find if presence is affected by environmental factors for each species. 24 surveys were completed at each site.

The model has a binary dependent variable (absent/present) and the predictor variables are interactive terms between a multiple continuous variables(eg temp and pH) and a categorical variable (species, n=3). The random effect is the survey site ID data was collected from. Only interactive terms, rather than the continuous factor in isolation, produce significant results when an anova is run on the model. Species by itself has a large effect because one species is much rarer than the others.

I'm trying to tease apart how the presence of these species varies across pH and between species.
I've tried lsmeans test with Tukey, and Firth's Bias-Reduced Logistic Regression, emmeans based on some other posts I read where people had similar questions. I ran the effects function on the interactive terms, so had a rough expectation of what a post hoc could show, but the results logistf (firth's) have produced I was not expecting. Emmeans and tukey both gave the same results and ignored the continuous variable I assume because it's not a factor.

When I run firth's regression it produces chi-squared and p values that are either infinity for chi values, some with infinite degrees of freedom, or the p values astronomically small, even though what I saw through effects suggested no significant difference. I can't tell with the interactive term if there truly is an effect of the environmental variable or if the significant effect is because of the difference in species.

If I wasn't clear enough about something please let me know and if anyone has any suggestions or advice, they would be massively appreciated. Thanks!

Glmer code, anova output and the logistf and effects output for pH are below. Based on effects, I was not expecting significant difference but there was one in logistif.

 ###glmer model
> Large<-glmer(Abs.Pres~ Species:Q.Depth+Species:Conductivity+Species:Temp+Species:pH+Species:DO.P+(1|QID),
+              nAGQ=0,
+              family=binomial,
+              data=Stacked_Pref)
####Anova output
> anova(Large)
Analysis of Variance Table
                     npar  Sum Sq Mean Sq F value
Species:Q.Depth         3 234.904  78.301 78.3014
Species:Conductivity    3  32.991  10.997 10.9970
Species:Temp            3  39.001  13.000 13.0004
Species:pH              3  25.369   8.456  8.4562
Species:DO.P            3  34.930  11.643 11.6434

####logistf run on pH
> Lp<-logistf(Abs.Pres~Species:pH, data=Stacked_Pref, contrasts.arg=list(pH="contr.treatment", Species="contr.sum"))
> Lp
logistf(formula = Abs.Pres ~ Species:pH, data = Stacked_Pref, 
    contrasts.arg = list(pH = "contr.treatment", Species = "contr.sum"))
Model fitted by Penalized ML
Confidence intervals and p-values by Profile Likelihood 

                         coef   se(coef) lower 0.95 upper 0.95    Chisq            p
(Intercept)         1.9711411 0.57309880  0.8552342  3.1015114 12.09107 5.066380e-04
SpeciesGoby:pH     -0.3393185 0.07146049 -0.4804047 -0.2003108 23.31954 1.371993e-06
SpeciesMosquito:pH -0.3001385 0.07127771 -0.4408186 -0.1614419 18.24981 1.937453e-05
SpeciesRFBE:pH     -0.4771393 0.07232469 -0.6200179 -0.3365343 45.73750 1.352096e-11

Likelihood ratio test=267.0212 on 3 df, p=0, n=3945

###effect function output on pH
> SpE<-effect("Species:pH", Large)
> summary(SpE)

 Species*pH effect
          pH
Species             7        7.7        8.5       9.3         10
  Goby     0.22239538 0.23898961 0.25896972 0.2800056 0.29924424
  Mosquito 0.36689425 0.34004541 0.31057990 0.2825744 0.25936811
  RFBE     0.09393222 0.09413637 0.09437017 0.0946045 0.09480996

 Lower 95 Percent Confidence Limits
          pH
Species             7        7.7       8.5        9.3         10
  Goby     0.13722030 0.16103685 0.1753282 0.17341519 0.16408392
  Mosquito 0.24476920 0.23994376 0.2148559 0.17474573 0.13820850
  RFBE     0.05387189 0.05905686 0.0588516 0.05251263 0.04504883

 Upper 95 Percent Confidence Limits
          pH
Species            7      7.7       8.5       9.3        10
  Goby     0.3396283 0.339411 0.3648593 0.4189090 0.4815962
  Mosquito 0.5088941 0.456809 0.4258216 0.4228475 0.4333341
  RFBE     0.1587829 0.146802 0.1479552 0.1645751 0.1886773
```

Best Answer

My suggestion is to do something like this:

library(emmeans)

emt <- emtrends(Large, "Species", var = "pH")
emt          # estimated slopes for each species
pairs(emt)   # pairwise comparisons of slopes

... and similar with var = "Q.Depth", var = "Conductivity", etc. Note the slopes are on the logit scale, i.e., (change in logit(p)) / (change in pH).

You can visualize these trends using

emmip(Large, Species ~ pH, cov.reduce = range)

or

emmip(Large, Species ~ pH, type = "response",
    at = list(pH = c(... several pH values ...))

The latter would show the trends on the probability scale (they will be curves), while the former shows the trends on the logit scale (which are straight lines).

Caution

However, I seriously question the appropriateness of the model Large. It includes only the terms Species:Q.Depth, Species:Conductivity, etc., without main effects. This model forces all the fitted lines to go through the origin, which in this context means that your estimated probability is 0.5 when Q.Dept, Conductivity, etc. are all equal to zero. This seems highly unrealistic. It is almost always a mistake to leave the intercept and main effects out of a model. I suggest you fit the model with this formula:

glmer(Abs.Pres ~ Species * (Q.Depth + Conductivity + Temp + pH +
                            DO.P) + (1 | QID), ...)

(I am also a big fan of whitespace in one's code...). I have every reason to believe that your estimates and tests will be radically different.