Solved – What post-hoc test should be used for a glmer model with a binary response, and a continuous and categorical predictor

lme4-nlmelogisticmixed type datapost-hoc

I'm a bit of a newbie with stats and R, so need a bit of direction to find a suitable post-hoc test for my glmer model. I'm trying to find if presence is affected by environmental factors for each species. 24 surveys were completed at each site.

The model has a binary dependent variable (absent/present) and the predictor variables are interactive terms between a multiple continuous variables(eg temp and pH) and a categorical variable (species, n=3). The random effect is the survey site ID data was collected from. Only interactive terms, rather than the continuous factor in isolation, produce significant results when an anova is run on the model. Species by itself has a large effect because one species is much rarer than the others.

I'm trying to tease apart how the presence of these species varies across pH and between species.
I've tried lsmeans test with Tukey, and Firth's Bias-Reduced Logistic Regression, emmeans based on some other posts I read where people had similar questions. I ran the effects function on the interactive terms, so had a rough expectation of what a post hoc could show, but the results logistf (firth's) have produced I was not expecting. Emmeans and tukey both gave the same results and ignored the continuous variable I assume because it's not a factor.

When I run firth's regression it produces chi-squared and p values that are either infinity for chi values, some with infinite degrees of freedom, or the p values astronomically small, even though what I saw through effects suggested no significant difference. I can't tell with the interactive term if there truly is an effect of the environmental variable or if the significant effect is because of the difference in species.

If I wasn't clear enough about something please let me know and if anyone has any suggestions or advice, they would be massively appreciated. Thanks!

Glmer code, anova output and the logistf and effects output for pH are below. Based on effects, I was not expecting significant difference but there was one in logistif.

 ###glmer model
> Large<-glmer(Abs.Pres~ Species:Q.Depth+Species:Conductivity+Species:Temp+Species:pH+Species:DO.P+(1|QID),
+              nAGQ=0,
+              family=binomial,
+              data=Stacked_Pref)
####Anova output
> anova(Large)
Analysis of Variance Table
                     npar  Sum Sq Mean Sq F value
Species:Q.Depth         3 234.904  78.301 78.3014
Species:Conductivity    3  32.991  10.997 10.9970
Species:Temp            3  39.001  13.000 13.0004
Species:pH              3  25.369   8.456  8.4562
Species:DO.P            3  34.930  11.643 11.6434

####logistf run on pH
> Lp<-logistf(Abs.Pres~Species:pH, data=Stacked_Pref, contrasts.arg=list(pH="contr.treatment", Species="contr.sum"))
> Lp
logistf(formula = Abs.Pres ~ Species:pH, data = Stacked_Pref, 
    contrasts.arg = list(pH = "contr.treatment", Species = "contr.sum"))
Model fitted by Penalized ML
Confidence intervals and p-values by Profile Likelihood 

                         coef   se(coef) lower 0.95 upper 0.95    Chisq            p
(Intercept)         1.9711411 0.57309880  0.8552342  3.1015114 12.09107 5.066380e-04
SpeciesGoby:pH     -0.3393185 0.07146049 -0.4804047 -0.2003108 23.31954 1.371993e-06
SpeciesMosquito:pH -0.3001385 0.07127771 -0.4408186 -0.1614419 18.24981 1.937453e-05
SpeciesRFBE:pH     -0.4771393 0.07232469 -0.6200179 -0.3365343 45.73750 1.352096e-11

Likelihood ratio test=267.0212 on 3 df, p=0, n=3945

###effect function output on pH
> SpE<-effect("Species:pH", Large)
> summary(SpE)

 Species*pH effect
          pH
Species             7        7.7        8.5       9.3         10
  Goby     0.22239538 0.23898961 0.25896972 0.2800056 0.29924424
  Mosquito 0.36689425 0.34004541 0.31057990 0.2825744 0.25936811
  RFBE     0.09393222 0.09413637 0.09437017 0.0946045 0.09480996

 Lower 95 Percent Confidence Limits
          pH
Species             7        7.7       8.5        9.3         10
  Goby     0.13722030 0.16103685 0.1753282 0.17341519 0.16408392
  Mosquito 0.24476920 0.23994376 0.2148559 0.17474573 0.13820850
  RFBE     0.05387189 0.05905686 0.0588516 0.05251263 0.04504883

 Upper 95 Percent Confidence Limits
          pH
Species            7      7.7       8.5       9.3        10
  Goby     0.3396283 0.339411 0.3648593 0.4189090 0.4815962
  Mosquito 0.5088941 0.456809 0.4258216 0.4228475 0.4333341
  RFBE     0.1587829 0.146802 0.1479552 0.1645751 0.1886773
```

Best Answer

My suggestion is to do something like this:

library(emmeans)

emt <- emtrends(Large, "Species", var = "pH")
emt          # estimated slopes for each species
pairs(emt)   # pairwise comparisons of slopes

... and similar with var = "Q.Depth", var = "Conductivity", etc. Note the slopes are on the logit scale, i.e., (change in logit(p)) / (change in pH).

You can visualize these trends using

emmip(Large, Species ~ pH, cov.reduce = range)

emmip(Large, Species ~ pH, type = "response",
    at = list(pH = c(... several pH values ...))

The latter would show the trends on the probability scale (they will be curves), while the former shows the trends on the logit scale (which are straight lines).

Caution

However, I seriously question the appropriateness of the model Large. It includes only the terms Species:Q.Depth, Species:Conductivity, etc., without main effects. This model forces all the fitted lines to go through the origin, which in this context means that your estimated probability is 0.5 when Q.Dept, Conductivity, etc. are all equal to zero. This seems highly unrealistic. It is almost always a mistake to leave the intercept and main effects out of a model. I suggest you fit the model with this formula:

glmer(Abs.Pres ~ Species * (Q.Depth + Conductivity + Temp + pH +
                            DO.P) + (1 | QID), ...)

(I am also a big fan of whitespace in one's code...). I have every reason to believe that your estimates and tests will be radically different.

Related Solutions

Solved – post-hoc test after logistic regression with interaction. Risk higher for type 1 or type 2 error

Impossible to know because you don't know the base rate at which H0 is actually true or false. That being said...

You are already 'protected' against type I error when you restrict yourself to doing posthoc analyses only on elements of models that were statistically significant overall. In addition, posthoc correction procedures tend towards draconian criteria for evidence. Therfore, I'd tend to think that you'd be drifting towards increasing your risk of notable Type II error. So, I'd suggest that (if your peers will allow it) you look at the magnitude of the effects from your overall analysis and not fiddle around with posthocs.

Binomial Distribution – Post Hoc Test of Interaction Factor in Binomial GLMM with Proportions

You might try the lsmeans package, as it makes some of this stuff easier and clearer. To get the results on the back-transformed scale (include option type="response"), do

require(lsmeans)
lsm <- lsmeans(binomial.glmm, ~ factor1 * factor2)
summary(lsm, type = "response")

To see the results graphically, do

plot(lsm, by = "factor2", intervals = TRUE, type = "response")

or, for an interaction-plot style,

lsmip(lsm, factor1 ~ factor2, type = "response")

In a mixed model like this, it is often the case that the SEs of these least-squares means (AKA predictions) will be much larger than those of some or all of the pairwise differences, because the between-subjects variations cancel out in those comparisons. So the displayed CIs can be very misleading for comparing the predictions (and you shouldn't use CIs to do comparisons in any case).

To get the Tukey-adjusted comparisons, do

summary(pairs(lsm), type = "response")

(This actually computes the differences on the logit scale, then back-transforms, so that the results are odds ratios. If you want differences of proportions instead, do pairs(regrid(lsm)) instead.)

Best Answer

Caution

Related Solutions

Solved – post-hoc test after logistic regression with interaction. Risk higher for type 1 or type 2 error

Binomial Distribution – Post Hoc Test of Interaction Factor in Binomial GLMM with Proportions

Related Question