Solved – Post hoc test of an interaction in a binomial GLMM with proportions

binomial-distribution, glmm, post-hoc

I've fitted a binomial GLMM, because my data are proportions of a species in samples of roughly 100 individuals.
I test the interaction of two factors and use car::Anova to get the p-values.
My random factor is the ID of the field (subject) that was sampled. I sampled 12 fields from 4 different classes (factor1), so I think the random factor has 12 levels. I use the random factor to correct for repeated measurements: each field was measured only twice, and these two timepoints are the levels of factor2.

my model:

binomial.glmm <- glmer(cbind(species1, not_species1) ~ factor1 * factor2 + (1 | field),
                       family = binomial(link = "logit"), data = data)

On the one hand, I'm using the glht() function from the multcomp package to perform a post hoc Tukey test with Bonferroni adjustment (all pairwise comparisons).
On the other hand, I additionally plot fitted values with confidence intervals.
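
For reference, that post hoc step looks roughly like this. This is a sketch, not my exact call: cell, species1 and not_species1 are placeholder names, and the interaction is recoded as a single factor so that glht can compare all cell means pairwise:

```r
library(lme4)
library(multcomp)

# Recode the interaction as a single factor so all cell means can be compared
data$cell <- interaction(data$factor1, data$factor2)
cell.glmm <- glmer(cbind(species1, not_species1) ~ cell + (1 | field),
                   family = binomial(link = "logit"), data = data)

# All pairwise ("Tukey") contrasts, with Bonferroni-adjusted p-values
summary(glht(cell.glmm, linfct = mcp(cell = "Tukey")),
        test = adjusted("bonferroni"))
```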

My problem is that
A: the a, b, c, … letters from the post hoc test do not make sense in my opinion, and
B: the confidence intervals are incredibly large.

Now I'm wondering whether the number of true replicates (3) in each class is too small. Could that be the reason? Am I simply unable to test for the interaction effect?

I get the confidence intervals with the following R code:

testdata <- expand.grid(factor1 = unique(data$factor1),
                        factor2 = unique(data$factor2))

X <- model.matrix(~ factor1 * factor2, data = testdata)
testdata$fit <- drop(X %*% fixef(binomial.glmm))   # drop() turns the 1-column matrix into a vector
testdata$SE  <- sqrt(diag(X %*% vcov(binomial.glmm) %*% t(X)))
testdata$upr <- testdata$fit + 1.96 * testdata$SE
testdata$lwr <- testdata$fit - 1.96 * testdata$SE

Then I plot fit, upr and lwr with ggplot, after back-transforming with plogis() (the inverse logit; exp() alone would give odds rather than proportions):

ggplot(testdata, aes(x = factor1, y = plogis(fit))) +
  # geom_bar(stat = "identity", position = position_dodge(1),
  #          col = "#454545", size = 0.15, fill = "grey") +
  geom_point(aes(x = as.numeric(factor1) + 0.3), pch = 23, bg = "aquamarine2") +
  geom_errorbar(aes(x = as.numeric(factor1) + 0.3,
                    ymin = plogis(lwr), ymax = plogis(upr)),
                position = position_dodge(1), col = "black",
                width = 0.15, size = 0.15) +
  geom_boxplot(aes(y = response), data = data) +
  facet_grid(. ~ factor2) +
  geom_hline(yintercept = 1, size = 0.15) +
  ylab("Species1?") +
  xlab("Factor1") +
  scale_x_discrete(labels = c("A", "B", "C", "D"))

Here are the plots:

Results from the post hoc Tukey test (unfortunately I wasn't able to make a prettier plot). The letters can't be right, can they?

Prediction plots. The blue points show the fit with confidence intervals; the boxplots come from the raw data.


Interestingly, the prediction plots look much better when I use an LMM with arcsine-transformed fractions of species1 as the response. However, everybody argues that this would not be good practice…

The residual plots used to validate the models are somewhat misbehaved, even though I included an observation-level random term (1|ID) with ID = 1:nrow(data).
The problem is that if this does not improve the model, nothing will, and as far as I can tell only a single package (nparLD) can perform non-parametric tests with interactions on repeated-measures data. I'm afraid that procedure would be too difficult to explain in a materials and methods section.
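
For what it's worth, the observation-level random effect mentioned above can be written down like this (a sketch built on the same model; the ID column is simply one factor level per row):

```r
library(lme4)

# Observation-level random effect: one ID level per row, intended to
# absorb overdispersion beyond the binomial variance
data$ID <- factor(seq_len(nrow(data)))
olre.glmm <- glmer(cbind(species1, not_species1) ~ factor1 * factor2 +
                     (1 | field) + (1 | ID),
                   family = binomial(link = "logit"), data = data)
```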

Best Answer

You might try the lsmeans package, as it makes some of this easier and clearer. To get the results on the back-transformed (response) scale, include the option type = "response":

require(lsmeans)
lsm <- lsmeans(binomial.glmm, ~ factor1 * factor2)
summary(lsm, type = "response")

To see the results graphically, do

plot(lsm, by = "factor2", intervals = TRUE, type = "response")

or, for an interaction-plot style,

lsmip(lsm, factor1 ~ factor2, type = "response")
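
If you also want letter groupings like the a, b, c display in the question, lsmeans supports a compact letter display via the cld() generic from multcomp (argument names here follow the lsmeans documentation; treat this as a sketch):

```r
# Cells sharing a letter are not significantly different
# under Tukey-adjusted pairwise comparisons
cld(lsm, by = "factor2", type = "response", alpha = 0.05)
```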

In a mixed model like this, it is often the case that the SEs of these least-squares means (AKA predictions) will be much larger than those of some or all of the pairwise differences, because the between-subjects variations cancel out in those comparisons. So the displayed CIs can be very misleading for comparing the predictions (and you shouldn't use CIs to do comparisons in any case).

To get the Tukey-adjusted comparisons, do

summary(pairs(lsm), type = "response")

(This actually computes the differences on the logit scale and then back-transforms, so the results are odds ratios. If you want differences of proportions instead, use pairs(regrid(lsm)).)
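
A sketch of that alternative:

```r
# Differences of proportions rather than odds ratios: regrid() re-expresses
# the reference grid on the response scale before the pairwise contrasts
summary(pairs(regrid(lsm)))
```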
