Solved – GLM with a Poisson distribution, how to conduct a post-hoc test

biostatisticsgeneralized linear modelmodeling

I have a dataset in which I compare the total number (the summed number of birds over a couple of weeks) of birds across different distances (continuous factor). I hypothesize that the numbers of birds is the highest at the first distance and declines. The model I used was a GLM.

I now have a significant effect of the Distance, which is what I was expecting. But what can I use to see which of the distances differ from each other? I've tried this but that didn't work.

post1 <- glht(m5, family = poisson())
summary(m5, test = adjusted())

The emmeans package does not work.

This question can be deleted, the answer did not work and I did not conduct a post-hoc.

Best Answer

You want the emmeans package. It is remarkably broad and effective. Below, I simulate data that might look like yours. Then I calculate all pairwise comparisons and then just the contrast of interest that you are interested in.

# generate fake data that might look like what you want ------------------------
set.seed(1839) # for replicability
n <- 400 # sample size
Distance <- factor(sample(1:5, n, TRUE)) # make distance from 1 to 5 and factor
# make rain and method covariates that are unrelated to outcome: 
Rain <- rbeta(400, 4, 2)
Method <- rbinom(n, 1, .4)
# calculate number that distance 1 is higher than the other 4:
Number <- rpois(n, ifelse(Distance == "1", 4, 2))

# fit model --------------------------------------------------------------------
model <- glm(Number ~ Distance + Rain + Method, family = "poisson")

# use emmeans ------------------------------------------------------------------
library(emmeans)
# all post-hoc comparisons:
pairs(emmeans(model, ~ Distance))

# just the contrast you wanted:
contrast(emmeans(model, ~ Distance), method = "trt.vs.ctrl", ref = 1)
# see ?contrast-methods for more information on types of methods of contrasts
# as well as how to specify the reference (that is, it should be the level value
# and not the label)

There are tons of great vignettes to help learn emmeans, e.g.:

Comparisons and contrasts https://cran.r-project.org/web/packages/emmeans/vignettes/comparisons.html
Models supported by emmeans https://cran.r-project.org/web/packages/emmeans/vignettes/models.html
Sophisticated models in emmeans https://cran.r-project.org/web/packages/emmeans/vignettes/sophisticated.html
Basics https://cran.r-project.org/web/packages/emmeans/vignettes/basics.html
Interactions https://cran.r-project.org/web/packages/emmeans/vignettes/interactions.html

Related Solutions

Zero-Inflated Poisson Regression – When to Use Zero-Inflated Poisson Regression and Negative Binomial Distribution

I suspect that your problem may be that the default behavior of predict.glm isn't what you think it is.

Specifically, predict used on a glm object will by default gives a response on the scale of the linear predictors, not the response.

This is quite clearly stated in the help (?predict.glm) but seems to trip people up very often (suggesting the default ought to be changed, perhaps; you might like to raise it on the relevant mailing list).

To get the values you want, try predict(model1,type="response")

Solved – Post-hoc test after a GLM with binary data

The problem you haveg is that are abusing (misusing) the formula notation. You never use a formula like this:

Periosna$Periostitis ~ Periosna$Cemetery

because, as you've found out it, it totally breaks the logic that you want to then use for subsequent operations.

You asked mcp() to set up Tukey contrasts for the covariate Cemetery, yet you fitted a model with covariate name Periosna$Cemetery, hence it quite rightly told you that a variable of the stated name was not involved in the model fit.

Instead, the call should have been

glm(Periostitis ~ Cemetery, data = Periosna, family = binomial)

noting that we use the variable names directly, as they exist in Periosna, and we tell R where to locate the variables in the formula via the data argument.

Now when you call glht() it should be able to find the Cemetery variable.

Best Answer

Related Solutions

Zero-Inflated Poisson Regression – When to Use Zero-Inflated Poisson Regression and Negative Binomial Distribution

Solved – Post-hoc test after a GLM with binary data

Related Question