Interpret GLM without intercept

Tags: biostatistics, generalized linear model, hypothesis testing, r, regression

I have a question about the output of my GLM fitted WITHOUT an intercept. I am comparing the number of infected leaves on plants across months. In a model WITH an intercept (using the default log link of the Poisson family), the (Intercept) represents the log of the mean number of infected leaves in the reference month, and the coefficients for the other months are the differences between each month's log mean count and that of the reference month. I don't want a reference group because that output doesn't make much sense, so I removed the intercept from the model with -1 in the formula.
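In symbols (a sketch; μ_m, β and γ are labels introduced here, not anything R prints), with μ_m the mean count in month m and May, the first factor level in the code below, as the reference:

```latex
\begin{aligned}
\text{with intercept:} \quad & \log \mu_m = \beta_0 + \beta_m, \qquad \beta_{\text{May}} = 0 \\
\text{without intercept:} \quad & \log \mu_m = \gamma_m \\
\text{hence:} \quad & \gamma_{\text{May}} = \beta_0, \qquad \gamma_m = \beta_0 + \beta_m \quad (m \neq \text{May})
\end{aligned}
```

Both versions give the same fitted means; dropping the intercept only changes what each coefficient stands for.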

Here is the model

dat_lambsburg$month <- factor(dat_lambsburg$month, 
    levels = c("May", "November", "June", "July", "August", 
               "September", "October"))

mod_9 <- glm(total_count ~ month - 1, family = quasipoisson,
             data = dat_lambsburg)

summary(mod_9)

Call:
glm(formula = total_count ~ month - 1, family = quasipoisson, 
    data = dat_lambsburg)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-10.8743   -7.6599   -2.2361    0.8373   22.0828  

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
monthMay        -13.3026  4304.2345  -0.003  0.99755    
monthNovember   -13.3026  3043.5534  -0.004  0.99654    
monthJune         0.9163     2.0507   0.447  0.65802    
monthJuly         2.5649     0.8993   2.852  0.00755 ** 
monthAugust       4.5512     0.4711   9.661 5.23e-11 ***
monthSeptember    3.9195     0.4568   8.579 8.36e-10 ***
monthOctober      4.0797     0.4217   9.675 5.06e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasipoisson family taken to be 84.10988)

    Null deviance: 10704.9  on 39  degrees of freedom
Residual deviance:  2346.4  on 32  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 11

My question is how to interpret a model without an intercept/reference group. The results overall make sense: there is significantly more disease from July to October. But what do the estimates for May (-13.3026) and November (-13.3026) refer to in my case? If an estimate represents a count, how can a count be negative? For context, no infected leaves were recorded in May or November, and the highest counts were recorded in October. I have attached a figure of the raw data.

Details about the experiment: as part of my experiment I collected non-negative count data, specifically the number of infected leaves per plant. To treat the plants, I placed them in the field for a week under four different treatments. Afterward, I brought them back to a controlled environment, counted the number of infected leaves, and DISCARDED the plants. I repeated this process with fresh plants the following week, so these are not time-series data. Treatments were applied for a week at both locations, so the duration of each treatment was the same. Plot size, treatment duration, and sample material were identical.

[Figure: raw data]

Best Answer

I'm not surprised your output didn't make much sense with the intercept included. By default, glm uses the first level of the factor as the base case; that is the alphabetically first level unless you set the levels yourself, as you did. In your case that is May, which has no recorded infections and a correspondingly huge standard error (see more about this below). That uncertainty will infect all of the contrast coefficients: for each month you are comparing its mean rate to the mean rate in May, which you don't know very well. If you do want an intercept in your model, make the month with the largest total count (October, it looks like) the base case, for example with relevel(); then you will get reasonable coefficients.
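A minimal sketch of switching the base case with relevel(). The real dat_lambsburg is not available here, so the data frame below is hypothetical toy data that only borrows the column names; the counts are made up to have one all-zero month.

```r
# Hypothetical toy data standing in for dat_lambsburg (the real data are not shown)
toy <- data.frame(
  total_count = c(0, 0, 3, 7, 4, 20, 35, 28),
  month = factor(c("May", "May", "July", "July", "July",
                   "October", "October", "October"),
                 levels = c("May", "July", "October"))
)

# glm() treats the first factor level as the base case; here that is May,
# the all-zero month, because of the explicit `levels =` ordering
levels(toy$month)[1]

# Move October (the month with the largest total count) to the front
toy$month_oct <- relevel(toy$month, ref = "October")
mod_oct <- glm(total_count ~ month_oct, family = quasipoisson, data = toy)

# (Intercept) = log(mean count in October); every other coefficient is
# log(mean in that month) - log(mean in October), i.e. a log rate ratio
coef(summary(mod_oct))
exp(coef(mod_oct))
```

With the real data the equivalent step would be `dat_lambsburg$month <- relevel(dat_lambsburg$month, ref = "October")` before refitting. The July coefficient then reads as the log of July's mean count relative to October's, and its standard error is no longer inflated by the all-zero May.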

Without the intercept, your coefficients represent a fixed effect for each month: each one is an estimate of that month's mean rate, transformed by the model's link function. You didn't specify a link function explicitly, so the model used the default, which for the quasipoisson family is log. So the coefficients are the logs of the monthly mean rates, and exponentiating a coefficient gives the estimated mean count per plant for that month. Another way to get the monthly means is to use the model's predict method with type = 'response', which returns them directly on the count scale.
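A short sketch of both routes. The first part just exponentiates the estimates printed in the question; the second uses hypothetical toy data (not the real dat_lambsburg) to show that exp(coefficients), predict(type = "response") and the raw monthly means coincide for a month-only model like mod_9.

```r
# Estimates copied from the summary(mod_9) output in the question (log scale)
est <- c(May = -13.3026, November = -13.3026, June = 0.9163, July = 2.5649,
         August = 4.5512, September = 3.9195, October = 4.0797)

# Undo the log link: estimated mean number of infected leaves per plant
round(exp(est), 2)

# Hypothetical toy data (not the real dat_lambsburg) to compare the routes
toy <- data.frame(
  total_count = c(0, 0, 3, 7, 4, 20, 35, 28),
  month = factor(c("May", "May", "July", "July", "July",
                   "October", "October", "October"),
                 levels = c("May", "July", "October"))
)
fit <- glm(total_count ~ month - 1, family = quasipoisson, data = toy)

# One row per month, with the factor levels matching those used in the fit
new_months <- data.frame(month = factor(levels(toy$month),
                                        levels = levels(toy$month)))

# exp(coefficients), predict(type = "response") and the raw monthly means agree
# (May comes out as a tiny positive number rather than exactly 0)
cbind(new_months,
      exp_coef    = unname(exp(coef(fit))),
      predicted   = unname(predict(fit, newdata = new_months, type = "response")),
      sample_mean = as.vector(tapply(toy$total_count, toy$month, mean)))
```

On the question's numbers this gives roughly 2.5 infected leaves per plant in June, 13 in July, 95 in August, 50 in September and 59 in October, with May and November effectively 0.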

Finally, for your last question: the large negative coefficients for May and November mean that the estimated mean rate in those months is much less than 1 (i.e., its log is much less than zero). The coefficient is a log, not a count, which is why it can be negative. You had no counts at all in those months, which your graph also shows. When you try to compute an average rate from observations that are all zero, all the model can really say is that the rate must be tiny: the maximum-likelihood estimate of the log rate is minus infinity, so the fitting algorithm simply stops at some large negative number. Formally it gives you a number, but the number isn't very meaningful, which is what the huge standard errors for those terms are telling you.
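The particular value -13.3026 reflects when the fitting stopped rather than anything in the data. With R's default starting values, the quasipoisson (like the poisson) family starts the iterations at μ = y + 0.1, so an all-zero month starts at log(0.1) ≈ -2.3026. With a log link its working response is z = η + (0 - μ)/μ = η - 1, so its coefficient falls by exactly 1 per Fisher scoring iteration until the overall deviance stops changing. The question's output reports 11 iterations, and log(0.1) - 11 = -13.3026, which is why May and November get identical estimates. The sketch below checks that arithmetic and reproduces the pattern on hypothetical toy data.

```r
# Arithmetic check against the question's output: an all-zero month starts at
# log(0 + 0.1) and drops by 1 per iteration; the fit reported 11 iterations
log(0.1) - 11   # about -13.3026, the printed estimate for May and November

# Hypothetical toy data with one all-zero month, to reproduce the pattern
toy <- data.frame(
  total_count = c(0, 0, 3, 7, 4, 20, 35, 28),
  month = factor(c("May", "May", "July", "July", "July",
                   "October", "October", "October"),
                 levels = c("May", "July", "October"))
)
fit <- glm(total_count ~ month - 1, family = quasipoisson, data = toy)

# The all-zero month's estimate is the starting value minus the number of
# iterations, i.e. it records when the algorithm stopped, not a real rate
c(estimate_May = coef(fit)[["monthMay"]],
  start_minus_iter = log(0.1) - fit$iter)

# Its standard error is enormous, as for May and November in the question
coef(summary(fit))["monthMay", "Std. Error"]
```

Changing the convergence tolerance in glm.control() would move this number but not make it meaningful; the honest summary for those months is "no infections observed".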