Solved – Negative binomial GLM with 2 factor variables: adding interaction completely changes effect of factor levels

categorical data, generalized linear model, interaction, interpretation, r

I am analysing some marine wildlife monitoring data from an offshore construction site. The response data are counts of animals (corrected for detection and survey effort), and the model has 2 covariates, both of which are factors:

  1. Season (with 4 levels related to the animals' seasonal migrations),
  2. Period of Construction (with 3 levels: 'Before', 'During' and 'After' construction).

I am using a negative binomial GLM model structure because the Poisson was overdispersed. I am working in R, using the glm.nb function in the MASS package.

When I model the count data as a function of the 2 factor variables, but without any interactions between the two factors, the model indicates that there was a significantly negative impact on animal abundance 'During' construction (i.e. the coefficient estimate for animal counts was significantly lower 'During' construction when compared to 'Before' construction, which is the base level for the 'Period of Construction' factor variable).

However, when I include an interaction between 'Season' and 'Construction Period', the coefficient estimate for 'During' construction changes to be positive (although non-significant). I know that by including interactions in the model, I am changing the model structure and I would expect some changes to coefficient estimates; however I am surprised by the magnitude of the change that occurs by adding the interaction between 'Season' and 'Period of Construction'.
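For reference, the two model structures being compared can be sketched as follows. This is only an illustration on simulated data: the data frame `sim`, the response `count`, the reference season label "Reference", the cell size of 200, and the "true" coefficients (loosely rounded from the interaction table below) are all assumptions, not the survey data. The three season/period cells left out are the ones that the NA rows in the interaction table suggest were never sampled.

```r
library(MASS)  # provides glm.nb

set.seed(42)

season_levels <- c("Reference", "Migration", "Pre-Breeding", "Winter")
period_levels <- c("Before", "During", "After")

# All 12 season x period cells, minus the three assumed to be unsampled.
cells <- expand.grid(Season = season_levels, Period = period_levels,
                     stringsAsFactors = FALSE)
unsampled <- (cells$Season == "Winter" & cells$Period != "Before") |
             (cells$Season == "Pre-Breeding" & cells$Period == "After")
cells <- cells[!unsampled, ]

# 200 hypothetical surveys per sampled cell.
sim <- cells[rep(seq_len(nrow(cells)), each = 200), ]
sim$Season <- factor(sim$Season, levels = season_levels)
sim$Period <- factor(sim$Period, levels = period_levels)

# Illustrative log-scale coefficients; columns for empty cells stay at 0.
X <- model.matrix(~ Season * Period, data = sim)
beta <- setNames(numeric(ncol(X)), colnames(X))
beta[c("(Intercept)", "SeasonMigration", "SeasonPre-Breeding", "SeasonWinter",
       "PeriodDuring", "PeriodAfter",
       "SeasonMigration:PeriodDuring", "SeasonPre-Breeding:PeriodDuring",
       "SeasonMigration:PeriodAfter")] <-
  c(1.4, 0.45, 1.36, 1.0, 0.33, 0.61, -0.18, -1.52, -0.30)
sim$count <- rnbinom(nrow(sim), size = 5, mu = exp(drop(X %*% beta)))

# Additive model: one 'During' coefficient shared by all seasons.
fit_main <- glm.nb(count ~ Season + Period, data = sim)

# Interaction model: 'During' now refers to the reference season only.
fit_int <- glm.nb(count ~ Season * Period, data = sim)

coef(fit_main)["PeriodDuring"]
coef(fit_int)["PeriodDuring"]
sum(is.na(coef(fit_int)))  # interaction terms that cannot be estimated
```

The two `PeriodDuring` coefficients answer different questions, so they are not expected to agree even when both models are fitted to the same data.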

I am obviously keen to make sure I haven't misunderstood/made some mistake! Below I have copied my Model summary tables and the Anova table for the model containing interactions (to show the covariate main effects).

P.S. The data are not balanced (i.e. not every pairwise combination of 'Season' and 'Construction Period' was sampled). I know that this should preclude assessing interactions, but I have been instructed to assess them anyway! I am wondering whether this could be causing the unexpected results?

  • NO INTERACTION MODEL SUMMARY TABLE:

    Coefficients:
                                  Estimate Std. Error z value Pr(>|z|)    
    (Intercept)                    1.78284    0.06203  28.741  < 2e-16 
    as.factor(Season)Migration     0.15741    0.05078   3.100 0.001935  
    as.factor(Season)Pre-Breeding  0.78840    0.07332  10.753  < 2e-16 
    as.factor(Season)Winter        0.57884    0.13065   4.430 9.41e-06 
    as.factor(Period)During       -0.37198    0.07126  -5.220 1.79e-07 
    as.factor(Period)After         0.19159    0.05621   3.409 0.000653 
    
  • INTERACTION MODEL SUMMARY TABLE:

    Coefficients: (3 not defined because of singularities)
                                                          Estimate Std. Error z value Pr(>|z|)
    (Intercept)                                             1.3648     0.1255  10.878  < 2e-16
    as.factor(Season)Migration                              0.4515     0.1373   3.288  0.00101
    as.factor(Season)Pre-Breeding                           1.3620     0.1354  10.058  < 2e-16
    as.factor(Season)Winter                                 0.9969     0.1683   5.924 3.15e-09
    as.factor(Period)During                                 0.3294     0.1691   1.948  0.05139
    as.factor(Period)After                                  0.6119     0.1332   4.593 4.36e-06
    as.factor(Season)Migration:as.factor(Period)During     -0.1849     0.2066  -0.895  0.37079
    as.factor(Season)Pre-Breeding:as.factor(Period)During  -1.5185     0.2004  -7.578 3.50e-14
    as.factor(Season)Winter:as.factor(Period)During             NA         NA      NA       NA
    as.factor(Season)Migration:as.factor(Period)After      -0.2978     0.1488  -2.001  0.04535
    as.factor(Season)Pre-Breeding:as.factor(Period)After        NA         NA      NA       NA
    as.factor(Season)Winter:as.factor(Period)After              NA         NA      NA       NA
    
  • ANOVA TABLE FOR MODEL WITH INTERACTION:

    Analysis of Deviance Table (Type II tests)
    
     Response: RU_density
                                         LR Chisq Df Pr(>Chisq)
    as.factor(Season)                     121.927  3  < 2.2e-16
    as.factor(Period)                      64.713  2  8.865e-15
    as.factor(Season):as.factor(Period)    90.975  3  < 2.2e-16
    

Best Answer

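With the interaction in the model, the coefficient for "During" (or for any other main-effect term, but this is the one that you asked about) is no longer an overall effect. It describes the "During" vs. "Before" difference only where the other variable in the interaction is at its reference level, that is, in the season that is not "Winter", "Pre-Breeding" or "Migration". In the other seasons, the "During" effect is this coefficient plus the matching interaction term.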

Note that the estimated interaction terms are all negative, so they offset the positive "During" main effect in the other seasons, and one of them (Pre-Breeding:During, about -1.52) is large enough to reverse its sign.
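To make this concrete, here is a small sketch that adds up the reported estimates season by season. The coefficient values are copied from the interaction model summary in the question; the label "Reference" for the unnamed baseline season is my own placeholder. `glm.nb` uses a log link by default, so exponentiating gives a multiplicative change in expected count.

```r
# Estimates copied from the interaction model summary (log scale).
b <- c("During"              =  0.3294,
       "Migration:During"    = -0.1849,
       "Pre-Breeding:During" = -1.5185)

# 'During' vs 'Before' difference within each season.
during_effect <- c(
  "Reference"    = b[["During"]],                                # 0.3294
  "Migration"    = b[["During"]] + b[["Migration:During"]],      # 0.1445
  "Pre-Breeding" = b[["During"]] + b[["Pre-Breeding:During"]]    # -1.1891
)

# Multiplicative change in expected count under the log link.
rate_ratio <- exp(during_effect)

during_effect
rate_ratio
```

So the model says "During" is somewhat higher than "Before" in the reference season and in Migration, and much lower in Pre-Breeding. Winter cannot be computed at all because its interaction term is NA. These are point estimates only; standard errors for the sums would need the coefficient covariance matrix from the fitted model.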

I would also be concerned about the singularities ...
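One way to see where the singularities come from is to look at the design itself. The sketch below builds a hypothetical design with one row per sampled cell, leaving out the three cells that the NA rows suggest are empty (Winter/During, Winter/After, Pre-Breeding/After). That pattern is my reading of the output, not something stated in the data, and "Reference" is again a placeholder for the baseline season. On the real data, a cross-tabulation of `Season` by `Period` would show the empty cells directly.

```r
season_levels <- c("Reference", "Migration", "Pre-Breeding", "Winter")
period_levels <- c("Before", "During", "After")

# One row per season x period cell, then drop the cells assumed to be empty.
cells <- expand.grid(Season = season_levels, Period = period_levels,
                     stringsAsFactors = FALSE)
empty <- (cells$Season == "Winter" & cells$Period != "Before") |
         (cells$Season == "Pre-Breeding" & cells$Period == "After")
sampled <- cells[!empty, ]
sampled$Season <- factor(sampled$Season, levels = season_levels)
sampled$Period <- factor(sampled$Period, levels = period_levels)

# Cross-tabulation: zeros mark combinations that were never observed.
cell_counts <- table(sampled$Season, sampled$Period)

# Design matrix of the interaction model and its rank.
X <- model.matrix(~ Season * Period, data = sampled)
n_aliased <- ncol(X) - qr(X)$rank

# Interaction columns that are all zero have no data behind them.
aliased_cols <- colnames(X)[colSums(X != 0) == 0]

cell_counts
n_aliased
aliased_cols
```

Each empty cell leaves an interaction column with no observations behind it, which is why three coefficients come back as NA and why the interaction test has 3 degrees of freedom rather than 6.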
