Solved – Interpret negative binomial coefficients and p-value

Tags: negative-binomial-distribution, p-value, regression-coefficients

I fit a negative binomial model to my dataset because the Poisson model was over-dispersed.

DOWFACAT and SSNVOLCLS are categorical. All other variables are numeric.

MDHabitat, MigCorr, WintRg, SumRg, Fence05, Fence16 and Cross05 are binary indicators, coded 0 for absence and 1 for presence of that feature.

PostFFall is the count (the response).

AADTPostFal is the average annual daily traffic divided by 1000.

SpeedLmt is numeric and ranges from 0 to 75.

Using the summary function I get these results:

glm.nb(formula = PostFFall ~ AADTPostFal + SpeedLmt + SumRg + 
    WintRg + MigCorr + MDHabitat + DOWFACAT + SSNVOLCLS + Fence05 + 
    Cross05 + Fence16, data = roads1, init.theta = 0.4313904984, 
    link = log)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.38236  -0.05334   0.00000   0.00000   3.04227  

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)          -3.523e+01  5.275e+06   0.000   1.0000    
AADTPostFal          -2.638e-01  2.944e-01  -0.896   0.3703    
SpeedLmt             -2.193e-02  1.226e-02  -1.789   0.0737 .  
SumRg                 4.842e-01  6.739e-01   0.719   0.4724    
WintRg                4.410e-01  6.148e-01   0.717   0.4732    
MigCorr               1.753e+00  6.825e-01   2.568   0.0102 *  
MDHabitat             3.225e+01  1.677e+06   0.000   1.0000    
DOWFACATSuburban     -3.389e-01  9.588e-01  -0.353   0.7238    
DOWFACATTransition    1.120e+00  1.192e+00   0.939   0.3475    
DOWFACATUrban         8.394e-01  1.326e+07   0.000   1.0000    
SSNVOLCLS1-LowVolume -2.809e+00  5.535e+06   0.000   1.0000    
SSNVOLCLS2-MidVolume  2.914e+00  5.535e+06   0.000   1.0000    
Fence05              -5.679e-01  1.190e+00  -0.477   0.6331    
Cross05              -3.945e+01  3.460e+07   0.000   1.0000    
Fence16               4.445e+00  1.066e+00   4.169 3.05e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(0.4314) family taken to be 1)

    Null deviance: 382.555  on 1641  degrees of freedom
Residual deviance:  82.308  on 1627  degrees of freedom
AIC: 243.56

Number of Fisher Scoring iterations: 1


              Theta:  0.431 
          Std. Err.:  0.143 

 2 x log-likelihood:  -211.560 

And then when I run an ANOVA test, most of the p-values are significant:

Analysis of Deviance Table

Model: Negative Binomial(0.4314), link: log

Response: PostFFall

Terms added sequentially (first to last)


            Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                         1641     382.55              
AADTPostFal  1   25.899      1640     356.66 3.598e-07 ***
SpeedLmt     1    0.793      1639     355.86  0.373231    
SumRg        1  109.769      1638     246.09 < 2.2e-16 ***
WintRg       1    8.415      1637     237.68  0.003721 ** 
MigCorr      1   29.217      1636     208.46 6.470e-08 ***
MDHabitat    1   79.120      1635     129.34 < 2.2e-16 ***
DOWFACAT     3    1.575      1632     127.77  0.664987    
SSNVOLCLS    2   27.064      1630     100.70 1.328e-06 ***
Fence05      1    0.740      1629      99.96  0.389572    
Cross05      1    1.947      1628      98.02  0.162953    
Fence16      1   15.708      1627      82.31 7.392e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Do I use the anova test to decide whether the model variables are significant? Also, what does a negative intercept indicate? Finally, the dispersion parameter for my negative binomial model is 0.4314; does this mean my model is now under-dispersed?

Best Answer

It seems you are asking about the discrepancy between the table of coefficients from the summary() function and the analysis of deviance table from anova(). In particular, you are wondering why the anova table gives small p-values to variables that have large p-values in the table of coefficients (e.g. MDHabitat, with p = 1.0000, or SumRg, with p = 0.4724).

The explanation may be that the two tables test different things. The table of coefficients tells you whether the coefficient of a variable, or of a level of a factor relative to its reference level (which is absorbed into the intercept, as explained in Isabella Ghement's answer), differs from zero when all the other variables are in the model. The anova table tells you whether adding that variable significantly improves the fit, given the variables listed before it.
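To make the difference concrete, here is a small sketch that recomputes both kinds of p-value for Fence16 from the numbers printed in the question. The variable names are made up for illustration, and because the printed inputs are rounded, the results should only match the tables approximately.

```r
# Wald test (coefficient table): z = estimate / standard error,
# compared against a standard normal distribution.
est_fence16 <- 4.445   # Estimate for Fence16, copied from summary()
se_fence16  <- 1.066   # Std. Error for Fence16, copied from summary()
z_fence16   <- est_fence16 / se_fence16
p_wald      <- 2 * pnorm(-abs(z_fence16))

# Deviance test (anova table): the drop in deviance when the term is added,
# compared against a chi-squared distribution with Df degrees of freedom.
dev_fence16 <- 15.708  # Deviance for Fence16, copied from anova()
df_fence16  <- 1       # Df for Fence16, copied from anova()
p_deviance  <- pchisq(dev_fence16, df = df_fence16, lower.tail = FALSE)

print(c(z = z_fence16, p_wald = p_wald, p_deviance = p_deviance))
```

Fence16 is the last term in the formula, so here both tests condition on the same set of other variables and they roughly agree. For terms earlier in the formula, the sequential anova test conditions on fewer variables than the Wald test does, which is one way the two tables can disagree.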

I suspect that you are more interested in the output of the anova table, since you want to select relevant variables. Note, however, that the anova function estimates the contribution of each variable by adding them sequentially, in the same order as you listed them in the model formula, so the result for a given variable can change if you reorder the terms. Usually, one wants to estimate the contribution of a variable independently of the order in which the variables enter the model. For this you could use the Anova function in the car package.
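A minimal sketch of the order dependence, using simulated data rather than the roads1 data from the question. The data frame toy and the variables x1, x2 and y are invented for this illustration: x2 is correlated with x1 but has no effect of its own. The sketch also shows drop1() from base R, which is my own addition as an order-independent alternative, and it calls car::Anova only if the car package happens to be installed.

```r
library(MASS)  # provides glm.nb()

set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)                    # correlated with x1
y  <- rnbinom(n, size = 1, mu = exp(0.2 + 0.5 * x1))   # only x1 affects y
toy <- data.frame(y, x1, x2)                           # hypothetical data

# Same model, two term orders
fit_a <- glm.nb(y ~ x1 + x2, data = toy)
fit_b <- glm.nb(y ~ x2 + x1, data = toy)

# Sequential tests: each term is tested given only the terms before it.
# anova() on a glm.nb fit warns that theta is not re-estimated.
print(anova(fit_a))   # x2 tested after x1
print(anova(fit_b))   # x2 tested first

# Order-independent tests: each term is dropped from the full model in turn
print(drop1(fit_a, test = "Chisq"))

# The same idea with the car package, if it is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(fit_a, type = "II"))
}
```

In the two sequential tables, x2 should look far more important when it is entered first, because it then picks up the effect of x1. The drop1() and Anova() tables do not depend on the order of the terms in the formula.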

As a side note, it looks like some variables, such as MDHabitat, have a huge standard error relative to the estimated effect (estimate = 3.225e+01, SE = 1.677e+06). This may indicate some unusual characteristic of the data that should be addressed, for example a level of a predictor at which the count is always zero.
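One way to look for such a characteristic is to cross-tabulate each binary predictor against whether any event was recorded. The sketch below does this on simulated data in which, by construction, every count is zero when the predictor is 0. The names toy, habitat and count are hypothetical, and whether the real data behave this way is only a guess that this kind of table would confirm or rule out.

```r
set.seed(42)
n <- 400
habitat <- rbinom(n, 1, 0.5)

# Hypothetical situation: events only ever occur where habitat == 1
count <- ifelse(habitat == 1, rnbinom(n, size = 0.5, mu = 0.3), 0L)
toy <- data.frame(count, habitat)

# Rows: predictor level; columns: was at least one event recorded?
tab <- table(habitat = toy$habitat, any_event = toy$count > 0)
print(tab)

# Total count per predictor level
print(aggregate(count ~ habitat, data = toy, FUN = sum))

# On the data from the question, the analogous check would be:
# with(roads1, table(MDHabitat, PostFFall > 0))
```

A zero in the any_event = TRUE column for one level means the model can only fit that level by pushing its log-rate towards minus infinity. That produces very large coefficients and standard errors, and Wald p-values near 1, much like the MDHabitat and Cross05 rows in the coefficient table.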