I fit a negative binomial model to my dataset because the Poisson model was over-dispersed.
DOWFACAT and SSNVOLCLS are categorical. All other variables are numeric.
MDHabitat, MigCorr, WintRg, SumRg, Fence05, Fence16 and Cross05 are numeric, coded 0 or 1 to indicate absence or presence of that variable.
PostFFall is the count.
AADTPostFal is the average annual daily traffic / 1000.
SpeedLmt is numeric and ranges from 0 to 75.
Using the summary function I get these results:
glm.nb(formula = PostFFall ~ AADTPostFal + SpeedLmt + SumRg +
WintRg + MigCorr + MDHabitat + DOWFACAT + SSNVOLCLS + Fence05 +
Cross05 + Fence16, data = roads1, init.theta = 0.4313904984,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.38236 -0.05334 0.00000 0.00000 3.04227
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.523e+01 5.275e+06 0.000 1.0000
AADTPostFal -2.638e-01 2.944e-01 -0.896 0.3703
SpeedLmt -2.193e-02 1.226e-02 -1.789 0.0737 .
SumRg 4.842e-01 6.739e-01 0.719 0.4724
WintRg 4.410e-01 6.148e-01 0.717 0.4732
MigCorr 1.753e+00 6.825e-01 2.568 0.0102 *
MDHabitat 3.225e+01 1.677e+06 0.000 1.0000
DOWFACATSuburban -3.389e-01 9.588e-01 -0.353 0.7238
DOWFACATTransition 1.120e+00 1.192e+00 0.939 0.3475
DOWFACATUrban 8.394e-01 1.326e+07 0.000 1.0000
SSNVOLCLS1-LowVolume -2.809e+00 5.535e+06 0.000 1.0000
SSNVOLCLS2-MidVolume 2.914e+00 5.535e+06 0.000 1.0000
Fence05 -5.679e-01 1.190e+00 -0.477 0.6331
Cross05 -3.945e+01 3.460e+07 0.000 1.0000
Fence16 4.445e+00 1.066e+00 4.169 3.05e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(0.4314) family taken to be 1)
Null deviance: 382.555 on 1641 degrees of freedom
Residual deviance: 82.308 on 1627 degrees of freedom
AIC: 243.56
Number of Fisher Scoring iterations: 1
Theta: 0.431
Std. Err.: 0.143
2 x log-likelihood: -211.560
And then when I run an ANOVA test, most of the p-values are significant:
Analysis of Deviance Table
Model: Negative Binomial(0.4314), link: log
Response: PostFFall
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 1641 382.55
AADTPostFal 1 25.899 1640 356.66 3.598e-07 ***
SpeedLmt 1 0.793 1639 355.86 0.373231
SumRg 1 109.769 1638 246.09 < 2.2e-16 ***
WintRg 1 8.415 1637 237.68 0.003721 **
MigCorr 1 29.217 1636 208.46 6.470e-08 ***
MDHabitat 1 79.120 1635 129.34 < 2.2e-16 ***
DOWFACAT 3 1.575 1632 127.77 0.664987
SSNVOLCLS 2 27.064 1630 100.70 1.328e-06 ***
Fence05 1 0.740 1629 99.96 0.389572
Cross05 1 1.947 1628 98.02 0.162953
Fence16 1 15.708 1627 82.31 7.392e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Do I use the anova test to decide whether the model variables are significant? Also, what does a negative intercept coefficient indicate? Finally, my dispersion parameter for the negative binomial is 0.4314; does this mean my model is now under-dispersed?
Best Answer
It seems you are asking about the discrepancy between the table of coefficients from the summary() function and the analysis of deviance table from anova(). In particular, you are wondering why the anova table gives small p-values to variables that have large p-values in the table of coefficients (e.g. SumRg, with p = 0.47 in the coefficient table but p < 2.2e-16 in the anova table).
The explanation may be that the table of coefficients tells you whether a variable or a level of a factor differs from the reference baseline (i.e. from the intercept, as explained in Isabella Ghement's answer), while the anova table tells you whether that variable contributes significantly to the quality of the fit.
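To make the two kinds of test concrete, here is a minimal sketch on simulated data (not your roads1 data). The data frame toy and the variables y, x1 and x2 are made up for illustration. It puts the Wald z-test from the coefficient table next to a likelihood-ratio test that compares the fit with and without one term, which is the kind of question the anova table is answering.

```r
library(MASS)  # provides glm.nb()

# Simulated (hypothetical) data: a count response and two predictors
set.seed(1)
n   <- 500
x1  <- rnorm(n)
x2  <- rbinom(n, 1, 0.5)
y   <- rnbinom(n, mu = exp(0.2 + 0.5 * x1 + 0.8 * x2), size = 1)
toy <- data.frame(y, x1, x2)

# Full model and a reduced model that leaves out x2
fit_full    <- glm.nb(y ~ x1 + x2, data = toy)
fit_reduced <- glm.nb(y ~ x1, data = toy)

# Wald z-test for x2: the row that summary() prints in the coefficient table
wald <- summary(fit_full)$coefficients["x2", ]
print(wald)

# Likelihood-ratio test for x2: does adding x2 improve the fit?
lr_stat <- as.numeric(2 * (logLik(fit_full) - logLik(fit_reduced)))
p_lr    <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
print(c(LR = lr_stat, p = p_lr))

# The same comparison through anova() on the two nested fits
lrt <- anova(fit_reduced, fit_full)
print(lrt)
```

On well-behaved data like this the two tests usually agree; they can diverge sharply when a standard error is inflated, as with some of the coefficients in your output.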
I suspect that you are more interested in the output of the anova table, since you want to select relevant variables. Note, however, that the anova function estimates the contribution of each variable by adding them sequentially, in the same order as you listed them in the model. Usually, one wants to estimate the contribution of a variable independently of the order in which the variables enter the model. For this you could use the Anova function in the car package.
As a comment, it looks like some variables, such as MDHabitat, have a huge standard error relative to the estimated effect (e.g. est = 3.225e+01, se = 1.677e+06). I wonder whether this indicates some weird characteristic of the data that could or should be addressed.
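Both points can be sketched on simulated data. Everything named here (toy, x1, x2, flag, y_sep) is hypothetical and stands in for your own variables. The first part shows that the sequential anova p-value for a term depends on where it sits in the formula, and uses drop1() from base R as an order-independent alternative; car::Anova is shown only as a comment, since it needs the car package installed. The second part is one possible check, under the assumption that the huge standard errors come from a 0/1 predictor for which one level has no non-zero counts at all; that is only a guess about your data.

```r
library(MASS)  # provides glm.nb()

set.seed(42)
n   <- 400
x1  <- rnorm(n)
x2  <- 0.8 * x1 + rnorm(n, sd = 0.6)   # deliberately correlated with x1
y   <- rnbinom(n, mu = exp(0.3 + 0.6 * x2), size = 1)  # only x2 drives y
toy <- data.frame(y, x1, x2)

# Same model, two different term orders
fit_a <- glm.nb(y ~ x1 + x2, data = toy)
fit_b <- glm.nb(y ~ x2 + x1, data = toy)

# Sequential tests: the result for x1 depends on whether x2 is already in.
# (anova() on a single glm.nb fit warns that theta is not re-estimated.)
seq_a <- suppressWarnings(anova(fit_a))
seq_b <- suppressWarnings(anova(fit_b))
p_x1_first <- seq_a["x1", "Pr(>Chi)"]  # x1 entered before x2
p_x1_last  <- seq_b["x1", "Pr(>Chi)"]  # x1 entered after x2
print(c(first = p_x1_first, last = p_x1_last))

# Order-independent tests: each term is dropped from the full model in turn
drop_tab <- drop1(fit_a, test = "Chisq")
print(drop_tab)
# With the car package installed, the call suggested above would be:
# car::Anova(fit_a)

# Possible data check for huge standard errors: cross-tabulate a 0/1
# predictor against "any non-zero count". Here flag is constructed so that
# y_sep is always zero when flag == 0.
toy$flag  <- rbinom(n, 1, 0.5)
toy$y_sep <- ifelse(toy$flag == 1, toy$y, 0L)
sep_tab   <- table(flag = toy$flag, any_event = toy$y_sep > 0)
print(sep_tab)
```

An empty cell in a table like sep_tab means the data cannot pin down that coefficient, which typically shows up as a very large estimate with an enormous standard error. On your data the analogous check would be a cross-tabulation of MDHabitat (or Cross05, or the levels of SSNVOLCLS) against PostFFall > 0.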