Solved – Interpret zero-inflated negative binomial regression

Tags: interpretation, negative-binomial-distribution, regression, zero-inflation

I am trying to estimate a zero-inflated negative binomial model with 11 predictor variables and the number of reported crimes as the response variable. The model seems to work OK, but I'm unsure how to interpret the results. Below is my model and the results:

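# zeroinfl() comes from the pscl package
library(pscl)

# estimate zero-inflated NB model; with no "|" in the formula,
# the same predictors enter both the count part and the zero part
zinf.nbi <- zeroinfl(CRIME ~ VAR1 + VAR2 + VAR3 + VAR4
                + VAR5 + VAR6 + VAR7 + VAR8 + VAR9 + VAR10
                + VAR11, data = mydata, dist = "negbin")
summary(zinf.nbi)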

> summary(zinf.nbi)

Call:
zeroinfl(formula = CRIME ~ VAR1 + VAR2 + VAR3 + VAR4 + VAR5 
         + VAR6 + VAR7 + VAR8 + VAR9 + VAR10 + VAR11, 
         data = mydata, dist = "negbin")

Pearson residuals:
    Min       1Q   Median       3Q      Max 
-0.47719 -0.17583 -0.08080 -0.02709 26.99868 

Count model coefficients (negbin with log link):
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.682578   0.269317  -9.961  < 2e-16 ***
VAR1          1.436770   0.249026   5.770 7.95e-09 ***
VAR2         -0.648535   0.268608  -2.414 0.015760 *  
VAR3         -0.130107   0.239543  -0.543 0.587029    
VAR4         -0.008985   0.267949  -0.034 0.973249    
VAR5         -0.807941   0.269470  -2.998 0.002715 ** 
VAR6         -1.396990   0.396299  -3.525 0.000423 ***
VAR7          0.314514   0.113696   2.766 0.005670 ** 
VAR8         -1.959792   0.207233  -9.457  < 2e-16 ***
VAR9          0.711452   0.338171   2.104 0.035394 *  
VAR10        -0.013628   0.132889  -0.103 0.918316    
VAR11         0.092719   0.034799   2.664 0.007712 ** 
Log(theta)   -1.429807   0.103981 -13.751  < 2e-16 ***

Zero-inflation model coefficients (binomial with logit link):
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.14267    0.46786   2.442 0.014593 *  
VAR1          1.13108    0.51718   2.187 0.028742 *  
VAR2         -0.68871    0.33832  -2.036 0.041781 *  
VAR3          0.16412    0.37019   0.443 0.657527    
VAR4          0.57907    0.42818   1.352 0.176241    
VAR5          0.83822    0.40451   2.072 0.038247 *  
VAR6          0.02991    0.73117   0.041 0.967368    
VAR7          0.01186    0.19025   0.062 0.950282    
VAR8         -1.33618    0.39677  -3.368 0.000758 ***
VAR9          1.40246    0.39349   3.564 0.000365 ***
VAR10        -0.14713    0.22707  -0.648 0.517000    
VAR11        -2.71317    0.64939  -4.178 2.94e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta = 0.2394 
Number of iterations in BFGS optimization: 42 
Log-likelihood: -2649 on 25 Df

As far as I understand, the first block (the count component) is a summary of the full model and can be interpreted as a standard negative binomial model. The second block (the zero component), on the other hand, predicts whether or not the outcome is a certain zero. Now, what I would like to know is:

a) How do I interpret the second block of the model in relation to the first block? As you can see in the results, some variables are significant in both the first and the second block.

b) Which block should I present in my final results? The first block or the second block?

Best Answer

a) Here https://rpubs.com/kaz_yos/pscl-2 is a nice example of how to interpret the results of a ZINB model.

b) You have to present both blocks.

Note: A ZINB regression models two separate processes, so it produces two sets of coefficients: one for the count part of the model and one for the logistic (zero-inflation) part of the model.
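To see how the two blocks fit together, it can help to combine them for a single case. In a ZINB model the zero part gives the probability pi of a "certain" (excess) zero, and the count part gives the negative binomial mean mu for everyone else, so P(Y = 0) = pi + (1 - pi) * NB(0; mu, theta) and E(Y) = (1 - pi) * mu. The sketch below plugs in the two intercepts and Theta from the output in the question, i.e. an observation with every predictor equal to 0. Whether such an observation is meaningful in the asker's data is an assumption, and the variable names are made up for illustration.

```r
# Values copied from the summary output in the question
b0_count <- -2.682578   # count-model intercept (log scale)
b0_zero  <-  1.14267    # zero-inflation intercept (logit scale)
theta    <-  0.2394     # negative binomial dispersion (Theta)

# Count part: mean of the negative binomial process (log link)
mu <- exp(b0_count)

# Zero part: probability of being an excess ("certain") zero (logit link)
pi0 <- plogis(b0_zero)

# A zero can come from either process, so the two probabilities are mixed
p_zero_nb <- dnbinom(0, size = theta, mu = mu)  # zero from the count process
p_zero    <- pi0 + (1 - pi0) * p_zero_nb        # overall P(Y = 0)

# Overall expected count: the count mean, scaled down by the excess zeros
expected <- (1 - pi0) * mu

print(c(mu = mu, pi0 = pi0, p_zero = p_zero, expected = expected))
```

So the second block does not replace the first: it shifts probability mass towards zero, and the count block describes what happens for the cases that are not certain zeros.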

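A common way of interpreting logistic regression models is to exponentiate the coefficients, which puts them on an odds-ratio scale. With zero-inflated models, the logistic part of the model predicts non-occurrence of the outcome (the probability of a certain zero), so a positive coefficient in the second block means a higher chance of a certain zero, not a higher count.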
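As a rough illustration of the exponentiation step, here is a sketch using three of the coefficients copied from the output in the question (the object names are made up for the example). In the count part, exp(coef) is a rate ratio for the expected count; in the zero part, exp(coef) is an odds ratio for being a certain zero.

```r
# Coefficients copied from the summary output in the question
# (three predictors only, to keep the example short)
count_coef <- c(VAR1 = 1.436770, VAR8 = -1.959792, VAR11 = 0.092719)
zero_coef  <- c(VAR1 = 1.13108,  VAR8 = -1.33618,  VAR11 = -2.71317)

# Count part (log link): multiplicative change in the expected count
# per one-unit increase in the predictor
rate_ratio <- exp(count_coef)

# Zero part (logit link): multiplicative change in the odds of being
# a certain (excess) zero per one-unit increase in the predictor
odds_ratio <- exp(zero_coef)

print(round(cbind(rate_ratio, excess_zero_odds_ratio = odds_ratio), 3))

# With the fitted object from the question, the same numbers should come from
# exp(coef(zinf.nbi, model = "count")) and exp(coef(zinf.nbi, model = "zero")).
```

Taking VAR11 as an example: a one-unit increase multiplies the expected count by roughly 1.10 in the count part, and multiplies the odds of a certain zero by roughly 0.07 in the zero part. Both point in the same direction here (more crime), but that need not be the case for every variable, which is why both blocks have to be read together.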

Here you can find another example: https://stats.idre.ucla.edu/other/dae/.