Zero-Inflated Poisson Model – Comprehensive Understanding

poisson distribution, r, zero inflation

I am investigating the association between environmental pollution and daily hospital admissions due to various causes. The outcome data contain excess zeros, because on many days there are no admissions for a specific cause. I would like to adjust for temperature and humidity using smooth terms. Usually, Poisson generalized additive models with smooth terms for the time trend and meteorological variables are used to model such associations. However, because of the excess zeros, I was advised to consider zero-inflated models. How can I implement this in R while still handling the nonparametric associations of temperature, humidity and time trend?

  1. What is the criterion for using a zero-inflated model for count data?
  2. How do I fit a zero-inflated model, and how do I check the fit?

The following is a sample GAM of the kind usually fitted in R with the mgcv package:

log{E(admission)} = α + β(pollutant) + s(time) + s(temperature) + s(relative humidity) + DOW + flu

where:
E(admission) = expected count of cause-specific admissions on day t
β = regression coefficient of the pollutant
pollutant = air pollutant (PM10, ozone, NO2) level on day t
s = smooth function using a natural or penalized spline
DOW = indicator variables for day of the week, with an associated vector of regression coefficients
flu = weekly influenza count
Time, temperature and relative humidity are covariates.
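The model above could be written in mgcv roughly as follows. This is only a minimal sketch: the data are simulated, and the column names (admissions, pollutant, humidity, dow, flu) are hypothetical placeholders rather than names from a real data set.

```r
library(mgcv)  # recommended package shipped with standard R installations

set.seed(42)
n <- 730  # two years of simulated daily data (hypothetical)
d <- data.frame(
  time        = seq_len(n),
  temperature = 15 + 10 * sin(2 * pi * seq_len(n) / 365) + rnorm(n, sd = 3),
  humidity    = runif(n, 30, 90),
  pollutant   = rgamma(n, shape = 4, rate = 0.1),
  dow         = factor(rep_len(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"), n)),
  flu         = rpois(n, 5)
)
# Simulated admissions with a low mean, so many days have a zero count
d$admissions <- rpois(n, exp(-0.5 + 0.005 * d$pollutant +
                             0.02 * (d$temperature - 15)))

# Poisson GAM: linear pollutant effect, smooth terms for time trend,
# temperature and humidity, plus day-of-week and flu adjustments
fit_gam <- gam(
  admissions ~ pollutant + s(time) + s(temperature) + s(humidity) + dow + flu,
  family = poisson(link = "log"), data = d, method = "REML"
)
summary(fit_gam)
```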

Thanks

Best Answer

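  1. The criterion is based on (informed) model comparisons. You are trying to account for over-dispersion and for "extra" zeros:

    Without "extra" zeros:

    Poisson: var(x) ~ mu

    Negative binomial: var(x) > mu

    With "extra" zeros:

    ZIP (zero-inflated Poisson): var(x) ~ mu in the count component

    ZINB (zero-inflated negative binomial): var(x) > mu in the count component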

  2. One actively maintained package that you can use is pscl, installed with install.packages("pscl"). With it you can fit a number of models, such as a hurdle model that uses a negative binomial for the counts and a binomial model for whether a count is zero or positive. This would be written something like:

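    library(pscl)

    # Hurdle model: negative binomial for positive counts, binomial for zero vs. positive
    fit <- hurdle(Admissions ~ Temperature + Humidity, dist = "negbin", data = data)

    summary(fit)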
    

Note that the output will have two sets of coefficients: one for the hurdle (zero) component and one for the count component. The output also provides an estimate of the theta parameter (overdispersion) of the negative binomial.

Or you may want to look at the zero-inflation model:

    # Zero-inflated negative binomial with a logit link for the zero component
    fit1 <- zeroinfl(Admissions ~ Temperature + Humidity, data = data, dist = "negbin", link = "logit")
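The hurdle and zero-inflation models treat zeros differently, which is why each reports a separate zero component. As a sketch in notation that is not part of the original answer: let f be the count distribution (Poisson or negative binomial), π the probability of a structural zero, and p the probability of a positive count.

```latex
% Zero-inflated model: zeros come from two sources
% (structural zeros and ordinary zeros of the count distribution)
\begin{aligned}
P(Y = 0) &= \pi + (1 - \pi)\, f(0), \\
P(Y = y) &= (1 - \pi)\, f(y), \qquad y = 1, 2, \dots
\end{aligned}

% Hurdle model: all zeros come from one source,
% and positive counts follow a zero-truncated count distribution
\begin{aligned}
P(Y = 0) &= 1 - p, \\
P(Y = y) &= p \, \frac{f(y)}{1 - f(0)}, \qquad y = 1, 2, \dots
\end{aligned}
```

In the zero-inflation model a zero can arise from either process, whereas in the hurdle model the count part only ever describes positive values.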

These models can be compared using AIC (also compare them with your Poisson model):

    AIC(fit, fit1)
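Before comparing models, a quick informal check is to see whether a plain Poisson fit predicts as many zeros as were observed. The sketch below uses base R only, on simulated data with an assumed 30% share of structural zeros and an assumed Poisson mean of 2. These values are illustrative and not taken from the question.

```r
set.seed(1)
n <- 1000

# Simulate zero-inflated Poisson counts (assumed parameters, for illustration)
structural_zero <- rbinom(n, 1, 0.3)
y <- ifelse(structural_zero == 1, 0L, rpois(n, 2))

# Intercept-only Poisson fit as the baseline
fit_pois <- glm(y ~ 1, family = poisson)

# Observed share of zeros vs. the share implied by the fitted Poisson means
obs_zero <- mean(y == 0)
exp_zero <- mean(dpois(0, lambda = fitted(fit_pois)))

# Pearson dispersion statistic; values well above 1 suggest over-dispersion
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

c(observed = obs_zero, expected = exp_zero, dispersion = dispersion)
```

If the observed share of zeros clearly exceeds the expected share, or the dispersion statistic is well above 1, that is a reason to try the hurdle or zero-inflation fits and compare them by AIC.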