Solved – How to test for Zero-Inflation in a dataset

poisson distribution, zero inflation

I have a dataset which seems to have a lot of zeroes. I have already fit a Poisson regression model as well as a negative binomial model. I would like to fit zero-inflated and hurdle models as well.

Before I do, I would like to run a test to investigate whether my data really are zero-inflated. What tests are there to determine this?

Best Answer

The score test (referenced in the comments by Ben Bolker) is performed as follows. First calculate the rate estimate $\hat{\lambda}= \bar{x}$. Then count the number of observed zeros, $n_0$, and the total number of observations, $n$. Calculate $\tilde{p}_0=\exp[-\hat{\lambda}]$, the probability of a zero under a Poisson distribution with rate $\hat{\lambda}$. The test statistic is then given by the formula: $\frac{(n_0 - n\tilde{p}_0 )^2}{n\tilde{p}_0(1-\tilde{p}_0) - n\bar{x}\tilde{p}_0^2}$. Under the null hypothesis of no zero inflation, this statistic asymptotically follows a $\chi^2_1$ distribution, whose tail probabilities can be looked up in tables or computed with statistical software.
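
As a quick worked illustration with made-up numbers (not taken from any real dataset), suppose $n = 100$, $\bar{x} = 1$ and $n_0 = 50$. Then $\tilde{p}_0 = e^{-1} \approx 0.368$, so about $36.8$ zeros would be expected under a plain Poisson, and the statistic is roughly

$$\frac{(50 - 36.79)^2}{36.79 \times 0.632 - 100 \times 1 \times 0.1353} \approx \frac{174.6}{9.72} \approx 17.96,$$

which is far above $3.84$, the 5% critical value of $\chi^2_1$. These hypothetical counts would therefore show strong evidence of more zeros than a Poisson with the same mean predicts.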

Here is some R code that will do this:

set.seed(1)  # make the simulated example reproducible
pois_data <- rpois(100, lambda = 1)
lambda_est <- mean(pois_data)

# probability of a zero under a Poisson with the estimated rate
p0_tilde <- exp(-lambda_est)
p0_tilde
n0 <- sum(pois_data == 0)  # observed number of zeros
n <- length(pois_data)

# number of observations 'expected' to be zero
n * p0_tilde

# now let's perform the JVDB (van den Broek) score test
numerator <- (n0 - n * p0_tilde)^2
denominator <- n * p0_tilde * (1 - p0_tilde) - n * lambda_est * (p0_tilde^2)

test_stat <- numerator / denominator

pvalue <- pchisq(test_stat, df = 1, ncp = 0, lower.tail = FALSE)
pvalue
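
The same steps can be wrapped in a small function and tried on data that really are zero-inflated. This is only a sketch: the function name `zero_inflation_score_test`, the mixing probability of 0.3 and the Poisson rate of 2 are assumptions chosen for illustration, and it covers only the no-covariate case described above, where the rate is estimated by the sample mean.

```r
# Hypothetical helper (name is illustrative): score test for zero inflation
# in a Poisson sample without covariates, using the formula given above.
zero_inflation_score_test <- function(x) {
  stopifnot(is.numeric(x), all(x >= 0), all(x == round(x)))
  n <- length(x)
  lambda_hat <- mean(x)          # Poisson rate estimate
  p0 <- exp(-lambda_hat)         # P(X = 0) under the fitted Poisson
  n0 <- sum(x == 0)              # observed zeros
  statistic <- (n0 - n * p0)^2 /
    (n * p0 * (1 - p0) - n * lambda_hat * p0^2)
  list(
    statistic = statistic,
    p_value = pchisq(statistic, df = 1, lower.tail = FALSE),
    observed_zeros = n0,
    expected_zeros = n * p0
  )
}

set.seed(42)
n <- 500
# Simulated zero-inflated counts (assumed setup): each observation is a
# structural zero with probability 0.3, otherwise a Poisson(2) draw.
structural_zero <- rbinom(n, size = 1, prob = 0.3)
zip_data <- ifelse(structural_zero == 1, 0, rpois(n, lambda = 2))

zip_result <- zero_inflation_score_test(zip_data)
zip_result
```

Because the statistic squares the difference between observed and expected zeros, a small p-value can also arise from too few zeros, so it is worth comparing `observed_zeros` with `expected_zeros` before concluding that the data are zero-inflated.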