Solved – Implementing a hurdle/Zero-inflated Poisson model in R with right-censored count data

tobit-regression, zero-inflation

I'm trying to fit a hurdle or zero-inflated model (I haven't decided which yet) to microbiological water quality data that are also right-censored: a water sample is either contaminated with bacteria or not, and if it is contaminated, the number of colonies is recorded as a count from 1 to 99 or as "more than 100", because it was not possible to count colonies beyond 100 (often referred to in microbiology as "TNTC", too numerous to count).

If I refer to the example data used by Kleiber and Zeileis (https://www.statistik.uni-dortmund.de/useR-2008/slides/Kleiber+Zeileis.pdf), I'm in a situation where the "number of visits to the physician" is censored at, let's say, "30 visits or more".

Is there a way to combine a hurdle or zero-inflated model (from the pscl package, for example) with this right-censored count distribution? A sort of combined hurdle/ZIP + tobit model?

Thanks a lot for your help
Lily

Best Answer

First, don't forget that a large number of 0 values does not necessarily require a zero-inflated or hurdle model. If you have a similarly large number of cases whose predictor-variable values would be associated with low probability of contamination then your data might not be zero-inflated at all. For deciding between zero-inflated and hurdle models, don't miss this thread.

Second, if you do choose to use a hurdle model for the zeros then I understand that these are often fit in a two-step process: first the zero/positive dichotomous model and then the positive counts. In that case, after the zero/positive fit you would only have to deal with the right-censoring beyond 100 colonies in the second step, separately.
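A minimal sketch of that two-step fit, on simulated data, might look like the following. Because the hurdle likelihood factorises into a zero/positive part and a positive-count part, the two pieces can be maximised separately. Everything here is an assumption for illustration: the simulated data, the single predictor x, the threshold cens_at = 100 (TNTC taken to mean "100 or more"), and the helper names rtpois and negll_count. The count step uses base R's optim on a zero-truncated, right-censored Poisson log-likelihood; it is not taken from any package.

```r
# Sketch only: two-step hurdle fit with right-censored positive counts.
# All data are simulated and all names below are hypothetical.
set.seed(42)
n       <- 3000
x       <- rnorm(n)
cens_at <- 100                       # assumption: TNTC means "100 or more"

# Illustrative data-generating values
p_pos  <- plogis(0.5 + 1.0 * x)      # P(sample is contaminated)
lambda <- exp(4.0 + 0.5 * x)         # mean of the untruncated Poisson
is_pos <- rbinom(n, 1, p_pos) == 1

# Zero-truncated Poisson draws via the inverse CDF
rtpois <- function(lambda) {
  u <- runif(length(lambda), min = dpois(0, lambda), max = 1)
  qpois(u, lambda)
}
y_true   <- ifelse(is_pos, rtpois(lambda), 0)
censored <- y_true >= cens_at        # TRUE where only "TNTC" would be recorded
y        <- pmin(y_true, cens_at)    # what the analyst actually observes

# Step 1: zero vs. positive, ordinary logistic regression
fit_zero <- glm(I(y > 0) ~ x, family = binomial)

# Step 2: positives only, zero-truncated Poisson with right-censoring
negll_count <- function(par, y, x, censored, cens_at) {
  lambda   <- exp(par[1] + par[2] * x)
  log_norm <- log1p(-exp(-lambda))                  # log P(Y > 0)
  ll_obs   <- dpois(y, lambda, log = TRUE)          # exact counts
  ll_cens  <- ppois(cens_at - 1, lambda,            # log P(Y >= cens_at)
                    lower.tail = FALSE, log.p = TRUE)
  -sum(ifelse(censored, ll_cens, ll_obs) - log_norm)
}

pos <- y > 0
fit_count <- optim(c(log(mean(y[pos])), 0), negll_count,
                   y = y[pos], x = x[pos],
                   censored = censored[pos], cens_at = cens_at,
                   method = "BFGS", hessian = TRUE,
                   control = list(maxit = 500))

coef(fit_zero)                           # zero/positive part (logit scale)
fit_count$par                            # count part (log scale)
sqrt(diag(solve(fit_count$hessian)))     # approximate standard errors
```

The censored observations contribute the upper-tail probability instead of a point mass, which is the only change relative to an ordinary zero-truncated Poisson fit.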

Third, the VGAM package in R includes a cens.poisson family function for censored count data, whether right-censored, left-censored, or both. Its handling of censoring is based on the survival package, so the outcome variable has to be a survival object containing both the counts and a censoring indicator. I haven't used it, and there may be some tricks in formatting the data to take advantage of this family function; examine its help page for examples of how to use it properly.

Finally, you could do your own maximization of the likelihood function for a zero-inflated Poisson model with right censoring. I happened to find the formula in this paper. The maxLik package provides tools for such a purpose, although I haven't used it myself.
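As a rough illustration of what such a hand-written likelihood could look like, here is a sketch on simulated data. With zero-inflation probability $\pi$ and Poisson mean $\lambda_i$, an observed zero contributes $\pi + (1-\pi)e^{-\lambda_i}$, an exactly counted positive value contributes $(1-\pi)\,\mathrm{Pois}(y_i;\lambda_i)$, and a censored value contributes $(1-\pi)\,P(Y \ge c \mid \lambda_i)$. This is the standard construction, not necessarily the exact parameterisation in the paper. The sketch uses base R's optim rather than maxLik, purely to avoid extra dependencies. The data, the predictor x, the threshold cens_at = 100, a constant $\pi$, and the function name negll_zip are all assumptions for illustration.

```r
# Sketch only: right-censored zero-inflated Poisson by direct maximum likelihood.
# All data are simulated and all names below are hypothetical.
set.seed(1)
n       <- 2000
x       <- rnorm(n)
cens_at <- 100                       # assumption: TNTC means "100 or more"

# Illustrative data-generating values
pi_true   <- 0.3                     # probability of a structural zero
beta_true <- c(4.0, 0.5)             # log-mean intercept and slope
lambda    <- exp(beta_true[1] + beta_true[2] * x)
y_true    <- ifelse(rbinom(n, 1, pi_true) == 1, 0, rpois(n, lambda))
censored  <- y_true >= cens_at       # TRUE where only "TNTC" would be recorded
y         <- pmin(y_true, cens_at)   # what the analyst actually observes

# Negative log-likelihood; par = (beta0, beta1, logit(pi))
negll_zip <- function(par, y, x, censored, cens_at) {
  lambda  <- exp(par[1] + par[2] * x)
  pi0     <- plogis(par[3])
  ll_zero <- log(pi0 + (1 - pi0) * exp(-lambda))            # observed zero
  ll_pos  <- log1p(-pi0) + dpois(y, lambda, log = TRUE)     # exact positive count
  ll_cens <- log1p(-pi0) +                                  # count >= cens_at
    ppois(cens_at - 1, lambda, lower.tail = FALSE, log.p = TRUE)
  -sum(ifelse(censored, ll_cens, ifelse(y == 0, ll_zero, ll_pos)))
}

start   <- c(log(mean(y[y > 0])), 0, 0)
fit_zip <- optim(start, negll_zip,
                 y = y, x = x, censored = censored, cens_at = cens_at,
                 method = "BFGS", hessian = TRUE,
                 control = list(maxit = 500))

fit_zip$par[1:2]                       # estimated log-mean coefficients
plogis(fit_zip$par[3])                 # estimated zero-inflation probability
sqrt(diag(solve(fit_zip$hessian)))     # approximate SEs (third one is on the logit scale)
```

Letting $\pi$ depend on covariates would only require replacing par[3] with its own linear predictor inside negll_zip.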
