Biostatistics – Expected Value and Model Expression for Zero Inflated Negative Binomial

biostatistics, negative-binomial-distribution, zero-inflation

I've been working with zero-inflated models. My data show overdispersion, so I am using a zero-inflated negative binomial (ZINB) model for the counts, taking exposure into account.

The end goal is simply to show the expected value of the count for children who consume a specific medication, given their exposure to treatment.

Here's what I have so far:

  • The model itself should consist of two components: a zero component (logit), $P(y=j)=\pi + (1-\pi)f_{Count}(0)$ if $j=0$,
    and the actual count component (negative binomial in this case), $P(y=j)=(1-\pi)f_{Count}(j)$ if $j>0$.
  • Here $\pi$ represents the probability of a zero coming from the zero component (an excess zero).

So the model equation would simply require replacing $f_{Count}$ with the probability mass function of the negative binomial. But in that case, the exposure is not accounted for.

For this, a GLM seems to be an appropriate alternative, but I am having trouble writing down the model and its expected value.
I've read that for a zero-inflated negative binomial $Y$, $E[Y] = (1-\pi)\mu$, where $\mu$ is the mean of the count distribution $f_{Count}$.

So should my link function be $\log((1-\pi)\mu)$? If so, how do I still obtain the expected value for my data, given that the model I wrote down is defined piecewise?

(I hope I got my point across. I am sorry if the terms are statistically incorrect, I tried my best.)

Best Answer

The log-likelihood for a zero-inflated negative-binomial model can be written as:

$$\mathcal{L} = \left\{ \begin{array}{ll} \sum_{i=1}^{n} \ln\left[ p_{i} + (1 - p_i)\left(\frac{1}{1 + \alpha\mu_{i}}\right)^{\frac{1}{\alpha}} \right] &\mbox{if } y_{i} = 0 \\ \sum_{i=1}^{n} \left[ \ln(1 - p_{i}) + \ln\Gamma\left(\frac{1}{\alpha} + y_i\right) - \ln\Gamma(y_i + 1) - \ln\Gamma\left(\frac{1}{\alpha}\right) + \left(\frac{1}{\alpha}\right)\ln\left(\frac{1}{1 + \alpha\mu_{i}}\right) + y_i\ln\left(1 - \frac{1}{1 + \alpha\mu_{i}}\right) \right] &\mbox{if } y_{i} > 0 \end{array} \right. $$

where $y_i$ is the observed count, $p_i$ is the probability from the logistic zero-inflation part of the model, $\alpha$ is the dispersion parameter for the negative-binomial model, and $\mu_i$ is the mean conditional on covariate values for the negative-binomial model. The $\mu_i$ are typically modeled with a log link; see this page among others. Predictors for the logistic and negative-binomial parts of the model can differ.
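
To connect this to the expected value asked about in the question, here is a minimal Python sketch. It assumes the usual way of handling exposure, namely entering $\ln(\text{exposure}_i)$ as an offset in the log link so that $\mu_i = \text{exposure}_i \cdot \exp(x_i\beta)$, and it uses the standard ZINB form in which a zero contributes $\ln[p_i + (1-p_i)(1+\alpha\mu_i)^{-1/\alpha}]$ and a positive count contributes $\ln(1-p_i)$ plus the negative-binomial terms. The function names (`zinb_logpmf`, `zinb_mean`, `predict_parts`) and every coefficient value are hypothetical, chosen only for illustration; they are not estimates from any data.

```python
import math

def zinb_logpmf(y, p, mu, alpha):
    """Log-probability of count y under a zero-inflated negative binomial.

    p     : zero-inflation probability (logistic part)
    mu    : negative-binomial mean
    alpha : dispersion parameter (NB variance = mu + alpha * mu**2)
    """
    inv_alpha = 1.0 / alpha
    q = 1.0 / (1.0 + alpha * mu)
    if y == 0:
        # Zeros come from either the zero component or the count component.
        return math.log(p + (1.0 - p) * q ** inv_alpha)
    # Positive counts can only come from the count component.
    return (
        math.log(1.0 - p)
        + math.lgamma(inv_alpha + y)
        - math.lgamma(y + 1)
        - math.lgamma(inv_alpha)
        + inv_alpha * math.log(q)
        + y * math.log(1.0 - q)
    )

def zinb_mean(p, mu):
    """Unconditional expected count: E[Y] = (1 - p) * mu."""
    return (1.0 - p) * mu

def predict_parts(x, exposure, beta, gamma):
    """Return (p, mu) for one observation with a single covariate x.

    Assumption: exposure enters the count part as an offset,
    log(mu) = log(exposure) + beta[0] + beta[1] * x,
    and the zero part uses a logit link with coefficients gamma.
    """
    mu = exposure * math.exp(beta[0] + beta[1] * x)
    p = 1.0 / (1.0 + math.exp(-(gamma[0] + gamma[1] * x)))
    return p, mu

# Hypothetical coefficients and covariate values, for illustration only.
p, mu = predict_parts(x=1.0, exposure=2.0, beta=(0.3, 0.5), gamma=(-1.0, 0.2))
print("expected count:", zinb_mean(p, mu))
```

So there is no need for a link of the form $\log((1-\pi)\mu)$: each part keeps its own link (logit for $p_i$, log with an exposure offset for $\mu_i$), and the expected value is assembled afterwards as $(1-p_i)\mu_i$ from the two fitted parts.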