Solved – Zero-inflated Poisson regression

poisson-regression, zero inflation

Suppose $ \textbf{Y} = (Y_1, \dots, Y_n)'$ are independent and

$$\eqalign{
Y_i = 0 & \text{with probability} \ p_i+(1-p_i)e^{-\lambda_i}\\
Y_i = k & \text{with probability} \ (1-p_i)e^{-\lambda_i} \lambda_{i}^{k}/k!, \quad k = 1, 2, \dots
}$$

Also suppose the parameters $\mathbf{\lambda} = (\lambda_1, \dots, \lambda_n)'$ and $\textbf{p} = (p_1, \dots, p_n)'$ satisfy

$$\eqalign{
\log(\mathbf{\lambda}) &= \textbf{B} \beta \\
\text{logit}(\textbf{p}) &= \log(\textbf{p}/(1-\textbf{p})) = \textbf{G} \mathbf{\lambda}.
}$$

If the same covariates affect $\mathbf{\lambda}$ and $\textbf{p}$ so that $\textbf{B} = \textbf{G}$, then why does zero-inflated Poisson regression require twice as many parameters as Poisson regression?

Best Answer

In the zero-inflated Poisson case, if $\mathbf{B}=\mathbf{G}$, then $\beta$ and the coefficient vector of the logit equation (written $\lambda$ in the question; it is a separate set of regression coefficients, not the vector of Poisson means) have the same length, namely the number of columns of $\mathbf{B}$ (equivalently $\mathbf{G}$). So the number of parameters is twice the number of columns of the design matrix, i.e. twice the number of explanatory variables including the intercept (and any dummy variables needed for categorical predictors).

In a straight Poisson regression there is no $\mathbf{p}$ vector to model, so there is no second coefficient vector (the question's $\lambda$) to estimate. The number of parameters is just the length of $\beta$, i.e. half the number of parameters in the zero-inflated case.
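The counting argument can be made concrete with a minimal sketch of the two log-likelihoods in plain Python. The toy data and coefficient values are hypothetical, and the logit coefficients are named `gamma` here (an assumption, chosen only to avoid the clash with the Poisson means $\lambda_i$; the question writes $\lambda$ for them). The point is simply that the zero-inflated likelihood takes two coefficient vectors where the Poisson likelihood takes one.

```python
import math


def zip_pmf(k, p, lam):
    """P(Y = k) for a zero-inflated Poisson: inflation probability p, Poisson mean lam."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return p + (1 - p) * poisson if k == 0 else (1 - p) * poisson


def linear_predictor(row, coefs):
    """Dot product of one design-matrix row with a coefficient vector."""
    return sum(x * c for x, c in zip(row, coefs))


def zip_loglik(y, B, G, beta, gamma):
    """Log-likelihood with log(lambda_i) = B_i . beta and logit(p_i) = G_i . gamma."""
    total = 0.0
    for yi, b_row, g_row in zip(y, B, G):
        lam = math.exp(linear_predictor(b_row, beta))
        p = 1.0 / (1.0 + math.exp(-linear_predictor(g_row, gamma)))
        total += math.log(zip_pmf(yi, p, lam))
    return total


def poisson_loglik(y, B, beta):
    """Ordinary Poisson regression log-likelihood: only beta is needed."""
    total = 0.0
    for yi, b_row in zip(y, B):
        eta = linear_predictor(b_row, beta)
        total += yi * eta - math.exp(eta) - math.lgamma(yi + 1)
    return total


# Hypothetical toy data: intercept plus one covariate, so 2 design-matrix columns.
B = [[1.0, 0.2], [1.0, -1.0], [1.0, 0.5], [1.0, 1.5]]
G = B  # same covariates in both parts of the model
y = [0, 0, 2, 5]

beta = [0.3, 0.8]    # Poisson-part coefficients, one per column of B
gamma = [-0.5, 0.4]  # logit-part coefficients, one per column of G (assumed name)

n_params_poisson = len(beta)
n_params_zip = len(beta) + len(gamma)  # twice as many when B and G share columns
```

Any optimiser maximising `zip_loglik` has to search over `beta` and `gamma` jointly, which is where the doubling comes from.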

Now, there's no particular reason why $\mathbf{B}$ has to equal $\mathbf{G}$, but generally it makes sense. However, one could imagine a data-generating process where the chance of being able to have any events at all is governed by one process, $\mathbf{G\lambda}$, and a completely different process, $\mathbf{B\beta}$, drives how many events there are for units that are not structural zeros. As a contrived example, I pick classrooms based on their History exam scores to play some unrelated game, and then observe the number of goals they score. In this case $\mathbf{B}$ might be quite different from $\mathbf{G}$ (if the things driving History exam scores are different from those driving performance in the game), and $\beta$ and $\lambda$ could have different lengths. $\mathbf{G}$ might have more columns than $\mathbf{B}$, or fewer. The zero-inflated Poisson model then has as many parameters as $\mathbf{B}$ and $\mathbf{G}$ have columns combined: still more than a simple Poisson model, but no longer exactly twice as many.
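As a rough illustration of the classroom example, here is a small simulation sketch in plain Python. Everything in it is hypothetical: the covariate names (`history_score`, `practice_hours`, `team_size`), the coefficient values, and the name `gamma` for the logit coefficients (the question's $\lambda$). It only shows that $\mathbf{G}$ and $\mathbf{B}$ can have different numbers of columns, and that the parameter count is then the sum of the two.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_zip(B, G, beta, gamma, rng):
    """Simulate counts with log(lambda_i) = B_i . beta and logit(p_i) = G_i . gamma."""
    y = []
    for b_row, g_row in zip(B, G):
        lam = math.exp(sum(x * c for x, c in zip(b_row, beta)))
        p = 1.0 / (1.0 + math.exp(-sum(x * c for x, c in zip(g_row, gamma))))
        # With probability p the unit is a structural zero; otherwise a Poisson draw.
        y.append(0 if rng.random() < p else sample_poisson(lam, rng))
    return y


rng = random.Random(0)
n = 500

# Hypothetical standardised covariates for n classrooms.
history_score = [rng.gauss(0, 1) for _ in range(n)]
practice_hours = [rng.gauss(0, 1) for _ in range(n)]
team_size = [rng.gauss(0, 1) for _ in range(n)]

# G: intercept + history score (decides whether a classroom plays at all).
G = [[1.0, h] for h in history_score]
# B: intercept + two game-related covariates (drives the number of goals).
B = [[1.0, ph, ts] for ph, ts in zip(practice_hours, team_size)]

gamma = [0.0, -1.0]     # one coefficient per column of G (assumed values)
beta = [0.5, 0.4, 0.2]  # one coefficient per column of B (assumed values)

y = simulate_zip(B, G, beta, gamma, rng)

n_params_poisson = len(beta)           # 3 columns in B
n_params_zip = len(beta) + len(gamma)  # 3 + 2 columns in B and G combined
```

Here the zero-inflated model has five coefficients against three for the plain Poisson model, so the ratio is no longer two to one.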

In common practice I think $\mathbf{G} = \mathbf{B}$ most of the time.
