Solved – the difference between a zero-inflated and a zero-truncated Poisson

poisson distribution, zero inflation

I'm trying to make sense of a question which uses a zero-inflated Poisson model given by:

$$
f(x; \lambda,\omega) = \begin{cases} \omega + (1-\omega)e^{-\lambda} & \text{if } x = 0 \qquad\qquad\qquad (1) \\
\dfrac{(1-\omega)e^{-\lambda}\lambda^x}{x!} & \text{if } x = 1,2,3,\dots \qquad (2)\end{cases}
$$

We are given a table of data, x = 0, 1, 2, 3, 4 and the number of occurrences of each.

In lectures we covered a very similar question, but it used a zero-truncated Poisson, given by:

$$
g(x; \theta) = \frac{e^{-\theta}\theta^x}{x!\,(1-e^{-\theta})}, \qquad x = 1,2,3,\dots \qquad (3)
$$

Obviously, if I equate $(2)$ with $(3)$ (taking $\lambda = \theta$), I get $(1-\omega) = (1-e^{-\theta})^{-1}$.

Now the question asks me to find the log-likelihood function in terms of $\lambda$ and $\omega$, then show that the MLE $\hat{\lambda}$ satisfies

$$\frac{\hat{\lambda}}{1-e^{-\hat{\lambda}}} = \bar{x}$$

where $\bar{x}$ is the sample mean of the nonzero observations (those with $x_i \neq 0$).

Now, this is the exact answer we got in lectures – and it's the answer I get here if I ignore (1) in my calculations. If I try to include both (1) and (2) in calculating the MLE, things get pretty messy.

My question is, what happens to (1)? Is there actually a difference between these two distributions? Also, once I calculate $\lambda$ and $\omega$, and use (1) to get a value for the frequency of $x=0$, I get a value very close to zero, whereas the frequency in the original table is 97 (all the other frequencies calculated by the model tally closely, as expected).

One other thing, if you are being kind enough to consider answering my question: I have no idea what a link function is, and presume this question can be answered without resorting to such functions.

Best Answer

You have asked a number of questions here, but I will only answer the first (and the one in the title). The others you should separate into a different question, or work through on your own after you understand this answer. I think you need to know what a link function is to understand this answer.

Let's say that you have a process where $x$ is related to a function of $\theta$,

$$x \sim f(\theta)$$

The link function is simply the empirical $f$ that you use to link $\theta$ and $x$. The most common link function is the linear link,

$$x = \beta\theta + \epsilon$$

This link is extremely versatile, but there are situations where it does not work very well. If you use an inappropriate link function, your estimates of the parameter $\beta$ won't represent the real-world situation (your estimate will be biased).

The most common alternative process is the binary case, where $x$ can be one of only two values (0 and 1, for instance). In this situation, it makes sense to talk about the probability that $x = 1$, and use instead the binomial logit link $\Lambda$,

$$P(x = 1 \mid \beta\theta) = \Lambda(\beta\theta) = \frac{e^{\beta\theta}}{1 + e^{\beta\theta}}$$

If $x$ can only take small, ordered, discrete values (counts), it makes sense to use the Poisson link. In this case, the probability that $x$ takes a given value, with $\beta\theta$ playing the role of the Poisson mean, is

$$P(x \mid \beta\theta) = \frac{e^{-\beta\theta}(\beta\theta)^x}{x!}$$

Say $x$ is the number of vehicles that a household owns, and $\theta$ is a set of socioeconomic variables. Some households own zero vehicles, some own one or two, but the probability of a household owning more than five or six is very low. This may be a good example of a Poisson process.
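To make the two probability formulas concrete, here is a minimal Python sketch of the logistic function and the standard Poisson probability mass function with mean `mu`. The function names and the mean of 1.2 vehicles per household are illustrative assumptions, not values from the question.

```python
import math

def logistic(z):
    """Logistic function e^z / (1 + e^z), written as 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson count with mean mu > 0."""
    return math.exp(-mu) * mu**x / math.factorial(x)

# Hypothetical mean of 1.2 vehicles per household (illustrative only).
mu = 1.2
probs = [poisson_pmf(x, mu) for x in range(7)]  # P(0), ..., P(6)
tail = 1.0 - sum(probs)                         # P(7 or more vehicles)

for x, p in enumerate(probs):
    print(x, round(p, 4))
print("7+", round(tail, 6))
```

With a small mean, almost all of the probability sits on the first few counts, which matches the vehicle-ownership intuition.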

But there are other processes that might feel like a Poisson process, but where the number of zeros is much higher than a Poisson distribution allows. For example, let's say we are modeling the number of times a person visited a doctor in a year. For a large number of people, this is going to be zero. Thus the process is zero-inflated, and you should use your zero-inflated link function. If you use a basic Poisson link, your estimate of $\beta$ will be biased.
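As a rough illustration of zero inflation, the sketch below codes the zero-inflated pmf from equations (1) and (2) of the question and compares its probability of zero with that of a plain Poisson. The parameter values are made up for the example.

```python
import math

def poisson_pmf(x, lam):
    """Plain Poisson pmf with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def zip_pmf(x, lam, omega):
    """Zero-inflated Poisson, equations (1) and (2): an extra mass
    omega is placed on zero, and the Poisson part is scaled by 1 - omega."""
    base = (1.0 - omega) * poisson_pmf(x, lam)
    return omega + base if x == 0 else base

# Illustrative parameter values (assumptions, not fitted to any data).
lam, omega = 2.0, 0.3
p0_plain = poisson_pmf(0, lam)
p0_zip = zip_pmf(0, lam, omega)
print(p0_plain, p0_zip)
```

For any `omega` between 0 and 1 the zero-inflated model puts more probability on zero than the plain Poisson with the same `lam`, while the relative sizes of the positive counts are unchanged.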

There are also Poisson-like processes where zeros are intuitively impossible, like the number of languages spoken by able humans. A few people speak many, many speak two or three, and everyone speaks at least one. In this case, you need to remove the zeros from your link distribution by using a zero-truncated link function. If you don't do this, your estimate of $\beta$ will again be biased (and most software won't even run if it doesn't see any zeros). Although the mathematical functions for both of your modified Poisson link functions look somewhat similar, they accomplish entirely opposite purposes.
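For comparison, here is a sketch of the zero-truncated pmf from equation (3), together with a simple bisection solver for the estimating equation $\hat{\lambda}/(1-e^{-\hat{\lambda}}) = \bar{x}$ quoted in the question. The frequency table is hypothetical (the question does not list its counts), and the solver name is my own.

```python
import math

def ztp_pmf(x, theta):
    """Zero-truncated Poisson pmf, equation (3); zero has no mass."""
    if x < 1:
        return 0.0
    # -expm1(-theta) equals 1 - exp(-theta), accurate for small theta.
    return math.exp(-theta) * theta**x / (math.factorial(x) * -math.expm1(-theta))

def solve_truncated_mle(xbar_pos, iterations=200):
    """Solve lam / (1 - exp(-lam)) = xbar_pos by bisection.
    Needs xbar_pos > 1, since the left side is always above 1."""
    if xbar_pos <= 1.0:
        raise ValueError("mean of the positive counts must exceed 1")
    lo, hi = 1e-9, xbar_pos  # left side is ~1 at lo and exceeds xbar_pos at hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < xbar_pos:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical frequencies of the positive counts: {value: occurrences}.
positive_counts = {1: 40, 2: 25, 3: 10, 4: 3}
n_pos = sum(positive_counts.values())
xbar_pos = sum(x * n for x, n in positive_counts.items()) / n_pos
lam_hat = solve_truncated_mle(xbar_pos)
print(xbar_pos, lam_hat)
```

Bisection is used only because the left-hand side is increasing in $\lambda$; any standard root finder would do the same job.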

If you have a process where the zeros are hyper-inflated (or hyper-deflated), you could combine the binary link and the zero-truncated Poisson link by using a hurdle model. One process models the probability of the outcome being positive, and another models the probability of each discrete outcome above zero. I am currently finishing a paper where we used a hurdle model to predict how many times people failed their vehicle emissions tests; 95% of people passed the first time, but others came back four or five times.
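The two-part structure can be sketched in a few lines of Python. The check below also suggests how it connects to the question: under the assumption that $0 \le \omega \le 1$, the zero-inflated pmf in (1) and (2) can be rewritten in hurdle form by setting the probability of a positive count to $\pi = (1-\omega)(1-e^{-\lambda})$. The names `hurdle_pmf` and `pi_pos` and the parameter values are illustrative.

```python
import math

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)

def zip_pmf(x, lam, omega):
    """Zero-inflated Poisson, equations (1) and (2) of the question."""
    base = (1.0 - omega) * poisson_pmf(x, lam)
    return omega + base if x == 0 else base

def hurdle_pmf(x, lam, pi_pos):
    """Hurdle model: P(X > 0) = pi_pos, and the positive counts
    follow a zero-truncated Poisson with parameter lam."""
    if x == 0:
        return 1.0 - pi_pos
    return pi_pos * poisson_pmf(x, lam) / (1.0 - math.exp(-lam))

# Illustrative values (assumptions, not estimates from any data).
lam, omega = 1.5, 0.4
pi_pos = (1.0 - omega) * (1.0 - math.exp(-lam))

# Largest difference between the two pmfs over x = 0, ..., 19.
max_gap = max(abs(zip_pmf(x, lam, omega) - hurdle_pmf(x, lam, pi_pos))
              for x in range(20))
print(max_gap)
```

In the $(\lambda, \pi)$ form the likelihood splits into a binary part for zero versus positive and a zero-truncated Poisson part for the positive counts. That appears to be why the zero-inflated MLE for $\lambda$ satisfies the same equation as the zero-truncated one, with equation (1) absorbed into the estimate of $\pi$ (and hence $\omega$), provided the implied $\omega$ stays between 0 and 1.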