If you really meant log-likelihood, then the answer is: it's not always zero.
For example, consider Poisson data: $y_i \sim \text{Poisson}(\mu_i), i = 1, \ldots, n$. The log-likelihood for $Y = (y_1, \ldots, y_n)$ is given by:
$$\ell(\mu; Y) = -\sum_{i = 1}^n \mu_i + \sum_{i = 1}^n y_i \log \mu_i - \sum_{i = 1}^n \log(y_i!). \tag{$*$}$$
Differentiate $\ell(\mu; Y)$ in $(*)$ with respect to $\mu_i$ and set it to $0$ (this is how we obtain the MLE for the saturated model):
$$-1 + \frac{y_i}{\mu_i} = 0.$$
Solving this for $\mu_i$ gives $\hat{\mu}_i = y_i$; substituting $\hat{\mu}_i$ back into $(*)$ for $\mu_i$ shows that the log-likelihood of the saturated model is:
$$\ell(\hat{\mu}; Y) = \sum_{i = 1}^n y_i(\log y_i - 1) -\sum_{i = 1}^n \log(y_i!) \neq 0$$
unless $y_i$ take very special values.
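As a quick numerical sanity check, here is a minimal R sketch (with simulated data, so the particular numbers are only illustrative) that computes this saturated log-likelihood:

```r
# Saturated Poisson log-likelihood: each mu_i is set to y_i.
set.seed(1)                      # assumed seed, for reproducibility
y <- rpois(10, lambda = 3)       # simulated counts
# dpois(y, lambda = y) handles any y = 0 terms correctly (they contribute 0)
ll_sat <- sum(dpois(y, lambda = y, log = TRUE))
ll_sat                           # generally non-zero
# Same value via the closed form above, guarding the y = 0 terms
sum(ifelse(y == 0, 0, y * (log(y) - 1)) - lfactorial(y))
```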
In the help page of the R function `glm`, under the item `deviance`, the documentation explains this issue as follows:

> `deviance`: up to a constant, minus twice the maximized log-likelihood. Where sensible, the constant is chosen so that a saturated model has deviance zero.
Notice that it is the deviance, rather than the log-likelihood of the saturated model, that is chosen to be zero.
Probably, what you really wanted to confirm is that "the deviance of the saturated model is always given as zero", which is true, since the deviance, by definition (see Section 4.5.1 of Categorical Data Analysis (2nd edition) by Alan Agresti), is the likelihood-ratio statistic comparing a specified GLM to the saturated model. The constant mentioned in the R documentation is actually twice the maximized log-likelihood of the saturated model.
Regarding your statement "Yet, the way the formula for deviance is given suggests that sometimes this quantity is non zero.", it is probably due to loose usage of the term deviance. For instance, in R, the likelihood-ratio statistic comparing two arbitrary (nested) models $M_1$ and $M_2$ is also referred to as deviance, which would more precisely be termed the difference between the deviance of $M_1$ and the deviance of $M_2$, if we closely followed the definition given in Agresti's book.
Conclusion
- The log-likelihood of the saturated model is in general non-zero.
- The deviance (in its original definition) of the saturated model is zero.
- The deviance output by software (such as R) is in general non-zero, as it actually means something else (the difference between deviances).
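To see the last point concretely, here is a short R sketch (with simulated data) showing that the deviance reported by `glm` for a Poisson fit equals twice the gap between the saturated and fitted log-likelihoods:

```r
set.seed(1)
x <- rnorm(50)
y <- rpois(50, lambda = exp(0.5 + 0.3 * x))   # simulated Poisson responses
fit <- glm(y ~ x, family = poisson)
ll_fit <- sum(dpois(y, lambda = fitted(fit), log = TRUE))  # fitted model
ll_sat <- sum(dpois(y, lambda = y, log = TRUE))            # saturated model
all.equal(deviance(fit), 2 * (ll_sat - ll_fit))            # TRUE
```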
The following is the derivation for the general exponential-family case, together with another concrete example. Suppose the data come from an exponential family (see Modern Applied Statistics with S, Chapter 7):
$$f(y_i; \theta_i, \varphi) = \exp[A_i(y_i\theta_i - \gamma(\theta_i))/\varphi + \tau(y_i, \varphi/A_i)], \tag{1}$$
where the $A_i$ are known prior weights and $\varphi$ is a dispersion/scale parameter (for many cases, such as binomial and Poisson, this parameter is known, while for other cases, such as normal and Gamma, it is unknown). The log-likelihood is then given by:
$$\ell(\theta, \varphi; Y) = \sum_{i = 1}^n A_i(y_i \theta_i - \gamma(\theta_i))/\varphi + \sum_{i = 1}^n \tau(y_i, \varphi/A_i). $$
As in the Poisson example, the saturated model's parameters can be estimated by solving the score equations
$$0 = U(\theta_i) = \frac{\partial \ell(\theta, \varphi; Y)}{\partial \theta_i} = \frac{A_i(y_i - \gamma'(\theta_i))}{\varphi}, \quad i = 1, \ldots, n.$$
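Since $A_i > 0$ and $\varphi > 0$, the score equation reduces to
$$y_i = \gamma'(\hat{\theta}_i),$$
i.e. the saturated model sets each fitted mean (recall that $\mathbb{E}(y_i) = \gamma'(\theta_i)$ for this family) equal to its observation.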
Denote the solution of the above equation by $\hat{\theta}_i$; then the general form of the log-likelihood of the saturated model (treating the scale parameter as a constant) is:
$$\ell(\hat{\theta}, \varphi; Y) = \sum_{i = 1}^n A_i(y_i \hat{\theta}_i - \gamma(\hat{\theta}_i))/\varphi + \sum_{i = 1}^n \tau(y_i, \varphi/A_i). \tag{$**$}$$
In my previous answer, I incorrectly stated that the first term on the right side of $(**)$ is always zero; the Poisson example above shows that this is wrong. For a more complicated example, consider the Gamma distribution $\Gamma(\alpha, \beta)$ worked out below.
Proof that the first term in the log-likelihood of the saturated Gamma model is non-zero: Given
$$f(y; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}e^{-\beta y}y^{\alpha - 1}, \quad y > 0, \alpha > 0, \beta > 0,$$
we must first reparameterize so that $f$ has the exponential-family form $(1)$. It can be verified that, letting
$$\varphi = \frac{1}{\alpha},\, \theta = -\frac{\beta}{\alpha},$$
then $f$ has the representation:
$$f(y; \theta, \varphi) = \exp\left[\frac{\theta y - (-\log(-\theta))}{\varphi}+ \tau(y, \varphi)\right],$$
where
$$\tau(y, \varphi) = -\frac{\log \varphi}{\varphi} + \left(\frac{1}{\varphi} - 1\right)\log y - \log\Gamma(\varphi^{-1}).$$
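A quick numerical check of this reparameterization in R (arbitrary test values; `dgamma` and `lgamma` are base R):

```r
# Verify that the exponential-family form reproduces the Gamma density
alpha <- 2; beta <- 3; y <- 1.5            # arbitrary test values
phi <- 1 / alpha; theta <- -beta / alpha   # reparameterization above
tau <- -log(phi) / phi + (1 / phi - 1) * log(y) - lgamma(1 / phi)
exp((theta * y + log(-theta)) / phi + tau) # exponential-family form
dgamma(y, shape = alpha, rate = beta)      # same value
```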
Therefore, since $\gamma(\theta) = -\log(-\theta)$ implies $\gamma'(\theta) = -1/\theta$, solving $y_i = \gamma'(\hat{\theta}_i)$ gives the MLEs of the saturated model as $\hat{\theta}_i = -\frac{1}{y_i}$.
Hence
$$\sum_{i = 1}^n \frac{1}{\varphi}[\hat{\theta}_iy_i - (-\log(-\hat{\theta}_i))] = \sum_{i = 1}^n \frac{1}{\varphi}[-1 - \log(y_i)] \neq 0, $$
unless $y_i$ take very special values.
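Again, a one-line numerical check in R (simulated data; the value of $\varphi$ is arbitrary):

```r
set.seed(1)
phi <- 0.5                                  # arbitrary dispersion
y <- rgamma(10, shape = 1 / phi, rate = 2)  # simulated Gamma data
sum((-1 - log(y)) / phi)                    # first term of (**): generally non-zero
```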
The deviance is a GLM concept. ZIP and ZINB models are not GLMs; they are formulated as finite mixtures of distributions that are GLMs, and can therefore be fitted easily via the EM algorithm.
These notes describe the theory of deviance concisely. If you read those notes you'll see the proof that the saturated model for the Poisson regression has log-likelihood
$$\ell(\lambda_s) = \sum_{i:\, y_i \neq 0} \left[ y_i \log(y_i) - y_i - \log(y_i!) \right],$$
which results from the plug-in estimates $\hat{\lambda}_i = y_i$.
I'll proceed now with the ZIP likelihood because the math is simpler; similar results hold for the ZINB. Unfortunately, for the ZIP there is no simple relationship like in the Poisson case. The $i$th observation's log-likelihood is
$$\ell_i(\phi, \lambda) = Z_i \log\left(\phi + (1-\phi)e^{-\lambda}\right) + (1-Z_i)\left[-\lambda + y_i \log(\lambda) - \log(y_i!)\right].$$
The $Z_i$ are not observed, so to solve this you'd need to take partial derivatives with respect to both $\lambda$ and $\phi$, set the equations to $0$, and then solve for $\lambda$ and $\phi$. The difficulty here is the $y_i = 0$ values: these can go into $\hat{\lambda}$ or into $\hat{\phi}$, and without observing $Z_i$ it isn't possible to know which component the $y_i = 0$ observations belong to. However, if we knew the $Z_i$ values we wouldn't need a ZIP model, because we would have no missing data. The likelihood above, with the $Z_i$ included, corresponds to the "complete data" likelihood in the EM formalism.
One approach that might be reasonable is to work with the expectation with respect to $Z_i$ of the complete-data log-likelihood, $\mathbb{E}(\ell_i(\phi, \lambda))$, which removes the $Z_i$ and replaces them with an expectation; this is part of what the EM algorithm calculates (the E step) using the most recent parameter updates. I'm unaware of any literature that has studied this approach to an *expected* deviance, though.
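For concreteness, here is a minimal R sketch of that E step (hypothetical parameter values and toy data, not a full EM implementation): the posterior probability that an observed zero is a structural zero, which is what replaces $Z_i$ in the expected log-likelihood.

```r
# E step of EM for a ZIP model: E(Z_i | y_i) under current (phi, lambda).
# Only y_i = 0 can be a structural zero; positive counts force Z_i = 0.
phi <- 0.3; lambda <- 2          # hypothetical current parameter values
y <- c(0, 0, 1, 3, 0, 2)         # toy data
z_hat <- ifelse(y == 0, phi / (phi + (1 - phi) * exp(-lambda)), 0)
z_hat
```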
Also, this question was asked first, so I answered this post. However, there is another question on the same topic with a nice comment by Gordon Smyth here:
deviance for zero-inflated compound poisson model, continuous data (R)
where he mentioned the same response (this answer is an elaboration of that comment, I'd say). The comments on that other post also mention a paper which you may want to read. (Disclaimer: I have not read the referenced paper.)
You would need the deviances and the degrees of freedom to perform a deviance test. If the models fit the data well, then $D_1 \sim \chi^2(n-p)$ and $D_2 \sim \chi^2(n-q)$, where $D_1$ and $D_2$ are the deviances of model 1 and model 2, $n$ is the number of parameters in the saturated model, and $p$ and $q$ are the numbers of parameters in the given models ($q < p < n$, so model 2 is nested in model 1).
Deviance test:
$$D_2 - D_1 \sim \chi^2(p-q)$$
If the value is greater than expected from the chi-squared distribution, reject model 2 (the smaller model) in favour of model 1.
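In R, this comparison is what `anova` with `test = "Chisq"` carries out for nested GLMs (a sketch with simulated data):

```r
set.seed(1)
x1 <- rnorm(100); x2 <- rnorm(100)
y <- rpois(100, lambda = exp(0.2 + 0.4 * x1))   # simulated responses
fit2 <- glm(y ~ x1, family = poisson)           # smaller model (q parameters)
fit1 <- glm(y ~ x1 + x2, family = poisson)      # larger model (p parameters)
anova(fit2, fit1, test = "Chisq")               # D2 - D1 on p - q df
```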