Method 0: The lazy statistician.
Note that for $y \neq 0$ we have $f(y) = (1-\pi) p_y$, where $p_y$ is the probability that a Poisson random variable takes the value $y$. Since the term corresponding to $y = 0$ contributes nothing to either moment, our knowledge of the Poisson distribution and the linearity of expectation immediately tell us that
$$
\mu = (1-\pi) \lambda
$$
and
$$
\mathbb E Y^2 = (1-\pi) (\lambda^2 + \lambda) \> .
$$
A little algebra and the identity $\mathrm{Var}(Y) = \mathbb E Y^2 - \mu^2$ yields the result.
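To make that "little algebra" concrete, here is a quick numeric sanity check (the $(\pi, \lambda)$ grid is arbitrary) that $\mathbb E Y^2 - \mu^2$ does simplify to the closed form $\mu + \frac{\pi}{1-\pi}\mu^2$:

```python
# Sanity check of Method 0's closing algebra over a few arbitrary (pi, lambda) pairs:
# Var(Y) = E[Y^2] - mu^2 should equal mu + pi/(1-pi) * mu^2.
for pi_ in (0.1, 0.3, 0.8):
    for lam in (0.5, 2.5, 10.0):
        mu = (1 - pi_) * lam                 # E[Y]
        ey2 = (1 - pi_) * (lam ** 2 + lam)   # E[Y^2]
        var = ey2 - mu ** 2                  # Var(Y)
        assert abs(var - (mu + pi_ / (1 - pi_) * mu ** 2)) < 1e-9
```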
Method 1: A probabilistic argument.
It's often helpful to have a simple probabilistic model for how a distribution arises. Let $Z \sim \mathrm{Ber}(1-\pi)$ and $Y \sim \mathrm{Poi}(\lambda)$ be independent random variables. Define
$$
X = Z \cdot Y \>.
$$
Then, it is easy to see that $X$ has the desired distribution $f$. To check this, note that $\renewcommand{\Pr}{\mathbb P}\Pr(X = 0) = \Pr(Z=0) + \Pr(Z=1, Y=0) = \pi + (1-\pi) e^{-\lambda}$ by independence. Similarly $\Pr(X = k) = \Pr(Z=1, Y=k)$ for $k \neq 0$.
From this, the rest is easy, since by the independence of $Z$ and $Y$,
$$
\mu = \mathbb E X = \mathbb E Z Y = (\mathbb E Z) (\mathbb E Y) = (1-\pi)\lambda \>,
$$
and, using $Z^2 = Z$ (since $Z$ is Bernoulli) together with independence,
$$
\mathrm{Var}(X) = \mathbb E X^2 - \mu^2 = (\mathbb E Z^2)(\mathbb E Y^2) - \mu^2 = (1-\pi)(\lambda^2 + \lambda) - \mu^2 = \mu + \frac{\pi}{1-\pi}\mu^2 \> .
$$
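The construction $X = Z \cdot Y$ also gives an immediate way to simulate from $f$ and check the moments empirically. A short Python sketch (sample size and the parameter values $\pi = 0.3$, $\lambda = 2.5$ are arbitrary; the Poisson draws use Knuth's multiplication-of-uniforms method):

```python
import random
import math

random.seed(0)
pi_, lam, n = 0.3, 2.5, 200_000   # arbitrary illustrative parameters

def poisson(lam):
    # Knuth's method: multiply uniforms until the product drops below e^{-lam}.
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# X = Z * Y with Z ~ Ber(1 - pi) and Y ~ Poi(lam), independent.
xs = [(1 if random.random() < 1 - pi_ else 0) * poisson(lam) for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

mu = (1 - pi_) * lam
assert abs(mean - mu) < 0.05                              # E[X] = (1-pi) lam
assert abs(var - (mu + pi_ / (1 - pi_) * mu ** 2)) < 0.1  # Var(X) as derived above
```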
Method 2: Direct calculation.
The mean is easily obtained by the slight trick of pulling one $\lambda$ out of the sum and reindexing.
$$
\mu = \sum_{k=1}^\infty (1-\pi) k e^{-\lambda} \frac{\lambda^k}{k!} = (1-\pi) \lambda e^{-\lambda} \sum_{j=0}^\infty \frac{\lambda^j}{j!} = (1-\pi) \lambda \> .
$$
A similar trick works for the second moment:
$$
\mathbb E X^2 = (1-\pi) \sum_{k=1}^\infty k^2 e^{-\lambda} \frac{\lambda^k}{k!} = (1-\pi)\lambda e^{-\lambda} \sum_{j=0}^\infty (j+1) \frac{\lambda^j}{j!} = (1-\pi)(\lambda^2 + \lambda) \>,
$$
from which point we can proceed with the algebra as in the first method.
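Both series of Method 2 are easy to verify numerically. This sketch (arbitrary $\pi$ and $\lambda$) builds the Poisson pmf iteratively to avoid large factorials:

```python
import math

# Recompute the mean and second-moment series of Method 2 by direct summation.
pi_, lam = 0.3, 2.5              # arbitrary illustrative parameters
mu_direct, ey2_direct = 0.0, 0.0
p = math.exp(-lam)               # Poisson pmf at k = 0
for k in range(1, 100):
    p *= lam / k                 # Poisson pmf at k, built recursively
    mu_direct += (1 - pi_) * k * p
    ey2_direct += (1 - pi_) * k * k * p

assert abs(mu_direct - (1 - pi_) * lam) < 1e-9                  # (1-pi) lam
assert abs(ey2_direct - (1 - pi_) * (lam ** 2 + lam)) < 1e-9    # (1-pi)(lam^2 + lam)
```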
Addendum: Here are a couple of tricks used in the calculations above.
First recall that $\sum_{k=0}^\infty \frac{\lambda^k}{k!} = e^\lambda$.
Second, note that
$$
\sum_{k=0}^\infty k \frac{\lambda^k}{k!} = \sum_{k=1}^\infty k \frac{\lambda^k}{k!} = \sum_{k=1}^\infty \frac{\lambda^k}{(k-1)!} = \sum_{k=1}^\infty \frac{\lambda \cdot \lambda^{k-1}}{(k-1)!} = \lambda \sum_{j=0}^\infty \frac{\lambda^j}{j!} = \lambda e^{\lambda} \>,
$$
where the substitution $j = k-1$ was made in the second-to-last step.
In general, for the Poisson, it is easy to calculate the factorial moments $\mathbb E X^{(n)} = \mathbb E X(X-1)(X-2)\cdots(X-n+1)$ since
$$
e^\lambda \mathbb E X^{(n)} = \sum_{k=n}^\infty k(k-1)\cdots(k-n+1) \frac{\lambda^k}{k!} = \sum_{k=n}^\infty \frac{\lambda^n \lambda^{k-n}}{(k-n)!} = \lambda^n \sum_{j=0}^\infty \frac{\lambda^j}{j!} = \lambda^n e^\lambda \>,
$$
so $\mathbb E X^{(n)} = \lambda^n$. We get to "skip" to the $n$th index for the start of the sum in the first equality since for any $0 \leq k < n$, $k(k-1)\cdots(k-n+1) = 0$ since exactly one term in the product is zero.
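The factorial-moment identity is easy to confirm numerically as well (the choice $\lambda = 1.7$ is arbitrary):

```python
import math

lam = 1.7  # arbitrary illustrative rate

def falling(k, n):
    # Falling factorial k(k-1)...(k-n+1); automatically 0 for k < n.
    out = 1
    for i in range(n):
        out *= k - i
    return out

# Check E[X^(n)] = lam^n for the plain Poisson, n = 1..4.
for n in range(1, 5):
    total, p = 0.0, math.exp(-lam)   # p = Poisson pmf at k = 0
    for k in range(1, 80):
        p *= lam / k                 # pmf at k, built recursively
        total += falling(k, n) * p
    assert abs(total - lam ** n) < 1e-9
```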
You have asked a number of questions here, but I will only answer the first (and the one in the title). The others you should separate into a different question, or work through on your own once you understand this answer. I think you need to know what a link function is to understand this answer. Let's say you have a process where $x$ is related to a function of $\theta$,
$$x \sim f(\theta) $$
The link function is simply the empirical $f$ that you use to link $\theta$ and $x$. The most common link function is the linear link,
$$ x = \beta\theta + \epsilon$$
This link is extremely versatile, but there are situations where it does not work very well. If you use an improper link function, your estimate of the parameter $\beta$ won't represent the real-world situation (it will be biased). The most common alternative process is the binary case, where $x$ can take only two values (0 and 1, for instance). In this situation, it makes sense to talk about the probability that $x = 1$, and to use instead the binomial logit link $\Lambda$,
$$ P(x|\beta \theta) = \Lambda(\beta\theta) = {e^{\beta\theta}\over1 + e^{\beta\theta}}$$
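As a tiny illustration (the scalar linear predictor $\beta\theta$ is just a placeholder here), $\Lambda$ maps any real number into $(0, 1)$:

```python
import math

# The logistic link Lambda from the formula above.
def logit_link(eta):
    # eta plays the role of the linear predictor beta * theta.
    return math.exp(eta) / (1.0 + math.exp(eta))

assert abs(logit_link(0.0) - 0.5) < 1e-12                      # eta = 0 gives probability 1/2
assert abs(logit_link(3.0) + logit_link(-3.0) - 1.0) < 1e-12   # symmetry about 0
assert 0.0 < logit_link(-10.0) < logit_link(10.0) < 1.0        # monotone, bounded in (0, 1)
```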
If $x$ can only be some collection of small, ordered, discrete values, it makes sense to use the Poisson link. In this case, the probability that $x$ takes a given value is
$$ P(x|\beta \theta) = \dfrac{e^{-\beta\theta}(\beta\theta)^x}{x!}$$
Say $x$ is the number of vehicles that a household owns, and $\theta$ are different socioeconomic variables. Some households own zero vehicles, some own one or two, but the probability of a household owning more than five or six is very low. This may be a good example of a Poisson process.
But there are other processes that might feel like a Poisson process, but where the number of zeros is much higher than a Poisson distribution allows. For example, let's say we are modeling the number of times a person visited a doctor in a year. For a large number of people, this is going to be zero. Such a process is zero-inflated, and you should use a zero-inflated link function. If you use a basic Poisson link, your estimate of $\beta$ will be biased.
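You can see the inflation directly by comparing the two pmfs. A small sketch with made-up parameters (mean $\lambda = 2$, extra-zero mass $\pi = 0.4$):

```python
import math

lam, pi_ = 2.0, 0.4   # made-up illustrative parameters

def poisson_pmf(k, lam):
    # Plain Poisson probability mass at k.
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi_):
    # Zero-inflated Poisson: extra point mass pi_ at zero, (1 - pi_) of the Poisson elsewhere.
    extra = pi_ if k == 0 else 0.0
    return extra + (1 - pi_) * poisson_pmf(k, lam)

# Zero-inflation pushes P(X = 0) well above what the Poisson allows...
assert zip_pmf(0, lam, pi_) > poisson_pmf(0, lam)
# ...while the pmf still sums to one.
assert abs(sum(zip_pmf(k, lam, pi_) for k in range(60)) - 1.0) < 1e-9
```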
There are also Poisson-like processes where zeros are intuitively impossible, like the number of languages spoken by able humans. A few people speak many, many speak two or three, and everyone speaks at least one. In this case, you need to remove the zeros from your link distribution by using a zero-truncated link function. If you don't do this, your estimate of $\beta$ will again be biased (and most software won't even run if it doesn't see any zeros). Although the mathematical functions for both of your modified Poisson link functions look somewhat similar, they accomplish entirely opposite purposes.
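The zero-truncated pmf is just the Poisson pmf renormalized over $k \geq 1$, i.e. divided by $1 - e^{-\lambda}$. A quick sketch (the value of $\lambda$ is arbitrary):

```python
import math

lam = 1.3   # arbitrary illustrative rate

def zt_poisson_pmf(k, lam):
    # Zero-truncated Poisson: the Poisson pmf over k >= 1, renormalized by 1 - e^{-lam}.
    if k < 1:
        return 0.0
    return math.exp(-lam) * lam ** k / (math.factorial(k) * (1 - math.exp(-lam)))

assert zt_poisson_pmf(0, lam) == 0.0   # zeros are impossible by construction
assert abs(sum(zt_poisson_pmf(k, lam) for k in range(1, 60)) - 1.0) < 1e-9
# Its mean is lam / (1 - e^{-lam}), which is always at least 1.
mean = sum(k * zt_poisson_pmf(k, lam) for k in range(1, 60))
assert abs(mean - lam / (1 - math.exp(-lam))) < 1e-9
assert mean >= 1.0
```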
If you have a process where the zeros are hyper-inflated (or hyper-deflated), you could combine the binary link and the zero-truncated Poisson link by using a hurdle model. One process models the probability of the outcome being positive, and another models the probability of each discrete outcome above zero. I am right now finishing a paper where we used a hurdle model to predict how many times people failed their vehicle emissions tests; 95% of people passed the first time, but others came back four or five times.
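A hurdle sampler is easy to sketch along those lines; the parameters below are invented for illustration (they are not the emissions-test paper's estimates). A Bernoulli gate decides zero versus positive, and a zero-truncated Poisson generates the count above the hurdle:

```python
import random
import math

random.seed(1)
p_pos, lam, n = 0.05, 1.8, 100_000   # invented parameters: ~95% of outcomes are zero

def zt_poisson(lam):
    # Draw a Poisson (Knuth's method) and reject zeros.
    while True:
        L, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= random.random()
            if p <= L:
                break
            k += 1
        if k > 0:
            return k

# Hurdle draw: gate first, then the positive-count model.
xs = [zt_poisson(lam) if random.random() < p_pos else 0 for _ in range(n)]

assert xs.count(0) / n > 0.9                   # most outcomes sit at the hurdle
pos = [x for x in xs if x > 0]
mean_pos = sum(pos) / len(pos)
# The positive part should average lam / (1 - e^{-lam}).
assert abs(mean_pos - lam / (1 - math.exp(-lam))) < 0.1
```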
Best Answer
This works:
which gives a log-likelihood of -7853.122
So @Ben Bolker is correct.
It doesn't work even if I specify lower to be 0, as it would try to evaluate at sigma = 0, which is not supported for ZIP.