In the zero-inflated Poisson case, if $\mathbf{B}=\mathbf{G}$, then $\beta$ and $\lambda$ have the same length, namely the number of columns of $\mathbf{B}$ (equivalently, of $\mathbf{G}$). So the number of parameters is twice the number of columns of the design matrix, i.e., twice the number of explanatory variables, including the intercept (and whatever dummy coding was needed).
In a straight Poisson regression, there is no $\mathbf{p}$ vector to worry about and no need to estimate $\lambda$. So the number of parameters is just the length of $\beta$, i.e., half the number of parameters in the zero-inflated case.
Now, there's no particular reason why $\mathbf{B}$ has to equal $\mathbf{G}$, though generally it makes sense. Still, one could imagine a data-generating process in which the chance of having any events at all is created by one process, $\mathbf{G\lambda}$, while a completely different process, $\mathbf{B\beta}$, drives how many events there are, given that any occur. As a contrived example: I pick classrooms based on their History exam scores to play some unrelated game, and then observe the number of goals they score. In this case $\mathbf{B}$ might be quite different from $\mathbf{G}$ (if the things driving History exam scores are different from those driving performance in the game), and $\beta$ and $\lambda$ could have different lengths; $\mathbf{G}$ might have more columns than $\mathbf{B}$, or fewer. So the zero-inflated Poisson model in that case will have more parameters than a simple Poisson model.
In common practice I think $\mathbf{G} = \mathbf{B}$ most of the time.
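If you do want $\mathbf{G} \neq \mathbf{B}$ in practice, the pscl package mentioned at the end lets you specify the two parts separately. Here is a minimal sketch with made-up data and variable names: zeroinfl() takes the count model before the | in its formula and the zero model after it.
library(pscl)   # provides zeroinfl()
# Made-up data where the zero process (z1) differs from the count process (x1)
set.seed(42)
n   <- 5000
x1  <- rnorm(n)
z1  <- rnorm(n)
p   <- plogis(-1 + 2 * z1)     # zero-inflation driven by z1 only
lam <- exp(0.5 + 0.8 * x1)     # counts driven by x1 only
y   <- ifelse(rbinom(n, 1, p) == 1, 0, rpois(n, lam))
# Count model before "|", zero model after it, so B and G differ here
fit <- zeroinfl(y ~ x1 | z1, data = data.frame(y, x1, z1))
summary(fit)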
The root of the difficulty you are having lies in the sentence:
Then using the EM algorithm, we can maximize the second log-likelihood.
As you have observed, you can't. Instead, what you maximize is the expected value of the second log-likelihood (known as the "complete-data log-likelihood"), where the expectation is taken over the $z_i$.
This leads to an iterative procedure, where at the $k^{\text{th}}$ iteration you calculate the expected values of the $z_i$ given the parameter estimates from the $(k-1)^{\text{th}}$ iteration (this is known as the "E-step"), then substitute them into the complete-data log-likelihood (see EDIT below for why we can do this in this case), and maximize that with respect to the parameters to get the estimates for the current iteration (the "M-step").
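For reference, in the simplest two-parameter case treated next (this just writes out the standard ZIP formulation, with notation matching the code below), the complete-data log-likelihood is $$ \ell_c(p, \lambda) = \sum_i \left[ z_i \log{p} + (1-z_i)\log{(1-p)} + (1-z_i)\left( x_i \log{\lambda} - \lambda - \log{x_i!} \right) \right], $$ which is linear in the $z_i$; the E-step replaces each $z_i$ by $\mathbb{E}(z_i|x_i) = p/(p+(1-p)\exp{(-\lambda)})$ when $x_i = 0$, and by $0$ otherwise.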
The complete-data log-likelihood for the zero-inflated Poisson in the simplest case (two parameters, say $\lambda$ and $p$) allows for substantial simplification when it comes to the M-step, and this carries over to some extent to your form. I'll show you how that works in the simple case via some R code, so you can see the essence of it. I won't simplify as much as possible, since that might cause a loss of clarity when you think of your problem:
# Generate data: lambda = 1, P(structural zero) = 0.1
x <- rpois(10000, 1)
x[1:1000] <- 0   # force the first 1000 observations to be structural zeros
# Sufficient statistic for the ZIP
sum.x <- sum(x)
# (Poor) starting values for the parameter estimates
phat <- 0.5
lhat <- 2.0
zhat <- rep(0, length(x))
for (i in 1:100) {
  # E-step: zhat[x > 0] <- 0 always, so no need to make that assignment
  # at every iteration; only the zero counts get a nonzero expectation
  zhat[x == 0] <- phat / (phat + (1 - phat) * exp(-lhat))
  # M-step: in effect, removing E(# zeroes due to z = 1) from the Poisson part
  lhat <- sum.x / sum(1 - zhat)
  phat <- mean(zhat)
  cat("Iteration: ", i, " lhat: ", lhat, " phat: ", phat, "\n")
}
Iteration: 1 lhat: 1.443948 phat: 0.3792712
Iteration: 2 lhat: 1.300164 phat: 0.3106252
Iteration: 3 lhat: 1.225007 phat: 0.268331
...
Iteration: 99 lhat: 0.9883329 phat: 0.09311933
Iteration: 100 lhat: 0.9883194 phat: 0.09310694
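As a quick sanity check (reusing x, lhat and phat from above), you can evaluate the observed-data log-likelihood at the EM estimates and at the true values; the former should be at least as large:
# Observed-data (incomplete) ZIP log-likelihood
loglik <- function(p, l) sum(log(p * (x == 0) + (1 - p) * dpois(x, l)))
loglik(phat, lhat)   # at the EM estimates
loglik(0.1, 1.0)     # at the true parameter values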
In your case, at each step you'll do a weighted Poisson regression, where the weights are 1-zhat, to get the estimates of $\beta$ and therefore $\lambda_i$, and then maximize:
$$\sum_i \left(\mathbb{E}z_i\log{p_i} + (1-\mathbb{E}z_i)\log{(1-p_i)}\right)$$
with respect to the coefficient vector of your matrix $\mathbf{G}$ to get the estimates of $p_i$. The expected values are $\mathbb{E}z_i = p_i/(p_i+(1-p_i)\exp{(-\lambda_i)})$ for observations with a zero count, and $0$ otherwise, again calculated at each iteration.
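Put together, one iteration might look like the following sketch. This is not your exact setup: it assumes a hypothetical data frame d with count response y and a single covariate x used in both parts, and quasibinomial is used only to suppress the non-integer-response warning (it gives the same point estimates as binomial here).
# M-step for beta: weighted Poisson regression with weights 1 - zhat
count.fit <- glm(y ~ x, family = poisson, data = d, weights = 1 - zhat)
lambda.i  <- predict(count.fit, type = "response")
# M-step for the zero part: a binomial-type GLM with E(z_i) as the response
# maximizes sum(E(z_i) log p_i + (1 - E(z_i)) log(1 - p_i))
zero.fit <- glm(zhat ~ x, family = quasibinomial, data = d)
p.i      <- predict(zero.fit, type = "response")
# E-step: update the expected latent indicators
zhat <- ifelse(d$y == 0, p.i / (p.i + (1 - p.i) * exp(-lambda.i)), 0)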
If you want to do this for real data, as opposed to just understanding the algorithm, R packages already exist; here's an example using the pscl library: http://www.ats.ucla.edu/stat/r/dae/zipoisson.htm
EDIT: I should emphasize that what we are doing is maximizing the expected value of the complete-data log-likelihood, NOT maximizing the complete-data log-likelihood with the expected values of the missing data/latent variables plugged in. As it happens, if the complete-data log-likelihood is linear in the missing data, as it is here, the two approaches are the same, but otherwise, they aren't.
Best Answer
Method 0: The lazy statistician.
Note that for $y \neq 0$ we have $f(y) = (1-\pi) p_y$ where $p_y$ is the probability that a Poisson random variable takes value $y$. Since the term corresponding to $y = 0$ does not affect the expected value, our knowledge of the Poisson and the linearity of expectation immediately tells us that $$ \mu = (1-\pi) \lambda $$ and $$ \mathbb E Y^2 = (1-\pi) (\lambda^2 + \lambda) \> . $$
A little algebra and the identity $\mathrm{Var}(Y) = \mathbb E Y^2 - \mu^2$ yield the result.
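Spelled out, since $\mu = (1-\pi)\lambda$, $$ \mathrm{Var}(Y) = (1-\pi)(\lambda^2 + \lambda) - (1-\pi)^2\lambda^2 = (1-\pi)\lambda + \pi(1-\pi)\lambda^2 = \mu + \frac{\pi}{1-\pi}\mu^2 \> . $$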
Method 1: A probabilistic argument.
It's often helpful to have a simple probabilistic model for how a distribution arises. Let $Z \sim \mathrm{Ber}(1-\pi)$ and $Y \sim \mathrm{Poi}(\lambda)$ be independent random variables. Define $$ X = Z \cdot Y \>. $$ Then, it is easy to see that $X$ has the desired distribution $f$. To check this, note that $\renewcommand{\Pr}{\mathbb P}\Pr(X = 0) = \Pr(Z=0) + \Pr(Z=1, Y=0) = \pi + (1-\pi) e^{-\lambda}$ by independence. Similarly $\Pr(X = k) = \Pr(Z=1, Y=k)$ for $k \neq 0$.
From this, the rest is easy, since by the independence of $Z$ and $Y$, and the fact that $Z^2 = Z$ (as $Z$ takes only the values $0$ and $1$), $$ \mu = \mathbb E X = \mathbb E Z Y = (\mathbb E Z) (\mathbb E Y) = (1-\pi)\lambda \>, $$ and, $$ \mathrm{Var}(X) = \mathbb E X^2 - \mu^2 = (\mathbb E Z)(\mathbb E Y^2) - \mu^2 = (1-\pi)(\lambda^2 + \lambda) - \mu^2 = \mu + \frac{\pi}{1-\pi}\mu^2 \> . $$
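The construction also makes simulation trivial, so the moment formulas are easy to check numerically. A quick sketch, with $\pi = 0.1$ and $\lambda = 1$ chosen for illustration:
set.seed(1)
n   <- 1e6
pi0 <- 0.1
lam <- 1
z <- rbinom(n, 1, 1 - pi0)   # Z ~ Ber(1 - pi)
y <- rpois(n, lam)           # Y ~ Poi(lambda), independent of Z
x <- z * y                   # X = Z * Y
mean(x)   # should be close to (1 - pi0) * lam = 0.9
var(x)    # should be close to 0.9 + (0.1 / 0.9) * 0.9^2 = 0.99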
Method 2: Direct calculation.
The mean is easily obtained by a slight trick of pulling one $\lambda$ out and rewriting the limits of the sum. $$ \mu = \sum_{k=1}^\infty (1-\pi) k e^{-\lambda} \frac{\lambda^k}{k!} = (1-\pi) \lambda e^{-\lambda} \sum_{j=0}^\infty \frac{\lambda^j}{j!} = (1-\pi) \lambda \> . $$
A similar trick works for the second moment: $$ \mathbb E X^2 = (1-\pi) \sum_{k=1}^\infty k^2 e^{-\lambda} \frac{\lambda^k}{k!} = (1-\pi)\lambda e^{-\lambda} \sum_{j=0}^\infty (j+1) \frac{\lambda^j}{j!} = (1-\pi)(\lambda^2 + \lambda) \>, $$ from which point we can proceed with the algebra as in the first method.
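Both sums are easy to verify numerically by truncating the series; a quick check with $\pi = 0.1$ and $\lambda = 1$, using the fact that dpois(k, lam) is exactly $e^{-\lambda}\lambda^k/k!$:
pi0 <- 0.1
lam <- 1
k   <- 1:200
sum((1 - pi0) * k   * dpois(k, lam))   # mean: (1 - pi0) * lam = 0.9
sum((1 - pi0) * k^2 * dpois(k, lam))   # E X^2: (1 - pi0) * (lam^2 + lam) = 1.8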
Addendum: This details a couple of tricks used in the calculations above.
First recall that $\sum_{k=0}^\infty \frac{\lambda^k}{k!} = e^\lambda$.
Second, note that $$ \sum_{k=0}^\infty k \frac{\lambda^k}{k!} = \sum_{k=1}^\infty k \frac{\lambda^k}{k!} = \sum_{k=1}^\infty \frac{\lambda^k}{(k-1)!} = \sum_{k=1}^\infty \frac{\lambda \cdot \lambda^{k-1}}{(k-1)!} = \lambda \sum_{j=0}^\infty \frac{\lambda^j}{j!} = \lambda e^{\lambda} \>, $$ where the substitution $j = k-1$ was made in the second-to-last step.
In general, for the Poisson, it is easy to calculate the factorial moments $\mathbb E X^{(n)} = \mathbb E X(X-1)(X-2)\cdots(X-n+1)$ since $$ e^\lambda \mathbb E X^{(n)} = \sum_{k=n}^\infty k(k-1)\cdots(k-n+1) \frac{\lambda^k}{k!} = \sum_{k=n}^\infty \frac{\lambda^n \lambda^{k-n}}{(k-n)!} = \lambda^n \sum_{j=0}^\infty \frac{\lambda^j}{j!} = \lambda^n e^\lambda \>, $$ so $\mathbb E X^{(n)} = \lambda^n$. We get to "skip" to the $n$th index for the start of the sum in the first equality because, for any $0 \leq k < n$, exactly one term in the product $k(k-1)\cdots(k-n+1)$ is zero, making the whole product zero.
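In particular, the second Poisson moment used above follows immediately from the first two factorial moments: $$ \mathbb E X^2 = \mathbb E X(X-1) + \mathbb E X = \lambda^2 + \lambda \> . $$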