This uses some nice properties of $\epsilon$. Let the $Y_{ni}$ be i.i.d. random variables taking values in $\{0,1,2,\ldots\}$ with respective probabilities $\{p_0,p_1,\ldots\}$. The branching process is then $X_n=\sum_{i=1}^{X_{n-1}}Y_{ni}$, where by convention the sum is zero if $X_{n-1}=0$; set $Z_n=\epsilon^{X_n}$.
Since $x\mapsto\epsilon^x$ is injective for $\epsilon\in(0,1)$, conditioning on $Z_{n-1}$ effectively determines $X_{n-1}$, so that
$\mathbb{E}[Z_n|Z_{n-1}]= \mathbb{E}[\epsilon^{\sum_{i=1}^{X_{n-1}}Y_{ni}}|Z_{n-1}]=\mathbb{E}[\epsilon^{Y_{n1}}]^{X_{n-1}}$
where I used the independence of the $Y_{ni}$, and that $X_{n-1}$ is determined by $Z_{n-1}$, to get rid of the conditioning. So for this to be a martingale we need to show $\mathbb{E}[\epsilon^{Y_{n1}}]=\epsilon$. Notice that we can write this expectation explicitly:
$\mathbb{E}[\epsilon^{Y_{n1}}]=\sum_{k=0}^\infty \epsilon^kp_k$
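This sum is just the offspring probability generating function evaluated at $\epsilon$. As a sanity check, here is a minimal sketch that evaluates it for a hypothetical offspring distribution $p=(0.2,0.3,0.5)$ and compares it against a Monte Carlo estimate:

```python
import random

# Hypothetical offspring distribution: p[k] = P(Y = k) for k = 0, 1, 2.
p = [0.2, 0.3, 0.5]

def pgf(s, p):
    """Probability generating function E[s^Y] = sum_k s^k * p[k]."""
    return sum(pk * s**k for k, pk in enumerate(p))

# Monte Carlo estimate of E[s^Y] at an arbitrary point s = 0.5.
random.seed(0)
s = 0.5
samples = random.choices(range(len(p)), weights=p, k=100_000)
mc = sum(s**y for y in samples) / len(samples)
exact = pgf(s, p)  # 0.2 + 0.3*0.5 + 0.5*0.25 = 0.475
```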
Call $D$ the event that extinction occurs. For $D$ to occur, the subtree rooted at each first-generation member must itself die out. So we should consider what happens on $\{X_1=k\}$ for each $k$; if $k=0$, extinction has already occurred. We write,
$$
\begin{align}
\epsilon:=P(D|X_0=1) & =\sum_{k=0}^\infty P(D\cap \{X_1=k\}|X_0=1)\\
& =\sum_{k=0}^\infty P(D|X_1=k \cap X_0=1)\,P(X_1=k|X_0=1)\\
& =\sum_{k=0}^\infty \epsilon^k p_k
\end{align}
$$
where we used a number of facts: $P(A\cap B|C) = P(A| B\cap C)P(B|C)$, and that the extinction probability of each first-generation subtree equals that of the original tree. More precisely, on $\{X_1\geq 1\}$, the Galton-Watson process started from each first-generation individual is an independent copy of the original process, independent of what happened at $X_0$. In short, the Galton-Watson process is Markov: conditioning on the whole past reduces to conditioning on the previous generation.
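Numerically, $\epsilon$ is the smallest root of $s=\sum_k s^k p_k$ in $[0,1]$, and iterating the generating function from $s=0$ converges monotonically to it. A sketch, again using the hypothetical offspring distribution $p=(0.2,0.3,0.5)$ (mean offspring $1.3>1$, so $\epsilon<1$):

```python
def pgf(s, p):
    """Offspring probability generating function E[s^Y]."""
    return sum(pk * s**k for k, pk in enumerate(p))

def extinction_prob(p, iters=200):
    """Smallest fixed point of s = pgf(s, p) in [0, 1]: iterating
    from s = 0 increases monotonically to the minimal root."""
    s = 0.0
    for _ in range(iters):
        s = pgf(s, p)
    return s

# Supercritical example: mean offspring 0.3 + 2*0.5 = 1.3 > 1.
p = [0.2, 0.3, 0.5]
eps = extinction_prob(p)  # solves 0.5 s^2 - 0.7 s + 0.2 = 0, minimal root 0.4
```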
I will try to summarize the derivation found in Harrison (1985)*:
Let $M_t\equiv \sup \left\{ X_s,0 \le s \le t \right\}$.
The derivation proceeds in two steps:
- Find the joint distribution $F_t(x,y)\equiv P \left\{X_t\leq x,M_t\leq y\right\}$ when $X_t$ is a standard Brownian motion;
- Use a change of measure to obtain the joint distribution of $X_t$ and $M_t$ for general drift $\mu$ and volatility $\sigma$.
The first step goes like this:
Let $X_t$ be the standard Brownian motion, i.e. $\mu=0$ and $\sigma=1$. Then we have:
$$
\begin{align}
F_t(x,y) & = P \left\{X_t\leq x,M_t\leq y\right\} \\
& =P \left\{X_t\leq x\right\}-P \left\{X_t\leq x,M_t>y\right\}\\
& =\Phi\left(\frac{x}{\sqrt{t}}\right)-P \left\{X_t\leq x,M_t>y\right\}
\end{align}
$$
where $\Phi(\cdot)$ denotes the standard normal distribution function.
Now, the term $P\left\{X_t\le x, M_t > y \right\}$ can be calculated heuristically using the so-called reflection principle (note that the restriction $\mu=0$ is critical here): for every sample path of $X$ that hits level $y$ before time $t$ but finishes below level $x$
at time $t$, there is another equally probable path (shown by the dotted line
in Figure 1) that hits $y$ before $t$ and then travels upward at least $y - x$ units to finish above level $y + (y - x) = 2y - x$ at time $t$.
Thus
$$
\begin{align}
P\left\{X_t\le x, M_t > y \right\} & = P\left\{X_t\ge 2y-x \right\} \\
& = P\left\{X_t\le x-2y \right\} = \Phi\left(\frac{x-2y}{\sqrt{t}}\right)
\end{align}
$$
(This argument can be made rigorous using the strong Markov property)
So we have:
$$
P \left\{X_t\leq x,M_t\leq y\right\}=\Phi\left(\frac{x}{\sqrt{t}}\right)-\Phi\left(\frac{x-2y}{\sqrt{t}}\right)
$$
and as a corollary:
$$
P \left\{X_t\in d x,M_t\le y\right\}=g_t(x,y) d x
$$
where
$g_t(x,y)=\frac{1}{\sqrt{t}}\left(\phi\left(\frac{x}{\sqrt{t}}\right)-\phi\left(\frac{x-2y}{\sqrt{t}}\right)\right)$ and $\phi(\cdot)$ is the standard normal density.
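The reflection-principle formula can be sanity-checked against a simulated path maximum. A minimal sketch (the Euler discretisation slightly underestimates the continuous maximum, so the comparison is only approximate; the grid sizes and test point are arbitrary choices):

```python
import math
import random

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def F(t, x, y):
    """P{X_t <= x, M_t <= y} for standard Brownian motion (y >= max(0, x))."""
    return Phi(x / math.sqrt(t)) - Phi((x - 2.0 * y) / math.sqrt(t))

# Monte Carlo check via an Euler discretisation of the path.
random.seed(1)
t, x, y = 1.0, 0.5, 1.0
n_steps, n_paths = 500, 10_000
dt = t / n_steps
hits = 0
for _ in range(n_paths):
    X = M = 0.0
    for _ in range(n_steps):
        X += random.gauss(0.0, math.sqrt(dt))
        M = max(M, X)
    if X <= x and M <= y:
        hits += 1
mc = hits / n_paths  # close to F(t, x, y), up to discretisation bias
```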
With this result in hand, we can proceed to the next step: calculating the joint distribution of $X_t$ and $M_t$ for general values of $\mu$ and $\sigma$.
For this we will need to perform a change of measure:
$$
P\left\{X_t\in d x,M_t\le y\right\}=f_t(x,y) d x
$$
where $f_t(x,y)$ is obtained from $g_t$ through the change of measure, using the Radon-Nikodym derivative together with a rescaling by $\sigma$:
$$
f_t(x,y)=\frac{1}{\sigma}\exp\left(\frac{\mu x}{\sigma^2}-\frac{\mu^2 t}{2\sigma^2} \right) g_t\left(\frac{x}{\sigma},\frac{y}{\sigma}\right)
$$
with $g_t(\cdot,\cdot)$ as defined above.
After some algebraic simplification, $f_t(x,y)$ takes the form of the density from the question.
From there we can perform the integration to get the following result:
$$
F_t(x,y) = \int_{-\infty}^xf_t(z,y) dz = \Phi\left(\frac{x-\mu t}{\sigma\sqrt{t}}\right)-e^{2\mu y/\sigma^2}\Phi\left(\frac{x-2y-\mu t}{\sigma \sqrt{t}}\right)
$$
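As a quick consistency check, this general formula should collapse to the driftless result when $\mu=0$ and $\sigma=1$, since the exponential factor becomes $1$. A sketch:

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def F_general(t, x, y, mu, sigma):
    """P{X_t <= x, M_t <= y} for Brownian motion with drift mu, volatility sigma."""
    s = sigma * math.sqrt(t)
    return (Phi((x - mu * t) / s)
            - math.exp(2.0 * mu * y / sigma**2) * Phi((x - 2.0 * y - mu * t) / s))

def F_standard(t, x, y):
    """Driftless, unit-volatility special case."""
    return Phi(x / math.sqrt(t)) - Phi((x - 2.0 * y) / math.sqrt(t))

# With mu = 0 and sigma = 1 both functions agree at every (t, x, y).
checks = [(1.0, 0.5, 1.0), (2.0, -0.3, 0.8), (0.5, 0.2, 0.5)]
```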
It is straightforward to adapt this to a process started at a level $\alpha$: simply replace $x$ and $y$ by $x-\alpha$ and $y-\alpha$.
If we define $\tau_y$ as the first $t$ at which $X_t=y$ (possibly $+\infty$ if $\mu<0$), then obviously $\tau_y > t \iff M_t < y$. Letting $x \nearrow y$ gives:
$$
\begin{align}
P\left\{\tau_y > t\right\} &= P \left\{M_t < y \right\} = F_t(y, y)\\
&= \Phi\left(\frac{y-\mu t}{\sigma\sqrt{t}}\right)-e^{2\mu y/\sigma^2}\Phi\left(\frac{-y-\mu t}{\sigma \sqrt{t}}\right)
\end{align}
$$
for $y>0$.
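For $\mu<0$ this survival function has a nonzero limit: letting $t\to\infty$ gives $P\{\tau_y=\infty\}=1-e^{2\mu y/\sigma^2}$. A sketch checking that limit numerically (the parameter values are arbitrary):

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def survival(t, y, mu, sigma):
    """P{tau_y > t} for Brownian motion with drift mu, volatility sigma, y > 0."""
    s = sigma * math.sqrt(t)
    return (Phi((y - mu * t) / s)
            - math.exp(2.0 * mu * y / sigma**2) * Phi((-y - mu * t) / s))

mu, sigma, y = -0.5, 1.0, 1.0
p_never = 1.0 - math.exp(2.0 * mu * y / sigma**2)  # P{tau_y = infinity}, mu < 0
```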
* Harrison, J. M. (1985). Brownian motion and stochastic flow systems (pp. 89-91). New York: Wiley.
Best Answer
I've found a solution that uses the law of total variance, conditioning on $Z_{t-1}$ (below I write $\mathbb{E}[\xi]=\mu$, $\text{Var}(\xi)=\sigma^2$, and use $\mathbb{E}[Z_{t-1}]=\mu^{t-1}$, which assumes $Z_0=1$):
$$ \begin{align} \text{Var}(Z_t)&=\mathbb{E}[\text{Var}(Z_t|Z_{t-1})]+\text{Var}(\mathbb{E}[Z_t|Z_{t-1}])\\ &=\mathbb{E}[Z_{t-1}\cdot\text{Var}(\xi)]+\text{Var}(Z_{t-1}\cdot\mathbb{E}[\xi])\\ &=\sigma^2\mu^{t-1}+\mu^2\text{Var}(Z_{t-1})\\ \end{align} $$
Multiplying through by $\mu^{-2t}$ gives
$$ \mu^{-2t}\text{Var}(Z_t)=\frac{\sigma^2}{\mu^{t+1}}+\mu^{-2(t-1)}\text{Var}(Z_{t-1}). $$
Pulling the powers of $\mu$ inside the variances (using $\text{Var}(aX)=a^2\text{Var}(X)$) yields
$$ \text{Var}(\mu^{-t}Z_t)=\frac{\sigma^2}{\mu^{t+1}}+\text{Var}(\mu^{-(t-1)}Z_{t-1}) $$
which can be written in terms of $W_t=\mu^{-t}Z_t$ as
$$ \text{Var}(W_t)=\frac{\sigma^2}{\mu^{t+1}}+\text{Var}(W_{t-1}). $$
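Unrolling this recursion with $\text{Var}(Z_0)=0$ (i.e. $Z_0=1$) gives, for $\mu\neq 1$, the closed form $\text{Var}(Z_t)=\sigma^2\mu^{t-1}\frac{\mu^t-1}{\mu-1}$; equivalently, $\text{Var}(W_t)\to\frac{\sigma^2}{\mu(\mu-1)}$ as $t\to\infty$ when $\mu>1$. A sketch checking the recursion against the closed form (the values $\mu=1.5$, $\sigma^2=0.25$ are arbitrary):

```python
def var_Z_recursive(t, mu, sigma2):
    """Var(Z_t) from Var(Z_t) = sigma^2 mu^(t-1) + mu^2 Var(Z_{t-1}), Var(Z_0) = 0."""
    v = 0.0
    for s in range(1, t + 1):
        v = sigma2 * mu**(s - 1) + mu**2 * v
    return v

def var_Z_closed(t, mu, sigma2):
    """Closed form for mu != 1: sigma^2 mu^(t-1) (mu^t - 1) / (mu - 1)."""
    return sigma2 * mu**(t - 1) * (mu**t - 1.0) / (mu - 1.0)

mu, sigma2 = 1.5, 0.25
# Var(W_t) = mu^(-2t) Var(Z_t) approaches sigma^2 / (mu (mu - 1)) for mu > 1.
var_W_10 = mu**(-20) * var_Z_recursive(10, mu, sigma2)
```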