Expected Value in Stochastic Processes – Calculating Time Spent in an Absorbing Markov Chain State

expected-value, markov-process, stochastic-processes

It's well known that, if $Q$ is the matrix of transient state transition probabilities, and
$$ N = \sum_{n=0}^{\infty} Q^n = (I - Q)^{-1}$$
then $N_{ij}$ describes the expected number of times the chain is in state $j$ before absorption, given that it starts in state $i$ (source: Wikipedia, Absorbing Markov chain).
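For concreteness, here is a minimal numerical sketch of this fact, assuming a hypothetical chain with two transient states and made-up transition probabilities; it checks $(I - Q)^{-1}$ against the truncated series $\sum_n Q^n$:

```python
# Minimal check of N = (I - Q)^{-1}, for a hypothetical chain with two
# transient states (the numbers are for illustration only).
import numpy as np

Q = np.array([[0.4, 0.3],    # transitions among the transient states
              [0.2, 0.5]])
N = np.linalg.inv(np.eye(2) - Q)          # fundamental matrix

# The same matrix as a truncated Neumann series sum_{n=0}^{K} Q^n.
K = 200
N_series = sum(np.linalg.matrix_power(Q, n) for n in range(K + 1))

print(N)          # N[i, j] = expected visits to transient state j starting from i
print(N_series)   # agrees with N up to truncation error
```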

I'm looking for the expected number of times the chain is in state $j$, given that it starts in state $i$ $\textbf{and}$ eventually absorbs in state $k$.

Motivation:

I'm trying to model the spread of a mutant gene through a population, and to do this I'm using a Markov chain. The absorbing states are $0$ and $M$, representing the gene dying out or being carried by every member of the population. I want to calculate the expected number of times a specific number of people carry the gene before it dies out, given that it dies out.

Best Answer

Using the Wikipedia notation, the full matrix of transition probabilities is written as $$P = \left(\begin{array}{cc} Q & R \\ 0 & I \end{array}\right)$$ with the states organized so that the last states are all absorbing. In the OP's example from population genetics, the states would be organized as $1, 2, \ldots, M-1, 0, M$.
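To illustrate this block structure, here is a sketch that assembles $P$ for a hypothetical birth-death chain with $M = 4$, used only as a stand-in for the OP's population model; the transition probabilities are made up:

```python
# Sketch of the canonical block form P = [[Q, R], [0, I]] for a hypothetical
# birth-death chain with M = 4: transient states 1, 2, 3 (number of carriers),
# followed by the absorbing states 0 and M.
import numpy as np

M = 4
p_up, p_down = 0.3, 0.3            # hypothetical per-step up/down probabilities
transient = [1, 2, 3]              # rows/columns of Q, in this order
absorbing = [0, M]                 # columns of R: absorb at 0 or at M

Q = np.zeros((3, 3))
R = np.zeros((3, 2))
for idx, s in enumerate(transient):
    Q[idx, idx] = 1.0 - p_up - p_down          # probability of staying put
    if s - 1 in transient:
        Q[idx, transient.index(s - 1)] = p_down
    else:
        R[idx, absorbing.index(s - 1)] = p_down
    if s + 1 in transient:
        Q[idx, transient.index(s + 1)] = p_up
    else:
        R[idx, absorbing.index(s + 1)] = p_up

P = np.block([[Q, R],
              [np.zeros((2, 3)), np.eye(2)]])
print(P)   # each row sums to 1; the last two states (0 and M) are absorbing
```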

If $\tau$ denotes the first time the Markov chain enters the set of absorbing states, the absorption probability for state $k$ given that the Markov chain starts in state $i$ is $$B_{ik} := P(X_{\tau} = k \mid X_0 = i).$$ Observe that if $X_n = j$ for a transient state $j$ then $\tau > n$, and by the time homogeneity of the Markov chain $$P(X_{\tau} = k \mid X_n = j) = B_{jk}.$$ From Wikipedia it follows that $$B_{ik} = (NR)_{ik}.$$
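Numerically, $B = NR$ takes only a couple of lines. The sketch below reuses the same hypothetical $M = 4$ chain (rebuilt so the snippet runs on its own) and checks that each row of $B$ sums to $1$, i.e. that absorption is certain:

```python
# Absorption probabilities B = N R for the hypothetical birth-death chain above.
import numpy as np

p = 0.3
Q = np.array([[0.4, p,   0.0],
              [p,   0.4, p  ],
              [0.0, p,   0.4]])           # transient states 1, 2, 3
R = np.array([[p,   0.0],
              [0.0, 0.0],
              [0.0, p  ]])                # absorbing states 0 and M

N = np.linalg.inv(np.eye(3) - Q)          # fundamental matrix
B = N @ R                                 # B[i, k] = P(absorb in k | start in i)
print(B)
print(B.sum(axis=1))                      # each row sums to 1: absorption is certain
```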

The expected number of visits to state $j$, conditionally on the chain starting in $i$ and being absorbed in $k$, is $$\xi_{ik}(j) = E\left( \sum_{n=0}^{\infty} 1(X_n = j) \ \middle| \ X_0 = i, X_{\tau} = k\right) = \sum_{n=0}^{\infty} P(X_n = j \mid X_0 = i, X_{\tau} = k) .$$

The probabilities in the infinite sum can be rewritten as follows: \begin{align*} P(X_n = j \mid X_0 = i, X_{\tau} = k) & = \frac{P(X_{\tau} = k, X_n = j \mid X_0 = i)}{P(X_{\tau} = k \mid X_0 = i)} \\ & = \frac{P(X_{\tau} = k \mid X_n = j, X_0 = i)\,P(X_n = j \mid X_0 = i)}{P(X_{\tau} = k \mid X_0 = i)} \\ & = \frac{P(X_{\tau} = k \mid X_n = j)}{P(X_{\tau} = k \mid X_0 = i)} (Q^n)_{ij} \\ & = \frac{B_{jk}}{B_{ik}}(Q^n)_{ij}, \end{align*} where the third equality uses the Markov property together with the fact that $j$ is transient, so that $P(X_n = j \mid X_0 = i) = (Q^n)_{ij}$.

Summing over $n$ gives \begin{align*} \xi_{ik}(j) & = \frac{B_{jk}}{B_{ik}}\sum_{n=0}^{\infty} (Q^n)_{ij} = \frac{B_{jk}}{B_{ik}}N_{ij}, \end{align*} where the absorption probabilities $B_{jk}$ and $B_{ik}$ can be computed from $N$ and $R$ using the formula above.
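Here is a sketch that evaluates $\xi_{ik}(j) = \frac{B_{jk}}{B_{ik}} N_{ij}$ for the same hypothetical chain and compares it with a Monte Carlo estimate obtained by simulating paths and keeping only those absorbed in $k$; all numbers are for illustration only:

```python
# xi_{ik}(j) = (B[j, k] / B[i, k]) * N[i, j], checked against a Monte Carlo
# estimate that conditions on the absorbing state (hypothetical chain as above).
import numpy as np

rng = np.random.default_rng(0)
p = 0.3
Q = np.array([[0.4, p, 0.0], [p, 0.4, p], [0.0, p, 0.4]])   # transient states 1, 2, 3
R = np.array([[p, 0.0], [0.0, 0.0], [0.0, p]])              # absorbing states 0, M
N = np.linalg.inv(np.eye(3) - Q)
B = N @ R

i, j, k = 0, 1, 0          # start in state 1, count visits to state 2, absorb at 0
xi_formula = B[j, k] / B[i, k] * N[i, j]

# Monte Carlo check: simulate the full chain, keep only the paths absorbed in k,
# and average the number of visits to j along the kept paths.
P = np.block([[Q, R], [np.zeros((2, 3)), np.eye(2)]])       # indices 3, 4 = absorbing
visits = []
for _ in range(100_000):
    state, count = i, 0
    while state < 3:                   # indices 0, 1, 2 are the transient states
        count += (state == j)
        state = rng.choice(5, p=P[state])
    if state == 3 + k:                 # path absorbed in the k-th absorbing state
        visits.append(count)

print(xi_formula, np.mean(visits))     # the two values should be close
```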
