[Math] Probability of visiting state $s_1$ of a Markov chain more than $N$ times in $L$ steps.

combinatorics, markov-chains, probability, stochastic-processes

Assume we have a two-state Markov chain, with $s_1$ and $s_0$ denoting the two states. The initial state of the Markov chain is either $s_1$ or $s_0$ with probability $p_1$ or $p_0$, respectively. The transition probability between states $s_i$ and $s_j$ is $p_{ij}$.

My question is: given the initial state probability $p_1$ for state $s_1$, what is the probability that after $L$ steps, state $s_1$ has been visited more than $N$ times? Notice that remaining in state $s_1$ counts as another visit to this state.

I thought that this was a quite straightforward question, but after some research in probability and stochastic processes books I found no theorem with this result. I tried to prove it myself, but the solution looks quite tedious.

I would really appreciate it if somebody could point me in the right direction for calculating this probability.

cheers

Pol

Best Answer

This is my answer to my own question. I wanted to get an expression that is easy to work with, but this is the best I got.

The question is to compute $P(N_1 \geq N | L)$, that is, the probability that the number of visits to state $s_1$ (denoted by $N_1$) is at least $N$ after $L$ state transitions. If state $s_1$ is revisited from state $s_1$, this is counted as another visit. I count the visits to state $s_1$ as the number of state transitions that start from $s_1$; that is, the number of times that the transitions $s_1 \rightarrow s_0$ and $s_1 \rightarrow s_1$ occur.

I denote by $P_1(n,l|s_i)$ the probability of visiting state $s_1$ exactly $n$ times in $l$ state transitions when the initial state is $s_i$. Then $P(N_1 \geq N | L)$ is \begin{equation} P(N_1 \geq N | L)=p_1\sum_{i=N}^{L}P_1(i,L|s_1)+p_0\sum_{i=N}^{L-1}P_1(i,L|s_0), \end{equation}

where the second sum only runs up to $L-1$ because, when the initial state is $s_0$, at most $L-1$ transitions can start from $s_1$. I now compute $P_1(i,l|s_1)$:

\begin{equation} P_1(i,l|s_1)=\sum_{j=1}^{l-i+1} \underbrace{p_{11}^{i-j}p_{10}^{j}\binom{i-1}{j-1}}_{A0} \underbrace{p_{01}^{j-1}p_{00}^{l-i-j+1}\binom{l-i}{j-1}}_{B0}+ \sum_{j=1}^{l-i} \underbrace{p_{11}^{i-j}p_{10}^{j}\binom{i}{j}}_{A1}\underbrace{p_{01}^{j}p_{00}^{l-i-j}\binom{l-i-1}{j-1}}_{B1}, \end{equation}

where in $A0$-$B0$ and $A1$-$B1$ the final state is assumed to be $s_0$ and $s_1$, respectively. First I explain $A0$-$B0$. To visit state $s_1$ a total of $i$ times, it is necessary to have exactly $i$ state transitions of the form $s_1 \rightarrow s_0$ or $s_1 \rightarrow s_1$. In particular, there are $i-j$ $s_1\rightarrow s_1$ and $j$ $s_1 \rightarrow s_0$ state transitions. Since the final state is $s_0$ and the $s_1 \rightarrow s_0$ transition occurs $j$ times, there must be a total of $j-1$ $s_0 \rightarrow s_1$ state transitions. The total number of state transitions is $l$, so the total number of $s_0 \rightarrow s_0$ state transitions must be $l-i-j+1$. Now let's focus on $A0$: after each $s_0 \rightarrow s_1$ transition a set of $s_1\rightarrow s_1$ state transitions may occur (note 1), so the total number of $s_1\rightarrow s_1$ transition sets is $j$ (i.e., the number of $s_0 \rightarrow s_1$ transitions plus one). In total there must be $i-j$ $s_1\rightarrow s_1$ state transitions spread over $j$ transition sets, and the total number of possible combinations is the weak composition (note 2) of $j$ natural numbers that sum up to $i-j$ (note 3). $B0$ is obtained similarly: after each $s_1\rightarrow s_0$ transition there may be a set of $s_0\rightarrow s_0$ transitions. Hence there are a total of $l-i-j+1$ $s_0\rightarrow s_0$ transitions spread over $j$ transition sets, and the total number of possible combinations is the weak composition of $j$ natural numbers that sum up to $l-i-j+1$.

Now I will explain $A1$-$B1$. In this case the final state is $s_1$, and the number of $s_1 \rightarrow s_0$ state transitions is the same as the number of $s_0 \rightarrow s_1$ state transitions. The total number of $s_1 \rightarrow s_0$ and of $s_0 \rightarrow s_1$ state transitions is $j$ each, the total number of $s_1 \rightarrow s_1$ is $i-j$, and the total number of $s_0 \rightarrow s_0$ is $l-i-j$. The total number of $s_1 \rightarrow s_1$ state transition sets is $j+1$, and the total number of $s_0 \rightarrow s_0$ state transition sets is $j$. The total number of possible combinations of having $i-j$ $s_1 \rightarrow s_1$ state transitions spread over $j+1$ transition sets is the weak composition of $j+1$ natural numbers that add to $i-j$. Following the same rationale, the total number of possible combinations of having $l-i-j$ $s_0 \rightarrow s_0$ state transitions spread over $j$ transition sets is the weak composition of $j$ natural numbers that add to $l-i-j$.
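As a sanity check, here is a direct Python transcription of this closed form (a sketch; the function name and the `min(...)` sum limits are mine, the latter just skipping terms whose binomial coefficient is zero). In my numerical checks, the all-$s_1$ path (the case $i=l$, with probability $p_{11}^{l}$) is not captured by either sum, so it is added as an explicit boundary case here:

```python
from math import comb

def P1_from_s1(i, l, p11, p10, p01, p00):
    """P_1(i, l | s1): probability of exactly i visits to s1 in l state
    transitions, starting from s1 (a visit = a transition leaving s1)."""
    # Boundary case outside both sums: the chain never leaves s1 (i = l).
    total = p11 ** l if i == l else 0.0
    # A0-B0 terms: final state is s0; j = number of s1 -> s0 transitions.
    for j in range(1, min(i, l - i + 1) + 1):
        total += (p11 ** (i - j) * p10 ** j * comb(i - 1, j - 1)      # A0
                  * p01 ** (j - 1) * p00 ** (l - i - j + 1)
                  * comb(l - i, j - 1))                               # B0
    # A1-B1 terms: final state is s1; the sum is empty unless l > i.
    for j in range(1, min(i, l - i) + 1):
        total += (p11 ** (i - j) * p10 ** j * comb(i, j)              # A1
                  * p01 ** j * p00 ** (l - i - j)
                  * comb(l - i - 1, j - 1))                           # B1
    return total
```

For instance, with $p_{11}=0.7$, $p_{10}=0.3$, $p_{01}=0.4$, $p_{00}=0.6$, the values $P_1(i,l|s_1)$ for $i=1,\dots,l$ sum to $1$, as they should.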

To compute $P_1(i,l|s_0)$ we can make use of $P_1(i,l|s_1)$,

\begin{equation} P_1(i,l|s_0)=\sum_{j=1}^{l-i}p_{00}^{j-1}p_{01}P_1(i,l-j|s_1), \end{equation}

where the term $p_{00}^{j-1}p_{01}$ is the probability of remaining in $s_0$ for the first $j-1$ transitions and then making the first transition from $s_0$ to $s_1$ at step $j$.
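To keep the sketch self-contained, here is this recursion in Python, with $P_1(i,l|s_1)$ supplied by a simple dynamic program over (current state, visits so far) instead of the closed form above (the function names are mine):

```python
def P1_dp(i, l, start, p11, p10, p01, p00):
    """P_1(i, l | start): exact probability of i visits to s1 in l state
    transitions, via dynamic programming over (state, visit count).
    States are encoded as 1 = s1, 0 = s0."""
    P = {(1, 1): p11, (1, 0): p10, (0, 1): p01, (0, 0): p00}
    dist = {(start, 0): 1.0}  # (current state, visits so far) -> probability
    for _ in range(l):
        nxt = {}
        for (s, v), pr in dist.items():
            v2 = v + 1 if s == 1 else v  # a transition leaving s1 is a visit
            for t in (0, 1):
                nxt[(t, v2)] = nxt.get((t, v2), 0.0) + pr * P[(s, t)]
        dist = nxt
    return sum(pr for (s, v), pr in dist.items() if v == i)

def P1_from_s0(i, l, p11, p10, p01, p00):
    """The recursion: stay in s0 for j-1 transitions, jump to s1 on the
    j-th, then collect i visits in the remaining l-j transitions."""
    return sum(p00 ** (j - 1) * p01 * P1_dp(i, l - j, 1, p11, p10, p01, p00)
               for j in range(1, l - i + 1))
```

Conditioning on the first hitting time $j$ of $s_1$ is exhaustive here, since any path from $s_0$ with at least one visit must reach $s_1$ at some step.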

It still remains to check all possible cases and the limits of the sums. However, I think that this is pretty much it.
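One way to check the cases and the sum limits is to compare against an exact brute-force computation of $P(N_1 \geq N \mid L)$ that enumerates all $2^L$ sample paths (a verification sketch, only feasible for small $L$; the function name is mine):

```python
from itertools import product

def prob_at_least_N(N, L, p1, p11, p10, p01, p00):
    """Exact P(N_1 >= N | L) by enumerating every sample path of length L.
    States are encoded as 1 = s1, 0 = s0; a visit = a transition leaving s1."""
    P = {(1, 1): p11, (1, 0): p10, (0, 1): p01, (0, 0): p00}
    total = 0.0
    for start in (1, 0):
        p_start = p1 if start == 1 else 1.0 - p1
        for path in product((0, 1), repeat=L):  # successor state at each step
            prob, state, visits = p_start, start, 0
            for nxt in path:
                if state == 1:
                    visits += 1
                prob *= P[(state, nxt)]
                state = nxt
            if visits >= N:
                total += prob
    return total
```

For example, with $N=1$, $L=1$ the result is simply $p_1$ (only a start in $s_1$ yields a transition leaving $s_1$), which is a handy smoke test.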

Notes

  1. (note 1) A state transition set is a group of transitions of the same kind that occur one after the other; the sets can be empty.
  2. (note 2) The weak composition counts the ordered ways of choosing $j$ natural numbers (including $0$) so that they sum up to $m$, and is given by $\binom{m+j-1}{j-1}$.
  3. (note 3) An explanation for that is the following: each time we go from $s_0$ to state $s_1$ we can stay in $s_1$ for $n$ transitions or go back to $s_0$ directly. In total we must have $i-j$ $s_1\rightarrow s_1$ state transitions, so the sum of all the $n$'s (one for each time we go from $s_0$ to $s_1$) must equal $i-j$; the total number of possible combinations of $n$'s is the weak composition of $j$ natural numbers that sum up to $i-j$.
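The stars-and-bars count in note 2 is easy to verify by brute force for small values (a quick sketch; the function name is mine):

```python
from math import comb
from itertools import product

def count_weak_compositions(m, j):
    """Count the ordered ways to write m as a sum of j naturals >= 0
    (weak compositions), by brute-force enumeration."""
    return sum(1 for parts in product(range(m + 1), repeat=j)
               if sum(parts) == m)

# Matches the stars-and-bars formula C(m + j - 1, j - 1).
assert all(count_weak_compositions(m, j) == comb(m + j - 1, j - 1)
           for m in range(6) for j in range(1, 5))
```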