This is my answer to my own question. I wanted an expression that is easy to work with, but this is the best I could get.
The question is to compute the probability $P(N_1 \geq N | L)$, that is, the probability that the number of visits to state $s_1$ (denoted by $N_1$) is at least $N$ after $L$ state transitions. If state $s_1$ is revisited from state $s_1$, this is counted as another visit. I count the visits to state $s_1$ as the number of state transitions that start from $s_1$; that is, the number of times that the transitions $s_1 \rightarrow s_0$ and $s_1 \rightarrow s_1$ occur.
I denote by $P_1(n,l|s_i)$ the probability of visiting exactly $n$ times state $s_1$ in $l$ state transitions when the initial state is $s_i$. Then $P(N_1 \geq N | L)$ is
\begin{equation}
P(N_1 \geq N | L)=p_1\sum_{i=N}^{L}P_1(i,L|s_1)+p_0\sum_{i=N}^{L-1}P_1(i,L|s_0),
\end{equation}
where $p_1$ and $p_0$ are the probabilities that the initial state is $s_1$ and $s_0$, respectively; the second term sums only up to $L-1$ because when the initial state is $s_0$ the first transition starts from $s_0$ and thus does not count as a visit. I now compute $P_1(i,l|s_1)$:
\begin{equation}
P_1(i,l|s_1)=\sum_{j=1}^{l-i+1} \underbrace{p_{11}^{i-j}p_{10}^{j}\binom{i-1}{j-1}}_{A0} \underbrace{p_{01}^{j-1}p_{00}^{l-i-j+1}\binom{l-i}{j-1}}_{B0}+ \sum_{j=1}^{l-i} \underbrace{p_{11}^{i-j}p_{10}^{j}\binom{i}{j}}_{A1}\underbrace{p_{01}^{j}p_{00}^{l-i-j}\binom{l-i-1}{j-1}}_{B1},
\end{equation}
where in $A0$-$B0$ and $A1$-$B1$ the final state is assumed to be $s_0$ and $s_1$, respectively, and the second sum is empty unless $l>i$.

First I explain $A0$-$B0$. To visit state $s_1$ a total of $i$ times it is necessary to have exactly $i$ state transitions of type $s_1 \rightarrow s_0$ or $s_1 \rightarrow s_1$. In particular, there are $i-j$ transitions $s_1\rightarrow s_1$ and $j$ transitions $s_1 \rightarrow s_0$. Since the final state is $s_0$ and the transition $s_1 \rightarrow s_0$ occurs $j$ times, there must be a total of $j-1$ $s_0 \rightarrow s_1$ transitions. Since the total number of transitions is $l$, the number of $s_0 \rightarrow s_0$ transitions must be $l-i-j+1$. Now let's focus on $A0$: after each $s_0 \rightarrow s_1$ transition a set of $s_1\rightarrow s_1$ transitions may occur (note 1), and the total number of $s_1\rightarrow s_1$ transition sets is $j$ (i.e., the number of $s_0 \rightarrow s_1$ transitions plus one, to account for the set at the start of the chain). In total there must be $i-j$ $s_1\rightarrow s_1$ transitions spread over $j$ sets, so the total number of possible combinations is the weak composition (note 2) of $j$ naturals that sum up to $i-j$ (note 3). $B0$ is obtained similarly: after each $s_1\rightarrow s_0$ transition there may be a set of $s_0\rightarrow s_0$ transitions. Hence there are $l-i-j+1$ $s_0\rightarrow s_0$ transitions spread over $j$ sets, and the total number of possible combinations is the weak composition of $j$ natural numbers that sum up to $l-i-j+1$.
Now I will explain $A1$-$B1$. In this case the final state is $s_1$, so the number of $s_1 \rightarrow s_0$ transitions equals the number of $s_0 \rightarrow s_1$ transitions. The number of $s_1 \rightarrow s_0$ transitions (and of $s_0 \rightarrow s_1$ transitions) is $j$, the number of $s_1 \rightarrow s_1$ transitions is $i-j$, and the number of $s_0 \rightarrow s_0$ transitions is $l-i-j$. The number of $s_1 \rightarrow s_1$ transition sets is $j+1$, and the number of $s_0 \rightarrow s_0$ transition sets is $j$. The total number of possible combinations of having $i-j$ $s_1 \rightarrow s_1$ transitions spread over $j+1$ sets is the weak composition of $j+1$ natural numbers that add up to $i-j$. Following the same rationale, the total number of possible combinations of having $l-i-j$ $s_0 \rightarrow s_0$ transitions spread over $j$ sets is the weak composition of $j$ natural numbers that add up to $l-i-j$.
To compute $P_1(i,l|s_0)$ we can make use of $P_1(i,l|s_1)$,
\begin{equation}
P_1(i,l|s_0)=\sum_{j=1}^{l-i}p_{00}^{j-1}p_{01}P_1(i,l-j|s_1),
\end{equation}
where the term $p_{00}^{j-1}p_{01}$ is the probability that the chain first moves from $s_0$ to $s_1$ on the $j$-th transition (after $j-1$ $s_0 \rightarrow s_0$ transitions).
All the edge cases and the limits of the sums still need to be checked, but I think this is pretty much it.
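As a sanity check on the expressions above, here is a sketch in Python (the function names are mine) that implements the two formulas and compares them against brute-force enumeration of all state sequences. Two paths fall outside the sums' limits and are added explicitly here: the all-$s_1$ path when $l=i$, and the never-leave-$s_0$ path when $i=0$; these are exactly the kind of unchecked edge cases mentioned above.

```python
from itertools import product
from math import comb

def P1_s1(i, l, p11, p10, p01, p00):
    """P_1(i, l | s_1): probability of exactly i visits to s1
    (= transitions starting from s1) in l transitions, starting in s1."""
    if i == 0:
        return 1.0 if l == 0 else 0.0  # starting in s1, the first transition is a visit
    total = 0.0
    for j in range(1, l - i + 2):      # final state s0; j = number of s1->s0 transitions
        if i - j < 0:
            break
        total += (p11 ** (i - j) * p10 ** j * comb(i - 1, j - 1)
                  * p01 ** (j - 1) * p00 ** (l - i - j + 1) * comb(l - i, j - 1))
    for j in range(1, l - i + 1):      # final state s1; this sum is empty unless l > i
        if i - j < 0:
            break
        total += (p11 ** (i - j) * p10 ** j * comb(i, j)
                  * p01 ** j * p00 ** (l - i - j) * comb(l - i - 1, j - 1))
    if l == i:
        total += p11 ** l              # edge case: never leave s1 (outside both sums)
    return total

def P1_s0(i, l, p11, p10, p01, p00):
    """P_1(i, l | s_0): same count but starting in s0."""
    total = p00 ** l if i == 0 else 0.0  # edge case: never leave s0
    for j in range(1, l - i + 1):        # first s0->s1 move happens on transition j
        total += p00 ** (j - 1) * p01 * P1_s1(i, l - j, p11, p10, p01, p00)
    return total

def P1_brute(i, l, start, p11, p10, p01, p00):
    """Exhaustive check: sum the probability of every state sequence."""
    p = {(1, 1): p11, (1, 0): p10, (0, 1): p01, (0, 0): p00}
    total = 0.0
    for path in product((0, 1), repeat=l):
        s, prob, visits = start, 1.0, 0
        for nxt in path:
            visits += (s == 1)   # count transitions that start from s1
            prob *= p[(s, nxt)]
            s = nxt
        if visits == i:
            total += prob
    return total

params = (0.6, 0.4, 0.3, 0.7)  # p11, p10, p01, p00 (made-up values)
for l in range(1, 7):
    for i in range(l + 1):
        assert abs(P1_s1(i, l, *params) - P1_brute(i, l, 1, *params)) < 1e-12
        assert abs(P1_s0(i, l, *params) - P1_brute(i, l, 0, *params)) < 1e-12
```

The brute-force loop is exponential in $l$, so it only serves to validate the closed form on small cases.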
Notes
- (note 1) A state transition set is a group of transitions of the same type that occur one after the other; the sets can be empty.
- (note 2) The weak composition gives the total number of possible combinations of $j$ natural numbers (including $0$) that sum up to $m$, and it equals $\binom{m+j-1}{j-1}$.
- (note 3) An explanation is the following: each time we go from $s_0$ to state $s_1$ we can stay in $s_1$ for $n$ transitions or go back to $s_0$ directly. In total we must have $i-j$ $s_1\rightarrow s_1$ transitions, so the sum of all the $n$'s (one for each time we enter $s_1$) must equal $i-j$; the total number of possible combinations of $n$'s is the weak composition of $j$ natural numbers that sum up to $i-j$.
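The weak-composition count in (note 2) can be verified by brute force (a small sketch; `count_weak_compositions` is my own name):

```python
from itertools import product
from math import comb

def count_weak_compositions(m, j):
    """Count tuples of j naturals (zero allowed) that sum to m, by enumeration."""
    return sum(1 for parts in product(range(m + 1), repeat=j) if sum(parts) == m)

# Matches the closed form C(m + j - 1, j - 1) from (note 2).
for m in range(6):
    for j in range(1, 5):
        assert count_weak_compositions(m, j) == comb(m + j - 1, j - 1)
```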
First, we can do some reasoning about why we use $\pi Q = 0$ to get the stationary distribution. According to the definition of a stationary distribution, the equation we have to solve is $\pi P(t)=\pi$ for all $t \geq 0$, where $P(t)$ is the transition matrix of the process.
One thing to note is that the $\mathbf{P}$ you mentioned is not the same as $P(t)$; the latter is a function of $t$. The $\mathbf{P}$ you mentioned is the jump matrix, which is, as you said, an underlying discrete-time transition matrix.
Let's get back to the equation $\pi P(t)=\pi$, which is not easy to solve directly. Hence we differentiate both sides with respect to $t$: using the forward equation $P'(t)=P(t)Q$ together with $\pi P(t)=\pi$, the LHS becomes $\pi Q$ and the RHS becomes $0$, giving $\pi Q = 0$.
To answer your second question: the question asks explicitly for $\mathbb{P}(X(t)=1)$ as $t\rightarrow \infty$, so the distribution we want is actually the limiting distribution. Under the condition that $X$ is irreducible with a standard semigroup $\{\mathbb{P}(t),t\geq 0\}$ of transition probabilities, the stationary distribution is also the limiting distribution.
Also, for this two-state question the full balance equation $\pi Q = 0$ reduces to the detailed balance equation, indicating that the process is reversible.
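For a two-state chain, $\pi Q = 0$ together with $\pi_0+\pi_1=1$ can be solved by hand; here is a minimal sketch with made-up rates ($\lambda$ for $0 \rightarrow 1$, $\mu$ for $1 \rightarrow 0$):

```python
def stationary_two_state(lam, mu):
    """Solve pi Q = 0 with pi0 + pi1 = 1 for the generator
    Q = [[-lam, lam], [mu, -mu]].  The single balance equation
    lam * pi0 = mu * pi1 is also the detailed balance equation here."""
    return mu / (lam + mu), lam / (lam + mu)

# Sanity check: pi Q = 0 column by column, and pi sums to 1.
lam, mu = 2.0, 3.0
pi0, pi1 = stationary_two_state(lam, mu)
Q = [[-lam, lam], [mu, -mu]]
assert abs(pi0 * Q[0][0] + pi1 * Q[1][0]) < 1e-12
assert abs(pi0 * Q[0][1] + pi1 * Q[1][1]) < 1e-12
assert abs(pi0 + pi1 - 1.0) < 1e-12
```

The second component, $\pi_1 = \lambda/(\lambda+\mu)$, is the limiting value of $\mathbb{P}(X(t)=1)$.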
Best Answer
Define $A_k = \{X_k \in S_k, X_{k-1} \in S_{k-1}, \ldots, X_1 \in S_1\}$. Assume that $Pr[X_1=j]$ is known for all $j \in S_1$.
For $k>1$, note that $Pr[A_k|A_{k-1}] = \frac{Pr[A_k, A_{k-1}]}{Pr[A_{k-1}]} = \frac{Pr[A_k]}{Pr[A_{k-1}]}$, since $A_k \subseteq A_{k-1}$.
For $k>1$ define:
\begin{align} q[k] &= Pr[A_k|A_{k-1}]\\ r_j[k] &= Pr[X_k=j|A_{k-1}] \: \: \forall j \in S_k \end{align} Define $q[0]=1$, $r_j[1] = Pr[X_1=j]$. Define $Pr[A_0]=1$. Iterate on the following for $k\in\{1, 2, 3, \ldots\}$:
On iteration $k$:
Assume the following is known: $q[k-1]$, $Pr[A_{k-1}]$, and $r_j[k]$ for $j \in S_{k}$ (these are indeed known for $k=1$).
Then do the following steps:
1) Compute $q[k] = \sum_{j \in S_k} r_j[k]$.
2) For each $j \in S_{k+1}$, compute $r_j[k+1] = \sum_{i \in S_k} \frac{r_i[k]}{q[k]} Pr[X_{k+1}=j|X_k=i]$, using the fact that $Pr[X_k=i|A_k] = r_i[k]/q[k]$ for $i \in S_k$.
3) $Pr[A_k]=q[k]Pr[A_{k-1}]$.
Each iteration has roughly the same complexity. On each iteration, the complexity of step 2 is about the same as the complexity of multiplying a matrix by a vector.
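The iteration above can be sketched in Python as follows (the function and argument names are mine; it assumes a time-homogeneous chain with one-step transition matrix `P`):

```python
def prob_in_sets(P, init, sets):
    """Iteratively compute Pr[A_K] = Pr[X_k in S_k for k = 1..K].
    P is the one-step transition matrix, init[j] = Pr[X_1 = j],
    sets is the list [S_1, ..., S_K].  Follows the q[k] / r_j[k]
    recursion; step 2 is one matrix-vector product per iteration."""
    n = len(P)
    r = list(init)                # r_j[1] = Pr[X_1 = j]
    prob = 1.0                    # Pr[A_0] = 1
    for S in sets:
        q = sum(r[j] for j in S)  # q[k] = Pr[A_k | A_{k-1}]
        prob *= q                 # Pr[A_k] = q[k] * Pr[A_{k-1}]
        if q == 0.0:
            return 0.0
        # Pr[X_k = j | A_k] = r_j[k] / q[k] for j in S_k, else 0.
        cond = [r[j] / q if j in S else 0.0 for j in range(n)]
        # r_j[k+1] = sum_i cond[i] * P[i][j].
        r = [sum(cond[i] * P[i][j] for i in range(n)) for j in range(n)]
    return prob

# Example: Pr[X_1 = X_2 = X_3 = 0] for a made-up two-state chain.
P = [[0.5, 0.5], [0.3, 0.7]]
assert abs(prob_in_sets(P, [1.0, 0.0], [{0}, {0}, {0}]) - 0.25) < 1e-12
```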
Another method would be to augment the state space by adding a "state 0" that is a "trapping state" from which, if entered, you can never leave. Then appropriately adjust the transition probabilities.
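A sketch of the trapping-state alternative (again with my own names; here the trap is applied as a per-step mask on the distribution vector, which is equivalent to enlarging the state space and redirecting the disallowed transitions into the absorbing "state 0"):

```python
def prob_in_sets_trap(P, init, sets):
    """Any probability mass found outside the allowed set S_k is moved
    to an absorbing trap and never returns; Pr[A_K] is then one minus
    the trapped mass."""
    n = len(P)
    dist = list(init)   # dist[j] = Pr[X_k = j, not yet trapped]
    trap = 0.0
    for k, S in enumerate(sets):
        for j in range(n):
            if j not in S:          # mass outside S_k is trapped
                trap += dist[j]
                dist[j] = 0.0
        if k < len(sets) - 1:       # propagate the surviving mass one step
            dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return 1.0 - trap

# Same example as before: Pr[X_1 = X_2 = X_3 = 0].
P = [[0.5, 0.5], [0.3, 0.7]]
assert abs(prob_in_sets_trap(P, [1.0, 0.0], [{0}, {0}, {0}]) - 0.25) < 1e-12
```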