The simplest examples come from stochastic matrices. Consider a finite set of possible states. Say that the probability of transitioning from state $i$ to state $j$ is $p_{ij}$. For fixed $i$, these probabilities need to add to $1$, so
$$\sum_j p_{ij} = 1$$
for all $i$. So the matrix $P$ whose entries are $p_{ij}$ needs to be right stochastic, which means that $P$ has non-negative entries and $P 1 = 1$ where $1$ is the vector all of whose entries are $1$.
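As a concrete sanity check (a minimal numpy sketch; the $3\times 3$ matrix below is just an illustrative example), being right stochastic means non-negative entries and row sums equal to $1$:

```python
import numpy as np

# An illustrative 3-state transition matrix (each row sums to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

ones = np.ones(3)
assert np.all(P >= 0)               # non-negative entries
assert np.allclose(P @ ones, ones)  # P 1 = 1, i.e. every row sums to 1
```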
By considering all the possible ways to transition between two states, you can prove by induction that the probability of transitioning from state $i$ to state $j$ after $n$ transitions is given by $(P^n)_{ij}$. So the problem of computing these probabilities reduces to the problem of computing powers of a matrix. If $P$ is diagonalizable, then this problem in turn reduces to the problem of computing its eigenvalues and eigenvectors.
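Here is a minimal numpy sketch of that reduction, assuming $P$ is diagonalizable (the matrix below is an illustrative example): diagonalize $P$, raise the eigenvalues to the $n$-th power, and change back to the standard basis.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
n = 10

# Diagonalize: P = V diag(w) V^{-1}, so P^n = V diag(w^n) V^{-1}.
w, V = np.linalg.eig(P)
Pn_eig = (V * w**n) @ np.linalg.inv(V)

# Agrees with repeated matrix multiplication.
assert np.allclose(Pn_eig.real, np.linalg.matrix_power(P, n))
```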
Computing the expected time to get from state $i$ to state $j$ is a little complicated to describe in general, so it will be easier to work through examples.
Example. Let $0 \le p \le 1$ and let $P$ be the matrix
$$\left[ \begin{array}{cc} 1-p & p \\ p & 1-p \end{array} \right].$$
Thus there are two states. The probability of changing states is $p$ and the probability of not changing states is $1-p$. $P$ has two eigenvectors:
$$P \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] = \left[ \begin{array}{c} 1 \\ 1 \end{array} \right], \quad P \left[ \begin{array}{c} 1 \\ -1 \end{array} \right] = (1 - 2p) \left[ \begin{array}{c} 1 \\ -1 \end{array} \right].$$
It follows that
$$P^n \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] = \left[ \begin{array}{c} 1 \\ 1 \end{array} \right], \quad P^n \left[ \begin{array}{c} 1 \\ -1 \end{array} \right] = (1 - 2p)^n \left[ \begin{array}{c} 1 \\ -1 \end{array} \right]$$
and transforming back to the original basis we find that
$$P^n = \left[ \begin{array}{cc} \frac{1 + (1 - 2p)^n}{2} & \frac{1 - (1 - 2p)^n}{2} \\ \frac{1 - (1 - 2p)^n}{2} & \frac{1 + (1 - 2p)^n}{2} \end{array} \right].$$
Thus the probability of being in the other state after $n$ transitions is $\frac{1 - (1 - 2p)^n}{2}$ and the probability of being back in the same state after $n$ transitions is $\frac{1 + (1 - 2p)^n}{2}$.
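A quick numerical check of this closed form (a minimal sketch; the values of $p$ and $n$ are arbitrary illustrative choices):

```python
import numpy as np

p, n = 0.3, 7
P = np.array([[1 - p, p],
              [p, 1 - p]])

stay   = (1 + (1 - 2*p)**n) / 2   # probability of being back in the same state after n steps
switch = (1 - (1 - 2*p)**n) / 2   # probability of being in the other state after n steps

closed_form = np.array([[stay, switch],
                        [switch, stay]])
assert np.allclose(np.linalg.matrix_power(P, n), closed_form)
```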
The expected number of transitions needed to change states is given by
$$\sum_{n \ge 1} n q_n$$
where $q_n$ is the probability that the first change of state occurs on the $n$-th transition. This requires that we do not change states for the first $n-1$ transitions and then change states, so
$$q_n = p (1 - p)^{n-1}.$$
Thus we want to compute the sum
$$\sum_{n \ge 1} np (1 - p)^{n-1}.$$
One can verify, for example by differentiating the geometric series $\frac{1}{1-z} = \sum_{n \ge 0} z^n$ term by term, the identity
$$\frac{1}{(1 - z)^2} = 1 + 2z + 3z^2 + \cdots = \sum_{n \ge 1} nz^{n-1}.$$
This shows that the expected value is
$$\frac{p}{(1 - (1 - p))^2} = \frac{1}{p}.$$
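Numerically, the truncated series indeed approaches $\frac{1}{p}$ (a minimal sketch with an arbitrary illustrative value of $p$):

```python
import numpy as np

p = 0.3
n = np.arange(1, 10_000)
partial_sum = np.sum(n * p * (1 - p) ** (n - 1))

assert np.isclose(partial_sum, 1 / p)   # the series sums to 1/p = 3.333...
```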
An alternative approach is to condition on the first transition and use the law of total expectation. To compute the expected number of transitions $\mathbb{E}$ until we change states, observe that with probability $p$ we change states on the first transition (so we stop after one step), and with probability $1-p$ we do not (so we have used one transition and are back where we started). This gives
$$\mathbb{E} = p + (1 - p) (\mathbb{E} + 1).$$
This gives $\mathbb{E} = \frac{1}{p}$ as above.
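Either derivation can be cross-checked by simulation (a minimal Monte Carlo sketch; the value of $p$ and the number of trials are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
p, trials = 0.3, 200_000

# Each transition changes the state with probability p, independently of the past,
# so the number of transitions until the first change of state is geometric.
counts = rng.geometric(p, size=trials)
print(counts.mean(), 1 / p)   # both values should be about 3.33
```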
Let $X=\{X_n:n\in\mathbb N_0\}$ be a Markov chain with state space $S$ and transition probabilities $p_{ij}$. A vector $\pi\in\mathbb R^{\#S}$ is a stationary distribution for $X$ iff
- $\pi_j\geqslant0$ for all $j\in S$,
- $\pi_j = \sum_{i\in S}\pi_i p_{ij}$ for all $j\in S$,
- $\sum_{j\in S} \pi_j = 1$.
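Numerically, a stationary distribution can be found by solving these equations directly (a minimal sketch on an illustrative irreducible chain):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

# Stack the stationarity equations pi (I - P) = 0 with the normalization sum(pi) = 1
# and solve the (overdetermined but consistent) system by least squares.
lhs = np.vstack([np.eye(3) - P.T, np.ones((1, 3))])
rhs = np.concatenate([np.zeros(3), [1.0]])
pi, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)

assert np.all(pi > -1e-12)          # pi_j >= 0 (up to rounding)
assert np.allclose(pi @ P, pi)      # pi_j = sum_i pi_i p_ij
assert np.isclose(pi.sum(), 1.0)    # probabilities sum to 1
```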
For each state $i\in S$, let $C_i = \{j\in S : i\leftrightarrow j\}$ be the communicating class of $i$. Assume that each class is positive recurrent, that is, for each $i\in S$, the return time $$\tau_{ii} = \inf\{n>0: X_n = i\},\quad X_0 = i, $$ has finite expectation. Then there exists a unique stationary distribution $\pi^{(i)}$ for $X$ conditioned on $X_0\in C_i$, with $$\pi^{(i)}_j = \frac1{\mathbb E[\tau_{jj}\mid X_0=j]},\ j\in C_i. $$ Let $\bar\pi^{(i)}$ be the extension of $\pi^{(i)}$ to $S$ with $\bar\pi^{(i)}_j=0$ for $j\in S\setminus C_i$; then the set of stationary distributions for $X$ is the convex hull of $\{\bar\pi^{(i)}:i\in S\}$. If a class $C_i$ is not positive recurrent, then no stationary distribution assigns positive mass to $C_i$; in particular, if no class is positive recurrent, then $X$ has no stationary distribution at all.
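The relation $\pi_j = 1/\mathbb E[\tau_{jj}\mid X_0=j]$ can be checked by simulation; here is a minimal Monte Carlo sketch on an illustrative irreducible chain (so there is a single communicating class):

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

# Stationary distribution as the normalized left eigenvector for eigenvalue 1.
w, V = np.linalg.eig(P.T)
v = V[:, np.argmin(np.abs(w - 1))].real
pi = v / v.sum()

def mean_return_time(j, trials=20_000):
    """Monte Carlo estimate of E[tau_jj | X_0 = j]."""
    total = 0
    for _ in range(trials):
        state, steps = j, 0
        while True:
            state = rng.choice(3, p=P[state])
            steps += 1
            if state == j:
                break
        total += steps
    return total / trials

for j in range(3):
    print(j, pi[j], 1 / mean_return_time(j))   # the last two columns should roughly agree
```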
In the case where $X$ is absorbing, however, the positive recurrent communicating classes are exactly the singletons $\{i\}$ for the absorbing states, i.e. the states with $p_{ii}=1$. Let $A\subset S$ be the set of absorbing states; then a probability vector $\pi$ is a stationary distribution iff $\pi_j=0$ for all $j\in S\setminus A$.
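For instance (a minimal sketch with an illustrative chain having two absorbing states and one transient state):

```python
import numpy as np

# States 0 and 2 are absorbing; state 1 is transient.
P = np.array([[1.0, 0.0, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.0, 1.0]])

# Any mixture over the absorbing states {0, 2} is stationary ...
pi = np.array([0.7, 0.0, 0.3])
assert np.allclose(pi @ P, pi)

# ... while a distribution giving mass to the transient state 1 is not.
rho = np.array([0.5, 0.2, 0.3])
assert not np.allclose(rho @ P, rho)
```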
Best Answer
I found some relevant results in Kemeny & Snell's book "Finite Markov Chains", Chapter IV, Theorem 4.3.4.
Define $$ Z:=I+\sum_{i=1}^\infty (P^i-A) $$ where $I$ is the identity matrix, $P$ is the transition probability matrix, and $A$ is a matrix with the stationary distribution $s$ in each row. Then, assume that you start the chain with an initial distribution $t$ over the states, where $t$ might not be identical to $s$.
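For a regular (irreducible and aperiodic) chain the series converges, and the same matrix can also be written in the usual closed form $Z = (I - P + A)^{-1}$; here is a minimal numpy sketch checking the two expressions against each other on an illustrative chain:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

# Stationary distribution s and the matrix A with s in every row.
w, V = np.linalg.eig(P.T)
v = V[:, np.argmin(np.abs(w - 1))].real
s = v / v.sum()
A = np.tile(s, (3, 1))

# Z from the closed form ...
Z_closed = np.linalg.inv(np.eye(3) - P + A)
# ... and from a truncation of the defining series Z = I + sum_{i>=1} (P^i - A).
Z_series = np.eye(3) + sum(np.linalg.matrix_power(P, i) - A for i in range(1, 200))

assert np.allclose(Z_closed, Z_series)
```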
Let $M(b,a,n)$ be the expected number of times the Markov chain hits the state $a$ in the first $n$ steps, given that it is started in state $b$. Let $M(n)$ be the matrix with elements $M(b,a,n)$, where $b$ is the row and $a$ the column index. Then, you can show that
$$ M(n) - n A = \sum_{i=0}^{n-1} (P^i-A). $$
In other words, the expected number of times you hit $a$ depends not only on the initial state $b$ (and hence on the initial distribution), but also on the fundamental matrix $Z$ of the Markov chain. While the effect of the initial distribution becomes negligible for large $n$, the effect of the fundamental matrix remains.
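To see the role of $Z$ concretely, one can check numerically that $M(n) - nA$ stabilizes at $Z - A$ as $n$ grows (a minimal sketch on an illustrative regular chain, with $M(n) = \sum_{i=0}^{n-1} P^i$ as the displayed formula implies):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

# Stationary distribution s and the matrix A with s in every row.
w, V = np.linalg.eig(P.T)
v = V[:, np.argmin(np.abs(w - 1))].real
s = v / v.sum()
A = np.tile(s, (3, 1))

# Fundamental matrix from its defining series (truncated; the terms decay geometrically).
Z = np.eye(3) + sum(np.linalg.matrix_power(P, i) - A for i in range(1, 400))

n = 200
M_n = sum(np.linalg.matrix_power(P, i) for i in range(n))   # expected visit counts, times 0..n-1

assert np.allclose(M_n - n * A, Z - A)   # the dependence that remains for large n is Z - A
```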
Note that if the Markov chain is in fact an i.i.d. process, then $P=A$ and $Z=I$. In that case $M(n)=nA$ apart from the $i=0$ term (which only records the starting position), so you get an expected value of $ns_a$, as you expected. The difference is thus due to the temporal statistical dependence of the Markov chain.
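A quick numerical check of the i.i.d. case, using the series definition of $Z$ and the formula for $M(n) - nA$ above (a minimal sketch; the row distribution $s$ below is an arbitrary illustrative choice):

```python
import numpy as np

s = np.array([0.2, 0.5, 0.3])   # an arbitrary illustrative distribution
P = np.tile(s, (3, 1))          # i.i.d. process: every row of P is s
A = P.copy()                    # the stationary distribution is s itself

# Every term P^i - A of the defining series vanishes, so Z = I.
Z = np.eye(3) + sum(np.linalg.matrix_power(P, i) - A for i in range(1, 100))
assert np.allclose(Z, np.eye(3))

n = 50
M_n = sum(np.linalg.matrix_power(P, i) for i in range(n))   # expected visit counts, times 0..n-1
assert np.allclose(M_n - n * A, np.eye(3) - A)   # only the i = 0 (starting position) term survives
```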