Your statements are slightly confusing, since a transient chain could be considered "non null recurrent" (after all, it is not null recurrent). So you should replace your statements with "positive recurrent":
if all states of an irreducible Markov chain are positive recurrent, then the MC has a unique stationary distribution
if an irreducible Markov chain is finite, then all of its states are positive recurrent.
You can also say that if a Markov chain is irreducible and positive recurrent, then the (unique) stationary distribution has strictly positive components.
Every finite state irreducible Markov chain $\{M(t)\}_{t=0}^{\infty}$ has a unique stationary distribution $\pi =(\pi_i)_{i \in S}$ (where $S$ denotes the finite state space). When you simulate, with probability 1, the sample path fractions of time converge to this distribution, so that:
$$ \lim_{T\rightarrow\infty} \frac{1}{T}\sum_{t=0}^{T-1} 1\{M(t)=i\} = \pi_i \quad, \forall i \in S \quad \mbox{(with prob 1)}$$
regardless of the initial state $M(0)$. Taking expectations of both sides and using the bounded convergence theorem together with $E[1\{M(t)=i\}]=P[M(t)=i]$ we also get:
$$ \lim_{T\rightarrow\infty} \frac{1}{T}\sum_{t=0}^{T-1} P[M(t)=i] = \pi_i \quad, \forall i \in S$$
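The two displayed limits are easy to check numerically. Below is a minimal sketch with a made-up 3-state irreducible transition matrix (not one from the question): it computes $\pi$ as the left eigenvector of $P$ for eigenvalue 1, simulates one long sample path, and compares the empirical fractions of time to $\pi$.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small irreducible transition matrix (hypothetical example).
P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5],
              [0.3, 0.3, 0.4]])

# Stationary distribution: left eigenvector of P for eigenvalue 1,
# i.e. a right eigenvector of P^T, normalized to sum to 1.
w, V = np.linalg.eig(P.T)
v = np.real(V[:, np.argmin(np.abs(w - 1))])
pi = v / v.sum()

# Simulate one sample path and record the fraction of time in each state.
T = 100_000
counts = np.zeros(3)
state = 0  # arbitrary initial state M(0); the limit does not depend on it
for _ in range(T):
    counts[state] += 1
    state = rng.choice(3, p=P[state])

print(counts / T)  # empirical time fractions, close to pi
print(pi)
```

The empirical fractions approach $\pi$ regardless of which initial state you pick, as the ergodic theorem above promises.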
If the chain is finite state, irreducible, and also aperiodic, you can further say
$$ \lim_{t\rightarrow\infty} P[M(t)=i|M(0)=j] = \pi_i \quad, \forall i \in S$$
regardless of the initial state $j \in S$. So if the chain is finite state, irreducible, but limiting probabilities do not converge, then the chain cannot be aperiodic.
If $M(t)$ is finite state, irreducible, and periodic with period $d>1$, then limiting probabilities cannot converge (assuming we start in a particular state with probability 1). This is because:
\begin{align*}
\lim_{k\rightarrow\infty} P[M(kd)=i|M(0)=i] &> 0 \\
\lim_{k\rightarrow\infty} P[M(kd+1)=i|M(0)=i] &= 0
\end{align*}
This is because the Markov chain $\{Z(k)\}_{k=0}^{\infty}$ defined by $Z(k)=M(kd)$ is irreducible and aperiodic (over an appropriately reduced state space) and so all states it can reach have positive steady state values.
With this reasoning, it can be shown that $P[M(t)=i|M(0)=j]$ converges (as $t\rightarrow\infty$) to a periodic function with period $d$. The particular $d$-periodic function it converges to depends on the initial state.
This should be pretty clear if you do a few MATLAB examples: plot $\vec{\pi}(t) = \vec{\pi}(0)P^t$ versus $t \in \{0, 1, 2, \ldots\}$ for some examples with $\vec{\pi}(0)=[1, 0, 0, \ldots]$ or $\vec{\pi}(0)=[0, 1, 0, 0, \ldots]$.
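The same experiment is easy in Python/NumPy. The sketch below uses a made-up period-2 irreducible chain on 3 states, starting in state 0 with probability 1: $P[M(t)=0]$ oscillates forever instead of converging, while its running time average still converges to $\pi_0$.

```python
import numpy as np

# Hypothetical period-2 irreducible chain: states 0 and 2 only lead to 1,
# and state 1 splits evenly between 0 and 2.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])

pi0 = np.array([1.0, 0.0, 0.0])  # start in state 0 with probability 1

# Record the distribution pi(t) = pi(0) P^t for t = 0, 1, ..., 199.
dist = pi0.copy()
trajectory = []
for t in range(200):
    trajectory.append(dist.copy())
    dist = dist @ P
trajectory = np.array(trajectory)

# P[M(t)=0] alternates between ~0.5 (even t) and 0 (odd t): no limit...
print(trajectory[-4:, 0])
# ...but the time average converges to pi_0 = 0.25.
print(trajectory[:, 0].mean())
```

Plotting `trajectory[:, 0]` against $t$ shows the persistent oscillation; averaging it recovers the stationary value.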
Theorem 45 in chapter 4 of Richard Serfozo's Basics of Applied Stochastic Processes says that for any ergodic CTMC $X(t)$ with stationary distribution $p$, and any function $f:S\to\mathbb R$,
$$\lim_{t\to\infty}\frac1t\int_0^t f(X(s))\,\mathrm{d}s = \sum_j f(j)p_j$$
a.s. provided the sum is absolutely convergent.
Since the text defines an ergodic CTMC as one which is irreducible and positive recurrent, in the language of the question this says that as long as the chain is irreducible and $\mathbb E_\pi[|X(0)|]<\infty$ then
$$\lim_{t\to\infty}\frac1t\int_0^t X(s)\,\mathrm{d}s = \mathbb E_\pi[X(0)],$$
which is what I wanted.
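This time-average identity is also easy to check by simulation. Below is a minimal sketch with a made-up two-state ergodic CTMC (rate `lam` from 0 to 1, rate `mu` from 1 to 0), whose stationary distribution is known in closed form; taking $f(j)=j$, the fraction of time spent in state 1 should approach $p_1 = \lambda/(\lambda+\mu)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-state ergodic CTMC with exponential holding times.
lam, mu = 2.0, 3.0
p1 = lam / (lam + mu)  # stationary probability of state 1

t_total, occupied = 0.0, 0.0
state = 0
while t_total < 50_000.0:
    rate = lam if state == 0 else mu
    hold = rng.exponential(1.0 / rate)  # holding time in the current state
    if state == 1:
        occupied += hold
    t_total += hold
    state = 1 - state  # the embedded chain just alternates between 0 and 1

# (1/t) * integral of X(s) ds  ~  E_pi[X(0)] = p1
print(occupied / t_total, p1)
```

With $f(j)=j$ on a two-state chain, the sum $\sum_j f(j)p_j$ is trivially absolutely convergent, so the theorem applies.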
(I think I had the assumptions of irreducibility and finite mean in mind when I wrote the question, but I clearly didn't include them. I'll edit the question.)
Best Answer
All of your statements are true. For the first, you can use a purely linear algebraic fact, that a stochastic matrix always has an eigenvalue of 1 with a left eigenvector whose entries are nonnegative.
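A quick numerical illustration of that linear-algebraic fact, using a made-up stochastic matrix: extract the eigenvector of $P^T$ for the eigenvalue closest to 1 and normalize it into a probability vector.

```python
import numpy as np

# Hypothetical stochastic matrix; each row sums to 1.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])

# A stochastic matrix always has eigenvalue 1; its left eigenvector
# (a right eigenvector of P^T) can be normalized to have nonnegative
# entries summing to 1.
w, V = np.linalg.eig(P.T)
v = np.real(V[:, np.argmin(np.abs(w - 1))])
pi = v / v.sum()

print(pi)           # nonnegative entries summing to 1
print(pi @ P - pi)  # ~ 0: pi is a stationary distribution
```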
For the second, let's say the chain has $m$ communicating classes $C_j$. If the initial condition is entirely in a single communicating class $C_j$ (i.e. $\sum_{i \in C_j} \pi^0_i = 1$), then the time average of the distributions converges to the unique stationary distribution of the Markov chain restricted to $C_j$. This follows from the ergodic theorem. So if you find the time average limit of each subchain, then you can handle the whole chain by writing
$$\begin{aligned} \mathbb{P} \left ( X_k = i \right ) &= \sum_{j=1}^m \mathbb{P} \left ( \left. X_k = i \right | X_0 \in C_{j} \right ) \mathbb{P} \left ( X_0 \in C_{j} \right ) \\ &= \mathbb{P} \left ( \left. X_k = i \right | X_0 \in C_{j(i)} \right ) \mathbb{P} \left ( X_0 \in C_{j(i)} \right ) \end{aligned}$$
(the other terms vanish when the classes are closed, since then $X_k = i$ forces $X_0 \in C_{j(i)}$)
where $C_{j(i)}$ is the communicating class of state $i$. Then the second term is constant and the time average of the first term is convergent, so the time average of the whole thing is also convergent.
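The class-by-class decomposition can be sketched numerically. The chain below is a made-up example with two closed communicating classes $\{0,1\}$ and $\{2,3\}$; the Cesàro average of the distributions converges to the mixture of the subchains' stationary distributions, weighted by the initial mass in each class.

```python
import numpy as np

# Hypothetical reducible chain with two closed classes {0,1} and {2,3}.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.3, 0.7, 0.0, 0.0],
              [0.0, 0.0, 0.1, 0.9],
              [0.0, 0.0, 0.6, 0.4]])

pi0 = np.array([0.4, 0.0, 0.6, 0.0])  # mass 0.4 in class 1, 0.6 in class 2

# Cesaro (time) average of the distributions pi0 P^k, k = 0..N-1.
dist, cesaro = pi0.copy(), np.zeros(4)
N = 5000
for _ in range(N):
    cesaro += dist
    dist = dist @ P
cesaro /= N

# Stationary distribution of the subchain on {0,1} is (3/8, 5/8),
# and on {2,3} it is (2/5, 3/5), so the limit should be
# 0.4*(0.375, 0.625) and 0.6*(0.4, 0.6) concatenated.
print(cesaro)  # ~ [0.15, 0.25, 0.24, 0.36]
```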
For the third, you can use linearity:
$$\left ( \frac{\sum_{k=1}^t \pi^k}{t} \right ) A = \frac{\sum_{k=1}^t \pi^k A}{t} = \frac{\left ( \sum_{k=1}^t \pi^k \right ) + \pi^{t+1} - \pi^1}{t}.$$
Taking $t \to \infty$, noting that $(\pi^{t+1} - \pi^1)/t \to 0$, and exploiting continuity of the linear map $x \mapsto xA$, we get
$$\left ( \lim_{t \to \infty} \frac{\sum_{k=1}^t \pi^k}{t} \right ) A = \lim_{t \to \infty} \frac{\sum_{k=1}^t \pi^k}{t}.$$
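This fixed-point property of the Cesàro average is easy to verify even when the powers $\pi^0 A^k$ themselves fail to converge. Below is a minimal sketch with a made-up period-2 chain (a deterministic 2-cycle): the distributions oscillate forever, but their time average is stationary.

```python
import numpy as np

# Hypothetical period-2 chain: a deterministic 2-cycle, so pi0 A^k
# alternates between (0,1) and (1,0) and never converges.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

pi0 = np.array([1.0, 0.0])
dist, avg = pi0.copy(), np.zeros(2)
t = 10_000
for _ in range(t):
    dist = dist @ A   # pi^k = pi^0 A^k, k = 1..t
    avg += dist
avg /= t

print(avg)       # ~ [0.5, 0.5]
print(avg @ A)   # ~ avg: the Cesaro average is a fixed point of A
```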