For a Markov chain with $N<\infty$ states, the set $I$ of invariant probability vectors is a non-empty simplex in ${\mathbb R}^N$ whose extreme points correspond to the recurrent classes of the chain. Thus, the vector is unique iff there is exactly one recurrent class; the transient states (if any) play absolutely no role (as in Jens's example). The set $I$ is a point, line segment, triangle, etc. exactly when there are one, two, three, etc. recurrent classes.
If the invariant vector $\pi$ is unique, then there is only one recurrent class and the chain will eventually end up there. The vector $\pi$ necessarily puts zero mass on all transient states. Letting $\phi_n$ be the law of $X_n$, as you say, we have $\phi_n\to \pi$ only if the recurrent class is aperiodic. However, in general we have Cesàro convergence:
$${1\over n}\sum_{j=1}^n \phi_j\to\pi.$$
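For concreteness, here is a minimal numpy sketch (the 3-state chain is my own toy example): state 0 is transient and states 1, 2 form a recurrent class of period 2, so $\phi_n$ itself oscillates forever, while the Cesàro averages settle at $\pi=(0,1/2,1/2)$.

```python
import numpy as np

# Toy chain: state 0 is transient; states 1 and 2 form a recurrent class
# of period 2, so phi_n itself does not converge, but its averages do.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

phi = np.array([1.0, 0.0, 0.0])    # law of X_0: start in the transient state
avg = np.zeros(3)
for n in range(1, 10001):
    phi = phi @ P                  # phi_n, the law of X_n
    avg += (phi - avg) / n         # running value of (1/n) * sum_{j<=n} phi_j

print(avg)   # ~ (0, 0.5, 0.5) = pi, even though phi keeps oscillating
```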
An infinite state space Markov chain need not have any recurrent states, and may have the zero measure as the only invariant measure, finite or infinite. Consider the chain on the positive integers which jumps to the right at every time step.
Generally, a Markov chain with countable state space has invariant probability vectors iff it has positive recurrent classes. If so, every invariant probability vector $\nu$ is a convex combination of
the vectors $m_j$, where $m_j$ is the unique invariant probability vector supported on the positive recurrent class $j\in J$, i.e.,
$$\nu=\sum_{j\in J} c_j m_j,\qquad c_j\geq 0,\quad \sum_{j\in J}c_j=1.$$
This result is Corollary 3.23 in Wolfgang Woess's Denumerable Markov Chains.
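To see this convex-combination structure concretely, here is a small numpy sketch (the 4-state chain is my own example, not from Woess): two recurrent classes, each carrying its own invariant vector, and every mixture of the two is invariant.

```python
import numpy as np

# Two recurrent classes, {0,1} and {2,3}; each block has a unique
# invariant vector, and any convex combination is invariant for P.
P = np.array([[0.2, 0.8, 0.0, 0.0],
              [0.6, 0.4, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.9, 0.1]])

m1 = np.array([3/7, 4/7, 0.0, 0.0])    # invariant vector of class {0,1}
m2 = np.array([0.0, 0.0, 9/14, 5/14])  # invariant vector of class {2,3}

for c in (0.0, 0.3, 1.0):              # extreme points at c = 0 and c = 1
    nu = c * m1 + (1 - c) * m2
    assert np.allclose(nu @ P, nu)     # nu is invariant for every c in [0,1]
```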
A second order Markov chain is a random process $(X_n)_n$ on an alphabet $A$, whose distribution is specified by its transition probabilities $Q(x\mid y,z)=P[X_n=x\mid X_{n-1}=y,X_{n-2}=z]$, for every $(x,y,z)$ in $A\times A\times A$ (and by an initial distribution on $A\times A$).
A stationary distribution of $(X_n)$ is a probability measure $\pi$ on $A\times A$ such that, if $\pi(x,y)=P[X_n=x,X_{n-1}=y]$ for every $(x,y)$ in $A\times A$ and some $n$, then $\pi(x,y)=P[X_{n+1}=x,X_{n}=y]$ for every $(x,y)$ in $A\times A$.
Thus, one asks that, for every $(x,y)$ in $A\times A$,
$$
\pi(x,y)=\sum_{z\in A}Q(x\mid y,z)\pi(y,z).
$$
As in the first order case, this linear system, together with the normalizing condition
$$
\sum_{(x,y)\in A\times A}\pi(x,y)=1,
$$
fully determines $\pi$ as soon as $(X_n)_n$ is irreducible.
A new feature, absent from the first order case, is that every stationary distribution $\pi$ has identical marginals, that is, for every $x$ in $A$,
$$
\varrho(x)=\sum_{y\in A}\pi(x,y)=\sum_{y\in A}\pi(y,x).
$$
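A minimal numpy sketch of these equations, on a made-up binary alphabet (the kernel $Q$ below is random, purely for illustration): build the first-order chain on pairs $(X_n,X_{n-1})$, solve for $\pi$, and check the identical-marginals property.

```python
import numpy as np

# Random second-order transition kernel on A = {0, 1}:
# Q[x, y, z] = P[X_n = x | X_{n-1} = y, X_{n-2} = z].
A = 2
rng = np.random.default_rng(0)
Q = rng.random((A, A, A))
Q /= Q.sum(axis=0, keepdims=True)     # normalize over x for each (y, z)

# The pair process (X_n, X_{n-1}) is first-order Markov on A x A:
# state (y, z) jumps to state (x, y) with probability Q[x, y, z].
T = np.zeros((A * A, A * A))
for y in range(A):
    for z in range(A):
        for x in range(A):
            T[y * A + z, x * A + y] = Q[x, y, z]

# Stationary pair law: left eigenvector of T for the eigenvalue 1.
vals, vecs = np.linalg.eig(T.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi = (pi / pi.sum()).reshape(A, A)    # pi[x, y] = P[X_n = x, X_{n-1} = y]

print(pi.sum(axis=1))   # rho(x) = sum_y pi(x, y)
print(pi.sum(axis=0))   # sum_y pi(y, x): the same vector rho
```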
Finally, the MLE of $\pi$ based on $(X_k)_{0\leqslant k\leqslant n}$ is $\hat\pi_n$ defined by
$$
\hat\pi_n(x,y)=\frac1n\sum_{k=1}^n\mathbf 1_{X_k=x,X_{k-1}=y}.
$$
The MLE is consistent, that is, $\hat\pi_n(x,y)\to\pi(x,y)$ almost surely, for every $(x,y)$ in $A\times A$, when $n\to\infty$. In particular, the frequency of $x$ in $A$ stabilizes, since
$$
\frac1n\sum_{k=1}^n\mathbf 1_{X_k=x}=\sum_{y\in A}\hat\pi_n(x,y)\to\varrho(x).
$$
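And a quick simulation check of this consistency, with the same random kernel $Q$ as in the previous sketch (same seed, repeated here so the snippet runs on its own): the empirical pair frequencies $\hat\pi_n$ approach the stationary $\pi$ computed there.

```python
import numpy as np

# Same random kernel as in the previous snippet (same seed).
A = 2
rng = np.random.default_rng(0)
Q = rng.random((A, A, A))
Q /= Q.sum(axis=0, keepdims=True)

# Simulate (X_k) and accumulate the indicators 1_{X_k = x, X_{k-1} = y}.
sim = np.random.default_rng(1)
x2, x1 = 0, 0                        # arbitrary initial pair
n = 200_000
counts = np.zeros((A, A))
for _ in range(n):
    x = sim.choice(A, p=Q[:, x1, x2])
    counts[x, x1] += 1
    x2, x1 = x1, x

print(counts / n)   # hat_pi_n: close to the stationary pi for large n
```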
Best Answer
The result follows immediately from applying the elementary renewal theorem to delayed renewal processes.
Here's a more elementary algebraic proof, using telescoping.
(problem 16, page 468 of Grinstead and Snell's free book https://math.dartmouth.edu/~prob/prob/prob.pdf )
for an irreducible stochastic $m\times m$ matrix $P$ with stationary distribution $\boldsymbol\pi$, we have
$\boldsymbol\pi^T P = \boldsymbol\pi^T$ and $P\mathbf 1 = \mathbf 1$;
define $W:= \mathbf 1 \boldsymbol\pi^T$, and note $\operatorname{trace}\big(W\big) = 1$
consider the following telescope
$\Big(I+P+P^2+\cdots+ P^{n-1}\Big)\Big(I-P+W\Big) = I -P^n +nW$
thus
$\frac{1}{n}\Big(I+P+P^2+\cdots+ P^{n-1}\Big)$
$= \frac{1}{n}\big(I -P^n +nW\big)\Big(I-P+W\Big)^{-1} $
$= \frac{1}{n}\Big\{\Big(I-P+W\Big)^{-1}\Big\} -\frac{1}{n}\Big\{P^n\Big(I-P+W\Big)^{-1}\Big\} +\frac{1}{n}\Big\{nW\Big(I-P+W\Big)^{-1}\Big\} $
$= \frac{1}{n}\Big(I-P+W\Big)^{-1} -\frac{1}{n}P^n\Big(I-P+W\Big)^{-1} +W$
now pass limits
$\lim_{n\to\infty}\Big\{ \frac{1}{n}\Big(I-P+W\Big)^{-1} -\frac{1}{n}P^n\Big(I-P+W\Big)^{-1} +W\Big\}$
$= \Big\{\lim_{n\to\infty}\frac{1}{n}\Big(I-P+W\Big)^{-1}\Big\} -\Big\{\lim_{n\to\infty} \frac{1}{n}P^n\Big(I-P+W\Big)^{-1}\Big\} +W$
$=0+0+W$
so
$W=\lim_{n\to\infty} \frac{1}{n}\Big(I+P+P^2+\cdots+ P^{n-1}\Big)$
That's the argument in its entirety. I've left three book-keeping details for the end.
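Before those details, a quick numeric sanity check of the telescope and the limit (the $3\times 3$ matrix $P$ below is my own toy example, not from the book):

```python
import numpy as np

# Toy irreducible stochastic matrix and its stationary distribution pi.
P = np.array([[0.10, 0.60, 0.30],
              [0.40, 0.20, 0.40],
              [0.50, 0.25, 0.25]])
m = P.shape[0]
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi /= pi.sum()
W = np.outer(np.ones(m), pi)          # W = 1 pi^T, so trace(W) = 1

n = 50
S = sum(np.linalg.matrix_power(P, k) for k in range(n))  # I + P + ... + P^{n-1}
lhs = S @ (np.eye(m) - P + W)
rhs = np.eye(m) - np.linalg.matrix_power(P, n) + n * W
assert np.allclose(lhs, rhs)          # the telescoping identity
print(np.abs(S / n - W).max())        # Cesaro average -> W, error ~ 1/n
```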
re: the third term simplification $W\Big(I-P+W\Big)^{-1}=W$
suppose $\Big(I-P+W\Big)^{-1}$ exists, then consider the product
$W\Big(I-P+W\Big) = W-WP +W^2 = W-W + W = W$, since $WP = \mathbf 1\big(\boldsymbol\pi^T P\big) = W$ and $W^2 = \mathbf 1\big(\boldsymbol\pi^T \mathbf 1\big)\boldsymbol\pi^T = W$
now multiply both sides on the right by $\Big(I-P+W\Big)^{-1}$
re: the second limit
observe that
$\Big\Vert \frac{1}{n}P^n\Big(I-P+W\Big)^{-1} - \mathbf 0\Big\Vert_F$
$ = \frac{1}{n}\Big\Vert P^n\Big(I-P+W\Big)^{-1}\Big\Vert_F$
$ \leq \frac{1}{n}\Big\Vert P^n\Big\Vert_F\cdot \Big\Vert \Big(I-P+W\Big)^{-1}\Big\Vert_F $
$ \leq \frac{1}{n} \mathbf 1^T P^n \mathbf 1 \cdot \Big\Vert \Big(I-P+W\Big)^{-1}\Big\Vert_F $
$ = \frac{1}{n} \mathbf 1^T \mathbf 1 \cdot \Big\Vert \Big(I-P+W\Big)^{-1}\Big\Vert_F $
$ = \frac{m}{n} \cdot \Big\Vert \Big(I-P+W\Big)^{-1}\Big\Vert_F$
$ < \epsilon $
for large enough $n$.
(The second-to-last inequality holds because, by the triangle inequality, the Frobenius norm of a matrix with nonnegative entries is at most the sum of its entries, which for $P^n$ is $\mathbf 1^T P^n \mathbf 1$; the final equality uses $P^n\mathbf 1 = \mathbf 1$.)
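Numerically, the bound is easy to confirm (same toy $P$ as above, rebuilt so the snippet runs on its own):

```python
import numpy as np

# Same toy P as before; rebuild pi and W.
P = np.array([[0.10, 0.60, 0.30],
              [0.40, 0.20, 0.40],
              [0.50, 0.25, 0.25]])
m = P.shape[0]
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))]); pi /= pi.sum()
W = np.outer(np.ones(m), pi)

n = 100
Pn = np.linalg.matrix_power(P, n)
Minv = np.linalg.inv(np.eye(m) - P + W)
assert np.linalg.norm(Pn, 'fro') <= np.ones(m) @ Pn @ np.ones(m)  # = m
print(np.linalg.norm(Pn @ Minv, 'fro') / n)   # -> 0 at rate m/n
```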
re: the invertibility of $\Big(I-P+W\Big)$
we prove $\det\Big(I-P+W\Big)=\prod_{j=2}^m (1-\lambda_j)$, where $\lambda_1=1,\lambda_2,\dots,\lambda_m$ are the eigenvalues of $P$; since no $\lambda_j$ with $j\geq 2$ equals $1$, the matrix is invertible.
the nicest proof involves (partial) symmetrization:
using Perron–Frobenius theory, we know that $\lambda_1 =1 $ is simple since $P$ is irreducible.
$\mathbf v_1 := \boldsymbol\pi^{1/2}\big/\big\Vert \boldsymbol\pi^{1/2}\big\Vert_2$
(where the square root is interpreted component-wise)
diagonal matrix $D:=\operatorname{diag}\big(\mathbf v_1\big)$
Consider the similar matrix
$D\Big(I-P+W\Big)D^{-1} = I- (DPD^{-1}) +DWD^{-1} = I - B + \mathbf v_1\mathbf v_1^T$
$B$ has $\mathbf v_1$ as both a left and a right eigenvector for the eigenvalue $1$ (check!).
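Here is that check done numerically, again with the toy $P$ and its stationary $\boldsymbol\pi$ from the earlier snippet:

```python
import numpy as np

# Same toy P; verify that B = D P D^{-1} has v1 as left and right
# eigenvector for the eigenvalue 1.
P = np.array([[0.10, 0.60, 0.30],
              [0.40, 0.20, 0.40],
              [0.50, 0.25, 0.25]])
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))]); pi /= pi.sum()

v1 = np.sqrt(pi) / np.linalg.norm(np.sqrt(pi))   # component-wise square root
D = np.diag(v1)
B = D @ P @ np.linalg.inv(D)
assert np.allclose(B @ v1, v1)    # right eigenvector: B v1 = v1
assert np.allclose(v1 @ B, v1)    # left eigenvector:  v1^T B = v1^T
```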
Working over $\mathbb C$ and applying Schur Triangularization to $B$:
$V := \bigg[\begin{array}{c|c|c|c}\mathbf v_1 & \mathbf v_2 &\cdots & \mathbf v_{m}\end{array}\bigg]$ (unitary, so $V^{-1}=V^*$)
$B = VRV^{-1} = VRV^{*} =V\begin{bmatrix} 1 & \mathbf x_{m-1}^*\\ \mathbf 0 & \mathbf R_{m-1} \end{bmatrix}V^* =V\begin{bmatrix} 1 & \mathbf 0^T\\ \mathbf 0 & \mathbf R_{m-1} \end{bmatrix}V^*$
note $\mathbf x_{m-1} = \mathbf 0$: since $\mathbf v_1$ is real, $\mathbf v_1^T = \mathbf v_1^*$, so $ \mathbf v_1^* =\mathbf v_1^* B = 1\cdot \mathbf v_1^* + \sum_{j\geq 2} \bar x_{j-1}\cdot \mathbf v_j^*$,
and the rows of $V^*$ (equivalently, the columns of $V$) are linearly independent, so every $x_j =0$
By simplicity of the Perron root, $\mathbf R_{m-1}$ does not have $1$ as an eigenvalue, so
$I -B + \mathbf v_1 \mathbf v_1^T = V\big(I-R + \mathbf e_1\mathbf e_1^T\big)V^{*} =V\begin{bmatrix} 1 & \mathbf 0^T \\ \mathbf 0 & I_{m-1} -\mathbf R_{m-1} \end{bmatrix}V^*$
hence the determinant is $1\cdot \prod_{j=2}^m (1-\lambda_j) \neq 0$.
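A final numeric check of the determinant formula, on the same toy $P$: remove the Perron root $\lambda_1=1$ and compare $\det(I-P+W)$ with $\prod_{j\geq 2}(1-\lambda_j)$.

```python
import numpy as np

# Same toy P; compare det(I - P + W) with prod_{j>=2} (1 - lambda_j).
P = np.array([[0.10, 0.60, 0.30],
              [0.40, 0.20, 0.40],
              [0.50, 0.25, 0.25]])
m = P.shape[0]
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))]); pi /= pi.sum()
W = np.outer(np.ones(m), pi)

lam = np.linalg.eigvals(P)
rest = np.delete(lam, np.argmin(np.abs(lam - 1)))   # drop lambda_1 = 1
lhs = np.linalg.det(np.eye(m) - P + W)
rhs = np.prod(1 - rest)                             # real up to round-off
assert np.isclose(lhs, rhs)
print(lhs)   # nonzero, so I - P + W is indeed invertible
```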