For a Markov chain with $N<\infty$ states, the set $I$ of invariant probability vectors is a non-empty simplex in ${\mathbb R}^N$ whose extreme points correspond to the recurrent classes of the chain. Thus, the vector is unique iff there is exactly one recurrent class; the transient states (if any) play absolutely no role (as in Jens's example). The set $I$ is a point, line segment, triangle, etc. exactly when there are one, two, three, etc. recurrent classes.
If the invariant vector $\pi$ is unique, then there is only one recurrent class and the chain will eventually end up there. The vector $\pi$ necessarily puts zero mass on all transient states. Letting $\phi_n$ be the law of $X_n$, as you say, we have $\phi_n\to \pi$ only if the recurrent class is aperiodic. However, in general we have Cesàro convergence:
$${1\over n}\sum_{j=1}^n \phi_j\to\pi.$$
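For the Cesàro statement, here is a minimal numerical sketch (assuming NumPy is available); the $3\times 3$ transition matrix below is a made-up example with one transient state and one periodic recurrent class.

```python
# A made-up 3-state chain: state 0 is transient, states {1, 2} form a
# recurrent class of period 2, so phi_n oscillates but its averages converge.
import numpy as np

P = np.array([
    [0.5, 0.5, 0.0],   # transient state 0
    [0.0, 0.0, 1.0],   # recurrent class {1, 2}, period 2
    [0.0, 1.0, 0.0],
])

# Unique invariant vector: solve pi P = pi together with sum(pi) = 1.
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print("invariant vector:", pi)            # puts zero mass on the transient state

# phi_n itself does not converge (period 2), but the Cesaro averages do.
phi = np.array([1.0, 0.0, 0.0])           # start in the transient state
cesaro = np.zeros(3)
for n in range(1, 10001):
    phi = phi @ P                          # law of X_n
    cesaro += (phi - cesaro) / n           # running average of phi_1, ..., phi_n
print("Cesaro average :", cesaro)          # close to the invariant vector
```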
A Markov chain with an infinite state space need not have any recurrent states, and may admit the zero measure as its only invariant measure, finite or infinite. Consider the chain on the positive integers that jumps one step to the right at every time step.
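To spell out why only the zero measure is invariant for this example: the right-jump chain has $p(n,n+1)=1$, so an invariant measure $\mu$ would have to satisfy
$$\mu_m=\sum_{n\ge 1}\mu_n\,p(n,m)=\mu_{m-1}\quad(m\ge 2),\qquad \mu_1=\sum_{n\ge 1}\mu_n\,p(n,1)=0,$$
and then $\mu_2=\mu_1=0$, $\mu_3=\mu_2=0$, and so on.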
Generally, a Markov chain with countable state space has invariant probability vectors iff it has at least one positive recurrent class. If so, every invariant probability vector $\nu$ is a convex combination of
the unique invariant vectors $m_j$ corresponding to the positive recurrent classes $j\in J$, i.e.,
$$\nu=\sum_{j\in J} c_j m_j,\qquad c_j\geq 0,\quad \sum_{j\in J}c_j=1.$$
This result is Corollary 3.23 in Wolfgang Woess's Denumerable Markov Chains.
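As a concrete sanity check, here is a minimal numerical sketch (assuming NumPy is available); the $4\times4$ transition matrix and the weights $c_0=0.25$, $c_1=0.75$ are invented purely for illustration.

```python
# A made-up 4-state chain with two recurrent classes {0, 1} and {2, 3}:
# every convex combination of their invariant vectors is again invariant.
import numpy as np

P = np.array([
    [0.3, 0.7, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.2, 0.8],
])

def invariant(Q):
    """Invariant probability vector of an irreducible transition matrix Q."""
    n = Q.shape[0]
    A = np.vstack([Q.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

m0 = np.concatenate([invariant(P[:2, :2]), [0.0, 0.0]])   # supported on {0, 1}
m1 = np.concatenate([[0.0, 0.0], invariant(P[2:, 2:])])   # supported on {2, 3}

nu = 0.25 * m0 + 0.75 * m1        # any c_0, c_1 >= 0 with c_0 + c_1 = 1
print(np.allclose(nu @ P, nu))    # True: nu is invariant
```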
Introduction: Natural language represented as a Markov model
A very simple and practical example would be that of natural language.
First of all, "natural language" is just a term for any language that you speak and have acquired or learned, such as English. When you communicate, you have a set of words $W$ and a set of operations you can perform on these words. You then arrange the words you know in a certain order to produce meaningful units of communication, i.e., sentences.
When you use words in sentences, they are obviously not placed arbitrarily; they follow some order. In fact, sentence formation can be modeled reasonably well as a Markov chain of order "$k$"$=1$ (reference).
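As a sketch of what such an order-$1$ model looks like in practice, here is a toy bigram estimator; the three-sentence corpus is invented purely for illustration and the counts are maximum-likelihood estimates.

```python
# Estimate first-order transition probabilities P(next word | current word)
# from bigram counts in a tiny made-up corpus.
from collections import Counter

corpus = [
    "i like chocolate chip cookie",
    "i like chocolate cake",
    "you like chocolate chip cookie",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def transition_prob(w, v):
    """Estimated P(next word = v | current word = w)."""
    return bigrams[(w, v)] / unigrams[w] if unigrams[w] else 0.0

print(transition_prob("chocolate", "chip"))   # 2/3 in this toy corpus
```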
Sentence formation and conditional probability
Now establish a correspondence between your possible states $[1, 6]$ and six particular words. It can be experimentally determined that $P(C | B) \geq P(C|A,B)$, where $A,B$ denotes an ordered occurrence of the words $A$ and $B$ in a sentence. This relation always holds in the case of a natural language, since $P(C | B)$ already accounts for all possible trigrams of the form "$X B C$", where $X$ is an arbitrary word of the language. In computational linguistics, this is also termed the transition probability.
I'll give you an example of that as well:
Let us, however, consider the backward probability $P(A|``A,B")$, i.e., the probability that $A$ is the word preceding $B$. Now let $A$ be "chocolate" and $B$ be "chip". There will exist a certain probability in the historical literature of the language for $P(A|``A,B") = p_{k=1}$. However, let us now consider another word $C$, representing "cookie". When you compute the new probability $P(A|``A,B,C")=p_{k=2}$, you find that $p_{k=2} \leq p_{k=1}$, because there are more words you can say after "chocolate chip" other than simply "cookie". I have written "$\leq$" because theoretically it is possible that only one such meaningful triplet can be formed in a natural language, although that is rarely ever the case. You could simply replace it with a strict inequality for all practical purposes.
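Here is a rough numerical sketch of that inequality, reading $p_{k=1}$ and $p_{k=2}$ as the relative frequencies of the phrases "chocolate chip" and "chocolate chip cookie" (one natural reading of the notation above); the corpus is again a made-up toy example.

```python
# The longer phrase can never occur more often than the shorter one it extends,
# so its relative frequency is at most that of the shorter phrase.
corpus = [
    "i like chocolate chip cookie",
    "she bought chocolate chip ice cream",
    "he ate a chocolate chip cookie",
    "chocolate chip pancakes are great",
]

def phrase_count(phrase):
    """Count occurrences of a word sequence across the toy corpus."""
    target = tuple(phrase.split())
    k = len(target)
    count = 0
    for sentence in corpus:
        words = sentence.split()
        count += sum(tuple(words[i:i + k]) == target
                     for i in range(len(words) - k + 1))
    return count

total_words = sum(len(s.split()) for s in corpus)
p1 = phrase_count("chocolate chip") / total_words          # frequency per word
p2 = phrase_count("chocolate chip cookie") / total_words
print(p1, p2, p2 <= p1)                                     # True
```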
I think this provides a sufficient and intuitive example for the problem that you have stated. Let me know if anything in the answer is unclear.
Best Answer
The key phrase is "given the present". If past and future are independent given the present, it doesn't follow that past and future are unconditionally independent.
For example, consider the simple random walk that takes a step either left or right with equal probability. If you know where I am today, then the knowledge of where I was yesterday won't affect where you think I'll be tomorrow. OTOH if you don't know where I am today, then knowing where I was yesterday will affect where you think I will be tomorrow, since tomorrow I can be at most two steps away from where I was yesterday.
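If you want to see this numerically, here is a quick simulation sketch (assuming NumPy is available); it conditions on one particular value of today's position, which is enough to illustrate the point.

```python
# Simple random walk: tomorrow is independent of yesterday *given* today,
# but yesterday and tomorrow are clearly dependent unconditionally.
import numpy as np

rng = np.random.default_rng(0)
steps = rng.choice([-1, 1], size=(200_000, 3))    # three consecutive +/-1 steps
walks = np.cumsum(steps, axis=1)                  # positions at times 1, 2, 3
yesterday, today, tomorrow = walks[:, 0], walks[:, 1], walks[:, 2]

# Unconditionally, yesterday and tomorrow are dependent (correlation ~ 0.58):
print(np.corrcoef(yesterday, tomorrow)[0, 1])

# Conditionally on today's position, yesterday carries no extra information:
mask = today == 0                                  # fix one value of "today"
print(np.corrcoef(yesterday[mask], tomorrow[mask])[0, 1])   # approximately 0
```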