I prefer the first definition by far. I relate the question to ergodic theory, as seems appropriate, and assume that the chain has finitely many possible values, so as not to bother with positive recurrence.
Let us consider a finite state space $A$, and denote the set of all possible sequences of elements of $A$ by $X:=A^{\mathbb{N}}$. Let us define a transformation $\sigma$ on $X$ by $(\sigma x)_n = x_{n+1}$. For $x \in X$, we have $x_n = (\sigma^n x)_0$. In other words, by applying the transformation $\sigma$ repeatedly, I can read the successive values of a given sequence.
Now, let us take some probability measure $\mu$ on $A$ with full support (so as to see everything), and a stochastic matrix $P$ (the transition kernel). Using $\mu$ as the distribution of $X_0$ and the matrix $P$ to define transitions, we get a Markov chain $(X_n)_{n \geq 0} = x = ((\sigma^n x)_0)_{n \geq 0}$, which is a stochastic process with values in $A$. The distribution of $(X_n)_{n \geq 0}$ is a measure $\overline{\mu}$ on $A^{\mathbb{N}}$ which satisfies the usual conditions on cylinders, and whose first marginal is $\mu$.
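For concreteness, the "usual conditions on cylinders" can be written out as follows. This is the standard formula; the cylinder notation $[a_0, \dots, a_n]$ and the entry notation $P(a,b)$ are introduced here and are not part of the text above.

$$\overline{\mu}\big([a_0, a_1, \dots, a_n]\big) = \mu(a_0)\, P(a_0, a_1)\, P(a_1, a_2) \cdots P(a_{n-1}, a_n), \qquad [a_0, \dots, a_n] := \{x \in X : x_0 = a_0, \dots, x_n = a_n\}.$$

Taking $n = 0$ gives $\overline{\mu}([a_0]) = \mu(a_0)$, which is the statement that the first marginal is $\mu$.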
The construction may look a bit confusing. However, if you forget about $\sigma$, it is what is done more or less informally when one defines Markov chains (that is, the construction may be hidden, but it is there).
Hence, we can consider a Markov chain as a dynamical system $(X, \sigma)$ together with a probability measure $\overline{\mu}$. We can use the definitions of ergodic theory, and what we get in the end is that:
- the system $(X, \sigma, \overline{\mu})$ is measure-preserving if and only if $\mu$ is stationary for $P$;
- the system $(X, \sigma, \overline{\mu})$ is ergodic (in the sense of ergodic theory) if and only if the Markov chain is irreducible;
- the system $(X, \sigma, \overline{\mu})$ is mixing if and only if the Markov chain is irreducible and aperiodic.
So these are two very different conditions, and aperiodicity does not correspond to ergodicity. As a corollary, one can apply ergodic theorems to Markov chains with no need for aperiodicity.
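To illustrate the corollary, here is a minimal sketch in Python. The two-state "flip" chain `P_flip` and the helper names `simulate` and `occupation_frequencies` are my own choices for illustration, not taken from the question. The chain is irreducible with period 2, yet the fraction of time spent in each state still converges to the stationary distribution $(1/2, 1/2)$.

```python
import random

def simulate(P, start, steps, rng):
    """Return a trajectory of length `steps` for the chain with transition matrix P."""
    path = [start]
    for _ in range(steps - 1):
        row = P[path[-1]]
        # Draw the next state according to the current row of P.
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path

def occupation_frequencies(path, n_states):
    """Fraction of time the trajectory spends in each state (a Birkhoff average)."""
    return [path.count(s) / len(path) for s in range(n_states)]

# Hypothetical example: irreducible, period 2, stationary distribution (1/2, 1/2).
P_flip = [[0.0, 1.0],
          [1.0, 0.0]]

path = simulate(P_flip, start=0, steps=10_000, rng=random.Random(0))
freqs = occupation_frequencies(path, 2)
print(freqs)
```

This particular chain is deterministic, so the trajectory simply alternates between the two states; the same functions can be used with any stochastic matrix given as a list of rows.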
What you're confusing is the existence of an invariant measure with convergence to the invariant distribution.
A lot of Markov chains have invariant measures but don't converge to the invariant measure from any initial measure other than the invariant measure itself. (In fact, on a finite state space you always have at least one invariant distribution; in general, an irreducible null recurrent chain will have an invariant measure, and an irreducible positive recurrent chain will have an invariant distribution.) In the example you give, they are both irreducible but not aperiodic, so you don't have convergence to the invariant distribution (although an invariant distribution does exist).
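As a small sketch of this distinction, consider a two-state chain that always switches state. This matrix is my own illustration and not necessarily the example from the question, and the helper name `step` is hypothetical. The uniform distribution is invariant, but starting from a point mass the distribution oscillates forever instead of converging.

```python
def step(dist, P):
    """One step of the chain on distributions: returns the row vector dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical example: irreducible but periodic (period 2).
P_flip = [[0.0, 1.0],
          [1.0, 0.0]]

# The uniform distribution is invariant: pi * P == pi.
pi = [0.5, 0.5]
pi_after_one_step = step(pi, P_flip)

# Starting from a point mass at state 0, the distribution never settles down.
dist = [1.0, 0.0]
history = [dist]
for _ in range(5):
    dist = step(dist, P_flip)
    history.append(dist)

for n, d in enumerate(history):
    print(n, d)
```

The printed distributions alternate between the two point masses, so the sequence has no limit even though an invariant distribution exists.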
I'd recommend reading Ch. 1 of Norris' Markov Chains for more details.
Best Answer
Yes, the Markov chain you gave is aperiodic. To see this, you can try proving that whenever the underlying graph is strongly connected (or, in other words, the MC is irreducible) and contains at least one self-loop, then the Markov chain is aperiodic.
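If you want to check the claim numerically before proving it, here is a rough sketch that computes the period of a state as the gcd of the lengths of the return paths it finds. The function name `period`, the cutoff `max_len = 3 * n`, and the two matrices are assumptions made for illustration; they are not the chain from the question. For an irreducible chain on $n$ states, looking at return paths of length up to $3n$ should be enough to recover the period, but treat that bound as an assumption of the sketch.

```python
from math import gcd

def period(P, state, max_len=None):
    """gcd of the lengths k <= max_len for which a return to `state` in k steps is possible."""
    n = len(P)
    if max_len is None:
        max_len = 3 * n  # assumed sufficient for an irreducible chain on n states
    # reach[j] is True when j can be reached from `state` in exactly k steps.
    reach = [i == state for i in range(n)]
    g = 0
    for k in range(1, max_len + 1):
        reach = [any(reach[i] and P[i][j] > 0 for i in range(n)) for j in range(n)]
        if reach[state]:
            g = gcd(g, k)  # gcd(0, k) == k, so the first return length initializes g
    return g

# Hypothetical 3-cycle 0 -> 1 -> 2 -> 0 with no self-loop.
P_cycle = [[0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0],
           [1.0, 0.0, 0.0]]

# Same cycle, plus a self-loop at state 0.
P_loop = [[0.5, 0.5, 0.0],
          [0.0, 0.0, 1.0],
          [1.0, 0.0, 0.0]]

print(period(P_cycle, 0))
print(period(P_loop, 0))
print(period(P_loop, 1))
```

In the second matrix the self-loop gives a return path of length 1 at state 0, and because the chain is irreducible every other state inherits period 1 as well, which is what the last call checks for state 1.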