I prefer the first definition by far. I relate the question to ergodic theory, as seems appropriate, and assume that the chain has finitely many possible values, so as not to bother with positive recurrence.
Let us consider a finite state space $A$, and denote the set of all sequences of elements of $A$ by $X := A^{\mathbb{N}}$. Let us define a transformation $\sigma$ on $X$ (the shift) by $(\sigma x)_n = x_{n+1}$. For $x \in X$, we have $x_n = (\sigma^n x)_0$. In other words, by applying the transformation $\sigma$, I can read the successive values of a given sequence.
Now, let us take some probability measure $\mu$ on $A$ with full support (so as to see everything), and a stochastic matrix $P$ (the transition kernel). Using $\mu$ as the distribution of $X_0$ and the matrix $P$ to define the transitions, we get a Markov chain $(X_n)_{n \geq 0}$, which is a stochastic process with values in $A$; viewing a trajectory as a point $x \in X$, we have $X_n = (\sigma^n x)_0$. The distribution of $(X_n)_{n \geq 0}$ is a measure $\overline{\mu}$ on $A^{\mathbb{N}}$ which satisfies the usual conditions on cylinders (spelled out below), and whose first marginal is $\mu$.
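Explicitly, the conditions on cylinders say that for every finite word $a_0, \dots, a_n$ of elements of $A$,

$\overline{\mu}\big( \{ x \in X : x_0 = a_0, \dots, x_n = a_n \} \big) = \mu(a_0) \, P(a_0, a_1) \cdots P(a_{n-1}, a_n)$,

which is exactly the law of the first $n+1$ values of the chain.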
The construction may look a bit confusing. However, if you forget about $\sigma$, it is more or less what is done informally when one defines a Markov chain: the construction may be hidden, but it is there.
Hence, we can consider a Markov chain as a dynamical system $(X, \sigma)$ together with a probability measure $\overline{\mu}$. We can use the definitions of ergodic theory, and what we get in the end is that:
- the system $(X, \sigma, \overline{\mu})$ is measure-preserving if and only if $\mu$ is stationary for $P$;
- the system $(X, \sigma, \overline{\mu})$ is ergodic (in the sense of ergodic theory) if and only if the Markov chain is irreducible;
- the system $(X, \sigma, \overline{\mu})$ is mixing if and only if the Markov chain is irreducible and aperiodic.
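A concrete example separating the last two items: take $A = \{0, 1\}$, $P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and $\mu = (1/2, 1/2)$. The chain is irreducible, so the system is ergodic; but it has period $2$ and $P^{2k} = I$ for all $k$, so the system is not mixing.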
So these are two very different conditions, and aperiodicity does not correspond to ergodicity. As a corollary, one can apply ergodic theorems to Markov chains with no need for aperiodicity.
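To illustrate the corollary numerically, here is a minimal Python sketch (the helper `simulate_chain` is my own naming, not a library function). It uses a period-$2$ irreducible chain on three states: the powers $P^n$ oscillate and never converge, yet the Birkhoff time averages converge to the stationary probability, as the ergodic theorem guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)

# Irreducible chain with period 2: state 0 alternates with {1, 2}.
# Stationary distribution: mu = (1/2, 1/4, 1/4).
P = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])

def simulate_chain(P, x0, n_steps, rng):
    """Sample a trajectory of the Markov chain with transition matrix P."""
    traj = np.empty(n_steps, dtype=int)
    x = x0
    for t in range(n_steps):
        traj[t] = x
        x = rng.choice(len(P), p=P[x])
    return traj

traj = simulate_chain(P, x0=0, n_steps=10_000, rng=rng)

# P^n does not converge: even and odd powers have different zero patterns.
print(np.linalg.matrix_power(P, 10))   # "even" support pattern
print(np.linalg.matrix_power(P, 11))   # "odd" support pattern

# Yet the time average of 1_{X_t = 0} converges to mu(0) = 1/2.
print(np.mean(traj == 0))              # approximately 0.5
```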
For a finite MC it holds that
aperiodic + irreducible $\Leftrightarrow$ ergodic $\Leftrightarrow$ regular
as you expected. For an infinite MC it holds that
aperiodic + irreducible + positive recurrent $\Leftrightarrow$ ergodic,
and being "regular" in the infinite setting would require a more precise definition.
Explanations follow.
For every finite or infinite Markov chain (MC) it holds that
aperiodic + irreducible + positive recurrent $\Leftrightarrow$ ergodic.
See, for example, here for a proof. For every finite MC, irreducibility already implies positive recurrence; see here for a proof.
Further, for every finite MC we have that
aperiodic + irreducible $\Leftrightarrow$ regular.
Proof sketch: the definition of a finite irreducible MC gives that $\forall i, j \in \Omega : \exists k > 0 : P^k[i,j] > 0$.
However, there might be no single $k$ such that all entries are simultaneously positive, due to periodicities. But if the chain is additionally aperiodic, the set of return times to any fixed state has greatest common divisor $1$, and since that set is closed under addition it contains all sufficiently large integers; combined with irreducibility and the finiteness of $\Omega$, it follows that
$\exists k > 0 : \forall i, j \in \Omega : P^k[i,j] > 0$,
which matches your definition of being regular.
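To make the sketch concrete, here is a small Python check (the helper `is_regular` is my own, for illustration): it searches for a power $k$ with $P^k$ entrywise positive, and distinguishes a periodic irreducible chain from an aperiodic one.

```python
import numpy as np

def is_regular(P, max_k=100):
    """Return the smallest k <= max_k such that all entries of P^k are positive,
    or None if no such k is found (e.g. for a periodic chain)."""
    Q = np.eye(len(P))
    for k in range(1, max_k + 1):
        Q = Q @ P
        if np.all(Q > 0):
            return k
    return None

# Irreducible but periodic (period 2): P^k alternates between I and P.
flip = np.array([[0.0, 1.0],
                 [1.0, 0.0]])

# Irreducible and aperiodic: P has a zero entry, but P^2 is entrywise positive.
aper = np.array([[0.5, 0.5],
                 [1.0, 0.0]])

print(is_regular(flip))  # None
print(is_regular(aper))  # 2
```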
Finally, I don't see a canonical way to generalize the property "regular" to infinite Markov chains, so I simply ignore the term "regular" for infinite chains here.
Best Answer
For a finite Markov chain, the nicest proof of convergence to the stationary distribution that I know of goes through when irreducibility is weakened: it is enough that the chain is aperiodic and that, for any two states, there is a common third state they can both reach with positive probability after some number of steps.
(This proof involves starting one Markov chain from a stationary distribution and another from an arbitrary state, and coupling them so that once they are in the same state, they remain in the same state forever after. The probability that the two Markov chains have stuck together approaches $1$ with time, so the second one converges to the stationary distribution.)
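Here is a minimal simulation of that coupling argument, under assumptions of my own choosing (the $3$-state matrix `P` and the helper `step` are illustrative, not canonical). Both copies are driven by the same uniform random number at each step, so once they agree they agree forever; the fraction of runs in which they have met grows toward $1$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Aperiodic, irreducible 3-state chain; doubly stochastic, so pi is uniform.
P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
pi = np.array([1/3, 1/3, 1/3])

def step(x, u):
    """Move from state x using the uniform number u (shared u => coupling)."""
    return int(np.searchsorted(np.cumsum(P[x]), u))

n_runs, n_steps = 2000, 30
met = 0
for _ in range(n_runs):
    x = rng.choice(3, p=pi)   # chain started from the stationary distribution
    y = 0                     # chain started from an arbitrary fixed state
    for _ in range(n_steps):
        u = rng.random()
        x, y = step(x, u), step(y, u)  # same u: once x == y, they stay equal
        if x == y:
            break
    met += (x == y)

print(met / n_runs)  # close to 1: the chains couple, so law(y_n) -> pi
```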
It's easy to see that both conditions are also necessary, so that answers the question for finite Markov chains.
For infinite Markov chains, this condition needs to be strengthened for the same proof to work: there must exist $N$ and $\epsilon > 0$ such that for any two states, there is a common third state that they can both reach with probability at least $\epsilon$ after $N$ steps. We get this for free in the finite case, but it is not a good hypothesis to take in the infinite case: it is false for many perfectly well-behaved Markov chains. (For instance, for any nearest-neighbour chain on $\mathbb{N}$, the states $0$ and $2N+2$ cannot reach a common state within $N$ steps, however well-behaved the chain is otherwise.)