It's useful to consider finite-state Markov chains with states $\{ 1, \ldots, N \}$. Such a Markov chain is defined by its transition matrix $P = (P_{ij})_{i,j=1}^N$. We require that $0 \leq P_{ij} \leq 1$ for each $i, j = 1, \ldots, N$ and that $\sum_{j=1}^N P_{ij} = 1$ for each $i$. Thus, we can think of $P_{ij}$ as the probability of jumping from state $i$ to state $j$. We initialize the Markov chain in a state $X_0$ and let $X_n$ be the state at time $n$ (so $X_n$ is a random variable taking values in $\{ 1, \ldots, N \}$).
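To make this setup concrete, here is a minimal Python sketch (the 3-state matrix `P` below is made up purely for illustration) that checks the two requirements on $P$ and samples one trajectory:

```python
import random

# A hypothetical 3-state transition matrix; each row must sum to 1.
P = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
]

# Sanity-check the two requirements on P.
for row in P:
    assert all(0.0 <= p <= 1.0 for p in row)
    assert abs(sum(row) - 1.0) < 1e-12

def step(i, rng):
    """Jump from state i (0-indexed here) according to row i of P."""
    # random.choices picks index j with probability P[i][j]
    return rng.choices(range(len(P)), weights=P[i])[0]

rng = random.Random(0)
X = [0]                      # X_0 = state 0
for n in range(10):
    X.append(step(X[-1], rng))
print(X)                     # one sample trajectory X_0, ..., X_10
```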
A natural requirement is that the Markov chain be irreducible, which essentially means that we can get from any state to any other state with positive probability.
A finite-state Markov chain is said to be ergodic if it is irreducible and has an additional property called aperiodicity. The ergodic theorem for Markov chains says (roughly) that an ergodic Markov chain approaches its stationary distribution as time $n \to \infty$.
Now in the case of physical systems, an additional assumption is usually that the system be reversible. For the present discussion, take reversibility to mean that the transition matrix is symmetric, $P_{ij} = P_{ji}$, i.e. detailed balance holds with respect to the uniform measure. It turns out that the stationary distribution of a finite-state irreducible Markov chain with symmetric transition matrix is the uniform distribution, which assigns equal probability $1/N$ to each of the possible states. (For a general reversible chain, reversibility only means detailed balance $\pi_i P_{ij} = \pi_j P_{ji}$ with respect to some distribution $\pi$, which need not be uniform.)
Putting all this together, we see that a finite-state reversible ergodic Markov chain converges to the uniform distribution (i.e. reaches an equilibrium as time goes to infinity in which all states are equally likely).
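As a sanity check of this convergence, here is a small simulation. The symmetric matrix below is made up; symmetry $P_{ij} = P_{ji}$ is one concrete way a finite chain can be reversible with uniform stationary distribution, and the long-run visit frequencies should then all be near $1/N = 1/3$:

```python
import random
from collections import Counter

# A symmetric (hence doubly stochastic) transition matrix on 3 states:
# P[i][j] == P[j][i], each row sums to 1, and the chain is irreducible
# and aperiodic, so the uniform distribution (1/3, 1/3, 1/3) is its limit.
P = [
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
    [0.3, 0.1, 0.6],
]

rng = random.Random(42)
state, counts, steps = 0, Counter(), 200_000
for _ in range(steps):
    state = rng.choices(range(3), weights=P[state])[0]
    counts[state] += 1

freqs = [counts[i] / steps for i in range(3)]
print(freqs)   # each entry should be close to 1/3
```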
The notion of ergodic dynamical system you asked about is a vast generalization of this idea.
A stochastic process is a collection $\{X_t\}_t$ of real-valued (for this discussion) random variables defined on a common probability space $(\Omega,\mathcal{F},P)$. Here the index set will be the natural numbers $\mathbb{N}$: $(X_n)_{n \ge 1}$. For the definitions we consider, we insist that each $X_n$ has the same mean, denoted $E[X]$. A process $(X_n)_n$ is stationary if, for any $m \ge 1$ and any Borel sets $A_1,\dots,A_m$, it holds that $P(X_1 \in A_1, \dots, X_m \in A_m) = P(X_2 \in A_1,\dots,X_{m+1} \in A_m)$; in other words, $(X_n)_n$ is stationary if the law of $(X_1,X_2,X_3,\dots)$ is the same as the law of $(X_2,X_3,\dots)$ (where the value space is $\mathbb{R}^\mathbb{N}$ with the product sigma-algebra).
We say that $(X_n)_n$ is ergodic if $(X_n)_n$ is stationary and $(\mathbb{R}^\mathbb{N},\mathcal{B}^\mathbb{N},P^\mathbb{N},T)$ is ergodic as a measure preserving system, where $P^\mathbb{N}(A) := P((X_1(\omega),X_2(\omega),\dots) \in A)$ and $T$ is the left shift: $T((x_1,x_2,x_3,\dots)) := (x_2,x_3,\dots)$ (note that $(X_n)_n$ stationary implies $T$ preserves $P^\mathbb{N}$). A consequence of this notion of ergodicity is that for each measurable $f: \mathbb{R} \to \mathbb{R}$ with $E[|f(X_1)|] < \infty$, it holds that $\frac{1}{N}\sum_{n \le N} f(X_n(\omega)) \to E[f(X)]$ with probability $1$.
We say that $(X_n)_n$ is mean-ergodic if $E\left[|\frac{1}{N}\sum_{n \le N} X_n - E[X]|^2\right] \to 0$ as $N \to \infty$. Note that $(X_n)_n$ need not be stationary.
We say that $(X_n)_n$ is pointwise-ergodic if $\frac{1}{N}\sum_{n \le N} X_n(\omega) \to E[X]$ with probability $1$. Note that ergodic implies pointwise-ergodic.
We say that $(X_n)_n$ is LLN-applicable if the $X_n$'s are i.i.d. and the set of $\omega \in \Omega$ satisfying $\frac{1}{N}\sum_{n \le N} X_n(\omega) \to E[X]$ has measure/probability $1$.
It is not entirely clear what is usually meant by "ergodic" for a stochastic process; Wikipedia, for instance, uses "ergodic" to mean mean-ergodic.
As the comments indicate, there are $(X_n)_n$ that are pointwise-ergodic but not i.i.d. and thus not LLN-applicable. For example, we can take any $Y_n$'s that are i.i.d. and then consider $(X_1,X_2,\dots) = (Y_1,Y_1,Y_2,Y_3,\dots)$.
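A quick numerical illustration of this example (with the $Y_i$ taken, hypothetically, to be uniform on $\{-1,+1\}$): the running average still converges to $E[Y] = 0$, even though $X_1 = X_2$ always, so the $X_n$ are certainly not independent.

```python
import random

# Sketch of the repeated-first-term process X = (Y_1, Y_1, Y_2, Y_3, ...)
# with Y_i i.i.d. uniform on {-1, +1}.  X_1 == X_2 always, so the X_n are
# not independent, yet the running averages still converge to E[Y] = 0.
rng = random.Random(1)
N = 100_000
Y = [rng.choice((-1, 1)) for _ in range(N)]
X = [Y[0]] + Y              # duplicate the first term

avg = sum(X[:N]) / N
print(avg)                  # close to 0: pointwise-ergodic behaviour
print(X[0] == X[1])         # True: the process is not i.i.d.
```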
This answer gives an example of $(X_n)_n$ that are stationary, pointwise-ergodic, but not ergodic. The example is also mean-ergodic if $E[Y_i] = 0$ and $E[Y_i^2] < \infty$.
Let $Y_1,Y_2,\dots$ be independent with, for each $i$, $P(Y_i = 1) = \frac{1}{2}$ and $P(Y_i = -1) = \frac{1}{2}$. Let $X_n = \frac{Y_1+\dots+Y_n}{\sqrt{n\log\log\log n}}$ (for $n$ large enough that the denominator is positive). Then indeed $E[X_n]$ is the same for each $n$ (the common mean is $0$), and $(X_n)_n$ is mean-ergodic: $E[|X_n-0|^2] = E[X_n^2] = \frac{n}{n\log\log\log n} = \frac{1}{\log\log\log n} \to 0$, so by the triangle inequality in $L^2$, $\|\frac{1}{N}\sum_{n \le N} X_n\|_{L^2} \le \frac{1}{N}\sum_{n \le N} \|X_n\|_{L^2} \to 0$ (Cesàro averages of a null sequence tend to $0$). However, $(X_n)_n$ is not pointwise-ergodic: by the law of the iterated logarithm, with probability $1$, $\limsup_{n \to \infty} \frac{Y_1+\dots+Y_n}{\sqrt{2n\log\log n}} = 1$, and since $\sqrt{2n\log\log n}/\sqrt{n\log\log\log n} \to \infty$, the sequence $X_n$ is almost surely unbounded; a closer look at the excursions of the random walk (which persist over time intervals of length comparable to $n$) shows that the averages $\frac{1}{N}\sum_{n \le N} X_n$ fail to converge. (It's obviously not ergodic, since it's not stationary.)
If the $X_n$'s are bounded, then the dominated convergence theorem implies that pointwise-ergodic implies mean-ergodic.
A Markov chain, for this discussion, is a stochastic process $(X_n)_{n \ge 1}$ taking values in a common, finite set $X$ such that (1) $P(X_n=x) > 0$ for each $n \ge 1$ and $x \in X$ and (2) for each $n,x_1,\dots,x_n$, it holds that $P(X_n = x_n | X_{n-1} = x_{n-1},\dots,X_1=x_1) = P(X_n = x_n | X_{n-1} = x_{n-1})$. We will make the common assumption/imposition that the Markov chain is time homogeneous, meaning $P(X_n = x_n | X_{n-1} = x_{n-1})$ does not depend on $n$, and also that the chain starts at stationarity, i.e. $P(X_1 = x) = P(X_2 = x)$ for each $x \in X$. An easy exercise shows that these two assumptions together imply that $(X_1,X_2,\dots)$ is stationary (hint: use induction).
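The induction in the exercise happens at the level of marginal laws: if the initial distribution $\pi$ satisfies $\pi P = \pi$, then the law of $X_n$ is $\pi$ for every $n$. A small sketch (with a made-up doubly stochastic matrix, for which the uniform distribution is stationary):

```python
# Sketch: if the initial distribution pi satisfies pi P = pi, then the
# marginal law of X_n is pi for every n (matching the induction hint).
P = [
    [0.1, 0.6, 0.3],
    [0.6, 0.2, 0.2],
    [0.3, 0.2, 0.5],
]
pi = [1/3, 1/3, 1/3]   # uniform: stationary here since P is doubly stochastic

def push(dist, P):
    """One step of the chain at the level of distributions: (dist P)_j."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

dist = pi
for n in range(5):
    dist = push(dist, P)
    print(dist)        # stays (numerically) at (1/3, 1/3, 1/3) every time
```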
Associated to a Markov chain, we can define a weighted, directed graph on $X$ with the weight of the directed edge from $x$ to $y$ being $P(X_2 = y | X_1 = x)$ if the probability is positive (if it's not, don't draw a directed edge).
A Markov chain is strongly connected if, in this graph, one can get from any vertex to any other vertex by travelling along directed edges.
$\bf{Claim}$: A Markov chain is ergodic if and only if it is strongly connected.
Proof: Suppose the Markov chain is ergodic. Take any $x_0,x_1 \in X$. Let $f = 1_{x_1}$. Then, with probability $1$, $\frac{1}{N}\sum_{n \le N} f(X_n(\omega)) \to E[f(X)] = P(X = x_1)$. Since $P(X=x_1) > 0$, almost every trajectory visits $x_1$ at some time $n \ge 2$; as $P(X_1 = x_0) > 0$ by assumption (1), conditioning gives $P(\exists n \ge 2 : X_n = x_1 \mid X_1 = x_0) > 0$. By countable additivity, there is some $n \ge 2$ with $P(X_n = x_1 \mid X_1 = x_0) > 0$. But $P(X_n = x_1 \mid X_1 = x_0) = \sum_{y_2,\dots,y_{n-1}} P(X_n = x_1 \mid X_{n-1} = y_{n-1})\cdots P(X_2 = y_2 \mid X_1 = x_0)$, so there are $y_2,\dots,y_{n-1}$ with every factor in the product nonzero, which means there is a path from $x_0$ to $x_1$ in the graph. Therefore, the Markov chain is strongly connected. The converse uses the Perron-Frobenius theorem and some ergodic theory, which I omit.
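The strong-connectivity side of the claim is easy to check mechanically. Here is a sketch (function names are my own) that builds the directed graph from a transition matrix and tests reachability by BFS from every vertex:

```python
from collections import deque

def reachable(adj, start):
    """Vertices reachable from start by BFS along directed edges."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def strongly_connected(P):
    """Strong connectivity of the directed graph with an edge
    i -> j whenever P[i][j] > 0."""
    n = len(P)
    adj = [[j for j in range(n) if P[i][j] > 0] for i in range(n)]
    return all(len(reachable(adj, s)) == n for s in range(n))

# A chain that deterministically cycles 0 -> 1 -> 2 -> 0: strongly connected.
print(strongly_connected([[0, 1, 0], [0, 0, 1], [1, 0, 0]]))            # True
# A chain with an absorbing state 1: not strongly connected.
print(strongly_connected([[0.5, 0.5, 0], [0, 1, 0], [0, 0.5, 0.5]]))    # False
```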
There is a theorem in dynamical systems known as the pointwise ergodic theorem. What it says (in part) is that if $T$ is a measure-preserving transformation of some probability space, and if $f$ is a random variable with finite expectation $\int f$, i.e. if $f$ is integrable, then the time average $\hat f(x) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n f(T^i(x))$ exists for almost all $x$, the function $\hat f$ is integrable, and its integral $\int\!\hat f$ is equal to the expectation of $f$, namely to $\int\!f$.
Ergodicity of $T$, combined with the pointwise ergodic theorem, implies that $\hat f=\int\!f$ almost everywhere.
If you now translate this conclusion into the language of stochastic processes by taking $X(n) = f \circ T^n$, you'll see that ergodicity of $T$ implies ergodicity of $X(n)$ as defined in this part of the wikipedia page you linked.
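As a concrete numerical instance of this translation (my choice of system, not one from the linked page): take $T$ to be the irrational rotation $x \mapsto x + \alpha \pmod 1$ on $[0,1)$ with Lebesgue measure, which is ergodic when $\alpha$ is irrational, and take $f(x) = x$. The time average along an orbit should then match the space average $\int_0^1 x\,dx = 1/2$:

```python
import math

# Pointwise ergodic theorem, numerically: irrational rotation
# T(x) = x + alpha (mod 1) on [0,1), ergodic for Lebesgue measure.
alpha = math.sqrt(2) - 1     # irrational rotation number

def f(x):
    return x                 # integrable observable; its integral over [0,1] is 1/2

x, total, N = 0.1, 0.0, 1_000_000    # a typical starting point
for _ in range(N):
    total += f(x)
    x = (x + alpha) % 1.0

time_avg = total / N
print(time_avg)   # close to the space average 1/2
```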