It works in more general dynamical systems, not only when $X$ is compact metrizable. Define for each real number $c$
$$A_c:=\{x\mid g(x)\geqslant c\}.$$
Then $f^{-1}(A_c)=A_c$, so by ergodicity $\mu(A_c)\in\{0,1\}$. Now take $c_0:=\inf\{c\mid\mu(A_c)=0\}$ and show that $g=c_0$ almost everywhere.
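For completeness, here is one way the last step can be spelled out. It is a sketch under the following assumptions: $\mu$ is a probability measure, $g$ is measurable, real-valued almost everywhere and satisfies $g\circ f=g$, and ergodicity means that every measurable $A$ with $f^{-1}(A)=A$ has $\mu(A)\in\{0,1\}$. Because $g$ is real-valued almost everywhere, $\mu(A_c)\to 0$ as $c\to+\infty$ and $\mu(A_c)\to 1$ as $c\to-\infty$, so $c_0$ is finite. Since $c\mapsto\mu(A_c)$ is non-increasing and only takes the values $0$ and $1$,
$$\mu(A_c)=1\ \text{ for } c<c_0,\qquad \mu(A_c)=0\ \text{ for } c>c_0.$$
Writing the relevant sets as a countable intersection and a countable union,
$$\{g\geqslant c_0\}=\bigcap_{n\geqslant 1}A_{c_0-1/n},\qquad \{g>c_0\}=\bigcup_{n\geqslant 1}A_{c_0+1/n},$$
one gets $\mu(\{g\geqslant c_0\})=1$ and $\mu(\{g>c_0\})=0$, hence
$$\mu(\{g=c_0\})=\mu(\{g\geqslant c_0\})-\mu(\{g>c_0\})=1.$$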
Characterization?
If you want a characterization with a hard-analysis flavor, you could characterize ergodic measures as those for which the distribution of subwords in a larger word (say of length $N$) does not depend much on the choice of the word of length $N$, provided you ignore some exceptional words of length $N$ (where the probability of the exceptional set goes to zero as $N$ grows). The precise statement of this characterization appears, for example, in the paper "Entropy is the Only Finitely Observable Invariant" by Ornstein and Weiss.
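To make "distribution of subwords in a larger word" concrete, here is a minimal Python sketch. It only illustrates the objects being compared and is not the precise statement from the paper. The function names and the choice of total variation distance as the notion of "does not depend much" are my own assumptions.

```python
from collections import Counter
from fractions import Fraction


def subword_frequencies(word, k):
    """Empirical distribution of the length-k subwords (sliding windows) of `word`."""
    windows = [word[i:i + k] for i in range(len(word) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    return {w: Fraction(c, total) for w, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return sum(abs(p.get(w, 0) - q.get(w, 0)) for w in keys) / 2
```

In this language, the characterization says roughly that for an ergodic measure and fixed $k$, two $\mu$-random words of large length $N$ have `subword_frequencies` that are close in such a distance, outside a set of words whose probability tends to zero.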
I wonder if there is a theorem that gives you some class of measures parametrized by finite descriptions of them and then says that there is no algorithm to determine ergodicity for this class.
As for intuition: suppose you have a measure that is easily described and easy to reason about. If you cannot come up with any way to extract some information from a $\mu$-typical sample binary sequence in a shift-invariant way, then the measure is going to be ergodic.
Examples
A simple (maybe too simple) example is the invariant measure you would obtain by concatenating 00 or 11 infinitely many times randomly and independently (each with probability 1/2), and then applying the shift map one or zero times, each with probability 1/2. This example is not very interesting and may feel like an easy way out. The reason this is not Markov (of any step) is that the support of this measure is not an SFT (subshift of finite type).
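A minimal Python sketch of sampling a finite prefix from this measure, in case it helps to experiment. The function names are hypothetical. The check illustrates why the support is not an SFT: a run of zeros enclosed by ones (or of ones enclosed by zeros) always has even length, and this constraint involves arbitrarily long words.

```python
import random
import re


def sample_block_measure(n_blocks, rng):
    """Concatenate n_blocks i.i.d. blocks '00'/'11', then shift by 0 or 1 symbols."""
    word = "".join(rng.choice(("00", "11")) for _ in range(n_blocks))
    offset = rng.randrange(2)  # apply the shift map zero or one times
    return word[offset:]


def has_odd_interior_run(word):
    """True if some run enclosed by the opposite symbol on both sides has odd length."""
    return re.search(r"10(00)*1|01(11)*0", word) is not None
```

For instance, `has_odd_interior_run(sample_block_measure(1000, random.Random(0)))` should be `False`, whereas a fair i.i.d. coin sequence of that length would almost surely contain `101`.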
A more interesting example would be a measure in which distances between occurrences of 1 are independent. It's like a renewal process, except discrete and invariant under the shift map. Let $(p_i)_{i = 1}^{\infty}$ be a sequence of nonnegative numbers whose sum is 1 and such that $\sum_i i p_i < \infty$. Then you can build such an invariant measure $\mu$ where $\mu([10^{i-1}1] \mid [1]) = p_i$ for all $i$ and $\mu([1]) = \left(\sum_i i p_i\right)^{-1} > 0$ (by Kac's lemma, the expected return time to $[1]$ is $1/\mu([1])$). To construct this measure, you first build the conditional one $\mu_{[1]}$ (a measure defined on the cylinder $[1] \subset \{0, 1\}^{\mathbb N}$), which corresponds to the discrete-time renewal process that has $(p_i)_{i = 1}^{\infty}$ as its distribution of holding times. This $\mu_{[1]}$ is not invariant under the shift map, but it is invariant under its first return map, so you can use the Kakutani skyscraper construction to construct $\mu$ defined on $\{0,1\}^{\mathbb N}$. This measure is ergodic because its restriction to $[1]$ is ergodic with respect to the first return map. You can then choose $(p_i)_{i = 1}^{\infty}$ in a way that guarantees that $\mu$ is not Markov. For example, you can make its support be the even shift (i.e. $p_i > 0$ if and only if the number $i-1$ of zeros between consecutive ones is even).
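Here is a minimal Python sketch of the renewal picture, assuming a finitely supported $(p_i)$. It samples from the process started at a 1 (that is, from $\mu_{[1]}$, not from the shift-invariant $\mu$ itself), and the function names are hypothetical. By the law of large numbers, the long-run frequency of 1s should approach $1/\sum_i i p_i$, which is the value Kac's lemma gives for $\mu([1])$.

```python
import random


def sample_renewal_word(gaps, probs, n_gaps, rng):
    """Start at a 1 and append n_gaps blocks '1 0^(i-1)', with i drawn i.i.d. from probs."""
    draws = rng.choices(gaps, weights=probs, k=n_gaps)
    return "".join("1" + "0" * (i - 1) for i in draws)


def mean_gap(gaps, probs):
    """Expected distance between consecutive ones, i.e. sum_i i * p_i."""
    return sum(i * p for i, p in zip(gaps, probs))
```

With the illustrative choice `gaps = [1, 3]`, `probs = [0.5, 0.5]`, the mean gap is 2, so the frequency of 1s in a long sample should be close to 0.5, and the word `101` (a gap of 2) never occurs.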
Another interesting class of measures: measures corresponding to hidden Markov chains. These are measures that are factors of Markov measures.
More general class: measures from thermodynamic formalism, but things get very technical there.
Best Answer
Take a biased coin with probabilities $\{p_i, 1-p_i\}$ in the $i$-th flip, where $0 < \delta < p_i < 1 - \delta < 1$, and represent heads by $0$ and tails by $1$.
You can identify the space given by flipping that coin infinitely many times with $([0,1], \mu)$ just as you do in the case of an unbiased coin, i.e. associate $\{0\}\times \{0,1\}^\mathbb{N}$ with $[0,0.5)$ and $\{1\}\times \{0,1\}^\mathbb{N}$ with $[0.5,1)$, and continue that process, further subdividing $[0,1]$ into dyadic intervals. Both $\mu$ and the Lebesgue measure $m$ have the same null sets. The action on $[0,1]$ induced by "shifting" the sequence of flips is ergodic, but not measure preserving.
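The identification with $[0,1]$ can be sketched in a few lines of Python on finite prefixes. It assumes the usual binary-expansion convention $x=\sum_k b_k 2^{-k}$, under which shifting the sequence corresponds to the doubling map $x\mapsto 2x \bmod 1$. The function names are hypothetical and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
import random


def sample_flips(ps, rng):
    """Flip i gives 0 (heads) with probability ps[i], and 1 (tails) otherwise."""
    return [0 if rng.random() < p else 1 for p in ps]


def flips_to_point(bits):
    """Left endpoint of the dyadic interval coded by the finite prefix `bits`."""
    return sum(Fraction(b, 2 ** (k + 1)) for k, b in enumerate(bits))


def doubling_map(x):
    """Map induced on [0, 1) by shifting the binary expansion."""
    return (2 * x) % 1
```

For example, the prefix `[1, 0, 1, 1]` codes the point $11/16$, and dropping the first flip gives $3/8 = 2\cdot\tfrac{11}{16} \bmod 1$.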
These transformations are called "Bernoulli shifts". It is still reasonable to ask whether there are ergodic transformations that do not admit any invariant measure.