My feeling is that there exists an ergodic measure $\mu$ for which $G_\mu \setminus Z_\mu$ is nonempty. It is sufficient to find a uniquely ergodic subsystem which admits exceptional points for the Shannon-McMillan-Breiman theorem. I think that one can be constructed symbolically without too much difficulty by the following method.
Pick a real number $h$ lying strictly between 0 and $\log 2$, and consider a sequence $x$ in the 2-shift with the following properties:
1) For every $n \geq 1$, the sequence contains $e^{nh + o(n)}$ distinct words of length $n$. (By subadditivity, the $o(n)$ term is necessarily nonnegative.)
2) Every word which occurs in $x$ occurs with a well-defined frequency which is not equal to 0 or 1.
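Property 1) is a statement about the word-counting function of $x$, which can be explored numerically on a finite prefix. The following is only a sketch: the helper names `thue_morse` and `word_count` are made up for illustration, and the Thue-Morse sequence is used purely as a convenient test input. It is not Grillenberger's construction, and its word count is known to grow only linearly, so it corresponds to $h = 0$ rather than to $0 < h < \log 2$.

```python
import math

def thue_morse(length):
    """First `length` symbols of the Thue-Morse sequence (parity of the binary digit sum)."""
    return [bin(i).count("1") % 2 for i in range(length)]

def word_count(seq, n):
    """Number of distinct words of length n occurring in the finite sequence seq."""
    return len({tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)})

x = thue_morse(4096)
for n in range(1, 9):
    p = word_count(x, n)
    # log(p)/n is the finite-n entropy estimate; it tends to h as n grows
    print(n, p, math.log(p) / n)
```

A finite prefix can only undercount, so `word_count` gives a lower bound for the true number of words of length $n$ in the infinite sequence.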
The orbit closure $X$ of such a sequence is then a uniquely ergodic subsystem of the shift with topological entropy equal to $h$. An explicit procedure for constructing such a sequence was given by Grillenberger in the 1970s (in my opinion it's not particularly hard). In particular, $X$ supports a unique invariant measure $\mu$ and $G_\mu$ includes the whole of $X$. Now, suppose that the word $x$ also satisfies the property:
3) There exist $\varepsilon>0$ and a nested sequence of subwords of $x$, of lengths $n \to \infty$, such that the word of length $n$ in this sequence has frequency less than $e^{-n(h+\varepsilon)}$.
This implies that there is a nested sequence of cylinder sets in $X$, containing some point, such that the measures of these cylinder sets decrease at a rate faster than the "standard" local entropy $h$, and hence the point in the intersection of the cylinders belongs to $G_\mu$ but not to $Z_\mu$.
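To spell out the last step, write $w_n$ for the word of length $n$ in the nested sequence and $[w_n]$ for the corresponding cylinder (this notation is introduced here, not in the question). By unique ergodicity the frequency of $w_n$ equals $\mu([w_n])$, and since $\mu$ is the only invariant measure, the variational principle gives $h_\mu = h$. Assuming, as the argument above does, that membership in $Z_\mu$ requires the local entropy along cylinders to equal $h_\mu$, property 3) rules this out for the point $y$ in the intersection:

$$\mu([w_n]) < e^{-n(h+\varepsilon)} \quad\Longrightarrow\quad -\frac{1}{n}\log \mu([w_n]) > h + \varepsilon \quad\Longrightarrow\quad \limsup_{n\to\infty} -\frac{1}{n}\log\mu([w_n]) \geq h+\varepsilon > h_\mu.$$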
I think that there shouldn't be any problem in reconciling all three of these criteria with one another, but I will admit that I haven't attempted to write a proof of that. I think it sounds reasonable that for a larger class of measures than Gibbs measures we should have $G_\mu \subseteq Z_\mu$, but I don't have much to contribute to that end of the question...
The answer to both questions is 'no', both for maps and for flows.
For concreteness let $M=\{0,1\}^\mathbb{Z}$ be the set of bi-infinite sequences of $0$s and $1$s, and let $\Phi\colon M\to M$ be the shift map given by $\Phi(x)_j = x_{j+1}$ for $x=(x_j)_{j\in\mathbb{Z}}$.
Q1. Topological transitivity of $\Phi$ depends only on $\Phi$ and $M$, not on the measure $\mu$. In particular the system $(M,\Phi)$ defined above is topologically transitive, but there are many (many!) regular Borel probability measures that are preserved by $\Phi$, and not all of them are ergodic. See this question for some discussion of how intricate this space is. For example, let $p$ and $q$ be the two fixed points of $\Phi$ (the constant sequences of $0$s and of $1$s), and let $\mu$ be the atomic measure that gives weight $\frac 12$ to each of $p$ and $q$. Then $\mu$ is $\Phi$-invariant but not ergodic.
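To see the failure of ergodicity concretely, take distinct fixed points $p \neq q$ and look at the set $\{p\}$. Since $\Phi$ is a bijection fixing $p$, this set is invariant, yet its measure is neither $0$ nor $1$:

$$\mu = \tfrac12\,\delta_p + \tfrac12\,\delta_q, \qquad \Phi^{-1}(\{p\}) = \{p\}, \qquad \mu(\{p\}) = \tfrac12 \notin \{0,1\}.$$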
Q2. The pointwise time averages do not need to exist for every $x$. In fact it is quite typical that they do not exist. Let me make this last statement a little more precise, again using the example of $(M,\Phi)$ from above.
Consider the continuous real-valued function $f\colon M\to \mathbb{R}$ defined by $f(x) = x_0$. That is, $f$ is simply the value of the symbol in the $0$ position in the sequence $x$. Then $a_N(x) := \frac 1N \sum_{j=1}^N f(\Phi^j(x))$ is the frequency of the symbol $1$ in the string $x_1 x_2 \cdots x_N$.
The pointwise time averages of $f$ along the orbit of $x$ exist if and only if $a_N(x)$ converges as $N\to \infty$ -- in other words, if and only if the lower and upper asymptotic frequencies of the symbol $1$ are equal. It is straightforward to construct examples of sequences $x\in M$ such that the lower and upper asymptotic frequencies disagree and the limit does not exist.
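Here is a minimal sketch of one such construction, with choices that are mine rather than canonical: the positive coordinates $x_1 x_2 \cdots$ are built from alternating blocks of $0$s and $1$s whose lengths grow geometrically (block $k$ has length $3^k$), and the negative coordinates can be filled in arbitrarily since $a_N$ does not see them. The function names are hypothetical.

```python
from itertools import accumulate

def block_sequence(num_blocks, ratio=3):
    """Symbols x_1, x_2, ...: block k (k = 0, 1, ...) is ratio**k copies of the symbol k % 2."""
    seq = []
    for k in range(num_blocks):
        seq.extend([k % 2] * ratio**k)
    return seq

def running_frequencies(seq):
    """a_N = (x_1 + ... + x_N) / N for N = 1, ..., len(seq)."""
    return [s / n for n, s in enumerate(accumulate(seq), start=1)]

x = block_sequence(12)
a = running_frequencies(x)
ends = list(accumulate(3**k for k in range(12)))  # values of N at which each block ends
for k, N in enumerate(ends):
    # a_N at the end of block k: small after a block of 0s, large after a block of 1s
    print(k, N, round(a[N - 1], 4))
```

For this particular choice a short computation with geometric series shows that $a_N$ equals $\frac 34$ at the end of every block of $1$s and tends to $\frac 14$ along the ends of the blocks of $0$s, so the lower and upper frequencies are $\frac 14$ and $\frac 34$ and the limit does not exist. Faster-growing block lengths push the two values towards $0$ and $1$.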
In fact, one can say some more about how large the set of such points is. Given $x\in M$, let $\lambda(x) = \liminf a_N(x)$ and $\Lambda(x) = \limsup a_N(x)$. Note that $0\leq \lambda(x)\leq \Lambda(x)\leq 1$ for all $x\in M$. Given $0\leq r\leq s\leq 1$, let $K_{r,s}$ be the set of $x\in M$ such that $\lambda(x) = r$ and $\Lambda(x) = s$. The study of the various sets $K_{r,s}$ is called multifractal analysis, and quite a lot is known. I'll state just a few results addressing your question.
Let $K^\neq = \bigcup_{r<s} K_{r,s}$ be the set of points for which $\lambda(x) \neq \Lambda(x)$, so that the limit doesn't exist. Then the following are true (at least for the system I described above -- determining for which general classes of systems these statements hold is a more subtle question):
- $K^\neq$ has zero measure for every $\Phi$-invariant measure.
- $K^\neq$ has Hausdorff dimension equal to the Hausdorff dimension of $M$. (The more honest way of saying this is that they have equal topological entropies, but Hausdorff dimension is a more familiar concept and the statement with dimension is true if you use an appropriate metric.)
- $K^\neq$ is residual -- that is, it contains a countable intersection of open and dense subsets of $M$.
- In fact, one can show that $K_{0,1}$ is residual, but that it has Hausdorff dimension $0$. (The fact that $K^\neq$ has full Hausdorff dimension is due to the fact that the Hausdorff dimension of $K_{r,s}$ approaches the Hausdorff dimension of $M$ as $r,s\to \frac 12$.)
So there's an assortment of facts for you illustrating how large the set of points is where convergence fails. In particular the last fact can be interpreted as saying that from a topological point of view, for a generic point $x$ the limit fails to exist as strongly as it can possibly fail. This highlights the fact that ergodic theory is really about measures, not topology. (I will note that for continuous $f$ the limit exists everywhere if your map is uniquely ergodic, that is, if there is only one invariant probability measure. Such systems are quite different from the systems I was describing, which should be thought of as hyperbolic, or informally, chaotic.)
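For contrast, here is a small numerical sketch of the uniquely ergodic situation. The example is my choice and does not come from the shift above: the rotation $T(x) = x + \alpha \bmod 1$ of the circle with $\alpha$ irrational, and the observable $f(x) = \cos(2\pi x)$, whose integral against Lebesgue measure is $0$. In floating point `ALPHA` is only a stand-in for an irrational number, and the supremum over starting points is approximated on a finite grid.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2  # rotation number; floating-point stand-in for an irrational

def birkhoff_average(x, n, alpha=ALPHA):
    """(1/n) * sum_{j=0}^{n-1} f(T^j x) for T(x) = x + alpha mod 1 and f(x) = cos(2*pi*x).

    No explicit reduction mod 1 is needed because f is 1-periodic.
    """
    return sum(math.cos(2 * math.pi * (x + j * alpha)) for j in range(n)) / n

def worst_case_error(n, grid=200):
    """Largest |average - integral| over a grid of starting points; the integral of f is 0."""
    return max(abs(birkhoff_average(i / grid, n)) for i in range(grid))

for n in (10, 100, 1000):
    print(n, worst_case_error(n))
```

The worst-case error over starting points shrinks as $n$ grows, in line with the geometric-sum bound $\left|\frac 1n \sum_{j=0}^{n-1} e^{2\pi i (x + j\alpha)}\right| \leq \frac{1}{n\,|\sin \pi\alpha|}$, whereas for the full shift no such uniform control is possible.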
Best Answer
This can be found in Walters' book An introduction to ergodic theory where it occurs as Theorem 6.19.
The proof is not terribly difficult and is in my opinion quite instructive. I will deal with (3). Firstly let us assume (c): if $\mu$ and $\nu$ are distinct invariant measures then we can choose a continuous function $f$ such that $\int f\,d\mu \neq \int f\,d\nu$. The integral of $\frac{1}{n}\sum_{i=0}^{n-1}f \circ T^i$ with respect to $\mu$ is $\int f\,d\mu$ for every $n \geq 1$, but this sequence of functions converges uniformly to $c(f)$ so the associated sequence of integrals with respect to $\mu$ converges to $c(f)$ also. The same holds for integrals with respect to $\nu$, so $\int f\,d\mu = c(f) = \int f\,d\nu$, a contradiction. We conclude that if (c) holds then there can be only one invariant measure, so (c) implies (a).
Suppose that (c) does not hold. Then there exist a continuous function $f$, two sequences $(x_k)$, $(y_k)$ of points in $X$, and sequences $(n_k)$, $(m_k)$ of natural numbers tending to infinity such that the averages $\frac{1}{n_k}\sum_{i=0}^{n_k-1}f(T^ix_k)$ and $\frac{1}{m_k}\sum_{i=0}^{m_k-1}f(T^iy_k)$ do not converge to the same value. By passing to subsequences if necessary we can assume that they converge to distinct values (since both sequences are bounded by $|f|_\infty$). Let $\mu_k:=\frac{1}{n_k}\sum_{i=0}^{n_k-1} \delta_{T^ix_k}$ and $\nu_k:=\frac{1}{m_k}\sum_{i=0}^{m_k-1} \delta_{T^iy_k}$. By taking further subsequences if required, suppose that these sequences of probability measures converge in the weak-* topology to limit probability measures $\mu$ and $\nu$ respectively. These limit measures are invariant (by a similar calculation to that in the most common proof of the Krylov-Bogolyubov theorem) and they are different because they assign different integrals to $f$, so (a) does not hold. Thus not-(c) implies not-(a) and therefore (a) implies (c). Clearly (c) implies (b), and the proof that (b) implies (a) is very similar to the proof that (c) implies (a), with the dominated convergence theorem taking the place of uniform convergence.
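For completeness, the invariance calculation referred to above is a telescoping sum. For any continuous $g$ (a test function introduced here for the computation), and assuming $n_k \to \infty$ as the argument implicitly requires,

$$\left|\int g\circ T\,d\mu_k - \int g\,d\mu_k\right| = \frac{1}{n_k}\left|\sum_{i=0}^{n_k-1}\bigl(g(T^{i+1}x_k) - g(T^{i}x_k)\bigr)\right| = \frac{1}{n_k}\left|g(T^{n_k}x_k) - g(x_k)\right| \leq \frac{2|g|_\infty}{n_k} \to 0.$$

Letting $k\to\infty$ along the convergent subsequence gives $\int g\circ T\,d\mu = \int g\,d\mu$ for every continuous $g$, which is the invariance of $\mu$. The same computation with $m_k$ and $y_k$ handles $\nu$.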
I believe that the above result originates with John Oxtoby in the 1950s. There are some useful variations on this result which are considerably sharper and have essentially the same proof as the above. For example, the above argument can be easily adapted to show that if $f$ is an upper semi-continuous function such that $\int f\,d\mu \leq \lambda$ for every $T$-invariant measure $\mu$, then $\limsup_{n \to \infty} \sup_{x \in X}\frac{1}{n}\sum_{k=0}^{n-1}f(T^kx) \leq \lambda$. This result is usually attributed to Michael Herman in the late seventies. More powerful versions of this argument treat the case of a subadditive family of functions $f_n$ rather than a sequence of Birkhoff sums: see the papers "On growth rates of subadditive functions for semiflows" by Sebastian Schreiber and "Semi-uniform ergodic theorems and applications to forced systems" by Sturman and Stark.