Entropy of a Measure Preserving Transformation

entropy, ergodic-theory, information-theory, intuition, measure-theory

I am reading about the concept of entropy in Peter Walters' *An Introduction to Ergodic Theory* and I am having trouble understanding the notion of the entropy of a measure preserving transformation.

Definitions:

Let $(X, \mathcal{F}, \mu)$ be a probability space.
For a partition $\xi=\{A_1 , \ldots, A_m\}$ of $X$ (where each $A_i$ is measurable) the entropy of $\xi$ is defined as:

$$
H(\xi) = -\sum_{i=1}^m \mu(A_i)\log(\mu(A_i))
$$
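For example, if $\xi = \{A, X\setminus A\}$ with $\mu(A) = p$, this is just the entropy of a biased coin:

$$
H(\xi) = -p\log p - (1-p)\log(1-p),
$$

which is maximized at $p = \tfrac{1}{2}$, where $H(\xi) = \log 2$.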

If $T:X\to X$ is a measure preserving transformation, we write $T^{-1}\xi$ to denote the partition $\{T^{-1}(A_i):\ 1\leq i\leq m\}$.
Since $T$ is measure preserving, $\mu(T^{-1}A_i)=\mu(A_i)$ for each $i$, so $H(T^{-1}\xi)=H(\xi)$.

Now the entropy of a measure preserving transformation $T:X\to X$ with respect to $\xi$ is defined as (see Def. 4.9 in the aforementioned text)

$$
h(T, \xi) = \lim_{n\to \infty} \frac{1}{n} H\left(\bigvee_{i=0}^{n-1} T^{-i}\xi\right),
$$

where $\bigvee_{i=0}^{n-1} T^{-i}\xi$ is the coarsest common refinement of the partitions $\xi, T^{-1}\xi, \ldots, T^{-(n-1)}\xi$.
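As a sanity check on the definition: if $T$ is the doubling map $Tx = 2x \pmod 1$ on $[0,1)$ with Lebesgue measure and $\xi = \{[0,\tfrac12), [\tfrac12,1)\}$, then (if I am computing correctly) $\bigvee_{i=0}^{n-1}T^{-i}\xi$ is the partition of $[0,1)$ into the $2^n$ dyadic intervals of length $2^{-n}$, so

$$
H\left(\bigvee_{i=0}^{n-1}T^{-i}\xi\right) = -\sum_{j=1}^{2^n} 2^{-n}\log\left(2^{-n}\right) = n\log 2
\quad\text{and}\quad
h(T,\xi) = \log 2.
$$

I can carry out this computation, but it does not tell me what the number $\log 2$ is supposed to mean.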

The Problem:

Just after giving the definition, the author writes

This means that if we think of an application of $T$ as a passage of one day of time, then $\bigvee_{i=0}^{n-1}T^{-i}\xi$ represents the combined experiment of performing the original experiment, represented by $\xi$, on $n$ consecutive days.
Then $h(T, \xi)$ is the average information per day that one gets from performing the original experiment.

I do not entirely follow this. If an application of $T$ is the passage of one day, that is, it takes us one day into the future, why is the expression $\bigvee_{i=0}^{n-1} T^{-i}\xi$ the combined experiment (and what is the intuitive meaning of 'combined experiment'?) for the next $n$ days? We are taking preimages under $T$ in this expression, not forward images.

At any rate, I do not have any intuition for the last definition presented above.
Can someone please give some insight?

Thanks.

Best Answer

Ah, the use of backward images in ergodic theory, an unending source of confusion for learners...

By definition, the set $T^{-n} A$ is $\{x \in X: \ T^n (x) \in A\}$, so really, it is about the forward orbit of the system!

Now, fix a partition $\xi$. An element $[a]_n \in \bigvee_{k=0}^{n-1} T^{-k} \xi$ is a subset of the form $a_0 \cap T^{-1} a_1 \cap \ldots \cap T^{-(n-1)} a_{n-1}$, where each $a_i$ belongs to $\xi$. In other words, knowing that a point $x$ belongs to $[a]_n$ means that you know that $x \in a_0$, $T(x) \in a_1$, $\ldots$, $T^{n-1}(x) \in a_{n-1}$.
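To make this concrete with a standard example (the same one as in the question): for the doubling map $Tx = 2x \pmod 1$ on $[0,1)$ and $\xi = \{[0,\tfrac12), [\tfrac12,1)\}$, identify $[0,\tfrac12)$ with the digit $0$ and $[\tfrac12,1)$ with the digit $1$. Then $a_k$ tells you whether $T^k(x)$ lies in the left or right half of the interval, so the cell $[a]_n$ containing $x$ records exactly the first $n$ digits of the binary expansion of $x$:

$$
[a]_n = \left\{x \in [0,1):\ x = 0.a_0 a_1 \ldots a_{n-1}\ldots \text{ in base } 2\right\}.
$$

Each day of observation reveals one more binary digit of $x$.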

If the result of your experiment has finitely many possible values, let the partition $\xi$ be generated by these values; then knowing which element of $\bigvee_{k=0}^{n-1} T^{-k} \xi$ a point falls in means knowing the results of the experiment up to day $n-1$.

The entropy $h(T, \xi)$ is then the exponential growth rate of the amount of information contained in the results observed up to time $n$. As for the mention of "average information", you could look up Shannon entropy in a separate reference -- it explains this formulation, and I don't think I have seen this subject dealt with satisfactorily in an ergodic theory book.
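If a numerical illustration helps, here is a quick sketch (my own, not from Walters) for a Bernoulli$(p, 1-p)$ shift, where the daily experiment is "observe today's symbol": the empirical entropy of the length-$n$ words, divided by $n$, should approach $-p\log p - (1-p)\log(1-p)$, the average information per day.

```python
import math
import random
from collections import Counter

# Rough numerical sketch (my own, not from Walters): for the Bernoulli(p, 1-p)
# shift with the partition xi = {"today's symbol is 0", "today's symbol is 1"},
# the join over n days is the partition into length-n words, and H(join)/n
# should approach -p*log(p) - (1-p)*log(1-p), i.e. h(T, xi).

def empirical_block_entropy(p, n, samples):
    """Estimate H of the join of T^{-k} xi for k = 0..n-1, divided by n,
    from the empirical frequencies of length-n words drawn from Bernoulli(p)."""
    counts = Counter(
        tuple(1 if random.random() < p else 0 for _ in range(n))
        for _ in range(samples)
    )
    total = sum(counts.values())
    block_entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return block_entropy / n

if __name__ == "__main__":
    p = 0.3
    exact = -p * math.log(p) - (1 - p) * math.log(1 - p)
    for n in (1, 4, 8):
        estimate = empirical_block_entropy(p, n, samples=200_000)
        print(f"n = {n}: H_n/n estimate = {estimate:.4f}, exact h(T, xi) = {exact:.4f}")
```

(The empirical estimate is biased slightly downward for larger $n$ unless you increase the number of samples, but the convergence toward the exact value should be visible.)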
