Solved – A dynamical systems view of the Central Limit Theorem

Tags: central limit theorem, convergence, convolution, mathematical-statistics, probability

(Originally posted on MSE.)

I have seen many heuristic discussions of the classical central limit theorem speak of the normal distribution (or any of the stable distributions) as an "attractor" in the space of probability densities. For example, consider these sentences at the top of Wikipedia's treatment:

In more general usage, a central limit theorem is any of a set of weak-convergence theorems in probability theory. They all express the fact that a sum of many independent and identically distributed (i.i.d.) random variables, or alternatively, random variables with specific types of dependence, will tend to be distributed according to one of a small set of attractor distributions. When the variance of the i.i.d. variables is finite, the attractor distribution is the normal distribution.

This dynamical systems language is very suggestive. Feller also speaks of "attraction" in his treatment of the CLT in his second volume (I wonder if that is the source of the language), and Yuval Filmus in this note even speaks of the "basin of attraction." (I don't think he really means "the exact form of the basin of attraction is deducible beforehand" but rather "the exact form of the attractor is deducible beforehand"; still, the language is there.) My question is: can these dynamical analogies be made precise? I don't know of a book in which they are, though many books do emphasize that the normal distribution is special for its stability under convolution (as well as its stability under the Fourier transform). This is basically telling us that the normal is important because it is a fixed point. The CLT goes further, telling us that it is not just a fixed point but an attractor.
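For concreteness, the "fixed point" remark is just the usual characteristic-function computation: if $X_1, X_2$ are i.i.d. standard normal with characteristic function $\varphi(t) = e^{-t^2/2}$, then the rescaled sum $(X_1 + X_2)/\sqrt{2}$ has characteristic function

$$
\varphi_{(X_1+X_2)/\sqrt{2}}(t) \;=\; \varphi\!\left(\frac{t}{\sqrt{2}}\right)^{2} \;=\; \left(e^{-t^{2}/4}\right)^{2} \;=\; e^{-t^{2}/2} \;=\; \varphi(t),
$$

so the standard normal is left unchanged by the "convolve and rescale" map.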

To make this geometric picture precise, I imagine taking the phase space to be a suitable infinite-dimensional function space (the space of probability densities) and the evolution operator to be repeated convolution applied to an initial condition, together with the rescaling that keeps the variance fixed. But I have no sense of the technicalities involved in making this picture work or whether it is worth pursuing.
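As a rough numerical sketch of this picture (just an illustration on a grid; the grid parameters and the uniform initial condition are arbitrary choices of mine, not part of any rigorous construction), one can iterate the "convolve and rescale" map on a discretized density and watch it approach the standard normal:

```python
import numpy as np

# Grid on which densities are represented.
L, n = 12.0, 2401
x = np.linspace(-L, L, n)
dx = x[1] - x[0]

def convolve_and_rescale(f):
    """One step of the map T: density of X -> density of (X1 + X2)/sqrt(2),
    where X1, X2 are i.i.d. with density f (assumed mean 0, variance 1)."""
    # Density of the sum X1 + X2, supported on [2*x[0], 2*x[-1]].
    h = np.convolve(f, f) * dx
    s = np.linspace(2 * x[0], 2 * x[-1], 2 * n - 1)
    # Density of (X1 + X2)/sqrt(2): change of variables y = s / sqrt(2).
    g = np.sqrt(2) * np.interp(np.sqrt(2) * x, s, h)
    return g / (g.sum() * dx)          # renormalize to tame discretization error

# Initial condition: uniform on [-sqrt(3), sqrt(3)], which has mean 0 and variance 1.
f = np.where(np.abs(x) <= np.sqrt(3), 1 / (2 * np.sqrt(3)), 0.0)
gauss = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

for k in range(1, 7):
    f = convolve_and_rescale(f)
    print(f"iteration {k}: sup-norm distance to N(0,1) = {np.abs(f - gauss).max():.2e}")
```

After $k$ iterations the density shown is that of the standardized sum of $2^k$ copies of the initial variable, so the shrinking printed distances are just the (local) CLT along the subsequence $n = 2^k$.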

I would guess that since I can't find a treatment that does pursue this approach explicitly, there must be something wrong with my sense that it can be done or that it would be interesting. If that is the case, I would like to hear why.

EDIT: There are three similar questions on Math Stack Exchange and MathOverflow that readers may be interested in.

Best Answer

After doing some digging in the literature, encouraged by Kjetil's answer, I've found a few references that do take the geometric/dynamical systems approach to the CLT seriously, besides the book by Y. Sinai. I'm posting what I've found for others who may be interested, but I still hope to hear from an expert about the value of this point of view.

The most significant influence seems to have come from the work of Charles Stein. But the most direct answer to my question seems to be from Hamedani and Walter, who put a metric on the space of distribution functions and show that convolution generates a contraction, which yields the normal distribution as the unique fixed point.
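I won't reproduce Hamedani and Walter's metric here, but to give a flavor of how such a statement can be made precise, here is the standard contraction argument with respect to an ideal (Zolotarev-type) metric $d_s$ of order $s$, i.e. one satisfying $d_s(\mathcal{L}(X+Z), \mathcal{L}(Y+Z)) \le d_s(\mathcal{L}(X), \mathcal{L}(Y))$ for independent $Z$ and $d_s(\mathcal{L}(cX), \mathcal{L}(cY)) = |c|^{s}\, d_s(\mathcal{L}(X), \mathcal{L}(Y))$. (This is a well-known argument; I am not claiming it is their exact construction.) For the map

$$
T(\mu) \;=\; \mathcal{L}\!\left(\frac{X_1 + X_2}{\sqrt{2}}\right), \qquad X_1, X_2 \ \text{i.i.d.} \sim \mu,
$$

the two properties give

$$
d_s\big(T(\mu), T(\nu)\big) \;\le\; 2 \cdot \left(\tfrac{1}{\sqrt{2}}\right)^{s} d_s(\mu, \nu) \;=\; 2^{\,1 - s/2}\, d_s(\mu, \nu),
$$

a strict contraction whenever $s > 2$. Since the standard normal $\gamma$ satisfies $T(\gamma) = \gamma$, it is the unique fixed point on the set of mean-zero, variance-one laws where $d_s$ is finite, and iterating $T$ (doubling the number of summands each time) drives any such initial law toward it.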


ADDED October 19, 2018.

Another source for this point of view is Oliver Knill's Probability and Stochastic Processes with Applications, p. 11 (emphasis added):

Markov processes often are attracted by fixed points of the Markov operator. Such fixed points are called stationary states. They describe equilibria and often they are measures with maximal entropy. An example is the Markov operator $P$, which assigns to a probability density $f_Y$ the probability density of $f_{\overline{Y+X}}$ where $\overline{Y+X}$ is the random variable $Y + X$ normalized so that it has mean $0$ and variance $1$. For the initial function $f = 1$, the function $P^n(f_X)$ is the distribution of $S^{*}_n$, the normalized sum of $n$ IID random variables $X_i$. This Markov operator has a unique equilibrium point, the standard normal distribution. It has maximal entropy among all distributions on the real line with variance $1$ and mean $0$. The central limit theorem tells that the Markov operator $P$ has the normal distribution as a unique attracting fixed point if one takes the weaker topology of convergence in distribution on $\mathcal{L}^1$. This works in other situations too. For circle-valued random variables for example, the uniform distribution maximizes entropy. It is not surprising therefore, that there is a central limit theorem for circle-valued random variables with the uniform distribution as the limiting distribution.
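As a small illustration of the last point about circle-valued variables (a toy example of my own, not from Knill's book; the variable $X = U^3 \bmod 1$ is an arbitrary non-uniform choice): for independent circle-valued variables the Fourier coefficients of the law of the sum multiply, so the first coefficient of $S_n$ is $\varphi_1^{\,n} \to 0$, and vanishing of all nonzero Fourier coefficients characterizes the uniform distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# A deliberately non-uniform circle-valued variable: X = U**3 (mod 1), U uniform on [0, 1).
def sample_x(size):
    return rng.random(size) ** 3

# First Fourier coefficient of the law of X, estimated once; it is 0 for the uniform law.
phi1 = np.mean(np.exp(2j * np.pi * sample_x(1_000_000)))

n_samples = 200_000
for n in (1, 2, 4, 8):
    # Circle-valued sum of n i.i.d. copies: add the angles and reduce mod 1.
    s = np.sum(sample_x((n, n_samples)), axis=0) % 1.0
    c1 = np.abs(np.mean(np.exp(2j * np.pi * s)))   # empirical first Fourier coefficient of S_n
    print(f"n = {n}: empirical |E exp(2 pi i S_n)| = {c1:.4f},  predicted |phi_1|^n = {abs(phi1)**n:.4f}")
```

The empirical coefficient tracks the predicted geometric decay $|\varphi_1|^{n}$ (down to Monte Carlo noise of order $10^{-3}$), which is the circle-valued analogue of the attracting fixed point in the quote, with the uniform distribution in place of the normal.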