(Originally posted on MSE.)
I have seen many heuristic discussions of the classical central limit theorem speak of the normal distribution (or any of the stable distributions) as an "attractor" in the space of probability densities. For example, consider these sentences at the top of Wikipedia's treatment:
In more general usage, a central limit theorem is any of a set of weak-convergence theorems in probability theory. They all express the fact that a sum of many independent and identically distributed (i.i.d.) random variables, or alternatively, random variables with specific types of dependence, will tend to be distributed according to one of a small set of attractor distributions. When the variance of the i.i.d. variables is finite, the attractor distribution is the normal distribution.
This dynamical systems language is very suggestive. Feller also speaks of "attraction" in his treatment of the CLT in his second volume (I wonder if that is the source of the language), and Yuval Filmus in this note even speaks of the "basin of attraction." (I don't think he really means "the exact form of the basin of attraction is deducible beforehand" but rather "the exact form of the attractor is deducible beforehand"; still, the language is there.)
My question is: can these dynamical analogies be made precise? I don't know of a book in which they are, though many books do make a point of emphasizing that the normal distribution is special for its stability under convolution (as well as its stability under the Fourier transform). This is basically telling us that the normal is important because it is a fixed point. The CLT goes further, telling us that it is not just a fixed point but an attractor.
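To spell out the fixed-point half of that statement in the simplest setting (mean zero, unit variance), here is the standard characteristic-function computation. The renormalized map T and the notation \varphi_X are my own shorthand for this sketch, not taken from any of the sources mentioned here.

```latex
% T sends the law of X to the law of (X + X')/\sqrt{2},
% where X' is an independent copy of X.
\[
  \varphi_{T(X)}(t)
  = \mathbb{E}\, e^{\, i t (X + X')/\sqrt{2}}
  = \varphi_X\!\left(\tfrac{t}{\sqrt{2}}\right)^{2}.
\]
% For the standard normal, \varphi(t) = e^{-t^{2}/2}, so T leaves it unchanged:
\[
  \varphi\!\left(\tfrac{t}{\sqrt{2}}\right)^{2}
  = \left(e^{-t^{2}/4}\right)^{2}
  = e^{-t^{2}/2}
  = \varphi(t).
\]
```

Iterating T a total of k times gives the law of the normalized sum of 2^k i.i.d. copies, so the classical CLT says these iterates converge weakly to the fixed point whenever the starting law has mean zero and unit variance.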
To make this geometric picture precise, I imagine taking the phase space to be a suitable infinite-dimensional function space (the space of probability densities) and the evolution operator to be repeated convolution with an initial condition. But I have no sense of the technicalities involved in making this picture work or whether it is worth pursuing.
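As a rough numerical illustration of this picture, here is a minimal sketch under my own assumptions: densities are discretized on a finite grid, the evolution operator is taken to be the renormalized self-convolution (the density of (X + X')/sqrt(2)), and distance to the normal is measured in L1. The function names `renormalized_convolution`, `normalize` and `l1_distance`, the grid, and the choice of a uniform initial condition are all hypothetical choices for the example, not a construction from any reference.

```python
import numpy as np

# Hypothetical discretization: densities are sampled on a symmetric grid.
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

def normalize(f):
    """Rescale grid values so the Riemann sum of the density equals 1."""
    return f / (f.sum() * dx)

def renormalized_convolution(f):
    """One step of T: density of (X + X') / sqrt(2), X and X' i.i.d. with density f."""
    g = np.convolve(f, f) * dx                     # density of S = X + X'
    s = np.linspace(2 * x[0], 2 * x[-1], g.size)   # grid on which g is sampled
    # Change of variables: density of S / sqrt(2) at x is sqrt(2) * g(sqrt(2) * x).
    h = np.sqrt(2) * np.interp(np.sqrt(2) * x, s, g, left=0.0, right=0.0)
    return normalize(h)                            # absorb discretization error

def l1_distance(f, g):
    """Riemann-sum approximation of the L1 distance between two densities."""
    return np.abs(f - g).sum() * dx

normal = normalize(np.exp(-x**2 / 2))

# Initial condition: uniform density on [-sqrt(3), sqrt(3)] (mean 0, variance 1).
f = normalize((np.abs(x) <= np.sqrt(3)).astype(float))

distances = [l1_distance(f, normal)]
for _ in range(6):
    f = renormalized_convolution(f)
    distances.append(l1_distance(f, normal))

# The normal density itself should be (numerically) fixed by T.
fixed_point_error = l1_distance(renormalized_convolution(normal), normal)

print(distances)
print(fixed_point_error)
```

The printed distances should shrink from one iterate to the next until they reach the level of the discretization error, while the normal density is left essentially unchanged by the map. This only illustrates the attractor behaviour for one initial condition and one metric; it says nothing about which metric, if any, makes the map a genuine contraction.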
I would guess that since I can't find a treatment that does pursue this approach explicitly, there must be something wrong with my sense that it can be done or that it would be interesting. If that is the case, I would like to hear why.
EDIT: There are three similar questions on Math Stack Exchange and MathOverflow that readers may be interested in:
Best Answer
After doing some digging in the literature, encouraged by Kjetil's answer, I've found a few references, besides the book by Y. Sinai, that do take the geometric/dynamical systems approach to the CLT seriously. I'm posting what I've found for others who may be interested, but I still hope to hear from an expert about the value of this point of view.
The most significant influence seems to have come from the work of Charles Stein. But the most direct answer to my question seems to be from Hamedani and Walter, who put a metric on the space of distribution functions and show that convolution generates a contraction, which yields the normal distribution as the unique fixed point.
ADDED October 19, 2018.
Another source for this point of view is Oliver Knill's Probability and Stochastic Processes with Applications, p. 11 (emphasis added):