Probability – Central Limit Theorem via Maximal Entropy

calculus-of-variations, pr.probability

Let $\rho(x)$ be a probability density function on $\mathbb{R}$ with zero mean and prescribed variance $\sigma^2$, so that:
$$\int_\mathbb{R} \rho(x)\, dx = 1$$
and
$$\int_\mathbb{R} x^2 \rho(x)\, dx = \sigma^2$$
Fact: the density function which maximizes the entropy functional
$$S(\rho) = -\int_\mathbb{R} \rho(x) \log \rho(x)\, dx$$
with the constraints above is the normal distribution
$$\rho(x) = \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{x^2}{2\sigma^2}}$$
This can be proved using basic techniques from the calculus of variations.
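For concreteness, here is a sketch of that variational argument (the Lagrange multipliers $\lambda$ and $\mu$ below are introduced purely for illustration). One extremizes
$$\mathcal{L}[\rho] = -\int_\mathbb{R} \rho \log \rho\, dx \;-\; \lambda\left(\int_\mathbb{R} \rho\, dx - 1\right) \;-\; \mu\left(\int_\mathbb{R} x^2 \rho\, dx - \sigma^2\right),$$
and setting the pointwise variation to zero gives
$$-\log\rho(x) - 1 - \lambda - \mu x^2 = 0 \quad\Longrightarrow\quad \rho(x) = e^{-1-\lambda}\, e^{-\mu x^2},$$
so any critical point is a centered Gaussian. The two constraints then fix $e^{-1-\lambda} = (2\pi\sigma^2)^{-1/2}$ and $\mu = \tfrac{1}{2\sigma^2}$, and strict concavity of $t \mapsto -t \log t$ makes this critical point the unique maximizer.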

My question: can this be used to prove the central limit theorem? In other words, can one show directly that the limiting distribution of the suitably centered and rescaled average of a sequence of i.i.d. random variables maximizes entropy?

Actually, I don't care too much about entropy. I'm mainly interested in the possibility of a variational proof of the central limit theorem.

Best Answer

There is a book on the subject: "Information Theory and the Central Limit Theorem" by Oliver Johnson. The article by Anshelevich mentioned by Yemon considers the operator $T$ acting on probability densities, corresponding to passing from the law of a random variable $X$ to that of $(X+Y)/\sqrt{2}$, where $Y$ is an independent copy of $X$. The entropy is a Lyapunov function for this transformation, which is the simplest example of a renormalization group transformation.

The $N(0,1)$ density is a fixed point, and it is easy to diagonalize the linearization of $T$ near this fixed point using Wick monomials, i.e., Hermite polynomials. The directions corresponding to the 0th, 1st and 2nd moments are expanding (relevant operators) or neutral (marginal operators), while all others are contracting (irrelevant operators). Therefore, if one makes the necessary arrangements (renormalization conditions) to fix these moments (e.g. subtracting $N$ times the mean and dividing by $\sqrt{N}$), one lies on the stable manifold of the Gaussian fixed point. See the textbook on probability theory by Koralov and Sinai for more details.

The generalization of the map $T$ to joint probability distributions of dependent variables, i.e., the renormalization group, is explained in the book "A Renormalization Group Analysis of the Hierarchical Model in Statistical Mechanics" by Collet and Eckmann.

The issue with using this type of nonlinear transformation is that the diagonalization at a fixed point only gives information about the vicinity of that fixed point. To get results far away, having a Lyapunov function like the entropy is of great importance. This is an active area in physics, which investigates generalizations of Zamolodchikov's $c$-"theorem" in conformal field theory; see for instance this article for a recent review. Entanglement entropy seems to be the Lyapunov function in this setting.
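To make the diagonalization step concrete, here is a short sketch in Fourier variables (the notation is mine, not taken from the references above). If $\hat\rho$ is the characteristic function of $X$, the map $T$ acts by $\hat\rho(t) \mapsto \hat\rho(t/\sqrt{2})^2$, with fixed point $e^{-t^2/2}$. Writing $\hat\rho(t) = e^{-t^2/2} + \varepsilon\, \psi(t)$ and keeping first-order terms,
$$\hat\rho(t/\sqrt{2})^2 = e^{-t^2/2} + 2\varepsilon\, e^{-t^2/4}\, \psi(t/\sqrt{2}) + O(\varepsilon^2),$$
and the perturbations $\psi_n(t) = (it)^n e^{-t^2/2}$, which are the Fourier transforms of Hermite-polynomial perturbations of the Gaussian density, are eigenvectors of the linearization:
$$2\, e^{-t^2/4}\, \psi_n(t/\sqrt{2}) = 2^{\,1-n/2}\, \psi_n(t).$$
The eigenvalues $2^{1-n/2}$ are $2$, $\sqrt{2}$, $1$ for $n = 0, 1, 2$ (the expanding and marginal directions removed by fixing normalization, mean and variance) and are $< 1$ for every $n \ge 3$, which is the contraction toward the Gaussian fixed point described above.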
