Solved – Information theoretic central limit theorem

central-limit-theorem, information-theory, mathematical-statistics

The simplest form of the information theoretic CLT is the following:

Let $X_1, X_2,\dots$ be iid with mean $0$ and variance $1$. Let $f_n$ be the density of the normalized sum $\frac{\sum_{i=1}^n X_i}{\sqrt{n}}$ and $\phi$ be the standard Gaussian density. Then the information theoretic CLT states that, if $D(f_n\|\phi)=\int f_n \log(f_n/\phi) dx$ is finite for some $n$, then $D(f_n\|\phi)\to 0$ as $n\to \infty$.
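
Here is a minimal numerical sketch of the statement (my own illustration, not part of the question): it takes $X_i\sim\mathrm{Uniform}(-\sqrt3,\sqrt3)$, builds the density of $S_n=\sum_{i=1}^n X_i$ by repeated convolution on a grid, and evaluates $D(f_n\|\phi)$ by a Riemann sum. The grid sizes, the uniform summand, and the helper names are illustrative assumptions.

```python
import numpy as np

# Grid for the density of the raw sum S_n = X_1 + ... + X_n.
dx = 0.01
L = 40.0
x = np.linspace(-L, L, int(round(2 * L / dx)) + 1)   # symmetric grid, 0 at the centre

# Density of one Uniform(-sqrt(3), sqrt(3)) summand (mean 0, variance 1).
a = np.sqrt(3.0)
p1 = np.where(np.abs(x) <= a, 1.0 / (2.0 * a), 0.0)

def kl_to_gaussian(n, p_sum):
    """Numerical D(f_n || phi), where f_n is the density of S_n / sqrt(n)."""
    y = np.linspace(-8.0, 8.0, 1601)
    dy = y[1] - y[0]
    f_n = np.sqrt(n) * np.interp(np.sqrt(n) * y, x, p_sum)  # change of variables
    phi = np.exp(-y ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
    integrand = np.zeros_like(f_n)
    pos = f_n > 0                                            # 0 * log 0 = 0 convention
    integrand[pos] = f_n[pos] * np.log(f_n[pos] / phi[pos])
    return np.sum(integrand) * dy

p_sum = p1.copy()
for n in range(1, 17):
    if n > 1:
        p_sum = np.convolve(p_sum, p1, mode="same") * dx     # density of S_n
    if n in (1, 2, 4, 8, 16):
        print(f"n = {n:2d}   D(f_n || phi) ~ {kl_to_gaussian(n, p_sum):.5f}")
```

The printed divergences shrink steadily toward $0$ as $n$ grows, as the theorem predicts for this particular choice of summand.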

Certainly this convergence is, in a sense, "stronger" than the well-established modes of convergence in the literature, namely convergence in distribution and convergence in the $L_1$ metric, thanks to Pinsker's inequality $\left(\int |f_n-\phi|\,dx\right)^2\le 2\int f_n \log(f_n/\phi)\,dx$. That is, convergence in KL divergence implies convergence in distribution and convergence in $L_1$ distance.
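
In more detail, writing $Z_n$ for a random variable with density $f_n$ and $\Phi$ for the standard Gaussian measure, Scheffé's identity together with Pinsker's inequality gives
$$\sup_{A}\bigl|\Pr(Z_n\in A)-\Phi(A)\bigr| \;=\; \frac12\int|f_n-\phi|\,dx \;\le\; \sqrt{\tfrac12\,D(f_n\|\phi)} \;\longrightarrow\; 0,$$
and convergence in total variation in particular implies convergence in distribution.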

I would like to know two things.

  1. What is so great about the result $D(f_n\|\phi)\to 0$?

  2. Is it only because of the reason stated above (Pinsker's inequality) that we say convergence in KL divergence (i.e., $D(f_n\|\phi)\to 0$) is stronger?

NB: I asked this question some time ago on math.stackexchange, where I didn't get any answer.

Best Answer

One thing that is great about this theorem is that it suggests limit theorems in settings where the usual central limit theorem does not apply. For instance, in situations where the maximum-entropy distribution is some non-normal distribution, such as for distributions on the circle, it suggests convergence to the uniform distribution.
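
A small numerical sketch of that circle claim (my own illustration, not from the answer): take iid wrapped-normal steps on the circle, compute the density of the wrapped sum exactly from its Fourier coefficients, and watch the KL divergence to the uniform density on $[0,2\pi)$ tend to $0$. The wrapped-normal step, its parameters, and the truncation order are assumptions made for convenience.

```python
import numpy as np

# Wrapped-normal steps: E[exp(i k X_1)] = exp(i k mu - k^2 sigma^2 / 2).
mu, sigma = 1.0, 0.6
K = 100                                              # Fourier truncation order
k = np.arange(1, K + 1)
phi1 = np.exp(1j * k * mu - 0.5 * (k * sigma) ** 2)  # coefficients of one step

theta = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
dtheta = theta[1] - theta[0]
u = 1.0 / (2.0 * np.pi)                              # uniform density on the circle

for n in (1, 2, 4, 8):
    # Density of (X_1 + ... + X_n) mod 2*pi from its Fourier coefficients phi1**n.
    f_n = u * (1.0 + 2.0 * (phi1 ** n @ np.exp(-1j * np.outer(k, theta))).real)
    f_n = np.clip(f_n, 1e-300, None)                 # guard against tiny numerical negatives
    D = np.sum(f_n * np.log(f_n / u)) * dtheta
    print(f"n = {n}   D(f_n || uniform) ~ {D:.6f}")
```

Here the entropy-maximizing density on the circle is the uniform one, and the divergence of the wrapped sum from it drops quickly with $n$.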
