In Shannon's paper on information theory, he asserts the entropy power inequality in Appendix 6, on page 52. I was reading his proof, and it seems there is a gap: through his method, I believe one can only conclude that the Gaussian is a local minimum for his calculus-of-variations problem, rather than a global minimum.
Since then, has anyone been able to fill the gap in Shannon's proof? I am aware that there are other proofs of the inequality, but I would like to know if someone has used his methods to show that the Gaussian is a global minimum.
Best Answer
Amir Dembo, Thomas Cover, and Joy Thomas discuss (and prove) the entropy power inequality towards the end of this paper, in two different ways.
Liyao Wang and Mokshay Madiman prove the entropy power inequality using spherically symmetric rearrangements.
Olivier Rioul proves it using mutual information.
Any of these three proofs, while probably legitimate, risks the same gaps you are indicating. Of these I recommend Dembo-Cover-Thomas, since they offer a fairly complete discussion of entropy inequalities, including their connection to Brunn-Minkowski.
Entropy log-counts microstates, $h(X) = \log \# \{\text{microstates}\}$, and the entropy power of a (vector-valued) random variable $X: \Omega \to \mathbb{R}^n$ is, up to Shannon's normalizing constant $\frac{1}{2\pi e}$, $$N(X) = e^{\frac{2}{n}h(X)},$$ i.e. the squared $n$-th root of the number of microstates. (The exponent $\frac{2}{n}$, rather than $\frac{1}{n}$, is what makes it match the inequality stated below.)
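As a quick numeric illustration (a sketch, assuming the standard closed form $h(X) = \frac{n}{2}\log(2\pi e \sigma^2)$ for an isotropic Gaussian; the function names are mine):

```python
import math

def gaussian_entropy(n, sigma2):
    # Differential entropy (in nats) of N(0, sigma2 * I_n):
    # h = (n/2) * log(2*pi*e*sigma2)
    return 0.5 * n * math.log(2 * math.pi * math.e * sigma2)

def entropy_power(n, h):
    # Shannon's entropy power: N(X) = (1/(2*pi*e)) * exp(2*h/n)
    return math.exp(2.0 * h / n) / (2 * math.pi * math.e)

# For a Gaussian, the entropy power recovers the variance exactly,
# which is why the 1/(2*pi*e) normalization is natural.
n, sigma2 = 3, 2.5
h = gaussian_entropy(n, sigma2)
print(round(entropy_power(n, h), 6))  # -> 2.5
```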
Shannon's definition of entropy power involves an integral over frequency space, and he may have been dealing with time-dependent signals $x(t)$ rather than random variables $X$; indeed, he says that white noise $x(t) = W(t)$ maximizes entropy for a given power. Classically, the notion of "power" as work per unit time makes sense here. He then uses ergodicity to approximate $x(t)$ by its average distribution $\langle x(t) \rangle = X$.
Statement of Entropy Power Inequality
Let $X : \Omega \to \mathbb{R}^n$ be a vector-valued random variable with probability density $p$ over $\mathbb{R}^n$. Its differential entropy $$h(X) = - \int_{\mathbb{R}^n} dx\; p(x) \log p(x) = \log \# \{ \text{ microstates } \}$$ log-counts the microstates.
I speculate that Shannon's notion of entropy power (of a probability distribution) is related to the notion of power spectral density (of a time series) in signal processing, but I can't write an exact derivation.
For independent $X$ and $Y$,
$$ e^{\frac{2}{n}h(X)}+ e^{\frac{2}{n}h(Y)} \leq e^{\frac{2}{n}h(X+Y)}$$
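One can sanity-check the statement numerically in the Gaussian case, where independent variances add and the inequality holds with equality (a sketch, assuming the closed-form Gaussian entropy from above; helper names are mine):

```python
import math

def h_gauss(n, sigma2):
    # Differential entropy of an isotropic Gaussian N(0, sigma2 * I_n).
    return 0.5 * n * math.log(2 * math.pi * math.e * sigma2)

def ep(n, h):
    # The quantity e^{(2/n) h} appearing in the inequality.
    return math.exp(2.0 * h / n)

n, s1, s2 = 4, 1.0, 3.0
lhs = ep(n, h_gauss(n, s1)) + ep(n, h_gauss(n, s2))
# X + Y is Gaussian with variance s1 + s2 when X and Y are independent.
rhs = ep(n, h_gauss(n, s1 + s2))
print(lhs, rhs)  # the two sides agree: Gaussians are the extremal case
```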
Proof of Entropy Power Inequality
You really have to stick to your guns here that entropy log-counts the number of microstates. The formal result is called the asymptotic equipartition property (AEP).
Using the LLN or the AEP, we can say that although $X$ ranges over all of $\mathbb{R}^n$, with high probability $X$ stays in the "weakly typical set" or "strongly typical set".
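A quick Monte Carlo sketch of this concentration, for i.i.d. standard normal samples in one dimension ($-\frac{1}{n}\log p(X_1,\dots,X_n)$ clusters around $h$; the helper name is mine):

```python
import math
import random

random.seed(0)

def log_density_std_normal(x):
    # log of the N(0,1) density at x
    return -0.5 * math.log(2 * math.pi) - 0.5 * x * x

n = 100_000
sample = [random.gauss(0.0, 1.0) for _ in range(n)]
empirical = -sum(log_density_std_normal(x) for x in sample) / n
h = 0.5 * math.log(2 * math.pi * math.e)  # true entropy, about 1.4189 nats
print(empirical, h)  # the two numbers are close: the sample is "typical"
```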
Using the Brunn-Minkowski inequality, we can say that the weak/strong typical sets of two random variables $X$ and $Y$ combine to lower-bound the volume of the typical set of $X+Y$:
$$ \mathrm{Vol}(X)^{2/n} + \mathrm{Vol}(Y)^{2/n} \leq \mathrm{Vol}(X \stackrel{E}{+} Y)^{2/n}$$
where $\mathrm{Vol}(X)$ denotes the volume of the typical set of $X$, and $\stackrel{E}{+}$ is a restricted Minkowski sum, $ X \stackrel{E}{+} Y = \{ x + y: (x,y) \in E \} $ for a subset $E \subseteq \mathbb{R}^n \times \mathbb{R}^n$. (This $2/n$-exponent form follows from the usual $1/n$-exponent Brunn-Minkowski inequality, since $(a+b)^2 \geq a^2 + b^2$ for $a, b \geq 0$.)
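For axis-aligned boxes the (unrestricted) Minkowski sum is again a box whose side lengths add, so the Brunn-Minkowski inequality, here in its usual $1/n$-exponent form, can be checked directly (a sketch with hypothetical helper names):

```python
import math

def vol(sides):
    # Volume of an axis-aligned box with the given side lengths.
    v = 1.0
    for s in sides:
        v *= s
    return v

def minkowski_sum_box(a, b):
    # Minkowski sum of two axis-aligned boxes: side lengths add coordinatewise.
    return [sa + sb for sa, sb in zip(a, b)]

A, B = [1.0, 2.0, 3.0], [0.5, 4.0, 1.0]
n = len(A)
lhs = vol(A) ** (1 / n) + vol(B) ** (1 / n)
rhs = vol(minkowski_sum_box(A, B)) ** (1 / n)
print(lhs <= rhs)  # -> True: Vol(A)^{1/n} + Vol(B)^{1/n} <= Vol(A+B)^{1/n}
```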