Entropy of (a binary random variable plus a Gaussian random variable) increases with the distance between the binary values

entropy, gaussian, information theory, mutual information

Let $X\in\{x_1,x_2\}$ be a real binary random variable with probability mass function $\{p_1,p_2=1-p_1\}$. Let $N$ be a real random variable with standard normal distribution (mean = $0$, std = $1$). Let $Y=X+N$. Let $H(Y)$ denote the differential entropy of $Y$.

It seems like it should be easy to prove that $H(Y)$ increases as $|x_1-x_2|$ increases, that is, as the distance between the two sample points of the binary random variable increases. This is motivated by the mutual information between the input and output of an additive Gaussian noise channel with binary input; for a given noise power, the mutual information is maximized by spreading the two points as far apart as possible (subject to whatever constraints one has). The mutual information is

$$I(X;Y) = H(Y) - H(Y|X) = H(Y) - H(N)$$

Since the mutual information is increased by increasing the distance between $x_1$ and $x_2$, and since $H(N)$ is independent of $X$, $H(Y)$ must increase as $|x_1-x_2|$ increases.
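
(As a quick numerical sanity check, and not a proof: the following sketch, with an arbitrary choice of $p_1=0.3$ and a handful of spacings, integrates the mixture density numerically and confirms that $H(Y)$ grows with the spacing $\Delta=x_2-x_1$.)

```python
# Numerical sanity check (not a proof): differential entropy of Y = X + N
# for a two-point X and standard normal N, as a function of the spacing
# delta = x2 - x1. The values p1 = 0.3 and the list of spacings are
# arbitrary choices for illustration.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def mixture_entropy(delta, p1=0.3, x1=0.0):
    """Differential entropy (in nats) of p1*N(x1,1) + (1-p1)*N(x1+delta,1)."""
    def neg_p_log_p(y):
        p = p1 * norm.pdf(y - x1) + (1 - p1) * norm.pdf(y - (x1 + delta))
        return -p * np.log(p)
    # Integrate over a range wide enough to capture both modes.
    val, _ = quad(neg_p_log_p, x1 - 10.0, x1 + delta + 10.0, limit=200)
    return val

for delta in [0.0, 0.5, 1.0, 2.0, 4.0]:
    print(f"delta = {delta:3.1f}   H(Y) = {mixture_entropy(delta):.4f}")
# H(Y) starts at H(N) = 0.5*log(2*pi*e) ~ 1.4189 nats and increases with delta.
```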

I have tried to prove this from the definition of entropy and so far have been unsuccessful. We have

$$H(Y) = -\int{p(y)\log\left(p(y)\right)dy} = -\int{\sum_{i=1}^2{p_i p_N(y-x_i)\log{\left(\sum_{i=1}^2{p_i p_N(y-x_i)}\right)}}dy}$$

where $p_N$ is the pdf of $N$.

I have tried using the fact that relative entropy is nonnegative and also the log sum inequality (Theorems 2.7.1 and 8.6.1 in Elements of Information Theory, Cover and Thomas, 2nd Edition). This seems like it should be simple, but I have had no luck. Do I need to use calculus of variations? Thank you in advance.

Best Answer

I worked out the proof of this, which was straightforward when I took a more direct approach.

Let $x_2=x_1+\Delta$. Let $g(n)$ be the pdf of a standard normal random variable:
$$g(n)=\frac{1}{\sqrt{2\pi}}e^{-n^2/2}$$
The pdf of $Y=X+N$ is given by
$$\begin{aligned} p(y) & = p_1 g(y-x_1)+(1-p_1)g(y-x_2) \\ & = p_1 g(y-x_1)+(1-p_1)g(y-(x_1+\Delta)) \end{aligned}$$
The entropy of $Y$ is given by
$$H(Y)=-\int_{-\infty}^\infty{p(y)\log(p(y))dy}$$
So
$$ \begin{aligned} \frac{d H(Y)}{d \Delta}& = -\int_{-\infty}^\infty{ \frac{d p(y)}{d \Delta}\bigl(\log(p(y))+1\bigr)dy } \\&=-(1-p_1)\int_{-\infty}^\infty{\bigl(y-(x_1+\Delta)\bigr)g\bigl(y-(x_1+\Delta)\bigr)\bigl(1+\log(p(y))\bigr)dy} & (1) \\&=-(1-p_1)\int_{-\infty}^\infty{yg(y)\bigl(1+\log\bigl(p(y+x_1+\Delta)\bigr)\bigr)dy} & (2) \\&=-(1-p_1)\int_{-\infty}^\infty{yg(y)\log\bigl(p(y+x_1+\Delta)\bigr)dy} & (3) \\&=-(1-p_1)\int_{0}^\infty{\Bigl(yg(y)\log\bigl(p(y+x_1+\Delta)\bigr)-yg(-y)\log\bigl(p(-y+x_1+\Delta)\bigr)\Bigr)dy} & (4) \\&=(1-p_1)\int_{0}^\infty{yg(y)\log\left(\frac{p(-y+x_1+\Delta)}{p(y+x_1+\Delta)}\right)dy} & (5) \end{aligned} $$
Here (2) is a change of variables $y \to y+x_1+\Delta$, (3) follows because $yg(y)$ is odd (so the term from the constant $1$ integrates to zero), (4) follows by splitting the integral at zero and substituting $y \to -y$ on the negative half, and (5) uses the fact that $g(y)$ is even.

Now $$\frac{p(-y+x_1+\Delta)}{p(y+x_1+\Delta)}= \frac{(1-p_1)g(y)+p_1 g(y-\Delta)}{(1-p_1)g(y)+p_1 g(y+\Delta)}$$

Further, since $(y+\Delta)^2-(y-\Delta)^2=4y\Delta$, $$g(y+\Delta)=g(y-\Delta)e^{-2y\Delta}$$ Hence $$ \frac{d H(Y)}{d \Delta} = (1-p_1)\int_{0}^\infty{yg(y)\log\left(\frac{(1-p_1)g(y)+p_1 g(y-\Delta)}{(1-p_1)g(y)+p_1 g(y-\Delta)e^{-2y\Delta}}\right)dy} $$

For $\Delta=0$ (i.e., $x_1=x_2$), the argument of the $\log$ function is $1$ and the derivative is $0$. This is expected, since in that case $Y=x_1+N$, so $H(Y)=H(N)$ and the entropy is at its minimum. Since the integral runs only over $y>0$: for $\Delta>0$ we have $e^{-2y\Delta}<1$, so the denominator is smaller than the numerator and the argument of the $\log$ is greater than $1$; for $\Delta<0$ we have $e^{-2y\Delta}>1$ and the argument is less than $1$. The integrand is therefore strictly positive for $\Delta>0$ and strictly negative for $\Delta<0$. So the derivative has the same sign as $\Delta$, and $H(Y)$ increases monotonically as $|\Delta|$ increases.
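
As a sanity check on the algebra (a sketch with arbitrary parameters $p_1=0.3$, $x_1=0$; not part of the proof), the final integral can be evaluated numerically and compared against a central finite difference of $H(Y)$. The two agree, and the sign of the derivative matches the sign of $\Delta$:

```python
# Numerical check of the final expression for dH/dDelta (a sketch, not part
# of the proof): compare it to a central finite difference of the numerically
# integrated H(Y). The choices p1 = 0.3, x1 = 0, and the test deltas are arbitrary.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

p1, x1 = 0.3, 0.0
g = norm.pdf  # standard normal pdf

def p_y(y, delta):
    return p1 * g(y - x1) + (1 - p1) * g(y - (x1 + delta))

def H(delta):
    def f(y):
        return -p_y(y, delta) * np.log(p_y(y, delta))
    return quad(f, x1 - 12.0, x1 + delta + 12.0, limit=200)[0]

def dH_dDelta(delta):
    # (1-p1) * int_0^inf y g(y) log( [(1-p1)g(y) + p1 g(y-D)] /
    #                                [(1-p1)g(y) + p1 g(y-D) exp(-2yD)] ) dy
    def f(y):
        num = (1 - p1) * g(y) + p1 * g(y - delta)
        den = (1 - p1) * g(y) + p1 * g(y - delta) * np.exp(-2.0 * y * delta)
        return y * g(y) * np.log(num / den)
    # Truncate at y = 30: the factor y*g(y) makes the tail negligible.
    return (1 - p1) * quad(f, 0.0, 30.0, limit=200)[0]

for delta in [0.5, 1.0, 2.0, -1.5]:
    eps = 1e-4
    fd = (H(delta + eps) - H(delta - eps)) / (2 * eps)  # finite difference
    print(f"delta = {delta:5.2f}   closed form = {dH_dDelta(delta):+.5f}   "
          f"finite difference = {fd:+.5f}")
```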