Conditional entropy of linear transformation of random variables

entropy, information-theory

Let $X,Y,Z$ be $n$-dimensional random vectors such that $Z=AX+AY$, where $A$ is an $n\times n$ matrix with $\det(A)=1$ ($\det(A)$ denotes the determinant of $A$). Let $X\perp \!\!\! \perp Y$, and let $H(X)$ denote the (differential) entropy of $X$.

Is this true: $H(AX|Z)=H(AX|AX+AY)=H(X|X+Y)\tag{1}$?

I know that $H(AX)=H(X)+\log|\det(A)|$. So I was thinking of writing the conditional entropy in terms of joint entropy:

\begin{align*}
H(AX,Z) &= H(Z) + H(AX|Z) && \text{(chain rule)} \\
&= H(AX+AY) + H(AX|AX+AY) \\
\implies H(AX|AX+AY) &= H(AX,AX+AY)-H(AX+AY) \\
&= H(AX,AX+AY)-H(X+Y)-\log|\det(A)| \\
&= H(AX,AX+AY)-H(X+Y) && \text{(since $\det(A)=1$)}
\end{align*}

But I am not sure whether the joint entropies satisfy $H(AX,AX+AY)=H(X,X+Y)$. If they do, then $(1)$ follows. Any ideas?
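As a quick numerical sanity check of the conjectured identity $H(AX,AX+AY)=H(X,X+Y)$, one can try the Gaussian special case, where joint differential entropy has a closed form. The Gaussian assumption and the NumPy setup below are mine, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

def random_spd(d):
    """A random symmetric positive-definite matrix (toy covariance)."""
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

def h_gauss(C):
    """Differential entropy (nats) of a Gaussian with covariance C."""
    d = C.shape[0]
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(C)[1])

# Independent Gaussian X and Y (an assumption, just to get closed forms).
Sx, Sy = random_spd(n), random_spd(n)

# Joint covariance of (X, X+Y).
S = np.block([[Sx, Sx], [Sx, Sx + Sy]])

# A random A, sign-flipped and rescaled so that det(A) = 1.
A = rng.standard_normal((n, n))
d = np.linalg.det(A)
if d < 0:
    A[0] *= -1
    d = -d
A /= d ** (1.0 / n)

# (AX, AX+AY) = B (X, X+Y) with B = diag(A, A).
B = np.block([[A, np.zeros((n, n))], [np.zeros((n, n)), A]])

lhs = h_gauss(B @ S @ B.T)   # H(AX, AX+AY)
rhs = h_gauss(S)             # H(X, X+Y)
assert np.isclose(lhs, rhs)
```

This agrees up to floating-point error, as the answer below explains: the map is $B=\mathrm{diag}(A,A)$, so the joint entropy shifts by $\log|\det B| = 2\log|\det A| = 0$.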

Best Answer

This is true quite generically. Let $(X,Z)$ be an arbitrary pair of random vectors with a reasonably nice joint density $p_{XZ}$. Recall that the conditional differential entropy is $$ h(X|Z) = -\int p_{XZ}(x,z) \log \frac{p_{XZ}(x,z)}{p_Z(z)} \,\mathrm{d}x\, \mathrm{d}z.$$

By the standard change-of-variables formula, for any invertible $A$, if $(U,V) = (AX, AZ) = B(X,Z)$, where $B = \mathrm{diag}(A,A)$ is the block-diagonal matrix with blocks $A$, then $$ p_{UV}(u,v) = |\det B|^{-1} p_{XZ}(A^{-1} u, A^{-1}v).$$ Further, due to block diagonality, observe that $\det B = (\det A)^2$.

By an identical calculation, $p_V(v) = |\det A|^{-1} p_Z(A^{-1}v)$.
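Here is a small numerical check of the density transformation above, in the Gaussian case (the Gaussian choice and the helper `gauss_pdf` are my own illustration, not part of the argument):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2

def gauss_pdf(x, C):
    """Zero-mean Gaussian density with covariance C, evaluated at x."""
    d = x.size
    q = x @ np.linalg.solve(C, x)
    return np.exp(-0.5 * q) / np.sqrt((2 * np.pi) ** d * np.linalg.det(C))

# Covariance of a jointly Gaussian (X, Z) (an illustrative assumption:
# it makes both p_{XZ} and p_{UV} available in closed form).
M = rng.standard_normal((2 * n, 2 * n))
S = M @ M.T + 2 * n * np.eye(2 * n)

A = rng.standard_normal((n, n))       # almost surely invertible
B = np.block([[A, np.zeros((n, n))], [np.zeros((n, n)), A]])

w = rng.standard_normal(2 * n)        # an arbitrary test point (u, v)

# p_{UV}(u, v) versus |det B|^{-1} p_{XZ}(B^{-1}(u, v)):
lhs = gauss_pdf(w, B @ S @ B.T)
rhs = gauss_pdf(np.linalg.solve(B, w), S) / abs(np.linalg.det(B))
assert np.isclose(lhs, rhs)
```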

Thus, \begin{align} h(U|V) &= -\int p_{UV}(u,v) \log \frac{p_{UV}(u,v)}{p_V(v)} \,\mathrm{d}u\, \mathrm{d}v \\ &\overset{(u,v) = B(x,z)}{=} -\int p_{UV}(Ax, Az) \log \frac{p_{UV}(Ax, Az)}{p_V(Az)} |\det B| \, \mathrm{d}x\, \mathrm{d}z\\ &= -\int p_{XZ}(x,z) \log \frac{p_{XZ}(x,z)\, |\det B|^{-1}}{p_Z(z)\, |\det A|^{-1}} \,\mathrm{d}x\, \mathrm{d}z \\ &= h(X|Z) + \log |\det A|,\end{align} where I've used that $|\det B| = |\det A|^2$.
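The Gaussian case also gives a quick numerical confirmation of this conclusion, since $h(X|Z) = h(X,Z) - h(Z)$ is closed-form there (the jointly Gaussian assumption is mine, for illustration only; the identity holds far more generally):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2

# A jointly Gaussian (X, Z) with a random covariance.
M = rng.standard_normal((2 * n, 2 * n))
S = M @ M.T + 2 * n * np.eye(2 * n)   # joint covariance of (X, Z)

def h_gauss(C):
    """Differential entropy (nats) of a Gaussian with covariance C."""
    d = C.shape[0]
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(C)[1])

def h_cond(S2, m):
    """h(X|Z) for jointly Gaussian (X, Z), Z being the last m coords."""
    return h_gauss(S2) - h_gauss(S2[m:, m:])

A = rng.standard_normal((n, n))       # almost surely invertible
B = np.block([[A, np.zeros((n, n))], [np.zeros((n, n)), A]])

lhs = h_cond(B @ S @ B.T, n)                      # h(AX | AZ)
rhs = h_cond(S, n) + np.linalg.slogdet(A)[1]      # h(X|Z) + log|det A|
assert np.isclose(lhs, rhs)
```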

Your case follows on setting $Z = X+Y$ and $\det A = 1$ in the above calculation. (I chose slightly unfortunate notation at the start, because I thought you had defined $Z = X+Y$ rather than $Z = A(X+Y)$. There was too much to edit once I realised the notational snafu, so the notation doesn't quite line up. My apologies.)

Notice that we made no assumptions about the joint law of $(X,Z)$ (besides regularity assumptions, e.g. that $\log\big(p(x,z)/p(z)\big)$ is integrable), or about $A$ beyond its invertibility. All that is really happening here is that you're blowing up both random vectors with $A$, which changes the common scale of the problem by a factor of $|\det A|$.